This is the official repository for "Event-based solutions for human-centered applications: A comprehensive review", submitted to the journal Frontiers in Signal Processing.
Event cameras, or dynamic vision sensors, capture changes in light intensity asynchronously, providing high temporal resolution and energy efficiency. These features make them ideal for human-centered applications, such as analyzing facial expressions and body motion dynamics. However, research in this area is fragmented. This repository unifies advancements in body and face-related tasks, offering a comprehensive review of databases, event data representation, and processing models. It serves as a foundational resource for researchers, guiding future work in human-centered event camera applications.
| Name | Details |
|---|---|
| Event Count | Event data is aggregated by counting the number of events that occur at each pixel within a fixed time interval. This approach provides a straightforward summary of activity, often used as a baseline representation. |
| Event Histogram | Similar to the event count, but instead of a single time interval, events are grouped and counted in temporal bins, creating a distribution of event activity that captures variations with more levels of detail. |
| Time Surface / Surface of Active Events | Represents data as a continuous map where each pixel value corresponds to the most recent timestamp of an event at that location. This highlights recent activity and is often used to track motion or identify edges. |
| Memory Surface | Event data is represented as a temporal map where each pixel's value indicates the time elapsed since the last event occurred at that location within a fixed time window. This approach encodes temporal information by retaining a "memory" of inactivity, making it useful for identifying patterns and tracking regions with recent or ongoing motion. |
| Voxel Grid | Event data is sliced temporally into small time intervals, creating a sequence of event slices. These slices are then stacked into a 3D grid, where each voxel represents the activity in a spatial region during a specific time window. This allows for preserving both spatial and temporal resolution. |
| Spike Tensor | Represents data as binary tensors indicating the occurrence of spikes in specific spatiotemporal locations. The tensor is separated into two channels for positive and negative polarities. |
| Graph | Represents data as a graph in which events are nodes, with polarity as the node feature. Edges are then created between nodes to represent spatiotemporal relationships. Often used for tasks like pattern recognition. |
| E2VID Frame | Represents data as reconstructed frames by using neural networks to convert the sparse event stream into intensity frames. This allows event data to be used with traditional frame-based computer vision methods. |
| Temporal Binary Representation | Events are first stacked into intermediate binary frames, so that each pixel can be read as a binary string across frames. These frames are then merged into a single frame by applying binary-to-decimal conversion. Most popular in face analysis applications. |
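
Several of the representations in the table above can be sketched in a few lines of NumPy. The block below is a minimal illustration, not the implementation used by any of the reviewed papers. It assumes that events arrive as integer pixel coordinates `x`, `y` and float timestamps `t`; all function names, the bin layout, and the bit order of the Temporal Binary Representation (first bin as most significant bit) are hypothetical choices made for this example.

```python
import numpy as np


def event_count(x, y, height, width):
    """Event Count: number of events per pixel within the window."""
    counts = np.zeros((height, width), dtype=np.int64)
    np.add.at(counts, (y, x), 1)  # unbuffered add, so repeated pixels accumulate
    return counts


def time_surface(x, y, t, height, width):
    """Time Surface: most recent timestamp per pixel (NaN where no event occurred)."""
    surface = np.full((height, width), -np.inf)
    np.maximum.at(surface, (y, x), t)  # keep the latest timestamp at each pixel
    surface[np.isinf(surface)] = np.nan
    return surface


def memory_surface(x, y, t, t_ref, height, width):
    """Memory Surface: time elapsed since the last event, measured at t_ref."""
    return t_ref - time_surface(x, y, t, height, width)


def voxel_grid(x, y, t, num_bins, height, width, t_start, t_end):
    """Voxel Grid: per-pixel event counts in num_bins equal temporal slices."""
    bins = ((t - t_start) / (t_end - t_start) * num_bins).astype(np.int64)
    bins = np.clip(bins, 0, num_bins - 1)  # events at t_end fall in the last bin
    grid = np.zeros((num_bins, height, width), dtype=np.int64)
    np.add.at(grid, (bins, y, x), 1)
    return grid


def temporal_binary_representation(x, y, t, num_bins, height, width, t_start, t_end):
    """TBR: binarize each temporal slice, then read each pixel's bits as a decimal.

    Assumption: the first temporal bin is the most significant bit.
    """
    grid = voxel_grid(x, y, t, num_bins, height, width, t_start, t_end)
    binary = (grid > 0).astype(np.int64)
    weights = 2 ** np.arange(num_bins - 1, -1, -1)
    return np.tensordot(weights, binary, axes=1)


# Toy stream: four events on a 2x2 sensor within the window [0, 1).
x = np.array([0, 1, 1, 0])
y = np.array([0, 0, 0, 1])
t = np.array([0.0, 0.1, 0.6, 0.9])

counts = event_count(x, y, 2, 2)
surface = time_surface(x, y, t, 2, 2)
grid = voxel_grid(x, y, t, 2, 2, 2, 0.0, 1.0)
tbr = temporal_binary_representation(x, y, t, 2, 2, 2, 0.0, 1.0)
```

Polarity is ignored here for brevity. A two-channel variant, as in the Spike Tensor row, could be obtained by applying the same functions separately to the positive and negative events.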
| Target Application | Authors & Year | Findings | Improvement with Event Data |
|---|---|---|---|
| Gait Recognition | Wang et al. (2019) | For viewing angles of 72, 90, and 108 degrees, EV-Gait outperforms RGB-based approaches | 3% increase in accuracy |
| | Sokolova and Konushin (2019) | Similar performance reported for event-based and RGB approaches | - |
| | Wang et al. (2021) | For a viewing angle of 90 degrees, EV-Gait-Graph outperforms RGB-based approaches | 0.5% increase in accuracy |
| | Eddine and Dugelay (2022) | Event data has an advantage over RGB and thermal data for gait recognition | 2% increase in accuracy |
| | Tao et al. (2024) | Event data has an advantage over RGB across all rotation angles for gait recognition | Up to 14% increase in accuracy |
| Action Recognition | Plizzari et al. (2022) | Event data can surpass RGB for action recognition in unseen scenarios on test data | 4% increase in accuracy |
| | de Blegiers et al. (2023) | Event-based models surpass RGB action recognition models in different setups | Up to 14% increase in accuracy |
| Pose Estimation | Goyal et al. (2023) | Pose estimation from event data surpasses pose estimation from RGB data | Up to 5% increase in accuracy |
| | Kohyama et al. (2024) | Event data does not suffer from motion blur as RGB does for 3D-based pose estimation | Error (in mm) is reduced by a factor of 5 in certain scenarios |
| Face Detection | Barua et al. (2016) | Comparable results to the Viola-Jones face detector | - |
| | Ryan et al. (2023) | Traditional RGB models perform better on RGB images than on their simulated event data counterparts | - |
| Lip Reading | Kanamaru et al. (2023) | Event and RGB modalities are combined for lip reading | - |
| Microexpression & Emotion Recognition | Becattini et al. (2022) | Event data outperforms RGB for detecting three types of expressions: positive, neutral, and negative | Up to 9% increase in accuracy |
| | Berlincioni et al. (2023) | Event data outperforms RGB in the prediction of seven different emotions | Up to 15% increase in accuracy |
| | Xiao et al. (2024) | Event and RGB data are merged as input to the network | 1% increase in accuracy |
| | Cultrera et al. (2024) | Event data delivers better performance for the estimation of some action units | For 6 out of 24 action units, event data is more accurate |
| | Adra et al. (2024) | Event data gives more information than RGB for microexpression recognition | Up to 13% increase in accuracy |
- Miao, S., Chen, G., Ning, X., Zi, Y., Ren, K., Bing, Z., et al. (2019). Neuromorphic vision datasets for pedestrian detection, action recognition, and fall detection
- Calabrese, E., Taverni, G., Easthope, C. A., Skriabine, S., Corradi, F., Longinotti, L., et al. (2019). Dhp19: Dynamic vision sensor 3d human pose dataset
- Wang, Y., Du, B., Shen, Y., Wu, K., Zhao, G., Sun, J., et al. (2019). Ev-gait: Event-based robust gait recognition using dynamic vision sensors
- Wang, Y., Zhang, X., Shen, Y., Du, B., Zhao, G., Cui, L., et al. (2021). Event-stream representation for human gaits identification using deep neural networks (IEEE), vol. 44, 3436–3449
- Eddine, M. J. and Dugelay, J.-L. (2022). Gait3: An event-based, visible and thermal database for gait recognition
- Liu, Q., Xing, D., Tang, H., Ma, D., and Pan, G. (2021). Event-based action recognition using motion information and spiking neural networks.
- Gao, Y., Lu, J., Li, S., Ma, N., Du, S., Li, Y., et al. (2023). Action recognition and benchmark using event cameras
- Gao, Y., Lu, J., Li, S., Li, Y., and Du, S. (2024). Hypergraph-based multi-view action recognition using event cameras
- Wang, Q., Xu, Z., Lin, Y., Ye, J., Li, H., Zhu, G., et al. (2025). Dailydvs-200: A comprehensive benchmark dataset for event-based action recognition
- Angelopoulos, A. N., Martel, J. N., Kohli, A. P., Conradt, J., and Wetzstein, G. (2020). Event-based near-eye gaze tracking beyond 10,000 hz
- Chen, G., Hong, L., Dong, J., Liu, P., Conradt, J., and Knoll, A. (2020a). Eddd: Event-based drowsiness driving detection through facial motion analysis with neuromorphic vision sensor
- Chen, G., Wang, F., Yuan, X., Li, Z., Liang, Z., and Knoll, A. (2020b). Neurobiometric: an eye blink based biometric authentication system using an event-based neuromorphic vision sensor
- Tan, G., Wang, Y., Han, H., Cao, Y., Wu, F., and Zha, Z. J. (2022). Multi-grained spatio-temporal features perceived network for event-based lip-reading
- Tan, G., Wan, Z., Wang, Y., Cao, Y., and Zha, Z. J. (2024). Tackling event-based lip-reading by exploring multigrained spatiotemporal clues. IEEE Transactions on Neural Networks and Learning Systems
- Bissarinova, U., Rakhimzhanova, T., Kenzhebalin, D., and Varol, H. A. (2023). Faces in event streams (fes): An annotated face dataset for event cameras (Authorea)
- Berlincioni, L., Cultrera, L., Albisani, C., Cresti, L., Leonardo, A., Picchioni, S., et al. (2023). Neuromorphic event-based facial expression recognition
- Kanamaru, T., Arakane, T., and Saitoh, T. (2023). Isolated single sound lip-reading using a frame-based camera and event-based camera
- Adra, M., Mirabet-Herranz, N., and Dugelay, J.-L. (2024). Beyond rgb: Tri-modal microexpression recognition with rgb, thermal, and event data
- Bi, Y., Chadha, A., Abbas, A., Bourtsoulatze, E., and Andreopoulos, Y. (2020). Graph-based spatio-temporal feature learning for neuromorphic vision sensing
- Plizzari, C., Planamente, M., Goletto, G., Cannici, M., Gusso, E., Matteucci, M., et al. (2022). E2 (go) motion: Motion augmented event stream for egocentric action recognition
- Zou, S., Mu, Y., Zuo, X., Wang, S., and Cheng, L. (2023). Event-based human pose tracking by spiking spatiotemporal transformer
- Goyal, G., Di Pietro, F., Carissimi, N., Glover, A., and Bartolozzi, C. (2023). Moveenet: Online high-frequency human pose estimation with an event camera
- Verschae, R. and Bugueno-Cordova, I. (2023). Event-based gesture and facial expression recognition: A comparative analysis
- Ryan, C., Elrasad, A., Shariff, W., Lemley, J., Kielty, P., Hurney, P., et al. (2023). Real-time multi-task facial analytics with event cameras. IEEE Access
- Ren, H., Zhou, Y., Fu, H., Huang, Y., Xu, R., and Cheng, B. (2023a). Ttpoint: A tensorized point cloud network for lightweight action recognition with event cameras. In Proceedings of the 31st ACM International Conference on Multimedia. 8026–8034
- Ren, H., Zhou, Y., Huang, Y., Fu, H., Lin, X., Song, J., et al. (2023b). Spikepoint: An efficient point-based spiking neural network for event cameras action recognition. arXiv preprint arXiv:2310.07189
- Bulzomi, H., Schweiker, M., Gruel, A., and Martinet, J. (2023). End-to-end neuromorphic lip-reading. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4101–4108
- Vicente-Sola, A., Manna, D. L., Kirkland, P., Di Caterina, G., and Bihl, T. J. (2025). Spiking neural networks for event-based action recognition: A new task to understand their advantage. Neurocomputing 611, 128657
- Eisl, D., Herzog, F., Dugelay, J.-L., Apvrille, L., and Rigoll, G. (2023). Introducing a framework for single-human tracking using event-based cameras. In 2023 IEEE International Conference on Image Processing (ICIP) (IEEE), 3269–3273
- Fu, L. and Yan, S. (2023). Hypergraph neural network for gait recognition based on event camera. In Third International Conference on Advanced Algorithms and Signal Image Processing (AASIP 2023) (SPIE), vol. 12799, 1151–1155
- Rios-Navarro, A., Piñero-Fuentes, E., Canas-Moreno, S., Javed, A., Harkin, J., and Linares-Barranco, A. (2023). Lipsfus: A neuromorphic dataset for audio-visual sensory fusion of lip reading. In 2023 IEEE International Symposium on Circuits and Systems (ISCAS) (IEEE), 1–5
- Xiao, P., Zhang, Y., Kai, D., Peng, Y., Zhang, Z., and Sun, X. (2024). Estme: Event-driven spatio-temporal motion enhancement for micro-expression recognition. In 2024 IEEE International Conference on Multimedia and Expo (ICME) (IEEE), 1–6
- Kohyama, K., Shiba, S., and Aoki, Y. (2024). 3d human scan with a moving event camera. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5586–5596
- Iddrisu, K., Shariff, W., O’Connor, N. E., Lemley, J., and Little, S. (2024). Evaluating image-based face and eye tracking with event cameras
- de Blegiers, T., Dave, I. R., Yousaf, A., and Shah, M. (2023). Eventtransact: A video transformer-based framework for event-camera based action recognition. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE), 1–7
- Cultrera, L., Becattini, F., Berlincioni, L., Ferrari, C., and Del Bimbo, A. (2024). Spatio-temporal transformers for action unit classification with event cameras. arXiv preprint arXiv:2410.21958
- Guo, C. and Huang, H. (2023). Gleffn: A global-local event feature fusion network for micro-expression recognition. In Proceedings of the 3rd Workshop on Facial Micro-Expression: Advanced Techniques for Multi-Modal Facial Expression Analysis. 17–24
- Savran, A. (2023). Fully convolutional event-camera voice activity detection based on event intensity. In 2023 Innovations in Intelligent Systems and Applications Conference (ASYU) (IEEE), 1–6
- Himmi, S., Parret, V., Chhatkuli, A., and Van Gool, L. (2024). Ms-evs: Multispectral event-based vision for deep learning based face detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 616–625
- Tao, Y., Chang, C.-H., Saïghi, S., and Gao, S. (2024). Gaitspike: Event-based gait recognition with spiking neural network. In 2024 IEEE 6th International Conference on AI Circuits and Systems (AICAS) (IEEE), 357–361
- Barchid, S., Allaert, B., Aissaoui, A., Mennesson, J., and Djeraba, C. C. (2023). Spiking-fer: Spiking neural network for facial expression recognition with event cameras. In 20th International Conference on Content-based Multimedia Indexing. 1–7. doi:10.1145/3617233.3617235
- Li, X., Neil, D., Delbruck, T., and Liu, S.-C. (2019). Lip reading deep network exploiting multi-modal spiking visual and auditory sensors
- Xu, L., Xu, W., Golyanik, V., Habermann, M., Fang, L., and Theobalt, C. (2020). Eventcap: Monocular 3d capture of high-speed human motions using an event camera. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4968–4978
- Barua, S., Miyatani, Y., and Veeraraghavan, A. (2016). Direct face detection and video reconstruction from event cameras
- Savran, A., Tavarone, R., Higy, B., Badino, L., and Bartolozzi, C. (2018). Energy and computation efficient audio-visual voice activity detection driven by event-cameras. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018) (IEEE), 333–340
- Lenz, G., Ieng, S.-H., and Benosman, R. (2020). Event-based face detection and tracking using the dynamics of eye blinks
- Sokolova, A. and Konushin, A. (2019). Human identification by gait from event-based camera. In 2019 16th International Conference on Machine Vision Applications (MVA) (IEEE), 1–6
- Ryan, C., O’Sullivan, B., Elrasad, A., Cahill, A., Lemley, J., Kielty, P., et al. (2021). Real-time face & eye tracking and blink detection using event cameras. Neural Networks 141, 87–97
- Banerjee, A., Prasad, S. S., Mehta, N. K., Kumar, H., Saurav, S., and Singh, S. (2022). Gaze detection using encoded retinomorphic events
- Becattini, F., Palai, F., and Bimbo, A. D. (2022). Understanding human reactions looking at facial microexpressions with an event camera. doi:10.1109/TII.2022.3195063
- Moreira, G., Graça, A., Silva, B., Martins, P., and Batista, J. (2022). Neuromorphic event-based face identity recognition
This research is a part of the HEIMDALL project, funded by the BPI as part of the AAP I-Demo.
Additionally, the work was supported by the European Union’s Horizon Europe research and innovation program under Grant Agreement No 101094831 for the Converge-Telecommunications and Computer Vision Convergence Tools for Research Infrastructures project.