Human-centered applications of event data

January 31, 2025

This is the official repository of "Event-based solutions for human-centered applications: A comprehensive review", submitted to the journal Frontiers in Signal Processing.

Event cameras, or dynamic vision sensors, capture changes in light intensity asynchronously, providing high temporal resolution and energy efficiency. These features make them well suited to human-centered applications, such as analyzing facial expressions and body motion dynamics. However, research in this area remains fragmented. This repository unifies advances in body- and face-related tasks, offering a comprehensive review of databases, event data representations, and processing models. It is intended as a foundational resource for researchers and aims to guide future work on human-centered event camera applications.
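
As a rough illustration of this sensing principle, the sketch below implements an idealized event camera: each pixel emits an event of polarity +1 or -1 whenever its log intensity has changed by at least a contrast threshold since that pixel's previous event. The function name `frames_to_events` and the default `threshold` and `eps` values are assumptions chosen for illustration, and the model ignores sensor noise, refractory periods and sub-frame timing.

```python
import numpy as np


def frames_to_events(frames, timestamps, threshold=0.2, eps=1e-3):
    """Idealized DVS model (illustrative assumption, not a specific simulator).

    frames: array (T, H, W) of non-negative intensities.
    timestamps: array (T,) with the capture time of each frame.
    Returns a list of (x, y, t, polarity) tuples, polarity in {+1, -1}.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Per-pixel reference log intensity, updated whenever events are emitted.
    log_ref = np.log(frames[0] + eps)
    events = []
    for k in range(1, len(frames)):
        diff = np.log(frames[k] + eps) - log_ref
        # Number of whole contrast-threshold crossings at each pixel.
        n_cross = np.floor(np.abs(diff) / threshold).astype(np.int64)
        ys, xs = np.nonzero(n_cross)
        for y, x in zip(ys, xs):
            pol = 1 if diff[y, x] > 0 else -1
            # Simplification: all events of this step share the frame timestamp.
            events.extend([(int(x), int(y), float(timestamps[k]), pol)] * int(n_cross[y, x]))
            # Advance the reference by the log change "consumed" by these events.
            log_ref[y, x] += pol * n_cross[y, x] * threshold
    return events
```

For a pixel whose intensity doubles between two frames, the log change is about ln 2 ≈ 0.69, so with a threshold of 0.2 this sketch emits three positive events at that pixel; a static pixel emits none, which is what makes the output sparse.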

Datasets

Real event-based datasets for human-centered applications

Body Datasets

| Year | Authors | Name | #Videos | #People | Modalities | Application | #Classes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2019 | Miao et al. (2019) | Action Dataset TUM | 291 | 15 | EV | Action Recognition | 10 |
| 2019 | Calabrese et al. (2019) | DHP19 | 2244 | 17 | EV | Pose Estimation | - |
| 2019 | Wang et al. (2019) | DVS128-Gait-Day | 4000 | 20 | EV | Gait Recognition | - |
| 2019 | Wang et al. (2019) | DVS128-Gait-Night | 4000 | 20 | EV | Gait Recognition | - |
| 2021 | Liu et al. (2021) | DailyAction-DVS | 1440 | 15 | EV | Action Recognition | 12 |
| 2022 | Eddine and Dugelay (2022) | Gait3 | 168 | 56 | RGB-EV-TH | Gait Recognition | - |
| 2023 | Gao et al. (2023) | THU-E-ACT-50 | 10500 | 105 | EV | Action Recognition | 50 |
| 2023 | Gao et al. (2023) | THU-E-ACT-50-CHL | 2330 | 18 | EV | Action Recognition | 50 |
| 2024 | Gao et al. (2024) | THU-MV-E-ACT-50 | 31500 | 105 | EV | Action Recognition | 50 |
| 2025 | Wang et al. (2025) | DailyDVS-200 | 22000 | 46 | RGB-EV | Action Recognition | 200 |

Face Datasets

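| Year | Authors | Name | #Videos | #People | Modalities | Application |
| --- | --- | --- | --- | --- | --- | --- |
| 2016 | Barua et al. (2016) | - | - | 40 | EV | Face Detection |
| 2019 | Li et al. (2019) | - | 34000 | 34 | EV-audio | Lip Reading |
| 2020 | Angelopoulos et al. (2020) | - | 24 | 24 | EV | Eye Gaze Tracking |
| 2020 | Chen et al. (2020a) | EDDD | 260 | 26 | EV | Drowsiness Detection |
| 2020 | Lenz et al. (2020) | - | 48 | 10 | EV | Face Detection |
| 2020 | Chen et al. (2020b) | NeuroBiometric | 180 | 45 | EV | Authentication |
| 2022 | Banerjee et al. (2022) | - | 3360 | 6 | RGB-EV | Eye Gaze Tracking |
| 2022 | Becattini et al. (2022) | - | 455 | 25 | RGB-EV | MER |
| 2022 | Tan et al. (2022) | DVS-Lip | 19871 | 40 | EV | Lip Reading |
| 2022 | Moreira et al. (2022) | NVSFD | 436 | 40 | EV | Identity Recognition |
| 2023 | Bissarinova et al. (2023) | FES | ∼4000 | 73 | EV | Face Detection |
| 2023 | Berlincioni et al. (2023) | NEFER | 2910 | 5 | RGB-EV | MER |
| 2023 | Kanamaru et al. (2023) | - | 1500 | 20 | EV | Lip Reading |
| 2024 | Adra et al. (2024) | VETEX | 2506 | 30 | RGB-EV-TH | MER |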

Synthetic datasets for human-centered applications

Body Datasets

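| Year | Authors | Name | #Videos | #People | Application | #Classes |
| --- | --- | --- | --- | --- | --- | --- |
| 2019 | Wang et al. (2019) | EV-CASIA-B | 8184 | 124 | Gait Recognition | - |
| 2020 | Bi et al. (2020) | HMDB51-DVS | 6766 | - | Action Recognition | 51 |
| 2020 | Bi et al. (2020) | UCF101-DVS | 13320 | - | Action Recognition | 101 |
| 2022 | Plizzari et al. (2022) | N-EPIC-Kitchens | 64 | - | Action Recognition | 8 |
| 2023 | Zou et al. (2023) | SynEventHPD | 9197 | 47 | Pose Estimation | - |
| 2023 | Goyal et al. (2023) | eH36m | 748 | 7 | Pose Estimation | - |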

Face Datasets

| Year | Authors | Name | #Videos | #People | Application |
| --- | --- | --- | --- | --- | --- |
| 2022 | Moreira et al. (2022) | SynFED | 6536 | 30 | Identity Recognition |
| 2023 | Barchid et al. (2023) | ADFES | 198 | 22 | Facial Expression Recognition |
| 2023 | Barchid et al. (2023) | Oulu-CASIA | 480 | 80 | Facial Expression Recognition |
| 2023 | Barchid et al. (2023); Verschae and Bugueno-Cordova (2023) | e-CK+ | 327 | 93 | Facial Expression Recognition |
| 2023 | Barchid et al. (2023); Verschae and Bugueno-Cordova (2023) | e-MMI | 2900+ | 75 | Facial Expression Recognition |
| 2023 | Ryan et al. (2023) | - | - | 5 | Multitask Facial Analysis |
| 2024 | Tan et al. (2024) | DVS-LRW100 | 107664 | - | Lip Reading |

Event Data Representations

| Name | Details |
| --- | --- |
| Event Count | Event data is aggregated by counting the number of events that occur at each pixel within a fixed time interval. This approach provides a straightforward summary of activity and is often used as a baseline representation. |
| Event Histogram | Similar to the event count, but instead of a single time interval, events are counted in several temporal bins, producing a distribution of event activity that captures temporal variations at a finer level of detail. |
| Time Surface / Surface of Active Events | Represents data as a continuous-valued map where each pixel value corresponds to the timestamp of the most recent event at that location. This highlights recent activity and is often used to track motion or identify edges. |
| Memory Surface | Represents data as a temporal map where each pixel's value indicates the time elapsed since the last event at that location within a fixed time window. This encodes temporal information by retaining a "memory" of inactivity, which is useful for identifying patterns and tracking regions with recent or ongoing motion. |
| Voxel Grid | Event data is sliced temporally into small time intervals, creating a sequence of event slices. These slices are then stacked into a 3D grid, where each voxel represents the activity in a spatial region during a specific time window. This preserves both spatial and temporal resolution. |
| Spike Tensor | Represents data as binary tensors indicating the occurrence of spikes at specific spatiotemporal locations. The tensor is separated into two channels for positive and negative polarities. |
| Graph | Represents data as a graph in which events are nodes, with polarity as the node feature, and edges between nodes encode spatiotemporal relationships. Often used for tasks such as pattern recognition. |
| E2VID Frame | Represents data as intensity frames reconstructed from the sparse event stream by a neural network. This allows event data to be used with traditional frame-based computer vision methods. |
| Temporal Binary Representation | Events are first stacked into intermediate binary frames, so that the values of each pixel across these frames can be read as a binary string. The frames are then merged into a single frame by converting each pixel's binary string into a decimal value. Most popular in face analysis applications. |
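
The Event Count and Event Histogram representations in the table can be sketched in a few lines of NumPy. This is a minimal sketch, assuming events arrive as equal-length arrays of integer pixel coordinates `x`, `y` and timestamps `t`; the function names and the equal-width binning are illustrative choices, not a specific paper's implementation.

```python
import numpy as np


def event_count(x, y, height, width):
    """Number of events at each pixel within one time window, shape (H, W)."""
    img = np.zeros((height, width), dtype=np.int64)
    # np.add.at accumulates correctly when a pixel appears several times.
    np.add.at(img, (y, x), 1)
    return img


def event_histogram(x, y, t, height, width, num_bins):
    """Per-pixel counts in `num_bins` equal-width temporal bins, shape (B, H, W)."""
    t = np.asarray(t, dtype=np.float64)
    span = max(t.max() - t.min(), 1e-12)  # avoid division by zero
    # Bin index in [0, num_bins - 1]; the last event falls in the last bin.
    b = np.minimum(((t - t.min()) / span * num_bins).astype(np.int64), num_bins - 1)
    hist = np.zeros((num_bins, height, width), dtype=np.int64)
    np.add.at(hist, (b, y, x), 1)
    return hist
```

Summing the histogram over its bin axis recovers the event count for the same window. Many implementations also keep one channel per polarity, which amounts to adding the polarity as one more index in the `np.add.at` call.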
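
The Time Surface and Memory Surface rows of the table can be illustrated as follows. This is a sketch under stated assumptions: events are arrays `x`, `y`, `t`, the reference time `t_ref` and window length `window` are hypothetical parameters, and setting pixels without a recent event to `window` is an assumed convention. The exponentially decayed variant is a commonly used form of time surface that the table does not describe, included only for comparison.

```python
import numpy as np


def time_surface(x, y, t, height, width):
    """Most recent event timestamp at each pixel (-inf where no event occurred)."""
    surface = np.full((height, width), -np.inf)
    # np.maximum.at keeps the largest (latest) timestamp per pixel.
    np.maximum.at(surface, (np.asarray(y), np.asarray(x)), np.asarray(t, dtype=np.float64))
    return surface


def memory_surface(x, y, t, height, width, t_ref, window):
    """Time elapsed since the last event at each pixel, using only events in
    [t_ref - window, t_ref]. Pixels with no event in the window are set to
    `window` (an assumed convention)."""
    x, y, t = np.asarray(x), np.asarray(y), np.asarray(t, dtype=np.float64)
    keep = (t >= t_ref - window) & (t <= t_ref)
    last = time_surface(x[keep], y[keep], t[keep], height, width)
    # t_ref - (-inf) = inf, which np.minimum clips to `window`.
    return np.minimum(t_ref - last, window)


def decayed_time_surface(x, y, t, height, width, t_ref, tau):
    """Exponential-decay variant: exp(-(t_ref - t_last) / tau), 0 if no event."""
    return np.exp(-(t_ref - time_surface(x, y, t, height, width)) / tau)
```

Using `np.maximum.at` rather than plain fancy-index assignment matters here: with repeated pixel indices, NumPy does not guarantee which assigned value survives, whereas the unbuffered maximum always keeps the latest timestamp.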
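
The Voxel Grid and Spike Tensor rows of the table can be sketched with the same temporal binning. This is a minimal sketch, assuming arrays `x`, `y`, `t` and polarities `p` where any value greater than 0 counts as positive; the hard assignment of each event to one equal-width bin and the function names are illustrative assumptions.

```python
import numpy as np


def temporal_bins(t, num_bins):
    """Map timestamps to equal-width bin indices in [0, num_bins - 1]."""
    t = np.asarray(t, dtype=np.float64)
    span = max(t.max() - t.min(), 1e-12)
    return np.minimum(((t - t.min()) / span * num_bins).astype(np.int64), num_bins - 1)


def voxel_grid(x, y, t, p, height, width, num_bins):
    """(B, H, W) grid; each voxel sums the signed polarities (+1/-1) of its events."""
    grid = np.zeros((num_bins, height, width), dtype=np.float64)
    signed = np.where(np.asarray(p) > 0, 1.0, -1.0)
    np.add.at(grid, (temporal_bins(t, num_bins), y, x), signed)
    return grid


def spike_tensor(x, y, t, p, height, width, num_bins):
    """Binary (2, B, H, W) tensor: channel 0 = positive, channel 1 = negative."""
    spikes = np.zeros((2, num_bins, height, width), dtype=np.uint8)
    channel = np.where(np.asarray(p) > 0, 0, 1)
    # Repeated events at the same location simply set the same entry to 1.
    spikes[channel, temporal_bins(t, num_bins), y, x] = 1
    return spikes
```

A commonly used alternative for voxel grids spreads each event's polarity over the two nearest temporal bins with linear weights instead of hard binning, which reduces quantization of event timing.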
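
The Graph representation in the table can be illustrated by connecting events that are close in space and time. This is a sketch, assuming a radius-based neighbourhood in a scaled (x, y, t) space; `radius` and `time_scale` are hypothetical parameters, and the returned `edge_index` of shape (2, E) follows the layout used by graph libraries such as PyTorch Geometric.

```python
import numpy as np


def event_graph(x, y, t, p, radius=3.0, time_scale=1e-3):
    """One node per event, polarity as node feature, edges between close events.

    Events i and j are connected when the Euclidean distance between their
    (x, y, t / time_scale) coordinates is at most `radius` (hypothetical choice).
    Returns (node_features of shape (N, 1), edge_index of shape (2, E)).
    """
    pos = np.stack([np.asarray(x, dtype=np.float64),
                    np.asarray(y, dtype=np.float64),
                    np.asarray(t, dtype=np.float64) / time_scale], axis=1)
    # Brute-force pairwise distances: O(N^2) memory, fine for a small example.
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    adjacent = (dist <= radius) & ~np.eye(len(pos), dtype=bool)  # no self-loops
    src, dst = np.nonzero(adjacent)
    node_features = np.asarray(p, dtype=np.float64)[:, None]
    return node_features, np.stack([src, dst])
```

The `time_scale` factor decides how many microseconds count as "as far as" one pixel. Practical pipelines usually subsample events and use a spatial index or a fixed maximum number of neighbours instead of this quadratic search.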
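
The Temporal Binary Representation described in the table can be sketched as follows. This is a minimal sketch, assuming arrays `x`, `y`, `t` split into `num_bits` equal-width time slices; treating the first slice as the most significant bit is an assumed convention, and the function name is hypothetical.

```python
import numpy as np


def temporal_binary_representation(x, y, t, height, width, num_bits=8):
    """Encode `num_bits` binary event frames as one integer-valued frame.

    Binary frame i is 1 at a pixel if at least one event fell in time slice i.
    Reading slice 0 as the most significant bit is an assumed convention.
    """
    t = np.asarray(t, dtype=np.float64)
    span = max(t.max() - t.min(), 1e-12)
    idx = np.minimum(((t - t.min()) / span * num_bits).astype(np.int64), num_bits - 1)
    bits = np.zeros((num_bits, height, width), dtype=np.int64)
    bits[idx, y, x] = 1  # intermediate binary frames
    # Binary-to-decimal conversion: weights 2^(N-1), ..., 2^1, 2^0.
    weights = 2 ** np.arange(num_bits - 1, -1, -1, dtype=np.int64)
    return np.tensordot(weights, bits, axes=1)  # shape (H, W)
```

With 8 slices, each pixel takes a value between 0 and 255, so the result fits in a single 8-bit grayscale image; this compactness is one reason such a representation pairs well with frame-based face analysis networks.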

Model Architecture

Papers presented in this survey, classified by the type of AI architecture used for their models

SNNGraph NNCNNTransformersNot AI-based
Liu et al. (2021)Wang et al. (2021)Li et al. (2019)Xu et al. (2020)Barua et al. (2016)
Barchid et al. (2023)Eisl et al. (2023)Wang et al. (2019)de Blegiers et al. (2023)Savran et al. (2018)
Ren et al. (2023b)Fu and Yan (2023)Sokolova and Konushin (2019)Zou et al. (2023)Lenz et al. (2020)
Bulzomi et al. (2023)Gao et al. (2024)Ryan et al. (2021)Cultrera et al. (2023)Chen et al. (2020b)
Tao et al. (2024)Banerjee et al. (2022)Angelopoulos et al. (2020)
Vicente-Sola et al. (2025)Becattini et al. (2022)Eddine and Dugelay (2022)
Moreira et al. (2022)Ren et al. (2023a)
Plizzari et al. (2022)Guo and Huang (2023)
Ryan et al. (2023)Savran (2023)
Gao et al. (2023)Himmi et al. (2024)
Rios-Navarro et al. (2023)
Bissarinova et al. (2023)
Berlincioni et al. (2023)
Goyal et al. (2023)
Kanamaru et al. (2023)
Xiao et al. (2024)
Kohyama et al. (2024)
Adra et al. (2024)
Iddrisu et al. (2024)

Applications

Applications of event cameras for human data along with an exhaustive selection of relevant work

Body

| Human Tracking | Gait Recognition | Action Recognition | Pose Estimation |
| --- | --- | --- | --- |
| Eisl et al. (2023) | Wang et al. (2021) | Liu et al. (2021) | Sokolova and Konushin (2019) |
| Xu et al. (2020) | Sokolova and Konushin (2019) | Plizzari et al. (2022) | Zou et al. (2023) |
| | Wang et al. (2021) | Ren et al. (2023a) | Goyal et al. (2023) |
| | Eddine and Dugelay (2022) | Ren et al. (2023b) | Kohyama et al. (2024) |
| | Fu and Yan (2023) | de Blegiers et al. (2023) | |
| | Tao et al. (2024) | Gao et al. (2023) | |
| | | Gao et al. (2024) | |
| | | Vicente-Sola et al. (2025) | |
| | | Wang et al. (2025) | |

Face

Face DetectionIdentity RecognitionLip ReadingEye Blinking & GazeMicroexpression & Emotion Recognition
Barua et al. (2016)Chen et al. (2020b)Savran et al. (2018)Lenz et al. (2020)Becattini et al. (2022)
Lenz et al. (2020)Moreira et al. (2022)Li et al. (2019)Chen et al. (2020b)Barchid et al. (2023)
Ryan et al. (2021)Rios-Navarro et al. (2023)Angelopoulos et al. (2020)Berlincioni et al. (2023)
Bissarinova et al. (2023)Savran (2023)Ryan et al. (2021)Guo and Huang (2023)
Ryan et al. (2023)Kanamaru et al. (2023)Banerjee et al. (2022)Xiao et al. (2024)
Himmi et al. (2024)Bulzomi et al. (2023)Iddrisu et al. (2024)Cultrera et al. (2023)
Iddrisu et al. (2024)Moreira et al. (2022)Adra et al. (2024)

Event-RGB comparison

Summary of the works included in this survey that compare their event-based networks with RGB-trained models.

| Target Application | Authors & Year | Findings | Improvement with Event Data |
| --- | --- | --- | --- |
| Gait Recognition | Wang et al. (2019) | For viewing angles of 72°, 90° and 108°, EV-Gait performs better than RGB-based approaches | 3% increase in accuracy |
| | Sokolova and Konushin (2019) | Similar performance reported for event-based and RGB approaches | - |
| | Wang et al. (2021) | For a viewing angle of 90°, EV-Gait-Graph performs better than RGB-based approaches | 0.5% increase in accuracy |
| | Eddine and Dugelay (2022) | Event data shows an advantage over RGB and thermal data for gait recognition | 2% increase in accuracy |
| | Tao et al. (2024) | Event data outperforms RGB across all rotation angles for gait recognition | Up to 14% increase in accuracy |
| Action Recognition | Plizzari et al. (2022) | Event data can surpass RGB for action recognition in test scenarios unseen during training | 4% increase in accuracy |
| | de Blegiers et al. (2023) | Event-based models surpass RGB action recognition models in different setups | Up to 14% increase in accuracy |
| Pose Estimation | Goyal et al. (2023) | Pose estimation from event data surpasses pose estimation from RGB data | Up to 5% increase in accuracy |
| | Kohyama et al. (2024) | Unlike RGB, event data does not suffer from motion blur in 3D pose estimation | Error (in mm) reduced by a factor of 5 in certain scenarios |
| Face Detection | Barua et al. (2016) | Results comparable to the Viola-Jones face detector | - |
| | Ryan et al. (2023) | Traditional RGB models perform better on RGB images than on their simulated event counterparts | - |
| Lip Reading | Kanamaru et al. (2023) | Event and RGB modalities are combined for lip reading | - |
| Microexpression & Emotion Recognition | Becattini et al. (2022) | Event data outperforms RGB for detecting three types of expressions: positive, neutral and negative | Up to 9% increase in accuracy |
| | Berlincioni et al. (2023) | Event data outperforms RGB in predicting seven different emotions | Up to 15% increase in accuracy |
| | Xiao et al. (2024) | Event and RGB data are merged as input to the network | 1% increase in accuracy |
| | Cultrera et al. (2024) | Event data performs better for the estimation of some action units | More accurate for 6 of 24 action units |
| | Adra et al. (2024) | Event data provides more information than RGB for microexpression recognition | Up to 13% increase in accuracy |

Bibliography

  • Miao, S., Chen, G., Ning, X., Zi, Y., Ren, K., Bing, Z., et al. (2019). Neuromorphic vision datasets for pedestrian detection, action recognition, and fall detection
  • Calabrese, E., Taverni, G., Easthope, C. A., Skriabine, S., Corradi, F., Longinotti, L., et al. (2019). DHP19: Dynamic vision sensor 3D human pose dataset
  • Wang, Y., Du, B., Shen, Y., Wu, K., Zhao, G., Sun, J., et al. (2019). EV-Gait: Event-based robust gait recognition using dynamic vision sensors
  • Wang, Y., Zhang, X., Shen, Y., Du, B., Zhao, G., Cui, L., et al. (2021). Event-stream representation for human gaits identification using deep neural networks (IEEE), vol. 44, 3436–3449
  • Eddine, M. J. and Dugelay, J.-L. (2022). Gait3: An event-based, visible and thermal database for gait recognition
  • Liu, Q., Xing, D., Tang, H., Ma, D., and Pan, G. (2021). Event-based action recognition using motion information and spiking neural networks.
  • Gao, Y., Lu, J., Li, S., Ma, N., Du, S., Li, Y., et al. (2023). Action recognition and benchmark using event cameras
  • Gao, Y., Lu, J., Li, S., Li, Y., and Du, S. (2024). Hypergraph-based multi-view action recognition using event cameras
  • Wang, Q., Xu, Z., Lin, Y., Ye, J., Li, H., Zhu, G., et al. (2025). DailyDVS-200: A comprehensive benchmark dataset for event-based action recognition
  • Angelopoulos, A. N., Martel, J. N., Kohli, A. P., Conradt, J., and Wetzstein, G. (2020). Event-based near-eye gaze tracking beyond 10,000 Hz
  • Chen, G., Hong, L., Dong, J., Liu, P., Conradt, J., and Knoll, A. (2020a). EDDD: Event-based drowsiness driving detection through facial motion analysis with neuromorphic vision sensor
  • Chen, G., Wang, F., Yuan, X., Li, Z., Liang, Z., and Knoll, A. (2020b). NeuroBiometric: An eye blink based biometric authentication system using an event-based neuromorphic vision sensor
  • Tan, G., Wang, Y., Han, H., Cao, Y., Wu, F., and Zha, Z. J. (2022). Multi-grained spatio-temporal features perceived network for event-based lip-reading
  • Tan, G., Wan, Z., Wang, Y., Cao, Y., and Zha, Z. J. (2024). Tackling event-based lip-reading by exploring multigrained spatiotemporal clues. IEEE Transactions on Neural Networks and Learning Systems
  • Bissarinova, U., Rakhimzhanova, T., Kenzhebalin, D., and Varol, H. A. (2023). Faces in event streams (FES): An annotated face dataset for event cameras (Authorea)
  • Berlincioni, L., Cultrera, L., Albisani, C., Cresti, L., Leonardo, A., Picchioni, S., et al. (2023). Neuromorphic event-based facial expression recognition
  • Kanamaru, T., Arakane, T., and Saitoh, T. (2023). Isolated single sound lip-reading using a frame-based camera and event-based camera
  • Adra, M., Mirabet-Herranz, N., and Dugelay, J.-L. (2024). Beyond RGB: Tri-modal microexpression recognition with RGB, thermal, and event data
  • Bi, Y., Chadha, A., Abbas, A., Bourtsoulatze, E., and Andreopoulos, Y. (2020). Graph-based spatio-temporal feature learning for neuromorphic vision sensing
  • Plizzari, C., Planamente, M., Goletto, G., Cannici, M., Gusso, E., Matteucci, M., et al. (2022). E2(GO)MOTION: Motion augmented event stream for egocentric action recognition
  • Zou, S., Mu, Y., Zuo, X., Wang, S., and Cheng, L. (2023). Event-based human pose tracking by spiking spatiotemporal transformer
  • Goyal, G., Di Pietro, F., Carissimi, N., Glover, A., and Bartolozzi, C. (2023). MoveEnet: Online high-frequency human pose estimation with an event camera
  • Verschae, R. and Bugueno-Cordova, I. (2023). Event-based gesture and facial expression recognition: A comparative analysis
  • Ryan, C., Elrasad, A., Shariff, W., Lemley, J., Kielty, P., Hurney, P., et al. (2023). Real-time multi-task facial analytics with event cameras. IEEE Access
  • Ren, H., Zhou, Y., Fu, H., Huang, Y., Xu, R., and Cheng, B. (2023a). TTPOINT: A tensorized point cloud network for lightweight action recognition with event cameras. In Proceedings of the 31st ACM International Conference on Multimedia. 8026–8034
  • Ren, H., Zhou, Y., Huang, Y., Fu, H., Lin, X., Song, J., et al. (2023b). SpikePoint: An efficient point-based spiking neural network for event cameras action recognition. arXiv preprint arXiv:2310.07189
  • Bulzomi, H., Schweiker, M., Gruel, A., and Martinet, J. (2023). End-to-end neuromorphic lip-reading. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4101–4108
  • Vicente-Sola, A., Manna, D. L., Kirkland, P., Di Caterina, G., and Bihl, T. J. (2025). Spiking neural networks for event-based action recognition: A new task to understand their advantage. Neurocomputing 611, 128657
  • Eisl, D., Herzog, F., Dugelay, J.-L., Apvrille, L., and Rigoll, G. (2023). Introducing a framework for single-human tracking using event-based cameras. In 2023 IEEE International Conference on Image Processing (ICIP) (IEEE), 3269–3273
  • Fu, L. and Yan, S. (2023). Hypergraph neural network for gait recognition based on event camera. In Third International Conference on Advanced Algorithms and Signal Image Processing (AASIP 2023) (SPIE), vol. 12799, 1151–1155
  • Rios-Navarro, A., Piñero-Fuentes, E., Canas-Moreno, S., Javed, A., Harkin, J., and Linares-Barranco, A. (2023). LIPSFUS: A neuromorphic dataset for audio-visual sensory fusion of lip reading. In 2023 IEEE International Symposium on Circuits and Systems (ISCAS) (IEEE), 1–5
  • Xiao, P., Zhang, Y., Kai, D., Peng, Y., Zhang, Z., and Sun, X. (2024). ESTME: Event-driven spatio-temporal motion enhancement for micro-expression recognition. In 2024 IEEE International Conference on Multimedia and Expo (ICME) (IEEE), 1–6
  • Kohyama, K., Shiba, S., and Aoki, Y. (2024). 3D human scan with a moving event camera. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5586–5596
  • Iddrisu, K., Shariff, W., O'Connor, N. E., Lemley, J., and Little, S. (2024). Evaluating image-based face and eye tracking with event cameras
  • de Blegiers, T., Dave, I. R., Yousaf, A., and Shah, M. (2023). EventTransAct: A video transformer-based framework for event-camera based action recognition. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE), 1–7
  • Cultrera, L., Becattini, F., Berlincioni, L., Ferrari, C., and Del Bimbo, A. (2024). Spatio-temporal transformers for action unit classification with event cameras. arXiv preprint arXiv:2410.21958
  • Guo, C. and Huang, H. (2023). GLEFFN: A global-local event feature fusion network for micro-expression recognition. In Proceedings of the 3rd Workshop on Facial Micro-Expression: Advanced Techniques for Multi-Modal Facial Expression Analysis. 17–24
  • Savran, A. (2023). Fully convolutional event-camera voice activity detection based on event intensity. In 2023 Innovations in Intelligent Systems and Applications Conference (ASYU) (IEEE), 1–6
  • Himmi, S., Parret, V., Chhatkuli, A., and Van Gool, L. (2024). MS-EVS: Multispectral event-based vision for deep learning based face detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 616–625
  • Tao, Y., Chang, C.-H., Saïghi, S., and Gao, S. (2024). GaitSpike: Event-based gait recognition with spiking neural network. In 2024 IEEE 6th International Conference on AI Circuits and Systems (AICAS) (IEEE), 357–361
  • Barchid, S., Allaert, B., Aissaoui, A., Mennesson, J., and Djeraba, C. C. (2023). Spiking-FER: Spiking neural network for facial expression recognition with event cameras. In 20th International Conference on Content-based Multimedia Indexing. 1–7. doi:10.1145/3617233.3617235
  • Li, X., Neil, D., Delbruck, T., and Liu, S.-C. (2019). Lip reading deep network exploiting multi-modal spiking visual and auditory sensors
  • Xu, L., Xu, W., Golyanik, V., Habermann, M., Fang, L., and Theobalt, C. (2020). EventCap: Monocular 3D capture of high-speed human motions using an event camera. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4968–4978
  • Barua, S., Miyatani, Y., and Veeraraghavan, A. (2016). Direct face detection and video reconstruction from event cameras
  • Savran, A., Tavarone, R., Higy, B., Badino, L., and Bartolozzi, C. (2018). Energy and computation efficient audio-visual voice activity detection driven by event-cameras. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018) (IEEE), 333–340
  • Lenz, G., Ieng, S.-H., and Benosman, R. (2020). Event-based face detection and tracking using the dynamics of eye blinks
  • Sokolova, A. and Konushin, A. (2019). Human identification by gait from event-based camera. In 2019 16th International Conference on Machine Vision Applications (MVA) (IEEE), 1–6
  • Ryan, C., O’Sullivan, B., Elrasad, A., Cahill, A., Lemley, J., Kielty, P., et al. (2021). Real-time face & eye tracking and blink detection using event cameras. Neural Networks 141, 87–97
  • Banerjee, A., Prasad, S. S., Mehta, N. K., Kumar, H., Saurav, S., and Singh, S. (2022). Gaze detection using encoded retinomorphic events
  • Becattini, F., Palai, F., and Del Bimbo, A. (2022). Understanding human reactions looking at facial microexpressions with an event camera. doi:10.1109/TII.2022.3195063
  • Moreira, G., Graça, A., Silva, B., Martins, P., and Batista, J. (2022). Neuromorphic event-based face identity recognition

Acknowledgement

This research is a part of the HEIMDALL project, funded by the BPI as part of the AAP I-Demo. Additionally, the work was supported by the European Union’s Horizon Europe research and innovation program under Grant Agreement No 101094831 for the Converge-Telecommunications and Computer Vision Convergence Tools for Research Infrastructures project.