Ego-centric Video Understanding
Ego-centric video understanding is a critical area in computer vision and embodied AI, as it directly models how humans perceive and interact with their surroundings from a first-person perspective. Unlike third-person views, ego-centric data captures fine-grained details of hand-object interactions, gaze patterns, and motion cues that are essential for understanding human intent and behavior in context. This perspective is especially valuable for assistive robotics, augmented reality (AR), skill assessment and coaching, and human-robot collaboration, where anticipating human actions and detecting mistakes in real time can significantly improve system responsiveness and safety. Our work contributes to this domain in three ways:
- Ego-centric hand trajectory forecasting enables robots and AR systems to anticipate user intentions and prepare appropriate responses or augmentations ahead of time.
- Ego-centric activity mistake recognition supports intelligent feedback mechanisms, which are crucial for tutoring, rehabilitation, and safety-critical environments such as surgery and manufacturing.
- Ego-centric instructional video generation enables scalable simulation and training-data creation, facilitating the development of robust models for manipulation planning and human behavior modeling.

Collectively, these capabilities push the boundary of perceptual intelligence in first-person settings and lay the foundation for trustworthy, adaptive, and context-aware embodied agents.
Publications
- Yujiang Pu, Zhanbo Huang, Vishnu Boddeti, Yu Kong. Show Me: Generating Instructional Videos with Diffusion Models. Under Review, 2025. Project
- Wenliang Guo, Yujiang Pu, Yu Kong. Procedural Mistake Detection via Action Effect Modeling. Under Review, 2025. Project
- Wentao Bao, Lele Chen, Libing Zeng, Zhong Li, Yi Xu, Junsong Yuan, Yu Kong. Uncertainty-aware State Space Transformer for Egocentric 3D Trajectory Forecasting. International Conference on Computer Vision (ICCV), pp. 13656-13665, 2023.