EFE: End-to-end Frame-to-Gaze Estimation

¹ETH Zurich, ²Lunit Inc., ³TU Delft
GAZE 2023, CVPR

Abstract

Despite recent progress in learning-based gaze estimation, most methods require one or more eye or face region crops as input and produce a gaze direction vector as output. Cropping yields higher resolution in the eye regions and removes confounding factors (such as clothing and hair), which is believed to benefit final model performance. However, this eye/face patch cropping process is expensive, error-prone, and implementation-specific across methods. In this paper, we propose a frame-to-gaze network that directly predicts both the 3D gaze origin and the 3D gaze direction from the raw camera frame, without any face or eye cropping. Our method demonstrates that direct gaze regression from the raw frame downscaled from FHD/HD to VGA/HVGA resolution is possible, despite the challenge of having very few pixels in the eye region. The proposed method achieves results comparable to state-of-the-art methods in Point-of-Gaze (PoG) estimation on three public gaze datasets (GazeCapture, MPIIFaceGaze, and EVE) and generalizes well to extreme camera view changes.
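
The abstract does not spell out how a predicted 3D gaze origin and 3D gaze direction turn into a Point-of-Gaze. One common geometric reading is to intersect the gaze ray with the screen plane. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that the screen lies in the plane z = 0 of the camera coordinate system, that coordinates are in millimetres, and that the function name `point_of_gaze` and its parameters are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def point_of_gaze(
    origin: Vec3,
    direction: Vec3,
    plane_point: Vec3 = (0.0, 0.0, 0.0),   # assumption: screen plane passes through the camera origin
    plane_normal: Vec3 = (0.0, 0.0, 1.0),  # assumption: screen plane is z = 0
) -> Vec3:
    """Intersect the gaze ray origin + t * direction (t >= 0) with a plane."""
    denom = dot(direction, plane_normal)
    if abs(denom) < 1e-9:
        raise ValueError("gaze ray is parallel to the screen plane")
    # Solve dot(origin + t * direction - plane_point, plane_normal) = 0 for t.
    diff = tuple(p - o for p, o in zip(plane_point, origin))
    t = dot(diff, plane_normal) / denom
    if t < 0:
        raise ValueError("screen plane is behind the gaze origin")
    return tuple(o + t * d for o, d in zip(origin, direction))


# Hypothetical example: eye 500 mm in front of the screen, looking slightly right and down.
pog = point_of_gaze(origin=(0.0, 0.0, 500.0), direction=(0.1, -0.2, -1.0))
print(pog)
```

The direction vector does not need to be normalized here, since the ray parameter t rescales accordingly. Converting the resulting millimetre coordinates to screen pixels would additionally require the screen's physical size and resolution, which are device-specific.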

Architecture

Overview

We present our tailored end-to-end architecture for the frame-to-gaze estimation problem.

BibTeX

@InProceedings{Balim_2023_CVPR,
    author    = {Balim, Haldun and Park, Seonwook and Wang, Xi and Zhang, Xucong and Hilliges, Otmar},
    title     = {EFE: End-to-End Frame-To-Gaze Estimation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {2687-2696}
}