**TempCLR: Reconstructing Hands via Time-Coherent Contrastive Learning** Andrea Ziani1,♣, [Zicong Fan](https://zc-alexfan.github.io)1,2,♣, [Muhammed Kocabas](https://ps.is.tuebingen.mpg.de/person/mkocabas)1,2, [Sammy Christen](https://ait.ethz.ch/people/sammyc/)1, [Otmar Hilliges](https://ait.ethz.ch/people/hilliges/)1
1ETH Zürich, 2Max Planck Institute for Intelligent Systems, Tübingen, ♣ Equal contribution
*In Proceedings of the International Conference on 3D Vision (3DV), 2022, Prague, Czechia.*
Code
## Goal

![Reconstructing a 3D hand from a single RGB image](./assets/task.png width=80%)

## Key Insight

- Large amounts of unlabelled monocular RGB video data exist in the wild.
- Accurate 3D annotations are limited (often captured in the lab and not diverse).
- Use contrastive learning to encourage similar hand poses to have similar embeddings (an illustrative sketch of such an objective is given at the bottom of this page).

![](./assets/tsne.png width=90%)

## Video

![](https://www.youtube.com/watch?v=VSsKx8SnFio)

## In-the-wild results (no 3D supervision from this dataset)

TempCLR:

![](./assets/tempclr/itw1.gif width=30%) ![](./assets/tempclr/itw2.gif width=30%)

Without TempCLR:

![](./assets/expose/itw1.gif width=30%) ![](./assets/expose/itw2.gif width=30%)

## In-the-lab results (no 3D supervision from this dataset)

TempCLR:

![](./assets/tempclr/hanco1.gif width=30%) ![](./assets/tempclr/hanco2.gif width=30%)

Without TempCLR:

![](./assets/expose/hanco1.gif width=30%) ![](./assets/expose/hanco2.gif width=30%)

## Abstract

We introduce TempCLR, a new time-coherent contrastive learning approach for the structured regression task of 3D hand reconstruction. Unlike previous time-contrastive methods for hand pose estimation, our framework considers temporal consistency in its augmentation scheme and accounts for the differences of hand poses along the temporal direction. Our data-driven method leverages unlabelled videos and a standard CNN, without relying on synthetic data, pseudo-labels, or specialized architectures. Our approach improves the performance of fully-supervised hand reconstruction methods by 15.9% and 7.6% in PA-V2V on the HO-3D and FreiHAND datasets respectively, thus establishing new state-of-the-art performance. Finally, we demonstrate, both quantitatively and qualitatively, that our approach produces smoother hand reconstructions through time and is more robust to heavy occlusions than the previous state of the art.

## Citing Us

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~text
@inProceedings{ziani2022tempclr,
  title={TempCLR: Reconstructing Hands via Time-Coherent Contrastive Learning},
  author={Ziani, Andrea and Fan, Zicong and Kocabas, Muhammed and Christen, Sammy and Hilliges, Otmar},
  booktitle={International Conference on 3D Vision (3DV)},
  year={2022}
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
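
## Time-Contrastive Objective (Illustrative Sketch)

The key insight above is stated only at a high level, so here is a minimal, hypothetical sketch of a time-contrastive (InfoNCE/NT-Xent-style) objective in PyTorch: embeddings of two temporally nearby frames from the same unlabelled video are treated as a positive pair, and all other frames in the batch act as negatives. The function name `time_contrastive_loss`, the `temperature` value, and the batch construction are illustrative assumptions, not the exact formulation or hyper-parameters used in the paper; the paper additionally applies its augmentations in a temporally consistent way across the sampled frames, which this sketch omits.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~python
# Illustrative sketch only; not the paper's exact loss or hyper-parameters.
import torch
import torch.nn.functional as F


def time_contrastive_loss(z1: torch.Tensor, z2: torch.Tensor,
                          temperature: float = 0.1) -> torch.Tensor:
    """NT-Xent-style loss; z1[i] and z2[i] are embeddings of two temporally
    nearby frames sampled from the same unlabelled video (a positive pair)."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                 # (2B, D)
    sim = z @ z.t() / temperature                  # pairwise cosine similarities
    # Mask out self-similarity so a frame cannot serve as its own positive.
    self_mask = torch.eye(z.shape[0], dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    # Row i (first view) is positive with row i + B, and vice versa.
    b = z1.shape[0]
    targets = torch.cat([torch.arange(b) + b, torch.arange(b)]).to(z.device)
    return F.cross_entropy(sim, targets)


# Toy usage: embeddings from a standard CNN backbone applied to two crops
# sampled a few frames apart in each of 32 unlabelled videos.
if __name__ == "__main__":
    z_a, z_b = torch.randn(32, 128), torch.randn(32, 128)
    print(time_contrastive_loss(z_a, z_b))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~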