4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations

Department of Computer Science, ETH Zurich

Dataset Overview. 4D-DRESS is the first real-world 4D dataset of human clothing, capturing 64 human outfits in more than 520 motion sequences. These sequences include a) high-quality 4D textured scans; for each scan, we annotate b) vertex-level semantic labels, thereby obtaining c) the corresponding garment meshes and fitted SMPL(-X) body meshes. In total, 4D-DRESS captures the dynamic motion of 4 dresses, 28 lower, 30 upper, and 32 outer garments. For each garment, we also provide a canonical template mesh to support future research on human clothing.


Studies of human clothing for digital avatars have predominantly relied on synthetic datasets. While easy to collect, synthetic data often fall short in realism and fail to capture authentic clothing dynamics. To address this gap, we introduce 4D-DRESS, the first real-world 4D dataset advancing human clothing research with its high-quality 4D textured scans and garment meshes. 4D-DRESS captures 64 outfits in 520 human motion sequences, amounting to a total of 78k textured scans. Creating a real-world clothing dataset is challenging, particularly when annotating and segmenting extensive and complex 4D human scans. To address this, we develop a semi-automatic 4D human parsing pipeline that combines a human-in-the-loop process with automation to accurately label 4D scans across diverse garments and body movements. Leveraging precise annotations and high-quality garment meshes, we establish a number of benchmarks for clothing simulation and reconstruction. 4D-DRESS offers realistic and challenging data that complements synthetic sources, paving the way for advances in research on lifelike human clothing.


4D Human Parsing Method

4D Human Parsing Method. We first render the current and previous frame scans into multi-view images and labels (paper Sec. 3.1). We then collect multi-view parsing results from the image parser, optical flows, and segmentation masks (Sec. 3.2). Finally, we project the multi-view labels onto the 3D vertices and optimize the vertex labels using the Graph Cut algorithm with a vertex-wise unary energy and an edge-wise binary energy (Sec. 3.3). Manual rectification labels can be introduced easily by checking the multi-view rendered labels.
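To make the last step more concrete, the following is a minimal sketch of labeling vertices by minimizing a vertex-wise unary energy plus an edge-wise binary energy. It is not the authors' implementation. The unary cost is assumed to come from simple multi-view voting, the binary term is assumed to be a Potts penalty on mesh edges, and iterated conditional modes (ICM) is swapped in for Graph Cut to keep the example short and dependency-free. All names (`vote_unary`, `icm`, `smooth`) are hypothetical.

```python
from collections import Counter


def vote_unary(view_labels, num_labels):
    """Unary cost per vertex and label: fraction of views that disagree.

    view_labels[v] lists the labels projected onto vertex v by the views
    that see it (an assumed input format).
    """
    unary = []
    for labels in view_labels:
        counts = Counter(labels)
        total = max(len(labels), 1)
        unary.append([1.0 - counts.get(k, 0) / total for k in range(num_labels)])
    return unary


def energy(assign, unary, edges, smooth):
    """Total energy: unary costs plus a Potts penalty on label-changing edges."""
    data = sum(unary[v][assign[v]] for v in range(len(assign)))
    pairwise = sum(1 for a, b in edges if assign[a] != assign[b])
    return data + smooth * pairwise


def icm(unary, edges, smooth=0.3, max_iters=10):
    """Greedy local minimization (ICM), used here in place of Graph Cut."""
    n, num_labels = len(unary), len(unary[0])
    neighbors = [[] for _ in range(n)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    # Start from the label with the lowest unary cost at each vertex.
    assign = [min(range(num_labels), key=lambda k: u[k]) for u in unary]
    for _ in range(max_iters):
        changed = False
        for v in range(n):
            def local(k):
                disagree = sum(1 for j in neighbors[v] if assign[j] != k)
                return unary[v][k] + smooth * disagree
            best = min(range(num_labels), key=local)
            if local(best) < local(assign[v]):
                assign[v] = best
                changed = True
        if not changed:
            break
    return assign


# Toy example: a chain of 5 vertices, labels 0 and 1; vertex 2 has noisy votes.
view_labels = [[1, 1, 1], [1, 1, 1], [0, 0, 1], [1, 1, 1], [1, 1, 1]]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
unary = vote_unary(view_labels, num_labels=2)
labels = icm(unary, edges, smooth=0.3)
```

In this toy case, the smoothness term overrides the noisy majority vote at the middle vertex because both of its neighbors agree on the other label. ICM only finds a local minimum; Graph Cut, as named in the caption, gives stronger guarantees on the same kind of energy.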

4D-DRESS Dataset Subjects

4D-DRESS captures 32 subjects with 64 real-world human outfits in more than 520 motion sequences and 78k scan frames.

4D-DRESS Dataset Contents

4D-DRESS provides, for each sequence, a) high-quality 4D textured scans with b) vertex-level semantic annotations, plus c) the corresponding multi-view captured images and rendered labels, as well as d) the extracted garments and fitted SMPL(-X) bodies.
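As an illustration of how vertex-level labels relate to extracted garments, the sketch below cuts a garment submesh out of a labeled scan by keeping the faces whose vertices all carry the requested label. This is an assumed, simplified procedure and not the dataset's actual extraction code or file format; the function name `extract_garment` and the label strings are hypothetical.

```python
def extract_garment(vertices, faces, vertex_labels, garment_label):
    """Return (sub_vertices, sub_faces) for one garment label.

    vertices: list of (x, y, z) tuples
    faces: list of vertex-index triples
    vertex_labels: one label per vertex (assumed format)
    """
    # Indices of vertices that carry the requested label.
    keep = [i for i, lab in enumerate(vertex_labels) if lab == garment_label]
    # Map old vertex indices to positions in the new vertex list.
    remap = {old: new for new, old in enumerate(keep)}
    sub_vertices = [vertices[i] for i in keep]
    # Keep a face only if every one of its vertices belongs to the garment.
    sub_faces = [
        tuple(remap[i] for i in face)
        for face in faces
        if all(i in remap for i in face)
    ]
    return sub_vertices, sub_faces


# Toy scan: two triangles sharing an edge; the last vertex is labeled as skin.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
faces = [(0, 1, 2), (1, 3, 2)]
vertex_labels = ["upper", "upper", "upper", "skin"]
garment_vertices, garment_faces = extract_garment(
    vertices, faces, vertex_labels, "upper"
)
```

Faces that straddle two labels are dropped in this sketch, so the garment boundary follows vertex labels rather than cutting through triangles. Labeled vertices that end up in no kept face remain in the vertex list.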


title={4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations},
author={Wang, Wenbo and Ho, Hsuan-I and Guo, Chen and Rong, Boxiang and Grigorev, Artur and Song, Jie and Zarate, Juan Jose and Hilliges, Otmar},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},