HiFace: High-Fidelity 3D Face Reconstruction
by Learning Static and Dynamic Details

ICCV 2023

Zenghao Chai1,2 Tianke Zhang2 Tianyu He3 Xu Tan3 Tadas Baltrušaitis4
HsiangTao Wu5 Runnan Li5 Sheng Zhao5 Chun Yuan2 Jiang Bian3

1 National University of Singapore  2 Tsinghua University  3 Microsoft Research Asia  4 Microsoft Mixed Reality & AI Lab  5 Microsoft Cloud + AI


Abstract

3D Morphable Models (3DMMs) demonstrate great potential for reconstructing faithful and animatable 3D facial surfaces from a single image. The facial surface is influenced by the coarse shape, as well as the static detail (e.g., person-specific appearance) and dynamic detail (e.g., expression-driven wrinkles). Previous work struggles to decouple the static and dynamic details through image-level supervision, leading to reconstructions that are not realistic. In this paper, we aim at high-fidelity 3D face reconstruction and propose HiFace to explicitly model the static and dynamic details. Specifically, the static detail is modeled as a linear combination of a displacement basis, while the dynamic detail is modeled as a linear interpolation of two displacement maps with polarized expressions. We exploit several loss functions to jointly learn the coarse shape and fine details from both synthetic and real-world datasets, enabling HiFace to reconstruct high-fidelity 3D shapes with animatable details. Extensive quantitative and qualitative experiments demonstrate that HiFace achieves state-of-the-art reconstruction quality and faithfully recovers both the static and dynamic details.

Overview of HiFace

Illustration of HiFace. (a) The learning architecture of HiFace. Given a monocular image, we regress its shape and detail coefficients to synthesize a realistic 3D face, and leverage a differentiable renderer to train the whole model end-to-end on synthetic and real-world images. (b) The pipeline of Static and Dynamic Decoupling for DeTail Reconstruction (SD-DeTail). We explicitly decouple the static and dynamic factors to synthesize realistic and animatable details. Given the shape and static coefficients, we regress the static and dynamic details through displacement bases and interpolate them into the final details via vertex tension.
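The detail synthesis described above can be sketched in a few lines. The following is a simplified, hypothetical NumPy illustration (the function name, array shapes, and the UV-space representation are assumptions, not the released implementation): the static detail is a linear combination of a displacement basis, and the dynamic detail linearly interpolates between two polarized displacement maps (fully compressed vs. fully stretched) using the per-vertex tension.

```python
import numpy as np

def sd_detail(alpha_static, basis_static, disp_compress, disp_stretch, tension):
    """Simplified sketch of SD-DeTail's displacement synthesis.

    alpha_static : (k,) static detail coefficients regressed from the image
    basis_static : (h, w, k) displacement basis for person-specific detail
    disp_compress, disp_stretch : (h, w) displacement maps for the two
        polarized expressions (fully compressed / fully stretched)
    tension : (h, w) per-vertex tension in [0, 1], rasterized to UV space
    """
    # Static detail: linear combination of the displacement basis.
    static = basis_static @ alpha_static
    # Dynamic detail: linear interpolation between the two polarized maps,
    # weighted by the expression-driven vertex tension.
    dynamic = tension * disp_compress + (1.0 - tension) * disp_stretch
    # Final detail displacement map fed to the renderer.
    return static + dynamic
```

At tension 1 the dynamic term reduces to the compressed map, at tension 0 to the stretched map, so a neutral face and an extreme expression both fall out of the same interpolation.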

Coarse Reconstruction

For coarse shape reconstruction, HiFace faithfully recovers the coarse shape of the given identity, outperforming previous learning-based methods and performing on par with Dense, the state-of-the-art optimization-based method.

Detail Reconstruction

For detail reconstruction, HiFace achieves the most realistic reconstruction quality and faithfully recovers the facial details of a given image, outperforming previous methods by a large margin.

Flexibility of SD-DeTail

SD-DeTail can be easily plugged into previous optimization-based methods. Given the identity and expression coefficients (β, ξ) from the optimization-based method, SD-DeTail can generate realistic details based on the coarse shape and further improve the visual quality.
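To make the plug-in usage concrete, the snippet below sketches how coefficients (β, ξ) from any external fitter would feed SD-DeTail. It uses a toy linear 3DMM with random bases purely for illustration; the names, basis sizes, and `coarse_shape` helper are assumptions standing in for a real model such as FLAME or BFM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a 3DMM: mean shape plus identity/expression bases.
# A real system would load FLAME/BFM bases here; sizes are illustrative.
n_verts, n_id, n_exp = 100, 80, 64
mean_shape = rng.normal(size=(n_verts, 3))
id_basis = rng.normal(size=(n_verts, 3, n_id))
exp_basis = rng.normal(size=(n_verts, 3, n_exp))

def coarse_shape(beta, xi):
    """Coarse 3DMM mesh from identity (beta) and expression (xi) codes."""
    return mean_shape + id_basis @ beta + exp_basis @ xi

# (beta, xi) can come from any optimization-based fitter; SD-DeTail then
# synthesizes the detail displacements on top of this coarse mesh.
beta = 0.01 * rng.normal(size=n_id)
xi = 0.01 * rng.normal(size=n_exp)
verts = coarse_shape(beta, xi)
```

The key point is that SD-DeTail only needs the coarse mesh and the coefficients, not the fitter's internals, which is why it composes with existing optimization pipelines.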

Detail Animation

Given a source image (yellow box), we use the driving images (green box) to drive its person-specific details and expressions. For each source image, we manipulate the static (1st row), dynamic (2nd row), or both (3rd row) factors. The results (red box) show that HiFace can flexibly animate details from the static factor, the dynamic factor, or both, producing vivid animations with realistic shapes.

BibTeX

@InProceedings{Chai_2023_ICCV,
    author    = {Chai, Zenghao and Zhang, Tianke and He, Tianyu and Tan, Xu and Baltrusaitis, Tadas and Wu, HsiangTao and Li, Runnan and Zhao, Sheng and Yuan, Chun and Bian, Jiang},
    title     = {HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {9087--9098}
}