EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh

Tao Hu*, Haoyang Peng*, Xiao Liu, Yuewen Ma

Pico, Bytedance

* Equal Contribution

Applications: World Generation

Input Image

Output 360° World Video

Abstract

Generating high-quality camera-controllable videos from monocular input is challenging, particularly under extreme viewpoints. Existing methods often struggle with geometric inconsistencies and occlusion artifacts. We introduce EX-4D, a novel framework that addresses these challenges through a Depth Watertight Mesh representation that explicitly models both visible and occluded regions, ensuring geometric consistency even in extreme camera poses.

Our approach includes a simulated masking strategy that generates effective training data from monocular videos, eliminating the need for paired multi-view datasets. A lightweight LoRA-based video diffusion adapter synthesizes high-quality, physically consistent, and temporally coherent videos.

Demo Video

Demonstration of EX-4D's extreme viewpoint synthesis, showing smooth camera transitions and consistent temporal dynamics across challenging viewpoint changes.

Key Features

🔧 Depth Watertight Mesh

Novel geometric representation that models both visible and occluded regions for consistent extreme viewpoint synthesis.
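
As a rough illustration of the idea, the sketch below turns a depth map into a closed grid mesh and flags the faces that bridge a depth discontinuity as occluded, rather than cutting them out. It is not the authors' implementation: the pinhole camera model, the function name `depth_to_mesh`, and the `jump_thresh` parameter are all assumptions made for this example.

```python
def depth_to_mesh(depth, focal, jump_thresh):
    """Build a hole-free grid mesh from a depth map (illustrative sketch).

    Faces spanning a large depth jump are kept but flagged as occluded,
    so the surface stays closed while visible and hidden regions are
    told apart. All names and the threshold rule are hypothetical.
    """
    h, w = len(depth), len(depth[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0  # assumed principal point: image center

    # One vertex per pixel, unprojected with a pinhole camera model.
    vertices = []
    for v in range(h):
        for u in range(w):
            z = depth[v][u]
            vertices.append(((u - cx) * z / focal, (v - cy) * z / focal, z))

    # Two triangles per 2x2 pixel block; no face is ever dropped.
    faces, occluded = [], []
    for v in range(h - 1):
        for u in range(w - 1):
            i00 = v * w + u
            i01, i10, i11 = i00 + 1, i00 + w, i00 + w + 1
            for tri in ((i00, i10, i01), (i01, i10, i11)):
                zs = [vertices[i][2] for i in tri]
                faces.append(tri)
                occluded.append(max(zs) - min(zs) > jump_thresh)
    return vertices, faces, occluded


# A 2x3 depth map with a foreground (z=1) next to a background (z=5).
depth = [[1.0, 1.0, 5.0],
         [1.0, 1.0, 5.0]]
vertices, faces, occluded = depth_to_mesh(depth, focal=1.0, jump_thresh=1.0)
print(len(vertices), len(faces), occluded)  # 6 4 [False, False, True, True]
```

Keeping the flagged faces, instead of deleting them, is what lets a renderer report which pixels of a novel view fall on hidden geometry.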

🎭 Simulated Masking

Training strategy that creates effective data from monocular videos without requiring multi-view datasets.
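
One plausible way to picture such masks, shown here only as a sketch, is to shift a virtual camera sideways, forward-project each pixel by its disparity, and mark the target pixels that receive nothing. The page does not describe the actual masking procedure, so the horizontal-translation setup, the function name `simulate_occlusion_mask`, and the `focal` and `baseline` parameters are assumptions.

```python
def simulate_occlusion_mask(depth, focal, baseline):
    """Return a mask with 1 where a sideways-shifted view sees no source pixel.

    Hypothetical sketch: the camera translates along x by `baseline`, so a
    pixel at depth z moves left by focal * baseline / z pixels. Only coverage
    matters here, so no z-buffer is needed.
    """
    h, w = len(depth), len(depth[0])
    covered = [[False] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            disparity = focal * baseline / depth[v][u]
            u_new = int(round(u - disparity))
            if 0 <= u_new < w:
                covered[v][u_new] = True
    # 1 marks a hole that a generative model would have to fill in.
    return [[0 if c else 1 for c in row] for row in covered]


# One scanline: background (z=2) with a closer object (z=1) in the middle.
depth = [[2.0, 2.0, 1.0, 1.0, 2.0, 2.0]]
mask = simulate_occlusion_mask(depth, focal=2.0, baseline=1.0)
print(mask)  # [[0, 0, 1, 0, 0, 1]]
```

In this toy scanline the hole at index 2 is a disocclusion behind the closer object, and the hole at index 5 is content entering at the frame border. Both kinds of region can be derived from a single monocular frame plus depth, with no second camera.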

⚡ Lightweight Adapter

LoRA-based video diffusion adapter with only 1% trainable parameters for efficient high-quality synthesis.
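
To make the parameter budget concrete, the sketch below shows a generic LoRA layer: a frozen weight matrix W plus a trainable low-rank update B·A. The layer sizes and rank are hypothetical and are not taken from EX-4D, so the printed fraction illustrates the order of magnitude rather than reproducing the 1% figure.

```python
def lora_trainable_fraction(d_in, d_out, rank):
    """Share of parameters that are trainable when W is frozen."""
    frozen = d_in * d_out              # base weight W, not updated
    trainable = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return trainable / (frozen + trainable)


def lora_forward(x, W, A, B, scale=1.0):
    """Compute y = W x + scale * B (A x) with plain Python lists."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    base = matvec(W, x)               # frozen path
    update = matvec(B, matvec(A, x))  # low-rank trainable path
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical 1024x1024 layer with rank 8.
print(f"{lora_trainable_fraction(1024, 1024, 8):.4f}")  # 0.0154

# Tiny worked example: identity W, rank-1 update.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[1.0], [0.0]]
y = lora_forward([2.0, 3.0], W, A, B)
print(y)  # [7.0, 3.0]
```

Because only A and B receive gradients, the trainable share shrinks as the base layer grows, which is why an adapter of this kind can stay lightweight on a large video diffusion backbone.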

Method Overview

Our EX-4D framework transforms monocular videos into extreme viewpoint 4D videos through three key steps: (1) constructing a DW-Mesh as a geometric prior to handle boundary occlusions, (2) generating training masks that simulate novel-view occlusions, and (3) using a lightweight video diffusion adapter for physically consistent and temporally coherent synthesis.

Citation

@misc{hu2025ex4dextremeviewpoint4d,
  title={EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh},
  author={Tao Hu and Haoyang Peng and Xiao Liu and Yuewen Ma},
  year={2025},
  eprint={2506.05554},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.05554},
}