Junhao Chen
Open to PhD 2027

Tsinghua University

3D/4D World Modeling & Spatial Intelligence · Generative Modeling for Structured & Controllable Worlds

Research Interests & Opportunities

👋 I am a master’s student at Tsinghua University, supervised by Prof. Ruqi Huang (expected graduation: Fall 2027). Before Tsinghua, I graduated first in my class (rank 1/233, B.S. in Software Engineering) from Harbin Engineering University.

My research goal is to build generative world models that can represent, generate, verify, and improve digital and physical worlds.


🔥 Actively seeking PhD positions (Fall 2027) and RA / visiting student opportunities. I am especially interested in advisors and collaborators working on world models, 3D/4D vision, video generation, code generation, LLM/VLM agents, auto research, and AI for robot / CAD / hardware design.


💼 Several professors I know are recruiting research interns (Beijing / Hangzhou / Shenzhen / Shanghai, competitive salary), MS/PhD students, and undergraduate research interns. If you are interested in 3D / VLM / video / animation generation and understanding, feel free to contact me: junhao-c24@mails.tsinghua.edu.cn

Feel free to contact me by email if you’d like to discuss or collaborate. Outstanding undergraduate and graduate students are welcome to reach out about research collaboration!

News

Selected Work

Representative publications and projects.

Work across 3D world modeling, interactive video, and controllable character generation.

📝 Publications

🧭 3D/4D World Modeling, Spatial Intelligence, and Embodied AI

World Representation, Articulation, and Spatial Intelligence

In review

Feedforward 3D Editing Learns from Semantic-Part Transformation

Jiawei Weng *, Saining Zhang * †, Zhenxin Diao *, Peishuo Li, Henghaofan Zhang, Junhao Chen, Hao Zhao †

  • PartFlow is a feedforward 3D editing network trained on Pxform, editing an existing 3D asset to match a target edit image without per-asset optimization or 3D masks at inference.
ECCV 2026

One Video, One World: Turning Monocular Video into Physical 4D Scenes

Junhao Chen *, Boran Zhang *, Mingjin Chen, Henghaofan Zhang, Saining Zhang, Congcong Zhu, Hao Zhao, Ruqi Huang †, Zhihao Li, Yufei Wang †

  • OVOW is a fully training-free system that turns a single monocular video into an instance-level, simulation-ready 4D mesh scene for downstream embodied AI and physics engines.
CVPR 2026

Animator-Centric Skeleton Generation on Objects with Fine-Grained Details

Mingze Sun, Cheng Zeng, Jiansong Pei, Junhao Chen, Chaoyue Song, Shaohui Wang, Tianyuan Chang, Bin Huang, Zijiao Zeng †, Ruqi Huang †

  • Uses semantic-aware tokenization, a large rigged-mesh corpus, and a density-control module to generate high-quality, controllable skeletons for complex 3D assets.
Machine Vision and Applications 2026
arXiv 2026

From Frames to Sequences: Temporally Consistent Human-Centric Dense Prediction

Xingyu Miao, Junting Dong †, Qin Zhao, Yuhang Yang, Junhao Chen, Yang Long †

arXiv
  • Learns temporally consistent human-centric segmentation, depth, and normals via synthetic video supervision and a two-stage static→dynamic training pipeline.

Interaction-Aware Embodied World Modeling

CVPR 2026

HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis

Mingjin Chen *, Junhao Chen *, Zhaoxin Fan †, Yujian Lee, Zichen Dang, Lili Wang, Yawen Cui, Lap-Pui Chau, Yi Wang †

arXiv
  • HVG-3D: A 3D-aware HOI video diffusion framework with 3D ControlNet that turns one image plus 3D control signals into spatially precise, temporally coherent interaction videos.
ICLR 2026

🎛 Generative Modeling for Structured and Controllable Worlds

Structured and Multimodal Content Generation

CVPR 2026

LottieGPT: Tokenizing Vector Animation for Autoregressive Generation

Junhao Chen *, Kejun Gao *, Yuehan Cui, Mingze Sun, Mingjin Chen, Shaohui Wang, Xiaoxiao Long, Fei Ma, Qi Tian, Hao Zhao †, Ruqi Huang †

LottieSVG-10M LottieAnimation-660K
  • Tokenizes Lottie vector animations and finetunes a multimodal model to generate coherent, editable vector animations from text or visual prompts.
ICLR 2026

GarmentGPT: Compositional Garment Pattern Generation via Discrete Latent Tokenization

Fangsheng Weng *, Junhao Chen *, Xiang Li, Jie Qin, Hanzhong Guo, Shaochun Hao, Xiaoguang Han †

[📜Paper]

  • Uses RVQ-VAE tokenization and a VLM generator to produce garment sewing patterns from discrete latent tokens, achieving strong accuracy on large curated datasets.

Controllable Visual Generation

arXiv 2023

FineStyler: Text-guided Instance-level Fine-grained Image Style Transfer

Junhao Chen, Rong Peng, Xiang Li, Jingbo Sun, Hao Zhao, Ruqi Huang

Open In Colab arXiv

  • FineStyler enables fine-grained, instance-level stylization of a single image through text guidance.

🎙 Multimodal Perception and Understanding

ECCV 2026

A First Exploration of Neuromorphic OT-CFM for Multi-Speaker VSR

Lin Chen, Jingping Fang, Hairui Liu, Chenyang Xu, Junhao Chen, Xiaorui Li, Weidong Cai, Xiaoming Chen

  • LipsFlow tackles multi-speaker visual speech recognition by converting RGB videos into event streams and modeling fine-grained articulatory dynamics with efficient OT-CFM inference.
COLING 2024

MMAD: Multi-modal Movie Audio Description

Xiaojun Ye, Junhao Chen, Xiang Li, Haidong Xin, Chao Li, Sheng Zhou †, Jiajun Bu

[📜Paper]

  • MMAD generates multi-modal audio descriptions for movies, making films more accessible to visually impaired audiences.

🧠 Foundation Models, Reasoning, and Evaluation

ACL 2026

PairCoder: Pair Programming-Inspired Two-Agent Collaboration for Code Generation

Junhao Chen, Xiang Li, Yibin Xu, Yuehan Cui, Fangsheng Weng, Hao Zhao, Fei Ma, Qi Tian

  • PairCoder adapts pair programming into efficient two-agent LLM collaboration, using dynamic Driver-Navigator interaction to improve code generation quality with far lower token cost than typical multi-agent frameworks.
EMNLP 2025

LLMsPark: A Benchmark for Evaluating Large Language Models in Strategic Gaming Contexts

Junhao Chen, Jingbo Sun, Xiang Li, Haidong Xin, Yuhao Xue, Yibin Xu, Hao Zhao †

arXiv

  • LLMsPark benchmarks LLMs in strategic gaming contexts through a game-theoretic framework.
ACL 2025
EMNLP 2023

ZhuJiu: A Multi-dimensional, Multi-faceted Chinese Benchmark for Large Language Models

Baoli Zhang, Haining Xie, Pengfan Du, Junhao Chen, Pengfei Cao, Yubo Chen †, Shengping Liu, Kang Liu, Jun Zhao

[🏆Leaderboard] arXiv [📜Paper] [🎥Video]

  • This work serves as a benchmark for evaluating the Chinese language capabilities of large language models.
ICANN 2023

Towards Energy-Efficient Sentiment Classification with Spiking Neural Networks

Junhao Chen, Xiaojun Ye, Jingbo Sun, Chao Li †

[📜Paper]

  • This work applies spiking neural networks to natural-language sentiment classification, achieving leading energy efficiency.

Recognition

Honors, awards, and competition results.

Selected recognition from research and engineering work.

🎖 Honors and Awards

Innovation and entrepreneurship competition awards: 10 national, 45 provincial, and 11 school-level (66 in total).
Honors: 6 national, 2 provincial, and 20 school-level (28 in total).
Competition awards and individual honors total 94 (as of November 18, 2024).
List of all awards received.

Journey

Education, research experience, and service.

The path that shaped the current research direction.

📖 Education

💻 Experience

🧑‍💻 Professional Services

Reviewer: ACL ARR (2025.02 - present), ICLR (2026), AAAI (2026), ICML (2026, Gold Reviewer with complimentary registration), TMLR (2026), TVCG (2026), ECCV (2026), ACM MM (2026), NeurIPS (2026), SIGGRAPH Asia (2026)
