Junhao Chen

Tsinghua University

📝 Publications

🧑‍🎨 Controllable World Model

CVPR 2026

LottieGPT: Tokenizing Vector Animation for Autoregressive Generation

Junhao Chen, Kejun Gao, Yuehan Cui, Mingze Sun, Mingjin Chen, Shaohui Wang, Xiaoxiao Long, Fei Ma, Qi Tian, Hao Zhao †, Ruqi Huang †

Tokenizes Lottie vector animations and finetunes a multimodal model to generate coherent, editable vector animations from text or visual prompts.

CVPR 2026

HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis

Mingjin Chen *, Junhao Chen *, Zhaoxin Fan †, Yujian Lee, Zichen Dang, Lili Wang, Yawen Cui, Lap-Pui Chau † , Yi Wang

HVG-3D: A 3D-aware HOI video diffusion framework with 3D ControlNet that turns one image plus 3D control signals into spatially precise, temporally coherent interaction videos.

CVPR 2026

sym

Animator-Centric Skeleton Generation on Objects with Fine-Grained Details

Mingze Sun, Cheng Zeng, Jiansong Pei, Junhao Chen, Chaoyue Song, Shaohui Wang, Tianyuan Chang, Bin Huang, Zijiao Zeng, Ruqi Huang †

Uses semantic-aware tokenization, a large rigged-mesh corpus, and a density-control module to generate high-quality, controllable skeletons for complex 3D assets.

ICLR 2026

GarmentGPT: Compositional Garment Pattern Generation via Discrete Latent Tokenization

Fangsheng Weng *, Junhao Chen *, Xiang Li, Jie Qin, Hanzhong Guo, Shaochun Hao, Xiaoguang Han †

Uses RVQ-VAE tokenization and a VLM generator to produce garment sewing patterns from discrete latent tokens, achieving strong accuracy on large curated datasets.

arxiv 2026

sym

From Frames to Sequences: Temporally Consistent Human-Centric Dense Prediction

Xingyu Miao, Junting Dong †, Qin Zhao, Yuhang Yang, Junhao Chen, Yang Long †

Learns temporally consistent human-centric segmentation, depth, and normals via synthetic video supervision and a two-stage static→dynamic training pipeline.

ICLR 2026

DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation

Junhao Chen, Mingjin Chen, Jianjin Xu, Xiang Li, Junting Dong †, Mingze Sun, Puhua Jiang, Hongxiang Li, Yuhang Yang, Hao Zhao, Xiaoxiao Long, Ruqi Huang †

This work generates identity-preserving multi-person interactive dance videos with controllable motion and appearance!

CVPR 2025

DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters

Mingze Sun *, Junhao Chen *, Junting Dong †, Yurun Chen, Xinyu Jiang, Shiwei Mao, Puhua Jiang, Jingbo Wang, Bo Dai, Ruqi Huang †

This work generates skeleton and skinning with clothes and hair for 3d gaussian avatar!

Machine Vision and Applications 2026

Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail

Mingjin Chen *, Junhao Chen *, Huan-ang Gao, Xiaoxue Chen, Zhaoxin Fan, Hao Zhao †

This work converts a single image of the human body into a lifelike 3D model!

COLING 2025

Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs

Junhao Chen *, Xiang Li *, Xiaojun Ye, Chao Li, Zhaoxin Fan †, Hao Zhao †

This work enables automated 3D model design and generation for people!

COLING 2024

sym

MMAD: Multi-modal Movie Audio Description

Xiaojun Ye, Junhao Chen, Xiang Li, Haidong Xin, Chao Li, Sheng Zhou †, Jiajun Bu

This work has unlocked a whole new experience of watching movies for the visually impaired.

arxiv 2023

sym

FineStyler: Text-guided Instance-level Fine-grained Image Style Transfer

Junhao Chen, Rong Peng, Xiang Li, Jingbo Sun, Hao Zhao, Ruqi Huang

This work enables fine-grained stylization of a single image through text-guidance!

👀 Large Language Model

EMNLP 2025

sym

LLMsPark: A Benchmark for Evaluating Large Language Models in Strategic Gaming Contexts

Junhao Chen, Jingbo Sun, Xiang Li, Haidong Xin, Yuhao Xue, Yibin Xu, Hao Zhao †

This work evaluates LLMs through a game-theoretic framework.

ACL 2025

sym

IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web

Hongcheng Guo, Wei Zhang, Junhao Chen, Yaonan Gu, Jian Yang, Junjia Du, Shaosheng Cao, Binyuan Hui, Tianyu Liu, Jianxin Ma, Chang Zhou, Zhoujun Li

This work is a benchmark for evaluating MLLM image-2-html code generation capabilities.

EMNLP 2023

sym

ZhuJiu: A Multi-dimensional, Multi-faceted Chinese Benchmark for Large Language Models

Baoli Zhang, Haining Xie, Pengfan Du, Junhao Chen, Pengfei Cao, Yubo Chen †, Shengping Liu, Kang Liu, Jun Zhao

[🏆Leaderboard ] [📜Paper] [🎥Video]

This work serves as a benchmark for evaluating the Chinese language capabilities of large language models.

ICANN 2023

sym

Towards Energy-Efficient Sentiment Classification with Spiking Neural Networks

Junhao Chen, Xiaojun Ye, Jingbo Sun, Chao Li †

This work applies a pulsed neural network to a natural language sentiment categorization task, reaching the leading edge in terms of energy consumption.