📝 Publications
🧑‍🎨 Controllable World Model
LottieGPT: Tokenizing Vector Animation for Autoregressive Generation
Junhao Chen, Kejun Gao, Yuehan Cui, Mingze Sun, Mingjin Chen, Shaohui Wang, Xiaoxiao Long, Fei Ma, Qi Tian, Hao Zhao †, Ruqi Huang â€
- Tokenizes Lottie vector animations and finetunes a multimodal model to generate coherent, editable vector animations from text or visual prompts.
HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis
Mingjin Chen *, Junhao Chen *, Zhaoxin Fan †, Yujian Lee, Zichen Dang, Lili Wang, Yawen Cui, Lap-Pui Chau †, Yi Wang
- HVG-3D: A 3D-aware HOI video diffusion framework with 3D ControlNet that turns one image plus 3D control signals into spatially precise, temporally coherent interaction videos.
Animator-Centric Skeleton Generation on Objects with Fine-Grained Details
Mingze Sun, Cheng Zeng, Jiansong Pei, Junhao Chen, Chaoyue Song, Shaohui Wang, Tianyuan Chang, Bin Huang, Zijiao Zeng, Ruqi Huang â€
- Uses semantic-aware tokenization, a large rigged-mesh corpus, and a density-control module to generate high-quality, controllable skeletons for complex 3D assets.
GarmentGPT: Compositional Garment Pattern Generation via Discrete Latent Tokenization
Fangsheng Weng *, Junhao Chen *, Xiang Li, Jie Qin, Hanzhong Guo, Shaochun Hao, Xiaoguang Han â€
- Uses RVQ-VAE tokenization and a VLM generator to produce garment sewing patterns from discrete latent tokens, achieving strong accuracy on large curated datasets.
From Frames to Sequences: Temporally Consistent Human-Centric Dense Prediction
Xingyu Miao, Junting Dong †, Qin Zhao, Yuhang Yang, Junhao Chen, Yang Long â€
- Learns temporally consistent human-centric segmentation, depth, and normals via synthetic video supervision and a two-stage static→dynamic training pipeline.
DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation
Junhao Chen, Mingjin Chen, Jianjin Xu, Xiang Li, Junting Dong †, Mingze Sun, Puhua Jiang, Hongxiang Li, Yuhang Yang, Hao Zhao, Xiaoxiao Long, Ruqi Huang â€
- This work generates identity-preserving multi-person interactive dance videos with controllable motion and appearance!
DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters
Mingze Sun *, Junhao Chen *, Junting Dong †, Yurun Chen, Xinyu Jiang, Shiwei Mao, Puhua Jiang, Jingbo Wang, Bo Dai, Ruqi Huang â€
- This work generates skeleton and skinning with clothes and hair for 3d gaussian avatar!
Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail
Mingjin Chen *, Junhao Chen *, Huan-ang Gao, Xiaoxue Chen, Zhaoxin Fan, Hao Zhao â€
- This work converts a single image of the human body into a lifelike 3D model!
Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs
Junhao Chen *, Xiang Li *, Xiaojun Ye, Chao Li, Zhaoxin Fan †, Hao Zhao â€
- This work enables automated 3D model design and generation for people!
🎙 Multi-modal
MMAD: Multi-modal Movie Audio Description
Xiaojun Ye, Junhao Chen, Xiang Li, Haidong Xin, Chao Li, Sheng Zhou †, Jiajun Bu
- This work has unlocked a whole new experience of watching movies for the visually impaired.
FineStyler: Text-guided Instance-level Fine-grained Image Style Transfer
Junhao Chen, Rong Peng, Xiang Li, Jingbo Sun, Hao Zhao, Ruqi Huang
- This work enables fine-grained stylization of a single image through text-guidance!
đź‘€ Large Language Model
LLMsPark: A Benchmark for Evaluating Large Language Models in Strategic Gaming Contexts
Junhao Chen, Jingbo Sun, Xiang Li, Haidong Xin, Yuhao Xue, Yibin Xu, Hao Zhao â€
- This work evaluates LLMs through a game-theoretic framework.
IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web
Hongcheng Guo, Wei Zhang, Junhao Chen, Yaonan Gu, Jian Yang, Junjia Du, Shaosheng Cao, Binyuan Hui, Tianyu Liu, Jianxin Ma, Chang Zhou, Zhoujun Li
- This work is a benchmark for evaluating MLLM image-2-html code generation capabilities.

ZhuJiu: A Multi-dimensional, Multi-faceted Chinese Benchmark for Large Language Models
Baoli Zhang, Haining Xie, Pengfan Du, Junhao Chen, Pengfei Cao, Yubo Chen †, Shengping Liu, Kang Liu, Jun Zhao
[🏆Leaderboard ]
[📜Paper]
[🎥Video]
- This work serves as a benchmark for evaluating the Chinese language capabilities of large language models.

Towards Energy-Efficient Sentiment Classification with Spiking Neural Networks
Junhao Chen, Xiaojun Ye, Jingbo Sun, Chao Li â€
- This work applies a pulsed neural network to a natural language sentiment categorization task, reaching the leading edge in terms of energy consumption.