Jinkun Cao

I am a Member of Technical Staff at Amazon Frontier AI & Robotics (FAR) working on robots.

I was a Research Scientist at Meta Superintelligence Lab (MSL) and FAIR. I obtained my Robotics PhD from CMU with Kris Kitani. My PhD research was supported by the Meta PhD Fellowship. I received my bachelor's degree from Shanghai Jiao Tong University.

I am interested in how robots can gain knowledge in a scalable way from human experience to interact with our shared physical world. My research spans the intersection of computer vision and robotics.

Github | Google Scholar | Linkedin | DBLP | Resume (usually outdated)
Email: jinkuncao [AT] gmail.com

News

[2026/03]: I moved from Meta to Amazon Frontier AI & Robotics (FAR) to continue working on robotics.
[2026/03]: SAM3D Body was accepted to CVPR 2026 with all three reviewers giving 6/6 ratings!
[2025/12]: I am attending NeurIPS in San Diego from Dec 1 to Dec 5. I will be giving lightning talks about SAM3D Body at the Meta booth on Dec 3 and Dec 4.
[2025/11]: SAM3D Body is released as a part of SAM3D and together with SAM3! I was part of the SAM3D Body team and led the development of hand pose estimation and joint model training. Give it a try at our Playground! Code and models are open-source on Github and Hugging Face.
[2025/10]: GENMO was accepted to ICCV as a Highlight, and I will be at ICCV in Honolulu to present it.
[2025/7]: I passed my PhD thesis defense on "Estimating and Generating Human Motions from Interactions". The thesis is here.

[2024/9]: Three papers were accepted to NeurIPS 2024 and one paper was accepted to the NeurIPS 2024 DB Track. See you in Vancouver! (Hopefully I can get a visa.)

[2024/3]: Concluded my tracking research in a thesis. My recent research is more about the generation of human shape, motion and behavior.

[2024/2]: SimXR for pose estimation and simulation from head-mounted cameras is accepted by CVPR 2024. Congrats to Zen!

[2024/2]: CSC-Tracker for multi-object tracking is accepted by ICRA 2024. Stay tuned for more details!

[2024/1]: Two papers are accepted by ICLR as Spotlights. Check UniHSI and PULSE for details!

[2023/7]: One paper about humanoid control is accepted to ICCV. Congrats, Zen! Looking forward to the trip to Paris.

[2023/4]: Awarded the Meta PhD Research Fellowship, starting in 2023. Thank you, Meta!

[2023/2]: Deep OC-SORT is available on arxiv and Github, ranking 1st on MOT17, MOT20 and DanceTrack among published papers.

[2023/2]: Two papers are accepted to CVPR 2023 (including OC-SORT). See you in Vancouver.

[2022/9]: A paper on multi-object tracking is accepted to BMVC'2022. The paper will be publicly available soon.

[2022/9]: The MED paper is accepted by NeurIPS'2022. We study the disentanglement property of high-dimensional representation models and introduce contrastive learning methods into disentanglement benchmarks. Stay tuned for a heavily revised version of the paper.

[2022/8]: OC-SORT is supported by mmtracking now. Try it for more flexible and advanced features!

[2022/3]: The code of OC-SORT is released. It achieves SOTA performance on multiple MOT datasets in a pure motion-based fashion.

[2022/3]: We are organizing the "Multiple Object Tracking in Complex Environments Workshop" at ECCV'2022, Tel Aviv, Israel.

[2022/2]: DanceTrack is accepted to CVPR'2022. We propose a challenging multi-object tracking dataset.

Selected Publications

* indicates equal contribution | I am a main contributor of the highlighted projects.

SAM 3D Body: Robust Full-Body Human Mesh Recovery

SAM3D Body Team at Meta

I led the whole-body and hand pose efforts and contributed to the general model training.

CVPR 2026 [Tech Report] [Online Playground] [Blog] [Github]

Contact4D: A Video Dataset for Whole-body Human Motion and Finger Contact in Dexterous Operations

Jyun-Ting Song, Jungeun Kim, Jinkun Cao, Yu Lei, Takuma Yagi, Kris Kitani

3DV 2026 [paper]

Joint Diffusion for Universal Hand-Object Grasp Generation

Jinkun Cao, Jingyuan Liu, Kris Kitani, Yi Zhou

TMLR 2025 [openreview]

GENMO: A GENeralist Model for Human MOtion

Jiefeng Li, Jinkun Cao, Haotian Zhang, Davis Rempe, Jan Kautz, Umar Iqbal, Ye Yuan

ICCV 2025 [Highlight][project]

Grasping Diverse Objects with Simulated Humanoids

Zhengyi Luo*, Jinkun Cao*, Sammy Christen, Alexander Winkler, Kris Kitani, Weipeng Xu

NeurIPS 2024 [project]

Mixed Gaussian Flow for Diverse Trajectory Prediction

Jiahe Chen*, Jinkun Cao*, Kris Kitani, Jiangmiao Pang

NeurIPS 2024 [arxiv]

Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions

Rawal Khirodkar*, Jyun-Ting Song*, Jinkun Cao, Zhengyi Luo, Kris Kitani

NeurIPS 2024 (Dataset and Benchmark Track) [project]

Real-Time Simulated Avatar from Head-Mounted Sensors

Zhengyi Luo, Jinkun Cao, Rawal Khirodkar, Alexander Winkler, Jing Huang, Kris Kitani, Weipeng Xu

CVPR 2024 [arxiv] [project]

Multi-Object Tracking by Hierarchical Visual Representations

Jinkun Cao, Jiangmiao Pang, Kris Kitani

ICRA 2024 [arxiv]

Universal Humanoid Motion Representations for Physics-Based Control

Zhengyi Luo, Jinkun Cao, Josh Merel, Alexander Winkler, Jing Huang, Kris Kitani, Weipeng Xu

ICLR 2024 (Spotlight) [arxiv] [project]

Perpetual Humanoid Control for Real-time Simulated Avatars

Zhengyi Luo, Jinkun Cao, Alexander Winkler, Kris Kitani, Weipeng Xu

ICCV 2023 [project page] [arxiv] [code]

Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking

Jinkun Cao, Jiangmiao Pang, Xinshuo Weng, Rawal Khirodkar, Kris Kitani

CVPR 2023 [arxiv] [code] [mmtracking]

Track Targets by Dense Spatio-Temporal Position Encoding

Jinkun Cao, Hao Wu, Kris Kitani

BMVC 2022 [Oral] [arxiv]

An Empirical Study on Disentanglement of Negative-free Contrastive Learning

Jinkun Cao, Ruiqian Nai, Qing Yang, Jialei Huang, Yang Gao

NeurIPS 2022 [arxiv] [code]

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

Peize Sun*, Jinkun Cao*, Yi Jiang, Zehuan Yuan, Song Bai, Kris Kitani, Ping Luo

CVPR 2022 [arxiv] [code] [project page] [codalab]

Instance-aware predictive navigation in multi-agent environments

Jinkun Cao, Xin Wang, Trevor Darrell, Fisher Yu

ICRA 2021 [arxiv] [code]

Cross-Domain Adaptation for Animal Pose Estimation

Jinkun Cao, Hongyang Tang, Hao-Shu Fang, Xiaoyong Shen, Cewu Lu, Yu-Wing Tai

ICCV 2019 [Oral] (4.3% acceptance rate) [arxiv] [dataset]

Services

Conference Reviewer: ICCV, ECCV, CVPR, ISMAR, ICRA, IROS, NeurIPS, AAAI, ICML, ICLR, SIGGRAPH, SIGGRAPH Asia
Journal Reviewer: RA-L, IEEE Trans. Multimedia, TMLR, IJCV, TPAMI, TOG
Workshop Organizer:
- "Multiple Object Tracking in Complex Environments Workshop" at ECCV'2022