News
- [Sept. 2024] 1 paper accepted to CoRL 2024.
- [Jul. 2024] I joined TikTok as a research scientist.
- [Jun. 2024] 1 paper accepted to RA-L 2024.
- [Feb. 2024] 1 paper accepted to CVPR 2024.
- [Sept. 2023] 3 papers accepted to NeurIPS 2023.
- [Feb. 2023] 1 paper accepted to CVPR 2023.
- [Jan. 2023] 3 papers accepted to ICLR 2023 (1 oral, 2 posters)!
- [Feb. 2022] G-DOOM for deformable object manipulation has been accepted to ICRA 2022!
- [May 2021] PROMPT for ab-initio object manipulation has been accepted to RSS 2021.
- [Oct. 2020] CVRL for model-based RL under complex observations has been accepted to CoRL 2020.
- [Sept. 2020] BALMS for long-tailed visual recognition has been accepted to NeurIPS 2020.
- [Jul. 2020] STAR for pedestrian trajectory prediction has been accepted to ECCV 2020.
- [Dec. 2019] DPFRL for reinforcement learning under complex and partial observations has been accepted to ICLR 2020.
- [Nov. 2019] PF-RNNs for sequence modeling under uncertainty have been accepted to AAAI 2020.
- [Jun. 2019] DAN was nominated for the Best Systems Paper and Best Student Paper awards at RSS 2019!

Selected Publications (Full publication list)

BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark
Nikita Cherniadev*, Nicholas Backshall*, Xiao Ma*, Yunfan Lu, Younggyo Seo, Stephen James (*equal contributions)
Conference on Robot Learning (CoRL), 2024
project page / pdf / code
We present BiGym, a new benchmark and learning environment for demo-driven mobile bi-manual robotic manipulation. BiGym consists of 40 diverse tasks in home environments and provides human-collected demonstrations.

Green Screen Augmentation Enables Scene Generalisation in Robotic Manipulation
Eugene Teoh*, Sumit Patidar*, Xiao Ma, Stephen James (*equal contributions)
arXiv preprint, 2024
project page / pdf / code
GreenAug provides a simple visual augmentation for robot policies: data is first collected against a green screen and then augmented with different textures. The resulting policy transfers to visually distinct, unseen locations (scenes).

Redundancy-aware Action Spaces for Robot Learning
Pietro Mazzaglia*, Nicholas Backshall*, Xiao Ma, Stephen James (*equal contributions)
IEEE Robotics and Automation Letters (RA-L), 2024
project page / pdf / code
We present a new family of action spaces for robotic manipulation that combines the efficiency of task space with the flexibility of joint space.

Hierarchical Diffusion Policy for Multi-Task Robotic Manipulation
Xiao Ma, Sumit Patidar, Iain Haughton, Stephen James
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
project page / pdf / code
Hierarchical Diffusion Policy (HDP) factorises the policy into 1) a high-level task-planning agent and 2) a low-level goal-conditioned diffusion policy, achieving both task-level generalisation and flexible low-level control.

InsActor: Instruction-driven Physics-based Characters
Jiawei Ren*, Mingyuan Zhang, Cunjun Yu*, Xiao Ma, Liang Pan, Ziwei Liu (*equal contributions)
Conference on Neural Information Processing Systems (NeurIPS), 2023
project page / pdf / code
InsActor is a principled generative framework that leverages recent advances in diffusion-based human motion models to generate physics-based human animations.

Efficient Diffusion Policies for Offline Reinforcement Learning
Bingyi Kang*, Xiao Ma*, Chao Du, Tianyu Pang, Shuicheng Yan (*equal contributions)
Conference on Neural Information Processing Systems (NeurIPS), 2023
pdf / code
We introduce Efficient Diffusion Policies (EDPs), a more general, faster, and better diffusion policy class for offline RL. EDPs reduce the training time of Diffusion-QL (DQL) from 5 days to 5 hours!

Mutual Information Regularized Offline Reinforcement Learning
Xiao Ma*, Bingyi Kang*, Zhongwen Xu, Min Lin, Shuicheng Yan (*equal contributions)
Conference on Neural Information Processing Systems (NeurIPS), 2023
pdf / code
MISA is a general framework for offline RL motivated by mutual information estimation. We show that both Conservative Q-Learning (CQL) and TD3+BC can be viewed as its variants.

Imitation Learning via Differentiable Physics
Siwei Chen, Xiao Ma, Zhongwen Xu
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
pdf / code / bibtex
We present Imitation Learning via Differentiable Physics (ILD), which casts imitation learning as a state-matching task with a Chamfer-distance loss computed through differentiable physics. ILD significantly improves the sample efficiency and generalization of imitation learning algorithms with only one expert demonstration.

DaXBench: Benchmarking Deformable Object Manipulation with Differentiable Physics
Siwei Chen*, Cunjun Yu*, Yiqing Xu*, Linfeng Li, Xiao Ma, Zhongwen Xu, David Hsu (*equal contributions)
International Conference on Learning Representations (ICLR), 2023 (Oral)
pdf / bibtex
We present DaXBench, a comprehensive benchmark for deformable object manipulation, covering planning, imitation learning, and reinforcement learning, built on a scalable and differentiable physics simulator implemented in JAX.

DiffMimic: Efficient Motion Mimicking with Differentiable Physics
Jiawei Ren*, Cunjun Yu*, Siwei Chen, Xiao Ma, Liang Pan, Ziwei Liu (*equal contributions)
International Conference on Learning Representations (ICLR), 2023
project page / pdf / code / live demo / bibtex
DiffMimic scales motion imitation for simulated characters with differentiable physics, making it more accessible to train controllers on large-scale motion databases.

Learning Latent Graph Dynamics for Deformable Object Manipulation
Xiao Ma, David Hsu, Wee Sun Lee
International Conference on Robotics and Automation (ICRA), 2022
project page / pdf / bibtex
We present G-DOOM for deformable object manipulation. G-DOOM abstracts a deformable object as a keypoint-based graph and models the spatio-temporal keypoint interactions with recurrent graph dynamics. G-DOOM achieves state-of-the-art performance on a set of deformable object manipulation tasks.

Ab Initio Particle-based Object Manipulation
Siwei Chen, Xiao Ma, Yunfan Lu, David Hsu
Robotics: Science and Systems (RSS), 2021
project page / pdf / code / bibtex
This paper introduces PROMPT, a framework for particle-based object manipulation. PROMPT performs high-quality online point cloud reconstruction from multi-view images captured by an eye-in-hand camera, and achieves strong performance in object grasping, pushing, and placing.