Xiao Ma (马骁)

I am a Research Scientist at Dyson Robot Learning Lab. I obtained my PhD from the National University of Singapore, advised by Prof. David Hsu. I also worked closely with Prof. Wee Sun Lee. I received my B.Sc. in Computer Science from Shanghai Jiao Tong University in 2017, where I was advised by Prof. Fan Wu and Prof. Xiaofeng Gao. Previously, I spent a wonderful time at Sea AI Lab, hosted by Prof. Shuicheng Yan and Dr. Min Lin, and at SenseTime Research, hosted by Dr. Shuai Yi.

I'm broadly interested in reinforcement learning, representation learning, information theory, and their applications to robot learning in unstructured environments.

Collaborations and discussions are always welcome! Please feel free to email me if you're interested.

Email  /  CV  /  Google Scholar  /  Twitter  /  Github

- [Feb. 2024] 1 paper accepted to CVPR 2024.

- [Sept. 2023] 3 papers accepted to NeurIPS 2023.

- [Feb. 2023] 1 paper accepted to CVPR 2023.

- [Feb. 2023] I joined Dyson Robot Learning Lab as a Lead Researcher.

- [Jan. 2023] 3 papers accepted to ICLR 2023 (1 oral, 2 posters)!

- [Apr. 2022] HILMO for reinforcement learning under mixed observability has been accepted to WAFR 2022!

- [Feb. 2022] G-DOOM for deformable object manipulation has been accepted to ICRA 2022!

- [May 2021] PROMPT for ab-initio object manipulation has been accepted by RSS 2021.

- [Oct. 2020] CVRL for model-based RL under complex observations has been accepted by CoRL 2020.

- [Sept. 2020] BALMS for long-tailed visual recognition has been accepted by NeurIPS 2020.

- [Jul. 2020] STAR for pedestrian trajectory prediction has been accepted by ECCV 2020.

- [Dec. 2019] DPFRL for reinforcement learning under complex and partial observations has been accepted by ICLR 2020.

- [Nov. 2019] PF-RNNs for sequence modeling under uncertainty has been accepted to AAAI 2020.

- [Jun. 2019] DAN was nominated for the best system paper and best student paper of RSS 2019!

Selected Publications (Full publication list)
Hierarchical Diffusion Policy for Multi-Task Robotic Manipulation
Xiao Ma, Sumit Patidar, Iain Haughton, Stephen James
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024  
project page / pdf / code

Hierarchical Diffusion Policy (HDP) factorises the policy space into 1) a high-level task-planning agent and 2) a low-level goal-conditioned diffusion policy, achieving both task-level generalisation and flexible low-level control.

InsActor: Instruction-driven Physics-based Characters
Jiawei Ren*, Mingyuan Zhang, Cunjun Yu*, Xiao Ma, Liang Pan, Ziwei Liu
Conference on Neural Information Processing Systems (NeurIPS), 2023  
project page / pdf / code

InsActor is a principled generative framework that leverages recent advancements in diffusion-based human motion models for physics-based human animation generation.

Efficient Diffusion Policies for Offline Reinforcement Learning
Bingyi Kang*, Xiao Ma*, Chao Du, Tianyu Pang, Shuicheng Yan (*equal contributions)
Conference on Neural Information Processing Systems (NeurIPS), 2023  
pdf / code

We introduce Efficient Diffusion Policies (EDPs), a more general, faster, and better diffusion policy class for offline RL. EDPs reduce the training time of DQL from 5 days to 5 hours!

Mutual Information Regularized Offline Reinforcement Learning
Xiao Ma*, Bingyi Kang*, Zhongwen Xu, Min Lin, Shuicheng Yan (*equal contributions)
Conference on Neural Information Processing Systems (NeurIPS), 2023  
pdf / code

MISA is a general framework for offline RL motivated by mutual information estimation. We show that both Conservative Q-Learning (CQL) and TD3+BC can be considered variants of it.

Imitation Learning via Differentiable Physics
Siwei Chen, Xiao Ma, Zhongwen Xu
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023  
pdf / code / bibtex

We present Imitation Learning via Differentiable Physics (ILD), which casts imitation learning as a state-matching task through a differentiable physics-based Chamfer distance loss. ILD significantly improves the sample efficiency and generalization of imitation learning algorithms with only one expert demonstration.

DaXBench: Benchmarking Deformable Object Manipulation with Differentiable Physics
Siwei Chen*, Cunjun Yu*, Yiqing Xu*, Linfeng Li, Xiao Ma, Zhongwen Xu, David Hsu (*equal contributions)
International Conference on Learning Representations (ICLR), 2023   (Oral)
pdf / bibtex

We present DaXBench, a comprehensive benchmark for deformable object manipulation, including planning, imitation learning, and reinforcement learning, based on a scalable and differentiable physics simulator coded in JAX.

DiffMimic: Efficient Motion Mimicking with Differentiable Physics
Jiawei Ren*, Cunjun Yu*, Siwei Chen, Xiao Ma, Liang Pan, Ziwei Liu (*equal contributions)
International Conference on Learning Representations (ICLR), 2023  
project page / pdf / code / live demo / bibtex

DiffMimic scales motion imitation for simulated characters with differentiable physics. Training controllers on large-scale motion databases is more accessible with DiffMimic.

RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning
Wei Qiu, Xiao Ma, Bo An, Svetlana Obraztsova, Shuicheng Yan, Siwei Chen*, Zhongwen Xu
International Conference on Learning Representations (ICLR), 2023
pdf / bibtex

We present Ranked Policy Memory (RPM), which simulates unseen agents by ranking historical agents in multi-agent RL to encourage better generalization during evaluation.

Learning Latent Graph Dynamics for Deformable Object Manipulation
Xiao Ma, David Hsu, Wee Sun Lee
International Conference on Robotics and Automation (ICRA), 2022  
project page / pdf / bibtex

We present G-DOOM for deformable object manipulation. G-DOOM abstracts a deformable object as a keypoint-based graph and models the spatio-temporal keypoint interactions with Recurrent Graph Dynamics. G-DOOM achieves SOTA performance on a set of deformable object manipulation tasks.

Ab Initio Particle-based Object Manipulation
Siwei Chen, Xiao Ma, Yunfan Lu, David Hsu
Robotics: Science and Systems (RSS), 2021  
project page / pdf / code / bibtex

This paper introduces PROMPT, a framework for particle-based object manipulation. PROMPT performs high-quality online point cloud reconstruction from multi-view images captured by an eye-in-hand camera. It achieves high performance in object grasping, pushing, and placing.

Contrastive Variational Reinforcement Learning for Complex Observations
Xiao Ma, Siwei Chen, David Hsu, Wee Sun Lee
Proceedings of the 4th Conference on Robot Learning (CoRL), 2020  
project page / pdf / code / talk / bibtex

We introduce CVRL, contrastive model-based reinforcement learning for complex observations. Different from standard generative models, CVRL learns a contrastive latent world model and significantly improves the robustness against complex observations.

Balanced Meta-Softmax for Long-Tailed Visual Recognition
Jiawei Ren, Cunjun Yu, Shunan Sheng, Xiao Ma, Haiyu Zhao, Shuai Yi, Hongsheng Li
Advances in Neural Information Processing Systems (NeurIPS), 2020  
pdf / code / bibtex

Our key observation is that softmax is biased under the long-tailed distribution. BALMS provides a mathematically unbiased gradient estimate for long-tailed distributions and applies meta-learning to further improve the data sampling process.

Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction
Cunjun Yu*, Xiao Ma*, Jiawei Ren, Haiyu Zhao, Shuai Yi (* equal contribution)
European Conference on Computer Vision (ECCV), 2020  
project page / pdf / code / talk / bibtex

We introduce STAR, the first transformer-based pedestrian trajectory predictor. STAR generalizes Transformers to spatio-temporal graphs and significantly improves the trajectory prediction accuracy (2x).

Discriminative Particle Filter Reinforcement Learning for Complex Partial Observations
Xiao Ma, Peter Karkus, David Hsu, Wee Sun Lee
International Conference on Learning Representations (ICLR), 2020  
project page / pdf / code / talk / bibtex

We introduce DPFRL for reinforcement learning under complex partial observations. DPFRL encodes a discriminative particle filter algorithm as a differentiable computational graph in neural networks, which improves belief tracking.

Particle Filter Recurrent Neural Networks
Xiao Ma*, Peter Karkus*, David Hsu, Wee Sun Lee (* equal contribution)
AAAI Conference on Artificial Intelligence (AAAI), 2020  
pdf / code / bibtex

We introduce PF-RNNs for general sequence prediction under uncertainty. PF-RNNs encode a differentiable particle filter algorithm with standard RNNs and improve general sequence prediction performance.

Differentiable Algorithm Networks for Composable Robot Learning
Peter Karkus, Xiao Ma, David Hsu, Leslie Kaelbling, Wee Sun Lee, Tomas Lozano-Perez
Robotics: Science and Systems (RSS), 2019   best system paper finalist & best student paper finalist
pdf / bibtex

A DAN is composed of neural network modules, each encoding a differentiable algorithm and an associated model; and it is trained end-to-end from data. The algorithms and models act as structural priors to reduce the data requirements for learning; end-to-end learning allows the modules to adapt to one another and compensate for imperfect models and algorithms.
