
I am an Assistant Professor in the Halıcıoğlu Data Science Institute and the Department of Computer Science and Engineering (affiliate) at UC San Diego. I lead the Hao AI Lab at UCSD. I cofounded LMNet.ai in 2023, and we joined forces with Snowflake in November 2023. From 2016 to 2021, I worked at the ML platform startup Petuum Inc. Here is a short bio.
Prospective students and postdocs: I am recruiting new PhD students and postdocs. We also have openings for MS/undergrad research interns. Please check out this page to see how to get involved.
Research
I work at the intersection of machine learning and systems. I am equally interested in designing strong, efficient, and secure machine learning models and algorithms and in building scalable, practical distributed systems that support real-world machine learning workloads.
Our Lab (@haoailab) develops open models, algorithms, and systems to democratize access to large models.
Current Projects
- LLM inference and serving systems: DeepConf [ICLR'26], Dynasor [NeurIPS'25], DistServe [OSDI'24], vLLM [SOSP'23]
- Efficient ML architectures/algorithms: d3LLM [Preprint'26], Jacobi Forcing [Preprint'25], VSA/STA [NeurIPS'25, ICML'25]
- Open data, models, and evals: VideoScience-Bench [Preprint'26], FastWan Series, LMGame Bench [ICLR'26, ICLR'25]
- Model-parallel ML Systems: DistCA [MLSys'26], Alpa [OSDI'22, MLSys'23]
Some of my research is actively developed and maintained as open-source software:
- FastVideo: A lightweight framework for accelerating large video diffusion models.
- LMGame: Evaluate and improve AI by repurposing computer games.
- Lookahead Decoding: A parallel LLM decoding method that trades FLOPs for fewer decoding steps.
- vLLM: A high-throughput and memory-efficient inference engine for LLMs.
- Ray Collective: CPU/GPU collective communication primitives on Ray.
Some previous projects:
- FastChat: An open platform for training, serving, and evaluating Large Language Models.
- Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings.
- Vicuna: A series of popular open-source LLM chatbots available in 7B/13B/33B sizes.
- Alpa: Training large-scale neural networks with auto parallelization. Scales to 1000+ GPUs.
- AutoDist: Automatic data-parallel training on TensorFlow.
- DyNet: The Dynamic Neural Network Toolkit.
- Poseidon: Parameter server on distributed GPUs.
Students and Postdocs
Current Members
- Junda Chen, PhD (w/ Tajana Rosing)
- Yichao Fu, PhD
- Rui Ge, Undergrad Intern
- Lanxiang Hu, PhD (w/ Tajana Rosing)
- Mingjia Huo, PhD (w/ Tajana Rosing)
- Susan Li, Undergrad Intern
- Will Lin, PhD
- Matthew Noto, Undergrad Intern
- Yu-Yang Qian, Visiting PhD
- Abhilash Shankarampeta, Master
- David Su, PhD
- Junli Wang, PhD (w/ Prithviraj Ammanabrolu)
- Haoyang Yu, Undergrad Intern
- Peiyuan Zhang, PhD
- Yuxuan Zhang, Master
- Yiming Zhao, Master
- Wei Zhou, Master
Past Students
- Minghang Deng, Master (2024) -> Ant Group
- Yonqqi Chen, Master (2024) -> Stealth startup
- Runlong Su, Master (2024) -> Bytedance
- Zheyu Fu, Master (2024) -> NVIDIA
- Ashwin Ramachandran, Master (2024) -> ContextFort (co-founder)
- Siqi Zhu, Undergrad Intern (2024) -> PhD @ UIUC
- Anze Xie, Master (2023) -> MBZUAI IFM Lab
- Hangliang Ding, Undergrad Intern (2024) -> Bytedance
- Jiangfei Duan, Visiting PhD (2023) -> Alibaba Group
- Runyu Lu, Undergrad Intern (2023) -> PhD @ UMich
- Dacheng Li, Master (2020) -> PhD @ UC Berkeley
- Hexu Zhao, Undergrad Intern (2022) -> PhD @ NYU
- Yonghao Zhuang, Undergrad Intern (2021) -> PhD @ CMU
Recent Talks
- 02/2026 Tutorial at Nvidia Research Radar Talk Series
- 01/2026 Talk at Nvidia Dynamo Day
- 12/2025 Talk at Workshop on Next Practices in Video Generation and Evaluation @ NeurIPS 2025
- 12/2025 Talk at The First Workshop on Efficient Reasoning @ NeurIPS 2025
- 05/2025 Talk at MBZUAI IFM Launching Event
- 04/2025 Talk at Rutgers Efficient AI Seminar
- 04/2025 Talk at Microsoft Research Asia ACE Talk Series
- 04/2025 Talk at CMU 11868 LLM Systems
- 03/2025 Talk at Bytedance AIP Spearhead Tech Talk Series
- 02/2025 Talk at Faster LLM Inference Seminar @ Weizmann Institute of Science
- 11/2024 Talk at UWaterloo Invited Talk
- 10/2024 Talk at LinkedIn AI Seminar
- 10/2024 Talk at PyTorch Webinar
- 09/2024 Talk at Microsoft GenAI AIMS Talk
- 04/2024 Talk at UChicago AI+System Seminar
- 03/2024 Talk at NSF Open-Source Generative AI (OSGAI) Workshop
- 03/2024 Talk at Essence VC Q1 Virtual Conference: LLM Inference
- 02/2024 Talk at PKU Alumni Association of Northern California (PKUAANC)
- 12/2023 Panel at Instruction Workshop @ NeurIPS 2023
- 11/2023 Tutorial at ODSC West
Experience
- Assistant Professor, UC San Diego, 2023 - Present
- Software Engineer, Snowflake, 2023 - Present
- Postdoc, UC Berkeley, 2021 - 2023
- Director of Scalable Machine Learning, Petuum Inc., 2016 - 2021
- Ph.D. Student, Carnegie Mellon University, 2014 - 2020 (on leave 2016 - 2020)