Hao Zhang

Assistant Professor

HDSI, CSE (affiliate)

Email: haozhang AT ucsd.edu

I am an Assistant Professor at the Halıcıoğlu Data Science Institute and the Department of Computer Science and Engineering (affiliate) at UC San Diego. I lead the Hao AI Lab at UCSD. I cofounded LMNet.ai (2023), and we joined forces with Snowflake in November 2023. From 2016 to 2021, I worked for the ML platform startup Petuum Inc. Here is a short Bio.

Prospective students and postdocs: I am recruiting new PhD students and postdocs. We also have openings for MS/undergrad research interns. Please check out this page to see how to get involved.

Research

I work at the intersection of machine learning and systems. I am equally interested in designing strong, efficient, and secure machine learning models and algorithms, and in building scalable, practical distributed systems that can support real-world machine learning workloads.

Our lab (@haoailab) develops open models, algorithms, and systems to democratize access to large models.

Current Projects

LLM inference and serving systems: DeepConf [ICLR'26], Dynasor [NeurIPS'25], DistServe [OSDI'24], vLLM [SOSP'23]
Efficient ML architectures/algorithms: d3LLM [Preprint'26], Jacobi Forcing [Preprint'25], VSA/STA [NeurIPS'25, ICML'25]
Open data, models, and evals: VideoScience-Bench [Preprint'26], FastWan Series, LMGame Bench [ICLR'26, ICLR'25]
Model-parallel ML Systems: DistCA [MLSys'26], Alpa [OSDI'22, MLSys'23]

Some of my research is actively developed and maintained as open-source software:

FastVideo: A lightweight framework for accelerating large video diffusion models.
LMGame: Evaluate and improve AI by repurposing computer games.
Lookahead Decoding: A parallel LLM decoding method that trades FLOPs for fewer decoding steps.
vLLM: A high-throughput and memory-efficient inference engine for LLMs.
Ray Collective: CPU/GPU collective communication primitives on Ray.

Some previous projects:

FastChat: An open platform for training, serving, and evaluating Large Language Models.
Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings.
Vicuna: A series of popular open-source LLM chatbots available in 7B/13B/33B sizes.
Alpa: Training large-scale neural networks with auto parallelization. Scales to 1000+ GPUs.
AutoDist: Automatic data-parallel training on TensorFlow.
DyNet: The Dynamic Neural Network Toolkit.
Poseidon: Parameter server on distributed GPUs.

Students and Postdocs

Current Members

Junda Chen, PhD (w/ Tajana Rosing)
Yichao Fu, PhD
Rui Ge, Undergrad Intern
Lanxiang Hu, PhD (w/ Tajana Rosing)
Mingjia Huo, PhD (w/ Tajana Rosing)
Susan Li, Undergrad Intern
Will Lin, PhD
Matthew Noto, Undergrad Intern
Yu-Yang Qian, Visiting PhD
Abhilash Shankarampeta, Master
David Su, PhD
Junli Wang, PhD (w/ Prithviraj Ammanabrolu)
Haoyang Yu, Undergrad Intern
Peiyuan Zhang, PhD
Yuxuan Zhang, Master
Yiming Zhao, Master
Wei Zhou, Master

Past Students

Minghang Deng, Master (2024) -> Ant Group
Yonqqi Chen, Master (2024) -> Stealth startup
Runlong Su, Master (2024) -> Bytedance
Zheyu Fu, Master (2024) -> NVIDIA
Ashwin Ramachandran, Master (2024) -> ContextFort (co-founder)
Siqi Zhu, Undergrad Intern (2024) -> PhD @ UIUC
Anze Xie, Master (2023) -> MBZUAI IFM Lab
Hangliang Ding, Undergrad Intern (2024) -> Bytedance
Jiangfei Duan, Visiting PhD (2023) -> Alibaba Group
Runyu Lu, Undergrad Intern (2023) -> PhD @ UMich
Dacheng Li, Master (2020) -> PhD @ UC Berkeley
Hexu Zhao, Undergrad Intern (2022) -> PhD @ NYU
Yonghao Zhuang, Undergrad Intern (2021) -> PhD @ CMU

Recent Talks

02/2026 Tutorial at Nvidia Research Radar Talk Series
01/2026 Talk at Nvidia Dynamo Day
12/2025 Talk at Workshop on Next Practices in Video Generation and Evaluation @ NeurIPS 2025
12/2025 Talk at The First Workshop on Efficient Reasoning @ NeurIPS 2025
05/2025 Talk at MBZUAI IFM Launching Event
04/2025 Talk at Rutgers Efficient AI Seminar
04/2025 Talk at Microsoft Research Asia ACE Talk Series
04/2025 Talk at CMU 11868 LLM Systems
03/2025 Talk at Bytedance AIP Spearhead Tech Talk Series
02/2025 Talk at Faster LLM Inference Seminar @ Weizmann Institute of Science
11/2024 Talk at UWaterloo Invited Talk
10/2024 Talk at LinkedIn AI Seminar
10/2024 Talk at PyTorch Webinar
09/2024 Talk at Microsoft GenAI AIMS Talk
04/2024 Talk at UChicago AI+System Seminar
03/2024 Talk at NSF Open-Source Generative AI (OSGAI) Workshop
03/2024 Talk at Essence VC Q1 Virtual Conference: LLM Inference
02/2024 Talk at PKU Alumni Association of Northern California (PKUAANC)
12/2023 Panel at Instruction Workshop @ NeurIPS 2023
11/2023 Tutorial at ODSC West

Experience

Assistant Professor, UC San Diego, 2023 - Present
Software Engineer, Snowflake, 2023 - Present
Postdoc, UC Berkeley, 2021 - 2023
Director of Scalable Machine Learning, Petuum Inc, 2016 - 2021
Ph.D. Student, Carnegie Mellon University, 2014 - 2020 (on leave 2016 - 2020)