
Zhenyu "Allen" Zhang

Hi there! I'm a second-year Ph.D. student at UT Austin, advised by Prof. Zhangyang "Atlas" Wang. I also collaborate with Prof. Beidi Chen at CMU and Dr. Yuandong Tian at Meta. My research focuses on efficient and reliable machine learning systems, specifically the following topics:

  • Efficient training and inference for large foundation models
  • Scaling multi-modality foundation models with manageable costs
  • Unconventional computation paradigms, especially quantum computing



News
  • [May 2024] Five papers accepted at ICML'24: GaLore, Cache Merging, Adaptive LLM Pruning, Sparse Low-Rank KV Cache, and Once-for-All Sparse Training.

  • [Apr. 2024] Grateful to be awarded the MLSys'24 Student Travel Grant.

  • [Feb. 2024] One paper accepted at MLSys'24: Q-Hitter, a sparse-quantized KV cache.

  • [Jan. 2024] Two papers accepted at ICLR'24: JoMA, on the training dynamics of LLMs, and SMoE merging.

  • [Dec. 2023] One paper accepted at AAAI'24: sparsity-guided concept bottleneck models.

  • [Oct. 2023] Excited to start my internship at Microsoft Research.

  • [Sep. 2023] One paper accepted at NeurIPS'23: the LLM heavy-hitter oracle (H2O).

  • [Jul. 2023] One paper accepted at QCE'23: sparse exploration of quantum circuits.

Selected Publications (full list)

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, Yuandong Tian

ICML 2024  /  Paper  /  Code  /  Hacker News  /  HuggingFace  /  LLaMA-Factory  /  FedML  /  Axolotl  /  AICoffeeBreak

Oral Presentation

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang Wang, Beidi Chen

NeurIPS 2023  /  Paper  /  Blog  /  Code  /  llama-recipes  /  Media (AI Era / 新智元)

Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache

Zhenyu Zhang*, Shiwei Liu*, Runjin Chen, Bhavya Kailkhura, Beidi Chen, Zhangyang Wang

MLSys 2024  /  Paper  /  Code

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy

Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen

ICLR 2024  /  Paper  /  Code

Spotlight Presentation

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention

Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Du

ICLR 2024  /  Paper

QuantumSEA: In-Time Sparse Exploration for Noise Adaptive Quantum Circuits

Tianlong Chen, Zhenyu Zhang, Hanrui Wang, Jiaqi Gu, Zirui Li, David Z. Pan, Frederic T. Chong, Song Han, Zhangyang Wang

QCE 2023  /  Paper  /  Code

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers

Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal, Shiwei Liu, Zhangyang Wang

ICLR 2023  /  Paper  /  Code

Spotlight Presentation

Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!

Shiwei Liu*, Tianlong Chen*, Zhenyu Zhang, Xuxi Chen, Tianjin Huang, Ajay Jaiswal, Zhangyang Wang

ICLR 2023  /  Paper  /  Code

Spotlight Presentation

Sparse Winning Tickets are Data-Efficient Image Recognizers

Mukund Varma T, Xuxi Chen, Zhenyu Zhang, Tianlong Chen, Subhashini Venugopalan, Zhangyang Wang

NeurIPS 2022  /  Paper  /  Code

Spotlight Presentation

Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets

Ruisi Cai*, Zhenyu Zhang*, Tianlong Chen, Xiaohan Chen, Zhangyang Wang

NeurIPS 2022  /  Paper  /  Code

Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free

Tianlong Chen*, Zhenyu Zhang*, Yihua Zhang*, Shiyu Chang, Sijia Liu, Zhangyang Wang

CVPR 2022  /  Paper  /  Code

Sparsity Winning Twice: Better Robust Generalization from More Efficient Training

Tianlong Chen*, Zhenyu Zhang*, Pengjun Wang*, Santosh Balachandra*, Haoyu Ma*, Zehao Wang, Zhangyang Wang

ICLR 2022  /  Paper  /  Code

Efficient Lottery Ticket Finding: Less Data is More

Zhenyu Zhang*, Xuxi Chen*, Tianlong Chen*, Zhangyang Wang

ICML 2021  /  Paper  /  Code

Robust Overfitting May be Mitigated by Properly Learned Smoothening

Tianlong Chen*, Zhenyu Zhang*, Sijia Liu, Shiyu Chang, Zhangyang Wang

ICLR 2021  /  Paper  /  Code

Long Live the Lottery: The Existence of Winning Tickets in Lifelong Learning

Tianlong Chen*, Zhenyu Zhang*, Sijia Liu, Shiyu Chang, Zhangyang Wang

ICLR 2021  /  Paper  /  Code

GANs Can Play Lottery Tickets Too

Xuxi Chen*, Zhenyu Zhang*, Yongduo Sui, Tianlong Chen

ICLR 2021  /  Paper  /  Code

Work Experience

Microsoft Research

Research Intern, Sep. 2023 - Present

Advisors: Dr. Zhewei Yao, Dr. Xiaoxia Wu

Lawrence Livermore National Laboratory

Research Intern, May 2023 - Aug. 2023

Advisors: Dr. Bhavya Kailkhura, Dr. Brian Bartoldson, Dr. James Diffenderfer


Service
  • Invited Conference Reviewer: NeurIPS, ICLR, ICML, CVPR, ICCV, ECCV, ICIP, ICME, CPAL, ACCV
  • Invited Journal Reviewer: TNNLS, JMLR