Lei Gao

I am a PhD student in the Ming Hsieh Department of Electrical and Computer Engineering at USC, working with Prof. Murali Annavaram. I obtained my Bachelor’s degree from UC Santa Barbara and my Master’s degree from USC.

My research focuses on efficient LLM fine-tuning and inference, machine learning systems, and responsible AI. Here is a copy of my CV.

News

10/02/2025: Our paper DistilLock: Safeguarding LLMs from Unauthorized Knowledge Distillation on the Edge, led by my mentees Asmita Mohanty and Gezheng Kang, has been accepted to the NeurIPS Lock-LLM Workshop 2025.
08/20/2025: Our paper MobiZO: Enabling Efficient LLM Fine-Tuning at the Edge via Inference Engines has been accepted to EMNLP Main 2025.
07/07/2025: Our paper DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding has been accepted to COLM 2025.
05/19/2025: Joined Microsoft as a Research Intern in Azure’s Strategic Planning and Architecture group, AI System Architecture team, working on performance modeling for LLM serving systems. Grateful to be mentored by Vinay Gangadhar and Mark Hill.
05/15/2025: Our paper KVPR: Efficient LLM Inference with I/O-Aware KV Cache Partial Recomputation has been accepted to ACL Findings 2025.
12/10/2024: Our paper Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation has been accepted to the AAAI SEAS Workshop 2025.
10/15/2024: Our paper Enabling Resource-Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines has been accepted to the NeurIPS ENLSP Workshop 2024.
03/13/2024: Our paper Ethos: Rectifying Language Models in Orthogonal Parameter Space has been accepted to NAACL Findings 2024.
12/18/2023: Our paper Ethos: Rectifying Language Models in Orthogonal Parameter Space has been accepted to the AAAI ReLM Workshop 2024 (spotlight presentation).

Teaching Assistant

EE508/599: Systems for Machine Learning (Fall 2023, Spring 2024, Spring 2025). I created the slides and the final project for the class.
EE109: Introduction to Embedded Systems (Fall 2024)

Professional Service

Reviewer: ARR (2025, 2024), ICLR SCOPE Workshop (2025), NeurIPS ENLSP Workshop (2024), IEEE Computer Architecture Letters (2023)

Volunteer Service

Mentorship: USC CURVE (Fall 2024, Spring 2025), USC VGSA (Fall 2025)
Talks: KVPR for Efficient LLM Batched Inference (Salesforce 2025, AMD 2025); vLLM for Efficient LLM Online Serving (Palo Alto Networks 2025)

Lei Gao (高雷)
