Xuefei Ning

Research Assistant Professor at Tsinghua University

NICS-EFC, EE Dept., Tsinghua University

NOTE: I’m no longer actively maintaining this website. Please check https://nics-effalg.com/ for my updates.

Since 2024, I’ve been a research assistant professor in the NICS-EFC group at the Department of Electronic Engineering, Tsinghua University. I received my B.S. and Ph.D. degrees from the Department of Electronic Engineering, Tsinghua University, in 2016 and 2021, respectively, advised by Prof. Huazhong Yang and Prof. Yu Wang. I then spent two years (from 2021.12 to 2023.12) as a postdoctoral researcher with Prof. Yu Wang and Prof. Pinyan Lu.

My past research mainly focused on Model Compression and Neural Architecture Search (NAS). I currently lead the Efficient Deep Learning Algorithm (EffAlg) Team in the NICS-EFC group; check the team website. I’ve advised 10+ graduate and undergraduate students. The current major focus of my team is efficient AIGC, including language and vision generative models.

Our group is continuously recruiting visiting students and engineers who are interested in efficient deep learning. I’ve mentored quite a few undergraduate, master’s, and PhD students on their first projects, and along the way I have learned a lot and gained considerable experience in helping different students learn, build their skills, and accomplish their goals. I’m sure we can do something interesting and maybe impactful together. Email me and Prof. Yu Wang if you’re interested!

Interests

Neural Architecture Search
Efficient Deep Learning

Education

PhD in EE, 2016-2021
Tsinghua University
BE in EE, 2012-2016
Tsinghua University

Updates

2024/04/23: Our survey on efficient LLM inference is public on arXiv. Discussions and suggestions are welcome! Email me and Zixuan Zhou.
2024/04/05: Our paper on more efficient “training” of consistency models is public on arXiv. This work proposes a method, Linear Combination of Saved Checkpoints (LCSC), which uses gradient-free, search-based combination of checkpoints to obtain the final weights, achieving significant training speedups (23x on CIFAR-10 and 15x on ImageNet-64) compared to full gradient-based training. LCSC can also be used to enhance pre-trained models at a small cost. Check our paper and code.
2024/02/29: Our paper on evaluating quantized large language models is public on arXiv. This work evaluates 11 LLM families, different tasks (including emergent abilities, dialogue, long-context tasks, and so on), and different tensor types (Weight, Weight-Activation, Key-Value Cache). We provide quantitative suggestions and qualitative insights on quantization, so practitioners can benefit from a full scope of quantization recommendations. Check our paper and code.
- 2024/05/02: This work is accepted by ICML'24. Congrats to the students.
2024/02/27: 1 paper, FlashEval, is accepted by CVPR'24. This work is on selecting a compact data subset to evaluate text-to-image Diffusion models. Congrats and thanks to the students and collaborators. Check our paper on arXiv.
2024/02/09: Our paper on long-context benchmark, LV-Eval, is public on arXiv. Check the code and HuggingFace page.
2024/01/17: 2 papers are accepted by ICLR'24. One is Skeleton-of-Thoughts, which accelerates LLM generation by letting the LLM itself plan and then generate segments in parallel, achieving ~2x speed-ups; the other is USF, which summarizes the sampling strategies for diffusion models and searches for the best sampling strategy. Congrats and thanks to the collaborators and students.
2023/12/17: My students and collaborators present two of our works at the Efficient Natural Language and Speech Processing workshop at NeurIPS'23: LLM-MQ: Mixed-precision Quantization for Efficient LLM Deployment, and Skeleton-of-Thoughts.
2023/09/22: 1 paper is accepted by NeurIPS'23. Congrats and thanks to the collaborators.
2023/08/27: Gave a tutorial talk on LLM quantization for an AWS competition. Here are the tutorial-only slides! And here is the video.
2023/07/27: Our technical report on prompting techniques for efficient LLM generation (work still in progress) – “Skeleton-of-Thoughts” – is public! Check the website for an introduction and demos. The code is available here.
2023/07/17: 1 paper is accepted by ICCV'23. This work is on Dynamic Inference for Efficient 3D Perception. Congrats to the students and collaborators. Check the website for more information.
2023/04/27: Gave a 1.5-hour talk at Huawei on Model Compression for Efficient DL.
2023/04/25: 1 paper is accepted by ICML'23. This work is on Searching Model Schedule for Efficient Diffusion – OMS-DPM. Congrats to the students and collaborators. Check the website for an introduction and demos.
2022/12/05: 1 paper, GATES++, is accepted by TPAMI.
2022/11/25: Gave a 60-min talk at Inceptio.ai on the practice of applying NAS for efficient DL, covering (1) “how to search efficiently”: sample-based and one-shot workflows, and (2) “how to incorporate hardware efficiency objectives into NAS”.
2022/11/24: Gave a 75-min talk at Renmin University on NAS research.
2022/11/19: 3 papers, DELE / MOSP / EIO, have been accepted by AAAI'23. The topics are efficient NAS, LLCV pruning, and efficient adversarial ensemble training, respectively. The NAS work, DELE, is selected for oral presentation. Congrats to the students and collaborators.
2022/11/09: Gave a 20-min talk (starting at 50:20) at AI-Time on TA-GATES. The talk is in Chinese.
2022/09/15: 1 paper, TA-GATES, is accepted by NeurIPS'22 as Spotlight.

Publications

Omitted; see my Google Scholar profile for a full list.

Selected Talks

An Introduction to Quantization of Large Language Models

A talk about efficient LLM with a special focus on quantization.

Last updated on Aug 30, 2023

Model Compression Towards Efficient Deep Learning Inference

A talk on model compression towards efficient DL inference

Last updated on Aug 29, 2023

Neural Architecture Search and Architecture Encoding

A talk on NAS research at Renmin University.

Last updated on Dec 12, 2022