Research
* = equal contribution
Express Language Modeling
Albert Gong,
Annabelle Michael Carrell,
Raaz Dwivedi,
Lester Mackey
arXiv preprint, 2026
Code / arXiv
Tl;dr—Express provides state-of-the-art causal attention guarantees, an efficient I/O-aware Triton implementation, and practical improvements for prefill, cache compression, and decoding.
Learning from Synthetic Data Improves Multi-hop Reasoning
Anmol Kabra,
Yilun Yin,
Albert Gong,
Kamilė Stankevičiūtė,
Dongyoung Go,
Johann Lee,
Katie Z. Luo,
Carla P. Gomes,
Kilian Q. Weinberger
ICLR, 2026
Code / arXiv
Tl;dr—RL fine-tuning LLMs on synthetic data improves real-world multi-hop reasoning by teaching knowledge composition skills.
N2: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion
Caleb Chin,
Aashish Khubchandani,
Harshvardhan Maskara,
Kyuseong Choi,
Jacob Feitelberg,
Albert Gong,
Manit Paul,
Tathagata Sadhukhan,
Anish Agarwal,
Raaz Dwivedi
arXiv preprint, 2025
Code / arXiv / Poster (CODEML Workshop)
Tl;dr—Introduced the N2 package and N2-Bench test bench for nearest neighbor-based matrix completion.
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
Albert Gong*,
Kamilė Stankevičiūtė*,
Chao Wan*,
Anmol Kabra*,
Raphael Thesmar,
Johann Lee,
JT Klenke,
Carla P. Gomes,
Kilian Q. Weinberger
ICML, 2025 (Oral presentation at the ICML Workshop on Long Context Foundation Models)
Code / arXiv / Poster / Slides
Tl;dr—Created a framework to automatically generate both the document corpus and question-answer pairs for benchmarking RAG and agentic workflows.
Low-Rank Thinning
Annabelle Michael Carrell,
Albert Gong,
Abhishek Shetty,
Raaz Dwivedi,
Lester Mackey
ICML, 2025
Code (see below) / arXiv
Tl;dr—Developed a new analysis of thinning algorithms that adapts to low-rank structure, enabling faster dot-product attention in Transformers (Thinformer), stochastic gradient descent (KH-SGD), and deep kernel hypothesis testing (DeepCTT).
Supervised Kernel Thinning
Albert Gong,
Kyuseong Choi,
Raaz Dwivedi
NeurIPS, 2024
Code / arXiv / Poster / Slides / Video
Tl;dr—Used distribution compression to speed up kernel smoothing and kernel ridge regression.