Anxin (Bob) Guo
郭岸新

bobguo2023[At]u[Dot]northwestern[Dot]edu

View My GitHub Profile

About Me

I’m a third-year computer science PhD student in Northwestern University’s theory group. My research interests are in theoretical computer science and the theoretical foundations of machine learning.

Before beginning my PhD, I completed both my undergraduate and master's degrees at Northwestern. (Go Cats!)

Publications

Hallucination is a Consequence of Space-Optimality: A Rate-Distortion Theorem for Membership Testing
with Jingwei Li. ICML 2026 (spotlight).
arXiv

Summary: We model the memorization of random, non-inferable facts as a membership testing problem, connecting Bloom-filter-style error metrics with the log-loss behavior of language models. In the sparse-fact regime, we prove a rate-distortion theorem showing that the optimal space-error tradeoff is governed by a KL-divergence frontier. The result gives an information-theoretic explanation for high-confidence hallucinations: under limited capacity, even an optimal model may assign high confidence to some non-facts rather than simply abstaining or forgetting.
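The paper's starting point is the classic space/error tradeoff of approximate membership testers. As a toy illustration (not the paper's construction), here is a minimal Bloom filter: with limited space it never forgets a stored fact, but it confidently "remembers" some non-facts, which is the analogue of a high-confidence hallucination. All names and parameters below are illustrative.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit array."""
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _indices(self, item: str):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item: str):
        for idx in self._indices(item):
            self.bits[idx] = True

    def query(self, item: str) -> bool:
        # True = "probably stored" (may be a false positive);
        # False = definitely not stored.
        return all(self.bits[idx] for idx in self._indices(item))

facts = [f"fact-{i}" for i in range(100)]
bf = BloomFilter(m=1024, k=7)
for f in facts:
    bf.add(f)

# No false negatives: every stored fact is reported present.
assert all(bf.query(f) for f in facts)

# Some non-facts are reported present anyway -- under limited space,
# even an optimal tester must "hallucinate" on a few queries.
fp = sum(bf.query(f"nonfact-{i}") for i in range(10000)) / 10000
print(f"empirical false-positive rate: {fp:.4f}")
```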

Agnostic Learning of Arbitrary ReLU Activation under Gaussian Marginals
with Aravindan Vijayaraghavan. COLT 2025.
arXiv | conference version | PDF | recorded virtual talk

Summary: We give the first algorithm for agnostic PAC learning of an arbitrarily biased ReLU neuron under Gaussian input distributions, up to constant approximation. We also prove a hardness separation between SQ (statistical query) and CSQ (correlational statistical query) models for this problem, showing a limitation of gradient-based algorithms in this setting.
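To make the problem setup concrete, the sketch below (purely illustrative, not the paper's algorithm) generates labels from a biased ReLU neuron under Gaussian inputs and runs plain subgradient descent on the squared loss, i.e., the kind of gradient-based baseline the CSQ lower bound speaks to. The hyperparameters and initialization are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 2000

# Target: a biased ReLU neuron y = max(0, <w*, x> + b*).
w_star = rng.standard_normal(d)
b_star = -1.0  # a negative bias makes the neuron fire on few inputs

X = rng.standard_normal((n, d))           # Gaussian marginals
y = np.maximum(0.0, X @ w_star + b_star)  # realizable labels; the agnostic
                                          # setting would allow arbitrary noise

def loss(w, b):
    return np.mean((np.maximum(0.0, X @ w + b) - y) ** 2)

# Plain (sub)gradient descent on the squared loss.
w = 0.1 * rng.standard_normal(d)
b = 0.5
lr = 0.02

initial = loss(w, b)
for _ in range(1000):
    pre = X @ w + b
    grad_pre = 2 * (np.maximum(0.0, pre) - y) * (pre > 0)  # chain rule through ReLU
    w -= lr * (X.T @ grad_pre) / n
    b -= lr * grad_pre.mean()

print(f"squared loss: {initial:.3f} -> {loss(w, b):.3f}")
```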

To Store or Not to Store: A Graph-Theoretic Approach for Dataset Versioning
with Jingwei Li, Pattara Sukprasert, Samir Khuller, Amol Deshpande, and Koyel Mukherjee. IPDPS 2024.
arXiv | conference version

Summary: We study a graph-theoretic framework for dataset versioning that optimizes storage costs while controlling retrieval costs across versions. On the theory side, we prove hardness-of-approximation results and give provably near-optimal algorithms for graphs of bounded treewidth. These results also lead to practical heuristics with up to 1000x speedups for the "MinSum Retrieval" problem on real-world GitHub repositories.
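As a toy illustration of the MinSum Retrieval objective (this is just the cost evaluation for a fixed storage choice, not the paper's optimization algorithm, and the function name and example graph are made up for the demo): versions are nodes, an edge of weight w means one version can be reconstructed from the other by a delta of retrieval cost w, and a version's retrieval cost is its distance to the nearest fully stored version.

```python
import heapq

def minsum_retrieval_cost(n, edges, stored):
    """Sum of retrieval costs over all n versions, given a set of fully
    stored ('materialized') versions. Each edge (u, v, w) is a delta of
    retrieval cost w (treated as undirected here for simplicity)."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    INF = float("inf")
    dist = [INF] * n
    for s in stored:
        dist[s] = 0
    pq = [(0, s) for s in stored]
    heapq.heapify(pq)
    while pq:  # multi-source Dijkstra from the stored versions
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return sum(dist)

# Five versions in a chain 0 - 1 - 2 - 3 - 4, each delta costing 1.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1)]
print(minsum_retrieval_cost(5, edges, stored={0}))     # 0+1+2+3+4 = 10
print(minsum_retrieval_cost(5, edges, stored={0, 4}))  # 0+1+2+1+0 = 4
```

Storing a second full copy at the far end of the chain more than halves total retrieval cost; the hard question the paper studies is choosing which versions to materialize under a storage budget.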

Education

Northwestern University, Evanston, Illinois (2019-2023)
B.A. in Math and M.S. in Computer Science

Northwestern University, Evanston, Illinois (2023-present)
Ph.D. student in Computer Science

Awards

Ph.D. student research award (Northwestern CS department), 2024-2025.

Barris Award for outstanding TA, Fall 2024 quarter.

Junior Career Award in Mathematics (Northwestern math department), 2021-2022.

Academic Activities

Research Experience
IDEAL Summer Intern, University of Chicago, Summer 2026. Hosted by Prof. Chao Gao.
IDEAL Summer Intern, Toyota Technological Institute at Chicago (TTIC), Summer 2025. Hosted by Prof. Zhiyuan Li.

Service
Program Committee / Reviewer, Reliable ML from Unreliable Data, NeurIPS 2025 Workshop.
Volunteer, FOCS 2024 (65th IEEE Symposium on Foundations of Computer Science), Chicago, IL.

Teaching and Programs
Ross Mathematics Program: first-year student (2018), junior counselor (2019), counselor (2020).
Directed Reading Program: Spring 2021, read Linear Representations of Finite Groups with Wenyuan Li.

Miscellaneous

I have written two articles about theoretical computer science on 知乎 (Zhihu, a Chinese Q&A site similar to Quora). Here’s the link.

I lived in Beijing for the first 18 years of my life. What I like most about the city is its subway system and shared bicycles, which make getting around easy and cheap. (I’m not a big fan of driving, after the traumatic experience of spending half an hour looking for a parking spot.)