Shuhao (Sullivan) Zhang

M.S. Student

University of California, San Diego

Research Interests

LLM Reasoning & Test-time Scaling
Agentic Systems & Self-evolving Agents
Reinforcement Learning for LLMs
Data-centric AI
Multimodal Learning

About

I am Shuhao (Sullivan) Zhang, an M.S. student in the Department of Computer Science and Engineering at the University of California, San Diego, advised by Prof. Pengtao Xie.

Previously, I received my B.S. in Data Science from the University of Science and Technology Beijing, where I graduated in the top 7% of my class.

My current research focuses on pre-hoc reasoning and metacognitive control for LLM systems: using lightweight estimators and self-monitoring signals to route, defend, and self-improve LLM agents at test time. I am actively seeking Research Assistant / Visiting Researcher opportunities in agentic AI, test-time scaling, and multimodal reasoning.

News

2026-06 Joined AIBuildAI Inc. as a Research Intern on AI agents.
2026-05 Our paper Send a SCOUT First is now on arXiv (2605.30837); submitted to EMNLP 2026.
2026-05 Our paper Metacognitive Harness is on arXiv (2605.14186); under review at NeurIPS 2026.
2026-04 Our paper SCOPE has been accepted to ICML 2026!
2025-09 Started my M.S. at UC San Diego.
2025-06 Graduated from USTB with a B.S. in Data Science (top 7% of class).

Selected Publications

LLMs Know When They Know, but Do Not Act on It: A Metacognitive Harness for Test-time Scaling

Qi Cao, Yufan Wang, Peijia Qin, Shuhao Zhang, Pengtao Xie

NeurIPS 2026 Under Review

A metacognitive harness that separates monitoring from reasoning: the model emits a pre-solve feeling-of-knowing signal and a post-solve judgment-of-learning signal, which are turned into an explicit test-time control interface for retry / aggregation. On a fixed Claude Sonnet-4.6 base, the harness raised pooled accuracy from 48.3 to 56.9 without any parameter updates, exceeding the strongest leaderboard entries on HLE-Verified, LiveCodeBench v6, and R-Bench-V.

arXiv ↗

Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

Shuhao Zhang^†, Jiarui Li^†, Qi Cao, Ruiyi Zhang, Pengtao Xie

EMNLP 2026 Under Review

Reframes prompt-injection defense as per-input detector allocation. SCOUT summarizes each detector's behavior as a fingerprint over a fixed anchor set, then trains a small predictor (Qwen3-4B; SFT + GRPO) to estimate, for every new request, which detectors are reliable and how long each will take. A single operator threshold trades safety against latency. SCOUT cuts attack-success rate by 46% and wall-clock time by 40% vs. an always-on GPT-4o judge, and transfers to BIPIA, IPI, and IHEval with no retraining.

arXiv ↗

Models Under SCOPE: Scalable and Controllable Routing via Pre-hoc Reasoning

Qi Cao^†, Shuhao Zhang^†, Ruizhe Zhou, Ruiyi Zhang, Peijia Qin, Pengtao Xie

ICML 2026 Accepted

SCOPE is a budget-aware LLM routing framework that predicts how accurate and how expensive each model will be before running it, enabling controllable cost–accuracy trade-offs and naturally handling new models. It achieves up to a 95.1% cost reduction or a 25.7% accuracy gain, depending on which objective is prioritized.

Project ↗

IDC-CDR: Cross-domain Recommendation based on Intent Disentanglement and Contrast Learning

Jing Xu, Mingxin Gan, Hang Zhang, Shuhao Zhang

Information Processing and Management, 2024

A cross-domain recommendation model that disentangles user intent for stronger transfer, with an emphasis on interpretability of latent intent factors.

Using Machine Learning Models for Short-Term Prediction of Dissolved Oxygen in a Microtidal Estuary

Mina Gachloo, Qianqian Liu, Yang Song, Guozhi Wang, Shuhao Zhang, Nathan Hall

Water 16(14):1998, 2024

Applies ML models to short-term prediction of dissolved oxygen, with relevance to healthcare / biological time-series modeling.