About

Hello. I am an incoming DPhil student at the University of Oxford. I work on AI safety and interpretability. My research focuses on how intelligence emerges and evolves in AI models through interaction with the world, and how to ensure that these systems remain aligned with and beneficial to humanity.

My CV

If you would like to discuss anything, feel free to book a meeting with me.

Featured Publications

Query Circuits: Explaining How Language Models Answer User Prompts
Tung-Yu Wu, Fazl Barez
International Conference on Machine Learning (ICML), 2026

Chain-of-Thought Is Not Explainability
Fazl Barez, Tung-Yu Wu, Iván Arcuschin, Michael Lan, Vincent Wang, Noah Siegel, Nicolas Collignon, Clement Neo, Isabelle Lee, Alasdair Paren, Adel Bibi, Robert Trager, Damiano Fornasiere, John Yan, Yanai Elazar, Yoshua Bengio
Preprint, 2025

U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models
Tung-Yu Wu, Pei-Yu Lo
International Conference on Learning Representations (ICLR), 2025; Oral Presentation at the NeurIPS’24 Workshop on Attributing Model Behavior at Scale (ATTRIB)

Data-Efficient 3D Visual Grounding via Order-Aware Referring
Tung-Yu Wu, Sheng-Yu Huang, Yu-Chiang Wang
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025

AND: Audio Network Dissection for Interpreting Deep Acoustic Models
Tung-Yu Wu, Yu-Xiang Lin, Tsui-Wei Weng
International Conference on Machine Learning (ICML), 2024

The Efficacy of Self-Supervised Speech Models for Audio Representations
Tung-Yu Wu, Tsu-Yuan Hsu, Chen-An Li, Tzu-Han Lin, Hung-yi Lee
HEAR: Holistic Evaluation of Audio Representations, NeurIPS’22 Competition Track, 2022; Lightning Talk

Locally Interpretable One-Class Anomaly Detection for Credit Card Fraud Detection
Tung-Yu Wu, You-Ting Wang
IEEE International Conference on Technologies and Applications of Artificial Intelligence (TAAI), 2021; Best Paper Award

Tung-Yu (Tony) Wu