About
Hello. I am an incoming DPhil student at the University of Oxford. I work on AI safety and interpretability. My research focuses on how intelligence emerges and evolves in AI models through interaction with the world, and how to ensure that these systems remain aligned with and beneficial to humanity.
Featured Publications
Query Circuits: Explaining How Language Models Answer User Prompts
Tung-Yu Wu, Fazl Barez
International Conference on Machine Learning (ICML), 2026
Chain-of-Thought Is Not Explainability
Fazl Barez, Tung-Yu Wu, Iván Arcuschin, Michael Lan, Vincent Wang, Noah Siegel, Nicolas Collignon, Clement Neo, Isabelle Lee, Alasdair Paren, Adel Bibi, Robert Trager, Damiano Fornasiere, John Yan, Yanai Elazar, Yoshua Bengio
Preprint, 2025
U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models
Tung-Yu Wu, Pei-Yu Lo
International Conference on Learning Representations (ICLR), 2025; Oral Presentation at the NeurIPS’24 Workshop on Attributing Model Behavior at Scale (ATTRIB)
Data-Efficient 3D Visual Grounding via Order-Aware Referring
Tung-Yu Wu, Sheng-Yu Huang, Yu-Chiang Wang
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025
AND: Audio Network Dissection for Interpreting Deep Acoustic Models
Tung-Yu Wu, Yu-Xiang Lin, Tsui-Wei Weng
International Conference on Machine Learning (ICML), 2024
The Efficacy of Self-Supervised Speech Models for Audio Representations
Tung-Yu Wu, Tsu-Yuan Hsu, Chen-An Li, Tzu-Han Lin, Hung-yi Lee
HEAR: Holistic Evaluation of Audio Representations, NeurIPS’22 Competition Track, 2022 — Lightning Talk
Locally Interpretable One-Class Anomaly Detection for Credit Card Fraud Detection
Tung-Yu Wu, You-Ting Wang
IEEE International Conference on Technologies and Applications of Artificial Intelligence (TAAI), 2021 — Best Paper Award
