Yutong Bai

Yutong Bai

Yutong is currently a Postdoctoral Researcher at UC Berkeley (Berkeley Artificial Intelligence Research), advised by Prof. Alexei (Alyosha) Efros, Prof. Jitendra Malik, and Prof. Trevor Darrell.

Prior to that, she received her PhD in Computer Science from Johns Hopkins University, where she was advised by Prof. Alan Yuille. She has also interned at Meta AI (FAIR Labs) and Google Brain, and was named a 2023 Apple Scholar and an MIT EECS Rising Star.

Email / Google Scholar / Twitter / GitHub

News

[01/21/2026] Invited talk at Stanford.
[12/02/2025] Invited talk in NeurIPS tutorial.
[11/20/2025] Guest lecture at NYU.
[10/21/2025] Guest lecture at UC Berkeley & UCSF.
[10/20/2025] Invited talk at ICCV workshop.
[09/25/2025] Invited talk at Princeton.
[09/24/2025] Invited talk at Cornell Tech.
[09/22/2025] Invited seminar talk at Cornell.
[09/17/2025] Invited talk at Allen Institute for AI.
[09/15/2025] Invited talk at CMU VASC Seminar.
[09/08/2025] Guest lecture and invited talk at UMich.
[09/05/2025] Invited seminar talk at UIUC.
[08/22/2025] Invited seminar talk at Caltech.
[08/01/2025] Invited talk at UW.
[07/03/2025] Invited talk at 1X.
[06/12/2025] Invited talk at CVPR 2025 workshop.
[10/28/2024] Invited talk at MIT.
[09/30/2024] Guest lecture at UC Berkeley.
[08/16/2024] Awarded EECS Rising Star in MIT.
[09/29/2024] Invited talk at ECCV 2024 workshop.
[06/18/2024] Invited talk at CVPR 2024 workshop.
[03/08/2024] Invited talk at UPenn GRASP Seminar.
[03/07/2024] Guest lecture at UPenn.
[01/25/2024] Invited talk at Adobe.
[01/23/2024] Invited talk at UT Austin.
[01/16/2024] Invited talk at Distinguished AI Lecture Series at Imperial College London.
[12/06/2024] Invited talk at CMU.
[12/03/2024] Invited talk at UCSD.
[11/15/2023] Invited talk at UMD.
[10/25/2023] Awarded Machine Learning Rising Star in UMD.
[09/18/2023] Awarded EECS Rising Star in Georgia Tech.
[03/02/2023] Awarded Apple Fellowship.
[06/21/2022] Awarded CVPR Best Paper Finalist.

Research Vision

I aim to build intelligent systems from first principles: systems that do not merely fit patterns or follow instructions, but that gradually develop structure, abstraction, and behavior through learning itself.

I'm interested in how intelligence emerges, not from handcrafted pipelines or heuristics tailored to specific tasks, but from exposure to behaviorally rich, understructured environments, where models must learn what to attend to, how to reason, and how to improve. This requires designing learning systems that are not narrowly optimized for a goal, but that can organize themselves and grow increasingly competent through interaction, experience, and computation.

I see scale as a tool, but not as the whole solution. Larger models open up more capacity, but what fills that capacity and how it forms is just as important. My research explores how we can use scale to amplify the right signals: not just data quantity, but the structural richness of behavior, and the dynamics of learning itself.

To that end, I focus on:

Understanding what makes behavior intelligent, especially when it's easy for humans but hard for machines;
Designing systems that learn internal structure from raw behavioral input, without task scaffolds or dense supervision;
Creating conditions where models discover abstraction and reasoning, not because they are explicitly told to, but because learning leads them there.

I believe intelligence is not something we can fully define or supervise in advance. It must emerge over time, shaped by data, computation, and inductive processes inside the model. My work is an attempt to understand and enable that emergence.

Publications

	Whole-Body Conditioned Egocentric Video Prediction Yutong Bai, Danny Tran, Amir Bar*, Yann LeCun†, Trevor Darrell†, Jitendra Malik† NeurIPS, 2025 paper / project page
	Sequential Modeling Enables Scalable Learning for Large Vision Models Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A. Efros CVPR, 2024 paper / project page / code / model
	Transformers Discover Molecular Structure Without Graph Priors Tobias Kreiman, Yutong Bai, Fadi Atieh, Elizabeth Weaver, Eric Qu, Aditi S. Krishnapriyan arXiv, 2025 paper / project page / code & model
	The Serial Scaling Hypothesis Yuxi Liu, Konpat Preechakul, Kananart Kuwaranancharoen, Yutong Bai ICLR, 2026 paper
	TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy Héctor Carrión, Yutong Bai, Víctor A. Hernández Castro*, Kishan Panaganti, Ayush Zenith, Matthew Trang, Tony Zhang, Pietro Perona, Jitendra Malik arXiv, 2025 paper / project page / data / code / model
	Point-Level Region Contrast for Object Detection Pre-Training Yutong Bai, Xinlei Chen, Alexander Kirillov, Alan Yuille, Alexander C. Berg CVPR, 2022 (Nominated for CVPR Best Paper - Top 0.4%) paper / code / video / poster
	Evaluating Multiview Object Consistency in Humans and Image Models Tyler Bonnen, Stephanie Fu, Yutong Bai, Thomas O'Connell, Yoni Friedman, Nancy Kanwisher, Josh Tenenbaum, Alexei Efros NeurIPS, 2024 paper / project page / code / data
	Intriguing Properties of Text-guided Diffusion Models Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, Alan Yuille ICLR, 2024 paper / project page
	Analyzing The Language of Visual Tokens David M Chan, Rodolfo Corona, Joonyong Park, Cheol Jun Cho, Yutong Bai, Trevor Darrell arXiv, 2024 paper
	KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models Eunice Yiu, Maan Qraitem, Anisa Noor Majhi, Charlie Wong, Yutong Bai, Shiry Ginosar, Alison Gopnik, Kate Saenko ICLR, 2025 paper / project page / code
	Vibe Spaces for Creatively Connecting and Expressing Visual Concepts Huzheng Yang, Katherine Xu, Andrew Lu, Michael D. Grossberg, Yutong Bai, Jianbo Shi CVPR, 2026 paper / project page / demo
	AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang EMNLP, 2025 paper / project page / code
	Masked Autoencoders Enable Efficient Knowledge Distillers Yutong Bai, Zeyu Wang, Junfei Xiao, Chen Wei, Huiyu Wang, Alan L Yuille, Yuyin Zhou, Cihang Xie CVPR, 2023 paper / code / model
	Are Transformers More Robust than CNNs? Yutong Bai, Jieru Mei, Alan Yuille, Cihang Xie NeurIPS, 2021 paper / code / model
	Can Temporal Information Help with Contrastive Self-Supervised Learning? Yutong Bai, Haoqi Fan, Ishan Misra, Ganesh Venkatesh, Yongyi Lu, Yuyin Zhou, Qihang Yu, Vikas Chandra, Alan Yuille arXiv, 2020 paper
	C2FNAS: Coarse-to-Fine Neural Architecture Search for 3D Medical Image Segmentation Qihang Yu, Dong Yang, Holger Roth, Yutong Bai, Yixiao Zhang, Alan Yuille, Daguang Xu CVPR, 2020 paper
	Semantic Part Detection via Matching: Learning to Generalize to Novel Viewpoints from Limited Training Data Yutong Bai, Qing Liu, Lingxi Xie, Weichao Qiu, Yan Zheng, Alan Yuille ICCV, 2019 paper / code
	Clevr-ref+: Diagnosing Visual Reasoning with Referring Expressions Runtao Liu, Chenxi Liu, Yutong Bai, Alan L Yuille CVPR, 2019 paper / project page
	CoKe: Contrastive Learning for Robust Keypoint Detection Yutong Bai, Angtian Wang, Adam Kortylewski, Alan Yuille WACV, 2023 paper
	Delving Into Masked Autoencoders for Multi-Label Thorax Disease Classification Junfei Xiao, Yutong Bai, Alan Yuille, Zongwei Zhou WACV, 2023 paper / code
	REOrdering Patches Improves Vision Models Declan Kutscher, David M Chan, Yutong Bai, Trevor Darrell, Ritwik Gupta arXiv, 2025 paper / project page / code
	AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? Kaixiong Gong, Kaituo Feng, Bohao Li, Yibing Wang, Mofan Cheng, Shijia Yang, Jiaming Han, Benyou Wang, Yutong Bai, Zhuoran Yang, Xiangyu Yue arXiv, 2024 paper / project page / code / data
	Finding Visual Task Vectors Alberto Hojel, Yutong Bai, Trevor Darrell, Amir Globerson, Amir Bar ECCV, 2024 paper / code / model
	Mask Guided Matting via Progressive Refinement Network Qihang Yu, Jianming Zhang, He Zhang, Yilin Wang, Zhe Lin, Ning Xu, Yutong Bai, Alan Yuille CVPR, 2021 paper / code
	Glance-and-Gaze Vision Transformer Qihang Yu, Yingda Xia, Yutong Bai, Yongyi Lu, Alan L. Yuille, Wei Shen NeurIPS, 2021 paper
	Can CNNs Be More Robust Than Transformers? Zeyu Wang, Yutong Bai, Yuyin Zhou, Cihang Xie ICLR, 2023 paper / code
	LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning Dantong Niu, Yuvan Sharma, Giscard Biamby, Jerome Quenum, Yutong Bai, Baifeng Shi, Trevor Darrell, Roei Herzig CoRL, 2024 paper / project page / code / data
	Making Your First Choice: To Address Cold Start Problem in Medical Active Learning Liangyu Chen, Yutong Bai, Siyu Huang, Yongyi Lu, Bihan Wen, Alan Yuille, Zongwei Zhou PMLR, 2022 paper / code
	Focalizing regions of biomarker relevance facilitates biomarker prediction on histopathological images Jiefeng Gan, Hanchen Wang, Hui Yu, Zitong He, Wenjuan Zhang, Ke Ma, Lianghui Zhu, Yutong Bai, Zongwei Zhou, Alan Yullie, Xiang Bai, Mingwei Wang, Dehua Yang, Yanyan Chen, Guoan Chen, Joan Lasenby, Chao Cheng, Jia Wu, Jianjun Zhang, Xinggang Wang, Yaobing Chen, Guoping Wang, Tian Xia iScience, 2023 paper
	Vector Quantized Feature Fields for Fast 3D Semantic Lifting George Tang, Aditya Agarwal, Weiqiao Han, Trevor Darrell, Yutong Bai arXiv, 2025 paper
	Fast AdvProp Jieru Mei, Yucheng Han, Yutong Bai, Yixiao Zhang, Yingwei Li, Xianhang Li, Alan Yuille, Cihang Xie ICLR, 2022 paper / code / model