
Yutong Bai

Yutong is currently a Postdoctoral Researcher at UC Berkeley (Berkeley Artificial Intelligence Research), advised by Prof. Alexei (Alyosha) Efros, Prof. Jitendra Malik, and Prof. Trevor Darrell.

Prior to that, she obtained her CS PhD at Johns Hopkins University, advised by Prof. Alan Yuille. She previously interned at Meta AI (FAIR Labs) and Google Brain, and was selected as a 2023 Apple Scholar and an MIT EECS Rising Star.

Email  /  Google Scholar  /  Twitter  /  GitHub

News
Research Vision

I aim to build intelligent systems from first principles—systems that do not merely fit patterns or follow instructions, but that gradually develop structure, abstraction, and behavior through learning itself.

I’m interested in how intelligence emerges—not from handcrafted pipelines or task-specific heuristics, but from exposure to behaviorally rich, understructured environments, where models must learn what to attend to, how to reason, and how to improve. This requires designing learning systems that are not narrowly optimized for a goal, but that can self-organize and grow increasingly competent through interaction, experience, and computation.

I see scale as a tool, but not as the whole solution. Larger models open up more capacity, but what fills that capacity—and how it forms—is just as important. My research explores how we can use scale to amplify the right signals: not just data quantity, but the structural richness of behavior, and the dynamics of learning itself.

To that end, I focus on:

  • Understanding what makes behavior intelligent—especially when it’s easy for humans but hard for machines;
  • Designing systems that learn internal structure from raw behavioral input, without task scaffolds or dense supervision;
  • Creating conditions where models discover abstraction and reasoning, not because they are explicitly told to, but because learning leads them there.

I believe intelligence is not something we can fully define or supervise in advance; it must emerge over time, shaped by data, computation, and inductive processes inside the model. My work is an attempt to understand and enable that emergence.

Publications

    Whole-Body Conditioned Egocentric Video Prediction
    Yutong Bai*, Danny Tran*, Amir Bar*, Yann LeCun, Trevor Darrell, Jitendra Malik
    arXiv, 2025

    Sequential Modeling Enables Scalable Learning for Large Vision Models
    Yutong Bai*, Xinyang Geng*, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A. Efros
    CVPR, 2024

    paper  /  project page  /  code  /  model

    TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy
    Héctor Carrión*, Yutong Bai*, Víctor A. Hernández Castro*, Kishan Panaganti, Ayush Zenith, Matthew Trang, Tony Zhang, Pietro Perona, Jitendra Malik
    arXiv, 2025

    paper  /  project page  /  data  /  code  /  model

    Point-Level Region Contrast for Object Detection Pre-Training
    Yutong Bai, Xinlei Chen, Alexander Kirillov, Alan Yuille, Alexander C. Berg
    CVPR, 2022   (Nominated for CVPR Best Paper - Top 0.4%)

    paper  /  code  /  video  /  poster

    Evaluating Multiview Object Consistency in Humans and Image Models
    Tyler Bonnen, Stephanie Fu, Yutong Bai, Thomas O'Connell, Yoni Friedman, Nancy Kanwisher, Josh Tenenbaum, Alexei Efros
    NeurIPS, 2024

    paper  /  project page  /  code  /  data

    Intriguing Properties of Text-guided Diffusion Models
    Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, Alan Yuille
    ICLR, 2024

    Analyzing The Language of Visual Tokens
    David M Chan, Rodolfo Corona, Joonyong Park, Cheol Jun Cho, Yutong Bai, Trevor Darrell
    arXiv, 2024

    KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
    Eunice Yiu, Maan Qraitem, Anisa Noor Majhi, Charlie Wong, Yutong Bai, Shiry Ginosar, Alison Gopnik, Kate Saenko
    ICLR, 2025

    paper  /  project page  /  code

    "I Know It When I See It": Mood Spaces for Connecting and Expressing Visual Concepts
    Huzheng Yang, Katherine Xu, Michael D. Grossberg, Yutong Bai, Jianbo Shi
    arXiv, 2025

    paper  /  project page  /  demo

    Masked Autoencoders Enable Efficient Knowledge Distillers
    Yutong Bai, Zeyu Wang, Junfei Xiao, Chen Wei, Huiyu Wang, Alan L Yuille, Yuyin Zhou, Cihang Xie
    CVPR, 2023

    paper  /  code  /  model

    Can Temporal Information Help with Contrastive Self-Supervised Learning?
    Yutong Bai, Haoqi Fan, Ishan Misra, Ganesh Venkatesh, Yongyi Lu, Yuyin Zhou, Qihang Yu, Vikas Chandra, Alan Yuille
    arXiv, 2020

    C2FNAS: Coarse-to-Fine Neural Architecture Search for 3D Medical Image Segmentation
    Qihang Yu, Dong Yang, Holger Roth, Yutong Bai, Yixiao Zhang, Alan Yuille, Daguang Xu
    CVPR, 2020

    Semantic Part Detection via Matching: Learning to Generalize to Novel Viewpoints from Limited Training Data
    Yutong Bai, Qing Liu, Lingxi Xie, Weichao Qiu, Yan Zheng, Alan Yuille
    ICCV, 2019

    paper  /  code

    Clevr-ref+: Diagnosing Visual Reasoning with Referring Expressions
    Runtao Liu, Chenxi Liu, Yutong Bai, Alan L Yuille
    CVPR, 2019

    CoKe: Contrastive Learning for Robust Keypoint Detection
    Yutong Bai, Angtian Wang, Adam Kortylewski, Alan Yuille
    WACV, 2023

    Delving Into Masked Autoencoders for Multi-Label Thorax Disease Classification
    Junfei Xiao, Yutong Bai, Alan Yuille, Zongwei Zhou
    WACV, 2023

    paper  /  code

    AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
    Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang
    arXiv, 2025

    paper  /  project page  /  code

    REOrdering Patches Improves Vision Models
    Declan Kutscher, David M Chan, Yutong Bai, Trevor Darrell, Ritwik Gupta
    arXiv, 2025

    paper  /  project page  /  code

    AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
    Kaixiong Gong, Kaituo Feng, Bohao Li, Yibing Wang, Mofan Cheng, Shijia Yang, Jiaming Han, Benyou Wang, Yutong Bai, Zhuoran Yang, Xiangyu Yue
    arXiv, 2024

    paper  /  project page  /  code  /  data

    Finding Visual Task Vectors
    Alberto Hojel, Yutong Bai, Trevor Darrell, Amir Globerson, Amir Bar
    ECCV, 2024

    paper  /  code  /  model

    Mask Guided Matting via Progressive Refinement Network
    Qihang Yu, Jianming Zhang, He Zhang, Yilin Wang, Zhe Lin, Ning Xu, Yutong Bai, Alan Yuille
    CVPR, 2021

    paper  /  code

    Glance-and-Gaze Vision Transformer
    Qihang Yu, Yingda Xia, Yutong Bai, Yongyi Lu, Alan L. Yuille, Wei Shen
    NeurIPS, 2021

    Can CNNs Be More Robust Than Transformers?
    Zeyu Wang, Yutong Bai, Yuyin Zhou, Cihang Xie
    ICLR, 2023

    paper  /  code

    LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
    Dantong Niu, Yuvan Sharma, Giscard Biamby, Jerome Quenum, Yutong Bai, Baifeng Shi, Trevor Darrell, Roei Herzig
    CoRL, 2024

    paper  /  project page  /  code  /  data

    Making Your First Choice: To Address Cold Start Problem in Medical Active Learning
    Liangyu Chen, Yutong Bai, Siyu Huang, Yongyi Lu, Bihan Wen, Alan Yuille, Zongwei Zhou
    PMLR, 2022

    paper  /  code

    Focalizing regions of biomarker relevance facilitates biomarker prediction on histopathological images
    Jiefeng Gan, Hanchen Wang, Hui Yu, Zitong He, Wenjuan Zhang, Ke Ma, Lianghui Zhu, Yutong Bai, Zongwei Zhou, Alan Yuille, Xiang Bai, Mingwei Wang, Dehua Yang, Yanyan Chen, Guoan Chen, Joan Lasenby, Chao Cheng, Jia Wu, Jianjun Zhang, Xinggang Wang, Yaobing Chen, Guoping Wang, Tian Xia
    iScience, 2023

    Vector Quantized Feature Fields for Fast 3D Semantic Lifting
    George Tang, Aditya Agarwal, Weiqiao Han, Trevor Darrell, Yutong Bai
    arXiv, 2025

    Fast AdvProp
    Jieru Mei, Yucheng Han, Yutong Bai, Yixiao Zhang, Yingwei Li, Xianhang Li, Alan Yuille, Cihang Xie
    ICLR, 2022

    paper  /  code  /  model