Yutong Bai

Yutong is currently a Postdoctoral Researcher at UC Berkeley (BAIR), advised by Prof. Alexei (Alyosha) Efros, Prof. Jitendra Malik, and Prof. Trevor Darrell.

Prior to that, she obtained her PhD in Computer Science at Johns Hopkins University, advised by Prof. Alan Yuille. She previously interned at Meta AI (FAIR Labs) and Google Brain, and was selected as a 2023 Apple Scholar and MIT EECS Rising Star.

Email  /  Google Scholar  /  Twitter  /  Github  

Research

Her research aims to build AI systems that need less supervision and are more robust. Her work spans representation learning, self-supervised learning, and generative modeling.

Publications

Whole-Body Conditioned Egocentric Video Prediction
Yutong Bai*, Danny Tran*, Amir Bar*, Yann LeCun, Trevor Darrell, Jitendra Malik
Arxiv, 2025

Sequential Modeling Enables Scalable Learning for Large Vision Models
Yutong Bai*, Xinyang Geng*, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A. Efros
CVPR, 2024

paper  /  project page  /  code  /  model

TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy
Héctor Carrión*, Yutong Bai*, Víctor A. Hernández Castro*, Kishan Panaganti, Ayush Zenith, Matthew Trang, Tony Zhang, Pietro Perona, Jitendra Malik
Arxiv, 2025

paper  /  project page  /  data  /  code  /  model

Point-Level Region Contrast for Object Detection Pre-Training
Yutong Bai, Xinlei Chen, Alexander Kirillov, Alan Yuille, Alexander C. Berg
CVPR, 2022   (Nominated for CVPR Best Paper - Top 0.4%)

paper  /  code  /  video  /  poster

Evaluating Multiview Object Consistency in Humans and Image Models
Tyler Bonnen, Stephanie Fu, Yutong Bai, Thomas O'Connell, Yoni Friedman, Nancy Kanwisher, Josh Tenenbaum, Alexei Efros
NeurIPS, 2024

paper  /  project page  /  code  /  data

Intriguing Properties of Text-guided Diffusion Models
Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, Alan Yuille
ICLR, 2024

Analyzing The Language of Visual Tokens
David M Chan, Rodolfo Corona, Joonyong Park, Cheol Jun Cho, Yutong Bai, Trevor Darrell
Arxiv, 2024

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
Eunice Yiu, Maan Qraitem, Anisa Noor Majhi, Charlie Wong, Yutong Bai, Shiry Ginosar, Alison Gopnik, Kate Saenko
ICLR, 2025

paper  /  project page  /  code

"I Know It When I See It": Mood Spaces for Connecting and Expressing Visual Concepts
Huzheng Yang, Katherine Xu, Michael D. Grossberg, Yutong Bai, Jianbo Shi
Arxiv, 2025

paper  /  project page  /  demo

Masked Autoencoders Enable Efficient Knowledge Distillers
Yutong Bai, Zeyu Wang, Junfei Xiao, Chen Wei, Huiyu Wang, Alan L Yuille, Yuyin Zhou, Cihang Xie
CVPR, 2023

paper  /  code  /  model

Can Temporal Information Help with Contrastive Self-Supervised Learning?
Yutong Bai, Haoqi Fan, Ishan Misra, Ganesh Venkatesh, Yongyi Lu, Yuyin Zhou, Qihang Yu, Vikas Chandra, Alan Yuille
Arxiv, 2020

C2FNAS: Coarse-to-Fine Neural Architecture Search for 3D Medical Image Segmentation
Qihang Yu, Dong Yang, Holger Roth, Yutong Bai, Yixiao Zhang, Alan Yuille, Daguang Xu
CVPR, 2020

Semantic Part Detection via Matching: Learning to Generalize to Novel Viewpoints from Limited Training Data
Yutong Bai, Qing Liu, Lingxi Xie, Weichao Qiu, Yan Zheng, Alan Yuille
ICCV, 2019

paper  /  code

CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions
Runtao Liu, Chenxi Liu, Yutong Bai, Alan L Yuille
CVPR, 2019

CoKe: Contrastive Learning for Robust Keypoint Detection
Yutong Bai, Angtian Wang, Adam Kortylewski, Alan Yuille
WACV, 2023

Delving Into Masked Autoencoders for Multi-Label Thorax Disease Classification
Junfei Xiao, Yutong Bai, Alan Yuille, Zongwei Zhou
WACV, 2023

paper  /  code

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta
Arxiv, 2025

paper  /  project page  /  code

REOrdering Patches Improves Vision Models
Declan Kutscher, David M Chan, Yutong Bai, Trevor Darrell, Ritwik Gupta
Arxiv, 2025

paper  /  project page  /  code

AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Kaixiong Gong, Kaituo Feng, Bohao Li, Yibing Wang, Mofan Cheng, Shijia Yang, Jiaming Han, Benyou Wang, Yutong Bai, Zhuoran Yang, Xiangyu Yue
Arxiv, 2024

paper  /  project page  /  code  /  data

Finding Visual Task Vectors
Alberto Hojel, Yutong Bai, Trevor Darrell, Amir Globerson, Amir Bar
ECCV, 2024

paper  /  code  /  model

Mask Guided Matting via Progressive Refinement Network
Qihang Yu, Jianming Zhang, He Zhang, Yilin Wang, Zhe Lin, Ning Xu, Yutong Bai, Alan Yuille
CVPR, 2021

paper  /  code

Glance-and-Gaze Vision Transformer
Qihang Yu, Yingda Xia, Yutong Bai, Yongyi Lu, Alan L. Yuille, Wei Shen
NeurIPS, 2021

Can CNNs Be More Robust Than Transformers?
Zeyu Wang, Yutong Bai, Yuyin Zhou, Cihang Xie
ICLR, 2023

paper  /  code

LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
Dantong Niu, Yuvan Sharma, Giscard Biamby, Jerome Quenum, Yutong Bai, Baifeng Shi, Trevor Darrell, Roei Herzig
CoRL, 2024

paper  /  project page  /  code  /  data

Making Your First Choice: To Address Cold Start Problem in Medical Active Learning
Liangyu Chen, Yutong Bai, Siyu Huang, Yongyi Lu, Bihan Wen, Alan Yuille, Zongwei Zhou
PMLR, 2022

paper  /  code

Focalizing regions of biomarker relevance facilitates biomarker prediction on histopathological images
Jiefeng Gan, Hanchen Wang, Hui Yu, Zitong He, Wenjuan Zhang, Ke Ma, Lianghui Zhu, Yutong Bai, Zongwei Zhou, Alan Yuille, Xiang Bai, Mingwei Wang, Dehua Yang, Yanyan Chen, Guoan Chen, Joan Lasenby, Chao Cheng, Jia Wu, Jianjun Zhang, Xinggang Wang, Yaobing Chen, Guoping Wang, Tian Xia
iScience, 2023

Vector Quantized Feature Fields for Fast 3D Semantic Lifting
George Tang, Aditya Agarwal, Weiqiao Han, Trevor Darrell, Yutong Bai
Arxiv, 2025

Fast AdvProp
Jieru Mei, Yucheng Han, Yutong Bai, Yixiao Zhang, Yingwei Li, Xianhang Li, Alan Yuille, Cihang Xie
ICLR, 2022

paper  /  code  /  model