Research
Her research aims to build AI systems that need less supervision and are more robust. Her explorations include representation learning, self-supervised learning, and generative modeling.
Whole-Body Conditioned Egocentric Video Prediction
Yutong Bai*, Danny Tran*, Amir Bar*, Yann LeCun†, Trevor Darrell†, Jitendra Malik†
arXiv, 2025
Sequential Modeling Enables Scalable Learning for Large Vision Models
Yutong Bai*, Xinyang Geng*, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A. Efros
CVPR, 2024
TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy
Héctor Carrión*, Yutong Bai*, Víctor A. Hernández Castro*, Kishan Panaganti, Ayush Zenith, Matthew Trang, Tony Zhang, Pietro Perona, Jitendra Malik
arXiv, 2025
Point-Level Region Contrast for Object Detection Pre-Training
Yutong Bai, Xinlei Chen, Alexander Kirillov, Alan Yuille, Alexander C. Berg
CVPR, 2022 (Nominated for CVPR Best Paper, Top 0.4%)
Evaluating Multiview Object Consistency in Humans and Image Models
Tyler Bonnen, Stephanie Fu, Yutong Bai, Thomas O'Connell, Yoni Friedman, Nancy Kanwisher, Josh Tenenbaum, Alexei Efros
NeurIPS, 2024
Intriguing Properties of Text-guided Diffusion Models
Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, Alan Yuille
ICLR, 2024
Analyzing The Language of Visual Tokens
David M Chan, Rodolfo Corona, Joonyong Park, Cheol Jun Cho, Yutong Bai, Trevor Darrell
arXiv, 2024
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
Eunice Yiu, Maan Qraitem, Anisa Noor Majhi, Charlie Wong, Yutong Bai, Shiry Ginosar, Alison Gopnik, Kate Saenko
ICLR, 2025
"I Know It When I See It": Mood Spaces for Connecting and Expressing Visual Concepts
Huzheng Yang, Katherine Xu, Michael D. Grossberg, Yutong Bai, Jianbo Shi
arXiv, 2025
Masked Autoencoders Enable Efficient Knowledge Distillers
Yutong Bai, Zeyu Wang, Junfei Xiao, Chen Wei, Huiyu Wang, Alan L Yuille, Yuyin Zhou, Cihang Xie
CVPR, 2023
Can Temporal Information Help with Contrastive Self-Supervised Learning?
Yutong Bai, Haoqi Fan, Ishan Misra, Ganesh Venkatesh, Yongyi Lu, Yuyin Zhou, Qihang Yu, Vikas Chandra, Alan Yuille
arXiv, 2020
C2FNAS: Coarse-to-Fine Neural Architecture Search for 3D Medical Image Segmentation
Qihang Yu, Dong Yang, Holger Roth, Yutong Bai, Yixiao Zhang, Alan Yuille, Daguang Xu
CVPR, 2020
Semantic Part Detection via Matching: Learning to Generalize to Novel Viewpoints from Limited Training Data
Yutong Bai, Qing Liu, Lingxi Xie, Weichao Qiu, Yan Zheng, Alan Yuille
ICCV, 2019
CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions
Runtao Liu, Chenxi Liu, Yutong Bai, Alan L Yuille
CVPR, 2019
CoKe: Contrastive Learning for Robust Keypoint Detection
Yutong Bai, Angtian Wang, Adam Kortylewski, Alan Yuille
WACV, 2023
Delving Into Masked Autoencoders for Multi-Label Thorax Disease Classification
Junfei Xiao, Yutong Bai, Alan Yuille, Zongwei Zhou
WACV, 2023
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta
arXiv, 2025
REOrdering Patches Improves Vision Models
Declan Kutscher, David M Chan, Yutong Bai, Trevor Darrell, Ritwik Gupta
arXiv, 2025
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Kaixiong Gong, Kaituo Feng, Bohao Li, Yibing Wang, Mofan Cheng, Shijia Yang, Jiaming Han, Benyou Wang, Yutong Bai, Zhuoran Yang, Xiangyu Yue
arXiv, 2024
Finding Visual Task Vectors
Alberto Hojel, Yutong Bai, Trevor Darrell, Amir Globerson, Amir Bar
ECCV, 2024
Mask Guided Matting via Progressive Refinement Network
Qihang Yu, Jianming Zhang, He Zhang, Yilin Wang, Zhe Lin, Ning Xu, Yutong Bai, Alan Yuille
CVPR, 2021
Glance-and-Gaze Vision Transformer
Qihang Yu, Yingda Xia, Yutong Bai, Yongyi Lu, Alan L. Yuille, Wei Shen
NeurIPS, 2021
Can CNNs Be More Robust Than Transformers?
Zeyu Wang, Yutong Bai, Yuyin Zhou, Cihang Xie
ICLR, 2023
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
Dantong Niu, Yuvan Sharma, Giscard Biamby, Jerome Quenum, Yutong Bai, Baifeng Shi, Trevor Darrell, Roei Herzig
CoRL, 2024
Making Your First Choice: To Address Cold Start Problem in Medical Active Learning
Liangyu Chen, Yutong Bai, Siyu Huang, Yongyi Lu, Bihan Wen, Alan Yuille, Zongwei Zhou
PMLR, 2022
Focalizing regions of biomarker relevance facilitates biomarker prediction on histopathological images
Jiefeng Gan, Hanchen Wang, Hui Yu, Zitong He, Wenjuan Zhang, Ke Ma, Lianghui Zhu, Yutong Bai, Zongwei Zhou, Alan Yuille, Xiang Bai, Mingwei Wang, Dehua Yang, Yanyan Chen, Guoan Chen, Joan Lasenby, Chao Cheng, Jia Wu, Jianjun Zhang, Xinggang Wang, Yaobing Chen, Guoping Wang, Tian Xia
iScience, 2023
Vector Quantized Feature Fields for Fast 3D Semantic Lifting
George Tang, Aditya Agarwal, Weiqiao Han, Trevor Darrell, Yutong Bai
arXiv, 2025
Fast AdvProp
Jieru Mei, Yucheng Han, Yutong Bai, Yixiao Zhang, Yingwei Li, Xianhang Li, Alan Yuille, Cihang Xie
ICLR, 2022