Welcome to MReaL! (Machine Reasoning and Learning, pronounced "Me Real"). Current AI differs from human intelligence in crucial ways because our mind is bicameral: the right brain hemisphere handles perception, much like existing deep learning systems; the left hemisphere handles logical reasoning; and the two work so differently, yet so collaboratively, that together they yield creative intelligence. To this end, at MReaL we are seeking in-principle reasoning algorithms that combine the complementary advantages of modern deep neural networks for learning representations and old-school symbolic operations for reasoning.
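
As a purely hypothetical illustration of this neural-plus-symbolic division of labor (a minimal sketch under our own assumptions; the function names and the toy rule below are illustrative, not MReaL code), a learned perception module could emit symbolic facts that an explicit rule-based module then reasons over:

    from typing import Set, Tuple

    Fact = Tuple[str, str, str]  # (subject, predicate, object) triples

    def perceive(description: str) -> Set[Fact]:
        """Stand-in for a neural perception module.

        A real system would map raw pixels to grounded symbols with a deep
        network; the string matching here is for illustration only.
        """
        facts: Set[Fact] = set()
        if "cat on mat" in description:
            facts.add(("cat", "on", "mat"))
        if "mat on floor" in description:
            facts.add(("mat", "on", "floor"))
        return facts

    def reason(facts: Set[Fact]) -> Set[Fact]:
        """Stand-in for a symbolic reasoner: forward-chain a transitivity rule over 'on'."""
        derived = set(facts)
        changed = True
        while changed:
            changed = False
            for a, p, b in list(derived):
                for c, q, d in list(derived):
                    if p == q == "on" and b == c and (a, "on", d) not in derived:
                        derived.add((a, "on", d))
                        changed = True
        return derived

    facts = perceive("a cat on mat, with the mat on floor")
    print(reason(facts))  # also contains ('cat', 'on', 'floor'), derived by the rule

The point of the sketch is only the separation of roles: the learned component produces representations (here, discrete facts), while the symbolic component manipulates them with operations that are interpretable and composable.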

News

9 Papers Accepted by CVPR 2024

  • Diffusion Time-step Curriculum for One Image to 3D Generation
  • Distributionally Generative Augmentation for Fair Facial Attribute Classification
  • Doubly Abductive Counterfactual Inference for Text-based Image Editing
  • Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
  • Few-shot Learner Parameterization by Diffusion Time-steps
  • Discriminative Probing and Tuning for Text-to-Image Generation
  • Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
  • Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
  • DisCo: Disentangled Control for Realistic Human Dance Generation

2 Papers Accepted by ICLR 2024

  • Exploring Diffusion Time-steps for Unsupervised Representation Learning
  • Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

2 Papers Accepted by AAAI 2024

  • Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection
  • MGNet: Learning Correspondences via Multiple Graphs

4 Papers Accepted by NeurIPS 2023

  • Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models
  • Tuning Multi-mode Token-level Prompt Alignment across Modalities
  • Make the U in UDA Matter: Invariant Consistency Learning for Unsupervised Domain Adaptation
  • Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion

7 Papers Accepted by ICCV 2023

  • Equivariant Similarity for Vision-Language Foundation Models
  • Prompt-aligned Gradient for Prompt Tuning
  • Random Boxes Are Open-world Object Detectors
  • Invariant Feature Regularization for Fair Face Recognition
  • Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition
  • Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground
  • Learning Trajectory-Word Alignments for Video-Language Tasks

One Paper Accepted by TPAMI

  • Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering

Two Papers Accepted by ACL 2023

  • Counterfactual Active Learning for Out-of-Distribution Generalization
  • Hypothetical Training for Robust Machine Reading Comprehension of Tabular Context

Three Papers Accepted by CVPR 2023

  • Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
  • Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery
  • Semantic Scene Completion with Cleaner Self

One Paper Accepted by ICLR 2023

  • Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection

One Paper Accepted by AAAI 2023

  • Debiased Fine-Tuning for Vision-language Models by Prompt Regularization

One Paper Accepted by NeurIPS 2022

  • Respecting Transfer Gap in Knowledge Distillation

One Paper Accepted by IJCV

  • Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning

Four Papers Accepted by ECCV 2022 (One Oral)

  • Identifying Hard Noise in Long-Tailed Sample Distribution [oral]
  • Invariant Feature Learning for Generalized Long-Tailed Classification
  • Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization
  • Equivariance and Invariance Inductive Bias for Learning from Insufficient Data

One Paper Accepted by ICML 2022

  • Certified Robustness Against Natural Language Attacks by Causal Intervention

One Paper Accepted by CVPR 2022

  • Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

Two Papers Accepted by ACL 2022

  • KQA Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base
  • Learning to Imagine: Integrating Counterfactual Thinking in Neural Discrete Reasoning

One Paper Accepted by ICLR 2022

  • On Non-Random Missing Labels in Semi-Supervised Learning

Two Papers (Two Orals) Accepted by AAAI 2022

  • Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification [oral]
  • Deconfounded Visual Grounding [oral]

One Paper Accepted by TPAMI 2022

  • Deconfounded Image Captioning: A Causal Retrospect

Three Papers (One Spotlight) Accepted by NeurIPS 2021

  • Self-Supervised Learning Disentangled Group Representation as Feature [spotlight]
  • How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?
  • Introspective Distillation for Robust Question Answering

One Paper Accepted by EMNLP 2021

  • TransferNet: An Effective and Transparent Framework for Multi-hop Question Answering over Relation Graph

Four Papers (One Oral) Accepted by ICCV 2021

  • Transporting Causal Mechanisms for Unsupervised Domain Adaptation [oral]
  • Causal Attention for Unbiased Visual Recognition
  • Self-Regulation for Semantic Segmentation
  • Auto-Parsing Network for Image Captioning and Visual Question Answering

1st Causality in Vision Workshop, CVPR 2021

  • Our group will host the 1st Causality in Vision Workshop at CVPR 2021
  • Workshop Homepage: http://www.causalityinvision.com/

Five Papers Accepted by CVPR 2021

  • Distilling Causal Effect of Data in Class-Incremental Learning
  • Counterfactual VQA: A Cause-Effect Look at Language Bias
  • Counterfactual Zero-Shot and Open-Set Visual Recognition
  • Causal Attention for Vision-Language Tasks
  • The Blessings of Unlabeled Background in Untrimmed Videos

One Paper Accepted by TMM 2021

  • Align R-CNN: A Pairwise Head Network for Visual Relationship Detection

Two Papers Accepted by TPAMI and AAAI 2021

  • Auto-encoding and Distilling Scene Graphs for Image Captioning
  • Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding

Three Papers (One Oral) Accepted by NeurIPS 2020

  • Causal Intervention for Weakly-Supervised Semantic Segmentation [oral]
  • Interventional Few-Shot Learning
  • Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect

One Paper Accepted by ACM-MM 2020

  • Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning

Two Papers Accepted by ECCV and TMM

  • Feature Pyramid Transformer, ECCV
  • Self-Adaptive Neural Module Transformer for Visual Question Answering, TMM

Eight Papers (Two Oral) Accepted by CVPR 2020

  • Iterative Context-Aware Graph Inference for Visual Dialog [oral]
  • Unbiased Scene Graph Generation from Biased Training [oral]
  • Visual Commonsense R-CNN
  • Learning to Segment the Tail
  • Two Causal Principles for Improving Visual Dialog
  • More Grounded Image Captioning by Distilling Image-Text Matching Model
  • Counterfactual Samples Synthesizing for Robust Visual Question Answering
  • Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration

PREMIA Best Student Paper, Silver Award

  • Our paper "Learning to Compose Dynamic Tree Structures for Visual Contexts" received the PREMIA Best Student Paper, Silver Award (2nd Place)

Four Papers (Two Oral) Accepted by ICCV 2019

  • Counterfactual Critic Multi-Agent Training for Scene Graph Generation [oral]
  • Learning to Assemble Neural Module Tree Networks for Visual Grounding [oral]
  • Making History Matter: History-Advantage Sequence Training for Visual Dialog
  • Learning to Collocate Neural Modules for Image Captioning

One Paper Accepted by TPAMI

Three Papers Accepted by ACM MM 2019

CVPR 2019 Conference

  • Our team MReaL-BDAI won first place in the Visual Dialogue Challenge
  • Our paper "Learning to Compose Dynamic Tree Structures for Visual Contexts" was selected as a Best Paper Finalist

Two Papers Accepted by TPAMI

Four Papers (Three Oral) Accepted by CVPR 2019

Two Papers Accepted by AAAI 2019

Contact Us