9 Papers Accepted by NeurIPS 2024 (2 Spotlights)
Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration [spotlight]
Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting [spotlight]
Robust Fine-tuning of Zero-shot Models via Variance Reduction
Unified Generative and Discriminative Training for Multi-modal Large Language Models
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
MVGamba: Unify 3D Content Generation as State Space Sequence Modeling
Decoupled Kullback-Leibler Divergence Loss
Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
Action Imitation in Common Action Space for Customized Action Image Synthesis
4 Papers Accepted by ECCV 2024
Rethinking and Improving Visual Prompt Selection for In-Context Learning Segmentation Framework
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
Few-shot NeRF by Adaptive Rendering Loss Regularization
View-Consistent 3D Editing with Gaussian Splatting
3 Papers Accepted by ICML 2024 (1 Oral and 1 Spotlight)
Auto-Encoding Morph-Tokens for Multimodal LLM [spotlight]
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition [oral]
Non-confusing Generation of Customized Concepts in Diffusion Models
1 Paper Accepted by TPAMI
NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation
9 Papers Accepted by CVPR 2024
Diffusion Time-step Curriculum for One Image to 3D Generation
Distributionally Generative Augmentation for Fair Facial Attribute Classification
Doubly Abductive Counterfactual Inference for Text-based Image Editing
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
Few-shot Learner Parameterization by Diffusion Time-steps
Discriminative Probing and Tuning for Text-to-Image Generation
Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
DisCo: Disentangled Control for Realistic Human Dance Generation
2 Papers Accepted by ICLR 2024
Exploring Diffusion Time-steps for Unsupervised Representation Learning
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
2 Papers Accepted by AAAI 2024
Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection
MGNet: Learning Correspondences via Multiple Graphs
4 Papers Accepted by NeurIPS 2023
Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models
Tuning Multi-mode Token-level Prompt Alignment across Modalities
Make the U in UDA Matter: Invariant Consistency Learning for Unsupervised Domain Adaptation
Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion
7 Papers Accepted by ICCV 2023
Equivariant Similarity for Vision-Language Foundation Models
Prompt-aligned Gradient for Prompt Tuning
Random Boxes Are Open-world Object Detectors
Invariant Feature Regularization for Fair Face Recognition
Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition
Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground
Learning Trajectory-Word Alignments for Video-Language Tasks
One Paper Accepted by TPAMI
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering
Two Papers Accepted by ACL 2023
Counterfactual Active Learning for Out-of-Distribution Generalization
Hypothetical Training for Robust Machine Reading Comprehension of Tabular Context
Three Papers Accepted by CVPR 2023
Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery
Semantic Scene Completion with Cleaner Self
One Paper Accepted by ICLR 2023
Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection
One Paper Accepted by AAAI 2023
Debiased Fine-Tuning for Vision-language Models by Prompt Regularization
One Paper Accepted by NeurIPS 2022
Respecting Transfer Gap in Knowledge Distillation
One Paper Accepted by IJCV
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
Four Papers Accepted by ECCV 2022 (One Oral)
Identifying Hard Noise in Long-Tailed Sample Distribution [oral]
Invariant Feature Learning for Generalized Long-Tailed Classification
Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization
Equivariance and Invariance Inductive Bias for Learning from Insufficient Data
One Paper Accepted by ICML 2022
Certified Robustness Against Natural Language Attacks by Causal Intervention
One Paper Accepted by CVPR 2022
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
Two Papers Accepted by ACL 2022
KQA Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base
Learning to Imagine: Integrating Counterfactual Thinking in Neural Discrete Reasoning
One Paper Accepted by ICLR 2022
On Non-Random Missing Labels in Semi-Supervised Learning
Two Papers (Two Orals) Accepted by AAAI 2022
Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification [oral]
Deconfounded Visual Grounding [oral]
One Paper Accepted by TPAMI 2022
Deconfounded Image Captioning: A Causal Retrospect
Three Papers (One Spotlight) Accepted by NeurIPS 2021
Self-Supervised Learning Disentangled Group Representation as Feature [spotlight]
How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?
Introspective Distillation for Robust Question Answering
One Paper Accepted by EMNLP 2021
TransferNet: An Effective and Transparent Framework for Multi-hop Question Answering over Relation Graph
Four Papers (One Oral) Accepted by ICCV 2021
Transporting Causal Mechanisms for Unsupervised Domain Adaptation [oral]
Causal Attention for Unbiased Visual Recognition
Self-Regulation for Semantic Segmentation
Auto-Parsing Network for Image Captioning and Visual Question Answering
1st Causality in Vision Workshop, CVPR 2021
Our group will host the 1st Causality in Vision Workshop at CVPR 2021.
Workshop Homepage: http://www.causalityinvision.com/
Five Papers Accepted by CVPR 2021
Distilling Causal Effect of Data in Class-Incremental Learning
Counterfactual VQA: A Cause-Effect Look at Language Bias
Counterfactual Zero-Shot and Open-Set Visual Recognition
Causal Attention for Vision-Language Tasks
The Blessings of Unlabeled Background in Untrimmed Videos
One Paper Accepted by TMM 2021
Align R-CNN: A Pairwise Head Network for Visual Relationship Detection
Two Papers Accepted by TPAMI and AAAI 2021
Auto-encoding and Distilling Scene Graphs for Image Captioning
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Three Papers (One Oral) Accepted by NeurIPS 2020
Causal Intervention for Weakly-Supervised Semantic Segmentation [oral]
Interventional Few-Shot Learning
Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect
One Paper Accepted by ACM-MM 2020
Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning
Two Papers Accepted by ECCV and TMM
Feature Pyramid Transformer, ECCV
Self-Adaptive Neural Module Transformer for Visual Question Answering, TMM
Eight Papers (Two Orals) Accepted by CVPR 2020
Iterative Context-Aware Graph Inference for Visual Dialog [oral]
Unbiased Scene Graph Generation from Biased Training [oral]
Visual Commonsense R-CNN
Learning to Segment the Tail
Two Causal Principles for Improving Visual Dialog
More Grounded Image Captioning by Distilling Image-Text Matching Model
Counterfactual Samples Synthesizing for Robust Visual Question Answering
Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration
PREMIA Best Student Paper, Silver Award
Our paper "Learning to Compose Dynamic Tree Structures for Visual Contexts" received the PREMIA Best Student Paper, Silver Award (2nd Place)
Four Papers (Two Orals) Accepted by ICCV 2019
Counterfactual Critic Multi-Agent Training for Scene Graph Generation [oral]
Learning to Assemble Neural Module Tree Networks for Visual Grounding [oral]
Making History Matter: History-Advantage Sequence Training for Visual Dialog
Learning to Collocate Neural Modules for Image Captioning
One Paper Accepted by TPAMI
Three Papers Accepted by ACM MM 2019
CVPR 2019 Conference
Our team MReaL-BDAI won first place in the Visual Dialogue Challenge.
Our paper "Learning to Compose Dynamic Tree Structures for Visual Contexts" was selected as a Best Paper Finalist.
Two Papers Accepted by TPAMI
Four Papers (Three Orals) Accepted by CVPR 2019
Two Papers Accepted by AAAI 2019