Welcome to MReaL (Machine Reasoning and Learning, pronounced "Me Real")! Current AI differs from human intelligence in crucial ways because our mind is bicameral: the right hemisphere handles perception, much like today's deep learning systems; the left hemisphere handles logical reasoning; and the two work so differently, yet so collaboratively, that together they yield creative intelligence. To this end, at MReaL we seek in-principle reasoning algorithms that combine the complementary strengths of modern deep neural networks for learning representations and old-school symbolic operations for reasoning.

News

9 Papers Accepted by NeurIPS 2024 (2 Spotlights)

  • Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration[spotlight]
  • Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting[spotlight]
  • Robust Fine-tuning of Zero-shot Models via Variance Reduction
  • Unified Generative and Discriminative Training for Multi-modal Large Language Models
  • Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
  • MVGamba: Unify 3D Content Generation as State Space Sequence Modeling
  • Decoupled Kullback-Leibler Divergence Loss
  • Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
  • Action Imitation in Common Action Space for Customized Action Image Synthesis

4 Papers Accepted by ECCV 2024

  • Rethinking and Improving Visual Prompt Selection for In-Context Learning Segmentation Framework
  • Instruction Tuning-free Visual Token Complement for Multimodal LLMs
  • Few-shot NeRF by Adaptive Rendering Loss Regularization
  • View-Consistent 3D Editing with Gaussian Splatting

3 Papers Accepted by ICML 2024 (One Oral and One Spotlight)

  • Auto-Encoding Morph-Tokens for Multimodal LLM [spotlight]
  • Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition [oral]
  • Non-confusing Generation of Customized Concepts in Diffusion Models

1 Paper Accepted by TPAMI

  • NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation

9 Papers Accepted by CVPR 2024

  • Diffusion Time-step Curriculum for One Image to 3D Generation
  • Distributionally Generative Augmentation for Fair Facial Attribute Classification
  • Doubly Abductive Counterfactual Inference for Text-based Image Editing
  • Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
  • Few-shot Learner Parameterization by Diffusion Time-steps
  • Discriminative Probing and Tuning for Text-to-Image Generation
  • Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
  • Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
  • DisCo: Disentangled Control for Realistic Human Dance Generation

2 Papers Accepted by ICLR 2024

  • Exploring Diffusion Time-steps for Unsupervised Representation Learning
  • Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

2 Papers Accepted by AAAI 2024

  • Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection
  • MGNet: Learning Correspondences via Multiple Graphs

4 Papers Accepted by NeurIPS 2023

  • Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models
  • Tuning Multi-mode Token-level Prompt Alignment across Modalities
  • Make the U in UDA Matter: Invariant Consistency Learning for Unsupervised Domain Adaptation
  • Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion

7 Papers Accepted by ICCV 2023

  • Equivariant Similarity for Vision-Language Foundation Models
  • Prompt-aligned Gradient for Prompt Tuning
  • Random Boxes Are Open-world Object Detectors
  • Invariant Feature Regularization for Fair Face Recognition
  • Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition
  • Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground
  • Learning Trajectory-Word Alignments for Video-Language Tasks

One Paper Accepted by TPAMI

  • Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering

Two Papers Accepted by ACL 2023

  • Counterfactual Active Learning for Out-of-Distribution Generalization
  • Hypothetical Training for Robust Machine Reading Comprehension of Tabular Context

Three Papers Accepted by CVPR 2023

  • Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
  • Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery
  • Semantic Scene Completion with Cleaner Self

One Paper Accepted by ICLR 2023

  • Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection

One Paper Accepted by AAAI 2023

  • Debiased Fine-Tuning for Vision-language Models by Prompt Regularization

One Paper Accepted by NeurIPS 2022

  • Respecting Transfer Gap in Knowledge Distillation

One Paper Accepted by IJCV

  • Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning

Four Papers Accepted by ECCV 2022 (One Oral)

  • Identifying Hard Noise in Long-Tailed Sample Distribution[oral]
  • Invariant Feature Learning for Generalized Long-Tailed Classification
  • Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization
  • Equivariance and Invariance Inductive Bias for Learning from Insufficient Data

One Paper Accepted by ICML 2022

  • Certified Robustness Against Natural Language Attacks by Causal Intervention

One Paper Accepted by CVPR 2022

  • Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

Two Papers Accepted by ACL 2022

  • KQA Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base
  • Learning to Imagine: Integrating Counterfactual Thinking in Neural Discrete Reasoning

One Paper Accepted by ICLR 2022

  • On Non-Random Missing Labels in Semi-Supervised Learning

Two Papers (Two Orals) Accepted by AAAI 2022

  • Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification[oral]
  • Deconfounded Visual Grounding[oral]

One Paper Accepted by TPAMI 2022

  • Deconfounded image captioning: A causal retrospect

Three Papers (One Spotlight) Accepted by NeurIPS 2021

  • Self-Supervised Learning Disentangled Group Representation as Feature [spotlight]
  • How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?
  • Introspective Distillation for Robust Question Answering

One Paper Accepted by EMNLP 2021

  • TransferNet: An Effective and Transparent Framework for Multi-hop Question Answering over Relation Graph

Four Papers (One Oral) Accepted by ICCV 2021

  • Transporting Causal Mechanisms for Unsupervised Domain Adaptation [oral]
  • Causal Attention for Unbiased Visual Recognition
  • Self-Regulation for Semantic Segmentation
  • Auto-Parsing Network for Image Captioning and Visual Question Answering

1st Causality in Vision Workshop, CVPR 2021

  • Our group will host the 1st Causality in Vision Workshop at CVPR 2021
  • Workshop Homepage: http://www.causalityinvision.com/

Five Papers Accepted by CVPR 2021

  • Distilling Causal Effect of Data in Class-Incremental Learning
  • Counterfactual VQA: A Cause-Effect Look at Language Bias
  • Counterfactual Zero-Shot and Open-Set Visual Recognition
  • Causal Attention for Vision-Language Tasks
  • The Blessings of Unlabeled Background in Untrimmed Videos

One Paper Accepted by TMM 2021

  • Align R-CNN: A Pairwise Head Network for Visual Relationship Detection

Two Papers Accepted by TPAMI and AAAI 2021

  • Auto-encoding and Distilling Scene Graphs for Image Captioning
  • Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding

Three Papers (One Oral) Accepted by NeurIPS 2020

  • Causal Intervention for Weakly-Supervised Semantic Segmentation [oral]
  • Interventional Few-Shot Learning
  • Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect

One Paper Accepted by ACM-MM 2020

  • Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning

Two Papers Accepted by ECCV and TMM

  • Feature Pyramid Transformer, ECCV
  • Self-Adaptive Neural Module Transformer for Visual Question Answering, TMM

Eight Papers (Two Orals) Accepted by CVPR 2020

  • Iterative Context-Aware Graph Inference for Visual Dialog [oral]
  • Unbiased Scene Graph Generation from Biased Training [oral]
  • Visual Commonsense R-CNN
  • Learning to Segment the Tail
  • Two Causal Principles for Improving Visual Dialog
  • More Grounded Image Captioning by Distilling Image-Text Matching Model
  • Counterfactual Samples Synthesizing for Robust Visual Question Answering
  • Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration

PREMIA Best Student Paper, Silver Award

  • Our paper "Learning to Compose Dynamic Tree Structures for Visual Contexts" received the PREMIA Best Student Paper, Silver Award (2nd Place)

Four Papers (Two Orals) Accepted by ICCV 2019

  • Counterfactual Critic Multi-Agent Training for Scene Graph Generation [oral]
  • Learning to Assemble Neural Module Tree Networks for Visual Grounding [oral]
  • Making History Matter: History-Advantage Sequence Training for Visual Dialog
  • Learning to Collocate Neural Modules for Image Captioning

One Paper Accepted by TPAMI

Three Papers Accepted by ACM MM 2019

CVPR 2019 Conference

  • Our team MReaL-BDAI won first place in the Visual Dialogue Challenge
  • Our paper "Learning to Compose Dynamic Tree Structures for Visual Contexts" was selected as a Best Paper Finalist

Two Papers Accepted by TPAMI

Four Papers (Three Orals) Accepted by CVPR 2019

Two Papers Accepted by AAAI 2019

Contact Us