Abstract: The objective of reinforcement learning is to find the optimal policy that maximizes rewards in the long run. In this talk I will cover three types of RL algorithms: 1. Policy gradient; 2. Actor-Critic; 3. Q-learning. Concepts will be explained with illustrations, and papers from OpenAI will be shared.
[View Slides]
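As a quick illustration of the first family above, here is a minimal REINFORCE-style policy-gradient sketch in PyTorch; the environment is replaced by random states and rewards, and all sizes are placeholders rather than anything from the talk.

```python
# Minimal REINFORCE sketch (illustrative only): ascend the gradient of
# log pi(a|s) weighted by the discounted return to maximize expected reward.
import torch
import torch.nn as nn

torch.manual_seed(0)

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))  # toy 4-dim state, 2 actions
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

# One fake episode: random states and rewards stand in for an environment.
states = torch.randn(10, 4)
rewards = torch.rand(10)

dist = torch.distributions.Categorical(logits=policy(states))
actions = dist.sample()
log_probs = dist.log_prob(actions)

# Discounted returns G_t = r_t + gamma * G_{t+1}
returns = torch.zeros(10)
running = 0.0
for t in reversed(range(10)):
    running = rewards[t] + gamma * running
    returns[t] = running

loss = -(log_probs * returns).sum()   # negative, so gradient descent performs ascent on return
optimizer.zero_grad()
loss.backward()
optimizer.step()
```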
Abstract: Video feature learning for action recognition is a challenging task that has been extensively studied in the research community. How to properly exploit motion and temporal information is key to the design of the models. In this talk, I will review some well-known CNN/LSTM-based networks designed for action recognition, including multi-stream CNNs, 3D convolution and its variants, and non-local neural networks.
[View Slides]
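For a concrete feel of the 3D-convolution idea mentioned above, here is a minimal PyTorch sketch; the clip shape and channel counts are illustrative only, not taken from any specific model in the talk.

```python
# Minimal sketch of a 3D convolution over a video clip, the core operation
# behind C3D/I3D-style action recognition models; shapes are illustrative.
import torch
import torch.nn as nn

clip = torch.randn(2, 3, 16, 112, 112)                # (batch, RGB, frames, height, width)
conv3d = nn.Conv3d(3, 64, kernel_size=3, padding=1)   # convolves jointly over space and time
features = conv3d(clip)                               # -> (2, 64, 16, 112, 112)
print(features.shape)
```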
Abstract: The target of image captioning is to generate a syntactically and semantically correct sentence that describes the main content of a given image. Compared with early image captioners, which are rule/template-based, modern captioning models have achieved striking advances through three key techniques: the encoder-decoder pipeline, the attention mechanism, and RL-based training objectives. However, these image captioners lack the ability of commonsense reasoning, an important inductive bias owned by humans. To exploit such language inductive bias, the Scene Graph Auto-Encoder (SGAE) is proposed to generate more descriptive captions.
[View Slides]
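As a rough illustration of the attention technique inside the encoder-decoder pipeline, the sketch below computes a soft attention over pre-extracted region features from a decoder hidden state; all dimensions are assumed placeholders, and this is not the SGAE model itself.

```python
# Minimal sketch of one attention step in an encoder-decoder captioner:
# the decoder state attends over image region features; sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

regions = torch.randn(1, 36, 2048)    # 36 region features, e.g. bottom-up-attention style
hidden = torch.randn(1, 512)          # current decoder hidden state

W_r = nn.Linear(2048, 512)
W_h = nn.Linear(512, 512)
w = nn.Linear(512, 1)

scores = w(torch.tanh(W_r(regions) + W_h(hidden).unsqueeze(1)))   # (1, 36, 1)
alpha = F.softmax(scores, dim=1)                                   # attention weights over regions
context = (alpha * regions).sum(dim=1)                             # (1, 2048) attended visual context
```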
Abstract: Visual dialog is a challenging vision-language task, which requires the agent to answer multi-round questions about an image. It typically needs to address two major problems: (1) how to answer visually-grounded questions, which is the core challenge in visual question answering (VQA); (2) how to infer the co-reference between questions and the dialog history. An example of visual co-reference: pronouns (e.g., `they') in the question (e.g., `Are they on or off?') are linked with nouns (e.g., `lamps') appearing in the dialog history (e.g., `How many lamps are there?') and the object grounded in the image. In this work, to resolve visual co-reference for visual dialog, we propose a novel attention mechanism called Recursive Visual Attention (RvA). Specifically, our dialog agent browses the dialog history until it has sufficient confidence in the visual co-reference resolution, and refines the visual attention recursively. Quantitative and qualitative experimental results on the large-scale VisDial v0.9 and v1.0 datasets demonstrate that the proposed RvA not only outperforms state-of-the-art methods, but also achieves reasonable recursion and interpretable attention maps without additional annotations.
[View Slides]
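The following is only a toy sketch of the recursion idea described above, not the paper's RvA implementation: step back through the dialog history until a (hypothetical) confidence score says the current question can be grounded on its own, then reuse that round's visual attention.

```python
# Toy sketch of recursive attention over the dialog history (illustrative only).
import torch

def recursive_attention(question_feat, history_feats, visual_attns,
                        confidence_fn, threshold=0.5, t=-1):
    """Return a visual attention map for the current question.

    question_feat: feature of the question examined at this recursion step
    history_feats: features of all questions up to the current round
    visual_attns:  visual attention maps computed at each round
    confidence_fn: hypothetical helper scoring how self-contained a question is
    """
    if confidence_fn(question_feat) > threshold or t == -len(history_feats):
        return visual_attns[t]          # confident (or history exhausted): ground at this round
    # not confident (e.g. the question starts with a pronoun): recurse one round back
    return recursive_attention(history_feats[t - 1], history_feats, visual_attns,
                               confidence_fn, threshold, t - 1)

# Toy usage with random features and a random confidence scorer.
feats = [torch.randn(8) for _ in range(4)]
attns = [torch.softmax(torch.randn(36), dim=0) for _ in range(4)]
conf = lambda q: torch.sigmoid(q.mean()).item()
attn = recursive_attention(feats[-1], feats, attns, conf)
```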
Abstract: Instance segmentation is a long-standing problem in computer vision and a basic component of many applications, such as autonomous driving. Current instance segmentation methods based on deep neural networks can be categorized into two types, depending on whether the method approaches the problem starting from detection modules or from segmentation modules. In this presentation, I will give an introduction to these two kinds of methods and further cover some ideas about panoptic segmentation.
[View Slides]
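As a rough illustration of the detection-first family, the sketch below crops a detected box with RoIAlign and predicts a mask inside it, in the spirit of Mask R-CNN; the feature map, box, and mask head are all illustrative placeholders.

```python
# Toy detection-first sketch: crop one detected box from a feature map with
# RoIAlign, then predict a binary mask inside it; all sizes are illustrative.
import torch
import torch.nn as nn
from torchvision.ops import roi_align

feature_map = torch.randn(1, 256, 50, 50)        # backbone features for one image
boxes = torch.tensor([[0., 4., 4., 20., 20.]])   # (batch_idx, x1, y1, x2, y2) in feature-map coords

roi = roi_align(feature_map, boxes, output_size=(14, 14))   # (1, 256, 14, 14) cropped region
mask_head = nn.Sequential(nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
                          nn.ConvTranspose2d(256, 256, 2, stride=2), nn.ReLU(),
                          nn.Conv2d(256, 1, 1))
mask_logits = mask_head(roi)                     # (1, 1, 28, 28) per-instance mask inside the box
```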
Abstract: Visual relationships represent the various visible and detectable interactions between object pairs. Reasoning about those relationships, formalized as the Visual Relation Detection (VRD) task, can serve as an intermediate building block for higher-level tasks such as image captioning, visual question answering, and image-text matching. The purpose of this seminar is to review the task's challenges, available datasets, and some state-of-the-art methods.
[View Slides]
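As a minimal illustration of the VRD formulation, the sketch below scores predicates for one subject-object pair; the feature sizes and classifier are assumed placeholders (70 predicates follows the original VRD dataset).

```python
# Toy sketch of the VRD task: for an ordered object pair, predict a predicate
# (e.g. "ride", "next to") to form a <subject, predicate, object> triplet.
import torch
import torch.nn as nn

num_predicates = 70                               # predicate vocabulary size in the VRD dataset
subj = torch.randn(1, 2048)                       # subject region feature (illustrative)
obj = torch.randn(1, 2048)                        # object region feature (illustrative)

predicate_clf = nn.Sequential(nn.Linear(2 * 2048, 512), nn.ReLU(),
                              nn.Linear(512, num_predicates))
logits = predicate_clf(torch.cat([subj, obj], dim=1))   # scores over candidate predicates
```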
Abstract: The appearance of AlexNet reignited the popularity of deep learning, and many tasks in computer vision have obtained significant improvements based on such basic CNN architectures pretrained on ImageNet. However, even now in 2019, many researchers are still only familiar with ResNet (the winner of the 2015 ILSVRC challenge). In this presentation, I will review some famous CNN architectures and analyze the philosophy behind their designs, to help us design better CNN architectures for our own specific tasks.
[View Slides]
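As one example of the design philosophies to be discussed, here is a minimal residual block in the spirit of ResNet; channel sizes are illustrative.

```python
# Minimal ResNet-style residual block: the identity shortcut lets gradients
# flow through very deep stacks; channel and spatial sizes are illustrative.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # identity shortcut

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)
```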
Abstract: Visual Question Answering is an important step from low-level cognition tasks, like visual recognition/detection and sentence analysis, towards general artificial intelligence. Although current VQA systems are still not perfect, the task motivates the community to think about how to build a bridge between visual information and textual information, the two most important sources of information that humans can absorb.
[View Slides]
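As a rough sketch of how such a bridge is usually built, the classic joint-embedding VQA baseline encodes the image and the question separately, fuses them, and classifies over a fixed answer vocabulary; every size below is an assumed placeholder.

```python
# Minimal joint-embedding VQA baseline sketch (illustrative only).
import torch
import torch.nn as nn

img_feat = torch.randn(1, 2048)                       # pooled CNN image feature
question = torch.randint(0, 1000, (1, 14))            # token ids of a padded question

embed = nn.Embedding(1000, 300)
lstm = nn.LSTM(300, 512, batch_first=True)
_, (q_feat, _) = lstm(embed(question))                # last hidden state as question feature

fusion = nn.Sequential(nn.Linear(2048 + 512, 1024), nn.ReLU(), nn.Linear(1024, 3000))
logits = fusion(torch.cat([img_feat, q_feat.squeeze(0)], dim=1))   # scores over 3000 answers
```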
Abstract: Visual Dialog is one of the prototype tasks introduced in recent years, and can be viewed as multi-round VQA. It aims to give a proper answer based on the visual and textual contents of the dialog. In this seminar, I will give a brief introduction to its definition, datasets, metrics, and methods.
[View Slides]
Abstract: Semantic image segmentation is the most fine-grained of the three core tasks in computer vision; it aims to assign a correct semantic label to every pixel in an image. In this seminar, I will explain the definition of semantic segmentation and introduce the corresponding datasets. I will also analyze and summarize the mainstream semantic segmentation models. Finally, I will report my progress on this task. Your comments and criticism are greatly welcomed.
[View Slides]
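As a minimal illustration of "a label for every pixel", the FCN-style sketch below uses a tiny backbone, a 1x1 classifier, and bilinear upsampling; the backbone and class count are placeholders.

```python
# Minimal FCN-style semantic segmentation sketch: downsample, classify each
# location, upsample back to input resolution, and take a per-pixel argmax.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 21                      # e.g. PASCAL VOC (illustrative)
backbone = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                         nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
classifier = nn.Conv2d(128, num_classes, kernel_size=1)

image = torch.randn(1, 3, 224, 224)
logits = classifier(backbone(image))                                   # (1, 21, 56, 56)
logits = F.interpolate(logits, size=image.shape[-2:], mode='bilinear', align_corners=False)
prediction = logits.argmax(dim=1)                                      # (1, 224, 224) label per pixel
```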
Abstract: Visual grounding is the task of localizing an object in an image based on a natural-language query. It has attracted a lot of attention in recent years. In this seminar, I'll introduce visual grounding, including a) the task definition, b) the datasets, c) a series of mainstream papers, and d) my recent works. Welcome to join, and please feel free to ask any questions.
[View Slides]
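As a rough sketch of one mainstream formulation, grounding can be cast as ranking candidate regions against the query in a shared embedding space; the encoders and sizes below are assumed placeholders.

```python
# Toy grounding-as-ranking sketch: embed the query and each candidate region
# into a shared space and pick the best-matching box; all sizes illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

regions = torch.randn(36, 2048)            # candidate region features
boxes = torch.rand(36, 4)                  # their bounding boxes
query = torch.randint(0, 1000, (1, 6))     # token ids of the referring expression

q_encoder = nn.Sequential(nn.Embedding(1000, 300), nn.Flatten(), nn.Linear(6 * 300, 512))
v_proj = nn.Linear(2048, 512)

q_emb = F.normalize(q_encoder(query), dim=-1)          # (1, 512) query embedding
v_emb = F.normalize(v_proj(regions), dim=-1)           # (36, 512) region embeddings
scores = v_emb @ q_emb.t()                             # cosine similarity per region
best_box = boxes[scores.argmax()]                      # grounded box for the query
```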
Abstract: Visual reasoning aims to answer questions about complicated interactions between visual objects. Existing models can be divided into two categories: holistic approaches and modular approaches. I will introduce typical works from these two categories.
[View Slides]
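To give a flavor of the modular approaches, the toy sketch below composes a few simplified, hypothetical modules (Find, Relate, Answer) over object features; it is not any specific published model.

```python
# Toy sketch of the modular idea: parse the question into a small program
# (Find -> Relate -> Answer) and execute it as a composition of modules.
import torch
import torch.nn as nn

objects = torch.randn(10, 256)                       # features of detected objects (illustrative)

find = nn.Linear(256, 1)                             # Find("red object"): score each object
relate = nn.Linear(512, 1)                           # Relate("left of"): score objects given the found one
answer = nn.Linear(256, 2)                           # Answer: yes/no classifier

attn = torch.softmax(find(objects), dim=0)                     # soft attention over objects
found = (attn * objects).sum(dim=0, keepdim=True)              # (1, 256) attended feature
pair = torch.cat([objects, found.expand(10, 256)], dim=1)      # each object paired with the found one
attn2 = torch.softmax(relate(pair), dim=0)                     # re-attend relative to the found object
logits = answer((attn2 * objects).sum(dim=0, keepdim=True))    # final answer scores
```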
The code is based on Ruotian Luo's implementation of image captioning at https://github.com/ruotianluo/self-critical.pytorch. We use the visual features provided by the paper "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" at https://github.com/peteanderson80/bottom-up-attention. If you like this code, please consider citing their corresponding papers and my CVPR paper.
[View Github]
The code is implemented in PyTorch. We separate the visual question answering and scene graph generation code into two repositories on GitHub. The VQA code is directly modified from the project Cyanogenoid/vqa-counting, and the SGG code is directly modified from the project rowanz/neural-motifs. If you like this work, please cite their corresponding papers and my CVPR paper.
[View VQA Code] [View SGG Code]