

Image Captioning and Visual Question

Published: 2016-12-23

Lecture title: Image Captioning and Visual Question

Speaker: Prof. Qi Wu

Time: 09:00

Date: 2016-12-23

Venue: Academic Seminar Room 522, Section 3, Wenjin Building, Chang'an Campus

Organizer: School of Computer Science, Intelligent Visual Computing Research Team

Abstract: The fields of natural language processing (NLP) and computer vision (CV) have seen great advances in their respective goals of analysing and generating text, and of understanding images and videos. While both fields share a similar set of methods rooted in artificial intelligence and machine learning, they have historically developed separately. Recent years, however, have seen an upsurge of interest in problems that require a combination of linguistic and visual information. For example, Image Captioning and Visual Question Answering (VQA) are two important research topics in this area.

In this talk I will first outline some of the most recent progress, present some theories and techniques for these two Vision-to-Language tasks, and then discuss our recent work. In this work, we first propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement on the state of the art in both image captioning and visual question answering. We further show that the same mechanism can be used to incorporate external knowledge, which is critically important for answering high-level visual questions. Our final model achieves the best reported results on both image captioning and visual question answering on several benchmark datasets.