GVU Center Brown Bag: Visual Dialog: Towards AI Agents That Can See, Talk, and Act


We are witnessing unprecedented advances in computer vision and artificial intelligence (AI). What lies next for AI? We believe that the next generation of intelligent systems (say, the next generation of Google’s Assistant, Facebook’s M, Apple’s Siri, Amazon’s Alexa) will need to possess the ability to ‘perceive’ their environment (through vision, audition, or other sensors), ‘communicate’ (i.e., hold a natural language dialog with humans and other agents), and ‘act’ (e.g., aid humans by executing API calls or commands in a virtual or embodied environment), for tasks such as:

  • Aiding visually impaired users in understanding their surroundings or social media content (AI: ‘John just uploaded a picture from his vacation in Hawaii’, Human: ‘Great, is he at the beach?’, AI: ‘No, on a mountain’).
  • Aiding analysts in making decisions based on large quantities of surveillance data (Human: ‘Did anyone enter this room last week?’, AI: ‘Yes, 27 instances logged on camera’, Human: ‘Were any of them carrying a black bag?’).
  • Interacting with an AI assistant (Human: ‘Alexa – can you see the baby in the baby monitor?’, AI: ‘Yes, I can’, Human: ‘Is he sleeping or playing?’).
  • Robotics applications (e.g., search and rescue missions) where the operator may be ‘situationally blind’ and operating via language (Human: ‘Is there smoke in any room around you?’, AI: ‘Yes, in one room’, Human: ‘Go there and look for people’).
In this talk, I will present a range of projects from my lab (some in collaboration with Prof. Devi Parikh’s lab) towards building such visually grounded conversational agents.

Speaker Bio:

Dhruv Batra is an Assistant Professor in the School of Interactive Computing at Georgia Tech and a Research Scientist at Facebook AI Research (FAIR). His research interests lie at the intersection of machine learning, computer vision, and artificial intelligence, with a focus on developing intelligent systems that can concisely summarize their beliefs about the world with diverse predictions, integrate information and beliefs across different sub-components or ‘modules’ of AI (vision, language, reasoning, dialog), and provide explanations and justifications for why they believe what they believe. In the past, he has also worked on topics such as interactive co-segmentation of large image collections, human body pose estimation, action recognition, depth estimation, and distributed optimization for inference and learning in probabilistic graphical models.

He is a recipient of the Office of Naval Research (ONR) Young Investigator Program (YIP) award (2016), the National Science Foundation (NSF) CAREER award (2014), the Army Research Office (ARO) Young Investigator Program (YIP) award (2014), the Virginia Tech College of Engineering Outstanding New Assistant Professor award (2015), two Google Faculty Research Awards (2013, 2015), an Amazon Academic Research award (2016), a Carnegie Mellon Dean's Fellowship (2007), and several teaching commendations at Virginia Tech. His research is supported by NSF, ARO, ARL, ONR, DARPA, Amazon, Google, Microsoft, and NVIDIA. Research from his lab has been featured in Bloomberg Business, The Boston Globe, MIT Technology Review, Newsweek, WVTF Radio IQ, and a number of popular press magazines and newspapers. From 2013 to 2016, he was an Assistant Professor in the Bradley Department of Electrical and Computer Engineering at Virginia Tech, where he led the VT Machine Learning & Perception group and was a member of the Virginia Center for Autonomous Systems (VaCAS) and the VT Discovery Analytics Center (DAC). From 2010 to 2012, he was a Research Assistant Professor at Toyota Technological Institute at Chicago (TTIC), a philanthropically endowed academic computer science institute located on the University of Chicago campus.

He received his M.S. and Ph.D. degrees from Carnegie Mellon University in 2007 and 2010, respectively, advised by Tsuhan Chen. In the past, he has held visiting positions at the Machine Learning Department at CMU, CSAIL MIT, Microsoft Research, and Facebook AI Research.

Event Details

  • Technology Square Research Building, 1st Floor Ballroom, Atlanta, GA
  • Thursday, August 31, 2017
    11:30 am - 1:00 pm