"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

May 30, 2021

Lecture - Visual Question Answering Based on Image and Video - Thao Minh Le

Visual Question Answering Application

demo -  Link

Pictures and videos are everywhere; words are how humans communicate.




  • Vision + ML + NLP - Intersection of all three fields
  • The Flow
    • Low level image processing
    • Objects and Shapes
    • Object recognition, Relationship between objects
    • Relationships between objects and events
    • Example: pizza, then type of pizza
Applications
  • Visually impaired assistance

  • Video analytics
  • Check a piece of information
  • Open-ended questions
  • Choice-based questions
  • Counting type questions
Next Steps
  • Perception - Reasoning - Multistep reasoning
  • Difficult for a single model to address
  • Obtain knowledge - Form Relationships 
  • Dataset - 1 Million questions
  • Bag of words (BOW) to embed the question
  • BOW + LSTM
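
A minimal sketch of the bag-of-words idea mentioned above, assuming whitespace tokenization and a toy vocabulary built from two made-up questions. The function names `build_vocab` and `bow_vector` are hypothetical and not from the lecture. Each question becomes a vector of word counts; word order is lost, which is one reason BOW is often paired with a sequence model such as an LSTM.

```python
from collections import Counter

def build_vocab(questions):
    """Map each distinct lowercase token to an index, in sorted order."""
    tokens = sorted({tok for q in questions for tok in q.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def bow_vector(question, vocab):
    """Count how often each vocabulary word occurs; unknown words are ignored."""
    counts = Counter(question.lower().split())
    return [counts.get(tok, 0) for tok in vocab]

# Toy questions (illustrative only, not from any real dataset).
questions = [
    "what type of pizza is this",
    "how many pizza slices are there",
]
vocab = build_vocab(questions)
vec = bow_vector("what pizza is this pizza", vocab)
print(vec)
```

Both "what pizza is this pizza" and "pizza pizza this is what" map to the same vector, which shows how the representation discards ordering.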

  • Reasoning - Chaining of relative predicates to arrive at the conclusion
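
As a rough illustration of chaining predicates, here is a symbolic sketch that assumes the scene has already been turned into (subject, predicate, object) triples. The facts and the `follow_chain` helper are hypothetical; the models discussed in the lecture learn this kind of reasoning over neural features rather than explicit triples.

```python
# Hypothetical scene facts as (subject, predicate, object) triples.
FACTS = {
    ("pizza", "on", "plate"),
    ("plate", "on", "table"),
    ("table", "in", "kitchen"),
}

def follow_chain(start, predicates, facts):
    """Follow the predicates one hop at a time; return the final entity or None."""
    current = start
    for pred in predicates:
        matches = [o for (s, p, o) in facts if s == current and p == pred]
        if not matches:
            return None  # the chain breaks: no fact supports this hop
        current = matches[0]
    return current

# "Which room is the thing under the pizza's plate in?" needs three hops.
answer = follow_chain("pizza", ["on", "on", "in"], FACTS)
print(answer)
```

Each hop depends on the result of the previous one, which is what makes multistep questions harder than single-step recognition.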


  • Objects - RCNN
  • Contextual words - Bidirectional LSTM
  • Connect all objects in sequence
  • Semantic similarities representation
  • Relational Reasoning on Visual QA

  • Conditional Relation Network Unit


Every weekend I feel guilty about the gap between the current state of the art in vision and what I am working on, and I wonder when I will bridge that knowledge gap!
