Visual Question Answering Application
demo - Link
Pictures and videos are everywhere; words are how humans communicate
- Vision + ML + NLP - Intersection of all fields
- The Flow
- Low level image processing
- Objects and Shapes
- Object recognition, Relationship between objects
- Relationships between objects and events
- Pizza, Type of pizza
Applications
- Visually impaired assistance
- Video analytics
- Check a piece of information
- Open-ended questions
- Choice-based questions
- Counting type questions
Next Steps
- Perception - Reasoning - Multistep reasoning
- Difficult for a single model to address
- Obtain knowledge - Form Relationships
- Dataset - 1 Million questions
- bag of words to embed
- BOW + LSTM
- Reasoning - Chaining of relative predicates to arrive at the conclusion
- Objects - RCNN
- Contextual words - Bidirectional LSTM
- Connect all objects in sequence
- Semantic similarity representation
- Relational Reasoning on Visual QA
- Conditional Relation Network Unit
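The "bag of words to embed" step in the notes above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the pipeline from the talk: the helper names (`tokenize`, `build_vocab`, `bow_vector`) and the sample questions are assumptions made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(questions):
    """Assign each distinct token an index, in first-seen order."""
    vocab = {}
    for question in questions:
        for token in tokenize(question):
            vocab.setdefault(token, len(vocab))
    return vocab


def bow_vector(question, vocab):
    """Count how often each vocabulary token occurs in the question.

    Tokens that are not in the vocabulary are ignored.
    """
    counts = Counter(tokenize(question))
    vector = [0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:
            vector[vocab[token]] = n
    return vector


# Hypothetical training questions, used only to build a tiny vocabulary.
questions = [
    "What type of pizza is this?",
    "How many slices of pizza are there?",
]
vocab = build_vocab(questions)
vec = bow_vector("Is this pizza?", vocab)
```

A bag-of-words vector records which words occur but discards their order, which is presumably why the notes pair it with an LSTM ("BOW + LSTM") that reads the question as a sequence.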
Every weekend I feel guilty about the gap between the current state of the art in vision and what I am working on. When will I bridge the knowledge gap?