"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

August 30, 2018

Day #124 - Quora Answers Ranking (NLP + ML Analysis)

This post looks at how Quora's answer ranking system works.

Reposting some of the key lines from the article:
Snippet #1
A supervised approach means having a training dataset that is used to extract features from and create a model. Item-wise regression means that the model will give us a numeric score for each answer that we can use to rank them.
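To make "item-wise regression" concrete, here is a minimal sketch. It assumes each answer has already been reduced to a small numeric feature vector and that a training set of human-assigned quality scores exists. The features, labels, and the helper names `fit_linear_regression`, `score`, and `rank_answers` are hypothetical. Plain linear regression fitted by gradient descent stands in for whatever model Quora actually uses; the point is only that the model outputs one number per answer, and sorting by that number gives the ranking.

```python
from typing import Dict, List, Tuple

def fit_linear_regression(X: List[List[float]], y: List[float],
                          lr: float = 0.1, epochs: int = 5000) -> Tuple[List[float], float]:
    """Fit y ~ w.x + b by batch gradient descent on mean squared error."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi  # prediction error
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def score(w: List[float], b: float, x: List[float]) -> float:
    """Item-wise score: one number per answer."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def rank_answers(w: List[float], b: float, answers: Dict[str, List[float]]) -> List[str]:
    """Answer ids sorted by predicted score, best first."""
    return sorted(answers, key=lambda aid: score(w, b, answers[aid]), reverse=True)

# Hypothetical features: [is_supported_with_rationale (0/1), readability (0..1)]
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_train = [0.5, 1.5, 2.5, 3.5]  # hypothetical quality labels
w, b = fit_linear_regression(X_train, y_train)

new_answers = {"a": [1.0, 1.0], "b": [0.0, 0.0], "c": [1.0, 0.0]}
print(rank_answers(w, b, new_answers))  # ['a', 'c', 'b']
```

Because each answer is scored independently of the others, new answers can be scored as they arrive without re-ranking pairs.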

Snippet #2
Features Rating
At Quora we define good answers to have the following five properties:
  • Answers the question that was asked.
  • Provides knowledge that is reusable by anyone interested in the question.
  • Answers that are supported with rationale.
  • Demonstrates credibility and is factually correct.
  • Is clear and easy to read.
Analysis #1
My understanding of their approach:
  • Keywords are probably extracted from the question to identify the features it asks about. These keywords may then be used to weight the answers or to relate each answer back to the question.
They may also use the following data from each answer to score it and rank it against the other answers:
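A minimal keyword-extraction sketch, assuming simple frequency counting after dropping a hand-written stopword list. The article does not say how Quora extracts keywords; `STOPWORDS`, `extract_keywords`, and `keyword_coverage` are hypothetical names. Coverage (the fraction of question keywords that appear in an answer) is one possible way to "relate" an answer to its question.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "how", "why", "to", "of",
             "in", "on", "for", "and", "or", "do", "does", "i", "it", "with"}

def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def extract_keywords(text: str, top_k: int = 5) -> List[str]:
    """Most frequent non-stopword tokens (ties keep first-seen order)."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]

def keyword_coverage(question: str, answer: str) -> float:
    """Fraction of the question's keywords that appear in the answer."""
    q_keys = extract_keywords(question)
    if not q_keys:
        return 0.0
    a_tokens = set(tokenize(answer))
    return sum(1 for k in q_keys if k in a_tokens) / len(q_keys)

q = "How does Python garbage collection work?"
ans = "Python uses reference counting plus a cyclic garbage collector."
print(extract_keywords(q))       # ['python', 'garbage', 'collection', 'work']
print(keyword_coverage(q, ans))  # 0.5
```

Exact token matching misses "collector" vs "collection"; stemming or embeddings would be the usual next step.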
  • Number of Views
  • Number of Upvotes
  • Context (Topic)
  • Number of Comments
  • Score based on the words (features) used
  • Overall score assigned from the signals above
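One way to read the list above is as a weighted sum of per-answer signals. The sketch assumes log-scaling for the count-based signals and hand-picked weights; the `Answer` fields, `WEIGHTS`, and `overall_score` are hypothetical and not taken from the article.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Answer:
    answer_id: str
    views: int
    upvotes: int
    comments: int
    topic_match: float  # 0..1, hypothetical: how well the answer's topic matches the question
    word_score: float   # 0..1, hypothetical: output of a word/feature model

# Hypothetical weights; a real system would more likely learn these than hand-tune them.
WEIGHTS: Dict[str, float] = {
    "views": 0.1, "upvotes": 0.4, "comments": 0.1,
    "topic_match": 0.2, "word_score": 0.2,
}

def overall_score(a: Answer) -> float:
    # log1p dampens raw counts so a single viral answer does not swamp the other signals.
    signals = {
        "views": math.log1p(a.views),
        "upvotes": math.log1p(a.upvotes),
        "comments": math.log1p(a.comments),
        "topic_match": a.topic_match,
        "word_score": a.word_score,
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())

def rank(answers: List[Answer]) -> List[str]:
    """Answer ids sorted by overall score, best first."""
    return [a.answer_id for a in sorted(answers, key=overall_score, reverse=True)]

x = Answer("x", views=1000, upvotes=50, comments=5, topic_match=0.9, word_score=0.8)
y = Answer("y", views=10, upvotes=0, comments=0, topic_match=0.2, word_score=0.1)
print(rank([y, x]))  # ['x', 'y']
```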
Analysis #2
  • For answers on the same topic with matching keywords, match them against existing features and compare them to produce a comparative ranking.
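A minimal sketch of that comparison, assuming the "existing features" are a reference keyword set (for example, keywords from answers already ranked highly on the same topic) and that Jaccard similarity is the matching measure. Both the reference set and the choice of Jaccard are assumptions; `jaccard` and `comparative_rank` are hypothetical names.

```python
from typing import Dict, List, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def comparative_rank(reference: Set[str], answers: Dict[str, Set[str]]) -> List[str]:
    """Rank answers by how closely their keyword sets match the reference feature set."""
    return sorted(answers, key=lambda aid: jaccard(reference, answers[aid]), reverse=True)

# Hypothetical reference features for a topic, plus keyword sets of three answers.
ref = {"python", "garbage", "reference", "cycle"}
answers = {
    "a": {"python", "reference", "cycle"},                            # 3/4 = 0.75
    "b": {"java", "garbage"},                                         # 1/5 = 0.2
    "c": {"python", "garbage", "reference", "cycle", "generational"}, # 4/5 = 0.8
}
print(comparative_rank(ref, answers))  # ['c', 'a', 'b']
```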
Analysis #3
  • I suspect they may also run OCR to extract text from images.
Analysis #4
  • Translating Quora answers into another language might require substantial rework, including building a corpus in the target language.
The article could have shared a sample code snippet alongside the textual definitions.

Happy Learning!!!
