Paper #1 - Recent Trends in Named Entity Recognition (NER)
Key Notes
- 'Named Entity Recognition' refers to identifying named entities such as persons, organizations, and locations in text
- NER belongs to a general class of problems in NLP called sequence tagging (see the BIO-tagging sketch after this list)
- Prominent supervised learning methods - Hidden Markov Models (HMM), Decision Trees, Maximum Entropy Models (ME)
- Unsupervised clustering methods using lexical resources, e.g., WordNet
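A quick illustration of the sequence-tagging view of NER, using the standard BIO labelling scheme; the sentence, tags, and helper function below are made up for illustration and are not from the paper:

```python
# NER as sequence tagging: each token gets a BIO label
# (B-/I- prefix plus entity type, or O for "outside").
tokens = ["Barack", "Obama", "visited", "New",   "York",  "in", "2015", "."]
tags   = ["B-PER",  "I-PER", "O",       "B-LOC", "I-LOC", "O",  "O",    "O"]

def extract_entities(tokens, tags):
    """Collapse BIO tags back into (entity_text, entity_type) spans."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities

print(extract_entities(tokens, tags))
# [('Barack Obama', 'PER'), ('New York', 'LOC')]
```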
Paper #2 - A Survey on Deep Learning for Named Entity Recognition
Key Notes
- Rule-based approaches, which do not need annotated data as they rely on hand-crafted rules
- Unsupervised learning approaches, which rely on unsupervised algorithms
- Feature-based supervised learning approaches, which rely on supervised learning algorithms
- 71% of search queries contain at least one named entity
- Word-level representations - continuous bag-of-words (CBOW) and continuous skip-gram models (see the embedding sketch after this list)
- Commonly used word embeddings include Google Word2Vec, Stanford GloVe, Facebook fastText and SENNA
- CharNER considers a sentence as a sequence of characters and utilizes LSTMs to extract character-level representations (a rough encoder sketch also follows this list)
- Besides word-level and character-level representations, some studies also incorporate additional information (e.g., gazetteers [18], [108], lexical similarity [109], linguistic dependency [110] and visual features [111]) into the final representations of words
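A small sketch of the word-level representations mentioned above: gensim's Word2Vec exposes both architectures via the sg flag (sg=0 for CBOW, sg=1 for skip-gram). The toy corpus and hyperparameters are arbitrary:

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus: a list of tokenized "sentences".
corpus = [
    ["named", "entity", "recognition", "tags", "person", "names"],
    ["word", "embeddings", "map", "words", "to", "dense", "vectors"],
    ["skip", "gram", "predicts", "context", "words", "from", "a", "target", "word"],
]

# sg=0 -> continuous bag-of-words, sg=1 -> continuous skip-gram.
cbow = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["words"].shape)                  # (50,)
print(skipgram.wv.most_similar("words", topn=3))
```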
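And a rough sketch of a CharNER-style character-level encoder: embed each character and run a bidirectional LSTM over the character sequence. This is not the paper's implementation, just the general idea in PyTorch with arbitrary dimensions:

```python
import torch
import torch.nn as nn

class CharEncoder(nn.Module):
    """Treats a sentence as a sequence of characters and returns
    one contextual vector per character."""
    def __init__(self, vocab_size, char_dim=25, hidden_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, char_dim)
        self.lstm = nn.LSTM(char_dim, hidden_dim, batch_first=True,
                            bidirectional=True)

    def forward(self, char_ids):            # (batch, seq_len)
        x = self.embed(char_ids)            # (batch, seq_len, char_dim)
        out, _ = self.lstm(x)               # (batch, seq_len, 2*hidden_dim)
        return out

sentence = "New York"
vocab = {c: i for i, c in enumerate(sorted(set(sentence)))}
char_ids = torch.tensor([[vocab[c] for c in sentence]])
print(CharEncoder(len(vocab))(char_ids).shape)   # torch.Size([1, 8, 100])
```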
Paper #3
Key Notes
- Curated Document Databases (CDD) play an important role in helping researchers find relevant articles in scientific literature
- Document ranking has been extensively used in the context of document retrieval
- Recent work on Learning to Rank (LETOR) has used word embeddings of various kinds as input
- Word embeddings can be learnt from scratch or a pre-trained embedding model can be adopted
- A popular algorithm for generating vector representations of words is GloVe (Global Vectors for Word Representation), an unsupervised learning algorithm that aggregates global word-word co-occurrence statistics (see the loading sketch after this list)
- Semantic document ranking models take into account the context of terms in relation to their neighbouring terms
- Example: the context of the word "bank" disambiguates it as (i) an organisation for investing and borrowing money, (ii) the side of a river or lake, or (iii) a long heap of some substance
- A popular choice of pre-trained contextual model is Bidirectional Encoder Representations from Transformers (BERT); see the "bank" sketch after this list
- An alternative contextual model is Embeddings from Language Models (ELMo)
- A knowledge graph is a collection of vertices and edges, where the vertices represent entities or concepts and the edges represent relationships between entities and/or concepts (a small construction sketch also follows this list)
- OIE4KGC (Open Information Extraction for Knowledge Graph Construction)
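A minimal sketch of adopting pre-trained GloVe vectors rather than learning embeddings from scratch, via gensim's downloader API (assumes network access; "glove-wiki-gigaword-50" is one of the published small pre-trained releases, chosen here only for size):

```python
import gensim.downloader as api

# Downloads (once) and loads pre-trained GloVe vectors as KeyedVectors.
glove = api.load("glove-wiki-gigaword-50")

print(glove["ranking"].shape)                  # (50,)
print(glove.most_similar("document", topn=3))  # nearest neighbours by cosine
```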
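To make the "bank" example concrete, a small sketch with Hugging Face transformers: the same surface form gets different contextual vectors depending on its neighbouring terms. The model name, pooling choice (vector of the word's first sub-token), and cosine comparison are my assumptions, not prescriptions from the paper:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_of(word, sentence):
    """Return the last-layer vector of `word`'s first sub-token."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
    pieces = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[pieces.index(word)]

money_bank = embedding_of("bank", "she deposited cash at the bank")
river_bank = embedding_of("bank", "they sat on the bank of the river")

# Same word, different contexts -> noticeably different vectors.
print(torch.cosine_similarity(money_bank, river_bank, dim=0))
```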
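And a tiny sketch of the knowledge-graph definition above: vertices as entities/concepts, edges as relations, built from made-up (subject, relation, object) triples in the spirit of open information extraction (networkx is used here purely for illustration):

```python
import networkx as nx

# Illustrative OIE-style triples, not extracted from any real corpus.
triples = [
    ("BERT", "is_a", "language model"),
    ("BERT", "used_for", "document ranking"),
    ("GloVe", "is_a", "word embedding"),
]

kg = nx.DiGraph()
for subj, rel, obj in triples:
    kg.add_edge(subj, obj, relation=rel)   # vertices = entities, edges = relations

print(kg.nodes())
print(kg.edges(data=True))
```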
More Reads
- Ad-hoc retrieval with BERT
- Multi-Stage Document Ranking with BERT
- Pretrained Transformers for Text Ranking: BERT and Beyond
- Composite Re-Ranking for Efficient Document Search with BERT
- CEDR: Contextualized Embeddings for Document Ranking
Code Experiments
Keep Exploring!!!