Paper #1 - Recent Trends in Named Entity Recognition (NER)
Key Notes
- 'Named Entity Recognition' refers to identifying named entities such as persons, organizations, and locations in text
- NER belongs to a general class of problems in NLP called sequence tagging (see the BIO-tagging sketch after this list)
- Prominent supervised learning methods - Hidden Markov Models (HMM), Decision Trees, Maximum Entropy Models (ME)
- Unsupervised clustering methods using lexical resources, e.g., WordNet
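A quick illustration of the sequence-tagging view of NER, using the standard BIO labelling scheme; the sentence, tags, and helper function below are made up for illustration and are not from the paper:

```python
# NER as sequence tagging: each token gets a BIO label
# (B-/I- prefix plus entity type, or O for "outside").
tokens = ["Barack", "Obama", "visited", "New",   "York",  "in", "2015", "."]
tags   = ["B-PER",  "I-PER", "O",       "B-LOC", "I-LOC", "O",  "O",    "O"]

def extract_entities(tokens, tags):
    """Collapse BIO tags back into (entity_text, entity_type) spans."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities

print(extract_entities(tokens, tags))
# [('Barack Obama', 'PER'), ('New York', 'LOC')]
```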
Paper #2 - A Survey on Deep Learning for Named Entity Recognition
Key Notes
- Rule-based approaches, which do not need annotated data as they rely on hand-crafted rules
- Unsupervised learning approaches, which rely on unsupervised algorithms
- Feature-based supervised learning approaches, which rely on supervised learning algorithms
- 71% of search queries contain at least one named entity
- Word-level representations - continuous bag-of-words (CBOW) and continuous skip-gram models (see the embedding sketch after this list)
- Commonly used word embeddings include Google Word2Vec, Stanford GloVe, Facebook fastText and SENNA
- CharNER considers a sentence as a sequence of characters and utilizes LSTMs to extract character-level representations (a rough encoder sketch also follows this list)
- Besides word-level and character-level representations, some studies also incorporate additional information (e.g., gazetteers [18], [108], lexical similarity [109], linguistic dependency [110] and visual features [111]) into the final representations of words
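A small sketch of the word-level representations mentioned above: gensim's Word2Vec exposes both architectures via the sg flag (sg=0 for CBOW, sg=1 for skip-gram). The toy corpus and hyperparameters are arbitrary:

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus: a list of tokenized "sentences".
corpus = [
    ["named", "entity", "recognition", "tags", "person", "names"],
    ["word", "embeddings", "map", "words", "to", "dense", "vectors"],
    ["skip", "gram", "predicts", "context", "words", "from", "a", "target", "word"],
]

# sg=0 -> continuous bag-of-words, sg=1 -> continuous skip-gram.
cbow = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["words"].shape)                  # (50,)
print(skipgram.wv.most_similar("words", topn=3))
```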
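And a rough sketch of a CharNER-style character-level encoder: embed each character and run a bidirectional LSTM over the character sequence. This is not the paper's implementation, just the general idea in PyTorch with arbitrary dimensions:

```python
import torch
import torch.nn as nn

class CharEncoder(nn.Module):
    """Treats a sentence as a sequence of characters and returns
    one contextual vector per character."""
    def __init__(self, vocab_size, char_dim=25, hidden_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, char_dim)
        self.lstm = nn.LSTM(char_dim, hidden_dim, batch_first=True,
                            bidirectional=True)

    def forward(self, char_ids):            # (batch, seq_len)
        x = self.embed(char_ids)            # (batch, seq_len, char_dim)
        out, _ = self.lstm(x)               # (batch, seq_len, 2*hidden_dim)
        return out

sentence = "New York"
vocab = {c: i for i, c in enumerate(sorted(set(sentence)))}
char_ids = torch.tensor([[vocab[c] for c in sentence]])
print(CharEncoder(len(vocab))(char_ids).shape)   # torch.Size([1, 8, 100])
```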
Paper #3
Key Notes
- Curated Document Databases (CDD) play an important role in helping researchers find relevant articles in scientific literature
- Document ranking has been extensively used in the context of document retrieval
- Recent work on Learning to Rank (LETOR) has used word embeddings of various kinds as input
- Word embeddings can be learnt from scratch or a pre-trained embedding model can be adopted
- A popular algorithm for generating vector representations of words is GloVe (Global Vectors for Word Representation), an unsupervised learning algorithm that aggregates global word-word co-occurrence statistics (see the loading sketch after this list)
- Semantic document ranking models take into account the context of terms in relation to their neighbouring terms
- Example: the context of the word "bank" disambiguates it as (i) an organisation for investing and borrowing money, (ii) the side of a river or lake, or (iii) a long heap of some substance
- A popular choice of pre-trained contextual model is Bidirectional Encoder Representations from Transformers (BERT); see the "bank" sketch after this list
- An alternative contextual model is Embeddings from Language Models (ELMo)
- A knowledge graph is a collection of vertices and edges, where the vertices represent entities or concepts and the edges represent relationships between entities and/or concepts (a small construction sketch also follows this list)
- OIE4KGC (Open Information Extraction for Knowledge Graph Construction)
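A minimal sketch of adopting pre-trained GloVe vectors rather than learning embeddings from scratch, via gensim's downloader API (assumes network access; "glove-wiki-gigaword-50" is one of the published small pre-trained releases, chosen here only for size):

```python
import gensim.downloader as api

# Downloads (once) and loads pre-trained GloVe vectors as KeyedVectors.
glove = api.load("glove-wiki-gigaword-50")

print(glove["ranking"].shape)                  # (50,)
print(glove.most_similar("document", topn=3))  # nearest neighbours by cosine
```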
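To make the "bank" example concrete, a small sketch with Hugging Face transformers: the same surface form gets different contextual vectors depending on its neighbouring terms. The model name, pooling choice (vector of the word's first sub-token), and cosine comparison are my assumptions, not prescriptions from the paper:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_of(word, sentence):
    """Return the last-layer vector of `word`'s first sub-token."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
    pieces = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[pieces.index(word)]

money_bank = embedding_of("bank", "she deposited cash at the bank")
river_bank = embedding_of("bank", "they sat on the bank of the river")

# Same word, different contexts -> noticeably different vectors.
print(torch.cosine_similarity(money_bank, river_bank, dim=0))
```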
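And a tiny sketch of the knowledge-graph definition above: vertices as entities/concepts, edges as relations, built from made-up (subject, relation, object) triples in the spirit of open information extraction (networkx is used here purely for illustration):

```python
import networkx as nx

# Illustrative OIE-style triples, not extracted from any real corpus.
triples = [
    ("BERT", "is_a", "language model"),
    ("BERT", "used_for", "document ranking"),
    ("GloVe", "is_a", "word embedding"),
]

kg = nx.DiGraph()
for subj, rel, obj in triples:
    kg.add_edge(subj, obj, relation=rel)   # vertices = entities, edges = relations

print(kg.nodes())
print(kg.edges(data=True))
```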
More Reads
- Ad-hoc retrieval with BERT
- Multi-Stage Document Ranking with BERT
- Pretrained Transformers for Text Ranking: BERT and Beyond
- Composite Re-Ranking for Efficient Document Search with BERT
- CEDR: Contextualized Embeddings for Document Ranking
Code Experiments
Keep Exploring!!!