"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;
Showing posts with label NER. Show all posts

February 26, 2023

NLP - NER - Entity Recognition

I have worked on NLP and custom NER examples. This Streamlit demo covers NER in a domain context and across multiple entity types - Link






  • Named Entity Recognition - Extract Organizations, People, Locations, and many other entities from long, free-text financial documents.
  • Extract Financial Relationships - Automatically identify relationships between companies, products, and people – even when they are mentioned using aliases.
  • Classify Financial Text - Classify texts into 77 banking-related categories like credit reports, mortgages, money transfers and more.
  • Financial Sentiment Analysis - Identify positive, negative or neutral sentiments in financial news.
  • Financial De-identification - De-identify and mask sensitive personal information in documents and images.
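As a rough illustration of the de-identification item above, here is a minimal rule-based sketch. It is not how the demo works (that is presumably model-based); the two regex patterns, the `mask` helper, and the sample sentence are all assumptions made up for this example.

```python
import re

# Hypothetical patterns for illustration only; real de-identification
# needs far broader coverage (names, addresses, account numbers, ...).
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def mask(text):
    """Replace every match of each pattern with a <LABEL> placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


masked = mask("Contact jane.doe@example.com or 555-010-1234 about the loan.")
print(masked)  # Contact <EMAIL> or <PHONE> about the loan.
```

Rules like these are easy to audit but miss anything they were not written for, which is why NER models are usually layered on top.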

Ref - Link

Keep Exploring!!!

February 22, 2023

NLP, Recruitment, AI - Automated Matching :) Eightfold - Analysis

A ton of zero-shot / few-shot / data labeling to customize and build models for

Some of the Key features / NLP overlaps

Custom Entity Recognition

  • Education
  • Place
  • Company 
  • Domain
  • Skills Extraction

Custom Embedding for

  • Similar Projects Search
  • Project Summarization
  • Creating custom embedding for each domain

NLP + Ranking

  • Ranking and retrieving based on location/salary/education
  • Skill Distribution / Contribution
  • Domain Extraction from Company names

Vision

  • OCR + Vision for Content Retrieval

Search / Retrieval

  • Vector database for indexing embeddings (including fine-tuned ones) to search and retrieve the closest matches
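The closest-match retrieval idea can be sketched without any vector database at all. This is a brute-force version, assuming tiny hand-made 3-dimensional embeddings; the `index` contents, `cosine_similarity`, and `top_k` are hypothetical names for illustration, and a real system would use learned embeddings plus an approximate nearest-neighbour index.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k (key, score) pairs most similar to the query vector."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings for three job profiles (made-up numbers).
index = {
    "backend engineer": [0.9, 0.1, 0.0],
    "data scientist": [0.2, 0.9, 0.1],
    "ml engineer": [0.6, 0.7, 0.1],
}
query = [0.5, 0.8, 0.1]  # hypothetical embedding of a candidate profile
matches = top_k(query, index, k=2)
print([key for key, _ in matches])  # ['ml engineer', 'data scientist']
```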

From JD

  • Familiar with language models and transformers such as BERT, GPT-3, T5, etc.
  • Prior experience building and deploying machine learning models in production at scale
  • Familiarity with MLOps tools and pipelines (MLflow, Metaflow).
  • Integration with Workday, SuccessFactors, Taleo, PeopleSoft, iCIMS, SmartRecruiters,
  • REST APIs, microservices, data ingestion and processing systems, and distributed systems.



September 03, 2022

Conditional Random Fields - NER Notes

  • NER - A named entity is a “real-world object” that’s assigned a name – for example, a person, a country, a product, or a book title. 
  • CRF can take context into account.
  • In a linear-chain CRF, each label prediction depends directly only on its immediate neighbors in the sequence.
  • A CRF models the conditional probability of the label sequence Y given the input X; training fits the model parameters
  • A CRF learns transition scores that account for the likelihood of observing each transition between labels in the sequence
  • CRF is a discriminative approach; it learns weights for both likely and unlikely transitions
  • A discriminative model models the decision boundary between the classes
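To make the notes above concrete, here is a toy linear-chain CRF with hand-set scores. Everything in it (the two labels, the `EMISSION` and `TRANSITION` numbers, the function names) is an assumption for illustration; a real CRF learns these weights from labeled data. It shows the conditional probability of a label sequence and Viterbi decoding of the best one.

```python
import itertools
import math

LABELS = ["O", "PER"]
TOKENS = ["Linda", "Wilkinson", "writes"]

# Hypothetical emission scores: how well each label fits token t.
EMISSION = [
    {"O": 0.2, "PER": 1.5},
    {"O": 0.8, "PER": 1.0},
    {"O": 2.0, "PER": 0.1},
]
# Hypothetical transition scores between neighboring labels.
TRANSITION = {
    ("O", "O"): 0.5, ("O", "PER"): 0.0,
    ("PER", "PER"): 1.0, ("PER", "O"): 0.3,
}


def sequence_score(labels):
    """Sum of emission scores plus transitions between adjacent labels."""
    score = sum(EMISSION[t][lab] for t, lab in enumerate(labels))
    score += sum(TRANSITION[(a, b)] for a, b in zip(labels, labels[1:]))
    return score


def conditional_probability(labels):
    """P(Y | X) = exp(score(Y)) / Z, with Z summed over all label sequences."""
    all_sequences = itertools.product(LABELS, repeat=len(TOKENS))
    z = sum(math.exp(sequence_score(seq)) for seq in all_sequences)
    return math.exp(sequence_score(labels)) / z


def viterbi():
    """Best label sequence via dynamic programming over neighbor transitions."""
    best = {lab: (EMISSION[0][lab], [lab]) for lab in LABELS}
    for t in range(1, len(TOKENS)):
        step = {}
        for cur in LABELS:
            prev = max(LABELS, key=lambda p: best[p][0] + TRANSITION[(p, cur)])
            score = best[prev][0] + TRANSITION[(prev, cur)] + EMISSION[t][cur]
            step[cur] = (score, best[prev][1] + [cur])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


best_path = viterbi()
print(best_path)  # ['PER', 'PER', 'O']
```

The brute-force normalizer `Z` only works for toy inputs; practical implementations compute it with the forward algorithm.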

Ref - Link

Feature Functions - Notes

Ref - Link


Ref - Link


Ref - Link

NER Approaches


Keep Exploring!!!

October 09, 2021

NLP - NER - Papers

Paper #1 - Recent Trends in Named Entity Recognition (NER)

Key Notes

  • ‘Named Entity Recognition’ refers to identifying entities such as persons, organizations, and locations in text
  • NER belongs to a general class of problems in NLP called sequence tagging 
  • Prominent supervised learning methods - Hidden Markov Models (HMM), Decision Trees, Maximum Entropy Models (ME)

  • Unsupervised clustering method using lexical resources eg. Wordnet
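Since NER is framed above as sequence tagging, a small sketch of what the tagger's output looks like may help. It assumes the common BIO tagging convention (B- begins an entity, I- continues it, O is outside), which the notes do not name; the sentence, the tags, and `bio_to_spans` are made up for illustration.

```python
def bio_to_spans(tokens, tags):
    """Collect (entity_type, text) spans from per-token BIO tags.

    A stray I- tag that does not continue an open entity of the same
    type is treated like O and dropped.
    """
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical sentence with hand-written tags.
tokens = ["Linda", "Wilkinson", "joined", "Acme", "Corp", "in", "Paris"]
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O", "B-LOC"]
spans = bio_to_spans(tokens, tags)
print(spans)  # [('PER', 'Linda Wilkinson'), ('ORG', 'Acme Corp'), ('LOC', 'Paris')]
```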

Paper #2 - A Survey on Deep Learning for Named Entity Recognition

Key Notes

  • Rule-based approaches, which do not need annotated data as they rely on hand-crafted rules
  • Unsupervised learning approaches, which rely on unsupervised algorithms
  • Feature-based supervised learning approaches, which rely on supervised learning algorithms

  • 71% of search queries contain at least one named entity



  • Word-level representation - continuous bag-of-words (CBOW) and continuous skip-gram models
  • Commonly used word embeddings include Google Word2Vec, Stanford GloVe, Facebook fastText and SENNA.
  • CharNER considers a sentence as a sequence of characters and utilizes LSTMs to extract character-level representations.
  • Besides word-level and character-level representations, some studies also incorporate additional information (e.g., gazetteers [18], [108], lexical similarity [109], linguistic dependency [110] and visual features [111]) into the final representations of words
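The difference between CBOW and skip-gram is easiest to see in the training examples each one builds from a sentence. The sketch below only generates those examples (no model is trained); the function names, the window size, and the three-word sentence are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: each center word predicts each word in its context window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: the surrounding context words jointly predict the center word."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, target))
    return examples


sentence = ["banks", "lend", "money"]
sg = skipgram_pairs(sentence, window=1)
cb = cbow_examples(sentence, window=1)
print(sg)  # [('banks', 'lend'), ('lend', 'banks'), ('lend', 'money'), ('money', 'lend')]
print(cb)  # [(['lend'], 'banks'), (['banks', 'money'], 'lend'), (['lend'], 'money')]
```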



Paper #3 - Document Ranking for Curated Document Databases using BERT and Knowledge Graph Embeddings: Introducing GRAB-Rank

Key Notes

  • Curated Document Databases (CDD) play an important role in helping researchers find relevant articles in scientific literature
  • Document ranking has been extensively used in the context of document retrieval
  • Recent work on Learning to Rank (LETOR) has used word embeddings of various kinds as the input
  • Word embeddings can be learnt from scratch or a pre-trained embedding model can be adopted
  • A popular algorithm for generating vector representations of words is GloVe (Global Vectors for Word Representation), an unsupervised learning algorithm that operates by aggregating global word-word co-occurrence statistics
  • Semantic document ranking models take into account the context of terms in relation to their neighbouring terms
  • For example, depending on context the word “bank” can mean: (i) an organisation for investing and borrowing money, (ii) the side of a river or lake, (iii) a long heap of some substance
  • A popular choice of pre-trained contextual model is the Bidirectional Encoder Representations from Transformers (BERT)
  • An alternative contextual model that can be used is Embeddings from Language Models (ELMo)
  • A knowledge graph is a collection of vertices and edges where the vertices represent entities or concepts, and the edges represent a relationship between entities and/or concepts. 

  • OIE4KGC (Open Information Extraction for Knowledge Graph Construction)
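The vertices-and-edges definition of a knowledge graph above can be sketched as a store of (head, relation, tail) triples. This is a minimal in-memory version, not how GRAB-Rank or OIE4KGC builds its graph; the `KnowledgeGraph` class, the relation names, and the example triples are assumptions for illustration.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Vertices are entities or concepts; edges are labeled relations."""

    def __init__(self):
        # head -> list of (relation, tail)
        self._edges = defaultdict(list)

    def add(self, head, relation, tail):
        self._edges[head].append((relation, tail))

    def neighbors(self, head, relation=None):
        """Tails reachable from head, optionally filtered by relation."""
        return [
            tail
            for rel, tail in self._edges.get(head, [])
            if relation is None or rel == relation
        ]


kg = KnowledgeGraph()
# Hypothetical triples built from the notes in this post.
kg.add("BERT", "is_a", "contextual model")
kg.add("BERT", "based_on", "Transformer")
kg.add("ELMo", "is_a", "contextual model")
kg.add("GloVe", "is_a", "word embedding")

print(kg.neighbors("BERT"))              # ['contextual model', 'Transformer']
print(kg.neighbors("BERT", "based_on"))  # ['Transformer']
```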

More Reads