"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

September 25, 2022

Document Q&A

From OCR to document extraction and understanding, Hugging Face has come a long way :)

The document question answering pipeline is very impressive.




# Install packages in Colab (uncomment to run)
# !apt install tesseract-ocr
# !apt install libtesseract-dev
# !pip install Pillow
# !pip install pytesseract
# !pip install transformers

from transformers import pipeline

# You can use an HTTP link, a local path, or a PIL.Image object
img_path = "https://huggingface.co/spaces/impira/docquery/resolve/main/contract.jpeg"

# No model is named here, so the pipeline falls back to its default checkpoint
pipe = pipeline("document-question-answering")

# Ask questions about the document image; print so both results show in a notebook
print(pipe(image=img_path, question="what is the purchase amount?"))
print(pipe(image=img_path, question="what is the industry name?"))
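
Each call to the pipeline should come back as a list of candidate answers, each a dict with a `score` and an `answer` (plus `start`/`end` word positions). Here is a minimal sketch for pulling out the top answer, assuming that output shape. `best_answer` is a hypothetical helper of my own, not part of transformers, and the sample list is hand-written placeholder data, not real model output.

```python
from typing import Any, Dict, List, Optional


def best_answer(results: List[Dict[str, Any]]) -> Optional[str]:
    """Return the answer text with the highest score, or None if there are no candidates."""
    if not results:
        return None
    top = max(results, key=lambda r: r["score"])
    return top["answer"]


# Hand-written sample in the assumed output shape (placeholder values only)
sample = [
    {"score": 0.12, "answer": "placeholder A", "start": 3, "end": 4},
    {"score": 0.87, "answer": "placeholder B", "start": 10, "end": 11},
]

print(best_answer(sample))
```

In the notebook you would pass the pipeline output straight in, e.g. `best_answer(pipe(image=img_path, question="what is the purchase amount?"))`.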

Results


Keep Exploring!!!

  • TesserOCR
  • MMOCR
  • OCRmyPDF
  • EasyOCR
  • PaddleOCR
  • Kraken
  • OCRopus
  • PyOCR
  • Tesseract


Keep Learning!!!
