Mobile Multimedia Interaction
The new video on how to improve OCR and get better formatting, including keeping the indentation: https://youtu.be/jEYQqLtK_Xw
Love it, very basic steps to start working with images and OCR. We can build off this; not sure why so many are downvoting the video. Nowhere does the creator state this will do your entire project and handle all types of issues…
Extract tabular data from PDF with Python – Tabula, Camelot, PyPDF2: https://www.youtube.com/watch?v=702lkQbZx50
Easily extract tables from websites with pandas and Python: https://www.youtube.com/watch?v=OXA_ZD1gR6A
Easily extract information from Excel with Python and pandas: https://www.youtube.com/watch?v=hJMH_1o8eU0
This is the new video about how to extract tabular data with Python:
Extract tabular data from PDF with Python – Tabula, Camelot, PyPDF2: https://youtu.be/702lkQbZx50
Feel free to ask questions!
Install tesseract-ocr: sudo apt-get install tesseract-ocr
Install Pillow and pytesseract (used to connect to tesseract-ocr): pip install pillow pytesseract
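from PIL import Image
import pytesseract

# Open the image and run it through Tesseract with the English language data
image = Image.open("/home/user/sample.png")
# Named "text" rather than "str" so the built-in str type is not shadowed
text = pytesseract.image_to_string(image, lang='eng')
print(text)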
The article is updated with tips on how to improve the OCR processing:
* Use white color themes (dark text on white background)
* Scale the image to the optimal size
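The two tips above can be sketched with Pillow alone. This is only a rough illustration, not the method from the article or the video: the function name prepare_for_ocr, the default scale factor of 2.0 and the brightness threshold of 128 are all assumptions made for this example, and the best values depend on your images.

```python
from PIL import Image, ImageOps, ImageStat


def prepare_for_ocr(image, scale=2.0, dark_threshold=128):
    """Return a grayscale, dark-on-light, rescaled copy of `image`.

    `scale` and `dark_threshold` are illustrative defaults, not tuned values.
    """
    gray = image.convert("L")

    # A low mean brightness suggests light text on a dark background,
    # so invert to get dark text on a light background.
    if ImageStat.Stat(gray).mean[0] < dark_threshold:
        gray = ImageOps.invert(gray)

    # Resize by the given factor; LANCZOS is a high-quality resampling filter.
    new_size = (
        max(1, round(gray.width * scale)),
        max(1, round(gray.height * scale)),
    )
    return gray.resize(new_size, Image.LANCZOS)
```

The returned image can be passed to pytesseract.image_to_string in place of the original one. Trying a few scale values on a sample image and comparing the recognized text is probably the simplest way to find a size that works.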
Good article on OCR efficiency: https://docparser.com/blog/improve-ocr-accuracy/
A new video will be made about extracting tables from PDF and improving OCR.