  1. The new video shows how to improve OCR and get better formatting, including preserving the indentation.
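
The video's exact method isn't spelled out in this comment, so the following is only a rough sketch of one way indentation and column spacing might be kept. It assumes Tesseract's `preserve_interword_spaces` setting and page segmentation mode 6 (a single uniform block of text) suit the image; `layout_config` and `image_to_layout_text` are hypothetical helper names.

```python
def layout_config(psm=6, keep_spaces=True):
    """Build a Tesseract config string aimed at keeping the text layout."""
    options = [f"--psm {psm}"]
    if keep_spaces:
        # Ask Tesseract not to collapse runs of spaces between words.
        options.append("-c preserve_interword_spaces=1")
    return " ".join(options)


def image_to_layout_text(path, lang="eng"):
    """Run OCR on the image at `path` using the layout-friendly config."""
    # Imported here so layout_config() can be used without these packages.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(
        Image.open(path), lang=lang, config=layout_config()
    )
```

Results depend heavily on the image, so it is worth comparing the output with and without the extra option.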

  2. Love it: very basic steps to start working with images and OCR. We can build off this. Not sure why so many are downvoting the video. Nowhere does the creator state that this will do your entire project and handle all types of issues…

  4. This is the new video about how to extract tabular data with Python:

    Feel free to ask questions!
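
For readers who want something to experiment with before watching, here is a minimal sketch of one possible approach to tabular data; it is not necessarily the one used in the video. It assumes the table is clean enough that Tesseract's own line detection matches the table rows. `rows_from_ocr_data` and `table_rows_from_image` are hypothetical helper names; `pytesseract.image_to_data` and `pytesseract.Output.DICT` are the real pytesseract calls.

```python
from collections import defaultdict


def rows_from_ocr_data(data, min_conf=0):
    """Group OCR words into rows, each ordered left to right.

    `data` is a dict of parallel lists, like the one returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    rows = defaultdict(list)
    for i, word in enumerate(data["text"]):
        # Skip empty entries and words below the confidence threshold.
        if not word.strip() or float(data["conf"][i]) < min_conf:
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        rows[key].append((data["left"][i], word))
    # Sort lines top to bottom by their ids, and words by x position.
    return [[w for _, w in sorted(cells)] for _, cells in sorted(rows.items())]


def table_rows_from_image(path):
    """Run OCR on the image at `path` and return its words grouped by row."""
    # Imported here so rows_from_ocr_data() works without these packages.
    from PIL import Image
    import pytesseract

    data = pytesseract.image_to_data(
        Image.open(path), output_type=pytesseract.Output.DICT
    )
    return rows_from_ocr_data(data)
```

Each returned row is a list of words, so multi-word cells would still need to be merged, for example by looking at the gaps between the `left` positions.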

  5. Install tesseract-ocr:
    sudo apt-get install tesseract-ocr

    Install Pillow and pytesseract (the Python wrapper used to call tesseract-ocr):
    pip install pillow
    pip install pytesseract

    The code:

    from PIL import Image
    import pytesseract

    # Open the image and run OCR on it with the English language model
    image = Image.open("/home/user/sample.png")
    text = pytesseract.image_to_string(image, lang='eng')
    print(text)


  6. The article is updated with tips on how to improve the OCR processing:
    * Use a light color theme (dark text on a white background)
    * Scale the image to the optimal size
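
The two tips above can be sketched roughly as follows. This is only an illustration under stated assumptions: the 300 DPI target is a commonly cited figure rather than something from the article, the default of 72 DPI for the source image is a guess, and `scale_factor` and `prepare_for_ocr` are hypothetical helper names.

```python
def scale_factor(current_dpi, target_dpi=300):
    """Return how much to enlarge an image so it reaches target_dpi."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    # Never shrink: downscaling throws away detail the OCR engine could use.
    return max(1.0, target_dpi / current_dpi)


def prepare_for_ocr(image, current_dpi=72):
    """Return a grayscale, dark-on-light, upscaled copy of a PIL image."""
    # Imported here so scale_factor() can be used without Pillow installed.
    from PIL import Image, ImageOps, ImageStat

    gray = image.convert("L")
    # A low average brightness suggests light text on a dark background.
    if ImageStat.Stat(gray).mean[0] < 128:
        gray = ImageOps.invert(gray)
    factor = scale_factor(current_dpi)
    new_size = (round(gray.width * factor), round(gray.height * factor))
    return gray.resize(new_size, Image.LANCZOS)
```

The result can be passed to `pytesseract.image_to_string` in place of the original image. The brightness threshold of 128 is a simple heuristic and may need tuning for screenshots with mixed themes.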

    A good article on improving OCR accuracy: https://docparser.com/blog/improve-ocr-accuracy/

    A new video will be made about extracting tables from PDF and improving OCR.

