6 thoughts on “python extract text from image or pdf”

  1. The new video on how to improve OCR and get better formatting, including the indentation

  2. Love it: very basic steps to start working with images and OCR. We can build off this; not sure why so many are downvoting the video. Nowhere does the creator state this will do your entire project and handle all types of issues…

  3. Extract tabular data from PDF with Python – Tabula, Camelot, PyPDF2
    Easily extract tables from websites with pandas and python
    Easily extract information from excel with Python and Pandas

  4. This is the new video about how to extract tabular data with Python:

    Extract tabular data from PDF with Python – Tabula, Camelot, PyPDF2

    Feel free to ask questions!

  5. Install tesseract-ocr:
    sudo apt-get install tesseract-ocr

    Install Pillow and pytesseract (the Python wrapper used to call tesseract-ocr):
    pip install pillow
    pip install pytesseract

    The code:

    from PIL import Image
    import pytesseract

    # Avoid the names `file` and `str`; `str` shadows the built-in type
    image = Image.open("/home/user/sample.png")
    text = pytesseract.image_to_string(image, lang='eng')
    print(text)


  6. The article has been updated with tips on how to improve the OCR processing:
    * Use a light color theme (dark text on a white background)
    * Scale the image to the optimal size

    A good article on OCR accuracy: https://docparser.com/blog/improve-ocr-accuracy/

    A new video is planned on extracting tables from PDFs and improving OCR.
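
The two tips in the last comment (dark text on a white background, and scaling the image) could be sketched with Pillow roughly as below. This is only an illustration, not the article's code: the helper names `prepare_for_ocr` and `ocr_file`, the brightness cutoff of 128 and the `min_height` of 1000 pixels are all assumptions picked for the example, so tune them for your own images.

```python
from PIL import Image, ImageOps, ImageStat


def prepare_for_ocr(img, min_height=1000):
    """Return a grayscale, dark-on-light, upscaled copy of img.

    min_height is an illustrative guess, not a tuned or recommended value.
    """
    gray = img.convert("L")

    # A mean brightness below mid-gray suggests light text on a dark
    # background, so invert to get dark text on a light background.
    if ImageStat.Stat(gray).mean[0] < 128:
        gray = ImageOps.invert(gray)

    # Upscale small images, keeping the aspect ratio.
    if gray.height < min_height:
        scale = min_height / gray.height
        new_size = (round(gray.width * scale), min_height)
        gray = gray.resize(new_size, Image.LANCZOS)

    return gray


def ocr_file(path, lang="eng"):
    """Hypothetical wrapper: preprocess the image, then run Tesseract."""
    # Imported here so prepare_for_ocr can be used without Tesseract installed.
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(prepare_for_ocr(img), lang=lang)
```

Usage would be something like `print(ocr_file("/home/user/sample.png"))`. The mean-brightness check is a crude heuristic and can guess wrong on images that are mostly dark pictures rather than text.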
