OCR a PDF file
Super simple solution.
- Install tesseract and the corresponding data package for your language.
- Install ocrmypdf, either from the AUR or with pip.
If your document is scanned/has no text layer, run it as
ocrmypdf input.pdf output_with_text.pdf
If it does have text, it doesn't always merge cleanly. You can use --force-ocr,
but that rasterizes the file and makes it massive.
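The two cases above can be wrapped in a small shell function. This is only a sketch: the name ocr_pdf, its argument order and the "force" keyword are made up for illustration and are not part of ocrmypdf. It assumes ocrmypdf is on your PATH and that the tesseract data package for the chosen language code (for example eng or deu) is installed.

```shell
# Hypothetical helper, not part of ocrmypdf.
# Usage: ocr_pdf INPUT OUTPUT [LANG] [force]
#   LANG  - tesseract language code, defaults to eng
#   force - pass the literal word "force" to re-OCR a file that already has text
ocr_pdf() {
    src=$1
    dst=$2
    lang=${3:-eng}
    mode=${4:-}
    if [ "$mode" = "force" ]; then
        # Rasterizes every page, so expect a much larger output file.
        ocrmypdf -l "$lang" --force-ocr "$src" "$dst"
    else
        # Plain run, for scans with no text layer.
        ocrmypdf -l "$lang" "$src" "$dst"
    fi
}
```

For a plain scan, call it as ocr_pdf input.pdf output_with_text.pdf. For a file that already has a text layer, ocr_pdf input.pdf output_with_text.pdf eng force.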
scripts/files/ocr_pdf_file.txt · Last modified: by Tony