Theresa Arzadon-Labajo

Tesseract with Russian

Posted by Theresa Arzadon-Labajo (tarzadon) on Nov 19 2009
The main Tesseract-OCR Project on Google Code doesn't contain the Russian language files.

They are available from the tesseract-ocr-russian Google Group.



Copy or move the files from the rus directory into the tessdata directory of your installation.  In my case, that is /usr/local/share/tessdata.
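
As a rough sketch of the copy step, something like the following could work. The helper name install_rus_data is made up for illustration, and both paths are assumptions: rus is wherever you unpacked the download, and the tessdata location may differ on your system.

```shell
# Sketch only: copy the Russian language files into tessdata.
# install_rus_data is a hypothetical helper, not part of Tesseract.
install_rus_data() {
    src_dir="${1:-rus}"                              # assumed unpack location
    tessdata_dir="${2:-/usr/local/share/tessdata}"   # assumed install location

    # Refuse to copy into a directory that does not exist.
    if [ ! -d "$tessdata_dir" ]; then
        echo "tessdata directory not found: $tessdata_dir" >&2
        return 1
    fi

    cp "$src_dir"/* "$tessdata_dir"/
}

# Example call (a system-wide tessdata directory may need root):
# install_rus_data rus /usr/local/share/tessdata
```

The directory check is there so that a mistyped path fails loudly instead of leaving the language files somewhere Tesseract will not look.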

Prepare your TIFF files and make sure the contrast is high enough that no remnants or specks are left on the page.
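
One possible way to do the cleanup from the command line is with ImageMagick, which is my assumption here and not something Tesseract requires. The helper name prepare_tiff is made up, and the 50% threshold is only a starting point to tune per scan.

```shell
# Sketch only: clean up a scan before OCR using ImageMagick's convert.
# prepare_tiff is a hypothetical helper; ImageMagick is an assumed tool.
prepare_tiff() {
    input="$1"
    output="$2"
    # Grayscale, remove small specks, then force pure black and white.
    # The 50% threshold is an assumed starting value; adjust per scan.
    convert "$input" -colorspace Gray -despeckle -threshold 50% "$output"
}

# Example call:
# prepare_tiff scan.tif scan-clean.tif
```

On newer ImageMagick releases the command may be called magick instead of convert.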

Then run tesseract <file>.tif <output_file> -l rus

Tesseract appends .txt to <output_file>, so the recognized text ends up in <output_file>.txt.
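
If you have a whole directory of pages, the same command can be wrapped in a loop. This is only a sketch: ocr_russian_dir is a made-up helper name, and it assumes the scans end in .tif and that the rus files are already in tessdata.

```shell
# Sketch only: OCR every .tif in a directory with the Russian language data.
# ocr_russian_dir is a hypothetical helper, not part of Tesseract.
ocr_russian_dir() {
    dir="${1:-.}"
    for tif in "$dir"/*.tif; do
        [ -e "$tif" ] || continue     # nothing matched the pattern
        base="${tif%.tif}"            # strip the extension: page1.tif -> page1
        # Tesseract adds .txt itself, so this writes page1.txt
        tesseract "$tif" "$base" -l rus || return 1
    done
}

# Example call:
# ocr_russian_dir /path/to/scans
```

Passing the name without its extension as the output base keeps each text file next to its image under the same name.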

Last changed: Feb 27 2020 at 4:14 PM