Theresa Arzadon-Labajo
Building Tesseract RPM

Download the tarball from http://code.google.com/p/tesseract-ocr/

Untar the package:

    tar -xzvf tesseract-2.03.tar.gz
    cd tesseract-2.03

The spec file, tesseract.spec, is included in the tarball. Copy it to your SPECS directory. Before building the RPM, make sure libtiff-devel is installed.

Now you can install the newly created RPM:

    rpm -ivh tesseract-2.03-4.i386.rpm

I edited my tesseract.spec so that it would include all the language files as well as pdf2tif, ocr.sh and xsane2tess.
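The build steps above can be sketched as a small shell script. The page does not show the build command itself, so the `rpmbuild -ba` call here is an assumption. The `TOPDIR` location is also assumed, as is copying the tarball into SOURCES, which is only needed if the spec file lists it as a source. The `run` helper is hypothetical and only prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of the RPM build steps. By default this only prints the commands;
# set DRY_RUN=0 to actually execute them.
VERSION="${VERSION:-2.03}"
TOPDIR="${TOPDIR:-$HOME/rpmbuild}"   # assumed build tree; older systems often used /usr/src/redhat
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command, and execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

build_tesseract_rpm() {
    # Each step runs only if the previous one succeeded.
    run tar -xzvf "tesseract-$VERSION.tar.gz" &&
    run cp "tesseract-$VERSION/tesseract.spec" "$TOPDIR/SPECS/" &&
    run cp "tesseract-$VERSION.tar.gz" "$TOPDIR/SOURCES/" &&   # assumption: the spec expects the tarball here
    run rpm -q libtiff-devel &&                                # stops here if libtiff-devel is missing
    run rpmbuild -ba "$TOPDIR/SPECS/tesseract.spec"            # assumption: build command not shown on the page
}

build_tesseract_rpm
```

Running the script as-is lists the five commands without touching the system, which is a cheap way to check the paths before running it with `DRY_RUN=0`.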
---------------------------------------------------------------------------------------

How to scan and OCR like a pro with open source tools

NOTE: for tesseract to work, the TIFF file you run it on must have a name ending in .tif (not .tiff) AND it must be an image without an alpha channel. If you've renamed the file and tesseract is still barfing, the alpha channel is probably the problem. Use an image conversion utility that can remove alpha channels to re-save your image. For bulk image conversion I recommend ImageMagick (it's open source and runs well on the Mac).

To OCR your TIFF image, do:

    tesseract inputimage.tif outputtext -l eng

Tesseract can also make use of the libtiff library (www.libtiff.org).

---------------------------------------------------------------------------------------

Using Tesseract with Xsane

Xsane Settings

There are several options in using Tesseract with Xsane:

    pdf2tif filename.pdf (creates tif images of each page)
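The note about the .tif extension and alpha channels can be turned into a small helper. This is a minimal sketch, assuming ImageMagick's `convert` and `tesseract` are both on the PATH. The function names `clean_name` and `ocr_image` and the `-clean` suffix are made up for illustration, and `-alpha off` is one way ImageMagick can drop the alpha channel, not necessarily what the author used.

```shell
#!/bin/sh
# Assumptions: ImageMagick's `convert` and `tesseract` are installed.

# Map an image name to an output name ending in .tif,
# e.g. scan.tiff becomes scan-clean.tif.
clean_name() {
    printf '%s-clean.tif\n' "${1%.*}"
}

# Re-save one image as a .tif without an alpha channel, then OCR it.
# The recognized text is written to <basename>.txt.
ocr_image() {
    src="$1"
    tif="$(clean_name "$src")"
    convert "$src" -alpha off "$tif" &&
    tesseract "$tif" "${src%.*}" -l eng
}
```

For bulk conversion, the helper can be called in a loop such as `for f in *.tiff; do ocr_image "$f"; done`. Writing to a separate `-clean.tif` file keeps the original scan untouched.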
Google Groups: http://groups.google.com/group/tesseract-ocr/

Last changed: Jun 30 2009 at 3:22 PM