Tesseract is an open source Optical Character Recognition (OCR) engine, available under the Apache 2.0 license. It can be used directly from the command line or, for programmers, through an API to extract typed, handwritten or printed text from images. It supports a wide variety of languages.
Key Features:
The library provides optical character recognition (OCR) support for:
TIFF, JPEG, GIF, PNG, and BMP image formats
Multi-page TIFF images
PDF document format
Out-of-box support for multiple languages
Can be trained for additional languages, including German, Chinese Simplified, Chinese Traditional, and Hindi
Provides scripts to compile the code for a variety of target environments
Can OCR a variety of source documents, including multi-page TIFFs, images, and PDFs
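To make the "used directly" option concrete, here is a minimal Python sketch that calls the `tesseract` command-line program and captures the recognized text. It assumes Tesseract is installed and on your PATH, and that the language data for the code you pass (for example `eng`, `deu`, `chi_sim` or `hin`) is available. The helper names `build_tesseract_command` and `ocr_image` are made up for this illustration and are not part of Tesseract itself.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for one `tesseract` CLI call (hypothetical helper).

    Passing "stdout" as the output base asks Tesseract to print the
    recognized text instead of writing it to a .txt file.
    """
    return ["tesseract", str(Path(image_path)), "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on a single image and return the recognized text."""
    # Assumption: the tesseract executable is installed and on PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("the tesseract executable was not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as text
        check=True,           # raise CalledProcessError if Tesseract fails
    )
    return result.stdout
```

With a scanned page saved as, say, `scan.png` (a placeholder name), `ocr_image("scan.png", lang="deu")` would return the German text Tesseract finds in it. How good the result is depends a lot on the quality of the image.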
Summary
Article Name: Tesseract open source OCR Engine
Description: Tesseract OCR can be used directly, or (for programmers) using an API to extract typed, handwritten or printed text from images.
Author: Karthik
Publisher Name: upnxtblog
Karthik
Allo! My name is Karthik, an experienced IT professional. Upnxtblog covers key technology trends that impact the technology industry, including cloud computing, blockchain, machine learning and AI, the best mobile apps, and the best tools and open source libraries. I hope you will love it, and you can be sure that each post is fantastic and worth your time.