OCRopus is a free document analysis and optical character recognition (OCR) system released under the Apache License, Version 2.0. It has a highly modular, plugin-based design that allows individual components to be swapped out easily. OCRopus is currently developed under the lead of Thomas Breuel of the German Research Centre for Artificial Intelligence in Kaiserslautern, Germany, and is sponsored by Google.

OCRopus is developed for Linux; however, users have reported success running it on Mac OS X, and an application called TakOCR (TakOCR website) has been developed that installs OCRopus on Mac OS X and provides a simple droplet interface.

== How it works ==

OCRopus is an OCR system that combines pluggable layout analysis, pluggable character recognition, and pluggable language modeling. It is aimed primarily at high-volume document conversion, namely for Google Book Search, but also at desktop and office use and at use by visually impaired people.

OCRopus originally used Tesseract as its only character recognition plugin, but as of the 0.4 release it uses its own engine. (OCRopus doesn't even link with Tesseract by default) The ability to swap recognition plugins is especially useful for extending OCRopus to additional languages and writing systems. OCRopus also contains disabled code for a handwriting recognition engine, which may be repaired in the future.

The layout analysis plugin performs image preprocessing and layout analysis: it splits the scanned document into sections and passes them to a character recognition plugin for line-by-line or character-by-character recognition.

As of the alpha release, OCRopus uses the language modeling code from OpenFST (Official OpenFST website), another Google-supported project; this dependency is optional as of version pre-0.4.

Source: Wikipedia, the free encyclopedia.
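The three pluggable stages described under "How it works" (layout analysis, character recognition, language modeling) can be illustrated with a minimal Python sketch. This is not OCRopus's actual API: every name here (`split_on_blank_rows`, `count_ink_recognizer`, `uppercase_model`, `run_pipeline`) is hypothetical, and the toy components only stand in for real plugins to show how stages can be swapped independently.

```python
from typing import Callable, List

# Hypothetical types for illustration only (not OCRopus data structures):
# a page is a list of pixel rows; a line image is a contiguous block of rows.
Page = List[List[int]]
LineImage = List[List[int]]

LayoutAnalyzer = Callable[[Page], List[LineImage]]
LineRecognizer = Callable[[LineImage], str]
LanguageModel = Callable[[str], str]


def split_on_blank_rows(page: Page) -> List[LineImage]:
    """Toy layout analysis: cut the page into text lines at all-zero rows."""
    lines: List[LineImage] = []
    current: LineImage = []
    for row in page:
        if any(row):
            current.append(row)
        elif current:
            lines.append(current)
            current = []
    if current:
        lines.append(current)
    return lines


def count_ink_recognizer(line: LineImage) -> str:
    """Toy recognizer: emit one 'x' per ink pixel in the line's first row."""
    return "x" * sum(1 for px in line[0] if px)


def uppercase_model(text: str) -> str:
    """Toy language model: post-process recognizer output (here, upper-case it)."""
    return text.upper()


def run_pipeline(page: Page,
                 layout: LayoutAnalyzer,
                 recognizer: LineRecognizer,
                 model: LanguageModel) -> List[str]:
    """Run layout analysis, then recognize and post-process each line."""
    return [model(recognizer(line)) for line in layout(page)]


if __name__ == "__main__":
    page = [[0, 0, 0], [1, 1, 0], [0, 0, 0], [1, 0, 0], [1, 1, 1]]
    print(run_pipeline(page, split_on_blank_rows,
                       count_ink_recognizer, uppercase_model))
```

Because each stage is passed in as a plain callable, replacing the recognizer (for example, to support another writing system) requires no change to the layout or language modeling stages, which is the property the plugin design is meant to provide.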