Correcting OCR using hOCR in Firefox
Quick post on a little tool I came across, moz-hocr-edit. This Firefox add-on lets you proofread Optical Character Recognition (OCR) output. Given my interest in OCR and the Biodiversity Heritage Library, I decided to take it for a spin.
moz-hocr-edit uses hOCR, a format for representing the output of OCR software that is used by tools such as OCRopus (you can see the public specification for hOCR here). Basically it's a microformat, that is, it's ordinary HTML in which class and title attributes carry the OCR information, such as which element is a line and where its bounding box sits on the page. Given some hOCR, moz-hocr-edit enables you to edit the OCR output line by line.
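To make the format a bit more concrete, here is a minimal Python sketch that pulls the lines and their bounding boxes out of a scrap of hOCR. The sample markup is hand-written for illustration (it is not copied from the demo file), and `LineCollector` and `parse_bbox` are hypothetical helper names of my own. It only assumes the hOCR convention of marking lines with `class="ocr_line"` and putting a `bbox x0 y0 x1 y1` property in the `title` attribute.

```python
from html.parser import HTMLParser

# Hand-written hOCR fragment for illustration only.
SAMPLE_HOCR = """
<div class="ocr_page" title="bbox 0 0 1000 1500">
  <span class="ocr_line" title="bbox 100 200 900 240">Case 3368</span>
  <span class="ocr_line" title="bbox 100 250 900 290">Eatoniella Dall, 1876</span>
</div>
"""

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class LineCollector(HTMLParser):
    """Collect (bbox, text) for every element whose class includes ocr_line."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._depth = 0  # nesting depth inside the current line, 0 = outside
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self._depth:
            self._depth += 1  # nested element (e.g. a word span) inside a line
        elif "ocr_line" in (attrs.get("class") or "").split():
            self._depth = 1
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:  # the line element itself just closed
            self.lines.append((self._bbox, "".join(self._text).strip()))

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)


collector = LineCollector()
collector.feed(SAMPLE_HOCR)
for bbox, text in collector.lines:
    print(bbox, text)
```

Because the OCR information lives in attributes, the same file still displays as a normal web page, which is what lets a browser add-on edit it in place.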
Demo
I've created a simple demo based upon Case 3368 Eatoniella Dall, 1876 and EATONIELLIDAE Ponder, 1965 (Mollusca, Gastropoda): proposed conservation. For the demo to work you will need to use the Firefox web browser with the moz-hocr-edit add-on installed.
- Go to http://dl.dropbox.com/u/639486/hocr/80780.html
- You will see a simple HTML representation of the OCR text from "Case 3368 Eatoniella Dall, 1876 and EATONIELLIDAE Ponder, 1965 (Mollusca, Gastropoda): proposed conservation". I created this HTML from the original ABBYY FineReader XML from the Internet Archive.
- In the bottom right-hand corner of the Firefox browser window you should see hOCR. Click on it and select "Edit this hOCR document".
- Firefox will open the editor in a new tab.
- You can now edit individual lines of text, and see your edits applied to the HTML below.
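The second step above mentions that the demo HTML was generated from ABBYY FineReader XML. The script used for that isn't shown here, so what follows is only a rough sketch of how such a conversion could look, not the actual code. It assumes the ABBYY layout of `page` elements (with `width` and `height`) containing `line` elements (with `l`, `t`, `r`, `b` coordinates) whose characters sit in `charParams` elements. The sample XML is hand-written, and `abbyy_to_hocr` and `local` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hand-written sample in the assumed ABBYY FineReader layout (illustration only).
SAMPLE_ABBYY = """<document>
  <page width="1000" height="1500">
    <block blockType="Text"><text><par>
      <line l="100" t="200" r="900" b="240"><formatting>
        <charParams l="100" t="200" r="120" b="240">C</charParams>
        <charParams l="121" t="200" r="140" b="240">a</charParams>
        <charParams l="141" t="200" r="160" b="240">s</charParams>
        <charParams l="161" t="200" r="180" b="240">e</charParams>
      </formatting></line>
    </par></text></block>
  </page>
</document>"""


def local(tag):
    """Strip any XML namespace, e.g. '{ns}line' -> 'line'."""
    return tag.rsplit("}", 1)[-1]


def abbyy_to_hocr(xml_text):
    """Turn ABBYY-style XML into a minimal hOCR fragment, one span per line."""
    root = ET.fromstring(xml_text)
    out = []
    for page in root.iter():
        if local(page.tag) != "page":
            continue
        out.append('<div class="ocr_page" title="bbox 0 0 %s %s">'
                   % (page.get("width"), page.get("height")))
        for line in page.iter():
            if local(line.tag) != "line":
                continue
            # Each charParams element holds one recognised character.
            text = "".join(c.text or "" for c in line.iter()
                           if local(c.tag) == "charParams")
            # hOCR bbox order is x0 y0 x1 y1, i.e. left top right bottom.
            bbox = " ".join(line.get(k) for k in ("l", "t", "r", "b"))
            out.append('  <span class="ocr_line" title="bbox %s">%s</span>'
                       % (bbox, escape(text)))
        out.append("</div>")
    return "\n".join(out)


print(abbyy_to_hocr(SAMPLE_ABBYY))
```

A real file would also need the surrounding `html` and `body` elements, and the sketch throws away the per-character coordinates that ABBYY provides.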
Additional details
Identifiers
- UUID
- 3fed0f99-e255-4fa9-b7d9-095df2eac89f
- GUID
- tag:blogger.com,1999:blog-16081779.post-2425236243751210928
- URL
- https://iphylo.blogspot.com/2011/07/correcting-ocr-using-hocr-firefox.html
Dates
- Issued
- 2011-07-13T14:12:00
- Updated
- 2011-07-13T14:17:09