Optical Character Recognition (OCR) is a big, though often hidden, part of our family history research. Often, when we search digitized newspapers or any large digitized collection by search terms, it’s OCR that makes the search possible. OCR greatly enhances our ability to access the information contained in these collections, though it is less than perfect. You might consider reading 8 Ways to Overcome OCR Errors when Searching Newspapers, http://www.theancestorhunt.com/1/post/2013/09/8-ways-to-overcome-ocr-errors-when-searching-newspapers.html (Kenneth R Marks, The Ancestor Hunt), to increase your success with using a search engine based on OCR.
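For readers who have the OCR text itself in hand (for example, text copied out of a searchable PDF), here is a minimal Python sketch of one way to hunt for a name despite OCR misreads. It is not taken from the article above; the function name fuzzy_find, the sample sentence and the 0.7 cutoff are all assumptions made for illustration. It uses difflib from Python's standard library to list the words that most closely resemble the search term.

```python
import difflib
import re


def fuzzy_find(term, ocr_text, cutoff=0.7):
    """Return words in ocr_text that closely resemble term, ignoring case.

    cutoff is a similarity score between 0 and 1; 0.7 is an illustrative
    guess, not a recommended setting.
    """
    # Break the OCR text into unique lowercase words.
    words = sorted(set(re.findall(r"[A-Za-z0-9']+", ocr_text.lower())))
    # Keep up to 10 words whose similarity to the term meets the cutoff.
    return difflib.get_close_matches(term.lower(), words, n=10, cutoff=cutoff)


# Hypothetical OCR output in which "Smith" was misread as "Srnith".
sample = "Mr. Jolm Srnith and Mrs. Smith of Arlington visited the Smyth farm."
print(fuzzy_find("Smith", sample))
```

A lower cutoff catches more misreadings but also returns more unrelated words, so it is worth experimenting with a surname you already know appears in the text.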
When our society needed to “recreate” old journals for our archive (an on-going process), we found that for some of the oldest editions we had NO electronic version available to us. This necessitated scanning the journal pages, converting those pages to Portable Document Format (PDF), using OCR to “get at” the text, placing that text in a word-processing format and then recreating the journal. We’ve similarly done that with a book that we were given the publishing rights to and yet, again, had no digital version of. BTW, the original process was to scan the pages to .tiff images and then convert them to text using the OCR software that came with my scanner (OmniPage SE). I still think that OmniPage does a better job with OCR, but the Adobe process is so much faster that it mostly compensates for the difference.
And, you don’t have to own an Adobe product to do this. My husband swears by PDF-XChange Viewer ($37.50 – which does way more than OCR!), and the article by James Tanner mentioned below talks about other options.
Recently I was involved with a project that required photographing a whole collection of private papers – most of which were typewritten documents. As part of the project, I was asked to provide images and then a searchable PDF file. This is how I learned that my Adobe Acrobat software has an option for “OCR Text Recognition.” Amazing how one is always learning how to better use the tools one already has! So, after taking the images, I created a PDF file and then used the OCR Text Recognition option to create a “searchable” PDF document. Isn’t that really neat?!
For this same project, I found out that the David M. Rubenstein Rare Book & Manuscript Library (Duke University) has a new scanner with a variety of output options. It’s amazing. You can save as .jpg images, .pdf files and also as searchable .pdf files! Now, I can scan each page of the document collection and then let the machine create a searchable .pdf file; I then walk away with my USB stick loaded with .jpg and .pdf files! Unfortunately, I cannot spend all day monopolizing the machine, but it’s a gem for when you need to photograph books and/or create searchable .pdf files! For small jobs, I won’t bother lugging my laptop, camera, tripod, cables, etc., when I know that I might not use them. Still, it is a machine: it can break, or there can be a queue to use it, so I’ll at least pack the usual accouterments in my car for back-up.
James Tanner (Genealogy’s Star) recently posted A Look at Optical Character Recognition (OCR) for Genealogists, which talks a bit about his use of OCR with his genealogy. A very important point he mentions is that the quality of the OCR conversion is highly dependent on the quality of the original “image” and that OCR’s ability to handle hand-written documents is quite limited.
Earlier this year, Dick Eastman (EOGN) talked about an Android phone app that performs OCR in The Easy and Free Way to Perform OCR Conversions of Documents. Given that I photographed over 2000 pages for my recent project, I couldn’t and wouldn’t use my phone for that, but for small documents or small collections, it is a viable option.
Have you used OCR with your genealogy research?
How have you used it?
How might family historians use this technology in the future?
~~~~~~~~~~~~~~~~~~~~
copyright © National
Genealogical Society, 3108 Columbia Pike, Suite 300, Arlington, Virginia
22204-4370. http://www.ngsgenealogy.org.
~~~~~~~~~~~~~~~~~~~~~
Republication
of UpFront articles is permitted and encouraged for
non-commercial purposes without express permission from NGS. Please drop us a
note telling us where and when you are using the article. Express written
permission is required if you wish to republish UpFront articles
for commercial purposes. You may send a request for express written permission
to [email protected]. All republished articles may not be
edited or reworded and must contain the copyright statement found at the bottom
of each UpFront article.