SEMI-AUTOMATED INDEXING PROGRAM by Wesley A. Cox, PhD.
The problem: many books containing genealogical information have no index of names. This was brought home to me by “The History of Lewis County Washington,” which contains the story of my grandfather, as a teenager, going West with his family from Virginia to Washington. Pages 63 to 394 contain two or three family stories per page. Books in this format, 8.5” x 11” glossy coffee-table volumes with photos, exist for many counties around the nation, some with an index and some without. So, a dozen years ago and retired from industry, I wrote a program to index them.
How it works
The program is a group of macros for the Microsoft EXCEL spreadsheet program, written in its VBA (Visual Basic for Applications). A page is scanned, converted to text with Microsoft Word and OCR (Optical Character Recognition), and imported into EXCEL with one word per cell. A point-and-click sequence then selects and saves the page number, a Surname, and then a Given Name. Each further click on a Given Name (or a middle name or nickname) records another line under that Surname until a new Surname is clicked. No name is ever typed, so none can be misspelled! Across these 316 pages I found an average of 9 surnames and 68 names per page!
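The click sequence above behaves like a small state machine: the page number and Surname are remembered, and every Given Name click writes one index line. The following is a hypothetical sketch in Python, not the VBA in IndexMacros.xls. The names `IndexRecorder`, `click_page`, `click_surname`, `click_given` and `clean` are invented for illustration. Each click is assumed to pass in the text of the clicked cell, and stripping OCR punctuation such as the comma in "Doe," is an assumption about how a cell might be cleaned.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Characters OCR often leaves attached to a word, e.g. "Doe," or "(Jane)".
# Which characters to strip is an assumption of this sketch.
PUNCTUATION = ".,;:!?\"()"


def clean(word: str) -> str:
    """Return the cell text with surrounding spaces and punctuation removed."""
    return word.strip().strip(PUNCTUATION)


@dataclass
class IndexRecorder:
    """Hypothetical model of the page -> Surname -> Given Name click sequence."""

    page: Optional[str] = None
    surname: Optional[str] = None
    rows: List[Tuple[str, str, str]] = field(default_factory=list)

    def click_page(self, cell: str) -> None:
        # Assumption: the current surname is kept, since a story may
        # continue onto the next page.
        self.page = clean(cell)

    def click_surname(self, cell: str) -> None:
        # A new surname replaces the current one; no row is written yet.
        self.surname = clean(cell)

    def click_given(self, cell: str) -> None:
        # Each Given Name click writes one (page, surname, given name) row.
        if self.page is None or self.surname is None:
            raise ValueError("select a page number and a surname first")
        self.rows.append((self.page, self.surname, clean(cell)))


if __name__ == "__main__":
    rec = IndexRecorder()
    rec.click_page("121")
    rec.click_surname("Doe,")
    rec.click_given("John")
    rec.click_given("Mary.")
    rec.click_surname("Roe")
    rec.click_given("Ann")
    for row in rec.rows:
        print(row)
```

Because every row copies text from a cell instead of retyping it, a name can still be misread by OCR, but the indexer can never mistype it.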
Challenges I ran into
EXCEL is readily available to most users and offers tutorials and many code examples. It is a great training ground for the computer coders needed to fill good jobs, and this set of macros is easy to adapt and extend for similar problems. The scanner must be good. For volume work, the book would be scanned first, creating projects similar to FamilySearch Indexing, where each page's scanned image is provided to the indexer. I combined the .txt files of several pages into longer spreadsheets to accommodate stories that often continued onto the next page, and this was faster too.
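Combining several pages into one longer sheet amounts to appending their text lines one after another. The hypothetical Python sketch below assumes one text line per spreadsheet row and one word per cell, and writes the result as CSV, a format EXCEL can open. The function names `texts_to_grid` and `grid_to_csv`, and the choice to drop blank lines, are illustrative assumptions; the original macros may import the text differently.

```python
import csv
import io
from typing import Iterable, List


def texts_to_grid(page_texts: Iterable[str]) -> List[List[str]]:
    """Join page texts in book order: one text line per row, one word per cell.

    Blank lines are dropped (an assumption of this sketch), so a story that
    breaks across pages continues on the very next row.
    """
    grid: List[List[str]] = []
    for text in page_texts:
        for line in text.splitlines():
            words = line.split()  # split on any run of whitespace
            if words:
                grid.append(words)
    return grid


def grid_to_csv(grid: List[List[str]]) -> str:
    """Serialize the grid as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    csv.writer(buf).writerows(grid)  # quotes cells such as "West," as needed
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical text of two consecutive pages.
    pages = ["The family left Virginia\nand went West,\n", "\nreaching Washington."]
    print(grid_to_csv(texts_to_grid(pages)), end="")
```

To combine real page files, each .txt file would be read in page order (for example with `pathlib.Path.read_text`) and the resulting strings passed to `texts_to_grid`.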
Accomplishments that I'm proud of
The program was well received when published as a RootsWeb TIP FROM READERS. That note is attached, along with several kudos from RootsWeb readers. An index would also nicely augment searchable databases, saving time by pinpointing targets other than names, such as locations and occupations.
Attached are the file IndexMacros.xls and a printed copy, along with the more detailed instructions I sent to interested readers.
What's next for Semi-Automated Indexing Program
The resulting .txt file of the scanned book, together with the index, has been on the GenWeb site for 12 years at http://www.usgwarchives.net/wa/lewis/lewishis.htm. The index is also on the shelf in the Family History Library, next to the history book (979.782 H2), and in the Lewis County Museum in Chehalis.
- RootsWeb TIP submission “Semi-Automated Indexing Program,” 2/22/2003, and response.
- RootsWeb TIP copy as published, and several kudos from readers.
- Index Program Instructions
- Photo: HP 3970 Scanner and Lewis Book
- OCR .rtf result of page 121.
- Page 121 plain text .txt
- Indexer Macros listing
- Photo: top 36 rows of the spreadsheet, showing the current page number, surname, and given name in Row 1, and the selected surname in red and Given Name in green in Rows 30, 33, and 34.
- A finished index page, selected for some of my family names, showing the 4-column layout in 8-point type that makes all 22,476 names manageable in print.