Logan
Puritan Board Graduate
@davejonescue has been hinting at pieces of this for a while, but the three of us (Dave, Logan, and Alex) have been working somewhat in secret, and now it's finally time to share the fruits of our labor.
Many people are aware of the Early English Books Online website, which offers character-by-character transcriptions of early English books. Among these, Dave was able to find around 6,000 works by about 860 Puritan authors.
One downside (for many purposes) is the non-standard spelling of the era:
What if we could automate the correction of these? Alex had already started doing this on his own projects and had compiled a list of about 6,000 words and their corrections. With this basis, I wrote a script to identify an additional 17,000 of the most commonly occurring non-standard spellings, and then Alex and I painstakingly assigned them corrections.
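For anyone curious how this kind of cleanup can work, here is a minimal Python sketch of the general idea. It is not the actual script used for this project: the function names, the `known_words` set of standard spellings, and the shape of the `corrections` dictionary are all assumptions made for illustration. The first function counts words that are neither standard nor already covered by the correction list, so the most frequent ones can be reviewed by hand; the second applies a correction list to a text while keeping the original capitalization.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of ASCII letters only (an assumption;
# real texts may need to handle apostrophes, hyphens, long s, etc.).
WORD_RE = re.compile(r"[A-Za-z]+")


def find_candidates(texts, known_words, corrections, top_n=10):
    """Return the top_n most frequent words that are neither in
    known_words nor already a key in corrections."""
    counts = Counter()
    for text in texts:
        for word in WORD_RE.findall(text):
            lower = word.lower()
            if lower not in known_words and lower not in corrections:
                counts[lower] += 1
    return counts.most_common(top_n)


def _match_case(original, replacement):
    """Give the replacement the same capitalization as the original word."""
    if original.isupper() and len(original) > 1:
        return replacement.upper()
    if original[0].isupper():
        return replacement.capitalize()
    return replacement


def apply_corrections(text, corrections):
    """Replace every word found in corrections (lowercase keys) with
    its standard spelling, leaving all other text untouched."""
    def fix(match):
        word = match.group(0)
        replacement = corrections.get(word.lower())
        return _match_case(word, replacement) if replacement else word
    return WORD_RE.sub(fix, text)
```

With a correction list of `{"haue": "have", "beleeue": "believe"}`, calling `apply_corrections("Beleeue in the Lord, and haue faith.", ...)` would give back "Believe in the Lord, and have faith." Each correction still has to be assigned by a person; the code only handles the counting and the substitution.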
Then I ran all 6,000 works by the 860 authors through another script I'd written, and the result, compared to the above, looks like this:
So all of these Puritan works are now significantly "cleaner" than they were. We packaged them into a custom application and would now like to reveal "Puritan Search", a free application for searching through and making use of all of these documents. Not all of the works are equally useful, but there are some real gems that are now open to the general public. The application is available right now, for free, on Windows, Mac, and Linux:
www.puritansearch.org
...But wait, there's more! In addition to the free search application described above, and in the interest of making these works available as a cleaner base text for readers and publishers, I have also converted each of them to PDF, EPUB, and Word. I hope that, despite their imperfections, these files will save current or aspiring publishers a lot of effort, while giving the layman something serviceable to read right away.
These are available here:
https://sites.google.com/view/project-puritas/home
This was a team effort that took months and months of work, and we pray it will be a blessing to the church worldwide for years to come. By all means share it and spread the word.