
Project Gutenberg is a longstanding effort to convert books into ebook format. It may have been the first project with ebooks as a goal. The books included in the project are generally in the public domain because their copyright has expired. Copyrights run out every year, and as they do, more books become eligible for inclusion.

Originally, the process involved people transcribing a book from paper to computer and then saving that book in text format. More recently, a new effort has begun to capitalize on the creative commons of the Internet. The Distributed Proofreaders project intends to spread out the burden of making books suitable for Project Gutenberg. Instead of having one person transcribe the book, electronic tools are brought into play:

A scanner – converts the paper pages to images
OCR software – Optical Character Recognition software attempts to turn the images of words into text

OCR conversion from the images is prone to error, so proofreading is needed. That's where we humans come in. The Distributed Proofreaders project calls on us to provide page-by-page proofreading. A session need not accomplish more than a single page, and, indeed, a tricky page can be suspended in the middle. The effort is voluntary, but a rough guideline is to set a personal goal of a page a day.

You could offer your services as a scanner of books, but the main volunteer effort is proofreading, and the software runs entirely on the project servers. You gain access to pages through your browser. You only need to enable JavaScript, cookies, and popup windows for the site. (Popups let you have a window specially set up for proofreading while keeping the regular browser window available for something like the FAQ page.)

The following image shows part of an editing page with the scanned page above and the text below. In this case, there was nothing to change, but corrections for scanning or OCR errors would go in the lower section, just like editing in a word processor.

Sample page

After you finish your page, you click the button that saves the page as "Done" so it moves to the next person in the chain. Proofreading progresses through several stages, with more than one person checking each text. The idea is to let an ad hoc team work through the task of getting a book converted. I am still a novice, but some people dedicate much time to the project and become more involved in the late stages of the conversion process, making final edits and submitting the books to Project Gutenberg, where they become universally available.
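
For readers who think in code, the hand-off described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's actual software: the round names, the Page class, and the mark_done function are all hypothetical names invented here, and the real site defines its own stages and rules.

```python
from dataclasses import dataclass, field

# Hypothetical round names; the real project defines its own stages.
ROUNDS = ["first pass", "second pass", "final edits"]


@dataclass
class Page:
    number: int
    text: str                 # OCR output, corrected as the page moves along
    round_index: int = 0      # index of the round the page is waiting in
    checked_by: list = field(default_factory=list)

    @property
    def finished(self) -> bool:
        # A page is finished once it has cleared every round.
        return self.round_index >= len(ROUNDS)


def mark_done(page: Page, volunteer: str, corrected_text: str) -> None:
    """Save one volunteer's corrections and pass the page to the next round."""
    if page.finished:
        raise ValueError("page has already cleared every round")
    page.text = corrected_text
    page.checked_by.append(volunteer)
    page.round_index += 1


# One page with a typical OCR slip ("Tbe"), checked by two volunteers.
page = Page(number=12, text="Tbe quick brown fox")
mark_done(page, "alice", "The quick brown fox")
mark_done(page, "bob", "The quick brown fox")
print(page.round_index, page.finished)
```

The point of the sketch is simply that no single volunteer carries the whole book: each "Done" records one person's check and moves the page one step further along.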


This article may be reproduced under the Creative Commons Attribution-ShareAlike license.
