
Uses for a scanner Pt 3: Digitising printed texts – a guide for authors and publishers


Recently I’ve had enquiries from authors who are thinking about self-publishing their rights-reverted books as e-books but don’t know where to start or what is involved.

Let’s assume that you are sure the rights have reverted to you, that you feel there’s a market for your books, that you’ve got someone to market them (or you’re prepared to put the time in yourself), and that you realise you will have to spend some time, and probably some money if you don’t want to spend an inordinate amount of time, on converting your books.

So, first things first: how do you convert your printed book into an e-book?
The first step is getting the text into an editable digital form. If all you have is a printed book, you need to get it into a text document somehow (a Word document or similar will do).

You could type it up again … or get someone else to type it up again.
You could check with your agent and/or publisher to see whether they still hold the files.
You could scan the book and use OCR software, which brings me to the point of this post.

You can use an ordinary flatbed scanner, the kind that often comes built into home printers these days, but it is a drawn-out process: open the book, place it on the scanner, scan the spread (holding the book down as flat as you can), take the book out, turn the page and repeat … and repeat … and repeat. You can see how this will quickly become tedious, and you really do have to hold the pages as flat as possible or the OCR software will struggle.

Another option is to use a sheet-feed scanner. You will have to cut your book up – so book-lovers of the ‘weep to see a broken spine’ disposition, look away now…

Step one: Thoroughly break the spine. Open and close the book in several places, bend it and generally loosen it up.

Step two: Carefully pull the cover away from the book block.

Gently pull the cover away from the book block

It’s not essential that the cover stays in one piece, but I think it makes it easier to handle.

Step three: Once the cover is off, carefully pull the book block apart into sections. Or if you have a huge guillotine about your person, use it to trim off the glued section.

Pull apart into manageable sections

Step four: Trim along the glued edges. You don’t have to worry about perfection here; just make sure you don’t cut into the text area. I use scissors, but you can use a knife if it’s easier.

Trim off the glued edges

Step five: Fan through the pages several times to get rid of paper dust and to make sure all the pages are separate. Any still glued together will snarl up in the scanner. Books make a surprising amount of dust too.

Fan pages to separate and get rid of dust

Step six: Place about forty pages in the scanner (with my scanner they go in face down and pointing down). There’s no need to count the pages; just experiment to see how many it can cope with. Set the scanner going and keep an eye out for snarl-ups or misfeeds. Make sure you put the sections through in the correct order.

Pages going through the scanner

Step seven: Save the resulting scan as a PDF. At this point the PDF is only a series of page images, so the text still isn’t editable.

Step eight: Run your OCR software. I use ABBYY FineReader Express. Save the result as an RTF file.

You now have an editable file.

Step nine: You’ll need to check it through for OCR errors. The software is very good with reasonably normal text, but if your original has any fancy fonts, handwriting, etc., expect a lot of errors. I recently scanned a couple of books with chapter heads in a gothic blackletter font and they came out as complete gobbledygook. You’ll also need to find and delete page numbers, running heads, etc. I’ve noticed errors where italic words are followed directly by ? or !, and the capital letter I being converted to the numeral 1 (or the other way around). Accents on foreign words tend to be dropped too (at least when scanning in English).
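If you’d rather not hunt down every page number and running head by hand, a small script can do a first pass. This is a minimal Python sketch, assuming you’ve exported a plain-text copy of the OCR result (one printed line per text line) and that you know what your running heads say, typically the book title and author name. `strip_page_furniture` and `flag_suspect_ones` are hypothetical helpers made up for this example, not part of any OCR package. The second one only flags possible I/1 mix-ups rather than changing them, because a standalone 1 is sometimes correct.

```python
import re

# A line holding only a page number, e.g. "42" or "- 42 -" (assumed formats).
PAGE_NUMBER = re.compile(r"^\s*[-\u2013]?\s*\d{1,4}\s*[-\u2013]?\s*$")

# A standalone numeral 1 followed by a lowercase word, e.g. "1 went home",
# which is often the capital letter I misread by the OCR software.
SUSPECT_ONE = re.compile(r"\b1\b(?= [a-z])")


def strip_page_furniture(text, running_heads):
    """Remove page-number lines and known running heads from OCR text.

    `running_heads` is a collection of strings you supply yourself (assumed:
    the book title, author name, etc.). Matching ignores case and
    surrounding spaces.
    """
    heads = {h.strip().lower() for h in running_heads}
    kept = []
    for line in text.splitlines():
        if PAGE_NUMBER.match(line):
            continue  # line is only a page number
        if line.strip().lower() in heads:
            continue  # line is a running head
        kept.append(line)
    return "\n".join(kept)


def flag_suspect_ones(text):
    """Return (line number, line) pairs that may contain I misread as 1."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if SUSPECT_ONE.search(line)
    ]


sample = (
    "THE BOOK TITLE\n"
    "It was a dark night.\n"
    "12\n"
    "1 went home.\n"
    "- 13 -\n"
    "Jane Author\n"
    "She said yes."
)
cleaned = strip_page_furniture(sample, running_heads={"The Book Title", "Jane Author"})
print(cleaned)
# It was a dark night.
# 1 went home.
# She said yes.
print(flag_suspect_ones(cleaned))
# [(2, '1 went home.')]
```

Run it on a copy, and still read the whole text through afterwards: plain text carries no italics, so the ? and ! errors and the lost accents still need checking by eye.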

Step ten: You now have digital text ready for formatting and converting – but that’s another story!

If this all sounds like a huge faff – I can do any or all parts of this process for you, and I can convert to e-book formats too. Just contact me for details.

Please note: you must own the rights to the work (or have permission from the owner).


A new scanner



Yay! I’m now set up for OCR scanning with this – the Fujitsu ScanSnap. Having played with it for a few days, I’m really pleased with it.

It’s a neat little sheet-feed scanner, shown next to a Mac here. It’s permanently plugged in and you switch it on by opening it; then you just feed in what you want to scan. It takes paper up to A4 size (and A3 if you fold it into a carrier sheet). It scans both sides and is clever enough to leave out blank pages.

But the real magic is in the software. OCR (Optical Character Recognition) software looks at the page image and extracts editable text from it. For printed text, such as a novel, the conversion is very nearly perfect. It’s not yet clever enough to work out page breaks, it has a bit of trouble with foreign accents (it can be set to various languages, so this may be fixable), and it has the occasional inexplicable wobble.
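Page breaks are one of the things a script can help with. This is a rough Python sketch, assuming the OCR text has been exported as plain text with a blank line between paragraphs, and that a page break shows up as an extra paragraph break in the middle of a sentence. `rejoin_page_breaks` is a hypothetical helper written for this example, not a ScanSnap or ABBYY feature, and the list of paragraph-ending characters is an assumption you may need to adjust.

```python
import re

# Characters assumed to end a genuine paragraph: closing punctuation,
# an ellipsis, and straight or curly closing quotes.
PARAGRAPH_ENDINGS = (".", "!", "?", "\u2026", '"', "\u201d", "'", "\u2019")


def rejoin_page_breaks(text):
    """Merge paragraphs that were split where a printed page ended.

    Heuristic: if a paragraph ends without closing punctuation and the next
    one starts with a lowercase letter, treat them as one paragraph that a
    page break cut in two. A trailing hyphen is assumed to be a word broken
    across the pages, so it is dropped and the two halves are joined.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    merged = []
    for para in paragraphs:
        previous = merged[-1] if merged else None
        if previous and not previous.endswith(PARAGRAPH_ENDINGS) and para[0].islower():
            if previous.endswith("-"):
                merged[-1] = previous[:-1] + para  # "ex-" + "traordinarily"
            else:
                merged[-1] = previous + " " + para
        else:
            merged.append(para)
    return "\n\n".join(merged)


sample = (
    "She walked to the\n\n"
    "door and opened it.\n\n"
    "The night was ex-\n\n"
    "traordinarily cold.\n\n"
    "Next paragraph."
)
print(rejoin_page_breaks(sample))
```

The hyphen rule will occasionally join a genuinely hyphenated word (a “well-” at the foot of one page and “known” at the top of the next becomes “wellknown”), so it’s still worth reading the result through.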

OCRed text will always need checking, but it is the perfect solution for authors wanting to re-release their backlist as e-books or print-on-demand titles.

I’ll post about the workflow later…


