The other day I was asked if I could convert some short ebooks into PDFs using an existing print template set up in InDesign. The only catch was that the contents of the ebooks were edited sections of existing print books, and had never existed as Word files or InDesign files. So how to do it reasonably efficiently? To be absolutely sure that you have the right version of the text, you should ideally work from the ebook files themselves.
That’s fine – crack them open and you have HTML. That’s text, right? Well – yes, and in an ideal world InDesign could import HTML and use the HTML code to style your text. But this isn’t an ideal world – InDesign can’t import HTML yet. (I’m sure it’s only a matter of time, right, Adobe?)
But – back to the drawing board for the time being. So I copy and paste the text from my browser… That works, but – hey, hang on! – where’s my formatting? All those italics – gone. Oh lordy, am I going to have to go back over everything and replace the italics? Bolds? Headings!?
Back to the drawing board again. What if I copy and paste the HTML into InDesign? Yes, but you still haven’t got any formatting. Ah, but you have got the codes for formatting.
In the pasted HTML, each paragraph is surrounded by a little bit of code, and italics, bolds, headings, etc. are surrounded by codes too. These codes work with the CSS files to style the text in your ebook or browser, and the great thing about this is that they won’t ever be wrong or typed incorrectly (so long as the original text is styled correctly, of course). So you can do some find/change work using the code tags as a guide and you’ll soon have styled text without having to go through comparing both versions. Hurrah!
The find/change panel in InDesign is really powerful and I spend a lot of time using it when I’m typesetting. But to sort out this little problem more efficiently, it’s really useful to know a bit of grep. I knew some grep and sometimes use InDesign’s built-in grep queries, but I went back to the trusty Lynda video-training site and brushed up on it. And, wowzers, it really is like magic. (I’m nothing to do with Lynda.com, but I cannot recommend them highly enough – their courses are superb.)
What you need to know here is pretty simple stuff, actually, and is only scratching the surface of the capabilities of grep (and don’t even get me started on the possibilities of grep styles). If you’re ever setting long documents, or have to change from one format to another, a little bit of grep is the way to go.
Here I’m clearing out the paragraph tags and styling the body text at the same time: one click (do check your code is working first though!) and the body text is styled and the paragraph tags are gone. You’ll see that the paragraph tags are in the search field, and inside them is (.*). The .* means ‘any run of characters’, and the parentheses around it tell grep to remember whatever it matched. Then in the replace field the $1 means ‘put back whatever was found inside the first pair of parentheses’, so the text between the tags is kept and the tags themselves are dropped. And at the bottom of the find/change panel I’ve asked it to change the style to ‘text’.
Clearing out paragraph tags
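If you want to try the pattern outside InDesign first, here is a minimal sketch in Python. It is not what InDesign runs internally, just the same idea: the sample text is made up, it assumes plain <p> tags with no attributes, and Python writes the captured group as \1 where InDesign's change field uses $1.

```python
import re

# Made-up sample of pasted ebook HTML, one paragraph per line (an assumption).
sample = "<p>First paragraph.</p>\n<p>Second paragraph.</p>"

# (.*) captures everything between the tags. By default "." does not match
# a line break, so each match stays inside a single paragraph.
PARAGRAPH = re.compile(r"<p>(.*)</p>")

# \1 puts back only the captured text, so the tags themselves are dropped.
stripped = PARAGRAPH.sub(r"\1", sample)

print(stripped)
# First paragraph.
# Second paragraph.
```

Python can only strip the tags, of course. Applying the ‘text’ style in the same step is the part that InDesign's find/change panel does for you.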
You can use the same method to style headings, opening paragraphs, etc, too. Just substitute your paragraph tags for whatever else you have (H1, div, etc).
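To show how the same pattern carries over to other tags, here is a rough Python sketch that pairs each tag with a style name. The style names, the sample text and the style_lines helper are all made up for illustration, and in InDesign you would run one find/change query per tag rather than a single pass like this.

```python
import re

# Hypothetical mapping from HTML tag to the style you would apply in InDesign.
STYLE_FOR_TAG = {"h1": "heading", "p": "text"}

# The first group captures the tag name, the second the text inside it.
# \1 in the pattern insists that the closing tag matches the opening one.
TAGGED_LINE = re.compile(r"<(h1|p)>(.*)</\1>")

def style_lines(html):
    """Return (style name, text) pairs for each line wrapped in a known tag."""
    styled = []
    for line in html.splitlines():
        match = TAGGED_LINE.fullmatch(line.strip())
        if match:
            tag, text = match.groups()
            styled.append((STYLE_FOR_TAG[tag], text))
    return styled

sample = "<h1>Chapter One</h1>\n<p>It was a dark night.</p>"
result = style_lines(sample)
print(result)
# [('heading', 'Chapter One'), ('text', 'It was a dark night.')]
```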
Here I’m styling italics with an italic character style and getting rid of the tags at the same time. Again, you can do this with your bold, underline, etc. – just change the search criteria.
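One thing to watch with tags that sit inside a paragraph: if there are two italic phrases in the same paragraph, a plain (.*) can swallow everything from the first opening tag to the last closing tag. The Python sketch below shows the difference between that and the ‘shortest match’ form (.*?). It assumes the italics are marked with <i> tags and uses a made-up sentence; your ebook might use <em> or a span with a class instead.

```python
import re

# Made-up sentence with two italic phrases in one paragraph.
sample = "She read <i>Emma</i> and then <i>Persuasion</i> in a week."

# Greedy: .* runs on to the LAST </i> on the line, so the two phrases merge.
greedy = re.findall(r"<i>(.*)</i>", sample)

# Shortest match: .*? stops at the FIRST </i> it can, one phrase per match.
lazy = re.findall(r"<i>(.*?)</i>", sample)

# Strip the tags, keeping only the captured text.
stripped = re.sub(r"<i>(.*?)</i>", r"\1", sample)

print(greedy)    # ['Emma</i> and then <i>Persuasion']
print(lazy)      # ['Emma', 'Persuasion']
print(stripped)  # She read Emma and then Persuasion in a week.
```

The same distinction should apply in InDesign's grep, but do test it on a copy of your text before hitting Change All.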
You can also use ‘wild cards’ to clear out things like image tags that are slightly different throughout, so that you don’t have to search and delete manually through all the text.
Using wildcard codes to clear out unwanted text
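As a rough illustration of the wildcard idea, here is a Python sketch that clears out image tags whatever their attributes happen to be. The pattern is my own assumption rather than the exact query in the screenshot, and the sample text is made up.

```python
import re

# Made-up text with two image tags that differ in their attributes.
sample = (
    'Before <img src="images/fig1.jpg" alt="A cat"/> after '
    '<img class="inline" src="images/fig2.png"> end.'
)

# [^>]* means "any run of characters that is not >", so the match always
# stops at the end of the tag, however long the attributes are.
IMAGE_TAG = re.compile(r"<img[^>]*>")

found = IMAGE_TAG.findall(sample)
cleaned = IMAGE_TAG.sub("", sample)

print(len(found))  # 2
print(cleaned)     # Before  after  end.
```

Note the double spaces left behind where the tags used to be. A second find/change for multiple spaces tidies those up.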
You could also use grep to convert the image tags to placeholder boxes for the images if you needed to. Finally, go through and clear out random div tags, etc. Then I’d do a final search for < which should pick up any remaining lurking code. Then you’re done. Ta dah! Styled text in just a few clicks.
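If you would rather run that last check on an exported text file than by eye, a small Python sketch of the same search might look like this. The leftover_tags helper and the sample text are made up for illustration.

```python
def leftover_tags(text):
    """Return (line number, line) pairs for every line that still contains '<'."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if "<" in line
    ]

# Made-up cleaned text with one stray tag left in it.
sample = "Chapter One\nIt was a <div>dark night.\nThe end."

remaining = leftover_tags(sample)
print(remaining)
# [(2, 'It was a <div>dark night.')]
```

It will also flag a genuine < in the text itself (in a maths expression, say), so treat each hit as something to look at rather than delete automatically.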