
Good gracious – it’s GREP!

The other day I was asked if I could convert some short ebooks into PDFs using an existing print template set up in InDesign. The only snag was that the ebooks were edited sections of existing print books, and their contents had never existed as Word or InDesign files. So how to do it reasonably efficiently? To be absolutely sure you have the right version of the text, you should ideally work from the ebook files themselves.

That’s fine – crack them open and you have HTML. That’s text, right? Well – yes, and in an ideal world InDesign could import HTML and use the HTML code to style your text. But this isn’t an ideal world – InDesign can’t import HTML yet. (I’m sure it’s only a matter of time, right, Adobe?)

But – back to the drawing board for the time being. So I copy and paste the text from my browser… That works, but – hey, hang on! – where’s my formatting? All those italics – gone. Oh lordy, am I going to have to go back over everything and replace the italics? Bolds? Headings!?

Back to the drawing board again. What if I copy and paste the HTML into InDesign? Yes, but you still haven't got any formatting. Ah, but you have got the codes for the formatting.

HTML code

In the HTML, each paragraph is surrounded by a little bit of code, and italics, bolds, headings, etc. are surrounded by codes too (for example <p>…</p> around a paragraph, and <i>…</i> or <em>…</em> around italics). These codes work with the CSS files to style the text in your ebook or browser, and the great thing about them is that they won't ever be wrong or mistyped (so long as the original text was styled correctly, of course). So you can do some find/change work using the code tags as a guide, and you'll soon have styled text without having to go through comparing both versions. Hurrah!

The Find/Change panel in InDesign is really powerful, and I spend a lot of time using it when I'm typesetting. But to sort out this little problem efficiently, it really helps to know a bit of grep. I already knew some grep and sometimes used InDesign's built-in grep queries, but I went back to the trusty Lynda.com video-training site and brushed up on it. And, wowzers, it really is like magic. (I have nothing to do with Lynda.com, but I cannot recommend them highly enough – their courses are superb.)

What you need to know here is pretty simple stuff, actually, and only scratches the surface of what grep can do (and don't even get me started on the possibilities of grep styles). If you're ever setting long documents, or have to convert text from one format to another, a little bit of grep is the way to go.

Here I'm clearing out the paragraph tags and styling the body text at the same time: one click (do check your search is working first, though!) and the body text is styled and the paragraph tags are gone. The search field contains the paragraph tags with (.*) between them, i.e. <p>(.*)</p>. The dot means "any character" and the asterisk means "any number of them", so .* pretty much means "find anything between these tags", and the parentheses around .* capture whatever it finds. Then, in the replace field, $1 means "put back what you found inside the first set of parentheses" and only that, so the paragraph tags themselves disappear. Finally, in the Change Format section at the bottom of the Find/Change panel, I've set the paragraph style to ‘text’.


Clearing out paragraph tags
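
If you'd like to try a pattern out before running it over a whole book, Python's re module understands the same basic syntax. This sketch mirrors the paragraph-tag search on a made-up sample; note that Python writes the back-reference as \1 where InDesign uses $1. It only does the text half of the job, since the paragraph style is still applied in InDesign.

```python
import re

# Same search as in the Find/Change panel: the paragraph tags, with (.*)
# capturing everything between them. In Python "." does not match a
# newline, so each match stays on one line (one paragraph here).
PARAGRAPH = re.compile(r"<p>(.*)</p>")


def strip_paragraph_tags(html: str) -> str:
    # r"\1" is Python's spelling of InDesign's $1: put back only what the
    # first set of parentheses captured, dropping the tags themselves.
    return PARAGRAPH.sub(r"\1", html)


sample = "<p>It was a dark and stormy night.</p>\n<p>The rain fell.</p>"
print(strip_paragraph_tags(sample))
```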

You can use the same method to style headings, opening paragraphs, etc., too. Just substitute whatever other tags you have (h1, div, etc.) for the paragraph tags, and pick the matching style.
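
The "same method, different tags" idea can be sketched as a small table of tag-to-style pairs. In InDesign you'd still run one find/change per tag; this Python version just applies the whole table in one pass so you can check the mapping. The style names are hypothetical stand-ins for whatever your template uses, and it assumes one tagged paragraph per line.

```python
import re

# Hypothetical tag-to-style table: use your ebook's tags and your
# template's style names.
TAG_TO_STYLE = {
    "h1": "heading 1",
    "h2": "heading 2",
    "p": "text",
}

# An opening tag from the table (attributes such as class="first" allowed),
# the captured contents, then the matching closing tag via (?P=tag).
BLOCK = re.compile(
    r"<(?P<tag>" + "|".join(map(re.escape, TAG_TO_STYLE)) + r")(?:\s[^>]*)?>"
    r"(?P<body>.*)</(?P=tag)>"
)


def style_paragraphs(html: str) -> list[tuple[str, str]]:
    """Return (style name, text) pairs for each tagged line; skip the rest."""
    styled = []
    for line in html.splitlines():
        match = BLOCK.fullmatch(line.strip())
        if match:
            styled.append((TAG_TO_STYLE[match["tag"]], match["body"]))
    return styled


sample = '<h1>Chapter One</h1>\n<p class="first">It began.</p>\n<div>x</div>'
print(style_paragraphs(sample))
```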

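Here I'm styling italics with an italic character style and getting rid of the tags at the same time. Again, you can do the same with bold, underline, etc. – just change the search criteria. If a paragraph can contain more than one italic phrase, use .*? (the "shortest match" version) instead of .*, so that each match stops at the first closing tag rather than running on to the last one.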
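
To see why the shortest match matters, here is a quick Python comparison on a made-up sentence with two italic phrases (the *? operator behaves the same way in InDesign's GREP):

```python
import re

line = "She read <i>Emma</i> and then <i>Persuasion</i> twice."

# Greedy .* runs on to the LAST </i> in the line, swallowing the
# plain text between the two italic phrases.
greedy = re.findall(r"<i>(.*)</i>", line)

# Shortest-match .*? stops at the FIRST </i>, so each italic phrase
# is matched separately.
shortest = re.findall(r"<i>(.*?)</i>", line)

# Removing the tags with the shortest-match pattern leaves the text intact.
plain = re.sub(r"<i>(.*?)</i>", r"\1", line)

print(greedy)    # ['Emma</i> and then <i>Persuasion']
print(shortest)  # ['Emma', 'Persuasion']
print(plain)     # She read Emma and then Persuasion twice.
```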

You can also use ‘wild cards’ to clear out things like image tags, which differ slightly from one to the next, so that you don't have to search through all the text and delete them by hand.


Using wildcard codes to clear out unwanted text
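
One common way to write such a wild card is <img[^>]*>: the tag name, then "any run of characters that aren't >", then the closing bracket, so the match can't spill past the end of the tag. This is one possible pattern offered as an illustration, shown here in Python with a made-up sample:

```python
import re

# "<img", then any run of characters that aren't ">" (so the match can't
# run past the end of the tag), then ">". The \b stops it matching a
# longer tag name that merely starts with "img".
IMG_TAG = re.compile(r"<img\b[^>]*>")


def remove_image_tags(html: str) -> str:
    return IMG_TAG.sub("", html)


sample = '<p>Before<img src="images/fig1.jpg" alt="A map"/>After</p>'
print(remove_image_tags(sample))  # <p>BeforeAfter</p>
```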

You could also use grep to convert the image tags into placeholder boxes for the images if you needed to. Finally, go through and clear out stray div tags, etc. Then I'd do a final search for < and >, which should pick up any remaining lurking code. Then you're done. Ta dah! Styled text in just a few clicks.
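
The final check for < and > can also be run outside InDesign on an exported text file. A minimal Python sketch that lists every line still containing either character, with its line number (the sample text is made up):

```python
def find_leftover_code(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) for every line that still contains < or >.

    Anything reported is probably a tag the earlier searches missed.
    """
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if "<" in line or ">" in line
    ]


text = "Chapter One\nIt began.</span>\nThe end."
for number, line in find_leftover_code(text):
    print(f"line {number}: {line}")
```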


Fixed-format ePub with read-along audio


iPhone and iPad

I've now added read-along audio to the ‘list of things I can do’. Specifically: fixed-format ePub for iOS devices, with an audio track that highlights each word as it is read.

I can’t show you a sample because I’ve been working on copyrighted material. Maybe one day I’ll get around to making something, but the first thing I learned about the process is that it’s time consuming – oh so very time consuming.

You will need:

  • One fixed-format ePub
  • One or more audio tracks saved in .m4a format
  • A .smil file for every page that contains text


Method:

1. First you need to mark the start and end point of every word in the audio. (I split my audio into one file for each page; I think you can use just one file, but I haven't tried it.) Apple apparently recommends a program called Audacity, which is handy because I already had it, and it's free and easy to use anyway. Find the start and end point of each word by making a selection, listening to it and fine-tuning it until it's right. Then press Cmd+B to create a label. You don't have to type anything in the little box that comes up (I only did so to draw your attention to it).

When you have marked every word, you can export the labels to a text file. Open it and you'll find the start and end points of every label, and hence of every word, in a list.

This will take some time. I'm sure some clever person has already automated this process, or soon will, but until then there is nothing for it but to sit and listen … over and over again.
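
Once the labels are exported, a small script can turn them into numbers ready for the .smil files. This Python sketch assumes the usual Audacity export layout of one label per line (start time, tab, end time, tab, label text, with times in seconds), and skips lines starting with a backslash, which is an assumption about the extra frequency data some Audacity versions write for spectral selections. The sample values are the ones from the .smil example in this post.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    begin: float  # seconds
    end: float    # seconds
    label: str    # whatever was typed in the label box (may be empty)


def read_audacity_labels(text: str) -> list[WordTiming]:
    """Parse exported Audacity labels.

    Assumes one label per line as start<TAB>end<TAB>label, times in
    seconds. Lines starting with a backslash (assumed to be extra
    frequency data for spectral selections) are skipped.
    """
    timings = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("\\"):
            continue
        fields = line.split("\t")
        label = fields[2] if len(fields) > 2 else ""
        timings.append(WordTiming(float(fields[0]), float(fields[1]), label))
    return timings


exported = "0.000000\t0.632985\tThe\n0.632985\t0.964874\tCat\n0.964874\t1.214646\t\n"
for word in read_audacity_labels(exported):
    print(word.begin, word.end, repr(word.label))
```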

Marking the audio timings



2. Next you need to create your .smil files. Below is a sample showing just three words. This is where the timings you made earlier go: each word's start and end points become its clipBegin and clipEnd values.

<smil xmlns="http://www.w3.org/ns/SMIL"
      xmlns:epub="http://www.idpf.org/2007/ops"
      version="3.0">
  <body>
    <par id="par1">
      <text src="6-page6.xhtml#W1"/>
      <audio src="audio/audio3.m4a" clipBegin="0.000000s" clipEnd="0.632985s"/>
    </par>
    <par id="par2">
      <text src="6-page6.xhtml#W2"/>
      <audio src="audio/audio3.m4a" clipBegin="0.632985s" clipEnd="0.964874s"/>
    </par>
    <par id="par3">
      <text src="6-page6.xhtml#W3"/>
      <audio src="audio/audio3.m4a" clipBegin="0.964874s" clipEnd="1.214646s"/>
    </par>
  </body>
</smil>
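
Typing a <par> block for every word by hand gets old quickly. Here's a Python sketch that builds a .smil file like the one above from a list of (start, end) times, such as the ones exported from Audacity. It assumes one audio file per page and that word n on the page has the id Wn, as in the sample; the file names in the usage line are the sample's.

```python
from xml.sax.saxutils import quoteattr


def build_smil(page_file: str, audio_file: str,
               timings: list[tuple[float, float]]) -> str:
    """Build a media overlay with one <par> per word.

    Word n is assumed to be <span id="Wn"> in page_file.
    """
    lines = [
        '<smil xmlns="http://www.w3.org/ns/SMIL"',
        '      xmlns:epub="http://www.idpf.org/2007/ops"',
        '      version="3.0">',
        '  <body>',
    ]
    for n, (begin, end) in enumerate(timings, start=1):
        lines += [
            f'    <par id="par{n}">',
            f'      <text src={quoteattr(f"{page_file}#W{n}")}/>',
            f'      <audio src={quoteattr(audio_file)} '
            f'clipBegin="{begin:.6f}s" clipEnd="{end:.6f}s"/>',
            '    </par>',
        ]
    lines += ['  </body>', '</smil>']
    return "\n".join(lines)


print(build_smil("6-page6.xhtml", "audio/audio3.m4a",
                 [(0.0, 0.632985), (0.632985, 0.964874), (0.964874, 1.214646)]))
```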


3. Next you need to mark up your HTML. Each word is wrapped in a span whose id matches the fragment (#W1, #W2 and so on) at the end of the corresponding text src in the .smil file. For this short example you'd end up with a paragraph that looks like this:

<p><span id="W1">The</span> <span id="W2">Cat</span> <span id="W3">sat</span></p>
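
Wrapping every word by hand is another job worth automating. A minimal Python sketch, assuming plain text input (no existing tags or entities) and simple whitespace-separated words, so punctuation stays attached to its word:

```python
from html import escape


def wrap_words(text: str, start: int = 1) -> str:
    """Wrap each word in <span id="Wn">, numbering from `start`.

    Passing a later `start` lets the ids run on across several
    paragraphs on the same page.
    """
    spans = [
        f'<span id="W{n}">{escape(word)}</span>'
        for n, word in enumerate(text.split(), start=start)
    ]
    return " ".join(spans)


print(f"<p>{wrap_words('The Cat sat')}</p>")
```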


4. Next you need to update your content.opf file.

Remember to list all your audio and .smil files in the manifest and give each one a unique id.

Then, in the manifest entry for each HTML file that has audio, you must add:

media-overlay="ID of corresponding .smil file"

This is really important – if you don't get this right, it won't work.
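
Here's a Python sketch that writes the manifest entries for you. The ids (page1, smil1, audio1 and so on) and the .smil file name in the usage line are made up; any unique ids will do, as long as each page's media-overlay names its own .smil item. It also writes media:duration metadata: the EPUB 3 Media Overlays specification expects one for each .smil file plus a total for the book. That isn't covered in the steps above, so check it against the spec before relying on it.

```python
from xml.sax.saxutils import quoteattr


def manifest_items(pages: list[tuple[str, str]], audio_files: list[str]) -> str:
    """Manifest <item> lines; pages holds (xhtml href, smil href) pairs.

    The ids page1, smil1, audio1, ... are hypothetical examples.
    """
    items = []
    for n, (page, smil) in enumerate(pages, start=1):
        items.append(f'<item id="page{n}" href={quoteattr(page)} '
                     f'media-type="application/xhtml+xml" media-overlay="smil{n}"/>')
        items.append(f'<item id="smil{n}" href={quoteattr(smil)} '
                     'media-type="application/smil+xml"/>')
    for n, audio in enumerate(audio_files, start=1):
        # .m4a (AAC audio in an MP4 container) is listed as audio/mp4.
        items.append(f'<item id="audio{n}" href={quoteattr(audio)} media-type="audio/mp4"/>')
    return "\n".join(items)


def clock(seconds: float) -> str:
    """Format seconds as an h:mm:ss.fff clock value, to the millisecond."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours}:{minutes:02d}:{secs:02d}.{ms:03d}"


def duration_metadata(durations: list[float]) -> str:
    """media:duration <meta> lines: one per .smil item (smil1, smil2, ...)
    plus the total for the whole book."""
    lines = [f'<meta property="media:duration" refines="#smil{n}">{clock(d)}</meta>'
             for n, d in enumerate(durations, start=1)]
    lines.append(f'<meta property="media:duration">{clock(sum(durations))}</meta>')
    return "\n".join(lines)


print(manifest_items([("6-page6.xhtml", "6-page6.smil")], ["audio/audio3.m4a"]))
print(duration_metadata([1.214646]))
```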


5. Then open the CSS file and add this rule:

.-epub-media-overlay-active {
  color: red;
}

This tells the reading system to show the word currently being read in red (but it can be any colour you like).


6. Finally, zip your ePub file back up and test it. If you've got everything right, you should see a speaker icon at the top right of the menu bar, and if you've got everything absolutely right, you'll be able to have your book read to you, with the pages turning automatically or manually. (Have you got the volume turned up?)
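
One thing to watch when zipping: an ePub isn't quite an ordinary zip. The file called mimetype has to be the first entry in the archive and must be stored uncompressed, which a general-purpose zip tool won't necessarily do. A minimal Python sketch, assuming the unpacked book sits in one folder with mimetype at the top level; it also skips hidden files such as .DS_Store. Don't put the output .epub inside the folder you're zipping.

```python
import os
import zipfile


def zip_epub(source_dir: str, epub_path: str) -> None:
    """Zip an unpacked ePub folder back into a single .epub file.

    'mimetype' goes in first and uncompressed, as the EPUB container
    format requires; everything else is compressed. Hidden files (names
    starting with '.', e.g. .DS_Store) are left out.
    """
    with zipfile.ZipFile(epub_path, "w") as zf:
        zf.write(os.path.join(source_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(source_dir):
            # Skip hidden folders and keep the archive order predictable.
            dirs[:] = sorted(d for d in dirs if not d.startswith("."))
            for name in sorted(files):
                if name.startswith("."):
                    continue
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written as the first entry
                zf.write(full_path, arcname, compress_type=zipfile.ZIP_DEFLATED)


# Usage (hypothetical paths):
# zip_epub("MyBook", "MyBook.epub")
```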

So then you've got everything right and you try to validate it, and find it won't validate. Yep, that's right, and in my experience this is quite normal: the validator objected to the media overlays or some such gubbins.


So there you have it. Not a quick snack!


I used Read Aloud ePub for iBooks by Liz Castro as my guide. The section about using GREP to help with the markup was extremely useful.
