This is a February 21, 2012 revision of this post, which originally appeared September 23, 2011.
This is one of those posts where I describe something mind-breakingly complex that I did with my computer. The reason I want to post it is so I can find the description after I’ve totally forgotten all the details in six months. Computer knowledge has such a short half-life.
Over time, I’ll also be revising this post directly, as well as adding links and ideas to the comments section as the information filters through my head, and as I continue making mistakes. Let the learning process begin!
[As I often do, I’ll be using photo illos with absolutely no connection to the subject at hand. I took these photos at Nick Herbert’s house in Boulder Creek, on a dude ranch in Wyoming, and in the park at Wilder Ranch in Santa Cruz.]
Plan
So suppose I have a document and I want to make it into an ebook.
Amazon will let you submit a DOC, but you are more likely to get the formatting you want if you submit an HTML. Amazon won’t take an EPUB. In practice it seems to work better to send HTML to Amazon: some files I’ve tried had bad indents when I sent DOC, but looked okay in HTML.
Barnes and Noble PubIt will take an HTML or an EPUB. You can convert HTML to EPUB with the Calibre or the Sigil tool.
Lulu will get you on iBook, and they take EPUB.
We’ll need HTML and EPUB versions of our book. So our workflow is to go from DOC to HTML, and then from HTML to EPUB.
[Supposedly Smashwords takes DOC only, but they seem to be becoming a little irrelevant. And, for that matter, they don’t seem to be processing their requests for new accounts. To hell with them.]
Setup
Tools I’m going to use: Microsoft Word, Adobe Dreamweaver, the free Calibre ebook software, the free Sigil epub software, and the free Epubcheck software.
I make a directory for my project and put my Word document in there. It’s important at this point to format all of your chapter or section headings with a Heading style such as Heading 1, Heading 2, etc. The EPUB file is going to want to make a table of contents, and it will build it by finding Heading-formatted lines. And you’ll want an internal table of contents that Word generates from the headings.
Regarding the images you want to include, even if you inserted them into your Word DOC from somewhere else on your hard drive, put copies of all the images in a subdirectory of your project directory and name the subdirectory images. Use some photo editing software to adjust the sizes of these images to be, let’s say, 800 pixels across so they can fill up an iPad reader page. This way you have control of what images get used.
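Here’s a rough sketch of the arithmetic behind that resize, in Python: given an image’s current size, it works out the height you end up with at 800 pixels across if you keep the proportions. The function name and the sample dimensions are made up for illustration; the actual resizing still happens in your photo editor.

```python
def scaled_size(width, height, target_width=800):
    """Return (new_width, new_height) for an image scaled to target_width,
    keeping the original aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_width / width
    return target_width, round(height * scale)


# Example: a hypothetical 3000 x 2000 photo straight from the camera.
print(scaled_size(3000, 2000))
```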
Word Formatting
You can adjust font in the ereader, but I go with a default font size of 12 to start with. Don’t use any font larger than about 18 points for titles or sections, or it’ll look too big on a smartphone.
The fonts available in ereaders vary widely. I myself like to go to Georgia. There’s a setting you can do to make the ereader use a similar font if Georgia’s not available. More on this point below.
The Amazon guidelines, and others, suggest that you justify your text, that is, choose the justified paragraph style so that both the left and right edges of the text are in straight columns. I feel your ebook is easier to read if you forget about straight right edges and choose the Flush Left option. Then the spacing between words will be uniform. This said, many ereaders will ignore your format and will justify the text anyway.
Set your paragraph first-line indent fairly small, like 0.2 inches. What you’ll see on the readers again varies. Kindle always puts a huge indent, doesn’t skip a line between paragraphs. Kindle Fire does no indent, skips a line between paragraphs. iPhone does a small indent if you asked for that, doesn’t skip a line. Etc.
Put a Table of Contents into the DOC using the Word automatic TOC. If you’re planning to send an HTML to Amazon, you’ll need the Table of Contents in there, as the Amazon processor won’t build one for you.
If you are heading for EPUB, you’ll be building a Table of Contents from the headers in Calibre or Sigil, but it’s good to have a Table of Contents inside your document anyway. The thing is, some ereaders are unable to show the hidden EPUB table of contents.
Once you have your Word-made TOC, also apply a header style to the title of your book at the beginning, as this will make the EPUB-built TOC more useful. Or you can do this later in Sigil.
Cover
For the cover of your Ebook, use an external photo editor to create a cover.jpg file which is about 600 pixels wide by 800 pixels high. The cover obviously should show your name and the title of the book. Save it in your images subdirectory. In Word use Insert|Picture to insert this image at the very start of your DOC.
If you want you can have a bigger version of your cover as well. When you upload to Amazon or B&N, they’ll ask for a cover image, and you can upload the same one you use in the book or you can upload the bigger one.
Converting DOC to HTML
In Word, I save the Word file as a “Web Page, Filtered” or filtered HTML file. Filtered means there’s less Word crap in the file. And I open this file in Dreamweaver and use Commands | Clean Up Word HTML… to get rid of more Word crap. When I save the “Filtered HTML” file, Word makes a directory of extra files in my directory, but there’s actually nothing in that directory that matters, and you can delete it.
While you’re in HTML, look at the image links, and make sure they all point to file names in the images subdirectory that you made. You may need to use the Edit | Find and Replace dialog to get things set.
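If you’d rather have a script spot the stray links for you, here’s a minimal sketch in Python using only the standard library. It lists every img link that doesn’t start with images/. The function and class names, and the sample file names, are my own inventions for illustration, not anything Word or Dreamweaver produces by that name.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every img tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


def stray_image_links(html, prefix="images/"):
    """Return the img src values that do not start with prefix."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return [src for src in collector.sources if not src.startswith(prefix)]


# Hypothetical sample: one good link, one still pointing somewhere else.
sample = (
    '<p><img src="images/cover.jpg"></p>'
    '<p><img src="mybook_files/image002.jpg"></p>'
)
print(stray_image_links(sample))
```

Anything the function returns is a link you still need to fix with Find and Replace.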
You’ll have at least one image, your cover. It’s better not to have the size of the images hardcoded. That way you’re free to force in larger images if you want. Use the Edit | Find and Replace dialog and for the Search: field select Specific Tag and set to img, then for the Action: field select Remove Attribute and set to width. Then do the same for height. Then do the same for border.
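The same cleanup can be roughed out in Python, assuming you don’t have Dreamweaver handy. This is only a sketch: it uses regular expressions, which are fine for the simple tags Word writes but are not a real HTML parser, and the function name is hypothetical.

```python
import re

# Matches one width, height, or border attribute, with a quoted or bare value.
_SIZE_ATTR = re.compile(
    r'\s+(?:width|height|border)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)',
    re.IGNORECASE,
)
_IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def strip_img_sizes(html):
    """Remove width, height, and border attributes from every img tag.
    Other tags are left exactly as they were."""
    return _IMG_TAG.sub(lambda m: _SIZE_ATTR.sub("", m.group(0)), html)


# Hypothetical sample: the table keeps its width, the img loses its sizes.
sample = (
    '<table width="400"><tr><td>'
    '<img width=412 height="300" border=0 src="images/cover.jpg">'
    '</td></tr></table>'
)
print(strip_img_sizes(sample))
```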
Fill in the Title field for your HTML in Dreamweaver.
Now we need a few more tweaks so that the HTML can be used to build an EPUB.
*Delete the body attributes link and vlink.
*Remove all the align attributes of the p tags
*Remove all the clear attributes of the br tags
*Search and replace to change every occurrence of a name= to an id= . These are the anchor tags that make your Word-built Table of Contents work. The EPUB standard accepts the id attribute but not the name attribute, and the links work with either one.
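The four tweaks in the list above can be sketched as one small Python function. Treat it as an illustration under assumptions rather than a finished tool: it relies on regular expressions, it only handles the plain attribute styles that Word’s filtered HTML tends to use, and all the names in it are made up for this example.

```python
import re


def _strip_attrs(html, tag, attr_names):
    """Remove the named attributes from every opening <tag ...> in html."""
    names = "|".join(map(re.escape, attr_names))
    attr = re.compile(
        r'\s+(?:%s)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)' % names,
        re.IGNORECASE,
    )
    tag_re = re.compile(r"<%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)
    return tag_re.sub(lambda m: attr.sub("", m.group(0)), html)


def _anchor_name_to_id(match):
    # Only the attribute name changes; the anchor value stays the same.
    return re.sub(r"(\s)name(\s*=)", r"\1id\2", match.group(0), flags=re.IGNORECASE)


def tweak_for_epub(html):
    """Apply the four EPUB-friendly tweaks to an HTML string."""
    html = _strip_attrs(html, "body", ["link", "vlink"])
    html = _strip_attrs(html, "p", ["align"])
    html = _strip_attrs(html, "br", ["clear"])
    # Rename name= to id= inside <a ...> tags only.
    return re.sub(r"<a\b[^>]*>", _anchor_name_to_id, html, flags=re.IGNORECASE)


# Hypothetical snippet in the style of Word's filtered HTML.
sample = (
    '<body lang=EN-US link=blue vlink=purple>'
    '<p class=MsoNormal align=center><a name="_Toc1">Chapter 1</a></p>'
    '<br clear=all>'
    '</body>'
)
print(tweak_for_epub(sample))
```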
You probably want to add some separators between your sections, especially at the start of the book. HTML runs all the pages together. To make a separator, at the top of your HTML file, inside the style block of definitions, add this line:
hr {page-break-after:always;}
And then, wherever you want a break in your document, insert the symbol < then the letters hr then the symbol >, and you get a nice looking line like below, which plays the role of a page break. I can’t write out what you actually put, because then you just get the pagebreak line!
Cautionary note: Once you get your HTML file all tweaked, you’d better not save it from Word again, as Word may put in some of the crap that you just removed. I’m not positive about this, but at some point it seems safer to do any further edits in Dreamweaver.
Sending HTML to Amazon
I won’t go into the details about getting a KDP (Kindle Direct Publishing) account, and filling out all the dialogs. Let’s talk about how you send the file to Amazon. You’ll have an HTML file and a directory with some files, in particular with the image files (Word makes some files, but they don’t matter). In order to upload, I put my main HTML file and the directory with the images into a single zipped file and uploaded that, and the Kindle meat grinder eats that fine. To zip the file and directory on my Windows machine, I highlighted them, right-clicked, selected Send To, and sent them to a Compressed (zipped) folder.
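If you’re not on Windows, or you want to repeat the zipping step without clicking, here’s a minimal Python sketch that builds the same kind of archive with the standard zipfile module. The function name is hypothetical, and I’m assuming the images sit directly inside one directory, as set up earlier.

```python
import zipfile
from pathlib import Path


def zip_for_kindle(html_file, images_dir, zip_path):
    """Bundle one HTML file and its images directory into a zip archive,
    keeping the images under their directory name so the links still resolve."""
    html_file = Path(html_file)
    images_dir = Path(images_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # The HTML goes at the top level of the archive.
        archive.write(html_file, arcname=html_file.name)
        # Each image goes under images/ (or whatever the directory is called).
        for image in sorted(images_dir.iterdir()):
            if image.is_file():
                archive.write(image, arcname=f"{images_dir.name}/{image.name}")
    return zip_path
```

With made-up file names, a call would look like zip_for_kindle("betterworlds.html", "images", "betterworlds.zip"), run from inside the project directory.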
Tweaking HTML and EPUB in Sigil
So now you’ve made an HTML file and cleaned it up. Open the HTML in Sigil. As soon as you open your file in Sigil, it’s converted into an EPUB file that you can save and distribute. But first do some tweaks.
Use the menu item View to put check marks on Book Browser and Table of Contents.
The Book Browser shows all the different components that are hidden inside your EPUB file.
The Table of Contents window should show a Table of Contents. You can create a Table of Contents by clicking Generate TOC from Headings, and then clicking OK in the Heading Selector. You can see if it worked by double clicking on some of the items in the Table of Contents box.
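If the generated Table of Contents looks thin, it can help to see which headings are actually in your HTML. Here’s a rough Python sketch that lists them with the standard library parser. It is not how Sigil does it internally, just a way to preview the h1 through h6 lines a heading-based TOC has to work with; the names and the sample are invented.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) for every h1-h6 heading in an HTML document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            # Collapse runs of whitespace left over from Word's line wrapping.
            text = " ".join("".join(self._text).split())
            self.headings.append((self._level, text))
            self._level = None


def list_headings(html):
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.headings


# Hypothetical sample with a book title and one chapter.
sample = "<h1>Better Worlds</h1><p>Intro</p><h2><a id='c1'>Chapter  1</a></h2>"
for level, text in list_headings(sample):
    print("  " * (level - 1) + text)
```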
If you didn’t format your book title as a header in Word, you can still do that here in Sigil. Select the title and use the drop-down menu on the upper left corner of the Sigil window and apply a format such as Heading1.
Use the menu item Tools|MetaEditor… to fill in names for the Title, Author, and Language of your EPUB file.
In the Book Browser window in the left hand side of your Sigil window, find the Images directory and look for cover.jpg in there. Right click on it, select Add Semantics, and check Cover Image.
Now before you save and distribute your EPUB, use the Sigil Tools|Validate EPUB selection to see if you get errors. If you click on the error messages, you’ll see source code for the HTML, showing where the error lies.
If you don’t understand an error message, paste it into the Google search box.
Fix all the errors, but do the fixes over in Dreamweaver, and then save the HTML and reload it in Sigil. You need the roundabout approach, as Sigil won’t save HTML for you. And it’s good to have the fixed HTML as a kind of base code file.
Once you get past the errors, save your EPUB file with a name with no spaces.
Validating with Epubcheck
Install the Epubcheck ware on your computer; it probably ends up in Program Files\Epubcheck. Make a sample subdirectory of the Epubcheck directory and put a copy of your current EPUB file there. Suppose it’s called betterworlds.epub.
Then go to the Command Line interface for your computer and navigate into the directory where epubcheck lives, such as Program Files\Epubcheck. Now run a command like this:
java -jar epubcheck-3.0b2.jar sample/betterworlds.epub
Of course the letters and numbers after epubcheck depend on which version of the software you have. And the name of the epub file depends on what file you’re checking.
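If you end up rerunning the check a lot, you could wrap the command in a few lines of Python. This is only a sketch under assumptions: it expects java to be on your path, the function names are made up, and the jar and EPUB names are whatever you actually have. It also refuses file names with spaces before calling epubcheck at all.

```python
import subprocess
from pathlib import Path


def epubcheck_command(jar_path, epub_path):
    """Build the java command line for epubcheck, refusing names with spaces."""
    if " " in Path(epub_path).name:
        raise ValueError("rename the EPUB so its file name has no spaces")
    return ["java", "-jar", str(jar_path), str(epub_path)]


def run_epubcheck(jar_path, epub_path):
    """Run epubcheck and return (exit_code, combined_output)."""
    result = subprocess.run(
        epubcheck_command(jar_path, epub_path),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr
```

Run from the Epubcheck directory, a call like run_epubcheck("epubcheck-3.0b2.jar", "sample/betterworlds.epub") hands back whatever epubcheck printed, so you can save it to a file or search through it.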
When you run it, epubcheck will either print a “No Errors Found” message, or it will spew out a lot of error messages. You can scroll up and down to see them all. The most common causes of errors are (1) you forgot to build a table of contents using the Sigil Table of Contents window, or (2) you didn’t fill in the Title, Author, and Language fields using the Sigil MetaEditor tool, or (3) the Epub ware is confused because you gave your epub file a name with spaces in it. If you see an error you can’t understand, try copying it into the Google search bar to see what other people say about it.
As before, do the fixes in Dreamweaver, save the fixed HTML, reload in Sigil, save off a fresh EPUB and try epubcheck again.
Publish on Amazon and B&N
And maybe Lulu, as a way to get to iBook. And if Smashwords ever starts working again, maybe try that…if you care about being listed on the smaller distributor sites like Diesel.