hpr3367 :: Making books with linux - part 1

A discussion about assembling books using simple tools found in most Linux distros.

Hosted by Andrew Conway on Tuesday, 2021-06-29. Flagged as Clean and released under a CC-BY-SA license.
Tags: linux, books, ebooks, scripts. Comments: 2.

Duration: 00:56:07

Part of the series: general.

Andrew and Dave describe a common itch they have been scratching. Andrew talks through his approach to document creation in this episode and Dave will describe his in the next episode.

Andrew was inspired by a simple and elegant approach to eBook creation by Jon Kulp, possibly from listening to HPR 1909 several years ago.

In Andrew's approach, Bash and Python scripts assemble various text files into the book, inserting figures and tables using a simple home-brew tag system to generate reference numbers such as Figure 3.7 or Table 2.2. Such auto-numbering is of course provided by many other document authoring systems, such as LaTeX, but the scripts also use the tags to hunt down data in CSV files and convert it into figures. In this way, nearly all information in the book can start off as text and then be processed into anything that can be included with HTML: prose, graphics, sound or even movies. A CSS file keeps content cleanly separated from appearance.
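As a rough illustration of how tag-based auto-numbering of this kind can work, here is a minimal Python sketch. It is not Andrew's code: the tag syntax `[[fig:label]]` / `[[tab:label]]` and the function name `number_tags` are assumptions made up for this example. Numbers are assigned per chapter in order of first appearance, and a second pass substitutes them, so a tag can be referenced again in a later chapter.

```python
import re

# Hypothetical tag syntax (an assumption, not the book's real tags):
#   [[fig:label]] for figures, [[tab:label]] for tables
TAG = re.compile(r"\[\[(fig|tab):([A-Za-z0-9_-]+)\]\]")
KIND_NAMES = {"fig": "Figure", "tab": "Table"}


def number_tags(chapters):
    """Replace tags with references such as 'Figure 3.7'.

    chapters: list of chapter texts in book order (chapter 1 first).
    Returns a new list of chapter texts with every tag substituted.
    """
    numbers = {}  # (kind, label) -> "chapter.count", e.g. "3.7"

    # First pass: assign a number the first time each label is seen.
    for chap_no, text in enumerate(chapters, start=1):
        counters = {"fig": 0, "tab": 0}
        for kind, label in TAG.findall(text):
            if (kind, label) not in numbers:
                counters[kind] += 1
                numbers[(kind, label)] = f"{chap_no}.{counters[kind]}"

    # Second pass: substitute every tag with its reference text.
    def substitute(match):
        kind, label = match.groups()
        return f"{KIND_NAMES[kind]} {numbers[(kind, label)]}"

    return [TAG.sub(substitute, text) for text in chapters]
```

Calling `number_tags(["See [[fig:a]].", "Compare [[fig:a]] with [[fig:b]]."])` would turn the tags into "Figure 1.1" in both chapters and "Figure 2.1" for the new figure in chapter 2.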

This is not WYSIWYG (what you see is what you get), but using the entr command to monitor file changes allows the HTML to be regenerated automatically and even the browser to be refreshed (using a feature found in Midori and Falkon but not many other browsers).

Dave describes how he achieves something similar to what Andrew has created by using make to co-ordinate the processing. The process of compiling the source text files into a final document does have some similarities with code compilation.
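The similarity with code compilation comes down to one rule: rebuild a target only if it is missing or older than one of its sources. The Python sketch below shows that rule in isolation. It is only an illustration of the idea behind make, not Dave's setup; the function name `needs_rebuild` and the file names in the usage note are hypothetical.

```python
import os


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any source file.

    This mirrors the timestamp comparison make performs for a rule
    whose target depends on the given sources.
    """
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_time for src in sources)
```

A build script could call, say, `needs_rebuild("book.html", ["ch1.txt", "ch2.txt"])` and skip the HTML generation step when it returns False.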

Dave and Andrew discuss how useful their methods might be to others. Some of Andrew's scripts are too specific to his own use for wider consumption, but the figure processing code is available online as part of the content and code of his book How Scotland Works.

Andrew describes his horror at the suggestion that a non-fiction book does not need an index, which prompted him to write some simple code to generate an index from a PDF. This was also motivated by laziness and a reluctance to read his own writing for the umpteenth time. Andrew then describes how this code works. The code itself can be found here.
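For readers who want a feel for the general idea, here is a minimal Python sketch of index generation. It is not Andrew's code, and the details are assumptions: it presumes the text of each PDF page has already been extracted into a list of strings by some other tool, and that the author supplies the list of terms to index. The function names `build_index` and `format_index` are made up for this example.

```python
import re
from collections import defaultdict


def build_index(pages, terms):
    """Map each term to the 1-based page numbers on which it appears.

    pages: list of strings, one per page (assumed already extracted
           from the PDF by a separate tool).
    terms: iterable of words or phrases to look for.
    Terms that never appear are left out of the result.
    """
    index = defaultdict(list)
    for page_no, text in enumerate(pages, start=1):
        for term in terms:
            # Whole-word, case-insensitive match for the term.
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                index[term].append(page_no)
    return dict(index)


def format_index(index):
    """Render entries like 'pandoc: 1, 3', sorted by term (ignoring case)."""
    return [
        f"{term}: {', '.join(str(n) for n in page_numbers)}"
        for term, page_numbers in sorted(index.items(), key=lambda kv: kv[0].lower())
    ]
```

Because the lookup works page by page, it only makes sense for paginated output such as a PDF.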

Dave brings up the issue of other formats, such as EPUB, which have no concept of pages, or at least do not insist on them natively.

The discussion moves on to other tools for document and text processing that are relevant to the tasks involved, such as pandoc, LaTeX and AsciiDoc. In particular, Dave mentions that the "look" of LaTeX is simpler to control these days, at least as compared to the 1990s!

Comments

Comment #1 posted on 2021-07-11 11:52:28 by Jon Kulp

Page numbers

I've been away from HPR for ages but checked back in this morning and found this show about ebooks. Loved it, and thanks for the mentions. The discussion about page numbers prompted me to look into the issue because it's something that's bugged me for a long time. I was pretty sure there was support for specifying page numbers in the EPUB3 standard, but I've never gotten into the weeds and figured it out. For fiction it doesn't really matter, but as you discuss, page numbers from the physical books are still pretty important in academia where we are expected to cite our sources. I took a couple of hours this evening and learned how to embed page numbers, and tomorrow I'll record a response episode to share how it works. There's good news and bad news involved...

Comment #2 posted on 2021-07-16 13:37:30 by dangerseeker

Fonts and LaTeX

Fonts were a problem for LaTeX in the early days, because Mr. Knuth invented his own (high quality) system to describe fonts.
Later (with PDFlatex, I guess) it became possible to use PostScript fonts directly.
But PS fonts are expensive, and on Windows PostScript fonts were never really used widely.
And then Microsoft "invented" TrueType fonts...

With ubiquitous cheap (and not always high quality) TTFs there was a growing need to use TTF in LaTeX: It seems like pdflatex can make use of fonts in the TTF format, but I have not tried it myself.

TODAY luatex/lualatex can not only use TTF but also the even newer OTF fonts with very few problems. It works, but ...

The goal of (La)TeX was to produce HIGH QUALITY documents, that's why the default is EXTREMELY high quality and changing things is hard.
With Microsoft products it is the rule to produce VERY LOW QUALITY documents and it is easy to change things to "comic sans" or worse...

Well, with luatex I now can take part in the low quality document revolution. ;-)
