hpr2139 :: From Org Mode to LaTeX Beamer to PDF
My presentation pipeline
Hosted by Clinton Roy on 2016-10-13. Flagged as Explicit and released under a CC-BY-SA license.
emacs, org mode, LaTeX, Beamer.
I have recently been fortunate enough to give a presentation at two conferences, PyCon Australia and Kiwi PyCon, the Australian and New Zealand Python conferences, respectively. I'm not going to give a talk based around the presentation, as it's rather code heavy, and we know that doesn't translate well to an audio medium.
Instead, what I wanted to do was talk a little bit about the presentation pipeline that I used to prepare this talk. The input is a plain text file, edited in Emacs, using a mode called Org mode. The intermediate form is a LaTeX file, using the document class Beamer, which is designed for presentations that are going to be projected. Beamer is apparently the German word for digital projector. The final output form is a plain PDF.
HPR isn't known for having many Emacs talks, so I should probably explain the idea of modes. Emacs has major modes and minor modes. For every document that you're editing there's one major mode, and any number of minor modes. So if I was editing a Python file for example, I would have the Python major mode which understands Python and can thus do Python specific things like Python code completion, and I would have a spell checker minor mode to check the spelling of comments, and another minor mode to automatically line wrap comment lines that are very long, and another minor mode to show what line number I'm currently editing, and another minor mode to blink the cursor and so on.
The other topic that I haven't heard too much on is LaTeX. LaTeX is the venerable typesetting solution for Unix-based systems. LaTeX documents have a single document class, and then any number of packages. In the case of my presentation, the document class is Beamer, which sets up all the margins and fonts to be good for presentations. Some of the packages I'm using are the symbols package, for arrows and maths symbols, and several graphics packages so I can draw trees in my slides.
I'm fairly comfortable with LaTeX, and I could certainly write this presentation directly in LaTeX, but I think there are some advantages in using Org mode to generate my LaTeX instead.
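As a minimal sketch of how the document class and packages might be declared from the Org side, assuming the stock LaTeX exporter keywords. The specific packages (amssymb for symbols, tikz for drawing trees) are guesses, since the talk doesn't name the ones actually used:

```org
#+LATEX_CLASS: beamer
# Assumed packages; the talk does not say which ones were really used.
#+LATEX_HEADER: \usepackage{amssymb}
#+LATEX_HEADER: \usepackage{tikz}
```

Each LATEX_HEADER line is copied into the preamble of the generated LaTeX file, so anything that would normally go before \begin{document} can be placed there.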
As the name suggests, Org mode is designed to be an organisational mode, helping you write TODO lists and organise documents. While the document is just a plain text document that you can read and write with any text editor, the Emacs Org mode understands its own mark up and provides an outlining mode, where you can hide and expand trees of bullet points. The basic layout of a set of slides for a presentation is a tree of bullet points, where the top level bullet points are slides, and the second level of bullet points are lists of information put into each slide.
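A rough illustration of that basic layout, with made-up slide titles. Top-level headings stand for slides and the plain list items underneath are the points on each slide:

```org
* First slide
  - a point on the first slide
  - another point
* Second slide
  - a point on the second slide
```

With the cursor on a heading, pressing TAB in Org mode folds or expands the tree beneath it, which is the outlining behaviour described above.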
Another mark up that Org mode understands is that of code blocks, so that we can easily say "this chunk of code is a Python block". Org mode understands how to export this Python code block as a separate file, run it under Python, and can even insert the output of the program, or the result of a function, back into the original document as a code output block.
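A small sketch of what such a block can look like. The code itself is a placeholder, and the header arguments are one plausible choice rather than the ones used in the talk. Here :results output captures whatever the program prints, and :exports both puts the code and its output into the exported document:

```org
#+BEGIN_SRC python :results output :exports both
print(sum(range(10)))
#+END_SRC

#+RESULTS:
: 45
```

The RESULTS block is written by Org itself when the block is evaluated (C-c C-c with the cursor inside it), provided Python evaluation has been enabled in the Emacs configuration.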
The advantage of having just one file for my presentation, versus one file for my presentation and a separate file for each code block, is that the code examples in my presentation never get out of sync with the code that I'm actually running. This style of programming where the documentation is the primary document, and the code files are generated, secondary documents, is the inverse of the typical way of programming where the code documents are the primary documents, and documentation, the secondary documents, are automatically generated.
This style of programming, where the primary document is documentation, is called literate programming. The process of creating the documentation (the PDF in my case) is called weaving. The process of creating the code files is called tangling.
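As a sketch of the tangling side, assuming a hypothetical output file name of example.py: the :tangle header argument names the file that the block is written to.

```org
#+BEGIN_SRC python :tangle example.py
def double(x):
    return x * 2
#+END_SRC
```

Running org-babel-tangle (bound to C-c C-v t by default) writes every block that has a :tangle target out to its file, while exporting the document produces the woven version with the same code shown in the slides.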
I really like having just one file to generate one PDF presentation file, so I'm going to keep using this technique in the future.
Now, I have to admit that my presentation is not completely literate: there are some bits of output in my presentation that are copied and pasted, rather than automatically gathered, so I've still got some work to do.
Down to brass tacks. The conventional file name extension for Org mode files is .org. The typical metadata you put in presentations are Author, Email, and Title. In mine I've also added Subtitle and Institute. Now, the interesting one here is Institute. For whatever reason, it's not a piece of metadata that Org mode knows about, but it's really easy to drop down into LaTeX and just use the LaTeX institute command directly.
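A sketch of what such a header might look like. All the values are placeholders, and passing the institute through LATEX_HEADER is an assumption about how the "drop down into LaTeX" step was done:

```org
#+TITLE: An Example Talk
#+SUBTITLE: A hypothetical subtitle
#+AUTHOR: A. Speaker
#+EMAIL: speaker@example.org
# Institute is not an Org keyword, so the LaTeX command is passed through.
#+LATEX_HEADER: \institute{Example Institute}
```

Beamer's own \institute command then picks the value up when the title slide is typeset.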
There's a metadata line that Org understands called Options. With it I request that my presentation has a table of contents, and that all the bullet points of level two become line items in that table of contents. Then I'm straight into the slides. Bullet points at the first level are converted to sections, bullet points at the second level are turned into slides, and anything deeper than that is turned into the contents of that slide.
I have many code blocks, and I use options that specify what file each code block is tangled to, and that leave the white space alone when the code block is exported, as white space is critical to Python. I also turn on an option that gets line numbers printed for the code blocks.
In a couple of places where I want to highlight certain areas of the code, I add labels to the code. Then outside the code block I can refer to the label, and LaTeX will replace this with the line number. I think I'd prefer to do this referencing with highlighting, or an arrow or something, but I'm not sure I can do that.
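Putting those pieces together, a sketch under several assumptions: H:2 is taken to be the setting that makes second-level headings into slides, toc:2 the one that lists them in the table of contents, and the switches -n (number the lines), -r (hide the labels in the listing) and -i (preserve indentation on export) are guesses at the options meant above. The headings, file name and label name are made up:

```org
#+OPTIONS: H:2 toc:2

* A section
** A slide in that section
   - a point on the slide

#+BEGIN_SRC python -n -r -i :tangle example.py
def double(x):
    return x * 2  # (ref:result)
#+END_SRC

Line [[(result)]] is where the doubling happens.
```

On export, the link [[(result)]] is replaced by the number of the line carrying the (ref:result) label, so the surrounding text stays correct even if lines are added to the block later.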
Engineering is the process of dealing with tradeoffs to get something done. There are many tradeoffs when writing code to solve a problem, and writing code for slides has quite a different set of tradeoffs. You want code to be easy to read, in terms of using long variable names, but you also need code blocks to contain as few lines as possible, so that you can use a large font size on the projector, and you also don't want to have to split an example across multiple slides if you can help it. I'm also of the view that syntax highlighting is a waste of time; it's just a pretty layer of obfuscation that the mind has to understand, then drop, in order to actually see the code. This stance of mine was vindicated when several presenters with syntax highlighted code realised on the day that the projected code was impossible to read, due to the low contrast projectors used in a reasonably well lit room.
One feature that I would like to add is the ability to reveal new code. It's quite common to have a code block, reveal a problem with it, and display the same code block again, but with a minor change that fixes the previously explained problem. Ideally the old code and new code would be rendered differently, but I don't think that's an option right now. The other thing that I couldn't work out was how to run custom programs on my code blocks. I wanted to run the Python unit test program, not the Python interpreter, and could not find a way to do that.
There's a single command to run inside Emacs to create the output PDF.
So, overall, I'm very happy with this pipeline. It lets me have a primary document with code snippets, and it lets me have LaTeX snippets wherever I like. It's not perfect, but I'm hoping to find ways to improve it.