
Basic EPUB3 export in LibreOffice


https://farm5.staticflickr.com/4577/37588898064_117dc4a933_o_d.png

I recently worked on a new EPUB3 export filter in LibreOffice. First, thanks to Nou&Off, who made this work possible in cooperation with a customer. The basic features now work well enough that the filter is probably usable for most books (which typically contain just text with minimal formatting), so this post explains the architecture and how the various pieces fit together.

The above picture shows the building blocks. Nominally, the EPUB export is a complete export filter, but instead of doing all the work ourselves, we offload various sub-tasks to other modules:

  • First we invoke the existing (flat) ODT export, so we can work with ODF instead of with the UNO API directly. This will be useful in the next step.

  • Then we feed the SAX events from the ODT export to a new librevenge text export. The librevenge API is really close to ODF, and xmloff/ already has quite a lot of code to map the UNO API to ODF, so here it pays off to work with ODF and not with the UNO API directly.

  • The librevenge text export talks to a librevenge generator, which is David Tardon’s excellent libepubgen in this case.

  • Finally libepubgen calls back to LibreOffice, and our package code does the ZIP compression.
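
To make the flow above a bit more concrete, here is a toy sketch in Python of the middle step: SAX events from a flat ODF stream are turned into calls on a generator object. This is not the LibreOffice or librevenge code. The class names, the method names (open_paragraph, insert_text, close_paragraph) and the "style-name" property key are all hypothetical stand-ins, and only text:p is handled.

```python
import io
import xml.sax
import xml.sax.handler

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


class RecordingGenerator:
    """Hypothetical stand-in for a librevenge-style generator such as libepubgen."""

    def __init__(self):
        self.calls = []

    def open_paragraph(self, properties):
        self.calls.append(("open_paragraph", properties))

    def insert_text(self, text):
        self.calls.append(("insert_text", text))

    def close_paragraph(self):
        self.calls.append(("close_paragraph",))


class OdfToGeneratorBridge(xml.sax.handler.ContentHandler):
    """Turns SAX events for <text:p> into generator calls; ignores other elements."""

    def __init__(self, generator):
        super().__init__()
        self.generator = generator
        self.text_parts = None  # None while outside a paragraph

    def startElementNS(self, name, qname, attrs):
        if name == (TEXT_NS, "p"):
            style = attrs.get((TEXT_NS, "style-name"), "")
            # "style-name" is an illustrative property key, not the real API.
            self.generator.open_paragraph({"style-name": style})
            self.text_parts = []

    def characters(self, content):
        # SAX may deliver one text run in several chunks, so buffer them.
        if self.text_parts is not None:
            self.text_parts.append(content)

    def endElementNS(self, name, qname):
        if name == (TEXT_NS, "p") and self.text_parts is not None:
            self.generator.insert_text("".join(self.text_parts))
            self.generator.close_paragraph()
            self.text_parts = None


def convert(flat_odf: bytes):
    """Parse flat ODF bytes and return the recorded generator calls."""
    generator = RecordingGenerator()
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_namespaces, True)
    parser.setContentHandler(OdfToGeneratorBridge(generator))
    parser.parse(io.BytesIO(flat_odf))
    return generator.calls


SAMPLE = (
    f'<office:document xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    "<office:body><office:text>"
    '<text:p text:style-name="Standard">Hello EPUB</text:p>'
    "</office:text></office:body></office:document>"
).encode("utf-8")

print(convert(SAMPLE))
```

The bridge never touches the document model: it only sees XML events, which is the reason for going through the flat ODT export first. Text inside nested elements such as text:span ends up in the paragraph, but its formatting is dropped in this sketch.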

The setup is a bit complicated, but it has a number of advantages:

  • Instead of reinventing the wheel, LO and DLP (the Document Liberation Project) now share code: libepubgen is now a dependency of LibreOffice.

  • libepubgen doesn’t bring its own ZIP writer code; it can nicely reuse our existing one.

  • This is a great opportunity to finally write an ODT→librevenge bridge, so other DLP-based export libs can be added in the future (e.g. librvngabw).

  • If we ever want to export to EPUB from Draw/Impress, libepubgen will help us there as well.
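
The ZIP reuse mentioned above boils down to a callback: the generator decides which streams exist and what they contain, and the host decides how they are stored. Below is a minimal Python sketch of that split. ZipPackage, add_stream and generate_skeleton are hypothetical names, not the libepubgen or LibreOffice package API, and the OEBPS/content.opf path is only an assumed example. The one hard fact used is from the EPUB container format: the mimetype entry comes first and is stored uncompressed.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


class ZipPackage:
    """Hypothetical host-side package: receives named streams and zips them."""

    def __init__(self, stream):
        self._zip = zipfile.ZipFile(stream, "w")

    def add_stream(self, name, data, compress=True):
        method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
        self._zip.writestr(name, data, compress_type=method)

    def close(self):
        self._zip.close()


def generate_skeleton(package):
    """Stand-in for the generator side: it only knows the package interface."""
    # The mimetype entry has to be first and uncompressed in an EPUB container.
    package.add_stream("mimetype", "application/epub+zip", compress=False)
    package.add_stream("META-INF/container.xml", CONTAINER_XML)


def build_skeleton() -> bytes:
    """Wire the generator to the ZIP-backed package and return the archive bytes."""
    buffer = io.BytesIO()
    package = ZipPackage(buffer)
    generate_skeleton(package)
    package.close()
    return buffer.getvalue()


with zipfile.ZipFile(io.BytesIO(build_skeleton())) as archive:
    print(archive.namelist())
```

The result is only a container skeleton, not a complete EPUB (there is no package document or content). The point is that generate_skeleton never imports zipfile, so the same generator code could write into any other package implementation.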

As a user, you can expect the following features to work:

  • plain text should work fine (formatting may be lost, but content should be fine)

  • table of contents, as long as you properly use headings or you separate chapters by page breaks

  • export options: EPUB3 vs EPUB2, split on headings vs page breaks

  • basic set of character and paragraph properties should work
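
The difference between the two split options can be shown with a small Python sketch. This is an illustration of the idea only, not how libepubgen implements it; the Paragraph class and the split_chapters function are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    """Hypothetical, simplified paragraph model for this example."""
    text: str
    is_heading: bool = False
    page_break_before: bool = False


def split_chapters(paragraphs, method="heading"):
    """Group paragraphs into chapters, starting a new one at each split point."""
    if method not in ("heading", "page_break"):
        raise ValueError(f"unknown split method: {method}")
    chapters = []
    for para in paragraphs:
        if method == "heading":
            starts_new = para.is_heading
        else:
            starts_new = para.page_break_before
        # The very first paragraph always opens a chapter.
        if starts_new or not chapters:
            chapters.append([])
        chapters[-1].append(para)
    return chapters


DOCUMENT = [
    Paragraph("One", is_heading=True),
    Paragraph("first body"),
    Paragraph("second body", page_break_before=True),
    Paragraph("Two", is_heading=True),
    Paragraph("third body"),
]

for method in ("heading", "page_break"):
    chapters = split_chapters(DOCUMENT, method)
    print(method, [[p.text for p in chapter] for chapter in chapters])
```

With the heading method each chapter starts at a heading, so the heading text is a natural table of contents entry. With the page break method a chapter may start with body text, which is why properly used headings or consistent page breaks matter for a useful table of contents.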

During development I regularly used epubcheck, so hopefully the export result is usually valid.

All this is available in master (towards LibreOffice 6.0), or you can grab a daily build and try it out right now. :-)

© Miklos Vajna.