vmiklos.hu
shameless self-promoting website
  • Thursday, 05 July 2018
    Improved ECDSA handling in LibreOffice

    I wrote about ECDSA handling in LibreOffice last year. Back then the target was to be able to verify signatures using the ECDSA algorithm on Linux.

    Lots of things have happened since then, and this post is meant to summarize those improvements. My personal motivation is that Hungarian eID cards come with a gov-trusted ECDSA (x509) cert, so handling those in LibreOffice would be nice. My goals were:

    • platforms: handling Windows as well, not only Linux/macOS

    • operations: handling signing as well, not only verification

    • formats: cover all of ODF, OOXML and PDF

    Let’s see what has happened:

    • Linux, ODF, sign: we had a hardcoded RSA algorithm when generating a signature; now we infer the signing algorithm from the algorithm of the signing cert (commit)

    • Linux, OOXML, sign: same problem with hardcoded RSA (commit)

    • Windows, PDF, sign: the certificate chooser had to be ported from CryptoAPI to CNG (commit)

    • Windows, ODF, verify / sign: this was the largest problem. It required a whole new libxmlsec backend (interface, implementation, all in C90) and also required conditionally using that new backend in LibreOffice (commit)

    • Windows, OOXML, sign: this was almost functional, except that the UI recently regressed, now fixed (commit)

    • Finally now that everything is ported on Windows to use CNG, I could enable it by default yesterday.
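
    The "infer the signing algorithm from the cert" idea from the list above can be sketched in a few lines of Python. This is not the LibreOffice code: the function name and the lookup table are made up for illustration, and I assume SHA-256 as the digest. The two URIs are the standard XML signature method identifiers for RSA-SHA256 and ECDSA-SHA256.

```python
# Hypothetical helper, for illustration only (not taken from LibreOffice).
# Standard XML-DSig signature method URIs; SHA-256 is an assumption here.
RSA_SHA256 = "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256"
ECDSA_SHA256 = "http://www.w3.org/2001/04/xmldsig-more#ecdsa-sha256"

SIGNATURE_METHODS = {
    "RSA": RSA_SHA256,
    "ECDSA": ECDSA_SHA256,
}


def signature_method_for(cert_key_algorithm: str) -> str:
    """Pick the signature method from the signing certificate's key type,
    instead of hardcoding RSA."""
    try:
        return SIGNATURE_METHODS[cert_key_algorithm.upper()]
    except KeyError:
        raise ValueError(
            f"unsupported key algorithm: {cert_key_algorithm}"
        ) from None
```

    The point is only that the certificate decides the method: an ECDSA cert can never produce a valid signature that claims to be RSA.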

    I could test hardware-based signing after this, which was fine out of the box on both platforms. Some screenshots:

    • Linux:

    https://farm2.staticflickr.com/1784/29321634078_8818b2d7ba_o.png
    • Windows:

    https://farm1.staticflickr.com/927/42288765505_db72ee48f2_o.png

    (There is no reason why this would not work on macOS, but I did not test that.)

    Thanks to Gabor Kelemen, who helped me to get a sane card reader that has reasonable driver support on Linux.

    All this is available in master (towards LibreOffice 6.2), or you can grab a daily build and try it out right now. :-)


  • Tuesday, 05 June 2018
    Editing ReqIF-XHTML fragments in LibreOffice Writer

    I worked on a small feature to use Writer as an editor for the XHTML fragments inside Requirements Interchange Format (ReqIF) files. First, thanks to Vector for funding Collabora to make this possible.

    Writer already supported XHTML import and export before (see my previous post) as a special mode of the HTML filter, and this work builds on top of that. The main specialty of XHTML as used for fragments inside a ReqIF file is embedded objects.

    The special mode to opt in to ReqIF-XHTML behavior can be activated like this:

    • during import: --infilter="HTML (StarWriter):xhtmlns=reqif-xhtml"

    • during export: --convert-to "xhtml:HTML (StarWriter):xhtmlns=reqif-xhtml"
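
    Put together, a round-trip through Writer could be driven from a script roughly like this. It is a sketch under assumptions: the helper name, the file names and the use of --headless and --outdir are mine, and I use the double-dash spelling of the convert-to switch. The function only builds the argument list, so you can inspect it before handing it to subprocess.run().

```python
import shlex

# Filter strings as given in the list above.
REQIF_IMPORT_FILTER = "HTML (StarWriter):xhtmlns=reqif-xhtml"
REQIF_EXPORT_TARGET = "xhtml:HTML (StarWriter):xhtmlns=reqif-xhtml"


def reqif_roundtrip_command(input_path, outdir, soffice="soffice"):
    """Build (but do not run) an soffice command line that imports and
    exports a fragment in ReqIF-XHTML mode. Hypothetical helper."""
    return [
        soffice,
        "--headless",
        f"--infilter={REQIF_IMPORT_FILTER}",
        "--convert-to", REQIF_EXPORT_TARGET,
        "--outdir", outdir,
        input_path,
    ]


if __name__ == "__main__":
    # Print the command in a copy-pasteable, shell-quoted form.
    print(shlex.join(reqif_roundtrip_command("fragment.xhtml", "out")))
```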

    Three different cases are handled:

    1. Image with native data we don’t understand and just preserve.

    2. Image with OLE2 data, which we hand out to external applications (at least on Windows). On the above video this is an embedded PPSX file, handled by PowerPoint.

    3. Image with ODF data, which we handle internally. This is a Draw document on the above video.

    Regarding how it works, the import is a series of unwrapping containers till you get to the real data and the export is the opposite of this. Here are the layers:

    • Larger ReqIF files have the .reqifz extension, and are ZIP files containing an XML file that holds the XHTML fragments. This is not relevant for this post, as Writer assumes that extracting the XHTML fragment from ReqIF is done before you load the content into Writer.

    • XHTML always has a PNG image for the object, and optionally it has RTF as native data for the object.

    • The RTF file is a fragment, containing just an embedded OLE1 container.

    • The OLE1 container is just a wrapper around the real OLE2 container.

    • The OLE2 container either has the data directly or MSO has a convention on how to include OOXML files in it (see the PPSX example above), and we handle that.

    On export we do the opposite: save the file, put it into OLE2, then into OLE1, then into RTF, finally into XHTML.
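
    The outermost layer is the one Writer leaves to the caller. A minimal sketch of that step in Python could look like the following. The assumptions are mine: that the XML members inside the ZIP end in .reqif, and that a "fragment" is any outermost element in the XHTML namespace. The function names are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def _collect(element, inside_xhtml, out):
    """Append every outermost XHTML element below `element` to `out`."""
    is_xhtml = element.tag.startswith("{" + XHTML_NS + "}")
    if is_xhtml and not inside_xhtml:
        out.append(ET.tostring(element, encoding="unicode"))
    for child in element:
        _collect(child, inside_xhtml or is_xhtml, out)


def extract_xhtml_fragments(reqifz):
    """Return the XHTML fragments of a .reqifz archive as strings.

    `reqifz` is a path or a binary file object. Assumes the XML members
    of the archive use the .reqif extension.
    """
    fragments = []
    with zipfile.ZipFile(reqifz) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".reqif"):
                root = ET.fromstring(archive.read(name))
                _collect(root, False, fragments)
    return fragments
```

    Each returned string is then something you could save to a file and load into Writer with the import filter option mentioned earlier.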

    There is no specification on how to put ODF files into OLE2, so I extracted the relevant code from LibreOffice’s binary MSO filters and now the Writer HTML filter uses that as well. This avoids code duplication and also avoids inventing some new markup.

    All this is available in master (towards LibreOffice 6.2), or you can grab a daily build and try it out right now. :-)


  • Friday, 04 May 2018
    Lazy reading images from Microsoft formats in LibreOffice

    I worked on improving document load performance of Microsoft formats in general, and DOC/DOCX in particular in LibreOffice recently. First, thanks to TDF and users that support the foundation by providing donations for funding Collabora to make this possible.

    I built on top of the great work of Tomaz, focusing on these secondary, but important formats.

    The idea is that if you load a Microsoft binary or OOXML file, it should not be necessary to parse all images at load time; it’s enough to lazily read each one when we first render e.g. a Writer page containing that image.
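
    As a toy model of this idea (not the actual LibreOffice implementation, and with zlib standing in for a real image codec), the import only stores the compressed bytes and the decoding happens on first access. The class and function names are made up for illustration.

```python
import zlib


class LazyImage:
    """Keeps the compressed stream and decodes it on first use only."""

    def __init__(self, compressed: bytes):
        self._compressed = compressed
        self._pixels = None
        self.decode_count = 0  # for demonstration: how often we decoded

    @property
    def pixels(self) -> bytes:
        if self._pixels is None:
            # The expensive step, deferred until something renders the image.
            self._pixels = zlib.decompress(self._compressed)
            self.decode_count += 1
        return self._pixels


def load_document(compressed_images):
    """'Import' a document: cheap, because no image is decoded here."""
    return [LazyImage(blob) for blob in compressed_images]
```

    Rendering the first page then touches .pixels only for the images on that page; the rest stay compressed until they are scrolled into view.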

    The focus here was documents containing large images. I tested with an Earth photo of size 8000x8000 pixels from NASA, making little modifications to it, so each picture has a different checksum, embedding them into a binary DOC file.

    https://farm1.staticflickr.com/980/41838412652_c1cbefcfc1_o.png

    I measured the time from the soffice process startup to rendering the first page. We defer the work of loading most images now, as you can see on the chart. In contrast, we used to decompress all images on file import in the past. This means the new cost for e.g. 4 images is 37% of the original.

    All this is available in master (towards LibreOffice 6.1), or you can grab a daily build and try it out right now. :-)


  • Tuesday, 10 April 2018
    LibreOffice Hamburg Hackfest 2018

    (via Sweet5hark)

    I arrived home from Hamburg yesterday where I participated in the LibreOffice hackfest over the weekend as a mentor. First, thanks to The Document Foundation — and all the donors for funding Collabora to make this possible.

    There were a few topics I mentored:

    • Patrick was interested in fixing tdf#116486, which required some background knowledge on the Writer document model and layout, so we explored the relevant details together towards providing an actual patch for the bug.

    • Nithin wanted to fix tdf#112384, which turned out to be an ideal task for a hackfest. On one hand, the scope is limited so that you can implement this mini-feature over a weekend. On the other hand, it required touching various parts of Writer (UI, document model, UNO API, ODF filter), so it allowed seeing the process of adding a new feature. The patch is merged to master.

    • Linus looked for a task that is relatively easy but still useful. We looked at tdf#42949, and he identified and removed a number of unused includes himself. This should especially help with slow incremental builds. Again, the patch is already in master.

    • Zdeněk (raal) wanted to write a uitest for tdf#106280 so we were figuring out together how to select images from pyuno and how to avoid using graphic URLs in uitests in general.

    The full list of achievements is on the wiki. If you were at the hackfest and you did not contribute to that section, please write a line about what you hacked on. :-)

    Finally, thanks to the organizers and the sponsors of the hackfest, it was a really great event!


  • Friday, 02 March 2018
    Optimizing ODT ↔ XHTML conversion performance for simple documents

    I worked on improving the ODT ↔ XHTML conversion performance for simple documents in LibreOffice recently. First, thanks to Vector for funding Collabora to make this possible.

    ODT → XHTML conversion

    https://farm5.staticflickr.com/4605/26697712598_2ace3f45a3_o.png

    The focus here was really simple documents, like just one sentence with minimal formatting. The use-case is to have thousands of these simple documents, only a minority containing complex formatting, the rest is just that simple.

    Performance work usually focuses on one specific complex feature, e.g. lots of bookmarks, lots of document-level user-defined metadata, and so on, so there was room for improvement when it comes to trivial documents.

    I managed to reduce the cost of the conversion to a fifth of the original cost in both directions; the chart above shows the impact of my work for the ODT → XHTML direction. The steps that helped:

    • Recognize XHTML as a value for the FilterOptions key in the HTML (StarWriter) export filter, this way avoid the need to go via XSLT, which would be expensive.

    • Add a new NoFileSync flag to the frame::XStorable::storeToURL() API, so that if you know you’ll read the result after the conversion has finished, you can avoid an expensive fsync() call for each and every file. This helps HDDs a lot, while it means no overhead for SSDs.

    • If you know your input format already, then specifying an explicit FilterName key for the frame::XComponentLoader::loadComponentFromURL() API helps not spending time to detect the file format you already know.
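
    To show what skipping the sync means at the file level, here is a small Python sketch. It is not how LibreOffice stores files; store_file() and its no_file_sync parameter are hypothetical names that only mirror the flag described above.

```python
import os


def store_file(path, data: bytes, no_file_sync: bool = False) -> None:
    """Write `data` to `path`, optionally skipping the fsync() call.

    With no_file_sync=True the data may still sit in the OS cache when
    this returns. That is fine if the next step simply reads the file
    back, but it gives no durability guarantee across a crash.
    """
    with open(path, "wb") as stream:
        stream.write(data)
        stream.flush()  # push Python's buffer down to the OS
        if not no_file_sync:
            os.fsync(stream.fileno())  # force the OS to hit the disk
```

    When you convert thousands of tiny documents, the fsync() per file can easily cost more than producing the file itself, which is why opting out pays off in a pipeline.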

    Note that the XHTML mode for the Writer HTML export is still a work in progress, but it already produces valid output for such simple documents.

    XHTML → ODT conversion

    https://farm5.staticflickr.com/4608/39674632615_de78265c7f_o.png

    The chart above shows the results of my work for the XHTML → ODT direction. The steps to get to the final reduced cost were:

    • The new NoFileSync flag, as mentioned previously.

    • A new NoThumbnail flag, which is useful if the ODT will be part of a next step in the pipeline and you know that the thumbnail image won’t be used anyway.

    • The default table autoformat definitions in Writer are now lazy-loaded. (This is my favorite one, you don’t have to opt-in for this, so everyone benefits.)

    • A new HiddenForConversion flag for frame::XComponentLoader::loadComponentFromURL(), which means we don’t lay out the UI elements (toolbars, sidebar, status bar, etc.) when we know the purpose of the document load is only to save the document model in another format.
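
    For scripted conversions, the flags from this post could be collected like this before being passed to the UNO API. Treat it as a sketch: the helper names are made up, I assume NoThumbnail is given at store time just like NoFileSync, and I assume all three flags are plain booleans. The last helper only works inside LibreOffice's own Python, where the uno module is available.

```python
def load_properties(filter_name, hidden_for_conversion=True):
    """Name/value pairs for XComponentLoader::loadComponentFromURL()."""
    pairs = [("FilterName", filter_name)]
    if hidden_for_conversion:
        pairs.append(("HiddenForConversion", True))
    return pairs


def store_properties(filter_name, filter_options=None,
                     no_file_sync=True, no_thumbnail=True):
    """Name/value pairs for XStorable::storeToURL()."""
    pairs = [("FilterName", filter_name)]
    if filter_options is not None:
        pairs.append(("FilterOptions", filter_options))
    if no_file_sync:
        pairs.append(("NoFileSync", True))
    if no_thumbnail:
        pairs.append(("NoThumbnail", True))
    return pairs


def to_property_values(pairs):
    """Convert the pairs to UNO PropertyValue structs (needs pyuno)."""
    import uno  # noqa: F401  (enables the com.sun.star imports)
    from com.sun.star.beans import PropertyValue

    values = []
    for name, value in pairs:
        prop = PropertyValue()
        prop.Name = name
        prop.Value = value
        values.append(prop)
    return tuple(values)
```

    For the XHTML → ODT direction that would be load_properties("HTML (StarWriter)") on the way in and store_properties("writer8") on the way out.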

    All this is available in master (towards LibreOffice 6.1), or you can grab a daily build and try it out right now. :-)


  • Sunday, 04 February 2018
    EPUB export in LibreOffice Writer FOSDEM talk

    Yesterday I gave an EPUB export in LibreOffice Writer FOSDEM talk at FOSDEM 2018, in the Open document editors developer room. The room was well-crowded — perhaps because the next talk was about LibreOffice/Collabora Online. ;-)

    Quite some other slides will be available on Planet I expect, don’t miss them.


  • Thursday, 11 January 2018
    EPUB3 export improvements in LibreOffice Writer, take two

    I worked on improving the EPUB3 export filter in LibreOffice further recently. First, thanks to Nou&Off in cooperation with a customer who made this work possible. Since the previous blog entry there have been a number of improvements around a next set of topics.

    Cover images

    https://farm5.staticflickr.com/4760/38920770224_b247fa89c4_o.png

    It is now possible to specify a cover image for the exported EPUB file. Given that a cover image is not naturally part of the Writer document model, I introduced the concept of a media directory for the EPUB export. The media directory is a directory next to the source file, with the <file name without extension> name. If that directory contains a file named cover.svg (or .gif, .jpg, .png), the exporter will automatically use it. Otherwise you can customize this default.
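
    The default lookup described above can be modelled in a few lines of Python. This is only a sketch of the rule, not the exporter's code: the function names are mine, and the order in which the extensions are tried is an assumption.

```python
from pathlib import Path

# Extensions named above; the priority order is an assumption.
COVER_EXTENSIONS = (".svg", ".gif", ".jpg", ".png")


def media_directory(source: Path) -> Path:
    """Directory next to the source file, named like it minus extension."""
    return source.with_suffix("")


def find_cover(source: Path):
    """Return the default cover image for `source`, or None."""
    media_dir = media_directory(source)
    for extension in COVER_EXTENSIONS:
        candidate = media_dir / f"cover{extension}"
        if candidate.is_file():
            return candidate
    return None
```

    So for book.odt the exporter would look into the book/ directory next to it, and book/cover.png would be picked up without any further configuration.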

    The picture shows two EPUB files in Readium with different cover images.

    Improved metadata support

    https://farm5.staticflickr.com/4603/38920770174_142950782e_o.png

    It’s quite frequent that you are technically the author of a document, but the logical author of the book is somebody else. Same for the date of the book, and so on. So the EPUB export dialog now has support for overwriting the defaults coming from the Writer document model. For mass-conversion of documents it’s possible to place a <file name without extension>.xmp file in the media directory and XMP metadata from that file will also overwrite metadata coming from the document model.
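
    If you generate such .xmp files in a mass-conversion pipeline, a quick way to check what they contain is a sketch like the one below. The function names are made up, and which Dublin Core fields the exporter actually honors is my assumption; the code just reads a handful of common ones.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumed set of interesting fields; the exporter may read more or fewer.
FIELDS = ("title", "creator", "date", "language", "identifier")


def xmp_path(source: Path) -> Path:
    """<media directory>/<file name without extension>.xmp"""
    return source.with_suffix("") / (source.stem + ".xmp")


def read_overrides(xmp_file: Path) -> dict:
    """Return the first value of each known Dublin Core field."""
    root = ET.parse(xmp_file).getroot()
    overrides = {}
    for field in FIELDS:
        element = next(root.iter(f"{{{DC}}}{field}"), None)
        if element is None:
            continue
        # XMP usually wraps values in rdf:Alt / rdf:Seq with rdf:li items.
        items = [li.text for li in element.iter(f"{{{RDF}}}li") if li.text]
        value = items[0] if items else (element.text or "").strip()
        if value:
            overrides[field] = value
    return overrides
```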

    The picture shows the extended EPUB export options dialog.

    Footnotes and image popups

    https://farm5.staticflickr.com/4612/38920770144_e90e2a8e92_o.png

    I’ve added support for footnotes. As a special case of this, image popups on images and text are now supported. This works by placing a relative link on a text portion or on an image, and placing an image with the same name (e.g. in high resolution) in the media directory. In this case the EPUB export will bundle the image from the media directory inside the EPUB file and clicking on the text or image will open the bundled image in a popup (or in some other container, depending on how your reader interprets footnotes).

    The picture shows such a popup in Microsoft Edge.

    Fixed layout

    https://farm5.staticflickr.com/4604/38920770104_108465bda1_o.png

    The EPUB3 fixed layout is quite similar to PDF, except that it is built on top of XHTML and SVG. Possible use-cases for this can be:

    • exporting a document where presenting the content as reflowable text would be misleading (e.g. comic books), but the publisher of the book only works with EPUB (reflowable or fixed layout, but no PDF)

    • printing (again, in case for some reason you want to avoid PDF)

    These might be very specific situations, but luckily supporting them is not too complex. I implemented an approach very similar to the PDF export, where we export individual pages of the Writer document’s layout as a metafile, and then consume that — this time with the SVG export. Building on top of the existing Writer layout and SVG export means the hard work is really done by these components, the EPUB fixed layout export just puts these together.

    The picture shows a Writer document with a table of contents containing page numbers, a header and a footer in Readium.

    All this is available in master (towards LibreOffice 6.1), or you can grab a daily build and try it out right now. :-)


  • Tuesday, 05 December 2017
    EPUB3 export improvements in LibreOffice Writer

    I worked on improving the EPUB3 export filter in LibreOffice recently. First, thanks to Nou&Off in cooperation with a customer who made this work possible. Since the previous blog entry there have been a number of improvements around 4 topics.

    https://farm5.staticflickr.com/4540/38847800651_d5271ced3a_o.png

    The character properties of link text are now handled correctly. In the above example you can see that the text is red, and this comes from a character style.

    Improved table support

    Previously the support for tables was there just to not lose content; now all kinds of cell, row and table properties are handled correctly. A few samples:

    • custom cell width:

    https://farm5.staticflickr.com/4566/38847800611_38b8483d7f_o.png
    • custom row height:

    https://farm5.staticflickr.com/4580/38847800521_26285a9152_o.png
    • row span:

    https://farm5.staticflickr.com/4540/38847800461_359651bc3d_o.png

    So the table support should now be decent, covering row and column spanning and various cell border properties.

    Improved image support

    Previously only the simplest as-character anchoring was supported. Now many more cases are handled. Two examples:

    • image borders:

    https://farm5.staticflickr.com/4541/24975193838_94818bd1ed_o.png
    • image with a caption:

    https://farm5.staticflickr.com/4568/24975193608_83239bf287_o.png

    This includes various wrap types (to the extent HTML5 allows representing ODF wrap types).

    Font embedding

    If the user chooses to embed fonts (via File → Properties → Font → Embed), then the EPUB export now handles this. Here is a custom font that is typically not available:

    https://farm5.staticflickr.com/4561/38847800811_613d6fbbd2_o.png

    (The screenshot is from the Calibre ebook reader.)

    All this is available in master (towards LibreOffice 6.1), or you can grab a daily build and try it out right now. :-)


  • Friday, 10 November 2017
    Basic EPUB3 export in LibreOffice

    https://farm5.staticflickr.com/4577/37588898064_117dc4a933_o_d.png

    I worked on a new EPUB3 export filter in LibreOffice recently. First, thanks to Nou&Off in cooperation with a customer who made this work possible. The current state is that basic features work nicely to the extent that the filter is probably usable for most books (they typically mostly have just text with minimal formatting), so this post aims to explain the architecture, how the various pieces fit together.

    The above picture shows the building blocks. The idea is that nominally EPUB is a complete export filter, but instead of doing all the work, we offload various sub-tasks to other modules:

    • First we invoke the existing (flat) ODT export, so we can work with ODF instead of with the UNO API directly. This will be useful in the next step.

    • Then we feed the SAX events from the ODT export to a new librevenge text export. Given that the librevenge API is really close to ODF (and xmloff/ has quite some code to map the UNO API to ODF), here it pays off to work with ODF and not with the UNO API directly.

    • The librevenge text export talks to a librevenge generator, which is David Tardon’s excellent libepubgen in this case.

    • Finally libepubgen calls back to LibreOffice, and our package code does the ZIP compression.
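
    The second step, feeding SAX events into a generator-style interface, can be illustrated with a toy Python bridge. The real code is C++ and talks to librevenge; here the class names and the three-method generator interface are made up and only model paragraphs and their text.

```python
import io
import xml.sax
import xml.sax.handler

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


class RecordingGenerator:
    """Stand-in for a document generator; it just records the calls."""

    def __init__(self):
        self.calls = []

    def open_paragraph(self):
        self.calls.append(("open_paragraph",))

    def insert_text(self, text):
        self.calls.append(("insert_text", text))

    def close_paragraph(self):
        self.calls.append(("close_paragraph",))


class OdfToGeneratorBridge(xml.sax.handler.ContentHandler):
    """Maps ODF SAX events to generator calls (paragraphs only)."""

    def __init__(self, generator):
        super().__init__()
        self._generator = generator
        self._in_paragraph = False

    def startElementNS(self, name, qname, attrs):
        if name == (TEXT_NS, "p"):
            self._in_paragraph = True
            self._generator.open_paragraph()

    def endElementNS(self, name, qname):
        if name == (TEXT_NS, "p"):
            self._in_paragraph = False
            self._generator.close_paragraph()

    def characters(self, content):
        if self._in_paragraph and content:
            self._generator.insert_text(content)


def convert(flat_odt: bytes, generator) -> None:
    """Parse flat ODF and forward its events to `generator`."""
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_namespaces, True)
    parser.setContentHandler(OdfToGeneratorBridge(generator))
    parser.parse(io.BytesIO(flat_odt))
```

    Because the bridge only knows the generator interface, swapping the recording generator for one that writes EPUB (or anything else) needs no change on the ODF side, which is the whole point of the layering.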

    The setup is a bit complicated, but it has a number of advantages:

    • Instead of reinventing the wheel, LO and DLP now share code: libepubgen is now a dependency of LibreOffice.

    • libepubgen doesn’t bring its own ZIP writer code, it can nicely reuse our existing one.

    • This is a great opportunity to finally write an ODT→librevenge bridge, so other DLP-based export libs can be added in the future (e.g. librvngabw).

    • If we ever want to export to EPUB from Draw/Impress, libepubgen will help us there as well.
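
    As for the ZIP step that libepubgen delegates back to the host: the EPUB container format wants the mimetype entry to come first in the archive and to be stored uncompressed. A minimal Python sketch of such a writer follows; it is not LibreOffice's package code, and the function name and the file dictionary are made up for illustration.

```python
import zipfile


def write_epub_container(stream, files):
    """Write `files` (name -> bytes or str) as an EPUB-style ZIP.

    The mimetype entry goes first and is stored without compression;
    everything else is deflated.
    """
    with zipfile.ZipFile(stream, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(
            "mimetype",
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        for name, data in files.items():
            archive.writestr(name, data)
```

    A generator that accepts such a writer from the outside does not need its own ZIP implementation, which is the arrangement described in the list above.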

    As a user, here is a list of features you can expect working:

    • plain text should work fine (formatting may be lost, but content should be fine)

    • table of contents, as long as you properly use headings or you separate chapters by page breaks

    • export options: EPUB3 vs EPUB2, split on headings vs page breaks

    • basic set of character and paragraph properties should work

    During development I regularly used epubcheck, so hopefully the export result is usually valid.

    All this is available in master (towards LibreOffice 6.0), or you can grab a daily build and try it out right now. :-)


  • Friday, 13 October 2017
    A year in LibreOffice’s PDF support LOCon talk

    A year in LibreOffice’s PDF support was a talk I gave today at LibreOffice conference 2017. Given that this was one of the last talks at the whole conference, thanks to the ones who still did not go home, but listened. :-)

