
Handling PDF digital signatures with PDFium FOSDEM talk

Estimated read time: 2 minutes

Figure 1. Slides of the talk

The next step in the recent PDFium-based signature verification story is my talk, Handling PDF digital signatures in LibreOffice with PDFium, given at FOSDEM 2021 in the LibreOffice devroom (pre-recorded video). The talk gives an overview of digital signing in general, covers signing and verification for ODF, OOXML and PDF, mentions various related past Collabora projects, and then goes into detail on how PDFium was improved and how it is now used for better PDF signature verification in LibreOffice when opening PDF files in Draw.

The virtual room had around 150 participants and the Matrix-based online conference was well-organized. Speakers even got a free t-shirt before the event, and I appreciated the "bring your own beer" joke :-)

Another benefit of this unusual setup was avoiding the dreaded "room is full" problem, where you carefully select a talk to attend and then fail to hear it.

I expect that quite a few other slides from other Collaborans and the wider community will be available on Planet, don’t miss them.

Want to start using this?

You can get a snapshot / demo of Collabora Office and try it out yourself right now: try unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF’s next release too (7.2).


Shadow for tables from PPTX in Impress

Estimated read time: 2 minutes

Impress now has much better support for the shadow of table shapes: not only can shape styles result in table shadows, it’s also possible to add a shadow as direct formatting. The shadow result is also PowerPoint-compatible in the direct formatting case.

First, thanks to our partner SUSE for working with Collabora to make this possible.

Motivation

We got a PPTX document which has a table shape with a blurry shadow. The shadow was completely missing in Impress. It turned out that if you configure the default shape style to have a shadow, then there is some initial table shadow support in Impress, but that was not used in the PPTX case.

The request was to improve the shadow rendering to be PowerPoint-compatible and in general support table shadows as direct formatting as a new feature.

Results so far

The table shadow now looks like this:

https://share.vmiklos.hu/blog/sd-table-shadow/new.png
Figure 1. New render result in Impress

Matching the reference rendering:

https://share.vmiklos.hu/blog/sd-table-shadow/ref.png
Figure 2. Reference render result

While shadow was just missing previously:

https://share.vmiklos.hu/blog/sd-table-shadow/old.png
Figure 3. Old render result in Impress

You can see that not only is the shadow there, but the cell backgrounds and the blur of the shadow are also rendered properly.

How is this implemented?

If you would like to know a bit more about how this works, continue reading… :-)

As usual, the high-level problem was addressed by a series of small fixes:

With these, it’s now possible to add, edit, render and delete these table shadows, while preserving them during ODP and PPTX import/export.

Want to start using this?

You can get a snapshot / demo of Collabora Office and try it out yourself right now: try unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF’s next release too (7.2).


Better PDF signature verification in Draw

Estimated read time: 2 minutes

Draw now has much better support for detecting unsigned incremental updates between signatures at the end of PDF documents. We now also make sure that incremental updates introduced for adding signatures really just add annotations and don’t change the actual content.
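To illustrate the first kind of check, here is a minimal sketch in Python, not the code Draw uses. It assumes that the /ByteRange array of a signature was already extracted as four integers (two offset and length pairs); the function name and the sample numbers are made up for illustration. The idea is that the last signature should cover the file up to its very end, otherwise something was appended after signing.

```python
from typing import Sequence


def covers_whole_file(byte_range: Sequence[int], file_size: int) -> bool:
    """Return True if a /ByteRange signs the file from byte 0 to its end.

    A PDF signature's /ByteRange holds two (offset, length) pairs; the gap
    between the two chunks is where the signature value itself is stored.
    """
    if len(byte_range) != 4:
        return False
    offset1, length1, offset2, length2 = byte_range
    # The first signed chunk has to start at the beginning of the file.
    if offset1 != 0:
        return False
    # The second chunk must not start before the first one ends.
    if offset2 < offset1 + length1:
        return False
    # The second chunk must end exactly at the end of the file, otherwise
    # there are trailing bytes that the signature does not cover.
    return offset2 + length2 == file_size
```

With these made-up numbers, covers_whole_file([0, 400, 600, 400], 1000) accepts a fully signed 1000-byte file, while the same byte range is rejected once 200 unsigned bytes are appended as an incremental update (file size 1200).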

Motivation

There has been a recent evaluation of PDF signature verification, which included Draw. While we got a checkmark on their Shadow Hide test, their Shadow Replace test found conditional problems and their Shadow Hide-and-Replace test was not happy, either.

So it was time to look at what those corner cases are and how the situation can be improved, so people keep trusting that if Draw says a signature is valid, it’s indeed valid.

Results so far

There were 4 incremental improvements in this area:

These were enough: the authors of that evaluation have since confirmed that these problems are all gone.

How is this implemented?

If you would like to know a bit more about how this works, continue reading… :-)

PDF signature verification works by using a custom PDF tokenizer. You can read about that code in the PDF tokenizer section of this post. The bottom line is that we now have both PDFium and this custom tokenizer, somewhat duplicating the functionality.

After talking to the PDFium developers (see the relevant mailing list thread), it turned out they were open to adding all the high-level API needed to allow PDF signature verification based on PDFium, and not via our own tokenizer. See this header file for the set of relevant APIs added. A combination of those allowed adapting the code on our side to use PDFium for signature verification, not our own tokenizer.

Want to start using this?

You can get a snapshot / demo of Collabora Office and try it out yourself right now: try unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF’s next release too (7.1).


Effective code review and backporting for Collabora Online

Estimated read time: 3 minutes

Collabora Online now has a ./g script that tries to bring some of the Gerrit-based review benefits to a workflow based on GitHub.

Motivation

Collabora Online is on GitHub, but core.git is still on Gerrit, so it made sense to spend some time on a small shell script that gives you review and backport experience that is closer to Gerrit than the stock GitHub workflow.

How we use GitHub

Most Online committers push their code for review directly to online.git, to private namespaced branches, like private/kendy/master, then a pull request can be created to get commits from that branch into master after CI, review, etc. This workflow has the benefit that you don’t have to deal with the complexity of multiple repos.

Next to master, there are distro branches like distro/collabora/co-6-4, and we may or may not want to backport the contents of a PR to such a stable branch.

It’s important to note that Gerrit used to have a git review command to just submit your changes for review, without asking anything. That explains why the stock GitHub workflow, where you need to name the source branch of your PR, feels like unnecessary complexity. Creating the PR by visiting a webpage is again something we want to avoid. Not to mention open questions like: should you delete your source branch after a PR is merged? Should you delete your source repo?

On the other hand, we’re interested in GitHub’s ability to have multiple commits in a PR: Gerrit forces you to have one commit per change. The GitHub way encourages developers to split changes into more commits, now that the review and CI cost won’t increase just due to such splits.

Submit for review

The happy path is when you have one or more local commits and you want to submit them for review. In this case you can now do:

./g review

And the script will figure out that you want to push your local branch to a remote branch like private/kendy/master and also create a pull request for you, printing its URL.

In case that branch already exists then you need to specify a name:

./g review myhack

So parallel reviews are possible, but only the first gets an inferred name. Both cases need no clicking in a browser, thanks to gh.
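The naming rule can be sketched like this. This is only an illustration in Python (the real ./g is a shell script); the function name, its parameters and the assumption that an explicit name replaces the last component of the branch name are all hypothetical.

```python
from typing import Iterable, Optional


def review_branch(user: str, target: str, existing: Iterable[str],
                  name: Optional[str] = None) -> str:
    """Pick the remote branch to push to, e.g. private/kendy/master."""
    # Without an explicit name, the name is inferred from the target branch.
    branch = f"private/{user}/{name or target}"
    # Assumption: an already existing branch is never reused silently,
    # the user is asked to pick a name instead.
    if branch in set(existing):
        raise ValueError(f"{branch} already exists, please specify a name")
    return branch
```

So the first review towards master would go to private/kendy/master, and a second, parallel one needs an explicit name like myhack.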

Submit a backport

The easiest case is when you can assume that the master branch and the distro branch are so close to each other that there won’t be conflicts to be resolved. In that case, you can do:

./g backport distro/collabora/co-6-4 790

to pick all commits from PR 790 (which is already merged to master) to a distro branch.

Again, you can have multiple backports in progress, e.g. you can do:

./g backport distro/collabora/co-6-4 790 myhack

if the default name is already used. The backport syntax is a bit longer, so you can always just type ./g backport and you’ll get the usage.

This second command is a bit more complicated, as gh has no trivial way to expose what is the commit range of a PR. But there is gh api graphql which can do arbitrary GraphQL queries, which provide this information. At this stage it may make sense to just rewrite the whole ./g script in e.g. Python, but till that happens, we parse the output of the query using jq.
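As a rough idea of what such a rewrite could look like, here is a Python sketch of the parsing step only. It assumes the GraphQL query asks for repository → pullRequest → commits → nodes → commit → oid, and that the response is available as a JSON string; the function names and the sample hashes are made up, and this is not what ./g does today.

```python
import json
from typing import List


def pr_commits(response: str) -> List[str]:
    """Extract the commit hashes of a PR from a GraphQL JSON response."""
    data = json.loads(response)
    nodes = data["data"]["repository"]["pullRequest"]["commits"]["nodes"]
    return [node["commit"]["oid"] for node in nodes]


def commit_range(oids: List[str]) -> str:
    """Build a 'first^..last' range that git cherry-pick understands."""
    if not oids:
        raise ValueError("the PR has no commits")
    return f"{oids[0]}^..{oids[-1]}"


# Made-up response with two commits, in the shape described above.
SAMPLE = json.dumps({"data": {"repository": {"pullRequest": {"commits": {
    "nodes": [{"commit": {"oid": "1111111"}},
              {"commit": {"oid": "2222222"}}]}}}}})

print(commit_range(pr_commits(SAMPLE)))
```

The resulting range could then be handed to git cherry-pick on the distro branch.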

Finally, if you do have conflicts or you want a local build test / manual test before submitting, you can always check out the distro branch manually, cherry-pick there and use plain ./g review to submit your backport for review.

Want to start using this?

You can go to the Collabora Online community website and see how to build the code. Then you may want to solve an easy hack, and finally submit your commit for review either by using the above method or whichever way you prefer contributing to GitHub projects.


Better handling of cached field results in Writer

Estimated read time: 2 minutes

Writer now has much better support for preserving the cached result of fields in documents. This is especially beneficial for Word formats, where the input document may have a field result that is more than just a cache: re-calculating the formula would yield a different result, even in Word.

Motivation

A Collabora Office customer gave us a DOCX document, which is essentially a calendar for planned IT maintenance windows at some organization. These calendars are tables with fields in them. The document is halfway through being changed over to a newer year: the formulas are already changed to calculate a newer year, but all the cached field results are still for the old year.

The request was to keep showing these results and not throw them away during save, either. Their primary workflow is to fill the calendar with manual entries, not to tweak the calendar layout itself.

Results so far

The calendar now looks like this:

https://lh3.googleusercontent.com/6o7pvix-dJ9QhCX65FUkWeQZ60B89sHqDpBvd7WVRLtAzBW1323odrQ13aV_CgEFvgh7Iee-ePq95oPOf1Q-jMxvX1MBsz9FhgKd9vymyrdMBIZbF459hNKE1dM4XLcwXkGYh8ksmok=w1920
Figure 1. New render result in Writer

Matching the reference rendering:

https://lh3.googleusercontent.com/GJd2zcnspXDb7Wa2p32TInf9C8MAgt92h3G6PYuUwUvpQi5f3AdRbl5yGq8FN7kUPMcZwuFpohTKmX33s8u-AxFSO9rZFgH4X-fwrg8jShtJoA1KyGws_-ymUvINmK-5xo2_hd7YmLI=w1920
Figure 2. Reference render result

While it looked like a broken calendar previously:

https://lh3.googleusercontent.com/bpOVqcZX2CcKouuADNyPx1PMyI3I6CyjIDIAnUbylsT-ZimxSkPcUaRbMDd8MzHlG3Uqw2d-TunD4m7U4DUlm_O_esJt6CAY-H7Z5tdQxZ6q_MYxgJphutr_-JRVYh8uLmspiiI532U=w1920
Figure 3. Old render result in Writer

You can see that the day numbers were broken previously and now they line up properly.

How is this implemented?

If you would like to know a bit more about how this works, continue reading… :-)

As usual, the high-level problem was addressed by a series of small fixes:

With these, it’s now possible to edit these calendars, without breaking the fields which provide the day numbers.

Want to start using this?

You can get a snapshot / demo of Collabora Office and try it out yourself right now: try unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF’s next release too (7.1).


Detecting 0-byte files based on extension in Impress and elsewhere

Estimated read time: 3 minutes

Impress (and Writer and Calc) now has support for detecting 0-byte files on open/import based on their extension. This builds on top of the previous language-independent template improvements. This means that e.g. a 0-byte PPTX file will open as an empty Impress presentation, not in Writer.

Motivation

We regularly see customers wanting minimal templates, which are language independent and have no content. Such files are handy if your workflow is to first name an empty document (create it) and only then edit it (and not the other way around: first create the document, then save it by giving it a name). This is easy for .txt files: if it’s zero bytes, it’s empty. But then this approach is also expected to work for other file formats as well, where our original approach was more technical: if it’s an empty file, then it can only be plain text, so we (almost) always opened it in Writer, not matching the user expectations.

Instead of explaining the problem to people again and again (that a literally empty PPTX file is not a PPTX template), there is value in just adapting the code instead to "do what I mean".

Results so far

An empty PPTX file is now handled like this:

https://lh3.googleusercontent.com/zk3b0f2Rx3t5vFVuKiimujSJWYwPNH05PCf5Indih3OwMDeBrOUH1X7N22PO46kIbxTVzI0V3IV-bE0sMycTHGj2eRqKT6K7eQkZ0Py9QVCPIhV0pdKdGPLGH08xpw72wFQ-3eGyX4k=w1920
Figure 1. Empty PPTX file opening in Impress

You can see this is no longer opening in Writer as plain text but in Impress, which is clearly a less surprising behavior.

Here is what happens if you open an empty DOTX (template):

https://lh3.googleusercontent.com/cVB_kK2wDyNIJjLt9v9UcNS4AagRCifwBofp70mHfNVzopvrN1cxcsVLhWfEArhab_PwSFkAvLlMUS1witevRcKeEn9UXYtw5o4VeGSztvnNUi6YMtR3t2DUIu1k2LLOUhnpckAnrwQ=w1920
Figure 2. Empty DOTX file creates a new Writer document

You can see that it is even recognized that this is a template format, so a new document is created, rather than the template itself being opened for editing.

How is this implemented?

If you would like to know a bit more about how this works, continue reading… :-)

You can see the code change in this commit. First, we restrict this trick to file URLs, and also to empty files.

Second, we look at the extension of the file and try to match an import filter that usually handles that extension. This helps, because then nominally the correct filter will be used for the import, so save will not ask for a filename (as it happens for new documents), but it will know what target filename and export filter to use.

Finally, we need to avoid actually invoking the import filter, because an import filter does not have to handle empty content: its filter detection would reject such a file. (E.g. PPTX is expected to be a valid ZIP file.) This is important, because we want to avoid touching each and every file filter so that it does not fail on empty file content; instead we want to handle this centrally, at a single place.
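The three steps can be sketched roughly like this in Python. This is not the actual commit (LibreOffice is C++ and matches against its registered import filters); the function name and the small extension table below are made up for illustration.

```python
import os
from typing import Optional
from urllib.parse import urlparse
from urllib.request import url2pathname

# Hypothetical extension -> application table, for illustration only.
EXTENSION_TO_APP = {
    ".odt": "Writer", ".docx": "Writer", ".dotx": "Writer",
    ".ods": "Calc", ".xlsx": "Calc",
    ".odp": "Impress", ".pptx": "Impress",
}


def app_for_empty_file(url: str) -> Optional[str]:
    """Return the application for an empty file, or None if the trick
    does not apply and normal filter detection should run."""
    parsed = urlparse(url)
    # Step 1: only file URLs and only 0-byte files are considered.
    if parsed.scheme != "file":
        return None
    path = url2pathname(parsed.path)
    if not os.path.isfile(path) or os.path.getsize(path) != 0:
        return None
    # Step 2: match on the extension. The content is never handed to an
    # import filter (step 3), the caller just creates an empty document.
    extension = os.path.splitext(path)[1].lower()
    return EXTENSION_TO_APP.get(extension)
```

The point of returning None for everything else is that non-empty files keep going through the usual content-based detection, so only the empty case is handled at this single place.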

Want to start using this?

You can get a snapshot / demo of Collabora Office and try it out yourself right now: try unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF’s next release too (7.1).


OOXML / PDF Digital Signing in Draw and elsewhere conference talk

Estimated read time: 1 minute

Today I gave an OOXML / PDF Digital Signing in Draw and elsewhere talk at the LibreOffice Conference 2020. The (virtual) room was well-crowded: somehow people find digital signatures interesting. ;-)

It contains an overview of the ODF/OOXML/PDF signing feature set and also details the latest improvements, like visible PDF signing.

I expect that quite a few other slides from other Collaborans and the wider community will be available on Planet, don’t miss them.

You can get a snapshot / demo of Collabora Office and try the presented features out yourself right now: try unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF’s next release too (7.1).


SmartArt improvements in Impress, part 6

Estimated read time: 3 minutes

Impress now has support for an improved auto-fit-of-text layout across multiple shapes. The snake algorithm also handles width requests from constraints much better for SmartArt graphics from PPTX files. This builds on top of the previous improvements around SmartArt support.

First, thanks to our partner SUSE for working with Collabora to make this possible.

Motivation

SmartArt allows declaring your content and requirements for a graphic, then the layout will take care of arranging that in a suitable way. It is allowed to ask for an automatic font size, which is small enough so that all the content fits into the shape. At the same time, you can ask that the font size is the same in multiple shapes. Impress lacked the ability to do the latter, leading to different font sizes in different shapes, all automatic inside a single shape.

Results so far

Here is how the automatic text scaling across multiple shapes works in practice:

https://lh3.googleusercontent.com/5f-rH0nKGed-6GhBn3bAOMH6sVUeZUeqt2TsFydVSFlL_185Hj6BjNkchKn7DVKpAQmRsg6bGNwKyBIN9bR1sRYacqcKnLYOeqasGZB2IWRohN8mtgFG9aNN_k5ofC_ZqunSeHqIYTc=w640
Figure 1. Autofit synchronization, new output
https://lh3.googleusercontent.com/lncpRp13-vUBJH5Kt4ccYHMULGQ8U1Qw8v5z7LmRSE9bv6yjukFMfuiJolCKbVOpjT-85zw_BQMj72dKJLVnMI242CQlIxR7tDUbhBuVaYDuGPRVnAqhCsGbDmGLmyu-7ueA39kNXIg=w640
Figure 2. Autofit synchronization, old output
https://lh3.googleusercontent.com/Go9LGPftmbtFnQXgzxITJVLhEVJF1B13Ge3PGbyKPNEzCJ2zi2DfYBMak92v127PJGYyzjL8V9fTh8Fb_vZXpAdBrBRQizd2onXM8dBka38BkBEi2FE8UP3JCPecKN1m9u8fR591GMM=w640
Figure 3. Autofit synchronization, reference output

You can see how the old output used to have unexpectedly large text in shape A, but now it has the same text size as shape B. This is not applied unconditionally: shape C can request to have an independent, fixed font size.
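The synchronization idea can be sketched in a few lines of Python. This is a simplified illustration, not the Impress code: it assumes each shape already has a font scale from its own autofit, that shapes may be tagged with a synchronization group, and that the shared scale is simply the smallest one in the group. All names are made up.

```python
from typing import Dict, Optional


def synchronize_font_scale(own_scale: Dict[str, float],
                           group: Dict[str, Optional[str]]) -> Dict[str, float]:
    """own_scale maps a shape to the scale found by its own autofit
    (1.0 means no shrinking); group maps a shape to its sync group,
    or to None if the shape keeps an independent size."""
    # First pass: find the smallest scale in each group.
    smallest: Dict[str, float] = {}
    for shape, scale in own_scale.items():
        name = group.get(shape)
        if name is not None:
            smallest[name] = min(scale, smallest.get(name, scale))
    # Second pass: grouped shapes get the group's scale, others keep theirs.
    result: Dict[str, float] = {}
    for shape, scale in own_scale.items():
        name = group.get(shape)
        result[shape] = scale if name is None else smallest[name]
    return result
```

For the example above, shape A (which would fit without shrinking) ends up with shape B's smaller scale if both are in the same group, while shape C stays untouched.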

https://lh3.googleusercontent.com/7IBC-z9NfhP0mjutFPQLPN312AH5Jch6Gss-75kROjLksQ3MnSZnhTodrPDJBm3MmkcQ-rHKvzozgB1O8j8rDBJEkzCf9vgmgrSYa3kH7GqnDS0BgBnlSOWC0GQxVBCIMYX0-Blf_F8=w640
Figure 4. Snake rows, new output
https://lh3.googleusercontent.com/4y0pEF3utBcpXMcCsrvrkvnNCdKKyhVlwejiwsI6cMUrA1nV4u1VuE4l1Xhuw60jQYrkeQD54Y0JuB4NR571kwtluUGceclQPZPcYITEyqf0GF1Y7fr_GXNnSRCtnXO1jjtcO_nSLS0=w640
Figure 5. Snake rows, old output
https://lh3.googleusercontent.com/JqIugvyKapfY6Hw0bs7OtWMJ2sj5mdFOv8ebJwZac_BgmuJXKyHxDUdzCj0xZl9zcksXDjdqthce1xrHJzZdGG_024CLbVBSoCmR-X_qFxdWupFwXBa281LId18qezAU80vuT69kGl0=w640
Figure 6. Snake rows, reference output

You can see that the old output laid shapes all over the place, while the new output puts them in a 3 by 2 matrix. This works because we now parse width requests from constraints correctly. This means we give spacings a smaller width and real shapes a larger width, so the content fits in fewer rows and the layout looks like a grid, matching the reference rendering.
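To see why the spacing width matters so much, here is a toy row-wrapping sketch in Python. It only illustrates the effect described above and is not the snake algorithm from Impress; the function name and all the numbers are invented.

```python
from typing import List


def snake_rows(count: int, shape_width: float, spacing_width: float,
               canvas_width: float) -> List[int]:
    """Return how many shapes end up in each row when shapes separated by
    spacings are placed left to right and wrapped at the canvas width."""
    rows: List[int] = []
    in_row = 0
    used = 0.0
    for _ in range(count):
        # The first shape of a row needs no spacing in front of it.
        needed = shape_width if in_row == 0 else spacing_width + shape_width
        if in_row > 0 and used + needed > canvas_width:
            # Does not fit any more: close this row and start a new one.
            rows.append(in_row)
            in_row, used = 0, 0.0
            needed = shape_width
        used += needed
        in_row += 1
    if in_row:
        rows.append(in_row)
    return rows


# Narrow spacings vs. spacings as wide as the shapes themselves.
print(snake_rows(6, 30, 5, 100))
print(snake_rows(6, 30, 30, 100))
```

With narrow spacings, six shapes fit into two rows of three; if spacings are wrongly given the same width as shapes, only two shapes fit in a row and the layout needs three rows.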

How is this implemented?

If you would like to know a bit more about how this works, continue reading… :-)

As for the autofit synchronization:

Beyond that, for the snake rows:

Want to start using this?

You can get a snapshot / demo of Collabora Office and try it out yourself right now: try unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF’s next release too (7.1).


Locale-independent Writer templates

Estimated read time: 2 minutes

The problem

Users create new documents in various ways. When they do so in Online or via Windows Explorer’s context menu (New → …) then actual templates are not involved in the process, technically. What happens instead is that there is a plain empty Writer (or Calc, Impress) document that gets copied. The reason for this is that by the time the document gets created, the WOPI-like protocol or Windows Explorer doesn’t have a running soffice process to create a document instance from a template: it’ll just copy a file.

With that aside, users expect that when they create new documents, the language of their new document matches the locale of Writer itself. This conflicts with the idea that languages in the documents are explicit, so if a German user writes a piece of German text, the spellcheck passes and the next user is English, then the text should remain German, not introducing new spellcheck errors.

Result

https://lh3.googleusercontent.com/OnDdNBGLsYhicnEbt_G6XW3Tmrn17XUT4XyBczgm0eETha9ZQ0y62t74QxeUFi3BfzfZrbBzZaMikglblqQBqTnWdYQzEQ72iBh3gZMHb9akFpQRVztOW7_0pK1Uyn9fvaNhLfugHfQ=w640
Figure 1. Locale-independent Writer template

The solution to this problem is what Mike and Ezinne implemented: make these "templates" minimal, so they don’t refer to any language. Then Calc or Impress will fill the language from the locale of the soffice process and it’ll be part of the document on the first save. This solves the problem of multi-language templates while it does not break the spellcheck use-case.

Andras copied the same templates to various Online integrations to have the same problem solved in that use-case as well.

Writer was still problematic, though. The commit "sw: default to UI locale when language is missing" now fixes this. You can see on the above screenshot that the stock soffice.odt was opened with a Hungarian locale and the status bar shows that the document language is Hungarian, not the confusing "multiple languages", as before.
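The rule itself is tiny. A sketch in Python, with a made-up function name and locale strings used only as examples (the real fix lives in Writer's C++ code):

```python
from typing import Optional


def document_language(stored: Optional[str], ui_locale: str) -> str:
    """An explicit language in the document always wins; only a missing
    language falls back to the locale of the user interface."""
    return stored if stored else ui_locale
```

So a minimal template with no language opened under a Hungarian UI becomes Hungarian, while German text written by a German user stays German when an English user opens it.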

Want to start using this?

You can get a snapshot / demo of Collabora Office and try it out yourself right now: try unstable snapshot. Collabora is a major contributor to LibreOffice and all of this work will be available in TDF’s next release too (7.1).


SmartArt improvements in Impress, part 5

Estimated read time: 3 minutes

Impress now has support for considering rules next to constraints when laying out SmartArt graphics from PPTX files. This builds on top of the previous improvements around SmartArt support.

First, thanks to our partner SUSE for working with Collabora to make this possible.

Motivation

SmartArt allows declaring your content and requirements for a graphic, then the layout will take care of arranging that in a suitable way. It is allowed to declare conflicting requirements, and rules can specify how to resolve those conflicts. The example document below has shape widths defined in a way that multiple child shapes want to have a width of 100%, but simply scaling down all child shapes does not give a correct result. Rules define what to scale down and what to leave unchanged.

Results so far

Here is how this works in practice:

https://lh3.googleusercontent.com/_AL6ARVsbgdaovqKPxr0n0I1kSn2zX_5xGg5y_4M8whkT6K0-mXIsGXeYI2Uo6u2YQAVwfLtbfy8XeYHggaPWpIHV4yaA4CaaIFUK4LQLRbV-JIbhy9A-Xz5JEEbcXp3TRWK4CzVcl0=w640
Figure 1. Linear layout with multiple 100% width shapes, new output
https://lh3.googleusercontent.com/UmK7-j0WxUHamDA-g3FepAOYYgbD5LJJhssleqv2jLnfXX-62fP82uA_5t__9HOQWIZfJUl6hoZVVQX5-LuIdOxz2M0HS90zcaoov_SbxQHuv4DN48be8dZkvySb_QtAbmNOTcMpJ5c=w640
Figure 2. Linear layout with multiple 100% width shapes, old output
https://lh3.googleusercontent.com/i2ScJOwjQfQeeFrw-yu6EQt67nt5Xx7o325WnaOeprXH4jc_CPLuXt0Mwb2iiT9rBamjooEA271HY48P6v8ieuWMUcoSq5HTjMsJkJnUOcrCrF_7uutebYGfO2WOZzAJRh6k-ibbglc=w640
Figure 3. Linear layout with multiple 100% width shapes, reference output

How is this implemented?

If you would like to know a bit more about how this works, continue reading… :-)

  • The initial heavy-lifting is done in this commit, which parses the rules from the XML input.

  • Then once we had rule info around, the linear algorithm was improved to scale down child shapes based on rules (and not just all of them, equally).

  • Then it was necessary to scale spacings (between child shapes) based on rules as well.

  • It was also necessary to limit the height request of a shape, since shapes should not leave the canvas of the SmartArt.

  • Finally it was necessary to support the "top" child order. This can be declared using the following markup:

<dgm:layoutNode ... chOrder="t">

This declares that an earlier shape in a linear layout is on top of later shapes, not the opposite. The default is that newer shapes are on top of older shapes. This is not a visible problem usually, but once you start using negative widths in a linear layout, you can have overlapping shapes. The above example has 3 text shapes, which overlap with the "background" arrow shape. This is expressed by having 100% width for child shapes (OK to scale down), then a -100% width for a dummy shape (not scaling) and finally a 100% width for the background arrow (not scaling).
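The width arithmetic of this example can be sketched as follows. This is a simplified Python illustration, not the Impress code: widths are fractions of the available width (1.0 is 100%), each child carries a flag saying if a rule allows scaling it, and the function name is made up.

```python
from typing import List, Tuple


def scale_by_rules(children: List[Tuple[float, bool]],
                   available: float = 1.0) -> List[float]:
    """children is a list of (requested width, may be scaled) pairs.
    Only the scalable children shrink; the others keep their request."""
    fixed = sum(width for width, scalable in children if not scalable)
    flexible = sum(width for width, scalable in children if scalable)
    if flexible <= 0:
        return [width for width, _ in children]
    # Share what the fixed children leave over; never enlarge a shape and
    # never go below zero.
    factor = max(0.0, min(1.0, (available - fixed) / flexible))
    return [width * factor if scalable else width
            for width, scalable in children]


# 3 text shapes (scalable), the dummy shape and the background arrow.
example = [(1.0, True), (1.0, True), (1.0, True), (-1.0, False), (1.0, False)]
print(scale_by_rules(example))
```

Here the three text shapes shrink to a third of the width each, the dummy shape moves the position back to the start, and the arrow keeps its full width behind the text. Scaling all five children equally would instead shrink the arrow too, which is the incorrect result mentioned in the motivation.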

All in all, now the background arrow shape has a good position and size, and the text on the arrow is readable.

Want to start using this?

You can get a snapshot / demo of Collabora Office and try it out yourself right now: try unstable snapshot. Collabora is a major contributor to LibreOffice and all of this work will be available in TDF’s next release too (7.1).

© Miklos Vajna. Built using Pelican. Theme by Giulio Fidente on github.