Signing existing PDF files in LibreOffice

Posted on: Mon 12 December 2016

Estimated read time: 6 minutes

https://farm1.staticflickr.com/542/31208923460_2eb9bca147_z.jpg

TL;DR: see above — it’s now possible signing existing PDF files and also verify those signatures in LibreOffice 5.3.

The problem

LibreOffice already made it possible to digitally sign PDF files as part of the PDF export, so in case you had e.g. ODF documents and exported them to PDF, optionally a single digital signature could be added as part of the export process. This is now much improved. First, thanks to the Dutch Ministry of Defense in cooperation with Nou&Off who made this work possible.

A user can already use an other application to verify that signature or sign an already existing PDF file. The idea is to allow doing these from inside LibreOffice, directly.

Results

As it can be seen above, now the Digital Signatures dialog not only works for ODF and OOXML files, but also for PDF files. If the file has been signed, then the dialog performs verifications of that signature. Signatures are also verified on opening any signed PDF file.

I’ve also extended the user interface a bit, so that signing an existing PDF file is easy, similarly how exporting to PDF is easier than exporting to a random other file format. There is now a new File → Digital signatures → Sign exiting PDF menu item to open a PDF file for signing:

https://farm1.staticflickr.com/30/31543378146_b981a22ee0_z.jpg

When that happens the infobar has a dedicated button to open the Digital Signatures dialog, and also going into editing mode triggers a warning dialog, as going read-write is not needed to be able to sign a document:

https://farm1.staticflickr.com/752/31543377356_eaed7c2301_z.jpg

And that’s basically it, after you open a PDF file in Draw, you can do the usual digital signature operations on the file, just like it already works for previously supported file formats.

Details

What follows is something you can probably skip if you’re a user — however if you’re a developer and you want to understand how the above is implemented, then read on. ;-)

PDF tokenizer

The signing feature in ODF/OOXML is implemented by working directly on the ZIP storage in xmlsecurity/. This means that in the PDF case it’s necessary to work on the PDF file directly, except that we had no such PDF tokenizer ready to be used.

Code under xmlsecurity/source/pdfio/ now is such a tokenizer that can extract info from PDF files and can also add incremental updates at the end of the file, this way we can make sure adding a signature to a file won’t loose existing content in the file. This is fundamentally different form the usual load-edit-save workflow, when we convert the file into a document model, and work on that.

Verification of signatures

Previously LO was only able to generate signatures, not verify them. I’ve implemented PDF signature verification using both NSS and CryptoAPI, so all Windows, Linux and macOS are covered. I have to admit that the initial verification was much easier with CryptoAPI. Until I hit corner-cases, I could use an API that’s well-documented and is higher level than NSS. (I don’t have to support different hash types explicitly, for example.)

When I added support for non-detached signatures, that changed the situation a bit:

 1 file changed, 15 insertions(+), 11 deletions(-)

was the NSS patch, and

 1 file changed, 104 insertions(+), 8 deletions(-)

was the CryptoAPI patch.

Signing existing files

Signing an existing file means tokenizing a document, figuring out how an incremental update should look like for that file, writing an incremental update that has a placeholder for the actual signature (a PKCS#7 blob, where the input is just the non-placeholder parts of the document as binary data), and finally filling in the placeholder with the actual signature.

For the last step, I could reuse code from the PDF export (modulo fixing bugs like tdf#99327). For the other steps, the tokenizer remembers the input offset / length for the given token, this way it’s relatively easy to create incremental updates. You can add new objects or update new objects in such an incremental update, and this source tracking feature allows copying even the unchanged parts of updated objects verbatim.

PDF 1.5+

Everything becomes a bit more complicated once I started to handle not only LO-generated PDF-1.4, but also newer PDF versions. I think this is important, as Adobe Acrobat creates PDF 1.6 by default today, which has a number of new features (I think all of them were actually introduced in PDF-1.5) that affects the tokenizer:

xref stream: instead of an ASCII xref table ("table of contents") at the end of the file, it’s now possible to write the binary equivalent of this as an xref stream. Because the binary version can describe more features we must also write an updated xref stream (and not an xref table) when the import already had an xref stream.
object streams: it’s now possible to write multiple objects inside the stream section of a single object in binary form. The tokenizer is necessary to be able to read these objects and also roundtripping (source tracking) should work not only with physical file offsets, but also inside such compressed streams where the offset is no longer just a number inside the input file. (It’s OK to write the updated objects outside object streams, still.)
stream predictors: this is a concept from the PNG format, but also used in PDF when compressing the xref stream. See the spec for the gory details, but in short it’s not enough that instead of plaintext you have to deal with binary compressed data, you also have to filter the data before actually parsing the file offsets, and the filter is defined not in terms of object IDs and file offsets, but in terms of adjacent pixels, since it’s documented in the PNG spec. :-) (To be close to the Adobe output, we also apply such predictors when writing compressed xref streams.)

User Interface

In addition to be UI changes already mentioned above, one more improvement I did is that now the Digital Signatures dialog has a new column to show the signature type. This is either XML-DSig (for ODF/OOXML) or PDF.

Testing

I’ve added an integration test in the existing CppunitTest_xmlsecurity_signing to have coverage for the small new code that calls into xmlsecurity/ from sfx2/ in case of PDF files. But fortunately because all other code in xmlsecurity/ was new, I could do unit testing in CppunitTest_xmlsecurity_pdfsigning for the rest of the features.

Needless to say that invoking the PDF tokenizer + signature creator/verifier directly is much quicker than loading a full PDF file into Draw, just to see the signature status. ;-)

Summary

If you want to try these out yourself, get a daily build and play with it! This work is part of both master or libreoffice-5-3, so those builds are of interest. Happy testing! :-)

Category: libreoffice – Tags: en