
LibreOffice now uses pdfium to render inserted PDF images

Estimated read time: 2 minutes

pdfium is the rendering library used in Chromium’s PDF viewer. It’s based on the Foxit PDF renderer, and its rendering quality is much better than that of the pre-existing "convert PDF to ODG, then to an image" code when it comes to just viewing a PDF file. First, thanks to PMG who made this work possible.

Let’s look at a few samples that compare the old pdfimport rendering result and the new pdfium-based one. One important feature is that embedded fonts are handled. This is what this inserted PDF looked like previously:

https://farm4.staticflickr.com/3727/33163219940_3a2a3278a0_o.png

Compare it with the new result:

https://farm3.staticflickr.com/2927/33547029855_92c1a5150d_o.png

Now let’s see the front page of a magazine. With the old rendering, you can see 4 unexpected artifacts:

https://farm4.staticflickr.com/3948/33563793222_8a6b8e8a6b_z.jpg

New result:

https://farm3.staticflickr.com/2809/33547029645_de7cbcd800_z.jpg

Finally, a problem with pdfium was that LibreOffice got bitmaps from it, so if you re-exported to PDF, the quality of these PDF images was worse than in the original PDF file. The PDF specification has a reference XObject feature that helps in this case: it allows the PDF export to still write the bitmap to the exported PDF, but if the reader supports this feature, the vector-based original file will be shown, not the bitmap.
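To make the idea a bit more concrete, here is a rough Python sketch of what a form XObject with a reference dictionary could look like when serialized. The /Ref entry with its /F (file specification) and /Page (zero-based page index) keys comes from the PDF specification; the helper name, the star.pdf file name and the fallback content are made up for illustration and are not what LibreOffice's exporter actually writes.

```python
def reference_xobject(target_file: str, page: int, width: float,
                      height: float, proxy: bytes) -> bytes:
    """Serialize a form XObject whose /Ref entry points at a page of another PDF.

    The stream body is the fallback ("proxy") content that a reader shows
    when it does not support reference XObjects, e.g. drawing a bitmap.
    """
    header = (
        "<< /Type /XObject /Subtype /Form\n"
        f"   /BBox [0 0 {width:g} {height:g}]\n"
        f"   /Ref << /F ({target_file}) /Page {page} >>\n"
        f"   /Length {len(proxy)} >>\n"
    ).encode("ascii")
    return header + b"stream\n" + proxy + b"\nendstream\n"


# Hypothetical fallback content: paint an image resource named /Im0.
proxy_content = b"q 100 0 0 100 0 0 cm /Im0 Do Q"
obj = reference_xobject("star.pdf", 0, 100, 100, proxy_content)
print(obj.decode("ascii"))
```

A reader that understands /Ref can render page 0 of the referenced file instead of the proxy stream; everything else just paints the proxy.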

Here is a simple hand-crafted star in a PDF file, as it looked initially:

https://farm3.staticflickr.com/2915/33163219680_30f63b4a82_z.jpg

This is how it looks after LibreOffice’s PDF export learned to emit reference XObjects:

https://farm4.staticflickr.com/3933/33547029485_4f487bb26c_z.jpg

All this is available in LibreOffice master, towards 5.4.


LibreOffice PDF export now supports videos

Estimated read time: 2 minutes

https://farm4.staticflickr.com/3924/32549564340_4d0990cfa4_o.png

PDF supports screen annotations, which means it’s possible to play embedded and linked videos on top of a static image. Given that LibreOffice also supports videos, it made sense to add support for this in our PDF export filter. First, thanks to PMG who made this work possible. This is currently added for Writer and Impress.

Linked videos

Linked videos are the case where the video is not part of the document itself, but is located somewhere else, e.g. at an http:// location. This is helpful if you want to email around a PDF file with video content and want to avoid sending large files.

tdf#104841 is about this situation: first I added support for linked videos in Impress, then also in Writer.

The result can be played using Adobe Acrobat Reader — for some reason okular on Linux is a bit confused about http:// URLs, wants to convert them to relative ones, and then fails as of today.

Embedded videos

https://farm3.staticflickr.com/2666/32115175413_ec6f64243a_z.jpg

tdf#105093 is the embedded video case. This is handy if you want to create an entirely self-contained PDF, where even the video content is inside the PDF file as an embedded file.

After Impress support (and a trick around Draw vs Impress shapes) the Writer part wasn’t too complicated.

Regarding the situation around various video containers and codecs, the above code is quite agnostic. :-) On the LibreOffice side all we require is to be able to extract a key frame from the video to provide a preview image, so e.g. on Linux the support depends on what gstreamer plugins you have installed. The video content is written to the PDF file as-is, so again, whether it will work in the PDF reader is up to the reader’s codec support. On Linux e.g. okular uses vlc for video playback, so the range of supported formats is quite wide. The same is true on Windows, where what I personally tested is LibreOffice’s VLC backend and the embedded QuickTime player in Acrobat Reader.

All of this is available on LibreOffice master towards 5.4.


Impress bugfixes, in time for FOSDEM 2017

Estimated read time: 2 minutes

https://farm1.staticflickr.com/334/32605456735_ac88121be8_o.png

FOSDEM 2017 is here this weekend, and as Michael Stahl pointed out, this and the LibreOffice annual conference are the two time periods each year when lots of Impress bugfixes are made, as people start dogfooding. ;-) So below you can read about a pair of Impress bugs I fixed recently.

Changing font size now takes table selection into account

tdf#105502 is a situation where you have an Impress table shape, you select some of the cells, then you use the sidebar to change the font size. Previously this affected all cells of the table shape; now only the selected cells are updated.

Background fill for shapes

https://farm1.staticflickr.com/277/31761747774_4b1e6b8d38_o.png

tdf#105150 is a PPT(X) filter bug where a shape was previously imported as transparent, but it actually has to have the same fill type as the slide background. In case of PPTX this was already handled in general, but not in case the slide had no explicit background. The result was that in case the shape was used to cover other shapes, they were visible, leading to e.g. this unexpected red rectangle on the screenshot.

The same bug was present in the PPT import, though there the existing support was even more limited: just the "background colored objects" were collected, but nothing was done to them. Now the above use-case should be as good for PPT as it is for PPTX.


Hack-(rest-of-the)-week at Collabora

Estimated read time: 2 minutes

https://farm1.staticflickr.com/726/32306648426_b4ee93f6a1_o.png

As already mentioned in Mike’s blog post, last month we were allowed to hack on anything we wanted in LibreOffice for a few days. I used this time to make progress on 3 different topics.

Stepping through TextBoxes using the keyboard

Given that a Writer shape with a TextBox is internally two shapes, this needed explicit support. After my TextBox bugfix it’s possible to have two such shapes in a document, and once you select one of them, tab properly jumps between the two shapes; previously nothing happened.

What did happen is we tried to activate the TextBox of the selected shape, which selected the shape itself, so at the end nothing happened.

RTF improvements

For some time it was already possible to import and export custom document properties from/to RTF, but only when the value type of the property was string. Now I extended support for these custom properties, so the remaining types are handled as well: numbers, booleans, doubles and dates.
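For the curious, here is a small Python sketch of how typed custom properties can be written into the RTF \userprops group. The \propname, \proptype and \staticval control words and the type numbers (3 integer, 5 real, 11 boolean, 30 text) are from my reading of the RTF specification; the helper names are made up, dates (type 7) are left out because I'm not sure about their value format, and this is not a copy of what the LibreOffice RTF filter does.

```python
# Property type numbers as used with \proptypeN (assumed from the RTF spec).
PROPTYPE_INTEGER = 3
PROPTYPE_REAL = 5
PROPTYPE_BOOLEAN = 11
PROPTYPE_TEXT = 30


def rtf_escape(text: str) -> str:
    """Escape the three characters that are special in RTF."""
    return "".join("\\" + c if c in "\\{}" else c for c in text)


def user_prop(name: str, value) -> str:
    """Serialize one custom property; bool is checked before int on purpose."""
    if isinstance(value, bool):
        proptype, static = PROPTYPE_BOOLEAN, "1" if value else "0"
    elif isinstance(value, int):
        proptype, static = PROPTYPE_INTEGER, str(value)
    elif isinstance(value, float):
        proptype, static = PROPTYPE_REAL, repr(value)
    else:
        proptype, static = PROPTYPE_TEXT, rtf_escape(str(value))
    return "{\\propname %s}\\proptype%d{\\staticval %s}" % (
        rtf_escape(name), proptype, static)


def user_props(props: dict) -> str:
    """Wrap all properties into a single \\userprops destination."""
    return "{\\*\\userprops " + "".join(
        user_prop(k, v) for k, v in props.items()) + "}"


print(user_props({"Reviewer": "Jane", "Pages": 12, "Urgent": True, "Score": 4.5}))
```

The bool-before-int ordering matters in Python, as True is also an int; without it booleans would be written as integer properties.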

xmlsec patch upstreaming

Last, I’ve started working on upstreaming external/libxmlsec/xmlsec1-noverify.patch.1. xmlsec has no ability to disable the verification of certificates (think of curl -k or wget --no-check-certificate), so in LibreOffice currently we just patch out that code as we don’t need it. So I wanted to add a new verification flag to avoid patching, but it turns out that in the NSS case xmlsec didn’t do the verification, so as a first step I fixed that instead in this xmlsec GitHub pull request. Now that it’s merged, the next step will be to add such a flag, and then LibreOffice can get rid of the patch after the next xmlsec release.


PAdES support for PDF files in LibreOffice

Estimated read time: 3 minutes

Building on top of the previously mentioned signing of existing PDF files work, one more PDF feature coming in LibreOffice 5.3 is initial support for the PDF Advanced Electronic Signatures (PAdES) standard. First, thanks to the Dutch Ministry of Defense in cooperation with Nou&Off who made this work possible.

Results

PAdES is an extension of the ISO PDF signature with additional constraints, so that it conforms to the requirements of the European eIDAS regulation, which in turn makes it more likely that your signed PDF document will actually be legally binding in many EU member states.

The best way to check if LibreOffice produces such PDF signatures is to use a PAdES validator. So far I found two of them:

As it can be seen above, the PDF signature produced by LibreOffice 5.3 by default conforms to the PAdES baseline spec.

Implementation

I implemented the following in LO to make this happen:

  • PDF signature creation now defaults to the stronger SHA-256 (instead of the previously used weaker SHA-1), and the PDF verifier understands SHA-256

  • the PDF signature creation now embeds the signing certificate into the PKCS#7 signature blob in the PDF, so the verifier can check not only the key used for the signing, but the actual certificate as well

  • the PDF signature import can now detect if such an embedded signing certificate is present in the signature or not

Note
Don’t get confused: LO does signature verification (checks if the digest matches and validates the certificate) and now shows if the signing certificate is present in the signature or not, but it doesn’t do more than that. The above mentioned DSS tool is still superior when it comes to doing a full validation of a PAdES signature.

As usual, this works both with NSS and MS CryptoAPI. In the previous post I noted that one task was easier with CryptoAPI. Here I experienced the opposite: when writing the signing certificate hash, I could provide templates to NSS on how the ASN.1 encoding of it should happen, and NSS did the actual ASN.1 DER encoding for me. In the CryptoAPI case there is no such API, so I had to do this encoding manually (see CreateSigningCertificateAttribute()), which is obviously much more complicated.
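To give an idea of what "doing the encoding manually" means, here is a minimal Python sketch of DER tag-length-value encoding, applied to a stripped-down signing-certificate structure. The nesting (a SEQUENCE of certificate IDs, each holding the SHA-256 hash of the certificate as an OCTET STRING) is my simplified reading of the ESS SigningCertificateV2 type with the default hash algorithm and the optional issuer/serial omitted; the real CreateSigningCertificateAttribute() may well write more fields, and the function names below are made up.

```python
import hashlib

SEQUENCE = 0x30      # universal, constructed
OCTET_STRING = 0x04  # universal, primitive


def der_length(n: int) -> bytes:
    """Short form below 128, otherwise 0x80 | byte count, then big-endian bytes."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_tlv(tag: int, value: bytes) -> bytes:
    """Encode a single tag-length-value triplet."""
    return bytes([tag]) + der_length(len(value)) + value


def signing_certificate_v2(cert_der: bytes) -> bytes:
    """SEQUENCE { SEQUENCE OF { SEQUENCE { OCTET STRING certHash } } }."""
    cert_hash = hashlib.sha256(cert_der).digest()
    ess_cert_id = der_tlv(SEQUENCE, der_tlv(OCTET_STRING, cert_hash))
    certs = der_tlv(SEQUENCE, ess_cert_id)
    return der_tlv(SEQUENCE, certs)


# Placeholder bytes; a real caller would pass the DER-encoded certificate.
fake_cert = b"not a real certificate"
encoded = signing_certificate_v2(fake_cert)
print(encoded.hex())
```

With NSS, a template describes this nesting and the library produces the bytes; without such an API every length prefix has to be computed by hand, inside out, as above.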

Another pain was that the DSS tool doesn’t really separate the validation of the signature itself and of the certificate. The above screenshot was created using a non-self-signed certificate, hence the unclear part in the signed-by row.

If you want to try these out yourself, get a daily build and feel free to play with it. This work is part of both master and libreoffice-5-3, so those builds are of interest. Happy testing! :-)


Signing existing PDF files in LibreOffice

Estimated read time: 6 minutes

TL;DR: see above; it’s now possible to sign existing PDF files and also to verify those signatures in LibreOffice 5.3.

The problem

LibreOffice already made it possible to digitally sign PDF files as part of the PDF export, so in case you had e.g. ODF documents and exported them to PDF, optionally a single digital signature could be added as part of the export process. This is now much improved. First, thanks to the Dutch Ministry of Defense in cooperation with Nou&Off who made this work possible.

A user could already use another application to verify that signature or to sign an already existing PDF file. The idea is to allow doing these directly from inside LibreOffice.

Results

As it can be seen above, now the Digital Signatures dialog not only works for ODF and OOXML files, but also for PDF files. If the file has been signed, then the dialog performs verifications of that signature. Signatures are also verified on opening any signed PDF file.

I’ve also extended the user interface a bit, so that signing an existing PDF file is easy, similarly to how exporting to PDF is easier than exporting to a random other file format. There is now a new File → Digital signatures → Sign existing PDF menu item to open a PDF file for signing:

When that happens the infobar has a dedicated button to open the Digital Signatures dialog, and also going into editing mode triggers a warning dialog, as going read-write is not needed to be able to sign a document:

And that’s basically it, after you open a PDF file in Draw, you can do the usual digital signature operations on the file, just like it already works for previously supported file formats.

Details

What follows is something you can probably skip if you’re a user — however if you’re a developer and you want to understand how the above is implemented, then read on. ;-)

PDF tokenizer

The signing feature in ODF/OOXML is implemented by working directly on the ZIP storage in xmlsecurity/. This means that in the PDF case it’s necessary to work on the PDF file directly, except that we had no such PDF tokenizer ready to be used.

Code under xmlsecurity/source/pdfio/ is now such a tokenizer. It can extract info from PDF files and can also add incremental updates at the end of the file; this way we can make sure adding a signature to a file won’t lose existing content in the file. This is fundamentally different from the usual load-edit-save workflow, where we convert the file into a document model and work on that.

Verification of signatures

Previously LO was only able to generate signatures, not verify them. I’ve implemented PDF signature verification using both NSS and CryptoAPI, so Windows, Linux and macOS are all covered. I have to admit that the initial verification was much easier with CryptoAPI. Until I hit corner-cases, I could use an API that’s well-documented and is higher level than NSS. (I don’t have to support different hash types explicitly, for example.)

When I added support for non-detached signatures, that changed the situation a bit:

 1 file changed, 15 insertions(+), 11 deletions(-)

was the NSS patch, and

 1 file changed, 104 insertions(+), 8 deletions(-)

was the CryptoAPI patch.

Signing existing files

Signing an existing file means tokenizing a document, figuring out what an incremental update should look like for that file, writing an incremental update that has a placeholder for the actual signature (a PKCS#7 blob, where the input is just the non-placeholder parts of the document as binary data), and finally filling in the placeholder with the actual signature.

For the last step, I could reuse code from the PDF export (modulo fixing bugs like tdf#99327). For the other steps, the tokenizer remembers the input offset / length for the given token; this way it’s relatively easy to create incremental updates. You can add new objects or update existing objects in such an incremental update, and this source tracking feature allows copying even the unchanged parts of updated objects verbatim.
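The placeholder dance is easier to follow with a toy example. The Python sketch below appends something that only resembles an incremental update, reserves a zero-filled placeholder, hashes everything except the placeholder and then fills it in. It is heavily simplified on purpose: a bare SHA-256 hex digest stands in for the PKCS#7 blob, there is no xref or trailer, and all names are made up, so take it as an illustration of the byte-range idea and not of the actual code in xmlsecurity/.

```python
import hashlib

PLACEHOLDER_LEN = 64  # hex digits reserved for the stand-in "signature"


def append_signature_update(original: bytes):
    """Append a fake incremental update with a zero-filled placeholder.

    The original bytes are kept verbatim; returns the new data plus the
    [start, end) offsets of the placeholder.
    """
    prefix = b"\n% incremental update\n<< /Type /Sig /Contents <"
    suffix = b"> >>\n%%EOF\n"
    start = len(original) + len(prefix)
    end = start + PLACEHOLDER_LEN
    data = original + prefix + b"0" * PLACEHOLDER_LEN + suffix
    return data, start, end


def sign(original: bytes):
    """Hash the non-placeholder parts, then fill in the placeholder."""
    data, start, end = append_signature_update(original)
    digest = hashlib.sha256(data[:start] + data[end:]).hexdigest().encode("ascii")
    return data[:start] + digest + data[end:], start, end


def verify(signed: bytes, start: int, end: int) -> bool:
    """Recompute the digest over the same two byte ranges and compare."""
    expected = hashlib.sha256(signed[:start] + signed[end:]).hexdigest()
    return signed[start:end] == expected.encode("ascii")


original_pdf = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n%%EOF\n"
signed_pdf, sig_start, sig_end = sign(original_pdf)
print(verify(signed_pdf, sig_start, sig_end))
```

Because the file length does not change when the placeholder is filled in, the offsets computed before signing stay valid afterwards, which is the whole point of reserving the space up front.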

PDF 1.5+

Everything became a bit more complicated once I started to handle not only LO-generated PDF-1.4, but also newer PDF versions. I think this is important, as Adobe Acrobat creates PDF 1.6 by default today, which has a number of new features (I think all of them were actually introduced in PDF-1.5) that affect the tokenizer:

  • xref stream: instead of an ASCII xref table ("table of contents") at the end of the file, it’s now possible to write the binary equivalent of this as an xref stream. Because the binary version can describe more features we must also write an updated xref stream (and not an xref table) when the import already had an xref stream.

  • object streams: it’s now possible to write multiple objects inside the stream section of a single object in binary form. The tokenizer needs to be able to read these objects, and roundtripping (source tracking) should also work not only with physical file offsets, but also inside such compressed streams where the offset is no longer just a number inside the input file. (It’s OK to write the updated objects outside object streams, still.)

  • stream predictors: this is a concept from the PNG format, but also used in PDF when compressing the xref stream. See the spec for the gory details, but in short it’s not enough that instead of plaintext you have to deal with binary compressed data, you also have to filter the data before actually parsing the file offsets, and the filter is defined not in terms of object IDs and file offsets, but in terms of adjacent pixels, since it’s documented in the PNG spec. :-) (To be close to the Adobe output, we also apply such predictors when writing compressed xref streams.)
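To show what the predictor step looks like in practice, here is a small Python sketch that round-trips a few xref stream entries through the PNG "Up" filter and zlib. Only the Up filter (type 2) is handled, the field widths and the entries are made up, and the function names are mine, so this is an illustration of the technique and not the tokenizer’s actual code.

```python
import zlib


def png_up_encode(rows):
    """Prefix each row with filter type 2 and store byte-wise deltas to the row above."""
    out = bytearray()
    prev = bytes(len(rows[0]))  # the row above the first one is all zeros
    for row in rows:
        out.append(2)
        out.extend((b - p) & 0xFF for b, p in zip(row, prev))
        prev = row
    return bytes(out)


def png_up_decode(data, columns):
    """Undo the Up filter: each byte is the stored delta plus the byte above it."""
    rows = []
    prev = bytes(columns)
    stride = columns + 1  # one filter-type byte per row
    for offset in range(0, len(data), stride):
        if data[offset] != 2:
            raise ValueError("only the Up filter is handled in this sketch")
        filtered = data[offset + 1:offset + stride]
        row = bytes((f + p) & 0xFF for f, p in zip(filtered, prev))
        rows.append(row)
        prev = row
    return rows


def parse_entry(row, widths):
    """Split one decoded row into big-endian integer fields."""
    fields, pos = [], 0
    for width in widths:
        fields.append(int.from_bytes(row[pos:pos + width], "big"))
        pos += width
    return tuple(fields)


# Hypothetical field widths (type, offset, generation) and entries.
WIDTHS = (1, 2, 1)
entries = [(0, 0, 255), (1, 15, 0), (1, 240, 0), (1, 1000, 0)]
rows = [b"".join(v.to_bytes(w, "big") for v, w in zip(e, WIDTHS)) for e in entries]

compressed = zlib.compress(png_up_encode(rows))
decoded = [parse_entry(r, WIDTHS)
           for r in png_up_decode(zlib.decompress(compressed), sum(WIDTHS))]
print(decoded)
```

The "pixels" here are just the bytes of neighbouring xref entries: consecutive file offsets tend to share their high bytes, so the deltas are mostly zeros and compress better.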

User Interface

In addition to the UI changes already mentioned above, one more improvement I did is that now the Digital Signatures dialog has a new column to show the signature type. This is either XML-DSig (for ODF/OOXML) or PDF.

Testing

I’ve added an integration test in the existing CppunitTest_xmlsecurity_signing to have coverage for the small new code that calls into xmlsecurity/ from sfx2/ in case of PDF files. But fortunately because all other code in xmlsecurity/ was new, I could do unit testing in CppunitTest_xmlsecurity_pdfsigning for the rest of the features.

Needless to say that invoking the PDF tokenizer + signature creator/verifier directly is much quicker than loading a full PDF file into Draw, just to see the signature status. ;-)

Summary

If you want to try these out yourself, get a daily build and play with it! This work is part of both master and libreoffice-5-3, so those builds are of interest. Happy testing! :-)


LibreOffice session at DevTalks Jr.

Estimated read time: 1 minute

(via DevTalksRo)

Today I gave a Getting involved with LibreOffice Online and Android session at DevTalks Jr, Bucharest. The event had two tracks in parallel, with a total of about 200 attendees, all developers.

Some photos I took after the event are available.

Thanks to the organizers and sponsors for the great event! :-)


Insert PDF as image in LibreOffice 5.3

Estimated read time: 3 minutes

Results

LibreOffice 5.3 will add one more vector-based format that can be inserted as an image into documents: PDF. First, thanks to PMG who made this work possible. On the user interface you can now select PDF files when you choose e.g. Writer’s Insert → Image option:

The first page of the PDF document will be shown, which is handy if the PDF file is basically used as a vector image format.

Similarly to the SVG feature, the original vector image is stored in the document, but when saving to ODF, a replacement PNG file is also generated to be backwards compatible with older ODF readers. The image context menu → Save menu item allows you to extract your original PDF data from the image, too:

And that’s it: as long as you save your document in ODF, your PDF-as-an-image will be kept without losing any data. As usual, you can try this right now with a 5.3 daily build. :-)

However, if you’re interested in how this is implemented, keep reading…

Document model

The PDF image in the document model is handled very similarly to SVG: next to Graphic::getSvgData(), there is now a Graphic::getPdfData(). This new member function exposes the original PDF data; otherwise the Graphic is just a metafile.

UNO API

The ReplacementGraphicURL property of the image at the UNO level now exposes the generated metafile for PDF images. This is implemented for both Draw and Writer images, and is used by the ODF export filter.

Layout

When the Graphic instance is rendered, the layout knows nothing about the PDF data attached to the object, only parses the generated metafile. This way the display of the PDF image works out of the box.

Filters

First I’ve implemented a PDF import-as-graphic filter, then the export equivalent of it. As you can see, the PDF import-as-graphic filter isn’t too complicated, it completely reuses the existing "import PDF into Draw" filter, it simply copies the first page of the resulting document model as a metafile.

Second, once the graphic filters were working, I’ve also improved the ODF import to recognize PDF data — the export side needed no explicit work, once the ReplacementGraphicURL bits were in place.

Tests

As mentioned above, the Draw and the Writer image implementation is separate, so first I’ve added tests for ODT files in the testEmbeddedPdf of CppunitTest_sw_odfexport, and then SdExportTest::testEmbeddedPdf() to cover ODP files (and other ODF formats). Second, the PDF part of the graphic swapout/in code has a dedicated test in GraphicObjectTest::testPdf(), and the UI’s "Save original PDF" feature has a new XOutdevTest::testPdfGraphicExport() test.

Oh, and if you intend to test this manually in a self-created build, make sure to avoid --disable-pdfimport, otherwise this feature can’t work. ;-)


Small capitals toolbar button in LibreOffice Writer

Estimated read time: 1 minute

It was requested to be able to set the small capitals character property via a toolbar button in Writer, which was indeed not possible. Not only was the toolbar button missing, but the underlying UNO command was missing as well (which you can use e.g. from a macro to format the current selection).

So my commit added a simple set of icons to the galaxy theme for the new toolbar button, defined the new UNO command for Writer text and added it to Writer’s text object bar, next to the upper case and lower case buttons (hidden by default). One difference from those buttons is that those buttons perform a transliteration, while this one really just sets a character property, you can easily undo the property later if needed.

Wrt. other icon themes, see this mail, hopefully the design team can help there.

As usual, you can try this right now with a 5.3 daily build. :-)


Using clang-based tools beyond loplugin LOCon lightning talk

Estimated read time: 1 minute

Last week I gave a Using clang-based tools beyond loplugin lightning talk at LibreOffice conference 2016, on the last day. Click on the image to see all the slides.

If you’re a vim or emacs user and you work with C++11 code, you probably want to have a look at clang-rename, include-fixer and some editor plugin exposing the power of libclang (like YouCompleteMe or libclang-vim), sometimes these are really helpful.

© Miklos Vajna. Built using Pelican. Theme by Giulio Fidente on github.