vmiklos.hu
shameless self-promoting website
  • Tuesday, 05 December 2017
    EPUB3 export improvements in LibreOffice Writer

    I worked on improving the EPUB3 export filter in LibreOffice recently. First, thanks to the Dutch Ministry of Defense in cooperation with Nou&Off who made this work possible. Since the previous blog entry there have been a number of improvements around 4 topics.

    https://farm5.staticflickr.com/4540/38847800651_d5271ced3a_o.png

    The character properties of link text are now handled correctly: in the above example the link text is red, and that color comes from a character style.

    Improved table support

    Previously, table support existed only so that content was not lost; now all kinds of cell, row and table properties are handled correctly. A few samples:

    • custom cell width:

    https://farm5.staticflickr.com/4566/38847800611_38b8483d7f_o.png
    • custom row height:

    https://farm5.staticflickr.com/4580/38847800521_26285a9152_o.png
    • row span:

    https://farm5.staticflickr.com/4540/38847800461_359651bc3d_o.png

    So table support should now be decent, covering row and column spanning and various cell border properties.

    Improved image support

    Previously only the simplest as-character anchoring was supported. Now many more cases are handled. Two examples:

    • image borders:

    https://farm5.staticflickr.com/4541/24975193838_94818bd1ed_o.png
    • image with a caption:

    https://farm5.staticflickr.com/4568/24975193608_83239bf287_o.png

    This includes various wrap types (to the extent HTML5 allows representing ODF wrap types).

    Font embedding

    If the user chooses to embed fonts (via File → Properties → Font → Embed), then the EPUB export now handles this. Here is a custom font that is typically not available:

    https://farm5.staticflickr.com/4561/38847800811_613d6fbbd2_o.png

    (The screenshot is from the Calibre ebook reader.)

    All this is available in master (towards LibreOffice 6.1), or you can grab a daily build and try it out right now. :-)


  • Friday, 10 November 2017
    Basic EPUB3 export in LibreOffice

    https://farm5.staticflickr.com/4577/37588898064_117dc4a933_o_d.png

    I worked on a new EPUB3 export filter in LibreOffice recently. First, thanks to the Dutch Ministry of Defense in cooperation with Nou&Off who made this work possible. Basic features now work well enough that the filter is probably usable for most books (which typically contain just text with minimal formatting), so this post explains the architecture and how the various pieces fit together.

    The above picture shows the building blocks. The idea is that nominally EPUB is a complete export filter, but instead of doing all the work, we offload various sub-tasks to other modules:

    • First we invoke the existing (flat) ODT export, so we can work with ODF instead of with the UNO API directly. This will be useful in the next step.

    • Then we feed the SAX events from the ODT export to a new librevenge text export. Given that the librevenge API is really close to ODF (and xmloff/ has quite a lot of code to map the UNO API to ODF), it pays off here to work with ODF and not with the UNO API directly.

    • The librevenge text export talks to a librevenge generator, which is David Tardon’s excellent libepubgen in this case.

    • Finally libepubgen calls back to LibreOffice, and our package code does the ZIP compression.
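
    The four steps above can be modeled with a small, self-contained Python toy. This is only a sketch of the data flow, not the actual LibreOffice, librevenge or libepubgen code: the class names ToyGenerator and OdfToGenerator, the generator methods and the output file name are all made up for illustration, and the resulting ZIP is not a valid EPUB (it has no container.xml or package document).

```python
import html
import io
import xml.sax
import xml.sax.handler
import zipfile

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Step 1 stand-in: a tiny flat ODF document, as the (flat) ODT export would produce.
FLAT_ODT = b"""<?xml version="1.0" encoding="UTF-8"?>
<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body><office:text>
    <text:h>Chapter 1</text:h>
    <text:p>Hello &amp; welcome.</text:p>
  </office:text></office:body>
</office:document>"""


class ToyGenerator:
    """Step 3 stand-in for a generator such as libepubgen (hypothetical API)."""

    def __init__(self, write_file):
        self.write_file = write_file  # callback into the host's package code
        self.body = []

    def open_block(self, tag):
        self.body.append(f"<{tag}>")

    def insert_text(self, text):
        self.body.append(html.escape(text))

    def close_block(self, tag):
        self.body.append(f"</{tag}>")

    def end_document(self):
        xhtml = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
                 + "".join(self.body) + "</body></html>")
        # Step 4: the generator does no ZIP work itself, it calls back.
        self.write_file("OEBPS/section0001.xhtml", xhtml.encode("utf-8"))


class OdfToGenerator(xml.sax.handler.ContentHandler):
    """Step 2 stand-in: turns ODF SAX events into generator calls."""

    BLOCKS = {(TEXT_NS, "h"): "h1", (TEXT_NS, "p"): "p"}

    def __init__(self, generator):
        super().__init__()
        self.generator = generator
        self.depth = 0  # > 0 while inside a heading or paragraph

    def startElementNS(self, name, qname, attrs):
        tag = self.BLOCKS.get(name)
        if tag:
            self.depth += 1
            self.generator.open_block(tag)

    def endElementNS(self, name, qname):
        tag = self.BLOCKS.get(name)
        if tag:
            self.generator.close_block(tag)
            self.depth -= 1

    def characters(self, content):
        if self.depth:
            self.generator.insert_text(content)

    def endDocument(self):
        self.generator.end_document()


def export_epub(flat_odt):
    """Run the toy pipeline and return the bytes of the resulting ZIP package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        # EPUB containers start with an uncompressed "mimetype" entry.
        package.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        parser = xml.sax.make_parser()
        parser.setFeature(xml.sax.handler.feature_namespaces, True)
        parser.setContentHandler(OdfToGenerator(ToyGenerator(package.writestr)))
        parser.parse(io.BytesIO(flat_odt))
    return buffer.getvalue()
```

    Calling export_epub(FLAT_ODT) returns the package bytes. The point of the design is visible even in the toy: the generator only knows about a write callback, so the host application keeps ownership of the ZIP writing.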

    The setup is a bit complicated, but it has a number of advantages:

    • Instead of reinventing the wheel, LO and DLP now share code: libepubgen is now a dependency of LibreOffice.

    • libepubgen doesn’t bring its own ZIP writer code; it can nicely reuse our existing one.

    • This is a great opportunity to finally write an ODT→librevenge bridge, so other DLP-based export libs can be added in the future (e.g. librvngabw).

    • If we ever want to export to EPUB from Draw/Impress, libepubgen will help us there as well.

    As a user, here is a list of features you can expect to work:

    • plain text should work fine (formatting may be lost, but content should be fine)

    • table of contents, as long as you properly use headings or you separate chapters by page breaks

    • export options: EPUB3 vs EPUB2, split on headings vs page breaks

    • basic set of character and paragraph properties should work
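
    To make the "split on headings vs page breaks" option more concrete, here is a minimal Python sketch of the idea. It is an assumption about the general technique, not the filter's actual code: the (kind, text) block representation, the function name split_sections and the method values are hypothetical.

```python
def split_sections(blocks, method="heading"):
    """Group (kind, text) blocks into sections.

    method="heading": a new section starts at every heading.
    method="page-break": a new section starts at every page break.
    Page-break markers themselves are never copied into the output.
    """
    boundary = "heading" if method == "heading" else "page-break"
    sections = []
    current = []
    for kind, text in blocks:
        if kind == boundary and current:
            sections.append(current)
            current = []
        if kind != "page-break":
            current.append((kind, text))
    if current:
        sections.append(current)
    return sections


document = [
    ("heading", "One"), ("para", "a"),
    ("page-break", ""), ("para", "b"),
    ("page-break", ""), ("para", "c"),
    ("heading", "Two"), ("para", "d"),
]
by_heading = split_sections(document, "heading")
by_page = split_sections(document, "page-break")
```

    With this sample input, splitting on headings yields one section per heading, while splitting on page breaks yields one section per page, regardless of where the headings are.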

    During development I regularly used epubcheck, so hopefully the export result is usually valid.

    All this is available in master (towards LibreOffice 6.0), or you can grab a daily build and try it out right now. :-)


  • Friday, 13 October 2017
    A year in LibreOffice’s PDF support LOCon talk

    A year in LibreOffice’s PDF support was a talk I gave today at LibreOffice conference 2017. Given that this was one of the last talks of the whole conference, thanks to those who had not gone home yet and listened. :-)


  • Wednesday, 11 October 2017
    LibreOffice: Code Structure LOCon talk

    Today I gave a LibreOffice: Code Structure talk at LibreOffice conference 2017. The slides are an updated version of Michael Meeks' original ones; it actually surprised me how many things have changed since April 2016. :-)


  • Monday, 25 September 2017
    pdfium path segment API for LibreOffice's test needs

    I recently fixed tdf#108963, which is a PDF export bug — in case of highlighted and rotated text in e.g. Impress, the highlight rectangle in the PDF export was not rotated.

    This is what the export result looked like:

    https://farm5.staticflickr.com/4341/37305427601_db1cfb697e_o.png

    And this is what it looks like now, after the fix:

    https://farm5.staticflickr.com/4453/37258379126_b20fd39655_o.png

    For a long time the PDF export filter had no tests at all; the current approach I introduced is that we parse the PDF export result with pdfium, which is an excellent PDF rendering library (I covered it in general in an earlier post).

    So given that pdfium knows what that rectangle looks like, we should be able to query its details from a test as well, correct? It depends. Yes, it’s technically possible, but no, most of the pdfium functionality is actually not exposed in its public API.

    The current situation is that one could use FPDF_LoadMemDocument() and FPDF_LoadPage() to get access to a PDF page, then FPDFPage_CountObject() and FPDFPage_GetObject() to iterate over the objects on a page. We can filter for the relevant object by using FPDFPageObj_GetType() and FPDFPath_GetFillColor(), which will give us the only path that has a yellow fill color.

    But getting more info about the geometry of the path isn’t really possible. As a workaround I went with FPDFPageObj_GetBounds() for the test, but wouldn’t it be nicer to get the individual segments (the objects that are the children of a path) and then get coordinates and other properties of a segment? This is what the recent API I added to pdfium now does. It provides the following:

    • FPDFPath_CountSegments() gives you the number of segments of a path

    • FPDFPath_GetPathSegment() gives you a given segment, via a new FPDF_PATHSEGMENT opaque type

    • you can use FPDFPathSegment_GetPoint() to get the coordinates, FPDFPathSegment_GetType() to get the type (move to, line to, etc.) and FPDFPathSegment_GetClose() to see if the segment closes the current subpath of the path (or not)

    This means that after the next pdfium update in LibreOffice, PDF export tests can nicely assert these properties of paths instead of dubious "bounding box should be larger after rotation" assertions.
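
    To see why a bounding-box assertion is weaker than a per-segment one, here is a small pure-Python sketch of the geometry. It does not call pdfium; the helper names and the rectangle sizes are made up for illustration, and the point lists merely stand in for what the segment API would return.

```python
import math


def rotate(points, degrees, center=(0.0, 0.0)):
    """Rotate points counter-clockwise around center."""
    angle = math.radians(degrees)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    cx, cy = center
    return [(cx + (x - cx) * cos_a - (y - cy) * sin_a,
             cy + (x - cx) * sin_a + (y - cy) * cos_a) for x, y in points]


def bounding_box(points):
    """Return (left, bottom, right, top) of the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def segment_angle(start, end):
    """Direction of the segment from start to end, in degrees."""
    return math.degrees(math.atan2(end[1] - start[1], end[0] - start[0]))


# A 100x20 highlight rectangle, as a list of path points.
highlight = [(0.0, 0.0), (100.0, 0.0), (100.0, 20.0), (0.0, 20.0)]
rotated = rotate(highlight, 45)
# A wrong result: not rotated at all, just taller than the original.
wrong = [(0.0, 0.0), (100.0, 0.0), (100.0, 90.0), (0.0, 90.0)]


def box_height(points):
    _, bottom, _, top = bounding_box(points)
    return top - bottom


# Weak check: both the correct and the wrong output are "taller than before".
assert box_height(rotated) > box_height(highlight)
assert box_height(wrong) > box_height(highlight)

# Stronger check on the first segment: only the rotated path passes.
assert math.isclose(segment_angle(rotated[0], rotated[1]), 45.0, abs_tol=1e-9)
assert not math.isclose(segment_angle(wrong[0], wrong[1]), 45.0, abs_tol=1e-9)
```

    The bounding box only says that the shape got bigger in some direction, while the segment coordinates pin down the actual rotation.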


  • Friday, 25 August 2017
    Split sections inside tables for LibreOffice Writer

    Tables and sections in LibreOffice Writer are both containers, and in some cases it makes sense to have sections inside tables or tables inside sections. (For example you can mark a group of paragraphs as read-only by including them in a read-only section.) Tables in sections that split over multiple pages were already working, but now it’s possible to have sections in tables split over multiple pages as well.

    First, thanks to Escriba, who made this work possible.

    There were 3 parts to this work; you can read some details about them below.

    Split of multi-line paragraphs

    The first goal was to handle the split of multi-line paragraphs inside sections inside tables. Initially this looked like this:

    https://farm5.staticflickr.com/4430/35957293074_cfeabe6a51_o.png
    https://farm5.staticflickr.com/4393/35957293014_ae8f210542_o.png

    Split of one-liner paragraphs

    Technically this situation differs from the previous one, as split paragraphs have a master (first) frame and one or more follow (non-first) frames, and the previous stage only addressed moving follow frames to next pages. Initially such a document looked like this:

    https://farm5.staticflickr.com/4360/35957292924_2af502ffc7_o.png
    https://farm5.staticflickr.com/4399/35957292834_dc2ce35f85_o.png

    Merge a split section

    The last piece was moving paragraphs back to previous pages when there is again space for them. Initially we did not use the newly available space:

    https://farm5.staticflickr.com/4432/35982835413_99a65febe2_o.png

    After commit tdf#108524 sw: handle sections inside tables in SwFrame::GetPrevSctLeaf() the paragraph is moved back properly:

    https://farm5.staticflickr.com/4408/35982835283_1c2002254b_o.png

    One more thing…

    Given that all the code changes affect how sections in tables are handled in a parent frame in general (which is a body frame in all the above pictures), the same changes are also usable for other parent containers, e.g. linked TextFrames. Here is what that looks like:

    https://farm5.staticflickr.com/4342/35982835353_25d609548d_o.png

    That’s it for now — as usual the commits are in master, so you can try this right now with a 6.0 daily build. :-)


  • Friday, 21 July 2017
    Mail merge Writer data source

    If you have ever used the mail merge wizard with data sources, then you know how it works: it typically needs some kind of data source (e.g. a Calc spreadsheet) and a Writer document containing the email or letter (with fields in it), and then mail merge can generate the personalized documents for you.

    In case you have an existing document where you already have such data in a Writer table, you had to somehow transfer it to one of the formats for which there was a data source driver, and then you could use it inside mail merge. I’ve now added a dedicated Writer driver in connectivity/, so picking up data directly from Writer tables is now possible.
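
    The principle of mail merge with a table as the data source can be shown with a few lines of Python. This is only a conceptual sketch, unrelated to the actual driver in connectivity/: the function name mail_merge, the table layout (first row holds the column names) and the $field placeholder syntax are assumptions made for the example.

```python
from string import Template


def mail_merge(table, template_text):
    """Return one personalized text per data row.

    table: list of rows; the first row holds the column names,
    which double as the field names used in the template.
    """
    header, *rows = table
    template = Template(template_text)
    return [template.substitute(dict(zip(header, row))) for row in rows]


# Stand-in for a Writer table used as the data source.
addresses = [
    ["name", "city"],
    ["Alice", "Budapest"],
    ["Bob", "Perugia"],
]
letters = mail_merge(addresses, "Dear $name, see you in $city!")
```

    Each row of the table produces one output document, with the fields replaced by the values from the matching columns.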

    If you are interested in what this looks like, here is a demo (click on the image to see the video):

    That’s it for now — as usual the commits are in master, so you can try this right now with a 6.0 daily build. :-)


  • Tuesday, 04 July 2017
    Using LibreOffice with xmlsec from the system

    LibreOffice uses a number of external libraries, and most of them can be configured to use a bundled version or a system version. libxmlsec was an exception previously (only the bundled version was usable), but LibreOffice master (towards 6.0) doesn’t have this limitation anymore.

    Using a bundled version is a good choice in case:

    • you want to create self-contained binaries

    • you want to bisect a regression, where possibly the regression was introduced by upgrading one of the external libraries

    • the system (e.g. macOS, Windows) doesn’t provide the relevant library

    Using a system version is a good thing in case:

    • you want to work with the system, not against it (if a Linux distro already provides libxmlsec, why ship a duplicated copy inside LibreOffice?)

    • being able to use a system version means our bundled version does not have custom patches which affect the functionality of the library

    • not having custom patches also means upstream benefits from our submitted patches, these patches are reviewed by competent maintainers, and upgrading the external is easier, as there is no patchset to rebase.

    With that in mind, let’s have a look at what blocked using system-xmlsec in the past:

    • LibreOffice inherited a large patchset from OpenOffice.org, commit 694a2c53810dec6d8e069d74baf51e6cdda91faa (2012-11-30) had 16 patches, with this scary diffstat:

     43 files changed, 5888 insertions(+), 1885 deletions(-)

    • I even increased this when I added the SHA256 patches, as back then I wasn’t sure whether it would ever be possible to upgrade to a newer libxmlsec version.

    • Step by step I got rid of most of these patches, either by upstreaming them or by realizing they are no longer necessary. Upstreaming wasn’t always trivial: for our purposes it was always easy to patch something, but for upstream, incompatible changes always have to be conditional. Today we have 3 build-specific patches, 1 backport and 1 feature patch that is (at least) not necessary when signing / verifying documents with software-based certificates.

    • In the end, two more commits were necessary to support building against system-xmlsec, only adding minimal differences between using the system and the bundled xmlsec variants.

    If you are a Linux distro packager then --with-system-libs already implies --with-system-xmlsec, so you probably don’t have to do anything. If you build LO for static analysis (e.g. Coverity) then this should also be useful, so that irrelevant issues in 3rd-party code don’t have to be ignored manually.


  • Wednesday, 31 May 2017
    LibreOffice Perugia HackFest 2017

    (via ogervasi)

    Last weekend I attended the LibreOffice Perugia HackFest 2017, with the primary goal of mentoring students (together with Eike and Christian): provided they manage to contribute at least one non-trivial easy hack, they get university credits for their work.

    I worked with Arianna, Claudio, Francesco and Gian; all of them managed to achieve something by the end of the third day.

    When I was not helping others, I also fixed a few bugs:

    • tdf#107976 sw: let a view handle multiple transferables

    • tdf#107837 DOCX export: fix balanced multi-col section at doc end

    • tdf#107684 DOCX export: fix duplicated <w:outlineLvl> element for styles

    • tdf#106950 sw: support CharShadingValue property on paragraph styles

    Some photos I took during the event are available.

    Thanks to the organizers for the great event, also kudos to Collabora, Red Hat and TDF for allowing mentors to come! :-)


  • Wednesday, 17 May 2017
    xmlsec improvements in LibreOffice 5.4

    This post summarizes the plumbing work around ODF/OOXML digital signatures that I did on LibreOffice master after the 5.3 branch-off up to now. The big thing is the integration of the libxmlsec 1.2.24 release. Among other things, this contains 2 larger changes that I contributed upstream triggered by the needs of LibreOffice:

    • The ECDSA-SHA256 feature is something I already mentioned, but I did not bother to backport the SHA1 and the SHA256 part, so those have now arrived in LibreOffice as well.

    • xmlsec’s XMLSEC_KEYINFO_FLAGS_X509DATA_DONT_VERIFY_CERTS flag (while verifying signatures) was there, but its behavior was not clear (neither for nss nor for mscrypto). I’ve changed it to be in sync with what you have in other commands to avoid certificate validation (like wget --no-check-certificate or curl -k), which means that as a next step there will be one less xmlsec patch in LibreOffice that prevents us from using xmlsec from the system on Linux. (Adding tests also revealed that in the nss case, not using that flag also skipped verification by accident; this is now fixed as well.)

    After the release I also noticed that creating signatures on Windows was broken; this is now fixed on xmlsec master and also backported to LibreOffice.

    All this is available in LibreOffice master, towards 5.4.

