vmiklos.hu
shameless self-promoting website
  • Tuesday, 27 January 2015
    Tiled editing: from viewing only to a living document

    As announced last week, an Android port of LibreOffice in the form of a viewer app is now available for download. What’s next? Editing, naturally. First, thanks again to The Document Foundation and all the donors who made this (ongoing) work possible. In this post I would like to explain what Tomaž Vajngerl and I have done at Collabora so far in that direction.

    If you have ever touched the Android port of LibreOffice, you have probably noticed that, sadly, developing for Android is much harder than for desktop Linux. On Linux, if you touch only a single module, you can rebuild just that module in a few seconds and then run soffice again with your modifications included. On Android, this is much harder:

    • due to a limitation of the Android linker, we link all the native code into a single shared object, which has to be re-linked after each native code modification

    • the native and the Java code have to be packed into an .apk archive

    • the .apk archive has to be uploaded to the device (or emulator) and installed there

    and only then can you test your changes. To partly sidestep this problem, we split "Android editing" into two parts:

    • tiled editing: this can be tested on Linux using the gtktiledviewer test application (and ideally any problem in the core is already visible there)

    • Android LibreOfficeKit client: replacing gtktiledviewer with the real Android client code, and this time testing it on the device

    One problem with this approach was that while Android properly rendered small tiles of 256x256 pixels, gtktiledviewer rendered a single huge tile. This means that when part of the document changed and had to be redrawn, gtktiledviewer always repainted the whole document, while Android repainted only the necessary parts. Guess what: if the area to be repainted is wrong, it’ll be visible on Android but not in gtktiledviewer. So the first task we solved was to let gtktiledviewer render small tiles as well. For debugging purposes, small red rectangles are painted at the top left corner of each tile, so the size and position of the tiles can be seen easily:
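To make the difference concrete, here is a minimal C++ sketch of how a tiled client could map an invalidated pixel rectangle to the 256x256 tiles it has to repaint. It is not LibreOfficeKit code: Rect, Tile and tilesToRepaint() are hypothetical names. With a single huge tile, every invalidation maps to that one tile, which is why a too-small invalidated area goes unnoticed in such a client.

```cpp
#include <cassert>
#include <vector>

// An invalidated area in pixels: top-left corner plus size.
struct Rect
{
    long x, y, width, height;
};

// A tile, identified by its column and row in the tile grid.
struct Tile
{
    long column, row;
};

// Returns the tiles of an nTileSize x nTileSize grid that intersect rInvalid,
// i.e. the only tiles that need a repaint. Assumes non-negative coordinates,
// so integer division rounds down to the containing tile.
std::vector<Tile> tilesToRepaint(const Rect& rInvalid, long nTileSize = 256)
{
    assert(nTileSize > 0);
    std::vector<Tile> aTiles;
    if (rInvalid.width <= 0 || rInvalid.height <= 0)
        return aTiles; // nothing to repaint

    const long nFirstColumn = rInvalid.x / nTileSize;
    const long nFirstRow = rInvalid.y / nTileSize;
    // The last covered pixel is at x + width - 1 (and y + height - 1).
    const long nLastColumn = (rInvalid.x + rInvalid.width - 1) / nTileSize;
    const long nLastRow = (rInvalid.y + rInvalid.height - 1) / nTileSize;

    for (long nRow = nFirstRow; nRow <= nLastRow; ++nRow)
        for (long nColumn = nFirstColumn; nColumn <= nLastColumn; ++nColumn)
            aTiles.push_back({nColumn, nRow});
    return aTiles;
}
```

For example, an invalidated 300x100 area at (300, 10) only touches tiles (1, 0) and (2, 0). The sketch assumes pixel coordinates; a real client may first need to convert from document coordinates.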

    The next step was to somehow start working on real editing, but where to start? We identified two critical building blocks:

    • there should be some way for the user to provide input (e.g. press a key on the software keyboard)

    • once the document has changed, the application has to redraw the changed part of the view

    To avoid solving two problems at the same time, we went after the second one first. One use case that only requires updating the view is blinking text: even if no touch or key events are available, blinking text wants to update the view from a timer, so it’s a good test case. It’s now possible for LibreOfficeKit clients to register a notification callback, and using that, LibreOffice can notify clients when part of the view has to be redrawn. Here is how it looks in gtktiledviewer:

    This demonstrates that both the LibreOfficeKit implementation in LibreOffice core and the gtktiledviewer client code correctly handle tile invalidations. Once that was done, we could also implement similar client code in the Android app; it looks like this:
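The notification callback can be sketched in plain C++. Everything here is an assumption for illustration: View, Client, NOTIFY_INVALIDATE and the "x, y, width, height" text payload are hypothetical and are not the actual LibreOfficeKit API. What the sketch does show is the usual pattern for C-style APIs: a function pointer plus an opaque void* that lets the client route the notification back to its own object, which then queues the area for repainting.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical notification type; not the real LibreOfficeKit enum.
enum NotificationType
{
    NOTIFY_INVALIDATE = 0
};

// C-style callback: notification type, text payload, opaque user data.
typedef void (*NotifyCallback)(int nType, const char* pPayload, void* pData);

struct Rect
{
    long x, y, width, height;
};

// Stand-in for the core side: stores one callback and calls it whenever
// part of the view changes (e.g. when a blink timer fires).
class View
{
public:
    void registerCallback(NotifyCallback pCallback, void* pData)
    {
        m_pCallback = pCallback;
        m_pData = pData;
    }

    void invalidate(const Rect& rRect)
    {
        if (!m_pCallback)
            return; // no client registered yet
        char aPayload[128];
        std::snprintf(aPayload, sizeof(aPayload), "%ld, %ld, %ld, %ld",
                      rRect.x, rRect.y, rRect.width, rRect.height);
        m_pCallback(NOTIFY_INVALIDATE, aPayload, m_pData);
    }

private:
    NotifyCallback m_pCallback = nullptr;
    void* m_pData = nullptr;
};

// Client side: the static function is the C callback, and pData leads back
// to the client object, which queues the area for repainting.
struct Client
{
    std::vector<Rect> aDirty;

    static void notify(int nType, const char* pPayload, void* pData)
    {
        assert(pData);
        Client* pClient = static_cast<Client*>(pData);
        Rect aRect;
        if (nType == NOTIFY_INVALIDATE
            && std::sscanf(pPayload, "%ld, %ld, %ld, %ld",
                           &aRect.x, &aRect.y, &aRect.width, &aRect.height) == 4)
            pClient->aDirty.push_back(aRect);
    }
};
```

Usage: register once with aView.registerCallback(&Client::notify, &aClient); from then on, every aView.invalidate() call ends up in aClient.aDirty.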

    That’s it for now; next on our list is adding support for input handling, so that it’s possible to type in some text. :-)


  • Saturday, 24 January 2015
    Perfect WW8 comment import

    TL;DR: Import of annotated text ranges from the binary DOC format was a problem for quite some time; now it should be as good as it has always been in the ODT/DOCX/RTF filters.

    Longer version: the import of annotation marks from binary DOC was never perfect. My initial implementation had a somewhat hidden but important shortcoming, in the form of a "Don’t support ranges affecting multiple SwTxtNode for now." comment. The underlying problem was that annotation marks have a start and an end position, which the binary DOC format describes as an offset into the piece table (so the unit is a character position, CP), while in Writer we work with document model positions (text node and content indexes, SwPosition), and it isn’t trivial to map between these two.

    Tamás somewhat improved this homegrown CP → SwPosition mapping code, but it was still far from perfect. Here is an example; this is how the demo document looks in LibreOffice Writer now:

    And this is how it looked before the end of last year:

    Notice how "Start" is now commented, while it wasn’t before. Which one is correct? Here is the reference:

    The reason is that the document has fields and tables, and the homegrown CP → SwPosition mapping did not handle these. A much better approach is to handle the mapping the way we do it for bookmarks: even if in the end annotation marks and bookmarks are entries in sw::mark::MarkManager, it’s possible to set the start position as a character attribute during import (since mapping the current CP to the current SwPosition is easy), and once both the start and the end are known, delete the character attribute and turn it into a mark manager entry. That’s exactly what I’ve done. The first screenshot is the result of 3 changes:

    Hopefully this makes LibreOffice not only avoid crashing on such complex annotated contents, but also puts an end to the long story of "annotation marks from binary DOC" problems.
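The "remember the start, finish at the end" idea can be sketched in C++. The types are hypothetical: Position stands in for SwPosition, and a std::map stands in for the character attribute the real import uses. The point is that the importer only needs the current position when it reaches a start or end CP, so no general CP → SwPosition mapping is required, and a range may span multiple text nodes. Note that, unlike a character attribute, a plain stored position would not move along with later changes to the text, so this only sketches the control flow.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for SwPosition: text node index + offset in the node.
struct Position
{
    int nNode;
    int nContent;
};

// Document order: first by node, then by offset inside the node.
bool operator<(const Position& rA, const Position& rB)
{
    return rA.nNode < rB.nNode || (rA.nNode == rB.nNode && rA.nContent < rB.nContent);
}

// What ends up in the mark manager: an annotation id with a start and an end.
struct AnnotationMark
{
    int nId;
    Position aStart;
    Position aEnd;
};

class AnnotationImporter
{
public:
    // The parser reached the CP where annotation nId starts: remember where
    // we are now. (The real import sets a character attribute instead.)
    void startAnnotation(int nId, const Position& rCurrent)
    {
        m_aPending[nId] = rCurrent;
    }

    // The parser reached the CP where annotation nId ends: both ends are now
    // known, so the pending start is turned into a mark.
    void endAnnotation(int nId, const Position& rCurrent)
    {
        auto it = m_aPending.find(nId);
        if (it == m_aPending.end())
            return; // end without a start: ignore it
        assert(!(rCurrent < it->second)); // an end never precedes its start
        m_aMarks.push_back({nId, it->second, rCurrent});
        m_aPending.erase(it);
    }

    const std::vector<AnnotationMark>& marks() const { return m_aMarks; }

private:
    std::map<int, Position> m_aPending;
    std::vector<AnnotationMark> m_aMarks;
};
```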

    Note
    Just like C++11 perfect forwarding isn’t perfect (if you think it is, see "Familiarize yourself with perfect forwarding failure cases." in this post by Scott), the above changes may still not result in a truly perfect import of DOC annotation marks. But I think the #1 problem in this area is now solved. :-)


  • Saturday, 10 January 2015
    Export validation as a new year's resolution

    TL;DR: If you touch the ODF and/or OOXML filters in LibreOffice, please use the --with-export-validation configure option after running the setup.sh script.

    Markus Mohrhard did an excellent job with adding the --with-export-validation build switch to LibreOffice. It does the following:

    • it validates every Calc and Impress zipped XML document (both ODF and OOXML) produced during the build by export filters

    • it does the same for Writer, except that there only a subset of the documents is validated

    One remaining problem was that it required setting up both odfvalidator and officeotron, neither of which is a standard GNU project; both are Java beasts. So even though I and a number of other developers do use this option, from time to time we need to fix new validation regressions, as others don’t see the problem; and even if we point it out, it’s hard to reproduce for the author of the problematic commit.

    This has just changed: all you need is to get export-validation/setup.sh from dev-tools.git and run it like this:

    ./setup.sh ~/svn /opt/lo/bin

    I.e. the first parameter is a working directory, and the second is a directory that’s writable by you and is already in your PATH. Then wait a bit: ODF validator uses Maven as its build system, so how long you have to wait depends on how many of the Maven dependencies you already have in your local cache; it’s typically 5 to 15 minutes.

    Once it’s done, you can add --with-export-validation to your autogen.input, and then the toplevel make will invoke odfvalidator and officeotron for the above mentioned documents.

    The new year is here: if you don’t have a new year’s resolution yet (or if you hate those, but you’re willing to adopt a new habit from time to time), then please consider --with-export-validation, so that such regressions can be detected before you publish your changes. Thanks! ;-)


  • Saturday, 27 December 2014
    Fixing the cloud problem

    TL;DR: see above; a number of preset shapes that previously had rendering problems are now rendered correctly at any scale factor.

    fdo#87448 has a reproducer document that shows rendering errors with the scaled cloud preset shape definition. At first I thought that the OOXML spec had a wrong definition for this shape type, but that turned out not to be the case. The problem was our implementation of the drawingML arcTo command. This implementation defines how we render such arcs as polygons when the shape is painted, and given that LibreOffice has native support for the drawingML arcTo / ODF G command, this implementation is invoked during rendering: it’s not an import/export problem.

    The rendering result looked like this before:

    The cloud is drawn using a set of moveTo and arcTo commands. MoveTo is easy, as it uses explicit coordinates, but arcTo is more complex. It has 4 parameters: the width and height radius of a "circle", and the start and swing angle of an arc on that circle. (Of course, if the width and height are not equal, then that’s no longer a circle… ;-) ) The problem is that because of this, the distance vector between the arc’s start and end points is implicit, so if something is miscalculated, the errors accumulate as more and more arcs are drawn. This is especially a problem if you later return to the end of an earlier arc using moveTo: if arcTo has some problem, then it’ll be clearly visible.

    After fixing UNO ARCANGLETO to apply scaling / translation only after calculating the actual arc, we started to produce correct end points for the arcs, and the shapes started to appear correctly at any scale factor, yay! :-)

    One remaining problem was how to test this from cppunit. In the above commit, I exported the shape to a metafile, and then I could use Tomaž's excellent MetafileXmlDump to assert that the end of an arc (an implicit location) and the parameters of a moveTo command (an explicit location) are equal; when they are not, that’s what your eyes call a "rendering problem".
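The following C++ sketch illustrates why the end point of an arcTo is implicit: the centre of the ellipse is derived from the current point, so each arc depends on the end of the previous one. It is a simplification, not LibreOffice’s ARCANGLETO code: angles are in radians and treated as parametric ellipse angles, while drawingML uses its own angle units and conventions, and arcToEnd() / toDevice() are hypothetical names. It follows the order described above: compute the arc in unscaled shape coordinates first, then apply scaling and translation.

```cpp
#include <cassert>
#include <cmath>

struct Point
{
    double x, y;
};

// End point of an arc that starts at rCurrent and lies on an ellipse with
// radii fWR / fHR, starting at angle fStart and sweeping fSwing (radians).
// Simplification: angles are treated as parametric ellipse angles.
Point arcToEnd(const Point& rCurrent, double fWR, double fHR, double fStart, double fSwing)
{
    assert(fWR >= 0 && fHR >= 0);
    // The centre is implicit: it is derived from the current point, so any
    // error in rCurrent is carried over into this arc's end point.
    const double fCenterX = rCurrent.x - fWR * std::cos(fStart);
    const double fCenterY = rCurrent.y - fHR * std::sin(fStart);
    const double fEnd = fStart + fSwing;
    return { fCenterX + fWR * std::cos(fEnd), fCenterY + fHR * std::sin(fEnd) };
}

// Scaling and translation, applied only after the arc has been computed in
// unscaled shape coordinates.
Point toDevice(const Point& rShape, double fScaleX, double fScaleY, const Point& rOffset)
{
    return { rShape.x * fScaleX + rOffset.x, rShape.y * fScaleY + rOffset.y };
}
```

A test in this spirit compares the computed end point of one arc with the explicit coordinates of a later moveTo that should return to the same point, allowing for a small floating point tolerance.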


  • Saturday, 29 November 2014
    Document Liberation Project hacking experience

    As someone who usually hacks on LibreOffice, I find that external import filters produced by the Document Liberation Project cut both ways: they are great, as they deal with obscure formats and we get them for free; OTOH hacking on such code is more complex than on the usual LO code. I recently contributed a few patches to libvisio and libodfgen, but before I was able to make actual code changes, I had to set up a number of repositories and configure them to talk to each other. This post describes one possible setup that suited my needs.

    Building blocks

    DLP’s central project is librevenge, and everything builds on top of it, either calling it or being called by it. When the task is to turn VSDX files into ODG ones, it looks like this:

    libvisio can build a librevenge document model from Visio files (more on the various librevenge-based libraries here), libodfgen can generate ODF output from such document models (another possibility would be e.g. libepubgen), and the writerperfect module acts as a kind of controller for the remaining modules, e.g. for our purpose, it provides a vsd2odg binary.
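The division of labour between these libraries is a callback pattern: the parser drives an abstract interface, and a generator implements it. Here is a heavily simplified C++ sketch of that idea. DrawingInterface, parseFakeVisio() and TextGenerator are hypothetical, and the real librevenge interfaces (and their method names) are much richer.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical callback interface: the parser calls it, a generator implements it.
class DrawingInterface
{
public:
    virtual ~DrawingInterface() = default;
    virtual void startPage() = 0;
    virtual void drawRectangle(double fX, double fY, double fWidth, double fHeight) = 0;
    virtual void endPage() = 0;
};

// Plays the role of libvisio: a real importer would decode a VSD/VSDX
// stream; this one just emits a fixed page with one rectangle.
void parseFakeVisio(DrawingInterface& rOutput)
{
    rOutput.startPage();
    rOutput.drawRectangle(1, 1, 4, 2);
    rOutput.endPage();
}

// Plays the role of libodfgen: turns the calls into markup. The element
// names are made up; real ODF output looks quite different.
class TextGenerator : public DrawingInterface
{
public:
    void startPage() override
    {
        assert(!m_bInPage); // pages must not nest
        m_bInPage = true;
        m_aStream << "<page>";
    }

    void drawRectangle(double fX, double fY, double fWidth, double fHeight) override
    {
        assert(m_bInPage); // shapes only appear on a page
        m_aStream << "<rect x=\"" << fX << "\" y=\"" << fY << "\" w=\"" << fWidth
                  << "\" h=\"" << fHeight << "\"/>";
    }

    void endPage() override
    {
        assert(m_bInPage);
        m_bInPage = false;
        m_aStream << "</page>";
    }

    std::string str() const { return m_aStream.str(); }

private:
    std::ostringstream m_aStream;
    bool m_bInPage = false;
};
```

Wiring the two together is the job of a controller like vsd2odg; in the sketch that is just TextGenerator aGenerator; parseFakeVisio(aGenerator);, after which aGenerator.str() holds the output. Because the parser only knows the interface, swapping the generator (e.g. for an EPUB one) needs no change on the import side.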

    Alternatives considered

    One possibility is to build LibreOffice with --with-system-libvisio and similar switches, then clone the repos and install them system-wide (possibly with your modifications); then you can test your changes by just building the various libs, without changing your LO build (more here). The drawback is that this way you pollute your system with unstable versions of those libs.

    Another possibility is to build LibreOffice as usual, and then use the external libraries patching mechanism to hack on the code. The drawback is that you have to work on the code without git, and you can only work with a released version.

    The pkg-config approach

    So here is what I did to avoid the above mentioned drawbacks: all DLP projects use pkg-config to find the required libraries, so you can configure them in a way that allows building as a regular user, without installing them at all, and still execute vsd2odg using the libs with your changes. Here is how to do it:

    • librevenge:

    git clone git://git.code.sf.net/p/libwpd/librevenge
    cd librevenge
    ./autogen.sh
    ./configure --enable-debug
    make

    • libvisio:

    git clone git://gerrit.libreoffice.org/libvisio
    cd libvisio
    ./autogen.sh
    ./configure REVENGE_CFLAGS="-I/home/vmiklos/git/libreoffice/librevenge/inc" REVENGE_LIBS="-L/home/vmiklos/git/libreoffice/librevenge/src/lib/.libs/ -lrevenge-0.0" REVENGE_GENERATORS_CFLAGS="-I/home/vmiklos/git/libreoffice/librevenge/inc" REVENGE_GENERATORS_LIBS="-L/home/vmiklos/git/libreoffice/librevenge/src/lib/.libs/ -lrevenge-generators-0.0" REVENGE_STREAM_CFLAGS="-I/home/vmiklos/git/libreoffice/librevenge/inc" REVENGE_STREAM_LIBS="-L/home/vmiklos/git/libreoffice/librevenge/src/lib/.libs/ -lrevenge-stream-0.0" --enable-debug
    make

    • libodfgen:

    git clone git://git.code.sf.net/p/libwpd/libodfgen
    cd libodfgen
    ./autogen.sh
    ./configure REVENGE_CFLAGS="-I/home/vmiklos/git/libreoffice/librevenge/inc" REVENGE_LIBS="-L/home/vmiklos/git/libreoffice/librevenge/src/lib/.libs/ -lrevenge-0.0" REVENGE_STREAM_CFLAGS="-I/home/vmiklos/git/libreoffice/librevenge/inc" REVENGE_STREAM_LIBS="-L/home/vmiklos/git/libreoffice/librevenge/src/lib/.libs/ -lrevenge-stream-0.0" --enable-debug
    make

    • writerperfect:

    git clone git://git.code.sf.net/p/libwpd/writerperfect
    cd writerperfect
    ./autogen.sh
    ./configure REVENGE_CFLAGS="-I/home/vmiklos/git/libreoffice/librevenge/inc" REVENGE_LIBS="-L/home/vmiklos/git/libreoffice/librevenge/src/lib/.libs/ -lrevenge-0.0" REVENGE_STREAM_CFLAGS="-I/home/vmiklos/git/libreoffice/librevenge/inc" REVENGE_STREAM_LIBS="-L/home/vmiklos/git/libreoffice/librevenge/src/lib/.libs/ -lrevenge-stream-0.0" ODFGEN_CFLAGS="-I/home/vmiklos/git/libreoffice/libodfgen/inc" ODFGEN_LIBS="-L/home/vmiklos/git/libreoffice/libodfgen/src/.libs -lodfgen-0.1 -lrevenge-0.0 -lrevenge-stream-0.0" VISIO_CFLAGS="-I/home/vmiklos/git/libreoffice/libvisio/inc" VISIO_LIBS="-L/home/vmiklos/git/libreoffice/libvisio/src/lib/.libs -lvisio-0.1 -lrevenge-0.0" --enable-debug --with-libvisio
    make

    Of course, replace /home/vmiklos/git/libreoffice/ with any other directory you like, just be consistent. ;-)

    Now you can hack on any of these libraries, you just need to build your changes, and then vsd2odg will produce a flat ODG that you can quickly test with any ODF processor, like LibreOffice. One remaining trick (in case you’re not an autotools expert) is that vsd2odg is a libtool shell script, not a binary. If you still want to run the underlying binary in gdb, here is how you can do that:

    libtool --mode=execute gdb --args vsd2odg /home/vmiklos/git/libreoffice/test.vsdx

    In case the two alternatives considered above are not sufficient for your purposes, I hope you find this setup useful. ;-)


  • Saturday, 25 October 2014
    The yellow border around the pig

    It turns out LibreOffice’s RTF and DOCX import filters ignored borders around Writer pictures. Given that this worked in the RTF case in the past, it’s a bit amusing that now the very same commit implements a new feature for the DOCX case and at the same time fixes a regression in the RTF filter. Code sharing FTW! :-)


  • Monday, 08 September 2014
    LibreOffice Conference 2014, Bern

    This year’s LibreOffice conference was held in Bern, Switzerland. Links to my slides:

    During the sessions I also had some time to hack on the following:

    • RTF export: added support for custom wrap polygon of Writer pictures

    • fixed fdo#82067 FILEOPEN: RTF images not in correct position

    • fixed fdo#82078 FILEOPEN: RTF bold text spilling over to non-bold text

    • fixed abi#10039 crasher on RTF export of page-anchored pictures

    Regarding the number of attendees, draw your own conclusions from the group picture: probably around 300 attendees, counting all days.

    Thanks to the organizers for this beautiful event, and also to the sponsors! :-)

    My pictures are available here (and I also made a panorama).


  • Thursday, 28 August 2014
    Cleanup of ooxmltok in LibreOffice

    (via aigle_dore)

    In June, we decided to get rid of XSLT usage in writerfilter, the module responsible for RTF and DOCX import in LibreOffice. As usual with cleaning up a mess, this took time (about two months), but I’m now happy to say that I’m mostly done with it. :-)

    See the doctok blog post for some background; the topic here was to clean up the OOXML tokenizer, that is, the building block that turns a zipped XML document into a token stream.

    The following problems are now solved:

    • Part of the module was generated code; the generator was implemented mostly in XSLT, but some bits were written in Perl and sed. About 4200 lines of XSLT code are now rewritten in Python, in about 1300 lines.

    • Given that we have many more developers who speak Python than XSLT, nontrivial changes in the generator are now much easier: Jan Holesovsky cleaned up boost::unordered_map usage in places where we depended on the order of elements. (Yes, you read that correctly, that was the situation up till now!) This also helps reduce the size of the resulting writerfilter shared library.

    • The input of the code generator was the large model.xml file, and the generator scripts only extracted the interesting information from it, so if you mistyped something, you got no error messages, just silent failures. I’ve removed quite a few XML elements and attributes from it that none of the generator scripts parsed, and written a RELAX NG schema for the remaining markup. Validating against this schema is part of the default build, so no more typos without a build failure. ;-) (The schema also contains quite some documentation, finally.)

    • A gperf hash of all possible OOXML element / attribute names was duplicated in writerfilter, even though that information was already available from the oox module. This is now fixed, reducing the size of the shared library even further.

    • Also, both oox and writerfilter had a list of namespace URLs, mapping them to an integer enumeration, and when the two lists didn’t match, Bad Things happened (read: usually a crash). This is now in the past: I’ve refactored writerfilter to use the same namespace alias names as oox, which allowed getting rid of the writerfilter copy of the namespace alias list. So in the future, if new namespaces have to be added, only oox has to be extended.
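The underlying technique, one list as the single source of truth from which both the enumeration and the URL table are produced, can be sketched with a C++ X-macro. This is only an illustration: LibreOffice produces such code with generator scripts, and the OOXML_NAMESPACES macro, the NS_ alias names and namespaceFromURL() are hypothetical. The three URLs are standard OOXML namespace URIs.

```cpp
#include <cassert>
#include <cstring>

// The single list: alias name and namespace URL.
#define OOXML_NAMESPACES(X) \
    X(w, "http://schemas.openxmlformats.org/wordprocessingml/2006/main") \
    X(a, "http://schemas.openxmlformats.org/drawingml/2006/main") \
    X(r, "http://schemas.openxmlformats.org/officeDocument/2006/relationships")

// Enumeration generated from the list: NS_w, NS_a, NS_r, then NS_COUNT.
enum Namespace
{
#define NS_ENUM_ENTRY(alias, url) NS_##alias,
    OOXML_NAMESPACES(NS_ENUM_ENTRY)
#undef NS_ENUM_ENTRY
    NS_COUNT
};

// URL table generated from the same list, so it is in the same order.
static const char* const aNamespaceURLs[] = {
#define NS_URL_ENTRY(alias, url) url,
    OOXML_NAMESPACES(NS_URL_ENTRY)
#undef NS_URL_ENTRY
};

static_assert(sizeof(aNamespaceURLs) / sizeof(aNamespaceURLs[0]) == NS_COUNT,
              "enumeration and URL table must have the same size");

// Maps a namespace URL to its enumeration value, or -1 if it is unknown.
int namespaceFromURL(const char* pURL)
{
    assert(pURL);
    for (int i = 0; i < NS_COUNT; ++i)
        if (std::strcmp(aNamespaceURLs[i], pURL) == 0)
            return i;
    return -1;
}
```

Adding a namespace means adding one X(...) line; the enumeration and the table cannot disagree, which is the same benefit as keeping a single list in oox.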

    Oh, and the bonus feature: I’ve implemented a script called watch-generated-code.sh, which can record a good state of the generated code and later compare newly generated results against it, so refactoring of the generator can now be performed in a safe way: you can change the generator in any way to make it better, and still avoid accidental output changes. :-) This is particularly useful, as it only diffs the end result of the whole generation process (cxx and hxx files), not temporary files, which are OK to change, as long as the end result is the same.

    As a conclusion, here are the sizes of a stripped dbgutil version of the writerfilter shared library, from the libreoffice-4-3-branch-point and from today’s master:

    $ git checkout oldest
    HEAD is now at b3130c8... 2014-05-21
    vmiklos@o9010:~/git/libreoffice/daily$ ls -lh opt/program/libwriterfilterlo.so
    -rwxr-xr-x 1 vmiklos users 8,3M aug   28 14:00 opt/program/libwriterfilterlo.so
    $ git checkout master
    Switched to branch 'master'
    vmiklos@o9010:~/git/libreoffice/daily$ ls -lh opt/program/libwriterfilterlo.so
    -rwxr-xr-x 1 vmiklos users 6,1M aug   28 14:01 opt/program/libwriterfilterlo.so

    Again, the 8,3MB → 6,1MB size reduction is mostly thanks to Kendy’s map cleanups + the duplicated gperf hash going away. :-)


  • Monday, 11 August 2014
    So many bugs

    From time to time developers feel that they have little time but would like to take care of many bugs. Last week I was frustrated enough to actually design a T-shirt for just that. ;-)

    Above is how it looks. In case you don’t get the joke, see here for a hint. Oh, and if you would like to build your own binary… err, T-shirt, you can do it: here is the ODG file that can serve as a source. Happy bugfixing! :-)


  • Wednesday, 16 July 2014
    TextBox: complex LibreOffice Writer content inside shapes

    TL;DR: see above — it’s now possible to have complex Writer content (charts, tracked changes, tables, fields, etc.) inside drawinglayer shapes, yay! :-)

    The problem

    Writer in LibreOffice 4.3 can have two kinds of shapes: drawinglayer ones or Writer TextFrames. (Let’s ignore OLE objects and Writer pictures for now.) Drawinglayer shapes can be non-rectangular (e.g. triangles), rectangles can have rounded corners and so on, but shape text is handled by editeng, the same engine that is used for Impress shapes or Calc cells. OTOH a Writer TextFrame can contain anything that is supported by Writer (Writer fields, styles, tables, etc.), but its drawing capabilities are quite limited: no triangles, rounded corners, etc. Together with CloudOn, we thought it would be best to have both, and started to use the "shape with TextBox" term for this feature.

    A user can already sort of do this by creating a drawinglayer shape, then a Writer TextFrame, and setting the properties of the Writer TextFrame (position, size, etc.) so that it appears as if the TextFrame were the shape text of the drawinglayer shape. The idea is to tie these two objects together, so the (UI and API) user sees them as a single object.

    Results

    I’m providing here a few screenshots. Above, you can see an ODF document having a rectangle with rounded corners, still containing a table.

    Given that OOXML has had this feature since its birth, I’m also showing a few DOCX documents, which are now handled far better:

    • chart inside a left arrow callout:

    • tracked changes inside a cloud callout:

    • SmartArt inside a snip diagonal corner rectangle:

    • Table of Contents inside a pentagon:

    Details

    What follows is something you can probably skip if you’re a user — however if you’re a developer and you want to understand how the above is implemented, then read on. ;-)

    Situation in 4.3

    From the drawinglayer point of view: SwDoc contains an SdrModel (SwDoc::GetOrCreateDrawModel()), which contains a single SdrPage (SdrModel::GetPage()), while Draw/Impress documents contain multiple SdrPages. The SdrPage contains the shapes: e.g. a triangle is an SdrObjCustomShape. For TextFrames, a placeholder object called SwVirtFlyDrawObj is added to the draw page.

    The Writer-specific properties of an SdrObject are stored in an SwFrmFmt object; an SwFrmFmt array is a member of SwDoc (the "frame format table"). The anchor position and the node index of the frame contents also count as properties.

    At UNO level, a single DrawPage object is part of the Component (opened document), which abstracts away the internal SdrPage.

    For TextFrames, the UNO API works exactly the same way, except that the implementation stores all properties of the TextFrame in the SwFrmFmt (and some properties are different, compared to a drawinglayer shape).

    One remaining detail is how the shape text is represented. In case of drawinglayer shapes, this is provided by editeng: internally an EditTextObject provides a container for paragraphs, at UNO API level SvxUnoTextContent provides an interface that presents paragraphs and their text portions.

    For TextFrames, the contents of the frames are stored in a special section of the Writer text node array (in the 3rd toplevel section, while the 5th toplevel section is used for body text); that’s how they can contain anything that’s valid in Writer body text. An offset into this node array is the "content" property of the SwFrmFmt.

    Document model

    At a document model level, we need a way to describe that an SdrObject (provided by svx) has an associated TextFrame (provided by sw). svx can’t depend on sw, but in the SwFrmFmt of the SdrObject, we can use the so far unused RES_CNTNT ("content") property to point to a TextFrame content.

    So behind the scenes, the UNO API and the UI do the following when turning on the TextBox bit for a drawinglayer shape:

    • creates a TextFrame

    • connects the SdrObject to the TextFrame

    Also, every property of the TextFrame depends on the properties of the SdrObject; think of the following:

    • position / size is the largest rectangle that fits inside the shape

    • borders are disabled

    • background is transparent

    Finding the largest rectangle that fits inside the shape is probably the most interesting here, it’s implemented in SwTextBoxHelper::getTextRectangle(), which uses SdrObjCustomShape::GetTextBounds().
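The property synchronization can be sketched in C++. This is not SwTextBoxHelper: Shape, FrameProps and syncTextBox() are hypothetical, and the text area is modelled as fixed insets from the shape’s bounding box, whereas in Writer it comes from SdrObjCustomShape::GetTextBounds() and depends on the actual geometry. What it shows is that every TextBox property is derived from the shape, so re-running the same function after a move or resize keeps the two objects in sync.

```cpp
#include <cassert>

struct Rect
{
    long nLeft, nTop, nWidth, nHeight;
};

// Hypothetical, simplified properties of the attached TextFrame.
struct FrameProps
{
    Rect aBounds;
    bool bBorders = true;
    bool bTransparentBackground = false;
};

// Hypothetical shape whose text area is given as insets from its bounding
// box (an assumption; real shapes compute it from their geometry).
struct Shape
{
    Rect aBounds;
    long nInsetLeft, nInsetTop, nInsetRight, nInsetBottom;

    Rect getTextBounds() const
    {
        return { aBounds.nLeft + nInsetLeft, aBounds.nTop + nInsetTop,
                 aBounds.nWidth - nInsetLeft - nInsetRight,
                 aBounds.nHeight - nInsetTop - nInsetBottom };
    }
};

// Derives every TextBox property from the shape: call it after creating the
// frame and after each move / resize of the shape.
void syncTextBox(const Shape& rShape, FrameProps& rFrame)
{
    const Rect aText = rShape.getTextBounds();
    assert(aText.nWidth >= 0 && aText.nHeight >= 0);
    rFrame.aBounds = aText;               // position / size: the text area
    rFrame.bBorders = false;              // borders are disabled
    rFrame.bTransparentBackground = true; // background is transparent
}
```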

    UNO API

    The UNO API hides the detail that the TextFrame and the SdrObject are in fact two objects. To get there, the following is done:

    • SwXShape is modified, so that in the TextBox case the attached TextFrame, not editengine, is accessed when getText() is invoked. This was a bit tricky, as SwXShape doesn’t have an explicit getText() implementation: it overrides queryInterface() instead (see SwTextBoxHelper::queryInterface()).

    • SwXDrawPage (its XEnumerationAccess and XIndexAccess) is modified to ignore TextFrames in the TextBox case

    • SwXTextPortionEnumeration is modified to ignore TextFrames in the TextBox case

    • SwXText::insertTextContent() and SwXText::appendTextContent() are modified to handle the TextBox case
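Hiding the TextFrame half of a shape with TextBox from index-based access can be sketched as a filtering view over the draw page. DrawObject and DrawPageView are hypothetical C++ types, not the actual SwXDrawPage implementation; the sketch just maps a public index to the n-th object that is not a TextBox frame, so a shape and its TextBox are counted once.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical draw page entry: a drawinglayer shape or a TextFrame.
struct DrawObject
{
    enum Kind { Shape, TextFrame };
    const char* pName;
    Kind eKind;
    bool bTextBox; // true for a TextFrame that serves as the text of a shape
};

// Read-only view of the draw page that hides TextBox frames, so a shape and
// its TextBox are seen as a single object.
class DrawPageView
{
public:
    explicit DrawPageView(const std::vector<DrawObject>& rObjects)
        : m_rObjects(rObjects)
    {
    }

    std::size_t getCount() const
    {
        std::size_t nCount = 0;
        for (const DrawObject& rObject : m_rObjects)
            if (!isHidden(rObject))
                ++nCount;
        return nCount;
    }

    // Maps a public index to the nIndex-th visible object.
    const DrawObject& getByIndex(std::size_t nIndex) const
    {
        for (const DrawObject& rObject : m_rObjects)
        {
            if (isHidden(rObject))
                continue;
            if (nIndex == 0)
                return rObject;
            --nIndex;
        }
        throw std::out_of_range("DrawPageView::getByIndex: index out of range");
    }

private:
    static bool isHidden(const DrawObject& rObject)
    {
        // Only TextFrames can be attached to a shape as its TextBox.
        assert(rObject.eKind == DrawObject::TextFrame || !rObject.bTextBox);
        return rObject.bTextBox;
    }

    const std::vector<DrawObject>& m_rObjects;
};
```

With a rounded rectangle, its TextBox and a plain TextFrame on the page, getCount() returns 2 and getByIndex(1) is the plain frame.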

    Layout

    This was the easiest part: the "merge TextFrame and SdrObj into a shape with TextBox" approach ensured that we use existing layout features here, so no major effort was necessary.

    One interesting detail here was the positioning of as-character anchored shapes having TextBoxes; that’s now handled in SwFlyCntPortion::SetBase().

    Filters

    The primary point of this feature is to improve Word (and in particular DOCX) compatibility, and of course I wanted to update ODF as necessary as well.

    Regarding the new feature, I did the following:

    • DOCX import now avoids changing the service name from the original one to css.text.TextFrame in case the shape has shape text

    • DOCX export now handles the TextBox case: reads Writer text instead of editeng text as necessary

    • ODF export now adds a new optional boolean attribute to make export of the TextBox case possible

    • ODF import now handles the new attribute and acts accordingly

    Note that regarding backwards compatibility, we keep supporting editengine-based text as well. This gives us the best of both worlds:

    • existing ODF documents are unchanged, but

    • the TextBox feature is enabled unconditionally in DOCX import to avoid formatting loss

    User Interface

    I took care of the following:

    • the context menu of shapes now provides an item to add / remove a TextBox to/from a shape

    • when moving or resizing a shape, the TextBox properties are updated as well

    • when the shape is deleted, the associated TextBox is also deleted

    • editing individual TextBox properties is no longer possible, since they depend on the shape properties

    Summary

    If you want to try these out yourself, get a daily build and play with it! If something goes wrong, report it to us in Bugzilla, so we can try to fix it before 4.4 gets branched off. Last but not least, thanks to CloudOn for funding these improvements! :-)

