shameless self-promoting website
»Rejourn root
Posted Monday, 15 April 2013 by Miklos
Tags: en libreoffice

Last week was Hackweek at SUSE — below is a quick summary on what experiments did I do during that timeframe.


I did some experiments with using lcov on the LibreOffice codebase. The goal is to have a quick iteration, so you can see the current coverage of a file or a directory, select a method that is not yet tested, add a test for it, and "test" the test by checking if the coverage indeed got improved. As a first step, I tried this out on the Writer RTF import:

cd writerfilter
touch source/rtftok/*
make -sr -j8 gb_GCOV=YES <1>
cd ../sw; make -sr -j8 CppunitTest_sw_rtfexport CppunitTest_sw_rtfimport <2>
lcov --directory workdir/unxlngx6/CxxObject/writerfilter/source/rtftok/ --capture --output-file libreoffice.info <3>
genhtml -o coverage libreoffice.info <4>
  1. rebuild selected files with lcov options

  2. run the tests

  3. extract coverage information to a single .info file

  4. generate some nice HTML output from the .info file

lcov had problems with gcc-4.7, fully updated openSUSE 12.2 or 12.3 is known to work.

There is a script available to make the above a bit more automated.

The speed of the above depends on the amount of code needing a rebuild + the number of tests, but it should not take more than a minute.

E.g. I noticed the bookmark import code isn’t tested, added a test for it, and that indeed improved the line coverage of rtfdocumentimpl.cxx: 84.1% → 85.0%.

A next area I wanted to test is the Writer RTF export. Let’s pick something in rtfattributeoutput.cxx… StartURL() is not tested, so a hyperlink testcase should help. Indeed it did: 50.2% → 52.0%.

Last, but not at least, thanks to Norbert Thiebaud, who added gb_GCOV to gbuild.

gdb pretty-printers

Then I experimented with improving our Writer gdb Python pretty-printers. One annoying shortcoming was the lack of handling uno::Reference<text::XTextRange>. Imagine one searches for a bug related to table import for DOCX or RTF. One idea is to check the arguments of the convertToTable() method call. The first argument is a 2D array of XTextRange pairs, that describe what will be the input for cell contents. So if you want to check the first cell, you do something like this:

(gdb) b DomainMapperTableHandler.cxx:798
(gdb) r
(gdb) print (*m_pTableSeq)[0][0]
$1 = uno::Sequence of length 2 = {uno::Reference to (XInterface) 0x1a73648, uno::Reference to (XInterface) 0x1a77f68}
(gdb) print (*m_pTableSeq)[0][0][0]
$2 = uno::Reference to (XInterface) 0x1a73648
(gdb) print (*m_pTableSeq)[0][0][1]
$3 = uno::Reference to (XInterface) 0x1a77f68

Not that helpful. Here is how one could work it around:

(gdb) print (*m_pTableSeq)[0][0][0]._pInterface->m_pImpl->m_pMark->m_pPos1
$4 = boost::scoped_ptr SwPosition (node 10, offset 0)
(gdb) print (*m_pTableSeq)[0][0][1]._pInterface->m_pImpl->m_pMark->m_pPos1
$5 = boost::scoped_ptr SwPosition (node 10, offset 20)

But this is not something anyone will remember. After adding a few new pretty-printers, now it’s like this:

(gdb) print (*m_pTableSeq)[0][0]
$1 = uno::Sequence of length 2 = {uno::Reference to (SwXTextRange *) 0x1a72b98, uno::Reference to (SwXTextRange *) 0x1a773b8}
(gdb) print *(*m_pTableSeq)[0][0][0]._pInterface
$2 = (SwXTextRange) SwXTextRange sw::UnoImplPtr SwXTextRange::Impl = {mark = sw::mark::IMark = {pos1 = boost::scoped_ptr SwPosition (node 10, offset 0), pos2 = empty boost::scoped_ptr}}
(gdb) print *(*m_pTableSeq)[0][0][1]._pInterface
$3 = (SwXTextRange) SwXTextRange sw::UnoImplPtr SwXTextRange::Impl = {mark = sw::mark::IMark = {pos1 = boost::scoped_ptr SwPosition (node 10, offset 20), pos2 = empty boost::scoped_ptr}}

Technically, it would be possible to make print (*m_pTableSeq)[0][0][0] work as well, but for a larger class without a pretty-printer that would result in multiple pages of output. Anyway, _pInterface is the same for all UNO objects, so something that is not too hard to remember.

An other improvement is the XTextCursor pretty-printer. Example usage: debugging of the commented text range ODF import. Before:

(gdb) b txtfldi.cxx:559
(gdb) print *rHlp.GetCursor()._pInterface->m_pImpl->pRegisteredIn->m_pMark
$1 = SwPosition (node 9, offset 4)

After the new pretty-printers one doesn’t have to type that much:

(gdb) print *rHlp.GetCursor()._pInterface
$1 = (SwXTextCursor)
    SwXTextCursor sw::UnoImplPtr SwXTextCursor::Impl = {registeredIn = SwModify = {point = SwPosition (node 9, offset 4), mark = SwPosition (node 9, offset 4), next = 0x1a28b88, prev = 0x1a28b88}}

RTF filter text frame rework

Finally, I experimented with reworking the textframe code in the RTF filter. In short, the motivation is to bring the RTF filter in sync with the OOXML one, which can nicely import and export text box gradients. To get there, there are 3 different problems to solve:

  1. The RTF import filter currently imports rectangle and textbox shapes as drawinglayer rectangles, even if they have some text inside. Just like the OOXML import filter, we would better import these shapes as Writer textframes, as long as they contain some text.

  2. The RTF export writes Writer textframes as old-style Word frames, not as text box shapes. This should be changed, as the old syntax doesn’t support gradients, and in general both the DOC and DOCX export filters already export new-style Word frames, so there is no reason why the RTF filter would not do the same.

  3. Once all the above is done, add support for gradients in the RTF filter, in a similar way OOXML filters were already improved to handle gradients.

  4. Once this all is done, add new testcases to cover the new code.

First I had hacked on #1, sadly Writer textframes and drawinglayer rectangles don’t share the exactly same UNO API, like drawinglayer has TextWritingMode and a Name property, Writer textframes have a WritingMode property instead, and additionally they implement the XNamed UNO interface, etc.

Then I switched to #3 — there I managed to reuse our existing VML import to do the hard work: the RTF tokenizer reads the RTF shape properties, then constructs the same VML model what is normally built from v:fill and v:shadow XML elements inside DOCX files, finally the VML import does the mapping of Word’s gradient concept to the Writer gradient concept.

At the end of the week I also hacked on #2 and #4 — and while I did so, I noticed two more interesting details of Word’s new-style RTF textframe markup:

  • The bad news: Writer supports having different top/left/bottom/right borders, RTF still just supports the concept of a single line around the textframe.

  • The good news: old-style RTF frames didn’t support different left/right or top/bottom external margins, but Writer does — so now using the new syntax, this is exported properly.


Unrelated to the above, I fixed an annoying git bug, when one tried to cherry-pick multiple commits at the same time, and copy&paste went wrong, the "unrecognized" arguments were just silently ignored. Now one gets an error instead.


In parallel to the above, Thorsten was kind enough to explain how to update docs.libreoffice.org: The new output is generated using doxygen 1.8, it contains a bit more eye-candy. E.g. notice the new foldable subsections here. ;-)