shameless self-promoting website
  • Thursday, 28 August 2014
    Cleanup of ooxmltok in LibreOffice

    (via aigle_dore)

    In June, we decided to get rid of XSLT usage in writerfilter, the module responsible for RTF and DOCX import in LibreOffice. As usual when cleaning up a mess, this took time (about two months), but I’m now happy to say that I’m mostly done with this. :-)

    See the doctok blog post for some background. The topic here was to clean up the OOXML tokenizer, that is, the building block that turns a zipped XML document into a token stream.
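
    To make "zipped XML document into a token stream" a bit more concrete, here is a minimal Python sketch. It is only an illustration, not how writerfilter does it: the sample package, the tokenize() helper and the token tuples are all made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical, tiny stand-in for the main part of a DOCX package.
DOCUMENT_XML = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>"
    "</w:document>"
)


def make_package():
    """Build an in-memory ZIP that contains only word/document.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", DOCUMENT_XML)
    return buffer.getvalue()


def tokenize(package_bytes, part="word/document.xml"):
    """Yield (kind, value) tokens for one XML part of a zipped package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        with package.open(part) as stream:
            for event, element in ET.iterparse(stream, events=("start", "end")):
                # Drop the "{namespace}" prefix ElementTree puts on tag names.
                local_name = element.tag.rsplit("}", 1)[-1]
                if event == "start":
                    yield ("start", local_name)
                else:
                    # Simplification: text is emitted right before the end token.
                    text = (element.text or "").strip()
                    if text:
                        yield ("text", text)
                    yield ("end", local_name)


tokens = list(tokenize(make_package()))
for token in tokens:
    print(token)
```

    The real tokenizer maps names to integer token IDs instead of passing strings around; the sketch keeps strings to stay short.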

    The following problems are now solved:

    • Part of the module was generated code. The generator was implemented mostly in XSLT, but some bits were written in Perl and sed. About 4200 lines of XSLT code have now been rewritten in about 1300 lines of Python.

    • Given that we have many more developers who speak Python than XSLT, nontrivial changes are now much easier in the generator: Jan Holesovsky cleaned up boost::unordered_map usage at places where we depended on the order of elements. (Yes, you read that correctly, that was the situation up till now!) This also helps reduce the size of the resulting writerfilter shared library.

    • The input of the code generator was the large model.xml file, and generator scripts only extracted interesting information from it, so if you mistyped something, you got no error messages, just silent failures. I’ve removed quite a few XML elements and attributes from it which were parsed by none of the generator scripts and have written a RELAX NG schema for the remaining markup. Validating against this schema is part of the default build, so no more typos without a build failure. ;-) (The schema also contains quite a bit of documentation, finally.)

    • A gperf hash of all possible OOXML element / attribute names was duplicated in writerfilter, even though that information was already available from the oox module. This is now fixed, reducing the size of the shared library even further.

    • Also, both oox and writerfilter had a list of namespace URLs, mapping them to an integer enumeration, and when the two lists didn’t match, Bad Things happened (read: it usually resulted in a crash). This is now in the past: I’ve refactored writerfilter to use the same namespace alias names as oox, which allowed getting rid of the writerfilter copy of the namespace alias list. So in the future, if new namespaces have to be added, only oox has to be extended.
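
    The "silent failure vs. build failure" point from the model.xml item above can be sketched in a few lines of Python. Note that this is a plain whitelist check standing in for real RELAX NG validation, and the element and attribute names are hypothetical, not the actual model.xml vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: element name -> attribute names it may carry.
ALLOWED = {
    "model": set(),
    "namespace": {"name"},
    "element": {"name", "tokenid"},
}


def validate(xml_text):
    """Return a list of problems; an empty list means the input is accepted."""
    problems = []
    for node in ET.fromstring(xml_text).iter():
        if node.tag not in ALLOWED:
            problems.append(f"unknown element <{node.tag}>")
            continue
        for attribute in sorted(set(node.attrib) - ALLOWED[node.tag]):
            problems.append(f"unknown attribute {attribute!r} on <{node.tag}>")
    return problems


GOOD = '<model><namespace name="w"><element name="p" tokenid="1"/></namespace></model>'
# Same input with a typo: "tokenId" instead of "tokenid".
TYPO = '<model><namespace name="w"><element name="p" tokenId="1"/></namespace></model>'

print(validate(GOOD))
print(validate(TYPO))
```

    A generator that only picks out the attributes it knows would happily ignore the mistyped one; a strict check run at build time turns the same typo into an error.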

    Oh and the bonus feature: I’ve implemented a script called watch-generated-code.sh, which can record a good state of the generated code, and then compare later generated results against that, so that refactoring of the generator can now be performed in a safe way: you can change the generator in any way to make it better, and still avoid accidental output changes. :-) This is particularly useful, as it only diffs the end result of the whole generation process (cxx and hxx files), not temporary files, which are OK to change as long as the end result is the same.
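
    The record-then-compare idea is simple enough to sketch. The following Python is not the actual watch-generated-code.sh (that one is a shell script); the function names and the sample files are made up, and the sketch only assumes that the end results are the .cxx and .hxx files.

```python
import hashlib
import tempfile
from pathlib import Path

GENERATED_SUFFIXES = {".cxx", ".hxx"}


def snapshot(directory):
    """Map each generated source file (relative path) to a SHA-256 digest."""
    root = Path(directory)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in GENERATED_SUFFIXES:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[str(path.relative_to(root))] = digest
    return result


def changed_files(good, current):
    """Names that were added, removed or modified compared to the good state."""
    names = good.keys() | current.keys()
    return sorted(name for name in names if good.get(name) != current.get(name))


with tempfile.TemporaryDirectory() as workdir:
    out = Path(workdir)
    (out / "tokens.hxx").write_text("#define TOKEN_P 1\n")
    (out / "tokens.cxx").write_text("// generated\n")
    (out / "scratch.tmp").write_text("first run\n")
    good = snapshot(out)  # record the known-good state

    # A temporary file changes: not part of the end result, so no difference.
    (out / "scratch.tmp").write_text("second run\n")
    unchanged = changed_files(good, snapshot(out))

    # The generated header changes: this is the accidental change to catch.
    (out / "tokens.hxx").write_text("#define TOKEN_P 2\n")
    regressed = changed_files(good, snapshot(out))

print(unchanged, regressed)
```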

    As a conclusion, here are sizes of a stripped dbgutil version of the writerfilter shared library, from the libreoffice-4-3-branch-point and today’s master:

    $ git checkout oldest
    HEAD is now at b3130c8... 2014-05-21
    vmiklos@o9010:~/git/libreoffice/daily$ ls -lh opt/program/libwriterfilterlo.so
    -rwxr-xr-x 1 vmiklos users 8,3M aug   28 14:00 opt/program/libwriterfilterlo.so
    $ git checkout master
    Switched to branch 'master'
    vmiklos@o9010:~/git/libreoffice/daily$ ls -lh opt/program/libwriterfilterlo.so
    -rwxr-xr-x 1 vmiklos users 6,1M aug   28 14:01 opt/program/libwriterfilterlo.so

    Again, the 8,3MB → 6,1MB size reduction is mostly thanks to Kendy’s map cleanups + the duplicated gperf hash going away. :-)
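
    For the record, the arithmetic behind that reduction, as a small Python sketch. The parse_size() helper is made up for this example; it only handles the "M" sizes with a decimal comma that ls -lh printed above.

```python
def parse_size(text):
    """Parse an `ls -lh` style size such as '8,3M' (decimal comma) into a float."""
    if not text.endswith("M"):
        raise ValueError(f"expected a size with an 'M' suffix, got {text!r}")
    return float(text[:-1].replace(",", "."))


before = parse_size("8,3M")
after = parse_size("6,1M")
saved = before - after
reduction_percent = 100 * saved / before
print(f"saved {saved:.1f}M, i.e. {reduction_percent:.1f}% smaller")
```

    So the library lost a bit more than a quarter of its size.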

  • Monday, 11 August 2014
    So many bugs

    From time to time developers feel that they have little time, but they would like to take care of many bugs. Last week I was frustrated enough to actually design a T-shirt for just that. ;-)

    Above is what it looks like. In case you don’t get the joke, see here for a hint. Oh, and if you would like to build your own binary… err T-shirt, you can do it: here is the ODG file that can serve as a source. Happy bugfixing! :-)

  • Wednesday, 16 July 2014
    TextBox: complex LibreOffice Writer content inside shapes

    TL;DR: see above — it’s now possible to have complex Writer content (charts, tracked changes, tables, fields, etc.) inside drawinglayer shapes, yay! :-)

    The problem

    Writer in LibreOffice 4.3 can have two kinds of shapes: drawinglayer ones or Writer TextFrames. (Let’s ignore OLE objects and Writer pictures for now.) Drawinglayer shapes can be triangles (non-rectangular), rectangles can have rounded corners and so on, but shape text is handled by editeng, the same engine that is used for Impress shapes or Calc cells. OTOH a Writer TextFrame can contain anything that is supported by Writer (Writer fields, styles, tables, etc.), but its drawing capabilities are quite limited: no triangle, rounded corners, etc. Together with CloudOn, we thought the best would be to be able to have both, and started to use the "shape with TextBox" term for this feature.

    A user can already sort of do this by creating a drawinglayer shape, then a Writer TextFrame, and by setting the properties of the Writer TextFrame (position, size, etc.) so that it appears as if the TextFrame were the shape text of the drawinglayer shape. The idea is to tie these two objects together, so the (UI and API) user sees them as a single object.


    I’m providing here a few screenshots. Above, you can see an ODF document having a rectangle with rounded corners, still containing a table.

    Given that OOXML has this feature since its birth, I’m also showing a few DOCX documents, which are now handled far better:

    • chart inside a left arrow callout:

    • tracked changes inside a cloud callout:

    • SmartArt inside a snip diagonal corner rectangle:

    • Table of Contents inside a pentagon:


    What follows is something you can probably skip if you’re a user — however if you’re a developer and you want to understand how the above is implemented, then read on. ;-)

    Situation in 4.3

    From the drawinglayer point of view: SwDoc contains an SdrModel (SwDoc::GetOrCreateDrawModel()), which contains a single SdrPage (SdrModel::GetPage()) — Draw/Impress contain multiple sdr pages. The SdrPage contains the shapes: e.g. a triangle is an SdrObjCustomShape. For TextFrames, a placeholder object called SwVirtFlyDrawObj is added to the draw page.

    The Writer-specific properties of an SdrObject are stored in an SwFrmFmt object; an array of SwFrmFmt objects is a member of SwDoc ("frame format table"). The anchor position and the node index of the frame contents count as properties.

    At UNO level, a single DrawPage object is part of the Component (opened document), which abstracts away the internal SdrPage.

    For TextFrames, the UNO API works exactly the same way, except that the implementation stores all properties of the TextFrame in the SwFrmFmt (and some properties are different, compared to a drawinglayer shape).

    One remaining detail is how the shape text is represented. In case of drawinglayer shapes, this is provided by editeng: internally an EditTextObject provides a container for paragraphs, at UNO API level SvxUnoTextContent provides an interface that presents paragraphs and their text portions.

    For TextFrames, the contents of the frames are stored in a special section in the Writer text node array (in the 3rd toplevel section, while the 5th toplevel section is used for body text), that’s how it can contain anything that’s a valid Writer body text. An offset into this node array is the "content" property of the SwFrmFmt.

    Document model

    At a document model level, we need a way to describe that an SdrObject (provided by svx) has an associated TextFrame (provided by sw). svx can’t depend on sw, but in the SwFrmFmt of the SdrObject, we can use the so far unused RES_CNTNT ("content") property to point to a TextFrame content.

    So behind the scenes the UNO API and the UI do the following when turning on the TextBox bit for a drawinglayer shape:

    • creates a TextFrame

    • connects the SdrObject to the TextFrame

    Also, every property of the TextFrame depends on the properties of the SdrObject. Think of the following:

    • position / size is the largest rectangle that fits inside the shape

    • borders are disabled

    • background is transparent

    Finding the largest rectangle that fits inside the shape is probably the most interesting part here; it’s implemented in SwTextBoxHelper::getTextRectangle(), which uses SdrObjCustomShape::GetTextBounds().
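
    As a rough illustration of this model (and definitely not the actual sw code), here is a Python sketch. All class and method names are hypothetical. To keep the geometry simple it assumes an ellipse, where the largest-area axis-aligned rectangle that fits inside has the sides of the bounding box divided by the square root of two; the real implementation asks the custom shape for its text bounds instead.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rect:
    x: float
    y: float
    width: float
    height: float


@dataclass
class TextFrame:
    bounds: Rect
    has_border: bool = True
    background: str = "white"


@dataclass
class EllipseShape:
    bounds: Rect
    # Hypothetical analogue of the "content" link from the shape to its frame.
    content: Optional[TextFrame] = None

    def text_rectangle(self) -> Rect:
        """Largest-area axis-aligned rectangle inscribed in the ellipse."""
        width = self.bounds.width / math.sqrt(2)
        height = self.bounds.height / math.sqrt(2)
        return Rect(
            self.bounds.x + (self.bounds.width - width) / 2,
            self.bounds.y + (self.bounds.height - height) / 2,
            width,
            height,
        )

    def sync_textbox(self) -> None:
        """Derive the frame properties from the shape properties."""
        if self.content is None:
            return
        self.content.bounds = self.text_rectangle()
        self.content.has_border = False
        self.content.background = "transparent"

    def add_textbox(self) -> TextFrame:
        """Create a TextFrame and connect it to this shape."""
        self.content = TextFrame(bounds=self.text_rectangle())
        self.sync_textbox()
        return self.content

    def resize(self, width: float, height: float) -> None:
        self.bounds.width = width
        self.bounds.height = height
        self.sync_textbox()


shape = EllipseShape(Rect(0, 0, 200, 100))
frame = shape.add_textbox()
print(frame)
```

    The point of the design is that the frame has no independent geometry or border settings: whenever the shape changes, the frame is recomputed from it.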


    UNO API

    The UNO API hides the detail that the TextFrame and the SdrObject are in fact two objects. To get there, the following is done:

    • SwXShape is modified, so that in the TextBox case not editengine, but the attached TextFrame is accessed when getText() is invoked. This was a bit tricky, as SwXShape doesn’t have an explicit getText() implementation: it overrides queryInterface() instead (see SwTextBoxHelper::queryInterface()).

    • SwXDrawPage (its XEnumerationAccess and XIndexAccess) is modified to ignore TextFrames in the TextBox case

    • SwXTextPortionEnumeration is modified to ignore TextFrames in the TextBox case

    • SwXText::insertTextContent() and SwXText::appendTextContent() are modified to handle the TextBox case
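
    The effect of the first two items can be sketched in Python like this. The classes below are hypothetical stand-ins, not the UNO API: the draw page hides frames that are attached to a shape, and asking a shape for its text goes to the attached frame when there is one.

```python
class TextFrame:
    def __init__(self, text):
        self.text = text


class Shape:
    def __init__(self, simple_text="", textbox=None):
        self.simple_text = simple_text  # stands in for editeng-based shape text
        self.textbox = textbox          # attached TextFrame, if any

    def get_text(self):
        """Return the frame text in the TextBox case, the simple text otherwise."""
        if self.textbox is not None:
            return self.textbox.text
        return self.simple_text


class DrawPage:
    def __init__(self, objects):
        self._objects = list(objects)

    def _visible(self):
        # Frames attached to a shape are an implementation detail: skip them.
        attached = {
            id(obj.textbox)
            for obj in self._objects
            if isinstance(obj, Shape) and obj.textbox is not None
        }
        return [obj for obj in self._objects if id(obj) not in attached]

    def __len__(self):
        return len(self._visible())

    def __getitem__(self, index):
        return self._visible()[index]

    def __iter__(self):
        return iter(self._visible())


frame = TextFrame("a table, a field, a tracked change")
shape = Shape(textbox=frame)
standalone = TextFrame("an ordinary frame")
page = DrawPage([shape, frame, standalone])
print(len(page), shape.get_text())
```

    Both index access and iteration go through the same filter, so the two views of the page cannot disagree about what counts as an object.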


    Layout

    This was the easiest part: the "merge TextFrame and SdrObj into a shape with TextBox" approach ensured that we use existing layout features, so no major effort was necessary here.

    One interesting detail here was the positioning of as-character anchored shapes having TextBoxes, that’s now handled in SwFlyCntPortion::SetBase().


    Filters

    The primary point of this feature is to improve Word (and in particular DOCX) compatibility, and of course I wanted to update ODF as necessary as well.

    Regarding the new feature, I did the following:

    • DOCX import no longer changes the service name from the original one to css.text.TextFrame when the shape has shape text

    • DOCX export now handles the TextBox case: reads Writer text instead of editeng text as necessary

    • ODF export now adds a new optional boolean attribute to make export of the TextBox case possible

    • ODF import now handles the new attribute and acts accordingly

    Note that regarding backwards compatibility, we keep supporting editengine-based text as well. This gives the best of both worlds:

    • existing ODF documents are unchanged, but

    • the TextBox feature is enabled unconditionally in DOCX import to avoid formatting loss

    User Interface

    I took care of the following:

    • the context menu of shapes now provides an item to add / remove a TextBox to/from a shape

    • when moving or resizing a shape, the TextBox properties are updated as well

    • when the shape is deleted, the associated TextBox is also deleted

    • editing individual TextBox properties is no longer possible, since they depend on the shape properties


    If you want to try these out yourself, get a daily build and play with it! If something goes wrong, report it to us in Bugzilla, so we can try to fix it before 4.4 gets branched off. Last but not least, thanks to CloudOn for funding these improvements! :-)

  • Tuesday, 08 July 2014
    Updated Writer training slides

    (via michaeljosh)

    Last year I published some Writer training slides, which are hopefully a useful extension to in-tree documentation like sw/README and sw/qa/extras/README.

    Last week I reviewed those slides and realized that some of them are outdated. So here comes an updated version:

    The intention is that these build nicely on top of Michael’s generic intro slides, and with that, the reader can have a good "big picture" understanding of the code base. For the gory details, you always need to read the code anyway. ;-)

  • Thursday, 19 June 2014
    CLUC 2014 Conference

    I arrived home yesterday from Zagreb, where I gave a keynote at CLUC 2014 on Tuesday.

    Here are a few talks I enjoyed:

    I also took a panorama and some pictures, available here, including photos of some speakers.

    Thanks to Elizabeth for the above photo, and also to the organizers of the conference; it was a great one! ;-)

  • Sunday, 08 June 2014
    Improved handling of track changes in groupshape text

    Shapes in Writer are provided by LibreOffice’s drawing layer, so they are independent from the normal Writer paragraphs. Given that the drawing layer does not support tracking changes (only Writer’s "native" paragraphs do), fully featured tracked changes in real shape text would be quite some work. In ODF, the markup describes tracked changes in such a way that a reader which does not support tracking changes can at least read the normal and inserted text, i.e. the current version.

    This is exactly what I implemented in the DOCX import filter now:

    Previously we just ignored both inserted and deleted text, so if you had content which was all either deleted or inserted, you ended up having no shape text at all (can be tested using e.g. this test document):

    To be fair, the reference layout looks like this:

    I still hope to fix that as well one day, but the above fix is something we’ll already provide in 4.3. :-)
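
    The "read the normal and inserted text, skip the deleted text" rule is easy to show on a small piece of DOCX markup. The Python below is just an illustration of the rule, not the import filter itself; the sample paragraph and the current_text() helper are made up for this example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical paragraph with one tracked insertion and one tracked deletion.
PARAGRAPH = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">Kept, </w:t></w:r>
  <w:ins w:id="1" w:author="A"><w:r><w:t xml:space="preserve">inserted </w:t></w:r></w:ins>
  <w:del w:id="2" w:author="A"><w:r><w:delText>deleted </w:delText></w:r></w:del>
  <w:r><w:t>text.</w:t></w:r>
</w:p>
"""


def current_text(paragraph_xml):
    """Return the text as it reads with all tracked changes accepted."""

    def walk(node):
        if node.tag == f"{{{W}}}del":
            return ""  # deleted runs are not part of the current version
        if node.tag == f"{{{W}}}t":
            return node.text or ""
        # Everything else (paragraph, runs, insertions) is just a container.
        return "".join(walk(child) for child in node)

    return walk(ET.fromstring(paragraph_xml.strip()))


print(current_text(PARAGRAPH))
```

    Ignoring both w:ins and w:del, which is what happened previously, would leave only "Kept, text." here, and nothing at all for a shape whose text was entirely inserted.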

  • Sunday, 01 June 2014
    Balaton Maraton 2014

    This year LGee and I again took part in the Balaton Maraton, which is about riding around Lake Balaton in a single day within a 10-hour time limit, under conditions similar to last year’s. It’s enough to compare the sight of the start in the picture above with last year’s. ;-)

    velo.hu again photographed most of the participants this year, including me:

    The bike computer measured the following: time 8h52m14s, distance 221.68 km, average 24.9 km/h, max 49.2 km/h.

    My gross time this year was around 9h20m, which is within the time limit, but worse than last year. There are surely many explanations for this (I managed to catch a cold during the week, and there was a strong wind, mostly on the northern shore), but I think the decisive factor was that there were fewer sections where I could find a group I could comfortably ride with; and if you have that, you can make significantly better progress.

    By the way, the speed / time diagram of the GPS log looks like this:

    It nicely shows the 5 short breaks, plus a momentary stop when we became unsure while switching from road 71 to road 7. :-)

  • Thursday, 22 May 2014
    Cycling checklist

    At the weekend I rode the route of the K100 for training. The picture above shows one of the sneaky climbs before Vácrátót ;-)

    Towards the end of the climb it occurred to me that I had left my water bottle at home. Since it has also happened that I left my SPD shoes at home (when we didn’t start from home, but went by car to the Mátra and only cycled there), and a few years ago LGee left the helmet at home (and had to buy one on site at the Pelso), I thought I would put together a list of the 10 most important things that are definitely not worth leaving at home when you set out specifically to cycle:

    1. front and rear light

    2. helmet

    3. shoes

    4. saddle bag (spare inner tube, a minimal set of tools, pump)

    5. water bottle

    6. food (if the organizers don’t provide any)

    7. bike computer

    8. gloves

    9. reflective band (for long trousers, so they don’t get oily)

    10. high-visibility vest (in the evening, outside built-up areas)

    11. chain oil, GPS

    12. sunscreen / cream

    13. cycling shorts / jersey

    Maybe this helps us not to leave anything at home this time on the way to next week’s Pelso. :-)

  • Sunday, 18 May 2014
    OTRS plugin for Supybot

    OTRS is quite different from Bugzilla (which we have been using for upstream LibreOffice development for quite some time). On the plus side, it has e.g. strong support for multiple customers. OTOH, it deals with tickets instead of bugs, and sadly it doesn’t have a single identifier for tickets. It has a ticket number (which by default even includes the date), which is searchable, and it has a ticket ID, which is used for URLs.

    In case of Bugzilla, you can easily look up "bug#12345" in Firefox. Create a bookmark with the following properties:

    and then you can just copy&paste bug#12345 to Firefox, replace the # with a space, and Firefox will do the right thing.

    Unfortunately (for the reasons detailed above), this is not possible with OTRS. So I decided to write a Supybot plugin that can notice "bug#12345" on an IRC channel, and give you the clickable URL (after finding out the ticket ID from the ticket number).

    The result is available on GitHub, it looks like this:

    09:58 < vmiklos> bug#1000068
    09:58 < supybot> https://localhost/otrs/index.pl?Action=AgentTicketZoom;TicketID=73
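
    The core of the idea fits into a few lines of Python. This sketch is not the plugin code and does not use the Supybot API at all: the regular expression, the ticket_urls() helper and the dictionary that stands in for the OTRS lookup are all assumptions made for the example; only the URL shape is taken from the log above.

```python
import re

BUG_REFERENCE = re.compile(r"\bbug#(\d+)\b")
BASE_URL = "https://localhost/otrs/index.pl"

# Hypothetical stand-in for asking OTRS which ticket ID belongs to a ticket number.
TICKET_IDS = {"1000068": 73}


def ticket_urls(message, lookup=TICKET_IDS.get):
    """Return a clickable URL for every known bug#<ticket number> in a message."""
    urls = []
    for number in BUG_REFERENCE.findall(message):
        ticket_id = lookup(number)
        if ticket_id is not None:  # unknown ticket numbers are silently skipped
            urls.append(f"{BASE_URL}?Action=AgentTicketZoom;TicketID={ticket_id}")
    return urls


print(ticket_urls("bug#1000068"))
```

    Passing the lookup in as a parameter keeps the number-to-ID step replaceable, which is the only part that actually has to talk to OTRS.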

    Given that I found no relevant hits when searching for supybot otrs, I hope this code may be useful for others as well.

    Thanks to James Scott for his YouTube plugin that helped to quickly figure out the relevant Supybot APIs.

  • Sunday, 13 April 2014
    Spirited Away (Chihiro szellemországban)

    For some reason the Japanese anime genre never really grabbed me, but I read so many positive things about this one that we watched it. It’s really not bad.
