
TextBox: complex LibreOffice Writer content inside shapes

Estimated read time: 5 minutes

TL;DR: see above — it’s now possible to have complex Writer content (charts, tracked changes, tables, fields, etc.) inside drawinglayer shapes, yay! :-)

The problem

Writer in LibreOffice 4.3 can have two kinds of shapes: drawinglayer ones or Writer TextFrames. (Let’s ignore OLE objects and Writer pictures for now.) Drawinglayer shapes can be non-rectangular (e.g. triangles), rectangles can have rounded corners and so on, but shape text is handled by editeng, the same engine that is used for Impress shapes or Calc cells. OTOH a Writer TextFrame can contain anything that is supported by Writer (Writer fields, styles, tables, etc.), but its drawing capabilities are quite limited: no triangle, rounded corners, etc. Together with CloudOn, we thought the best would be to have both, and started to use the "shape with TextBox" term for this feature.

A user can already sort of do this by creating a drawinglayer shape, then a Writer TextFrame, and by setting the properties of the Writer TextFrame (position, size, etc.) so that it appears as if the TextFrame were the shape text of the drawinglayer shape. The idea is to tie these two objects together, so the (UI and API) user sees them as a single object.

Results

I’m providing here a few screenshots. Above, you can see an ODF document having a rectangle with rounded corners, still containing a table.

Given that OOXML has had this feature since its birth, I’m also showing a few DOCX documents, which are now handled far better:

  • chart inside a left arrow callout:

  • tracked changes inside a cloud callout:

  • SmartArt inside a snip diagonal corner rectangle:

  • Table of Contents inside a pentagon:

Details

What follows is something you can probably skip if you’re a user — however if you’re a developer and you want to understand how the above is implemented, then read on. ;-)

Situation in 4.3

From the drawinglayer point of view: SwDoc contains an SdrModel (SwDoc::GetOrCreateDrawModel()), which contains a single SdrPage (SdrModel::GetPage()) — Draw/Impress contain multiple sdr pages. The SdrPage contains the shapes: e.g. a triangle is an SdrObjCustomShape. For TextFrames, a placeholder object called SwVirtFlyDrawObj is added to the draw page.

The Writer-specific properties of an SdrObject are stored as an SwFrmFmt object; an SwFrmFmt array is a member of SwDoc ("frame format table"). The anchor position and the node index of the frame contents count as properties.

At UNO level, a single DrawPage object is part of the Component (opened document), which abstracts away the internal SdrPage.

For TextFrames, the UNO API works exactly the same way, except that the implementation stores all properties of the TextFrame in the SwFrmFmt (and some properties are different, compared to a drawinglayer shape).

One remaining detail is how the shape text is represented. In case of drawinglayer shapes, this is provided by editeng: internally an EditTextObject provides a container for paragraphs, at UNO API level SvxUnoTextContent provides an interface that presents paragraphs and their text portions.

For TextFrames, the contents of the frame are stored in a special section in the Writer text node array (in the 3rd toplevel section, while the 5th toplevel section is used for body text); that’s how it can contain anything that’s valid Writer body text. An offset into this node array is the "content" property of the SwFrmFmt.
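To make the node array idea a bit more tangible, here is a minimal Python sketch. It is only a toy model, not the real sw data structures: the Node class, the section() helper and the hard-coded index are all made up for illustration. What it shows is that frame contents and body text live in the same flat array, and that a frame only needs to remember an index into it.

```python
from dataclasses import dataclass


@dataclass
class Node:
    kind: str       # "start", "end" or "text"
    text: str = ""


def section(*children):
    """Wrap child nodes (or nested sections) in a start/end node pair."""
    flat = [Node("start")]
    for child in children:
        flat.extend(child if isinstance(child, list) else [child])
    flat.append(Node("end"))
    return flat


def section_texts(nodes, start_index):
    """Collect the text nodes of the section that starts at start_index."""
    assert nodes[start_index].kind == "start"
    depth, texts = 0, []
    for node in nodes[start_index:]:
        if node.kind == "start":
            depth += 1
        elif node.kind == "end":
            depth -= 1
            if depth == 0:
                break
        else:
            texts.append(node.text)
    return texts


# Five toplevel sections: the 3rd holds frame contents, the 5th the body text.
frame_section = section(Node("text", "Cell A1"), Node("text", "Cell B1"))
nodes = (section() + section() + section(frame_section)
         + section() + section(Node("text", "Body paragraph")))

# Hypothetical stand-in for the "content" property of a frame format:
# the index of the start node of the frame's own section in the array.
frame_content_index = 5
```

With this layout, section_texts(nodes, frame_content_index) returns the two frame paragraphs, while the body text stays in its own toplevel section.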

Document model

At a document model level, we need a way to describe that an SdrObject (provided by svx) has an associated TextFrame (provided by sw). svx can’t depend on sw, but in the SwFrmFmt of the SdrObject, we can use the so far unused RES_CNTNT ("content") property to point to a TextFrame content.

So behind the scenes the UNO API and the UI do the following when turning on the TextBox bit for a drawinglayer shape:

  • creates a TextFrame

  • connects the SdrObject to the TextFrame
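As a rough illustration of these two steps, here is a small Python sketch. It assumes that "connecting" means the frame format of the shape gets the same content value as the frame format of the new TextFrame; the Doc and FrameFormat classes and all method names are hypothetical and only mimic the idea, not the real sw code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameFormat:
    name: str
    kind: str                       # "draw" for shapes, "fly" for TextFrames
    content: Optional[int] = None   # node index of the frame contents, if any


class Doc:
    def __init__(self):
        self.frame_formats = []     # toy version of the "frame format table"
        self._next_content = 100    # pretend node indices

    def add_shape(self, name):
        fmt = FrameFormat(name, "draw")
        self.frame_formats.append(fmt)
        return fmt

    def add_text_frame(self, name):
        fmt = FrameFormat(name, "fly", content=self._next_content)
        self._next_content += 10
        self.frame_formats.append(fmt)
        return fmt

    def enable_textbox(self, shape):
        """Create a TextFrame and point the shape's content at it."""
        frame = self.add_text_frame(shape.name + " textbox")
        shape.content = frame.content
        return frame

    def find_textbox(self, shape):
        """Return the TextFrame sharing the shape's content, or None."""
        if shape.kind != "draw" or shape.content is None:
            return None
        for fmt in self.frame_formats:
            if fmt.kind == "fly" and fmt.content == shape.content:
                return fmt
        return None
```

The nice property of this design is that the shape itself needs no new member: a previously unused property of its frame format is enough to find the associated TextFrame.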

Also, every property of the TextFrame depends on the properties of the SdrObject. Think of the following:

  • position / size is the largest rectangle that fits inside the shape

  • borders are disabled

  • background is transparent

Finding the largest rectangle that fits inside the shape is probably the most interesting part here: it’s implemented in SwTextBoxHelper::getTextRectangle(), which uses SdrObjCustomShape::GetTextBounds().
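To get a feeling for what "largest rectangle that fits inside the shape" means, here is a Python sketch for one specific shape, an ellipse. This is not how the real code works (that one asks the custom shape for its text bounds, as described above); it is just the geometry for a single case, and the function name is made up. For an ellipse with semi-axes a and b, the largest axis-aligned inscribed rectangle is a·√2 wide and b·√2 high.

```python
import math


def ellipse_text_rectangle(left, top, width, height):
    """Largest axis-aligned rectangle inside the ellipse that fills the
    bounding box (left, top, width, height).

    Returns (left, top, width, height) of the inner rectangle.
    """
    inner_w = width / math.sqrt(2)    # 2 * (width / 2) / sqrt(2)
    inner_h = height / math.sqrt(2)
    return (left + (width - inner_w) / 2,
            top + (height - inner_h) / 2,
            inner_w,
            inner_h)
```

For a 200x100 ellipse this gives a roughly 141x71 rectangle centered in the bounding box, and the TextFrame would get that position and size.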

UNO API

The UNO API hides the detail that the TextFrame and the SdrObject are in fact two objects. To get there, the following is done:

  • SwXShape is modified, so that in the TextBox case not editengine, but the attached TextFrame is accessed when getText() is invoked. This was a bit tricky, as SwXShape doesn’t have an explicit getText() implementation: it overrides queryInterface() instead (see SwTextBoxHelper::queryInterface()).

  • SwXDrawPage (its XEnumerationAccess and XIndexAccess) is modified to ignore TextFrames in the TextBox case

  • SwXTextPortionEnumeration is modified to ignore TextFrames in the TextBox case

  • SwXText::insertTextContent() and SwXText::appendTextContent() are modified to handle the TextBox case
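The "two objects look like one" idea behind the list above can be sketched in a few lines of Python. This is a toy model with made-up class names, not the UNO API itself: the shape hands out the Writer text when a TextBox is attached, and the draw page hides attached TextFrames when it is counted or enumerated.

```python
class TextFrame:
    def __init__(self, text=""):
        self.text = text


class Shape:
    def __init__(self, name, editeng_text=""):
        self.name = name
        self.editeng_text = editeng_text
        self.textbox = None          # attached TextFrame, if the TextBox bit is on

    def get_text(self):
        # With a TextBox, hand out the Writer text instead of the editeng text.
        if self.textbox is not None:
            return self.textbox.text
        return self.editeng_text


class DrawPage:
    def __init__(self):
        self._objects = []

    def add(self, obj):
        self._objects.append(obj)

    def _visible(self):
        # TextFrames that serve as the TextBox of some shape are hidden.
        attached = {id(o.textbox) for o in self._objects
                    if isinstance(o, Shape) and o.textbox is not None}
        return [o for o in self._objects if id(o) not in attached]

    def __len__(self):
        return len(self._visible())

    def __iter__(self):
        return iter(self._visible())
```

A standalone TextFrame stays visible on the page; only the one attached to a shape disappears from the count and the enumeration.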

Layout

This was the easiest part: the "merge TextFrame and SdrObj into a shape with TextBox" approach ensured that we use existing layout features, so no major effort was necessary here.

One interesting detail here was the positioning of as-character anchored shapes having TextBoxes, that’s now handled in SwFlyCntPortion::SetBase().

Filters

The primary point of this feature is to improve Word (and in particular DOCX) compatibility, and of course I wanted to update ODF as necessary as well.

Regarding the new feature, I did the following:

  • DOCX import now avoids changing the service name from the original one to css.text.TextFrame in case the shape has shape text

  • DOCX export now handles the TextBox case: reads Writer text instead of editeng text as necessary

  • ODF export now adds a new optional boolean attribute to make export of the TextBox case possible

  • ODF import now handles the new attribute and acts accordingly

Note that regarding backwards compatibility, we keep supporting editengine-based text as well. This gives us the best of both worlds:

  • existing ODF documents are unchanged, but

  • the TextBox feature is enabled unconditionally in DOCX import to avoid formatting loss

User Interface

I took care of the following:

  • the context menu of shapes now provides an item to add / remove a TextBox to/from a shape

  • when moving or resizing a shape, the TextBox properties are updated as well

  • when the shape is deleted, the associated TextBox is also deleted

  • editing individual TextBox properties is no longer possible, since they depend on the shape properties

Summary

If you want to try these out yourself, get a daily build and play with it! If something goes wrong, report it to us in the Bugzilla, so we can try to fix it before 4.4 gets branched off. Last, but not least, thanks to CloudOn for funding these improvements! :-)


Updated Writer training slides

Estimated read time: 1 minute

(via michaeljosh)

Last year I published some Writer training slides, which are hopefully a useful extension to in-tree documentation like sw/README and sw/qa/extras/README.

Last week I reviewed those slides and realized that some of them are outdated. So here comes an updated version:

The intention is that these build nicely on top of Michael’s generic intro slides, and with that, the reader can have a good "big picture" understanding of the code base. For the gory details, you always need to read the code anyway. ;-)


CLUC 2014 Conference

Estimated read time: 1 minute

I arrived home yesterday from Zagreb, where I gave a keynote at CLUC 2014 on Tuesday.

Here are a few talks I enjoyed:

I also took a panorama and some pictures, available here, including photos of some speakers.

Thanks to Elizabeth for the above photo, and also to the organizers of the conference: it was a great one! ;-)


Improved handling of track changes in groupshape text

Estimated read time: 1 minute

Shapes in Writer are provided by LibreOffice’s drawing layer, so they are independent from the normal Writer paragraphs. Given that only Writer’s "native" paragraphs support tracking changes, not the drawing layer, fully featured tracked changes in real shape text would be quite some work. In case of ODF, the markup describes tracked changes in such a way that a reader that does not support tracking changes can at least read the normal and inserted text, i.e. the current version.

This is exactly what I implemented in the DOCX import filter now:

Previously we just ignored both inserted and deleted text, so if you had content which was all either deleted or inserted, you ended up having no shape text at all (can be tested using e.g. this test document):

To be fair, the reference layout looks like this:

I still hope to fix that as well one day, but the above fix is something we’ll already provide in 4.3. :-)
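The "keep the normal and inserted text, drop the deleted text" rule is easy to sketch outside of LibreOffice, too. The Python snippet below is not the import filter code, just an illustration on a hand-written WordprocessingML paragraph; the current_text() name and the sample are made up.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def current_text(paragraph):
    """Text of a w:p element: inserted runs are kept, deleted ones skipped."""
    parts = []

    def walk(element):
        if element.tag == f"{{{W}}}del":
            return                  # deleted content is not in the current version
        if element.tag == f"{{{W}}}t":
            parts.append(element.text or "")
        for child in element:
            walk(child)

    walk(paragraph)
    return "".join(parts)


SAMPLE = f"""<w:p xmlns:w="{W}">
  <w:r><w:t>Hello </w:t></w:r>
  <w:ins w:id="1" w:author="A"><w:r><w:t>new </w:t></w:r></w:ins>
  <w:del w:id="2" w:author="A"><w:r><w:delText>old </w:delText></w:r></w:del>
  <w:r><w:t>world</w:t></w:r>
</w:p>"""

print(current_text(ET.fromstring(SAMPLE)))
```

If a paragraph consists only of inserted text, this still yields that text, instead of an empty shape.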


Improved support for text frames with relative sizes in LibreOffice Writer

Estimated read time: 2 minutes

When using text frames in Writer, you can always choose whether to set an absolute size or a relative one. Oddly enough, in case of relative sizes, it wasn’t entirely clear what 100% means. With a bit of searching, the help says "it’s the page text area", which in practice means the page size, excluding the margins.

And that’s where the problem lies: in many cases (importing foreign formats, cover page of a document, etc.) you want to have a textframe which is 100% wide, compared to the full page size, including margins. It was already possible previously to work around this by manually specifying the same size that was used for the page size, but that’s ugly: you duplicate the setting in two places.

As you can see on the above screenshot, in LibreOffice 4.3 I have now implemented this as a new option: you can choose what 100% means for both width and height. File filters are also updated accordingly: in case of ODF an extension is proposed, and the DOCX and RTF filters are updated as well, since those file formats already supported this feature.
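The difference between the two meanings of 100% is simple arithmetic, shown here as a small Python sketch. The Page class, the frame_width() function and the "text_area" / "page" option names are invented for this example; they are not the names used in the UI or in the file formats.

```python
from dataclasses import dataclass


@dataclass
class Page:
    width: float          # e.g. in millimeters
    margin_left: float
    margin_right: float


def frame_width(page, percent, relative_to="text_area"):
    """Width of a frame that is `percent` wide, relative to the chosen base."""
    if relative_to == "page":
        base = page.width
    elif relative_to == "text_area":
        base = page.width - page.margin_left - page.margin_right
    else:
        raise ValueError(f"unknown base: {relative_to}")
    return base * percent / 100


a4 = Page(width=210, margin_left=20, margin_right=20)
print(frame_width(a4, 100))           # relative to the page text area
print(frame_width(a4, 100, "page"))   # relative to the entire page
```

On an A4 page with 20 mm margins, 100% is 170 mm with the old meaning and 210 mm with the new option, without duplicating the page width in the frame settings.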

For the curious ones, the feature has been in master for almost two months now, but I implemented my favorite part (the RTF filter) only last week, that’s the "news" here. ;-)

If you want to try these out yourself, get a daily build and play with it! If something goes wrong, report it to us in the Bugzilla, so we can try to fix it before 4.3 gets branched off. Last, but not least, thanks to CloudOn for funding this improvement! :-)


DOCX import progressbar in LibreOffice Writer

Estimated read time: 1 minute

I’m sure in this case a few words are worth more than the above picture, so let me describe what you see above. :-)

In case of opening an ODT, DOC or RTF document in LibreOffice Writer, you already got some feedback on where the importer is, in case the process needed more time than what you feel is "instant". However, this wasn’t supported for DOCX. According to git blame, I added this to my todo on 2012-10-29, and a few months later a bug report was also opened, requesting the same, but up to yesterday, nothing changed. However, I’ve now implemented this on master, so it’ll be part of the 4.3 release.

Back to where I started, what you actually see there is LibreOffice in the middle of importing the Holy Bible in DOCX format, which takes around 12 seconds on my machine. One could say that speed is quite acceptable for that amount of data, but with a progressbar, it’s definitely better. ;-)


Death of doctok in LibreOffice

Estimated read time: 3 minutes

Last year in September we decided to get rid of the writerfilter-based DOC tokenizer, and I volunteered to actually do this. As cleanups in general have a low priority, I only progressed with this slowly, though yesterday I completed it, that’s why I’m writing this post. :-)

Some background: the writerfilter module is responsible for RTF and DOCX import in Writer. As the above picture shows, the currently used DOC import is independent from it, and there was also another DOC import filter in writerfilter, which was disabled at runtime. As I don’t like duplication, I examined the state of the two filters, and the linked minutes mail details how we decided that the old filter will stay, and we’ll get rid of the writerfilter one. It’s just a matter of deleting that code, right? :-) That’s what I thought first. But then I had to realize that the architecture of writerfilter is a bit more complex:

It has the following components:

  • the dmapper (domain mapper), that handles all the nasty complexities of mapping Word concepts to Writer concepts (think of e.g. sections ↔ page styles)

  • one tokenizer for each (RTF, DOCX, DOC) format

The traffic between the tokenizers and dmapper is called tokens. Naturally it’s not enough that tokenizers send and dmapper receives these tokens, they should be defined somewhere as well. And that’s where I realized this work will take a bit more time: instead of having a single token definition, actually the ooxml tokenizer defined its own grammar, and doctok also defined two additional grammars. And of course dmapper had to handle all of that. ;-) Given that OOXML is a superset of the DOC/RTF format, it makes sense to just use the ooxml grammar, and get rid of the other two.

Especially since (by now you have probably figured this out) if I wanted to kill doctok, I had to kill the sprm and rtf grammars as well: just removing doctok would have broken the RTF and DOCX import, as those also used the rtf/sprm grammars.
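The benefit of a single grammar is perhaps easier to see in a toy model. In the Python sketch below, two tokenizers emit tokens from one shared vocabulary and the mapper handles each property in exactly one place. All names (the token ids, the function names, the DomainMapper class) are invented for illustration and do not match the real writerfilter code.

```python
ITALIC = "italic"   # hypothetical token ids of the single shared grammar
BOLD = "bold"


def tokenize_docx(tags):
    """Map (simplified) OOXML run property tags to shared tokens."""
    table = {"i": ITALIC, "b": BOLD}
    return [table[tag] for tag in tags if tag in table]


def tokenize_rtf(control_words):
    """Map (simplified) RTF control words to the same shared tokens."""
    table = {"\\i": ITALIC, "\\b": BOLD}
    return [table[word] for word in control_words if word in table]


class DomainMapper:
    """Receives tokens; each property is handled in exactly one place."""

    def __init__(self):
        self.char_props = set()

    def handle(self, token):
        if token == ITALIC:
            self.char_props.add("italic")
        elif token == BOLD:
            self.char_props.add("bold")
        else:
            raise ValueError(f"unknown token: {token}")


def import_document(tokens):
    mapper = DomainMapper()
    for token in tokens:
        mapper.handle(token)
    return mapper.char_props
```

With several grammars, handle() would need one branch per grammar for the very same property, which is exactly the kind of duplication this cleanup removes.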

So at the end, the cleaned up architecture now looks like this:

And that has multiple advantages:

  • It removes quite some code: In libreoffice-4-1, the doctok was 78849 (!) lines of code (well, part of that was XML data, and some scripts generated C++ code from that).

  • dmapper now doesn’t have to handle the rtf and sprm grammars anymore, so now there is a single place in dmapper that handles e.g. the italic character property.

  • Smaller writerfilter binary for the end user: even if doctok wasn’t enabled at runtime, it was shipped in the installation set.

  • Hopefully it’s now a bit easier to understand writerfilter: at least e.g. if you want to look up the place where dmapper handles the character bold ("b") XML tag of OOXML, you don’t have to know that the binary DOC equivalent of that is sprmCFBold, just because we have an unused DOC tokenizer there as well. :-)

  • Given that DOC and RTF formats are a dead end, I think it’s a good thing that in writerfilter now the grammar is OOXML (that keeps introducing new features), rather than some dead format. ;-)


LibreOffice Writer now supports nested comments in its DOC/RTF filters

Estimated read time: 1 minute

If you ever tried to use nested comments in Writer (make a selection, Insert → Comment, then make an overlapping selection, and do it again), you may have noticed that only the ODF filter can load and save such a document properly. Recently I have improved this situation a lot. Motivated by seeing this is now supported in the DOCX import filter, I now added support for this also to the DOCX export, RTF import/export and binary DOC import/export filters.

If you want to try this out, core.git has ODT and DOC samples to play with.
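Why are nested (overlapping) comments harder than plain ones? A comment range can’t simply be an element wrapping the commented text, because two overlapping ranges would not nest properly. One way out is to write separate start and end markers that refer to the comment by id. The Python sketch below shows this idea with a made-up marker syntax; it is not the markup of any of the mentioned file formats.

```python
def serialize(text, comments):
    """Interleave text with start/end markers of possibly overlapping comments.

    `comments` maps a comment id to a (start, end) character offset pair.
    """
    events = []
    for comment_id, (start, end) in comments.items():
        events.append((start, 1, f"[start {comment_id}]"))
        events.append((end, 0, f"[end {comment_id}]"))
    events.sort()   # by offset; at the same offset, ends come before starts

    out, pos = [], 0
    for offset, _, marker in events:
        out.append(text[pos:offset])
        out.append(marker)
        pos = offset
    out.append(text[pos:])
    return "".join(out)


print(serialize("abcdefgh", {1: (0, 5), 2: (3, 8)}))
```

The second comment starts before the first one ends, and the output still describes both ranges unambiguously.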


InteropGrabBag in LibreOffice Writer

Estimated read time: 1 minute

I arrived home yesterday from Brussels, where I presented at FOSDEM 2014, in the Open document editors devroom.

We also had a Hackfest, kindly hosted by Betacowork on Monday and Tuesday.

Here are a few talks I enjoyed, not counting the LibreOffice ones:

I was also happy to finally meet Jacobo, Matus, Tim and Tomaž in person. :-)

Quite some other slides are now available on Planet, don’t miss them. I also took some pictures, available here, including photos of all speakers in our devroom.


OOXML shape improvements in LibreOffice Writer 4.3

Estimated read time: 2 minutes

Although LibreOffice 4.2.0 is not yet released, it was already branched off from master in November last year, and improvements for the next release are already cooking in master. One of these will be a major improvement of shape handling in the DOCX import/export filter.

Some background: when DOCX was initially introduced, it still used VML (which is in short an XML equivalent of the binary shape format), and only Word 2010 started to write shapes using drawingML. Given that Word still understands VML, it wasn’t urgent for us to write shapes using the drawingML markup. As for import, Word still writes an approximate version of the shape in VML as a fallback — that’s what we read till now. Needless to say, newer drawingML features have no VML equivalent so with time it became more and more important for us to finally read and write shapes in DOCX using drawingML, which just happened in Writer.
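To illustrate the "drawingML with a VML fallback" situation: as far as I know, Word wraps the two variants in an mc:AlternateContent element, with the drawingML version in mc:Choice and the VML version in mc:Fallback. The Python sketch below picks one of them from a heavily simplified sample. The pick_shape_markup() function is made up, and treating the Requires attribute as a single prefix is a simplification (it can list several).

```python
import xml.etree.ElementTree as ET

MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"

SAMPLE = f"""<mc:AlternateContent
    xmlns:mc="{MC}"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape">
  <mc:Choice Requires="wps"><w:drawing/></mc:Choice>
  <mc:Fallback><w:pict/></mc:Fallback>
</mc:AlternateContent>"""


def pick_shape_markup(alternate_content, supported=("wps",)):
    """Return the children of the first supported Choice, else of the Fallback."""
    for choice in alternate_content.findall(f"{{{MC}}}Choice"):
        if choice.get("Requires") in supported:
            return list(choice)
    fallback = alternate_content.find(f"{{{MC}}}Fallback")
    return list(fallback) if fallback is not None else []


root = ET.fromstring(SAMPLE)
print(pick_shape_markup(root)[0].tag)                 # the drawingML variant
print(pick_shape_markup(root, supported=())[0].tag)   # the VML fallback
```

A reader that only knows VML ends up in the second branch, which is roughly where Writer was before this work.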

I’m posting here a few screenshots showing the improvements I’ve implemented. Note that final 4.3 is still far from being released, so this is not a complete list. :-) In each case I’m providing a screenshot showing how it looked (at the end of an import/export/import again roundtrip) before, how it looks now in 4.3 and the reference layout. Click on the images to get a larger image:

  • document with different colors (test doc):

OK, this has four pictures: before, now, Word 2007 and Word 2010. As you can see, we’re now on par with Word 2010. ;-)

  • document with textboxes inside a group shape (test doc):

  • document with a shape having a custom adjustment (test doc):

  • document with different colors (test doc):

If you want to try these out yourself, get a daily build and play with it! If something goes wrong, report it to us in the Bugzilla, so we can try to fix it before 4.3 gets branched off. Last, but not least, thanks to CloudOn for funding these improvements! :-)

© Miklos Vajna. Built using Pelican. Theme by Giulio Fidente on github.