Index ¦ Archives ¦ RSS

Editeng RTF export: fixing a lost paragraph style

Estimated read time: 2 minutes

Impress shape text doesn't have much support for styles, e.g. the default UI in Writer gives you a paragraph style dropdown, and you don't get the same in Impress. Still, a paragraph style is attached to bullets based on their outline level, and Impress has a View → Outline menu item to give you that styled text you can copy. Pasting that to Writer started to lose styles recently and it's now fixed to work again.

This work is primarily for Collabora Online, but the feature is available in desktop Impress as well.

Motivation

As described in a previous commit, I had a case where lots of not needed paragraph styles were exported to RTF in case an Impress document had enough master pages. The idea was to only export actually used paragraph styles, to avoid wasting CPU power.

Turns out filtering out paragraph styles has to happen at two locations:

  • in the style table to assign an index to a paragraph style
  • when referring to those styles

The problem was that unused styles were removed from the style table, but not from the style → index mapping, so as soon as you had both used and unused paragraph styles, the declared and the referred style indexes didn't match anymore.

Results so far

Here is a sample paste result in Writer, where you can see that the text doesn't have a custom paragraph style:

Bugdoc: old Writer paste

And here is the same paste, now with paragraph styles restored:

Bugdoc: new Writer paste

As you can see, now the pasted text has paragraph styles.

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

The bugfix commit was editeng RTF export: fix broken offsets into the para style table.

The tracking bug was tdf#163883.

Want to start using this?

You can get a development edition of Collabora Online 24.04 and try it out yourself right now: try the development edition. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (25.2).


Handling page captures for Writer TextBoxes

Estimated read time: 3 minutes

Writer TextBoxes provide the user with shapes that can have complex geometry and complex content. There is also a feature to capture shapes inside page boundaries: now the two features interact with each other better.

This work is primarily for Collabora Online, but the feature is available in desktop Writer as well.

Motivation

As described in a previous post, Writer implements the TextBox feature with a pair of objects: a Draw shape (with complex geometry) and a (hidden) Writer TextFrame, providing complex content. To avoid wrapping problems, the underlying TextFrame always has its wrap type set to "through", i.e. text may wrap around the Draw shape, but the hidden TextFrame is always ignored during text wrapping.

In most cases this provides the expected behavior, because the user sees one object, so wrapping around at most one object is not surprising.

However, there is also an other feature, that shapes may be captured inside page frames: if their position would be outside the page frame, Writer corrects this, so they are not off-page. This also makes sense, so it can't happen that your document has a shape that is hard to find, due to a silly position.

The trouble comes when these two are combined: the Draw shape's position gets adjusted to be captured inside the page frame, but the TextFrame's wrap type is "through", and objects with this wrap type are an exception from the capturing mechanism, so the position of the two shapes get out of sync.

Results so far

The problem is now solved by improving the layout, so in case the TextFrame is actually part of a Draw shape + TextFrame pair (forming a TextBox), then we calculate the effective wrap type of the TextFrame based on the wrap type of its Draw shape, so either both objects are captured or none, which results in consistent render result.

Here is a sample document where all margins are configured to be equal, but capturing corrected the Draw shape (and not the TextFrame):

Bugdoc: old Writer render

And here is the same document, with consistent positioning:

Bugdoc: new Writer render

As you can see, now the rendered margins actually equal, as wanted.

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

The bugfix commit was sw textbox: capture fly when its draw object is captured.

The tracking bug was tdf#138711.

Want to start using this?

You can get a development edition of Collabora Online 24.04 and try it out yourself right now: try the development edition. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (25.2).


Per-paragraph semi-transparent shape text in Impress

Estimated read time: 2 minutes

The SVG export in Impress now supports a per-paragraph setting to handle semi-transparent shape text, while previously this was only possible to control at a per-shape level.

This work is primarily for Collabora Online, but the feature is available in desktop Impress as well.

Motivation

As described in a previous post, Impress already had the capability to have semi-transparent shape text, but the SVG export of this for the case when not all paragraphs have the same setting was broken.

Transparency in SVG can be described as a property of a group (<g style="opacity: 0.5">...</g>) and it can be also a property of the text (<tspan fill-opacity="0.5">...</tspan>).

The SVG export works with the metafile of the shape, so when looking for meta actions, it tries to guess if the transparency will be for text: if so, it needs to use the tspan markup, otherwise going with the g markup is OK.

What happened here is that meta action for a normal text started, so the SVG export assumed the text is not semi-transparent, but later the second line was still transparent, so we started a group element, and this resulted in a not even well-formed XML output.

Results so far

The relevant part of the test document is simple: just 3 paragraphs, the second one is semi-transparent (and also has a bullet, as an extra):

Bugdoc: original Impress render

Once this was exported to SVG, this resulted in a non-well-formed XML, so you got this error in a web browser:

Bugdoc: old SVG render

Once tweaking the transparency mask writer to check if text started already, we get the correct SVG render:

Bugdoc: new SVG render

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

The bugfix commit was SVG export: fix handling of semi-transparent text inside a list.

The tracking bug was tdf#162782.

Want to start using this?

You can get a development edition of Collabora Online 24.04 and try it out yourself right now: try the development edition. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (25.2).


Improved interactivity for LOK clients in Writer's layout

Estimated read time: 3 minutes

Writer now has support for doing partial layout passes when LOK clients have pending events, which sometimes improves interactivity a lot.

This work is primarily for Collabora Online, but the feature is useful for any LOK clients.

Motivation

I recently worked with a document that has relatively simple structure, but it has 300 pages, and most of the content is part of a numbered list. Pasting a simple string (like an URL) into the end of a paragraph resulted in a short, but annoying hang. It turns out we updated Writer's layout for all the 300 pages before the content was repainted on the single visible page. In theory, you could reorder events, so you first calculate the first page, you paint the first page, then you calculate the remaining 299 pages. Is this possible in practice? Let's try!

Results so far

The relevant part of the test document is simple: just an empty numbered paragraph, so we can paste somewhere:

Bugdoc: empty paragraph, part of a numbered list and then pasting an URL there

This is a good sample, because pasting into a numbered list requires invalidating all list items in that list, since possibly the paste operation created a new list item, and then the number portion has to be updated for all items in the rest of the list. So if you paste into a numbered list, you need to re-calculate the entire document if all the document is just a numbered list.

The first problem was that Writer tracks its visible area, but LOK needs two kinds of visible areas. The first kind decides if invalidations are interesting for part of the document area. LOK wants to get all invalidations, so in case we cache some document content in the client that is near the visible area, we need to know when to throw away that cache. On the other hand, we want to still track the actually visible viewport of the client, so we can prioritize visible vs hidden parts of the document. Writer in LOK mode thought that all parts of the document are a priority, but this could improved by taking the client's viewport into account.

The second problem was that even if Writer had two layout passes (first is synchronous, for the visible area; second is async, for the rest of the document), both passes were performed before allowing a LOK client to request tiles for the issued invalidations.

This is now solved by a new registerAnyInputCallback() API, which allows the LOK client to signal if it has pending events (e.g. unprocessed callbacks, tiles to be painted) or it's OK for Writer layout to finish its idle job first.

The end result for pasting a URL into this 300 pages document, when measuring end-to-end (from sending the paste command to getting the first updated tile) is a decrease in response time, from 963 ms to 14 ms.

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

The tracking issue for this problem was cool#9735.

Want to start using this?

You can get a development edition of Collabora Online 24.04 and try it out yourself right now: try the development edition. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (25.2).


Improved font fallback in the DOCX import of Writer

Estimated read time: 2 minutes

Writer now has improved support for font fallback when you open a DOCX file that refers to fonts which are not available currently.

This work is primarily for Collabora Online, but the feature is fully available in desktop Writer as well.

Motivation

Font embedding is meant to solve the problems around missing fonts, but you can also find documents with stub embedded fonts that are to be ignored and our code didn't have any sanity check on such fonts, leading to unexpected glyph-level fallbacks. Additionally, once font-level fallback happened, we didn't take the font style (e.g. sans vs serif) into account, which is expected to work when finding a good replacement for the missing font.

Results so far

Here is how to the original rendering looked like:

Bugdoc, before: ugly glyph-level fallback

Once the handler for the embedded fonts in ODT/DOCX was improved to ignore stub fonts where even basic glyphs were not available, the result was a bit more consistent, but still bad. Here is a different document to show the problem:

Bugdoc, first improvement: no glyph fallback but the result is sans

Note how now we used the same font, but the glyphs are always sans, not serif. So the final step was to import the font type from DOCX and consider that while deciding font fallback:

Bugdoc, second improvement: no glyph fallback and the result is sans / serif

With this, we ignore stub embedded fonts from DOCX, we import the font type and in general font fallback on Linux takes the font type into account while deciding font fallback.

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a development edition of Collabora Online 24.04 and try it out yourself right now: try the development edition. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (24.8).


Fixing handling of line object transformations in the DOCX import of Writer

Estimated read time: 2 minutes

Writer now has improved support for toplevel line shapes when you import those from DOCX.

This work is primarily for Collabora Online, but the feature is fully available in desktop Writer as well.

Motivation

As described in a post from 2014, Writer reads the drawingML markup for shapes in DOCX files, including line shapes. While investigating a simple-looking problem around a horizontal vs vertical line, it turns out that there is a deeper issue here, and it looks like now have proper fix for this bug.

Results so far

Imagine that your company template has a nice footer in two columns, and the content in the columns are separated by a vertical line shape, but when you open your DOCX in Writer, it crosses the text of that footer instead:

Bugdoc, before: reference is red, Writer result is painted on top of it

While researching how line shapes are represented in our document model and how ODT import works, it turned out that the proper way to create a line shape is to only consider size / scaling when it comes to the individual points of the line, everything else (e.g. position / translation) should go to the transform matrix of the shape, then the render result will be as expected:

Bugdoc, after: reference is red, Writer result is painted on top of it

It was also interesting to see that this also improved other, existing test documents, e.g. core.git sw/qa/extras/ooxmlimport/data/line-rotation.docx looked like this before:

3 rotated lines, before: reference is red, Writer result is painted on top of it

And the same fix makes it perfect:

3 rotated lines, after: reference is red, Writer result is painted on top of it

Just stick to the rule: scaling goes to the points -- translation, rotation and horizontal shear goes to the shape.

For now, this is only there for toplevel Writer lines, but in-groupshape and Calc/Impress lines could also follow this technique if there is a practical need.

The "after" screenshots show ~no red, which means there is ~no reference output, where the Writer output would be missing.

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

The bugfix commit was tdf#161779 DOCX import, drawingML: fix handling of translation for lines.

The tracking bug was tdf#161779.

Want to start using this?

You can get a development edition of Collabora Online 24.04 and try it out yourself right now: try the development edition. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (24.8).


Section-based continuous endnotes in Writer

Estimated read time: 3 minutes

Writer now has much better support for continuous / inline endnotes (not on a separate page) in Writer, enabled by default for DOCX files.

This work is primarily for Collabora Online, but the feature is fully available in desktop Writer as well.

Motivation

As described in a previous post, Writer already had minimal support for not rendering endnotes on a separate endnote page, but it was not mature enough to enable is by default for DOCX files.

Results so far

What changed from the previous "continuous endnotes" approach is that instead of trying to map endnotes to footnotes, we now create a special endnotes section, which only exists at a layout level (no section node is backing this one), and this hosts all endnotes at the end of the document. It turns out this is a much more scalable technique, for example a stress-test with 72 endnotes over several pages is now handled just fine.

Here are some screenshots:

Before: reference is red, Writer result is painted on top of it

After: reference is red, Writer result is rendered on top of it

As you can see, there were various differences for this document, but the most problematic one was that the entire endnote was missing from the (originally) last page, as it was rendered on a separate page.

Now it's not only on the correct page, but also its position is correct: the endnote is after the body text, while the footnote is at the bottom of the page, as expected. The second screenshot shows ~no red, which means there is ~no reference output, where the Writer output would be missing.

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

The tracking bug was tdf#160984.

Want to start using this?

You can get a development edition of Collabora Online 24.04 and try it out yourself right now: try the development edition. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (24.8).


Improved copy & paste in Collabora Online: the copy side

Estimated read time: 4 minutes

Collabora Online now uses the Clipboard API if it's available in the used browser, getting rid of many annoying dialogs.

Motivation

The paste side was covered by an earlier post, but in short, if you're on Chrome (or Safari), you won't see the "can't paste" dialog:

"Can't paste" dialog in COOL 23.05

Also you won't see a "Can't paste special" dialog:

"Can't paste special" dialog in COOL 23.05

Wouldn't it be nice to improve the copy side in a similar way?

Results so far

The primary reason why we needed similar, annoying dialogs for copy in the past is that the clipboard API was synchronous but the network API is async. This means that writing to the clipboard ("copy") is only possible with data that we have in the browser when the copy is executed. This is a problem, since in general we don't have data for your selection in the browser.

With the Clipboard API, the copy side can be improved as well. Instead of always fetching the HTML for a simple selection (even if you don't copy) and having a three step copy for complex selections, async clipboard write is now possible. This gets us rid of the "You need to download" dialog:

"You need to download" dialog in COOL 23.05

Also it eliminates the "Download completed dialog:

"Download completed" dialog in COOL 23.05

Instead, we still need to nominally write to the clipboard in the special keyboard or click context initiating the copy (Chrome doesn't require this, Safari does), but the written object can be a callback that will do the network operation for us.

Unfortunately it's hard to screenshot the new code, since part of the result is that all these dialogs are now eliminated, copy & paste just works. :-)

Note that this can be used also in Firefox, but first you need to set dom.events.asyncClipboard.clipboardItem to true in about:config.

The last part was to adapt tests to this new world, because previously it was handy to just create a selection and assert what would be copied to the clipboard as HTML, but now we don't download the HTML anymore every time you create a selection.

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

The tracking issue was cool#8648.

Want to start using this?

You can get a development edition of Collabora Online 24.04 and try it out yourself right now: try the development edition.


Improve copy&paste in Calc and elsewhere

Estimated read time: 3 minutes

Calc now supports much better copy&paste when you transfer data between Google Sheets and Calc.

This work is primarily for Collabora Online, but the feature is fully available in desktop Calc as well.

Motivation

First, Collabora Online was using the deprecated document.execCommand() API to paste text, which is problematic, as the "paste" button on the toolbar can't behave the same way as pressing Ctrl-V on the keyboard.

Second, it turns out Google Sheets came up with some additional HTML attributes to represent spreadsheet data in HTML in a much better way, and Calc HTML import/export had no support for this, while this is all fixable.

Results so far

In short, Collabora Online now uses the Clipboard API to read from the system clipboard -- this has to be supported by the integration, and Calc's HTML filter now support the subset of the Google Sheets markup I figured out so far. This subset is also documented.

Note that the default behavior is that the new Clipboard API is available in Chrome/Safari, but not in Firefox.

For the longer version, here are some screenshots:

We used to show a popup when you clicked on the paste button on the notebookbar

The new paste code in action, handling an image

Import from Google Sheets to Calc: text is auto-converted to a number, bad

Import from Google Sheets to Calc: text is no longer auto-converted to a number, good

HTML import into an active cell edit, only RTF was working there previously

Paste from Google Sheets to Calc: text is no longer auto-converted to a number, good

Paste from Google Sheets to Calc: booleans are now also preserved

Paste from Google Sheets to Calc: number formats are now also preserved

Paste from Google Sheets to Calc: formulas are now also preserved

Paste from Google Sheets to Calc: also handling a single cell

Copy from Calc to Google Sheets: text is now handled, no longer auto-converted to a number

Copy from Calc to Google Sheets: booleans are now handled

Cross-origin iframes also block clipboard access, now fixed

Copy from Calc to Google Sheets: number formats are now also preserved

Copy from Calc to Google Sheets: formulas are now also preserved

Copy from COOL Writer to a text editor: much better result, new one on the right hand side

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

The tracking issue on the Online side was cool#8023.

Want to start using this?

You can get a snapshot / demo of Collabora Office 24.04 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (24.8).


Legal numbering in Writer: DOC and RTF support

Estimated read time: 3 minutes

Writer now supports legal numbering for two more formats: DOC and RTF (ODT and DOCX were working already.)

This work is primarily for Collabora Online, done as a HackWeek project, but the feature is fully available in desktop Writer as well.

Motivation

Legal numbering is a way to influence the number format of values inherited in a multi-level numbering. Say, the outer numbering uses Roman numerals and the inner numbering uses X.Y as the number format, but the inner level wants to display the outer values as Arabic numerals. If this is wanted (and guessing from the name, sometimes lawyers do want this), then the inner number portion will expand to values like "2.01" instead of "II.01", while the outer number portions will remain values like "II".

Mike did 80% of the work, what you can see here is just the RTF/DOC filters.

Picking a smaller feature task like this looked like a good idea, since I wanted to spend some of the time on regression fixing around last year's multi-page floating table project.

Results so far

For (binary) DOC, the relevant detail is the fLegal bit in the LVLF structure. Here is the result:

Improved handling of legal numbering from DOC: old, new and reference rendering

It shows how the outer "II" gets turned into "2", while it remained "II" in the past. This works for both loading and saving.

The same feature is now handled in the RTF filter as well. There the relevant detail is the \levellegal control word, which has an odd 1 default value (the default is usually 0). Here is the result:

Improved handling of legal numbering from RTF: old, new and reference rendering

It shows that the RTF filter is up to speed with the DOC one by now.

As for the multi-page floating tables, I looked at tdf#158986 and tdf#158801.

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 24.04 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (24.8).

© Miklos Vajna. Built using Pelican. Theme by Giulio Fidente on github.