Multi-page floating tables in Writer: from multiple columns to chaining

Estimated read time: 5 minutes

Writer now has continued steps to handle tables that are both floating and span over multiple pages.

This work is primarily for Collabora Online, but is useful on the desktop as well. See the third post for background.

Motivation

The previous post finished with crash testing: the interesting subset of that testing tool is to take hundreds of thousands of documents and, in the Writer case, import them into a document model and lay them out. If any of this crashes, the document is marked for future investigation. In this post, we'll see what else started to work during the past month.

Results so far

The feature is enabled by default and now the DOCX/DOC/RTF import makes use of it. This allows stress-testing the layout code with complex user documents, hopefully with the found breakage fixed before it would be released in a stable version.

On the positive side, the core.git repository now has 37 files which focus on correct handling of floating tables (abbreviated as "floattables"). Also, there are additional tests that quickly build a specific multi-page floating table in memory and do some operation on it, e.g. delete the last row and assert what happens.

Here are some screenshots from the effort so far:

Floating table inside a multi-column section

The first case is about multi-column sections: in this case Word doesn't try to split floating tables between pages. What you can see on the screenshot is that Writer lays out content on the previous page so that remaining space is left, but we don't try to split the table between the first and the second page. This holds even if there would be space on the first page and even if it means the table overlaps with the second column, matching what Word does.

UI to disable split of a floating table

The UI to enable split of floating tables was added quite early: this is a new checkbox on the frame properties dialog. However, disabling the split of floating tables was broken: the already created layout was not updated to properly move back "follow" fly frames from later pages to the current page. This is now fixed.

Chaining enabled, so no split frames

Writer already had a feature to split content in a frame into multiple frames, but that one required creating those frames in the model explicitly. Such chaining is a feature that is useful in other use-cases and is parallel to multi-page floating tables. The UI now ensures that the user can split frames only in case chaining is not used, to avoid confusion.

Split enabled, so no chaining

This is now also true the other way around: if split of a floating table is allowed, then we disable the frame chaining UI to avoid trouble later.

The latest crashing document

At this point I went back to crashtesting & crash bugreports, and the latest reported crash was for a document that is visible on the above screenshot. This was a bit tricky: it required 3 fixes to make it not crash and also a layout loop fix.

Disabling split of frames at a layout level

Next was a mini-feature: even if floating tables normally split across pages by default, Word has a document-level compatibility switch to turn this split on or off by default, at a layout level. Floating tables from RTF are not split by default, DOC and DOCX split them by default.

What you can see on this screenshot is that a DOCX document may have this flag enabled, and then you allow splitting on the UI / at a document model level, but Writer may still decide to not split them, to provide the correct layout.

Overlapping floating tables

The previous post already mentioned the problem area of overlapping tables. A first step in this direction is to fix this bug document with Arabic text and 2 overlapping tables, where the overlap made them unreadable.

Fixed overlap of floating tables

And in this case here is the fixed version, where reading the table now depends on your language skills. :-)

In this case, the problem was a lost section break of type next page, when a section started with a floating table, which is a corner-case.

And that's where we stand. Certainly more work is needed to fix more unwanted overlapping of floating tables, but we get there step by step.

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 23.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (7.6).


Multi-page floating tables in Writer: from row deletion to page breaks

Estimated read time: 6 minutes

Writer now has continued steps to handle tables that are both floating and span over multiple pages.

This work is primarily for Collabora Online, but is useful on the desktop as well. See the second post for background.

Motivation

The previous post finished with cursor traversal: if a floating table is on both page 1 and page 2, then you expect Writer to be able to move between the rows of the table, even if those are not on the same page. In this post, we'll see what else started to work during the past month.

Results so far

The feature is enabled by default and now the DOCX/DOC/RTF import makes use of it. This allows stress-testing the layout code with complex user documents, hopefully with the found breakage fixed before it would be released in a stable version.

On the positive side, the core.git repository now has 19 files which focus on correct handling of floating tables. Also, there are additional tests that quickly build a specific multi-page floating table in memory and do some operation on it, e.g. delete the last row and assert what happens.

Here are some screenshots from the effort so far:

Editing of floating tables: delete the last row of a table with 3 rows

The first case is about editing: if a floating table had a first, middle and last page, then deleting the last row of the table led to incorrect layout, which is now fixed.

Selection & dragging of split floating tables

An odd problem is that the vertical position of tables on non-first pages is generated by the layout, which means that normal drag&move to position them won't work, leading to annoying jumps. This is now fixed by selecting the first (master) fly frame on click, and you can always reposition that table (even vertically.)

Bad binary DOC import

Once DOCX import/export was there, the next step is binary DOC import, which gives us access to a larger corpus of test documents, to stress-test the layout code. This shows how the binary DOC import looked before the work.

Good binary DOC import

And this one shows how it works now.

Good binary DOC export

DOC import is not enough, e.g. Collabora Online will save your documents automatically, so we really want to export everything that is possible to import. Here is what good DOC export looks like in Word.

In-footer floating table

At this point the first crashtest results arrived (we try to import about 280 thousand documents and see what crashes). The first problem was floating tables in footers. Well, we should not try to split such tables (even if they don't fit): adding one more page does not give us more footer space.

Bad RTF import

Similar to the DOC filter, RTF can express floating tables. Here is how we did a bad rendering of an RTF document before.

Good RTF import

And here is how we import it currently. The RTF control words are quite close to the binary DOC markup semantically, just the syntax is different.

Bad RTF export

The RTF export side was also missing, as visible in Word, before the work.

Good RTF export

And this is what the good RTF export result looks like in Word.

Floating table in a section

Another crashtest find was that sometimes we map Word's continuous section breaks to Writer sections, so we can't assume that tables are anchored directly in body frames. This is now fixed.

Correct handling of the TableRowKeep flag in floating tables

A related problem was that non-floating tables have a trick that we call the TableRowKeep mode. If this is on (which is the default for documents imported from Word), a table row will stick to the next table row (we try to keep them on the same page) if the first cell's first paragraph in that row has the "keep with next" paragraph property specified. It turns out that this should be ignored when the table is floating.
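The rule described above can be written down as a small predicate. This is only a sketch of the described behavior, not the actual Writer layout code: the `Row` class, its field and the function name are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical model: True if the first paragraph of the row's first cell
    # has the "keep with next" paragraph property set.
    first_para_keep_with_next: bool


def row_keeps_with_next(row: Row, table_row_keep: bool, table_is_floating: bool) -> bool:
    """Return True if the row should stay on the same page as the next row."""
    if table_is_floating:
        # As described in the text: the mode is ignored for floating tables.
        return False
    return table_row_keep and row.first_para_keep_with_next


row = Row(first_para_keep_with_next=True)
print(row_keeps_with_next(row, table_row_keep=True, table_is_floating=False))
print(row_keeps_with_next(row, table_row_keep=True, table_is_floating=True))
```

The floating check comes first, so the paragraph property and the document-level mode only matter for normal, non-floating tables.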

Page break before a floating table

A next problem was that some page breaks simply disappeared. It turns out that we need to transfer the "break before" property from the table to the table anchor (paragraph) to get the desired layout, since page breaks are generally ignored inside text frames.

Handling of 2 times nested tables, middle one is floating

Not all combinations of nesting with floating tables are handled yet, but at least we should not crash when the user tries to do that. Here are 3 tables, nested in each other, where the second table is marked to be floating.

Handling of a floating table, immediately followed by another table

The last fixed problem is when a floating table is immediately followed by another, non-floating table. Given that we try to anchor the floating table in the next paragraph, the layout could not handle this previously, but now we ensure that each floating table is followed by a paragraph.
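To illustrate the idea of always having an anchor paragraph after a floating table, here is a minimal sketch over a flat list of blocks. The `Block` type and `ensure_anchor_paragraphs()` are hypothetical and do not mirror the real import code. Treating a floating table at the very end of the list the same way is an extra assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    kind: str               # "paragraph" or "table" (hypothetical model)
    floating: bool = False  # only meaningful for tables


def ensure_anchor_paragraphs(blocks: List[Block]) -> List[Block]:
    """Insert an empty paragraph after each floating table that lacks one."""
    result: List[Block] = []
    for index, block in enumerate(blocks):
        result.append(block)
        if block.kind == "table" and block.floating:
            has_next = index + 1 < len(blocks)
            if not has_next or blocks[index + 1].kind != "paragraph":
                # The floating table needs a paragraph to be anchored in.
                result.append(Block("paragraph"))
    return result


body = [Block("table", floating=True), Block("table")]
print([block.kind for block in ensure_anchor_paragraphs(body)])
```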

And that's where we stand. Hope to address all problems reported by crashtesting soon. Once that happens, it may be possible to switch from bugfixing mode to feature mode again, e.g. better handling of overlapping or nested tables could be done.

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 23.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (7.6).


Multi-page floating tables in Writer: from split rows to cursor traversal

Estimated read time: 6 minutes

Writer now has the early steps to handle tables that are both floating and span over multiple pages.

This work is primarily for Collabora Online, but is useful on the desktop as well. See the first post for background.

Motivation

The previous post finished with split rows being in a reasonable shape on our journey to fix tdf#61594. In this post, we'll see what else is needed to get perfect rendering for that single document.

The plan is to iterate on that later, adding more and more incremental improvements & fixes for this feature.

Results so far

The feature is still enabled by default, but the DOCX import only makes use of it if you set the SW_FORCE_FLY_SPLIT=1 environment variable. This allows playing with the feature even if there are lots of known problems still.

On the positive side, core.git sw/qa/core/layout/data/ now has 12 files which are rendered exactly the way Word renders them. Also, there are additional tests that quickly build a specific multi-page floating table in memory and do some operation on it, e.g. delete the last row and assert what happens.

Here are some screenshots from the effort so far:

Split row and an additional one on 2 pages

Here the problem was that a normal row went to a next page after a split row. Now the document is correctly of 2 pages, instead of the previous unwanted 3 pages.

Floating table with multiple columns

Here the additional complexity was to have multiple columns on a table, since previously we always had 1 column and 2 or more rows. Now these are also split correctly across pages.

Incorrect widow control inside split floating tables

This is an incorrect table row split, because widow control is broken.

Fixed widow control inside split floating tables

And here is how it looks when it's working. That little line on page 2 is no longer alone.

Working minimal height

Even better when the minimal height for non-first ("follow") table frames is working, as you can notice that space between the last line and the table bottom border on page 2.

At this point, the bug document from the motivation section worked fine, apart from the workaround that one has to re-save it in non-legacy mode in Word. So what's next? We need to instantly add a legacy mode for the brand new (not even fully enabled) multi-page floating table feature, since otherwise whatever we do, some DOCX files will be handled incorrectly.

Legacy mode: bad margin

As it turns out, the core of the legacy mode is that the floating table is sometimes allowed to flow into the footer / bottom margin area of the page, but not always. It's quite inconsistent, so one can understand why this is no longer the default behavior. The above is the naive rendering, which is logical, but incorrect.

Legacy mode: good margin

And this is the correct result in legacy mode. After a bit of experimenting, it seems one can flow into the bottom margin area if the height of the table frame would fit the body frame, but some vertical offset causes it to be pushed down.

Legacy mode: minimal row height causes no row split

The final trick with legacy mode is to make sure that all tables (first one, middle ones, last one) have the required minimal height, which can result in not splitting the row in case a part of that would be less than the minimal height. E.g. a 3 cm minimal height means that a total height of 4 cm (2cm + 2cm) is not enough for a split row.
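The arithmetic in the example above can be spelled out as follows. This is a simplified sketch that assumes a row is split into exactly two parts and that both parts must reach the minimal height. The function name is made up, and the real layout also depends on the space available on each page.

```python
def can_split_row(total_height_cm: float, min_height_cm: float) -> bool:
    """A row split in two only works if both parts can reach the minimal height."""
    # Both parts need at least min_height_cm, so the total must cover twice that.
    return total_height_cm >= 2 * min_height_cm


# The case from the text: 3 cm minimal height, 4 cm of total content.
print(can_split_row(4.0, 3.0))
# With 6 cm of content, two parts of 3 cm each would be possible.
print(can_split_row(6.0, 3.0))
```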

With this, we reached the goal to render that given bug document perfectly (when compared to Word), and the next step is to fix up breakage that would be caused by enabling by default.

Tracked changes in floating tables

The first problem was tracked changes support, which needs special care: as the importer converts body text to table cells, we need to keep the tracked insert/delete text ranges correctly. This is now working fine.

Nested tables: the outer is floating

The next problem is around nested tables: a normal inner table inside a floating table was lost on DOCX file open, now fixed.

Nested tables: broken inner floating table

The other version is when a normal table has an inner floating table. This broke badly, the outer table was not imported at all.

Nested tables: better inner floating table

And it's now better. The inner table is still not actually floating, but turns out that was never working for DOCX files, so it's not a regression. Fine to revisit that only later.

Follow table: bad horizontal positioning

So far all the previous tables were aligned to the left. It turns out that the horizontal positioning was bad in every other case for non-first tables, e.g. when you wanted to center them.

Follow table: good horizontal positioning

And it's now fixed.

As a last fix for this post, let's look at traveling with the cursor:

Good cursor traversal

After fixing this, now you can use the up/down arrows to go from the A1 cell to A2 and back. The cursor traversal code wasn't aware that the master/follow table frame was connected.

And that's where we stand. Hope to enable even the DOCX import bit by default soon.

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 23.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (7.6).


Start of multi-page floating tables in Writer

Estimated read time: 5 minutes

Writer now has the early steps to handle tables that are both floating and span over multiple pages.

This work is primarily for Collabora Online, but is useful on the desktop as well.

Motivation

As requested in tdf#61594 10 years ago, the use-case is that you can already have floating tables:

Table in a Writer text frame

And multi-page tables:

Multi-page table

And what we want is a combination of them, like this:

Multi-page floating table

This is a quite complicated feature, since both floating objects and tables are complex, and this combines them to create even more complexity.

However, such constructs are used in existing DOCX files and we're expected to correctly display them.

Results so far

The feature is enabled by default, but the DOCX import only makes use of it if you set the SW_FORCE_FLY_SPLIT=1 environment variable. This allows playing with the feature even if there are lots of known problems still.

On the positive side, core.git sw/qa/core/layout/data/ now has 4 files which are rendered exactly the way Word renders them.

A bit of terminology: once a frame is split, the first element of the chain is called master, the remaining frames are called follows.

Here are some screenshots from the journey so far:

Not splitting Writer text frame

This is a fly frame with enough content that it doesn't fit the body frame. It should split, but fly frames could not be split.

Writer text frame kept inside the body frame

First try, just limit the height of the (master) fly frame, so at least it stays inside the body frame. But now some content is not rendered.

Incorrect split of a text frame

Next try. Now we have 2 flys, but the second has zero height and the content of the second fly leaks into the body of the second page.

Last version with bad anchoring

This one is better, but the position of the follow fly frame is bad, no actual wrapping happens. Also, we assume that there are multiple paragraphs after the table, which will cause problems for floating tables at the end of the document. So I reworked the anchoring code to split the anchor to as many pages as necessary...

Duplicated anchor text

Which sounds good, but now the text around the anchor point is duplicated.

Less duplicated anchor text on the first page

Better, now the anchor text is gone in the master anchor, but still there is a misleading paragraph marker.

Last text frame without a table

And now this looks reasonable. Fine, we have some minimal split flys, let's try it with tables instead of just two paragraphs:

Floating table with duplicated anchor text

With a bit of work, the table's two rows can split, but again the text in the anchor is duplicated.

Bad horizontal position

Next try, now the anchor text is correct, but the horizontal position of the table is still bad, it bleeds out towards the left margin area.

Fixed horizontal position

And with more work, now this looks correct.

Fixed vertical position

Let's add some vertical offset! That should be only applied on the first page, and now the follow fly doesn't have that unwanted offset.

Now we have 2 documents that lay out correctly on 2 pages. Let's try 3 pages:

Wanted 3 pages, have 2 pages

This falls apart, the 2nd and the 3rd row are both on page 2.

Correctly rendered 3 pages

After partitioning the fly frames to 3 categories (master, non-last follows, last follow), more than 2 pages also work.
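The three categories can be illustrated with a short sketch that labels each frame of a split chain by its position. The enum and function names are hypothetical and only show the partitioning, not how the layout treats each category.

```python
from enum import Enum
from typing import List


class FlyRole(Enum):
    MASTER = "master"
    NON_LAST_FOLLOW = "non-last follow"
    LAST_FOLLOW = "last follow"


def classify_chain(frame_count: int) -> List[FlyRole]:
    """Label each fly frame of a chain: the first is the master, the rest are follows."""
    if frame_count < 1:
        raise ValueError("a chain has at least one frame")
    roles = [FlyRole.MASTER]
    for index in range(1, frame_count):
        is_last = index == frame_count - 1
        roles.append(FlyRole.LAST_FOLLOW if is_last else FlyRole.NON_LAST_FOLLOW)
    return roles


# A table spanning 3 pages: one master, one non-last follow, one last follow.
print([role.value for role in classify_chain(3)])
```

With only 2 pages there is no non-last follow at all, which may be why the earlier 2-page documents did not expose the problem.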

Row split is not performed at all

This is a sample where the table has a single cell, so we need to split the (only) row, not just split the table's rows. The first is harder. Currently we don't even try to split it.

Row split is performed, but the 2nd page's object has a bad position

Next try, now we split it, but the position of the follow fly is wrong.

Row split with correct object positioning on all pages

Finally, splitting a single row inside multi-page floating tables also works. That's where we are. Don't try to do anything too custom (like inserting a header or footer); those cases are still known-broken.

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

The design of the layout representation is documented in the SwFormatFlySplit constructor.

Want to start using this?

You can get a snapshot / demo of Collabora Office 23.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (7.6).


Citation handling: plumbing in Writer for e.g. Zotero

Estimated read time: 5 minutes

Writer now has a set of new automation commands and APIs that allow clients to build user interface for citation handling that's more advanced than the default in-Writer bibliography support.

This work is primarily for Collabora Online, see the CODE release notes for one possible way to use this.

Motivation

Citations and bibliography in Writer, using fieldmarks

Users frequently using scientific citations are probably familiar with the limits of Writer's built-in bibliography support, and solutions like Zotero appeared (with a LibreOffice extension included) to improve that situation.

This means that instead of storing all your scientific notes and data locally, you can store them on a Zotero server, then work with that from anywhere, once you provide your credentials.

The trouble comes when you want to combine this with collaborative editing, which is provided by Online, but you can't use the extension made for the desktop.

The above CODE release notes explain how an end user can use this feature; this post is meant to document the new UNO commands and LOK APIs I added that serve as a backend for this. The UNO commands in particular are also useful in other contexts, like in macros or other extensions.

Results so far

Zotero can store citations using 3 markups in documents: fields (DOCX only), bookmarks (DOCX and ODT) and finally reference marks / sections (ODT only). The added plumbing allows several operations for all 3 cases, to work with existing documents using any of these markups.

The citation and the bibliography are handled the same way for fields (Writer's fieldmarks) and bookmarks. The last case uses reference marks for citations, but sections for the bibliography.

The following operations are supported:

  • create the citation / bibliography

  • read the object under the cursor

  • read all objects of a given type in the document

  • update the object under the cursor

  • update all objects of a given type in the document

  • delete all objects of a given type in the document

Reading is only available to LOK clients, you need to call the getCommandValues() API. The rest is normal UNO commands that you can invoke from document macros or extensions as well.

The added plumbing is the following:

Operation | Fieldmark | Bookmark | Refmark | Section
Create | .uno:TextFormField | .uno:InsertBookmark | .uno:InsertField | .uno:InsertSection
Read | getCommandValues(".uno:TextFormField") | getCommandValues(".uno:Bookmark") | getCommandValues(".uno:Field") | None
Read all | getCommandValues(".uno:TextFormFields") | getCommandValues(".uno:Bookmarks") | getCommandValues(".uno:Fields") | getCommandValues(".uno:Sections")
Update | .uno:UpdateTextFormField | .uno:UpdateBookmark | .uno:UpdateField | None
Update all | .uno:TextFormFields | .uno:UpdateBookmarks | .uno:UpdateFields | .uno:UpdateSections
Delete all | .uno:DeleteTextFormFields | .uno:DeleteBookmarks | .uno:DeleteFields | .uno:DeleteSections
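A client that supports all markups may want to pick the right command by operation and markup type. The sketch below simply encodes the table above as a lookup. The command names are copied from the table, while the `PLUMBING` dictionary and the `command_for()` helper are hypothetical names for illustration. How the command is then dispatched (and with which parameters) is not covered here.

```python
from typing import Dict, Optional

# Rows and columns copied from the table above; None marks a "None" cell.
# The "read" and "read all" entries are arguments for the LOK getCommandValues()
# API, the other entries are UNO commands.
PLUMBING: Dict[str, Dict[str, Optional[str]]] = {
    "create": {"fieldmark": ".uno:TextFormField", "bookmark": ".uno:InsertBookmark",
               "refmark": ".uno:InsertField", "section": ".uno:InsertSection"},
    "read": {"fieldmark": ".uno:TextFormField", "bookmark": ".uno:Bookmark",
             "refmark": ".uno:Field", "section": None},
    "read all": {"fieldmark": ".uno:TextFormFields", "bookmark": ".uno:Bookmarks",
                 "refmark": ".uno:Fields", "section": ".uno:Sections"},
    "update": {"fieldmark": ".uno:UpdateTextFormField", "bookmark": ".uno:UpdateBookmark",
               "refmark": ".uno:UpdateField", "section": None},
    "update all": {"fieldmark": ".uno:TextFormFields", "bookmark": ".uno:UpdateBookmarks",
                   "refmark": ".uno:UpdateFields", "section": ".uno:UpdateSections"},
    "delete all": {"fieldmark": ".uno:DeleteTextFormFields", "bookmark": ".uno:DeleteBookmarks",
                   "refmark": ".uno:DeleteFields", "section": ".uno:DeleteSections"},
}


def command_for(operation: str, markup: str) -> str:
    """Return the command name for an operation on a markup type."""
    command = PLUMBING[operation][markup]
    if command is None:
        raise LookupError(f"no '{operation}' command for {markup}s")
    return command


print(command_for("update", "bookmark"))
print(command_for("delete all", "section"))
```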

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 22.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (7.6).


Improved number portion formatting in Writer

Estimated read time: 3 minutes

Number portions generated when using lists/numberings/bullets in Writer now can have formatting which is preserved in ODT files as well.

First, thanks Docmosis for funding this work by Collabora.

Motivation

Word and DOCX files support explicit character properties for the paragraph marker, and these are also used for the formatting of a number portion if the paragraph has one. This was already loaded from / saved to DOCX, but it was lost when saving to ODT.

Results so far

First, we got a bug document, where the reference rendering and our rendering differed:

Reference (on the left) and our old (on the right) rendering, due to bookmarks

In this case, what happened was that part of the heading text was covered by a bookmark, so we first created multiple character ranges (outside the bookmark, inside the bookmark), then as an optimization we even unified them to be a single formatted character range, covering the entire paragraph. This was a document model that is different from the bookmark-free version, where the character formatting was set on the paragraph itself.

This was fixed at render time and at DOCX export time to consider both full-paragraph character ranges and in-paragraph character properties. For a while, this looked like the entire story, since this now looks good in Writer:

Our new rendering, handling bookmarks

A bit later another, related bug was discovered. Given a reference document:

Reference rendering of a second document

Just opening this DOCX file in Writer, it looked like this:

Old rendering in Writer

Note how the first number portion turned into bold! This was expected after the above layout change to consider full-paragraph formatted character ranges, but it also meant that Word can have one set of character formatting for the entire character range of a paragraph, and another for the paragraph marker.

To make the problem worse, this second document was showing that even the ODT export/import feature still had problems:

Old rendering in Writer after ODT save + load

The fix to solve all of the above was to undo the previous render / DOCX export change, then teach the ODT export to explicitly save the paragraph marker formatting (as an empty span at the end of the text node) to ODT, and also to load it back.

This means that now Writer can render the second document correctly, without breaking the first document:

New rendering in Writer

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 22.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (7.6).


Improved watermark in the PDF export

Estimated read time: 3 minutes

The PDF export now supports various additional properties for the optional PDF watermark.

First, thanks Docmosis for funding this work by Collabora.

Motivation

Rendering of a PDF watermark with custom rotation and color

When you hear the word "watermark", you probably have something like the above picture in mind.

Instead, what the PDF export had is more like a proof of concept:

Rendering of a PDF watermark with default settings

The request was to add new options to control the font size, font name, rotation angle and color of the watermark, so in case an organization already has a given style of watermarks they prefer, our PDF export can be adapted accordingly.

Results so far

First, now you can specify a custom color, e.g. gray (#7f7f7f), using:

soffice --convert-to pdf:writer_pdf_Export:'{"Watermark":{"type":"string","value":"draft"}, "WatermarkColor":{"type":"long","value":"8355711"}}' test.odt

Rendering of a PDF watermark with custom color

Then you can also customize the font size, in case the automatic size would not fit your needs, using:

soffice --convert-to pdf:writer_pdf_Export:'{"Watermark":{"type":"string","value":"draft"}, "WatermarkFontHeight":{"type":"long","value":"100"}}' test.odt

Rendering of a PDF watermark with custom font size

Or perhaps you want a serif font, not a sans one:

soffice --convert-to pdf:writer_pdf_Export:'{"Watermark":{"type":"string","value":"draft"}, "WatermarkFontName":{"type":"string","value":"Times"}}' test.odt

Rendering of a PDF watermark with custom font name

Finally you can have a custom rotate angle:

soffice --convert-to pdf:writer_pdf_Export:'{"Watermark":{"type":"string","value":"draft"}, "WatermarkRotateAngle":{"type":"long","value":"450"}}' test.odt

Rendering of a PDF watermark with custom rotation

Using these building blocks, you can also build combinations, the first screenshot above was created using:

soffice --convert-to pdf:writer_pdf_Export:'{"Watermark":{"type":"string","value":"draft"}, "WatermarkRotateAngle":{"type":"long","value":"450"}, "WatermarkColor":{"type":"long","value":"8355711"}}' test.odt

i.e. the configuration JSON is:

{
    "Watermark": {
        "type": "string",
        "value": "draft"
    },
    "WatermarkRotateAngle": {
        "type": "long",
        "value": "450"
    },
    "WatermarkColor": {
        "type": "long",
        "value": "8355711"
    }
}
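The numeric values in these options can be generated rather than computed by hand. The sketch below assumes that the color is a packed RRGGBB integer (which matches #7f7f7f being 8355711 above) and that the rotate angle is given in tenths of a degree (an assumption, under which 450 would mean 45 degrees). The helper names are made up for illustration; only the option names and the soffice arguments come from the examples above.

```python
import json
from typing import Dict, List, Optional


def hex_color_to_long(hex_color: str) -> int:
    """Convert "#rrggbb" to the packed integer used for WatermarkColor."""
    return int(hex_color.lstrip("#"), 16)


def watermark_options(text: str, color: Optional[str] = None,
                      rotate_degrees: Optional[float] = None) -> Dict[str, Dict[str, str]]:
    """Build the filter options in the same shape as the JSON above."""
    options = {"Watermark": {"type": "string", "value": text}}
    if rotate_degrees is not None:
        # Assumption: the angle is expressed in tenths of a degree.
        tenths = str(round(rotate_degrees * 10))
        options["WatermarkRotateAngle"] = {"type": "long", "value": tenths}
    if color is not None:
        options["WatermarkColor"] = {"type": "long", "value": str(hex_color_to_long(color))}
    return options


def convert_command(input_path: str, options: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the argument list for the soffice invocation shown above."""
    return ["soffice", "--convert-to", "pdf:writer_pdf_Export:" + json.dumps(options), input_path]


options = watermark_options("draft", color="#7f7f7f", rotate_degrees=45)
print(json.dumps(options, indent=4))
print(convert_command("test.odt", options))
```

Passing the result as an argument list (e.g. to `subprocess.run()`) avoids having to quote the JSON for the shell.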

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 22.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice, the code is merged so we expect all of this work will be available in TDF's next release too (7.5).


Content controls in Writer: titles and tags

Estimated read time: 3 minutes

Writer now supports titles and tags for content controls, which helps provide context for the filled in text even if the placeholder text is replaced already.

This work is primarily for Collabora Online, see the previous post for background.

Motivation

Rendering of a title for a content control

Once several content controls are added to a document, it's easy to forget the exact purpose of each content control. Think of a press release, for example: those regularly start with a location and a date, but once this information is provided, one no longer knows which content control was for which content.

Titles solve this problem for the user: similar to Writer's header/footer buttons, this button appears when you enter the content control and reminds you what content is expected there, even if the placeholder is already replaced.

Tags serve a similar purpose, but they are unique, machine-readable keys or identifiers, so once the form is filled in, an external consumer can easily extract the information from the document, given a specific tag.

Results so far

Titles (also known as aliases) and tags are now not only preserved, but also we have a UI to create, show, edit and delete them. This is available in the desktop rendering and also in the LOK API.

Somewhat related, in case a content control breaks into multiple lines or has formatting to break into multiple text portions, we now only emit one PDF widget for it, taking the description of the widget from the content control's title.

The last related improvement is that now we handle data binding for date content controls, which means that you can specify a timestamp, a language and a date format, and we'll format that timestamp and update the content control's string value at import time from DOCX.

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 22.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice; the code is merged, so we expect all of this work will be available in TDF's next release too (7.5).


Content controls in Writer: PDF export and combo box type

Estimated read time: 3 minutes

Writer now supports exporting content controls to PDF and a 7th content control type: it is possible to differentiate between drop-downs and combo boxes.

This work is primarily for Collabora Online, see the previous post for background.

Motivation

PDF export of Writer content controls

Writer users can create fillable forms using content controls, but the PDF export only contained the plain text representation of them. PDF can also have fillable widgets for form filling, so it's logical to map content controls to PDF widgets.

A perfect mapping is not possible, since a PDF widget is always a single rectangle and a Writer content control is a list of rectangles (text portions), but this doesn't cause a problem in most cases. The size of the PDF rectangle is determined based on the placeholder's size from Writer.

Benefits include not having to insert a control shape, manually positioned to look like it's in line with the surrounding text. Another benefit is that this way the widget's style (font name, size, etc.) can be specified using Writer styles, not with shape properties. It's also interesting that Word itself doesn't seem to support content controls in its PDF export at the time of writing.

Results so far

PDF export now automatically turns Writer content controls into fillable widgets for the rich text, plain text, checkbox, drop-down, combo box and date types.

Combo box itself is a new type: now combo box content can be either free-form or one of its list items, while drop-down can only be one of its list items.
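The difference between the two types can be sketched as a small validation rule. This is only an illustration of the semantics described above: the `ListControl` class and its field names are hypothetical and do not mirror Writer's document model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ListControl:
    """Hypothetical model of a list-based content control."""
    items: List[str]
    allow_free_form: bool  # True for a combo box, False for a drop-down

    def accepts(self, value: str) -> bool:
        # A drop-down only accepts one of its list items;
        # a combo box additionally accepts free-form content.
        return self.allow_free_form or value in self.items


drop_down = ListControl(items=["red", "green"], allow_free_form=False)
combo_box = ListControl(items=["red", "green"], allow_free_form=True)

print(drop_down.accepts("blue"), combo_box.accepts("blue"))
```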

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 22.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice; the code is merged, so we expect all of this work will be available in TDF's next release too (7.5).


Cropped video for media shapes in Impress

Estimated read time: 3 minutes

Impress now supports cropped videos in slide edit mode and during slideshow for documents imported from PowerPoint.

First, thanks to our partner SUSE for working with Collabora to make this possible.

Motivation

PowerPoint-style cropped video in Impress

PowerPoint handles videos by taking a preview bitmap from the video, and then it allows users to apply various effects on that bitmap, like cropping. The complex aspect of this is that such filters are also respected while playing the video.

Impress didn't store such properties on the media shape, which led to a distorted aspect ratio when playing some cropped videos from PPTX files. Before this work, the preview in Impress looked like this:

Video with lost cropping in Impress

Results so far

The first problem was that the Impress preview was picked from the 3rd second of the video (presumably to avoid a black preview in many videos that start with a short black fade-in), while PowerPoint can store an explicit preview from the video (seems to be the first frame), so no matter what effects you apply, the previews were just different as the source bitmap was different. This could be fixed by looking for an explicitly provided bitmap for the video first, and only then asking the various avmedia/ backends to produce a preview.
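The lookup order described above can be sketched as follows. This is a simplified illustration, not the actual avmedia/ code: `pick_preview`, the backend callables and the string stand-ins for bitmaps are all hypothetical.

```python
def pick_preview(explicit_bitmap, backends):
    """Prefer the bitmap stored in the document, then fall back to backends.

    explicit_bitmap: the preview provided by the document, or None.
    backends: callables that try to grab a frame and return it, or None.
    """
    if explicit_bitmap is not None:
        return explicit_bitmap
    for grab_frame in backends:
        frame = grab_frame()
        if frame is not None:
            return frame
    return None


# Strings stand in for real bitmaps in this sketch.
def failing_backend():
    return None


def working_backend():
    return "frame-from-backend"


print(pick_preview("frame-from-pptx", [working_backend]))
print(pick_preview(None, [failing_backend, working_backend]))
```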

Once the preview's initial bitmap was OK, it was necessary to take cropping into account. This was first done for the preview bitmap, and then also for the gstreamer backend (the relevant one for Linux, as a start) of avmedia/, which is responsible for the actual video playback. The gstreamer bits were done by first creating a videocrop element and then connecting that to the existing playbin.

With these sorted out, we get rendering which matches the reference:

Cropped video in PowerPoint

The last step was to load/save the explicit preview and the crop from/to ODF as well, not only PPTX. We use a markup like this to store the information:

<style:style style:name="gr1">
  <style:graphic-properties fo:clip="rect(0cm, 1.356cm, 0cm, 1.356cm)"/>
</style:style>

And now that the gr1 automatic style is defined, we can refer to it from a media shape:

<draw:frame draw:name="test" draw:style-name="gr1">
  <draw:plugin xlink:href="..." xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad" draw:mime-type="application/vnd.sun.star.media">
    ...
  </draw:plugin>
  <draw:image xlink:href="Pictures/MediaPreview1.png"/>
</draw:frame>

The nice property of this markup is that automatic styles are already used for other shapes and image previews are also used for e.g. table shapes, so this just uses existing markup in a new context, which the ODF spec already allows.
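For a consumer of such a file, the crop can be read back from the fo:clip value. The sketch below assumes the four lengths are in the order top, right, bottom, left (as in the CSS/XSL clip property that fo:clip is modeled on) and only handles cm units; `parse_clip_cm` and `visible_size_cm` are hypothetical helper names, and the 10cm x 5cm size is an invented example.

```python
import re


def parse_clip_cm(value):
    """Parse 'rect(top, right, bottom, left)' with cm lengths into a dict."""
    match = re.fullmatch(r"rect\((.*)\)", value.strip())
    if match is None:
        raise ValueError(f"not a rect() clip value: {value!r}")
    parts = [p for p in re.split(r"[,\s]+", match.group(1).strip()) if p]
    if len(parts) != 4:
        raise ValueError(f"expected 4 lengths, got {len(parts)}")
    lengths = []
    for part in parts:
        if not part.endswith("cm"):
            raise ValueError(f"only cm lengths are handled here: {part!r}")
        lengths.append(float(part[:-2]))
    # Assumed order: top, right, bottom, left.
    top, right, bottom, left = lengths
    return {"top": top, "right": right, "bottom": bottom, "left": left}


def visible_size_cm(width, height, clip):
    """Size that remains visible after cropping an uncropped width x height."""
    return (width - clip["left"] - clip["right"],
            height - clip["top"] - clip["bottom"])


clip = parse_clip_cm("rect(0cm, 1.356cm, 0cm, 1.356cm)")
print(clip)
print(visible_size_cm(10.0, 5.0, clip))
```

With the value from the markup above, only the left and right edges are cropped, each by 1.356cm.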

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

A user interface to create such a crop, support for other video effects (e.g. black-and-white) and support for other backends (Windows, macOS) are possible, but remain future work for now.

Want to start using this?

You can get a snapshot / demo of Collabora Office and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice; the code is merged, so we expect all of this work will be available in TDF's next release too (7.5).

© Miklos Vajna. Built using Pelican. Theme by Giulio Fidente on github.