
Citation handling: plumbing in Writer for e.g. Zotero

Estimated read time: 5 minutes

Writer now has a set of new automation commands and APIs that allow clients to build a user interface for citation handling that is more advanced than Writer's built-in bibliography support.

This work is primarily for Collabora Online; see the CODE release notes for one possible way to use it.

Motivation

Citations and bibliography in Writer, using fieldmarks

Users who frequently work with scientific citations are probably familiar with the limits of Writer's built-in bibliography support; solutions like Zotero (which ships a LibreOffice extension) appeared to improve that situation.

With Zotero, instead of storing all your scientific notes and data locally, you can store them on a Zotero server and work with them from anywhere, once you provide your credentials.

The trouble comes when you want to combine this with collaborative editing, as provided by Online: there you can't use the extension made for the desktop.

The CODE release notes mentioned above explain how an end user can use this feature; this post documents the new UNO commands and LOK APIs I added to serve as its backend. The UNO commands in particular are also useful in other contexts, such as macros or other extensions.

Results so far

Zotero can store citations in documents using 3 kinds of markup: fields (DOCX only), bookmarks (DOCX and ODT) and reference marks / sections (ODT only). The added plumbing supports several operations for all 3 cases, so existing documents using any of these markups can be handled.

The citation and the bibliography are handled the same way for fields (Writer's fieldmarks) and bookmarks. The last case uses reference marks for citations, but sections for the bibliography.

The following operations are supported:

  • create the citation / bibliography

  • read the object under the cursor

  • read all objects of a given type in the document

  • update the object under the cursor

  • update all objects of a given type in the document

  • delete all objects of a given type in the document

Reading is only available to LOK clients, via the getCommandValues() API. The rest are normal UNO commands that you can also invoke from document macros or extensions.

The added plumbing is the following:

| Operation | Fieldmark | Bookmark | Refmark | Section |
|---|---|---|---|---|
| Create | .uno:TextFormField | .uno:InsertBookmark | .uno:InsertField | .uno:InsertSection |
| Read | getCommandValues(".uno:TextFormField") | getCommandValues(".uno:Bookmark") | getCommandValues(".uno:Field") | None |
| Read all | getCommandValues(".uno:TextFormFields") | getCommandValues(".uno:Bookmarks") | getCommandValues(".uno:Fields") | getCommandValues(".uno:Sections") |
| Update | .uno:UpdateTextFormField | .uno:UpdateBookmark | .uno:UpdateField | None |
| Update all | .uno:TextFormFields | .uno:UpdateBookmarks | .uno:UpdateFields | .uno:UpdateSections |
| Delete all | .uno:DeleteTextFormFields | .uno:DeleteBookmarks | .uno:DeleteFields | .uno:DeleteSections |
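
To make the table easier to use from code, here is a minimal Python sketch that encodes it as a lookup: for an operation and a markup, it tells whether the call goes through the LOK getCommandValues() API or is dispatched as a UNO command, and which command name to use. The command names come from the table; the PLUMBING dict and the plumbing_for() helper are hypothetical, and actually calling getCommandValues() or dispatching the command is left to your LOK client or macro.

```python
# Hypothetical lookup built from the table above: operation -> markup -> command.
# None marks combinations that have no plumbing.
PLUMBING = {
    "create": {"fieldmark": ".uno:TextFormField", "bookmark": ".uno:InsertBookmark",
               "refmark": ".uno:InsertField", "section": ".uno:InsertSection"},
    "read": {"fieldmark": ".uno:TextFormField", "bookmark": ".uno:Bookmark",
             "refmark": ".uno:Field", "section": None},
    "read_all": {"fieldmark": ".uno:TextFormFields", "bookmark": ".uno:Bookmarks",
                 "refmark": ".uno:Fields", "section": ".uno:Sections"},
    "update": {"fieldmark": ".uno:UpdateTextFormField", "bookmark": ".uno:UpdateBookmark",
               "refmark": ".uno:UpdateField", "section": None},
    "update_all": {"fieldmark": ".uno:TextFormFields", "bookmark": ".uno:UpdateBookmarks",
                   "refmark": ".uno:UpdateFields", "section": ".uno:UpdateSections"},
    "delete_all": {"fieldmark": ".uno:DeleteTextFormFields", "bookmark": ".uno:DeleteBookmarks",
                   "refmark": ".uno:DeleteFields", "section": ".uno:DeleteSections"},
}

# Reading goes through LOK's getCommandValues(); everything else is a UNO command.
READ_OPERATIONS = {"read", "read_all"}


def plumbing_for(operation: str, markup: str) -> tuple:
    """Return ("getCommandValues" or "dispatch", command name)."""
    command = PLUMBING[operation][markup]
    if command is None:
        raise ValueError(f"no {operation!r} plumbing for {markup!r}")
    channel = "getCommandValues" if operation in READ_OPERATIONS else "dispatch"
    return channel, command


print(plumbing_for("read_all", "bookmark"))   # ('getCommandValues', '.uno:Bookmarks')
print(plumbing_for("update_all", "section"))  # ('dispatch', '.uno:UpdateSections')
```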

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 22.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice; the code is merged, so we expect all of this work to be available in TDF's next release too (7.6).


Improved number portion formatting in Writer

Estimated read time: 3 minutes

Number portions generated for lists, numbering and bullets in Writer can now have formatting that is also preserved in ODT files.

First, thanks to Docmosis for funding this work by Collabora.

Motivation

Word and its DOCX format support explicit character properties for the paragraph marker, and these are also used to format the paragraph's number portion, if it has one. This formatting was already loaded from and saved to DOCX, but it was lost when saving to ODT.

Results so far

First, we got a bug document where the reference rendering and our rendering differed:

Reference rendering (left) and our old rendering (right), which differed due to bookmarks

What happened here was that part of the heading text was covered by a bookmark, so we first created multiple character ranges (outside and inside the bookmark), and then, as an optimization, unified them into a single formatted character range covering the entire paragraph. This document model differs from the bookmark-free version, where the character formatting is set on the paragraph itself.

This was fixed at render time and at DOCX export time by considering both full-paragraph character ranges and character properties set on the paragraph itself. For a while, this looked like the whole story, since the document now looked good in Writer:

Our new rendering, handling bookmarks

A bit later, another related bug was discovered. Given this reference document:

Reference rendering of a second document

When simply opened in Writer, this DOCX file looked like this:

Old rendering in Writer

Note how the first number portion turned bold! This was a consequence of the above layout change, which considers full-paragraph formatted character ranges, and it also showed that Word can have one set of character formatting for the entire text of a paragraph and another one for the paragraph marker.

To make things worse, this second document also showed that the ODT import/export still had problems:

Old rendering in Writer after ODT save + load

The fix for all of the above was to revert the previous rendering / DOCX export change, then teach the ODT export to explicitly save the paragraph marker formatting (as an empty span at the end of the text node), and the ODT import to load it back.
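
As an illustration of that ODT markup, here is a minimal Python sketch that finds the paragraph marker formatting in a text:p element, assuming it is stored as the automatic style of a trailing, empty text:span. The helper name and the sample style names (P1, T1) are hypothetical; this is not how Writer's ODT import is implemented.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = f"{{{TEXT_NS}}}span"
STYLE_NAME = f"{{{TEXT_NS}}}style-name"


def paragraph_marker_style(paragraph: ET.Element):
    """Return the style name of a trailing empty text:span, or None."""
    children = list(paragraph)
    if not children:
        return None
    last = children[-1]
    # The marker span must be the very last thing in the paragraph: no text
    # inside it, no child elements, and no text after it.
    if last.tag == SPAN and not last.text and len(last) == 0 and not last.tail:
        return last.get(STYLE_NAME)
    return None


sample = (f'<text:p xmlns:text="{TEXT_NS}" text:style-name="P1">'
          'Heading<text:span text:style-name="T1"/></text:p>')
print(paragraph_marker_style(ET.fromstring(sample)))  # T1
```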

As a result, Writer now renders the second document correctly, without breaking the first one:

New rendering in Writer

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 22.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice; the code is merged, so we expect all of this work to be available in TDF's next release too (7.6).


Improved watermark in the PDF export

Estimated read time: 3 minutes

The PDF export now supports various additional properties for the optional PDF watermark.

First, thanks to Docmosis for funding this work by Collabora.

Motivation

Rendering of a PDF watermark with custom rotation and color

When you hear the word "watermark", you probably have something like the above picture in mind.

What the PDF export offered so far was more like a proof of concept:

Rendering of a PDF watermark with default settings

The request was to add new options to control the font size, font name, rotation angle and color of the watermark, so that if an organization already prefers a given watermark style, our PDF export can be adapted accordingly.

Results so far

First, now you can specify a custom color, e.g. gray (#7f7f7f), using:

soffice --convert-to pdf:writer_pdf_Export:'{"Watermark":{"type":"string","value":"draft"}, "WatermarkColor":{"type":"long","value":"8355711"}}' test.odt
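
The 8355711 value above is simply gray #7f7f7f read as one packed 0xRRGGBB integer: 127 * 65536 + 127 * 256 + 127. A small Python helper for this conversion, assuming WatermarkColor uses this RGB packing (which matches the example value); the helper name is hypothetical:

```python
def hex_to_long(color: str) -> int:
    """Convert '#rrggbb' to the packed 0xRRGGBB integer, e.g. for WatermarkColor."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected #rrggbb, got {color!r}")
    return int(digits, 16)


print(hex_to_long("#7f7f7f"))  # 8355711
```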

Rendering of a PDF watermark with custom color

Then you can also customize the font size, in case the automatic size would not fit your needs, using:

soffice --convert-to pdf:writer_pdf_Export:'{"Watermark":{"type":"string","value":"draft"}, "WatermarkFontHeight":{"type":"long","value":"100"}}' test.odt

Rendering of a PDF watermark with custom font size

Or perhaps you want a serif font, not a sans one:

soffice --convert-to pdf:writer_pdf_Export:'{"Watermark":{"type":"string","value":"draft"}, "WatermarkFontName":{"type":"string","value":"Times"}}' test.odt

Rendering of a PDF watermark with custom font name

Finally, you can set a custom rotation angle:

soffice --convert-to pdf:writer_pdf_Export:'{"Watermark":{"type":"string","value":"draft"}, "WatermarkRotateAngle":{"type":"long","value":"450"}}' test.odt

Rendering of a PDF watermark with custom rotation

These building blocks can also be combined; the first screenshot above was created using:

soffice --convert-to pdf:writer_pdf_Export:'{"Watermark":{"type":"string","value":"draft"}, "WatermarkRotateAngle":{"type":"long","value":"450"}, "WatermarkColor":{"type":"long","value":"8355711"}}' test.odt

i.e. the configuration JSON is:

{
    "Watermark": {
        "type": "string",
        "value": "draft"
    },
    "WatermarkRotateAngle": {
        "type": "long",
        "value": "450"
    },
    "WatermarkColor": {
        "type": "long",
        "value": "8355711"
    }
}
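
When scripting conversions, writing this JSON by hand inside shell quotes gets error-prone. A minimal Python sketch that assembles the filter options with json.dumps() and returns an argument list for soffice; the option names and the string-typed values mirror the examples above, while the function names are hypothetical:

```python
import json
import shlex


def watermark_filter_options(text, color=None, font_height=None,
                             font_name=None, rotate_angle=None):
    """Build the JSON filter options for writer_pdf_Export's watermark."""
    options = {"Watermark": {"type": "string", "value": text}}
    # Numeric values are passed as strings, like in the examples above.
    if color is not None:
        options["WatermarkColor"] = {"type": "long", "value": str(color)}
    if font_height is not None:
        options["WatermarkFontHeight"] = {"type": "long", "value": str(font_height)}
    if font_name is not None:
        options["WatermarkFontName"] = {"type": "string", "value": font_name}
    if rotate_angle is not None:
        options["WatermarkRotateAngle"] = {"type": "long", "value": str(rotate_angle)}
    return json.dumps(options)


def convert_to_pdf_args(path, **watermark):
    """Return an argv list for soffice; a list needs no shell quoting."""
    filter_spec = "pdf:writer_pdf_Export:" + watermark_filter_options(**watermark)
    return ["soffice", "--convert-to", filter_spec, path]


argv = convert_to_pdf_args("test.odt", text="draft", rotate_angle=450, color=8355711)
print(shlex.join(argv))  # equivalent shell command, for copy-pasting
```

With soffice installed, passing argv to subprocess.run(argv, check=True) would run the conversion without any shell quoting; shlex.join() is only used to print an equivalent command line.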

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 22.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice; the code is merged, so we expect all of this work to be available in TDF's next release too (7.5).


Content controls in Writer: titles and tags

Estimated read time: 3 minutes

Writer now supports titles and tags for content controls, which helps provide context for the filled-in text even after the placeholder text has been replaced.

This work is primarily for Collabora Online; see the previous post for background.

Motivation

Rendering of a title for a content control

Once several content controls are added to a document, it's easy to forget the exact purpose of each one. Think of a press release, for example: these usually start with a location and a date, but once this information is filled in, one no longer knows which content control was meant for which content.

Titles solve this problem for the user: similar to Writer's header/footer buttons, a title button appears when you enter the content control and reminds you what content is expected there, even if the placeholder has already been replaced.

Tags serve a similar purpose, but they are unique, machine-readable keys or identifiers, so once the form is filled in, an external consumer can easily extract the information from the document, given a specific tag.
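
To show what extracting the information by tag can look like for a DOCX file, here is a minimal Python sketch using only the standard library. It assumes Word's OOXML markup, where a content control is a w:sdt element, its tag is w:sdtPr/w:tag and its title is w:sdtPr/w:alias; the function names and the sample are hypothetical, and repeated tags are not handled specially (a later control with the same tag overwrites an earlier one).

```python
import xml.etree.ElementTree as ET
import zipfile

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def values_by_tag(document_xml: bytes) -> dict:
    """Map each content control's tag to the plain text of its content."""
    root = ET.fromstring(document_xml)
    result = {}
    for sdt in root.iter(f"{{{W}}}sdt"):
        tag = sdt.find("w:sdtPr/w:tag", NS)
        content = sdt.find("w:sdtContent", NS)
        if tag is None or content is None:
            continue
        # Concatenate all text runs (w:t) inside the content control.
        text = "".join(t.text or "" for t in content.iter(f"{{{W}}}t"))
        result[tag.get(f"{{{W}}}val")] = text
    return result


def values_from_docx(path: str) -> dict:
    """Read word/document.xml from a .docx (a ZIP archive) and extract tags."""
    with zipfile.ZipFile(path) as docx:
        return values_by_tag(docx.read("word/document.xml"))


sample = f'''<w:document xmlns:w="{W}"><w:body><w:p>
<w:sdt><w:sdtPr><w:alias w:val="Location"/><w:tag w:val="location"/></w:sdtPr>
<w:sdtContent><w:r><w:t>Budapest</w:t></w:r></w:sdtContent></w:sdt>
</w:p></w:body></w:document>'''.encode()
print(values_by_tag(sample))  # {'location': 'Budapest'}
```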

Results so far

Titles (also known as aliases) and tags are now not only preserved, but there is also a UI to create, show, edit and delete them. This is available in desktop rendering and also via the LOK API.

Somewhat related: when a content control spans multiple lines, or its formatting splits it into multiple text portions, we now emit only one PDF widget for it, and take the widget's description from the content control's title.

The last related improvement is data binding for date content controls: when a DOCX file specifies a timestamp, a language and a date format, we now format that timestamp and update the content control's string value at import time.
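
For illustration, here is a minimal Python sketch of the formatting step, assuming an ISO 8601 UTC timestamp and a Word-style date format. It only supports a few numeric tokens (yyyy, MM, M, dd, d) and ignores the language, which would matter for month and day names; the helper is hypothetical, not Writer's implementation.

```python
import re
from datetime import datetime

# Hypothetical subset of Word-style date tokens. Longer tokens come first in
# the alternation, so "MM" is matched before "M" and "dd" before "d".
TOKENS = re.compile(r"yyyy|MM|M|dd|d")


def format_bound_date(timestamp: str, date_format: str) -> str:
    """Format a timestamp like '2022-11-25T00:00:00Z' using date_format."""
    moment = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    values = {
        "yyyy": f"{moment.year:04d}",
        "MM": f"{moment.month:02d}",
        "M": str(moment.month),
        "dd": f"{moment.day:02d}",
        "d": str(moment.day),
    }
    return TOKENS.sub(lambda match: values[match.group(0)], date_format)


print(format_bound_date("2022-11-25T00:00:00Z", "M/d/yyyy"))    # 11/25/2022
print(format_bound_date("2022-11-25T00:00:00Z", "dd.MM.yyyy"))  # 25.11.2022
```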

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 22.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice; the code is merged, so we expect all of this work to be available in TDF's next release too (7.5).


Content controls in Writer: PDF export and combo box type

Estimated read time: 3 minutes

Writer now supports exporting content controls to PDF and a 7th content control type: it is possible to differentiate between drop-downs and combo boxes.

This work is primarily for Collabora Online; see the previous post for background.

Motivation

Writer content controls exported to PDF

Writer users can create fillable forms using content controls, but the PDF export only contained the plain text representation of them. PDF can also have fillable widgets for form filling, so it's logical to map content controls to PDF widgets.

A perfect mapping is not possible, since a PDF widget is always a single rectangle while a Writer content control is a list of rectangles (text portions), but this doesn't cause a problem in most cases. The size of the PDF rectangle is determined by the placeholder's size in Writer.

Benefits include not having to insert a control shape and manually position it so that it looks in line with the surrounding text. Another benefit is that the widget's style (font name, size, etc.) can be specified using Writer styles rather than shape properties. It's also interesting that Word itself doesn't seem to support content controls in its PDF export at the time of writing.

Results so far

PDF export now automatically turns Writer content controls into fillable widgets for the rich text, plain text, checkbox, drop-down, combo box and date types.

Combo box is a new type: the content of a combo box can be either free-form text or one of its list items, while a drop-down's content can only be one of its list items.
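
In PDF terms, this distinction maps naturally onto the flags of a choice field (field type /Ch): the PDF 1.7 specification defines a Combo flag (bit 18) for a drop-down list and an Edit flag (bit 19) that additionally allows free-form text. A minimal Python sketch of that mapping; whether Writer's PDF export sets exactly these flags is an assumption, and the function name is hypothetical.

```python
# Choice field flags (/Ff) from the PDF 1.7 specification; bit positions are 1-based.
COMBO = 1 << 17  # bit 18: drop-down list instead of a scrollable list box
EDIT = 1 << 18   # bit 19: the drop-down also accepts free-form text


def choice_field_flags(allows_free_text: bool) -> int:
    """Hypothetical /Ff value for a drop-down or combo box content control."""
    flags = COMBO
    if allows_free_text:
        flags |= EDIT
    return flags


print(choice_field_flags(False))  # 131072 (drop-down: list items only)
print(choice_field_flags(True))   # 393216 (combo box: list item or free text)
```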

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 22.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice; the code is merged, so we expect all of this work to be available in TDF's next release too (7.5).


Cropped video for media shapes in Impress

Estimated read time: 3 minutes

Impress now supports cropped videos in slide edit mode and during slideshow for documents imported from PowerPoint.

First, thanks to our partner SUSE for working with Collabora to make this possible.

Motivation

PowerPoint-style cropped video in Impress

PowerPoint handles videos by taking a preview bitmap from the video, and then it allows users to apply various effects to that bitmap, like cropping. The complex part is that these effects are also respected while playing the video.

Impress didn't store such properties on the media shape, which led to a distorted aspect ratio when playing some cropped videos from PPTX files. Before this work, the preview in Impress looked like this:

Video with lost cropping in Impress

Results so far

The first problem was that Impress picked the preview from the 3rd second of the video (presumably to avoid a black preview for the many videos that start with a short black fade-in), while PowerPoint can store an explicit preview from the video (seemingly the first frame). So no matter what effects you applied, the previews differed, because their source bitmaps were different. This was fixed by first looking for an explicitly provided bitmap for the video, and only then asking the various avmedia/ backends to produce a preview.

Once the initial preview bitmap was right, cropping had to be taken into account. This was done first for the preview bitmap, and then also for avmedia/'s gstreamer backend (the one relevant for Linux, as a start), which is responsible for the actual video playback. The gstreamer part works by creating a videocrop element and connecting it to the existing playbin.
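
The scaling step behind such a crop can be sketched as follows, assuming the crop amounts are known relative to the media's original (uncropped) size and that the decoded frame size in pixels is known. Only the left/right/top/bottom property names come from GStreamer's videocrop element; the Crop type, the function and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Crop:
    """Amount cropped from each edge, in the same unit as the original size."""
    top: float
    right: float
    bottom: float
    left: float


def videocrop_properties(crop, original_width, original_height,
                         frame_width, frame_height):
    """Scale a crop to pixel values for videocrop's left/right/top/bottom."""
    scale_x = frame_width / original_width
    scale_y = frame_height / original_height
    return {
        "left": round(crop.left * scale_x),
        "right": round(crop.right * scale_x),
        "top": round(crop.top * scale_y),
        "bottom": round(crop.bottom * scale_y),
    }


# Hypothetical media: 20 x 10 (e.g. cm) original size, decoded as 1920x1080.
props = videocrop_properties(Crop(top=0, right=2, bottom=0, left=2), 20, 10, 1920, 1080)
# Print in gst-launch-1.0 element notation.
print("videocrop " + " ".join(f"{name}={value}" for name, value in props.items()))
# videocrop left=192 right=192 top=0 bottom=0
```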

With these sorted out, we get rendering which matches the reference:

Cropped video in PowerPoint

The last step was to load/save the explicit preview and the crop from/to ODF as well, not only PPTX. We use a markup like this to store the information:

<style:style style:name="gr1">
  <style:graphic-properties fo:clip="rect(0cm, 1.356cm, 0cm, 1.356cm)"/>
</style:style>

And now that the gr1 automatic style is defined, we can refer to it from a media shape:

<draw:frame draw:name="test" draw:style-name="gr1">
  <draw:plugin xlink:href="..." xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad" draw:mime-type="application/vnd.sun.star.media">
    ...
  </draw:plugin>
  <draw:image xlink:href="Pictures/MediaPreview1.png"/>
</draw:frame>

The nice property of this markup is that automatic styles are already used for other shapes, and image previews are already used for e.g. table shapes, so this just reuses existing markup in a new context, which the ODF spec already allows.
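
The fo:clip value uses the rect(top, right, bottom, left) notation, so the example above crops 1.356cm from both the left and the right edge. A minimal Python sketch for parsing such a value; it only handles cm lengths, and the function name is hypothetical:

```python
import re

# fo:clip uses the rect(top, right, bottom, left) notation.
CLIP = re.compile(r"rect\(\s*([^,]+),\s*([^,]+),\s*([^,]+),\s*([^)]+)\)")
LENGTH = re.compile(r"(-?\d+(?:\.\d+)?)cm")


def parse_clip(value: str) -> dict:
    """Return the crop of each edge in cm."""
    match = CLIP.fullmatch(value.strip())
    if not match:
        raise ValueError(f"not a rect(): {value!r}")
    edges = {}
    for name, raw in zip(("top", "right", "bottom", "left"), match.groups()):
        length = LENGTH.fullmatch(raw.strip())
        if not length:
            raise ValueError(f"only cm lengths are handled here: {raw!r}")
        edges[name] = float(length.group(1))
    return edges


print(parse_clip("rect(0cm, 1.356cm, 0cm, 1.356cm)"))
# {'top': 0.0, 'right': 1.356, 'bottom': 0.0, 'left': 1.356}
```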

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

A user interface to create such a crop, support for other video effects (e.g. black-and-white) and support for other backends (Windows, macOS) would be possible, but are future work for now.

Want to start using this?

You can get a snapshot / demo of Collabora Office and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice; the code is merged, so we expect all of this work to be available in TDF's next release too (7.5).


Content controls in Writer: the plain text type

Estimated read time: 2 minutes

Writer now supports a 6th content control type: it is possible to insert a plain text content control.

This work is primarily for Collabora Online, done as a HackWeek project, but the feature is fully available in desktop Writer as well.

Motivation

Word-style plain text content control, user interface

Writer users can already put a content control around a piece of rich text; see Content controls in Writer: dropdown, picture and date types for the first five types.

The next step in this journey is plain text: even though one of the big advantages of content controls over input fields is that they allow rich formatting, sometimes you want to restrict this. For example, when filling in a name, it makes no sense to make the family name bold while leaving the given name non-bold; this would just lead to an inconsistent look.

Results so far

There is now a new Form → Content Controls → Insert Plain Text Content Control menu item to create a plain text content control. If you select only part of the text inside the content control and format it, the whole text in the content control is formatted, to maintain the invariant that plain text has no formatting of its own, only the formatting of the whole content control.
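
The invariant can be modelled in a few lines: a plain text content control carries a single set of character properties, so formatting any sub-range updates that one set. A minimal Python sketch with hypothetical names, not Writer's document model:

```python
from dataclasses import dataclass, field


@dataclass
class PlainTextControl:
    """Hypothetical model: text plus one set of character properties that
    always applies to the whole content control."""
    text: str
    formatting: dict = field(default_factory=dict)

    def apply_formatting(self, start: int, end: int, **properties) -> None:
        # The selection is validated but deliberately not used to split the
        # text: any selection formats the whole content control.
        if not 0 <= start <= end <= len(self.text):
            raise IndexError("selection outside the content control")
        self.formatting.update(properties)


control = PlainTextControl("Jane Doe")
control.apply_formatting(5, 8, bold=True)  # select only "Doe"
print(control.formatting)  # {'bold': True}, for all of "Jane Doe"
```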

As usual, you can delete this content control later, and it is preserved when loading from and saving to ODT/DOCX.

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 22.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice; the code is merged, so we expect all of this work to be available in TDF's next release too (7.5).


Document themes in Impress: shape fill

Estimated read time: 4 minutes

Impress now has the next step of document theme support: it is possible to refer to the theme colors from shape fill colors (including effects).

This work is primarily for Collabora Online, but the feature is fully available in desktop Impress as well.

Motivation

PowerPoint-style themed shape fill, user interface

PowerPoint users can attach a set of colors (and fonts, etc.) to master pages, and then refer to these in many areas, like shape text or shape fill. It was already possible to define theme colors and refer to them from shape text (see Start of document themes in Impress: shape text for details).

The next step in this journey is shape fill: if your shape is filled with a color, that can be a theme color, as visible on the above screenshot. One interesting aspect of this is that the default shape fill color can now depend on the master page, so it may differ between slides (with plain styles and no theming, it would be the same for all slides).

Results so far

Here is a demo that shows how it works:

If one opens the svx/qa/unit/data/theme.pptx test file from core.git, it looks like this:

PowerPoint-style themed shape fill, after opening

The middle row has 3 rounded rectangles: the first is filled with the 'Accent 1' color, the second with the same color but 60% lighter, and the last with the same color but 25% darker.

Here is how you can change what the 'Accent 1' color is:

  • Click 'Master View' on the sidebar to go to the master of the current slide.
  • Right click -> 'Slide Properties' opens the 'Slide Properties' dialog.
  • The 'Theme' page has an 'Accent 1' row, with a blue color.
  • Change that to an orange color: click on the 'Accent 1' drop-down, select 'Theme colors', then choose the 6th entry in the first row, which is orange; this comes from the document's theme.
  • Click 'OK' to close the dialog, followed by 'Close Master View' on the sidebar.

Here is what your shapes look like now:

PowerPoint-style themed shape fill, after changing the theme

What you can see here is that the color effect (darker, lighter, default) of the rounded rectangles' fill color was preserved, but all the blue colors were replaced with orange.
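
The effects survive because they are stored relative to the theme color rather than as a fixed RGB value. A minimal Python sketch, assuming the effects work the way OOXML's lumMod/lumOff do (luminance scaled and offset in HLS space, so 'Lighter 60%' is lumMod 0.4 + lumOff 0.6 and 'Darker 25%' is lumMod 0.75); the two base colors are assumed example values, since the actual theme colors of the test file are not given here, and the function names are hypothetical:

```python
import colorsys


def apply_luminance(hex_color: str, lum_mod: float, lum_off: float) -> str:
    """Apply L' = L * lum_mod + lum_off to the HLS lightness of '#rrggbb'."""
    digits = hex_color.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    l = min(1.0, max(0.0, l * lum_mod + lum_off))
    r, g, b = colorsys.hls_to_rgb(h, l, s)
    return "#" + "".join(f"{round(c * 255):02x}" for c in (r, g, b))


# The three fills of the middle row, as (lum_mod, lum_off) effects on 'Accent 1'.
EFFECTS = {"default": (1.0, 0.0), "60% lighter": (0.4, 0.6), "25% darker": (0.75, 0.0)}


def themed_fills(accent1: str) -> dict:
    """Resolve each shape's fill from the current 'Accent 1' theme color."""
    return {name: apply_luminance(accent1, *effect) for name, effect in EFFECTS.items()}


print(themed_fills("#4472c4"))  # assumed blue accent
print(themed_fills("#ed7d31"))  # assumed orange accent: same effects, new base color
```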

As the icing on the cake, if you now insert a new shape, it will also have an orange fill color by default.

You can see how this is useful when designing templates: a designer can create something good-looking, and all you have to do is to set the theme to the colors of your organization, and you're done.

How is this implemented?

If you would like to know a bit more about how this works, continue reading... :-)

As usual, the high-level problem was addressed by a series of small changes:

Want to start using this?

You can get a snapshot / demo of Collabora Office 22.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice; the code is merged, so we expect all of this work to be available in TDF's next release too (7.4).


Content controls in Writer: dropdown, picture and date types

Estimated read time: 5 minutes

Writer already had rich text and checkbox content controls: a new way to set properties on a piece of text, primarily for form filling purposes. This feature has now gained 3 additional types: dropdown, picture and date picker. This improves compatibility with the DOCX format: there are now 5 inline content control types we can import.

https://share.vmiklos.hu/blog/sw-content-controls2/feature.png
Figure 1. Word-style inline content controls in Writer.

First, thanks to NGI DAPSI who made this work by Collabora possible.

https://share.vmiklos.hu/blog/sw-content-controls/eu.png
Figure 2. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871498

Motivation

Word users expect to be able to import their documents into Writer and experience a matching feature set, and form filling is no exception. Word provides several content control kinds (inline, block, row and cell content controls); this project focuses on inline ("run") content controls.

In the scope of inline content controls, the previous Content controls in Writer post already described the rich text and checkbox types. In this post, we’ll focus on the new dropdown, picture and date content controls.

You might wonder why content controls are useful, since Writer already has form controls and fieldmarks, which provide something similar. Here are some benefits:

  • Dropdown content controls have a list of dropdown items. Each item is a display-text and value pair, making it possible to differentiate between a human-readable string and a machine-readable value. Fieldmarks only handled (machine-readable) values, resulting in document text different from Word.

  • Picture content controls allow the author of a form to pre-format the image before the person filling in the form inserts the actual image. Writer already had placeholder fields for images in the past, but those were just text, so the image could only be formatted after the actual image was inserted.

  • Date content controls were emulated with Writer fieldmarks in the past, which created trouble during export, since Word itself doesn’t have a date form-field.

Results

The feature consists of menu items to insert dropdown/picture/date content controls, and then you can interact with the inserted content controls or with their properties:

https://share.vmiklos.hu/blog/sw-content-controls2/menu.png
Figure 3. Menu items to insert drop-down, picture and date content controls.

Drop-down content controls show a dropdown button when you’re inside the content control:

https://share.vmiklos.hu/blog/sw-content-controls2/dropdown.png
Figure 4. A drop-down content control.

This is similar to dropdown fields, but allows display-text and value pairs instead of just values.
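
In DOCX, Word stores each drop-down item as a w:listItem element with separate w:displayText and w:value attributes, which is where the display-text / value pair comes from. A minimal Python sketch that reads those pairs from a w:sdtPr fragment; the function name and the sample items are hypothetical:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def dropdown_items(sdt_pr_xml: str) -> list:
    """Return (display text, value) pairs of a drop-down content control."""
    sdt_pr = ET.fromstring(sdt_pr_xml)
    return [
        (item.get(f"{{{W}}}displayText"), item.get(f"{{{W}}}value"))
        for item in sdt_pr.findall("w:dropDownList/w:listItem", NS)
    ]


sample = f'''<w:sdtPr xmlns:w="{W}"><w:dropDownList>
<w:listItem w:displayText="red" w:value="R"/>
<w:listItem w:displayText="green" w:value="G"/>
</w:dropDownList></w:sdtPr>'''
print(dropdown_items(sample))  # [('red', 'R'), ('green', 'G')]
```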

Picture content controls contain a single as-character image, but you can interact with them: clicking on the content control opens the file open dialog to provide a replacement for the placeholder:

https://share.vmiklos.hu/blog/sw-content-controls2/picture.png
Figure 5. Picture controls.

And these content controls can be saved to ODT and DOCX.

There is also a content control properties dialog, which allows setting whether the content controls are in placeholder mode or not:

https://share.vmiklos.hu/blog/sw-content-controls2/properties.png
Figure 6. Content control properties.

For dropdowns, it has additional widgets: there is UI to create, update or delete the list items:

https://share.vmiklos.hu/blog/sw-content-controls2/properties-inner.png
Figure 7. Content control properties inner UI for list items.

How is this implemented?

If you would like to know a bit more about how this works, continue reading… :-)

As usual, the high-level problem was addressed by a series of incremental commits:

To make this more interesting, Rashesh Padia of Collabora continued exposing this in Collabora Online; see the PR at https://github.com/CollaboraOnline/online/pull/4803.

Want to start using this?

You can get a snapshot / demo of Collabora Office 22.05 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice; the code is merged, so we expect all of this work to be available in TDF’s next release too (7.4).


Content controls in Writer

Estimated read time: 4 minutes

Writer now has the start of content controls: a new way to set properties on a piece of text, primarily for form filling purposes. This feature improves compatibility with the DOCX format: inline content control types "rich text" and "checkbox" are the first two types we can now import.

https://share.vmiklos.hu/blog/sw-content-controls/feature.png
Figure 1. Word-style inline content controls in Writer.

First, thanks to NGI DAPSI who made this work by Collabora possible.

https://share.vmiklos.hu/blog/sw-content-controls/eu.png
Figure 2. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871498

Motivation

Word users expect to be able to import their documents into Writer and experience a matching feature set, and form filling is no exception. Word provides several content control kinds (inline, block, row and cell content controls); this project focuses on inline ("run") content controls.

In the scope of inline content controls, the plan is to support rich text, checkbox, dropdown, picture and date content controls. This blog post presents the already implemented rich text and checkbox types.

You might wonder why content controls are useful, since Writer already has form controls and fieldmarks, which provide something similar. Here are some properties of content controls, which make them incompatible with field-based fillable forms or form controls:

  • inline content controls can’t span over multiple paragraphs, while this is allowed for fieldmarks (bookmark-based fields)

  • content controls must be well-formed XML elements: this allows nesting (while Writer fields can’t be nested), but does not allow the start and end positions to be arbitrary places in the document (while this is allowed for fieldmarks, which have separate XML elements for start and end)

  • content controls just have a set of properties, while fieldmarks are supposed to have a field command and a result (with a separator between the two)

  • content controls can contain rich text (full set of character formatting), while Writer fields can only have one character formatting (e.g. half of the field can’t be bold)
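
The well-formedness constraint can be made concrete by treating each content control or fieldmark as a (start, end) range of text positions: well-formed XML elements can only produce ranges that are disjoint or nested, while fieldmarks, with separate start and end elements, can also overlap partially. A minimal Python sketch of that check, using a stack over ranges sorted by start position; it illustrates the constraint and is not Writer code:

```python
def is_well_nested(ranges):
    """True if every pair of (start, end) ranges is either disjoint or nested."""
    # Sort by start; for equal starts, put the longer (enclosing) range first.
    ordered = sorted(ranges, key=lambda r: (r[0], -r[1]))
    open_ranges = []  # stack of nested ranges that may still enclose later ones
    for start, end in ordered:
        # Close every range that ends before this one starts.
        while open_ranges and open_ranges[-1][1] <= start:
            open_ranges.pop()
        # Still inside an open range: this one must also end within it.
        if open_ranges and end > open_ranges[-1][1]:
            return False
        open_ranges.append((start, end))
    return True


print(is_well_nested([(0, 10), (2, 5), (6, 9)]))  # True: nesting, like content controls
print(is_well_nested([(0, 5), (3, 8)]))           # False: partial overlap, as fieldmarks allow
```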

Results

The feature consists of menu items to insert rich text or checkbox content controls, and then you can interact with the inserted content controls:

https://share.vmiklos.hu/blog/sw-content-controls/menu.png
Figure 3. Menu items to insert rich text and checkbox content controls.

Rich text content controls simply show an indicator when you’re inside the content control:

https://share.vmiklos.hu/blog/sw-content-controls/rich-text.png
Figure 4. A rich text content control.

This is similar to input fields, but allows rich text content instead of being limited to plain text.

Checkbox content controls contain a single character, but you can interact with them: clicking on the content control toggles the checked state of the checkbox:

https://share.vmiklos.hu/blog/sw-content-controls/checkbox.png
Figure 5. Checkbox content controls.
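
Behind the UI, a checkbox content control boils down to one character whose value depends on the checked state. A minimal Python model with hypothetical names; the default characters are assumed to be U+2612 and U+2610, the ones Word typically writes in its w14:checkedState / w14:uncheckedState markup:

```python
from dataclasses import dataclass


@dataclass
class CheckboxControl:
    """Hypothetical model of a checkbox content control's single character."""
    checked: bool = False
    checked_char: str = "\u2612"    # BALLOT BOX WITH X (assumed default)
    unchecked_char: str = "\u2610"  # BALLOT BOX (assumed default)

    @property
    def text(self) -> str:
        # The content control's text is always exactly one character.
        return self.checked_char if self.checked else self.unchecked_char

    def click(self) -> None:
        # Clicking toggles the state, which swaps the displayed character.
        self.checked = not self.checked


box = CheckboxControl()
box.click()
print(box.checked, box.text == "\u2612")  # True True
```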

And these content controls can be saved to ODT and DOCX.

How is this implemented?

If you would like to know a bit more about how this works, continue reading… :-)

As usual, the high-level problem was addressed by a series of incremental commits:

To make this even more interesting, Rashesh Padia of Collabora started exposing this in Collabora Online; see the PR at https://github.com/CollaboraOnline/online/pull/4703.

Want to start using this?

You can get a snapshot / demo of Collabora Office 2022 and try it out yourself right now: try the unstable snapshot. Collabora intends to continue supporting and contributing to LibreOffice; the code is merged, so we expect all of this work to be available in TDF’s next release too (7.4).

© Miklos Vajna. Built using Pelican. Theme by Giulio Fidente on github.