SmartArt improvements in LibreOffice, part 3

Posted on: Fri 04 January 2019

Estimated read time: 2 minutes

I recently dived into the SmartArt support of LibreOffice, which is the component responsible for displaying complex diagrams from PPTX. I focus on the case when only the document model and the layout constraints are given, not a pre-rendered result.

First, thanks to our partner SUSE for working with Collabora to make this possible.

Continuous Block Process, Accent Process and Organization Chart

In this post I would like to present the progress made last month on the above-mentioned diagram types, which are used in many documents.

The improvements (as always) come in small, incremental steps:

Continuous Block Process now reads space width from constraints.
Accent Process now has the missing bullets and fixes an incorrect large paragraph-level indent.
Organization Chart now has an initial implementation of the hierRoot and hierChild algorithms.
Organization Chart now handles multiple employees for a manager.
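
To give a rough idea of what handling multiple employees for a manager means for layout, here is a toy Python sketch. It is not the LibreOffice implementation of hierRoot and hierChild: the function name layout_org_chart, the fixed box size and the spacing are all assumptions made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical fixed sizes, chosen only for this example.
BOX_WIDTH = 100
BOX_HEIGHT = 40
SPACING = 20


def layout_org_chart(manager: str, employees: List[str]) -> Dict[str, Tuple[float, float]]:
    """Return the top-left (x, y) position of each box.

    Employees are placed side by side in one row; the manager is centered
    above that row.
    """
    positions: Dict[str, Tuple[float, float]] = {}
    count = max(len(employees), 1)
    row_width = count * BOX_WIDTH + (count - 1) * SPACING
    # Center the manager box over the whole employee row.
    positions[manager] = ((row_width - BOX_WIDTH) / 2, 0.0)
    for index, name in enumerate(employees):
        x = index * (BOX_WIDTH + SPACING)
        positions[name] = (float(x), float(BOX_HEIGHT + SPACING))
    return positions
```

With three employees the manager box ends up right above the middle one, which is the kind of result you would expect from an organization chart.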

With all these fixed, we reach a much better state for the mentioned diagram types.

Results so far

The SmartArt test documents from sd/qa/unit/data/pptx/ are what I used for testing this work.

Here is what the baseline, the current and the reference rendering of Accent Process look like:

smartart-accent-process.pptx, baseline

smartart-accent-process.pptx, current

smartart-accent-process.pptx, reference

And here is what the baseline, the current and the reference rendering of Organization Chart look like:

smartart-org-chart.pptx, baseline

smartart-org-chart.pptx, current

smartart-org-chart.pptx, reference

This is not perfect yet, but it’s clearly a large improvement: all text in the diagrams is now readable and bullets are no longer missing!

All this is available in master (towards LibreOffice 6.3), so you can grab a daily build and try it out right now. :-)

SmartArt improvements in LibreOffice, part 2

Posted on: Tue 04 December 2018

Estimated read time: 2 minutes

I recently dived into the SmartArt support of LibreOffice, which is the component responsible for displaying complex diagrams from PPTX. I focused especially on the case when only the document model and the layout constraints are given, not a pre-rendered result.

First, thanks to our partner SUSE for working with Collabora to make this possible.

Accent Process

In this post I would like to present the progress regarding the Accent Process preset available in PowerPoint, which is used in many documents.

This exposed several shortcomings of the current diagram layout we have in LibreOffice:

Values were not read from constraints (there was a reason for this: they can be complex, given that, depending on the context, the unit is points or millimeters, and the unit is always implicit).
ZOrder offsets were ignored.
The linear algorithm did not take its size from constraints when recursing into child algorithms.
A data point assumed that all of its text is a single "run" (i.e. either all of the text is bold or none of it, not half of it).
The followSib axis was not implemented for forEach, so when you have arrow shapes between objects, we created N arrows instead of N - 1.
Connectors were created as invisible shapes and had the wrong width/height aspect ratio.
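
The N versus N - 1 arrows problem can be illustrated with a small Python sketch. This is only a model of the idea, not the actual import code; both function names are hypothetical.

```python
from typing import List


def connectors_for_each(items: List[str]) -> List[str]:
    """Old behaviour: one connector per item, so N arrows for N items."""
    return [f"{item} ->" for item in items]


def connectors_follow_sib(items: List[str]) -> List[str]:
    """followSib-style behaviour: a connector only where a following sibling exists."""
    return [f"{a} -> {b}" for a, b in zip(items, items[1:])]
```

For three items the first function produces three arrows (the last one pointing nowhere), while the second one produces the expected two.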

With all these fixed, we reach a much better state for handling Accent Process.

Results so far

smartart-accent-process.pptx is what I used for testing this work.

Here is what the baseline, the current and the reference rendering of the test document look like:

smartart-accent-process.pptx, baseline

smartart-accent-process.pptx, current

smartart-accent-process.pptx, reference

This is not perfect yet, but it’s clearly a large improvement: all text in the diagram is now readable!

All this is available in master (towards LibreOffice 6.3), so you can grab a daily build and try it out right now. :-)

SmartArt improvements in LibreOffice

Posted on: Mon 05 November 2018

Estimated read time: 2 minutes

I recently dived into the SmartArt support of LibreOffice, which is the component responsible for displaying complex diagrams from PPTX, especially in the case when only the document model and the layout constraints are given, not a pre-rendered result.

First, thanks to our partner SUSE for working with Collabora to make this possible.

The problem

There are several. :-) If you are just interested in high-quality viewing of PPTX files, then your problem started with PowerPoint 2007, which did not write pre-rendered drawingML markup of the diagram to the files; only PowerPoint 2010 started writing it. Additionally, if a diagram is not edited, then re-saving with PowerPoint 2010 doesn’t seem to generate the drawingML markup, either. This means that data + constraints cases are quite frequent even today.

Also, one day Impress should be able to actually edit these SmartArts as well, so knowing how to lay out SmartArt (even if it’s import-time-only at the moment) is a good thing.

Results so far

I always write cppunit tests when I work on filter code (in this case OOXML). So far all fixes were visible in just two test files: smartart-vertial-box-list.pptx and vertical-bracket-list.pptx.

Here is what the baseline, the current and the reference rendering of these test documents look like:

smartart-vertial-box-list.pptx, baseline

smartart-vertial-box-list.pptx, current

smartart-vertial-box-list.pptx, reference

vertical-bracket-list.pptx, baseline

vertical-bracket-list.pptx, current

vertical-bracket-list.pptx, reference

In terms of code commits, the fixes are split into several ones:

Clearly the results are not perfect yet, but in both cases nothing was visible before, and now all text is readable, so we’re moving in the right direction!

All this is available in master (towards LibreOffice 6.2), so you can grab a daily build and try it out right now. :-)

Text layout performance in LibreOffice conference lightning talk

Posted on: Mon 01 October 2018

Estimated read time: 1 minute

https://farm2.staticflickr.com/1914/44115509595_ab58bf01f5_z.jpg

Last Friday I gave a Text layout performance lightning talk at LibreOffice Conference 2018. Click on the image to get the hybrid PDF slides!

ReqIF import/export in LibreOffice Writer conference talk

Posted on: Fri 28 September 2018

Estimated read time: 1 minute

https://farm2.staticflickr.com/1939/44246333744_a805435168_z.jpg

Earlier today I gave an Editing ReqIF-XHTML fragments with Writer talk at LibreOffice Conference 2018. The room was quite crowded, perhaps because the previous talk was about OOXML interoperability. ;-)

I expect quite a few other slides will be available on Planet; don’t miss them.

API improvements in LibreOffice

Posted on: Wed 08 August 2018

Estimated read time: 3 minutes

https://farm2.staticflickr.com/1793/43859007612_ca207fed0f_o.png

I worked on two small features to extend the public (UNO) API of LibreOffice. First, thanks to Vector for funding Collabora to make this possible.

Aliased paths and text in the PNG export for Draw

The UNO API of Draw allows you to build quite complex and custom shapes, but you may want to export the rendered result to a bitmap for testing purposes, so you can assert that the actual result matches a reference one.

One problem in this area is anti-aliasing (AA), whose result can easily differ between machines. Given that aliased rendering is normally ugly, there is now a way to keep AA enabled in general, but disable it just for a single invocation of the PNG exporter.

The above picture shows what the AA result looks like. You could write a Basic macro like this to trigger the PNG export from Draw:

' Create the graphic exporter and point it at the first draw page.
xExporter = createUnoService("com.sun.star.drawing.GraphicExportFilter")
xExporter.SetSourceDocument(ThisComponent.DrawPages(0))
' Describe the output: target URL and media type.
Dim aArgs(1) As new com.sun.star.beans.PropertyValue
aArgs(0).Name  = "URL"
aArgs(0).Value = "file:///tmp/debug/aa.png"
aArgs(1).Name  = "MediaType"
aArgs(1).Value = "image/png"
xExporter.filter(aArgs())

Let’s see what it looks like if you turn AA off:

https://farm2.staticflickr.com/1832/43859007522_aeb4516f02_o.png

You just need to specify a new AntiAliasing key under FilterData:

' Filter data: turn anti-aliasing off for this export only.
Dim aFilterData(0) As new com.sun.star.beans.PropertyValue
aFilterData(0).Name = "AntiAliasing"
aFilterData(0).Value = False
xExporter = createUnoService("com.sun.star.drawing.GraphicExportFilter")
xExporter.SetSourceDocument(ThisComponent.DrawPages(0))
' Same arguments as before, plus the FilterData entry.
Dim aArgs(2) As new com.sun.star.beans.PropertyValue
aArgs(0).Name  = "URL"
aArgs(0).Value = "file:///tmp/debug/non-aa.png"
aArgs(1).Name  = "FilterData"
aArgs(1).Value = aFilterData()
aArgs(2).Name  = "MediaType"
aArgs(2).Value = "image/png"
xExporter.filter(aArgs())

You can imagine which rendering result is easier to debug when the reference and the actual bitmap don’t match. ;-)
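
To make the testing use-case more concrete, here is a minimal Python sketch of comparing an actual bitmap against a reference one. The bitmaps are modeled as plain lists of grayscale values, the pixel values are made up for illustration, and the function name differing_pixels is hypothetical; a real test would read the exported PNG files instead.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # rows of grayscale values, 0 = black, 255 = white


def differing_pixels(actual: Bitmap, reference: Bitmap) -> List[Tuple[int, int]]:
    """Return the (row, column) coordinates where the two bitmaps differ."""
    same_height = len(actual) == len(reference)
    if not same_height or any(len(a) != len(r) for a, r in zip(actual, reference)):
        raise ValueError("bitmaps must have the same size")
    return [
        (y, x)
        for y, (row_a, row_r) in enumerate(zip(actual, reference))
        for x, (a, r) in enumerate(zip(row_a, row_r))
        if a != r
    ]


# An aliased edge only uses the two extreme values (illustrative data).
aliased_edge = [[0, 0, 255, 255]]
# An anti-aliased edge has in-between values that may differ per machine.
aa_machine_1 = [[0, 64, 191, 255]]
aa_machine_2 = [[0, 70, 185, 255]]
```

With aliased output, a non-empty result points at a real rendering difference. With AA output, the in-between values along every edge can show up as differences even when nothing is actually wrong.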

Note: This feature is available for other bitmap formats as well; PNG is only an example.

Default character style in Writer

In most cases you don’t really need a default character style: if you’re fine with a default, then the default paragraph style should be enough for your needs. In general, paragraph styles can contain character properties, so if the default is fine for you, you just don’t set a character style.

However, there is an exception to every rule. If you want to reset the current character style, it makes sense to just set the CharStyleName property to a default value, especially since this works with paragraph styles already.

Now you can write C++ code like this (see SwUnoWriter::testDefaultCharStyle() for a full example):

xCursorProps->setPropertyValue("CharStyleName", uno::makeAny(OUString("Standard")));

And it’ll be handled as Default Style in English builds, or as its localized version in a non-English UI.

All this is available in master (towards LibreOffice 6.2), or you can grab a daily build and try it out right now. :-)

Editing ReqIF-XHTML fragments in LibreOffice Writer

Posted on: Tue 05 June 2018

Estimated read time: 2 minutes

https://farm2.staticflickr.com/1752/28703474708_bde744fb13_o.png

I worked on a small feature to use Writer as an editor for the XHTML fragments inside Requirements Interchange Format (ReqIF) files. First, thanks to Vector for funding Collabora to make this possible.

Writer already supported XHTML import and export before (see my previous post) as a special mode of the HTML filter, and this work builds on top of that. The main speciality of XHTML as used for fragments inside a ReqIF file is embedded objects.

The special mode to opt in to ReqIF-XHTML behavior can be activated like this:

during import: --infilter="HTML (StarWriter):xhtmlns=reqif-xhtml"
during export: --convert-to "xhtml:HTML (StarWriter):xhtmlns=reqif-xhtml"
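
If you drive this from a script, the two options could be assembled into argument lists like in the Python sketch below. The filter string is the one from above; the binary name soffice, the use of --headless and the helper names are assumptions for this example, and the lists are only built here, not executed.

```python
from typing import List

# Filter string taken from the post; everything else below is an assumption.
REQIF_FILTER = "HTML (StarWriter):xhtmlns=reqif-xhtml"


def import_command(input_path: str, binary: str = "soffice") -> List[str]:
    """Build an argument list that opens a ReqIF-XHTML fragment in Writer."""
    return [binary, f"--infilter={REQIF_FILTER}", input_path]


def export_command(input_path: str, binary: str = "soffice") -> List[str]:
    """Build an argument list that converts a document to ReqIF-XHTML."""
    return [binary, "--headless", "--convert-to", f"xhtml:{REQIF_FILTER}", input_path]
```

Passing such a list to subprocess.run() avoids shell quoting problems around the space and the parentheses in the filter name.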

Three different cases are handled:

An image with native data that we don’t understand and just preserve.
An image with OLE2 data, which we hand out to external applications (at least on Windows). In the video above this is an embedded PPSX file, handled by PowerPoint.
An image with ODF data, which we handle internally. In the video above this is a Draw document.

Regarding how it works, the import is a series of unwrapping containers until you get to the real data, and the export is the opposite of this. Here are the layers:

Larger ReqIF files have the .reqifz extension and are ZIP files containing an XML file, which in turn contains the XHTML fragments. This is not relevant for this post, as Writer assumes that extracting the XHTML fragment from ReqIF is done before you load the content into Writer.
XHTML always has a PNG image for the object, and optionally it has RTF as native data for the object.
The RTF file is a fragment, containing just an embedded OLE1 container.
The OLE1 container is just a wrapper around the real OLE2 container.
The OLE2 container either has the data directly or MSO has a convention on how to include OOXML files in it (see the PPSX example above), and we handle that.

On export we do the opposite: save the file, put it into OLE2, then into OLE1, then into RTF, finally into XHTML.
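
The symmetry between import and export can be modeled with a few lines of Python. This is only a toy model of the layering: containers are plain (name, payload) tuples, and the function names are hypothetical; the real filters of course parse and write the actual XHTML, RTF, OLE1 and OLE2 formats.

```python
from typing import Any, List, Tuple

# Outermost to innermost, following the layers described above.
LAYERS: List[str] = ["XHTML", "RTF", "OLE1", "OLE2"]

Wrapped = Tuple[str, Any]


def export_wrap(data: bytes) -> Wrapped:
    """Wrap the real data innermost-first: OLE2, then OLE1, RTF and XHTML."""
    wrapped: Any = data
    for layer in reversed(LAYERS):
        wrapped = (layer, wrapped)
    return wrapped


def import_unwrap(wrapped: Wrapped) -> bytes:
    """Peel the containers off outermost-first until the real data is left."""
    current: Any = wrapped
    for layer in LAYERS:
        name, current = current
        if name != layer:
            raise ValueError(f"expected {layer} container, got {name}")
    return current
```

Unwrapping the result of wrapping gives back the original data, which is the round-trip property you want from such an import/export pair.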

There is no specification on how to put ODF files into OLE2, so I extracted the relevant code from LibreOffice’s binary MSO filters and now the Writer HTML filter uses that as well. This avoids code duplication and also avoids inventing some new markup.

All this is available in master (towards LibreOffice 6.2), or you can grab a daily build and try it out right now. :-)

Lazy reading images from Microsoft formats in LibreOffice

Posted on: Fri 04 May 2018

Estimated read time: 1 minute

I recently worked on improving the document load performance of Microsoft formats in general, and DOC/DOCX in particular, in LibreOffice. First, thanks to TDF, and to the users who support the foundation with donations, for funding Collabora to make this possible.

I built on top of the great work of Tomaz, focusing on these secondary, but important formats.

The idea is that if you load a Microsoft binary or OOXML file, it should not be necessary to parse all images at load time; it’s enough to lazily read each image when we first render e.g. a Writer page containing it.
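
The general pattern looks like the following Python sketch. It is not the LibreOffice code: the LazyImage class is hypothetical, and zlib decompression merely stands in for the expensive image decoding step.

```python
import zlib
from typing import Optional


class LazyImage:
    """Keeps the compressed bytes at load time and decodes on first use."""

    def __init__(self, compressed: bytes) -> None:
        self._compressed = compressed
        self._pixels: Optional[bytes] = None
        self.decode_count = 0  # only here to make the laziness observable

    def pixels(self) -> bytes:
        # Decompress only once, the first time something needs the pixels.
        if self._pixels is None:
            self._pixels = zlib.decompress(self._compressed)
            self.decode_count += 1
        return self._pixels
```

Constructing the object is cheap, so a document with many images loads quickly, and the decoding cost is only paid for images that actually get rendered.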

The focus here was documents containing large images. I tested with an Earth photo of size 8000x8000 pixels from NASA. I made small modifications to it so that each copy has a different checksum, and embedded the copies into a binary DOC file.

https://farm1.staticflickr.com/980/41838412652_c1cbefcfc1_o.png

I measured the time from the soffice process startup to rendering the first page. We defer the work of loading most images now, as you can see on the chart. In contrast, we used to decompress all images on file import in the past. This means the new cost for e.g. 4 images is 37% of the original.

All this is available in master (towards LibreOffice 6.1), or you can grab a daily build and try it out right now. :-)

LibreOffice Hamburg Hackfest 2018

Posted on: Tue 10 April 2018

Estimated read time: 2 minutes

(via Sweet5hark)

I arrived home from Hamburg yesterday, where I participated in the LibreOffice hackfest over the weekend as a mentor. First, thanks to The Document Foundation and all the donors for funding Collabora to make this possible.

There were a few topics I mentored:

Patrick was interested in fixing tdf#116486, which required some background knowledge on the Writer document model and layout, so we explored the relevant details together towards providing an actual patch for the bug.
Nithin wanted to fix tdf#112384, which turned out to be an ideal task for a hackfest. On one hand, the scope is limited so that you can implement this mini-feature over a weekend. On the other hand, it required touching various parts of Writer (UI, document model, UNO API, ODF filter), so it allowed seeing the process of adding a new feature. The patch is merged to master.
Linus looked for a task that is relatively easy but still useful. We looked at tdf#42949, and he identified and removed a number of unused includes himself. This should especially help with slow incremental builds. Again, the patch is already in master.
Zdeněk (raal) wanted to write a uitest for tdf#106280, so we were figuring out together how to select images from pyuno and how to avoid using graphic URLs in uitests in general.

The full list of achievements is on the wiki. If you were at the hackfest and you did not contribute to that section, please write a line about what you hacked on. :-)

Finally, thanks to the organizers and the sponsors of the hackfest, it was a really great event!

Optimizing ODT ↔ XHTML conversion performance for simple documents

Posted on: Fri 02 March 2018

Estimated read time: 2 minutes

I worked on improving the ODT ↔ XHTML conversion performance for simple documents in LibreOffice recently. First, thanks to Vector for funding Collabora to make this possible.

ODT → XHTML conversion

https://farm5.staticflickr.com/4605/26697712598_2ace3f45a3_o.png

The focus here was really simple documents, like just one sentence with minimal formatting. The use-case is to have thousands of these documents, where only a minority contains complex formatting and the rest is just that simple.

Performance work usually focuses on one specific complex feature, e.g. lots of bookmarks, lots of document-level user-defined metadata, and so on, so there was room for improvement when it comes to trivial documents.

I managed to reduce the cost of the conversion to a fifth of the original cost in both directions. The chart above shows the impact of my work for the ODT → XHTML direction. The steps that helped:

Recognize XHTML as a value for the FilterOptions key in the HTML (StarWriter) export filter. This avoids the need to go via XSLT, which would be expensive.
Add a new NoFileSync flag to the frame::XStorable::storeToURL() API, so that if you know you’ll read the result right after the conversion finishes, you can avoid an expensive fsync() call for each and every file, which helps HDDs a lot, while it means no extra overhead for SSDs.
If you know your input format already, then specifying an explicit FilterName key for the frame::XComponentLoader::loadComponentFromURL() API avoids spending time on detecting the file format.

Note that the XHTML mode for the Writer HTML export is still a work in progress, but it already produces valid output for such simple documents.

XHTML → ODT conversion

https://farm5.staticflickr.com/4608/39674632615_de78265c7f_o.png

The chart above shows the results of my work for the XHTML → ODT direction. The steps to get to the final reduced cost were:

The new NoFileSync flag, as mentioned previously.
A new NoThumbnail flag, which is useful if the ODT will be part of a next step in the pipeline and you know that the thumbnail image won’t be used anyway.
The default table autoformat definitions in Writer are now lazy-loaded. (This is my favorite one, you don’t have to opt-in for this, so everyone benefits.)
A new HiddenForConversion flag for frame::XComponentLoader::loadComponentFromURL(), which means we don’t lay out the UI elements (toolbars, sidebar, status bar, etc.) when we know the purpose of the document load is only to save the document model in another format.
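
To summarize which key goes to which call in the two directions, here is a Python sketch that collects the load and store arguments as plain dictionaries. The key names come from the lists above. The writer8 filter name for ODT, the use of HTML (StarWriter) for the XHTML import, the placement of NoThumbnail on the store side and the helper names are assumptions of this example. With pyuno, each entry would become a com.sun.star.beans.PropertyValue passed to loadComponentFromURL() or storeToURL().

```python
from typing import Any, Dict, Tuple

Args = Dict[str, Any]


def odt_to_xhtml_args() -> Tuple[Args, Args]:
    """Return (load_args, store_args) for the ODT -> XHTML direction."""
    load_args: Args = {"FilterName": "writer8"}  # assumed ODT filter name
    store_args: Args = {
        "FilterName": "HTML (StarWriter)",
        "FilterOptions": "XHTML",  # stay in the HTML filter, no XSLT
        "NoFileSync": True,  # skip fsync(), the result is read right away
    }
    return load_args, store_args


def xhtml_to_odt_args() -> Tuple[Args, Args]:
    """Return (load_args, store_args) for the XHTML -> ODT direction."""
    load_args: Args = {
        "FilterName": "HTML (StarWriter)",  # assumed filter for XHTML input
        "HiddenForConversion": True,  # no toolbars, sidebar or status bar
    }
    store_args: Args = {
        "FilterName": "writer8",  # assumed ODT filter name
        "NoFileSync": True,
        "NoThumbnail": True,  # the thumbnail would not be used anyway
    }
    return load_args, store_args
```

Only the lazy-loaded table autoformat definitions need no key at all, since that improvement applies to every document load.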

All this is available in master (towards LibreOffice 6.1), or you can grab a daily build and try it out right now. :-)