ODT → XHTML conversion
The focus here was on really simple documents, such as a single sentence with minimal formatting. The use case is to have thousands of these documents, where only a minority contains complex formatting and the rest are just that simple.
Performance work usually focuses on one specific complex feature, e.g. lots of bookmarks, lots of document-level user-defined metadata, and so on, so there was still room for improvement when it comes to trivial documents.
I managed to reduce the cost of the conversion to one fifth of the original cost in both directions. The chart above shows the impact of my work for the ODT → XHTML direction. The steps that helped:
- Recognize XHTML as a value for the FilterOptions key in the HTML (StarWriter) export filter, which avoids the need to go via XSLT, which would be expensive.
- Add a new NoFileSync flag to the frame::XStorable::storeToURL() API, so that if you know you’ll read the result right after the conversion finishes, you can avoid an expensive fsync() call for each and every file. This helps HDDs a lot, while it means no overhead for SSDs.
- If you know your input format already, then specifying an explicit FilterName key for the frame::XComponentLoader::loadComponentFromURL() API avoids spending time on detecting a file format you already know.
Note that the XHTML mode for the Writer HTML export is still a work in progress, but it already produces valid output for such simple documents.
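The three steps above can be combined in a small PyUNO script. This is only a sketch under a few assumptions: the filter name writer8 for ODT input, the Hidden load property, port 2002, and all function names are my choices for illustration, not something the changes above prescribe. It also assumes a LibreOffice build that already knows the NoFileSync flag.

```python
def load_args():
    """Load arguments: name the input filter, so format detection is skipped."""
    return {
        "Hidden": True,           # assumption: no window needed for batch work
        "FilterName": "writer8",  # assumption: Writer's ODT filter name
    }


def store_args():
    """Store arguments for the ODT -> XHTML direction."""
    return {
        "FilterName": "HTML (StarWriter)",  # Writer HTML export filter
        "FilterOptions": "XHTML",           # XHTML mode, no XSLT involved
        "NoFileSync": True,                 # skip fsync() on the output file
    }


def to_property_values(args):
    """Turn a plain dict into the PropertyValue tuple the UNO API expects."""
    import uno  # noqa: F401, needed so that the com.sun.star imports work
    from com.sun.star.beans import PropertyValue

    props = []
    for name, value in args.items():
        prop = PropertyValue()
        prop.Name = name
        prop.Value = value
        props.append(prop)
    return tuple(props)


def connect(port=2002):
    """Connect to an already running soffice process listening on `port`."""
    import uno

    local_context = uno.getComponentContext()
    resolver = local_context.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_context)
    context = resolver.resolve(
        f"uno:socket,host=localhost,port={port};urp;StarOffice.ComponentContext")
    return context.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", context)


def convert(desktop, source_url, target_url):
    """Load one ODT file and store it as XHTML."""
    document = desktop.loadComponentFromURL(
        source_url, "_blank", 0, to_property_values(load_args()))
    try:
        document.storeToURL(target_url, to_property_values(store_args()))
    finally:
        document.close(True)
```

To try it, start soffice with --headless --accept="socket,host=localhost,port=2002;urp;" and call convert(connect(), source_url, target_url) with file:// URLs (uno.systemPathToFileUrl() can produce them). Reusing one desktop for all the thousands of documents avoids paying the startup cost per file.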
XHTML → ODT conversion
The chart above shows the results of my work for the XHTML → ODT direction. The steps to get to the final reduced cost were:
- The new NoFileSync flag, as mentioned previously.
- A new NoThumbnail flag, which is useful if the ODT will be part of a next step in the pipeline and you know that the thumbnail image won’t be used anyway.
- The default table autoformat definitions in Writer are now lazy-loaded. (This is my favorite one: you don’t have to opt in for this, so everyone benefits.)
- A new HiddenForConversion flag for frame::XComponentLoader::loadComponentFromURL(), which means we don’t lay out the UI elements (toolbars, sidebar, status bar, etc.) when we know the purpose of the document load is only to save the document model in another format.
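The opt-in flags from this list could be passed from PyUNO roughly like this. Treat it as a sketch with several assumptions: that NoThumbnail goes to storeToURL() next to NoFileSync, that HTML (StarWriter) is the right import filter for the XHTML input, that writer8 is the ODT filter name, and that Hidden is set together with HiddenForConversion. The function names are hypothetical, and desktop is expected to be an existing com.sun.star.frame.Desktop instance.

```python
def xhtml_load_args():
    """Load arguments for the XHTML -> ODT direction."""
    return {
        "FilterName": "HTML (StarWriter)",  # assumption: import filter to use
        "Hidden": True,                     # assumption: set next to the flag below
        "HiddenForConversion": True,        # skip laying out toolbars, sidebar, etc.
    }


def odt_store_args(keep_thumbnail=False):
    """Store arguments; the thumbnail is skipped unless explicitly requested."""
    args = {
        "FilterName": "writer8",  # assumption: Writer's ODT filter name
        "NoFileSync": True,       # skip fsync() on the output file
    }
    if not keep_thumbnail:
        args["NoThumbnail"] = True  # assumption: accepted by storeToURL()
    return args


def as_property_values(args):
    """Turn a plain dict into the PropertyValue tuple the UNO API expects."""
    import uno  # noqa: F401, needed so that the com.sun.star imports work
    from com.sun.star.beans import PropertyValue

    props = []
    for name, value in args.items():
        prop = PropertyValue()
        prop.Name = name
        prop.Value = value
        props.append(prop)
    return tuple(props)


def xhtml_to_odt(desktop, source_url, target_url, keep_thumbnail=False):
    """Load one XHTML file without UI layout and store it as ODT."""
    document = desktop.loadComponentFromURL(
        source_url, "_blank", 0, as_property_values(xhtml_load_args()))
    try:
        document.storeToURL(
            target_url, as_property_values(odt_store_args(keep_thumbnail)))
    finally:
        document.close(True)
```

The lazy-loaded table autoformat definitions need no code at all, which is why that step does not show up here.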
All this is available in master (towards LibreOffice 6.1), or you can grab a daily build and try it out right now. :-)