ODT → XHTML conversion
The focus here was really simple documents, such as a single sentence with minimal formatting. The use case is having thousands of these documents, of which only a minority contain complex formatting; the rest are just that simple.
Performance work usually focuses on one specific complex feature, e.g. lots of bookmarks, lots of document-level user-defined metadata, and so on, so there was still room for improvement when it comes to trivial documents.
I managed to reduce the cost of the conversion to a fifth of the original cost in both directions. The chart above shows the impact of my work for the ODT → XHTML direction. The steps that helped:
XHTML as a value for the FilterOptions key in the HTML (StarWriter) export filter: this avoids the need to go via XSLT, which would be expensive.
Add a new NoFileSync flag to the frame::XStorable::storeToURL() API, so that if you know you’ll read the result after the conversion has finished, you can avoid an expensive fsync() call for each and every file. This helps HDDs a lot and means no overhead for SSDs.
If you already know your input format, then specifying an explicit FilterName key for the frame::XComponentLoader::loadComponentFromURL() API avoids spending time on detecting a file format you already know.
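To make the steps above more concrete, here is a minimal PyUNO sketch of how the three options could be passed. It is an illustration, not the exact code behind the measurements: the helper names are made up, "writer8" is assumed to be the filter name for ODT input, and `desktop` is assumed to be an already connected desktop object.

```python
# Sketch only: load/store arguments for an ODT -> XHTML conversion.
# Helper names are hypothetical; "writer8" is an assumed ODT filter name.


def load_args():
    """Arguments for loadComponentFromURL(): name the input filter explicitly."""
    return {"FilterName": "writer8"}  # skips file format detection


def store_args():
    """Arguments for storeToURL(): direct XHTML export, no fsync() per file."""
    return {
        "FilterName": "HTML (StarWriter)",
        "FilterOptions": "XHTML",  # avoids going via XSLT
        "NoFileSync": True,        # only safe if the result is read right away
    }


def to_property_values(args):
    """Convert a dict into a tuple of UNO PropertyValue structs."""
    import uno  # lazy import: only available in a LibreOffice-enabled Python
    from com.sun.star.beans import PropertyValue

    values = []
    for name, value in args.items():
        prop = PropertyValue()
        prop.Name = name
        prop.Value = value
        values.append(prop)
    return tuple(values)


def convert(desktop, in_url, out_url):
    """Load in_url and store it to out_url as XHTML."""
    doc = desktop.loadComponentFromURL(
        in_url, "_blank", 0, to_property_values(load_args())
    )
    try:
        doc.storeToURL(out_url, to_property_values(store_args()))
    finally:
        doc.close(True)
```

Keeping the arguments as plain dictionaries makes it easy to check what is sent to the API without a running office instance; only `to_property_values()` and `convert()` need PyUNO.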
Note that the XHTML mode for the Writer HTML export is still a work in progress, but it already produces valid output for such simple documents.
XHTML → ODT conversion
The chart above shows the results of my work for the XHTML → ODT direction. The steps to get to the final reduced cost were:
The NoFileSync flag, as mentioned previously.
The NoThumbnail flag, which is useful if the ODT will be the input of a next step in the pipeline and you know that the thumbnail image won’t be used anyway.
The default table autoformat definitions in Writer are now lazy-loaded. (This is my favorite one: you don’t have to opt in, so everyone benefits.)
A load option for frame::XComponentLoader::loadComponentFromURL(), which means we don’t lay out the UI elements (toolbars, sidebar, status bar, etc.) when we know the purpose of the document load is only to save the document model in another format.
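As a rough illustration of the opt-in flags from this list, the sketch below builds the arguments for the XHTML → ODT direction as plain Python data. The function names are hypothetical, and both filter names are assumptions: "HTML (StarWriter)" for reading the XHTML input and "writer8" for writing ODT. In real PyUNO code each entry would be wrapped in a com.sun.star.beans.PropertyValue before being passed to the API.

```python
# Sketch only: arguments for an XHTML -> ODT conversion, as plain dicts.
# Function names are hypothetical; filter names are assumptions.


def xhtml_load_args():
    """Arguments for loadComponentFromURL(): explicit input filter."""
    return {"FilterName": "HTML (StarWriter)"}


def odt_store_args(skip_sync=True, skip_thumbnail=True):
    """Arguments for storeToURL() when the ODT feeds a next pipeline step."""
    args = {"FilterName": "writer8"}
    if skip_sync:
        # The result is read right after conversion, so fsync() is not needed.
        args["NoFileSync"] = True
    if skip_thumbnail:
        # Nobody will look at the thumbnail image, so don't generate it.
        args["NoThumbnail"] = True
    return args
```

Both flags are opt-in, so leaving them out (for example `odt_store_args(skip_thumbnail=False)` for a file that end users will browse) keeps the default behavior.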
All this is available in master (towards LibreOffice 6.1), or you can grab a daily build and try it out right now. :-)