The recent rework of OpenOffice’s binary MS Word .doc (WW8) export
filter made it easy to extend for other MS Word formats, like DOCX. In
particular, it is certainly possible to implement a new RTF export
filter on top of the shared code introduced by this rework - and this
summer I would like to work on this.
2.1. Detailed Description
2.1.1. Full description
OpenOffice Writer’s export functionality basically has two forms. The
old style ones are C++ filters subclassed from the Writer class
(for example: SwRTFWriter for RTF, SwASCWriter for ASCII, SwHTMLWriter
for HTML). The new style ones are implemented as UNO components (for
example: WW8Export for DOC, DocxExport for DOCX). There are currently
two general problems with the RTF filter:
-
It is an old style filter and that interface is meant to be removed
sooner or later.
-
It is an MS Word format, but it does not use any shared code from
MSWordExportBase, the ancestor of WW8Export and DocxExport.
Porting SwRTFWriter to become a subclass of MSWordExportBase is not yet
done. This summer I would like to implement it.
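
To illustrate the idea of sharing code through a common base class, here
is a minimal, self-contained sketch. It is not the real OpenOffice code:
ExportBaseSketch is a toy stand-in for MSWordExportBase, RtfExportSketch
is a hypothetical subclass, and all method names are assumptions made up
for this example. The base class owns the document traversal, while the
subclass only decides how each piece is written in its own format.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for the shared base class. The real MSWordExportBase has a
// much larger interface; every name below is hypothetical.
class ExportBaseSketch {
public:
    virtual ~ExportBaseSketch() = default;

    // Shared traversal logic: identical for every output format.
    std::string ExportDocument(const std::vector<std::string>& paragraphs) {
        std::string out = DocumentStart();
        for (const std::string& text : paragraphs)
            out += Paragraph(text);
        out += DocumentEnd();
        return out;
    }

protected:
    // Format-specific hooks implemented by each subclass.
    virtual std::string DocumentStart() = 0;
    virtual std::string Paragraph(const std::string& text) = 0;
    virtual std::string DocumentEnd() = 0;
};

// Hypothetical RTF subclass: only the output syntax lives here.
class RtfExportSketch : public ExportBaseSketch {
protected:
    std::string DocumentStart() override { return "{\\rtf1\\ansi "; }
    std::string Paragraph(const std::string& text) override {
        return Escape(text) + "\\par ";
    }
    std::string DocumentEnd() override { return "}"; }

private:
    // Backslash and braces are special in RTF and must be escaped.
    static std::string Escape(const std::string& s) {
        std::string result;
        for (char c : s) {
            if (c == '\\' || c == '{' || c == '}')
                result += '\\';
            result += c;
        }
        return result;
    }
};
```

With this layout, a second format would only need another small subclass;
the traversal in ExportDocument() would not have to be duplicated, which
is the benefit I expect from building on MSWordExportBase.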
Additionally - as time permits, after porting is done - I want to pick
up features from the RTF specification which are not yet supported by
OpenOffice’s RTF export filter and add support for them.
2.1.2. Benefits
When the project is completed, users of the RTF export filter will get
a better one: from a technical point of view it will use a more modern
API, and from a practical point of view it will be improved in general.
2.1.3. Motivation
I got the idea of working on this because I remembered that I had a few
problems with OpenOffice’s RTF export facility when I needed it at the
university: for example, I remember having problems with justified
lines. (I haven’t checked the issue in detail, it just gave me a feeling
that improving RTF support would be a good idea in general.)
2.1.4. Implementation design
Regarding implementation, I want to take the recently reworked DOC /
DOCX filters as an example and implement the new RTF filter in a similar
way - but of course heavily based on the current SwRTFWriter
implementation.
I don’t know the exact details yet, because I haven’t read the RTF
specification or the relevant part of the OpenOffice API, but I expect
to read and understand the code from the sw/source/filter/rtf directory
and I want to place the new RTF code under sw/source/filter/ww8, where
the DOC / DOCX one already is.
I’m aware that it would be possible to use XSL transformations to create
a new RTF filter as well, but considering performance aspects and my
(quite weak) XSLT knowledge, I decided to create a C++ filter.
2.1.5. References
2.2. Implementation timeline
2.2.1. Milestones
-
Update the development environment to the dev300_mXX milestone on
which I will base my work, decide where to publish my code, work out
other infrastructure details, and finally start reading at least the
important parts of the RTF specification. (Till 2010.05.31.)
-
Understand how the already existing UNO based filters work while
continuing to read the specification. (Till 2010.06.07.)
-
Understand how the current RTF export filter works - at this point all
the important parts of the RTF specification should be read as well.
(Till 2010.06.14.)
-
Decide how to test my code, possibly write testcases and/or work out
a mechanism to compare the output of the new filter to the output of
the old one. The more automated this is, the better: partly to ensure
I do not break anything, and partly to provide an objective method to
measure my progress. (Till 2010.06.21.)
-
Do the actual porting. In the first round I want to reuse code from
SwRTFWriter where possible and reach a point where the new filter is
as good as the old one was. (Till 2010.07.05.)
-
Review what was reached. If the output differs from the old filter and
the differences are judged to be improvements, document them. Decide
what to work on next, etc. (Till 2010.07.12.)
-
Fix bugs, fine-tune, update the code based on suggestions from other
OpenOffice developers, implement features from the specification if
time permits. (Till 2010.08.08.)
So I would like the first version of the new RTF filter to be ready by
the time of the mid-term evaluation. In the second half of the summer I
can then work out the minor problems, write documentation and - if time
permits - implement new features from the specification. I used concrete
dates so that it is easy to check whether I’m on track, but of course I
may finish a given part a bit earlier or I may have a little delay.
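
The comparison mechanism mentioned in the testing milestone could look
something like the following sketch. It is only an assumption about one
possible approach, not a decided design: TokenizeRtf and FirstDifference
are hypothetical helper names, and the tokenizer is deliberately
simplified (it does not treat \'hh hex escapes, \bin data or Unicode
fallbacks specially). The point is to compare the old and the new
filter's output token by token, so that harmless differences such as
line breaks or an optional delimiter space do not show up as failures.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split RTF source into coarse tokens: "{", "}", control words or symbols
// (for example "\par", "\fs24", "\{") and runs of plain text.
std::vector<std::string> TokenizeRtf(const std::string& rtf) {
    auto isAlpha = [](char ch) { return std::isalpha(static_cast<unsigned char>(ch)) != 0; };
    auto isDigit = [](char ch) { return std::isdigit(static_cast<unsigned char>(ch)) != 0; };

    std::vector<std::string> tokens;
    bool inText = false;  // true while the last token is an open text run
    std::size_t i = 0;
    while (i < rtf.size()) {
        const char c = rtf[i];
        if (c == '\r' || c == '\n') {
            ++i;  // bare line breaks are not significant in RTF
        } else if (c == '{' || c == '}') {
            tokens.push_back(std::string(1, c));
            inText = false;
            ++i;
        } else if (c == '\\') {
            const std::size_t start = i++;
            if (i < rtf.size() && isAlpha(rtf[i])) {
                // Control word: letters, then an optional (signed) number.
                while (i < rtf.size() && isAlpha(rtf[i])) ++i;
                if (i + 1 < rtf.size() && rtf[i] == '-' && isDigit(rtf[i + 1])) ++i;
                while (i < rtf.size() && isDigit(rtf[i])) ++i;
                tokens.push_back(rtf.substr(start, i - start));
                // A single following space is a delimiter, not text.
                if (i < rtf.size() && rtf[i] == ' ') ++i;
            } else {
                // Control symbol: backslash plus one non-letter character.
                if (i < rtf.size()) ++i;
                tokens.push_back(rtf.substr(start, i - start));
            }
            inText = false;
        } else {
            if (inText)
                tokens.back() += c;
            else
                tokens.push_back(std::string(1, c));
            inText = true;
            ++i;
        }
    }
    return tokens;
}

// Index of the first differing token, or -1 if both streams are equal.
long FirstDifference(const std::vector<std::string>& a,
                     const std::vector<std::string>& b) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] != b[i]) return static_cast<long>(i);
    return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```

A test driver could export the same document with both filters, run the
two results through TokenizeRtf() and report the position returned by
FirstDifference(); counting the documents that compare equal would also
give a rough measure of progress.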
2.2.2. Exams, holidays, etc.
I’m a student in Hungary, and here the exam period is between
2010.05.25 and 2010.06.21. Sadly this has a big overlap with the
Summer of Code, which starts on 2010.05.24. I’m aware of this, and last
year I managed to cope with the same overlap as well. I plan to work on
my project 40 hours a week, which lets me spend entire days on the
project and then not work on it at all for a few days before an exam. I
don’t know the exact date, but I’ll probably take a week of holiday
after the mid-term evaluation. I think that once I return I can work
more productively on my project with a fresh mind, compared to going
nowhere during the whole summer.
2.2.3. Communication
My experience is that IRC is handy for quick questions, but basically
the primary protocol is email. If this is OK for my mentor, then I would
like to use these two for this project as well.
2.2.4. Invested time before, during and after SoC
I would like to start with trivial patches or minor bugfixes before the
SoC starts, so when I start to work on my project, hopefully my name
will not be unknown on the ooo-build list (right now I think I’m only
known because of packaging issues, nothing else). I plan to work 40
hours a week during the SoC. (Well, basically. I do not use a timer, but
the average is about that much.) I can’t predict the future, but I hope
I can contribute after SoC as well.
2.2.5. Future work
I don’t know anything about it so far, but I’m sure there will be new
ideas during the implementation of the project. I’m sure I’ll at least
document these ideas once the project is finished, so I or others can
work on them after the end of SoC.
2.3. Relevant knowledge
2.3.1. Experience of OpenOffice.org
Like every average Linux user, I use OpenOffice.org on a daily basis
to read/write/convert any non-ASCII document. I must admit that I rarely
write anything in Calc or Impress, but I use Writer for writing as well.
So I’m familiar with OpenOffice as a user, but so far I have not
contributed "real code" patches to ooo-build itself, not counting build
bits.
2.3.2. Experience in the project specific area
I expect that I’ll need to write C++ code during the project. I
contributed to various projects written in that language, most notably I
added director support for PHP to SWIG last summer, which was
implemented in about 800 lines of C++ code.
I’m a Linux guy; I used OS X for a long time (but not currently), mainly
for testing purposes. I do not really use Windows, but I have
one installed in a virtual machine (can be useful when I want to see if
MSO imports the RTF file I created :-) ). I’m familiar with gcc and
make, and I have used autotools for other projects as well. I’m
familiar with git too, but I know quite little about OpenOffice’s
build system (dmake and friends).
I’m hanging around on #go-oo, and I’m subscribed to the ooo-build
mailing list.
2.4. About Me
2.4.1. Where do I work/study and my interests
Education: Completing an M.Sc. degree in Computer Science (since
February this year) at Budapest University of Technology and Economics.
I work for SZTAKI (http://www.sztaki.hu/?en, part time - 1 day a week,
since 2004). I have a page with a few minor projects:
http://vmiklos.hu/projects/
2.4.2. Links
I have a page with links to patches I contributed to other FOSS
projects: http://vmiklos.hu/portfolio/ Mostly minor code and/or
documentation fixes. I contributed a few more complex patches to the
pacman package manager, the bitlbee IM gateway, MPlayer, Git and SWIG
(these are C and C++ projects). I’m generally a bash,
C/C++ and Python guy. I know some perl/Java/erlang/C# as well.
2.4.3. IRC name