
【Record】Converting Docbook XML source to RTF (Word-compatible, can be opened in Word) with xsltproc and FOP

Docbook · crifan

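【Background】

I can already generate the following from Docbook:

Single HTML: 【Record】Problems and solutions while writing a Docbook 5.0 book
【Record】Another round of tinkering with Docbook 5.0
Multiple HTML: 【Solved】Changing Docbook's default single-HTML output to multiple HTML files
PDF: 【Solved】The revision history defined by revhistory does not appear in the Docbook PDF
Plain text (TEXT): 【Solved】Generating plain text with Docbook
Microsoft help file (CHM): 【Fully solved】Generating htmlhelp with Docbook + 【Completely solved】Garbled titles and left-hand index/contents in the generated CHM

Now I want to generate RTF from Docbook, so that the result can be opened in Word.

If it turns out to be convenient, I may go further and produce documents compatible with Word 2003 and Word 2007.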

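【Process】

1. This page, Installing docbook, explains how to convert Docbook to RTF with openjade.

But what I need here is a way to produce RTF via xsltproc.

2. This page, The New Way: XSLTProc and FOP, mentions that FOP appears to support RTF output.

So the thing to look for is material on FOP and RTF.

3. This page, Transition Docbook, runs xsltproc with the HTML XSL to convert the XML to "RTF", so I considered trying it.

But this is clearly unreliable: it still uses the HTML stylesheet, so the output is certainly still a single HTML file whose extension has merely been changed to .rtf.

At best it relies on Word being able to open an HTML file as a document.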
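
One quick way to check that suspicion is to look at the first bytes of the output: a genuine RTF file begins with `{\rtf`, whereas a renamed HTML file begins with markup. A minimal sketch; the function name `is_rtf` and the sample file names are hypothetical and not part of the original toolchain:

```shell
#!/bin/sh
# is_rtf FILE: succeed only if FILE starts with the RTF signature "{\rtf".
is_rtf() {
    # Read the first 5 bytes and compare them with the literal signature.
    [ "$(head -c 5 "$1" 2>/dev/null)" = '{\rtf' ]
}

# Demo with two throwaway files (hypothetical names).
tmpdir=$(mktemp -d)
printf '%s\n' '{\rtf1\ansi Hello}' > "$tmpdir/real.rtf"
printf '%s\n' '<html><body>Hello</body></html>' > "$tmpdir/fake.rtf"

for f in "$tmpdir/real.rtf" "$tmpdir/fake.rtf"; do
    if is_rtf "$f"; then
        echo "$(basename "$f"): RTF"
    else
        echo "$(basename "$f"): not RTF"
    fi
done

rm -rf "$tmpdir"
```

The demo reports the first file as RTF and the second, an HTML file carrying an .rtf extension, as not RTF.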

4. In the end I looked up fop's usage myself:

CLi@PC-CLI-1 ~/develop/docbook/books/VBR/VBR/src
$ D:/tmp/tmp_dev_root/cgwin/home/CLi/develop/docbook/tools/fop/fop.cmd -h

USAGE
fop [options] [-fo|-xml] infile [-xsl file] [-awt|-pdf|-mif|-rtf|-tiff|-png|-pcl|-ps|-txt|-at [mime]|-print] <outfile>
 [OPTIONS]
  -version          print FOP version and exit
  -d                debug mode
  -x                dump configuration settings
  -q                quiet mode
  -c cfg.xml        use additional configuration file cfg.xml
  -l lang           the language to use for user information
  -r                relaxed/less strict validation (where available)
  -dpi xxx          target resolution in dots per inch (dpi) where xxx is a number
  -s                for area tree XML, down to block areas only
  -v                run in verbose mode (currently simply print FOP version and continue)

  -o [password]     PDF file will be encrypted with option owner password
  -u [password]     PDF file will be encrypted with option user password
  -noprint          PDF file will be encrypted without printing permission
  -nocopy           PDF file will be encrypted without copy content permission
  -noedit           PDF file will be encrypted without edit content permission
  -noannotations    PDF file will be encrypted without edit annotation permission
  -a                enables accessibility features (Tagged PDF etc., default off)
  -pdfprofile prof  PDF file will be generated with the specified profile
                    (Examples for prof: PDF/A-1b or PDF/X-3:2003)

  -conserve         Enable memory-conservation policy (trades memory-consumption for disk I/O)
                    (Note: currently only influences whether the area tree is serialized.)

 [INPUT]
  infile            xsl:fo input file (the same as the next)
                    (use '-' for infile to pipe input from stdin)
  -fo  infile       xsl:fo input file
  -xml infile       xml input file, must be used together with -xsl
  -atin infile      area tree input file
  -ifin infile      intermediate format input file
  -imagein infile   image input file (piping through stdin not supported)
  -xsl stylesheet   xslt stylesheet

  -param name value <value> to use for parameter <name> in xslt stylesheet
                    (repeat '-param name value' for each parameter)

  -catalog          use catalog resolver for input XML and XSLT files
 [OUTPUT]
  outfile           input will be rendered as PDF into outfile
                    (use '-' for outfile to pipe output to stdout)
  -pdf outfile      input will be rendered as PDF (outfile req'd)
  -pdfa1b outfile   input will be rendered as PDF/A-1b compliant PDF
                    (outfile req'd, same as "-pdf outfile -pdfprofile PDF/A-1b")
  -awt              input will be displayed on screen
  -rtf outfile      input will be rendered as RTF (outfile req'd)
  -pcl outfile      input will be rendered as PCL (outfile req'd)
  -ps outfile       input will be rendered as PostScript (outfile req'd)
  -afp outfile      input will be rendered as AFP (outfile req'd)
  -tiff outfile     input will be rendered as TIFF (outfile req'd)
  -png outfile      input will be rendered as PNG (outfile req'd)
  -txt outfile      input will be rendered as plain text (outfile req'd)
  -at [mime] out    representation of area tree as XML (outfile req'd)
                    specify optional mime output to allow the AT to be converted
                    to final format later
  -if [mime] out    representation of document in intermediate format XML (outfile req'd)
                    specify optional mime output to allow the IF to be converted
                    to final format later
  -print            input file will be rendered and sent to the printer
                    see options with "-print help"
  -out mime outfile input will be rendered using the given MIME type
                    (outfile req'd) Example: "-out application/pdf D:\out.pdf"
                    (Tip: "-out list" prints the list of supported MIME types)
  -svg outfile      input will be rendered as an SVG slides file (outfile req'd)
                    Experimental feature - requires additional fop-sandbox.jar.

  -foout outfile    input will only be XSL transformed. The intermediate
                    XSL-FO file is saved and no rendering is performed.
                    (Only available if you use -xml and -xsl parameters)

 [Examples]
  fop foo.fo foo.pdf
  fop -fo foo.fo -pdf foo.pdf (does the same as the previous line)
  fop -xml foo.xml -xsl foo.xsl -pdf foo.pdf
  fop -xml foo.xml -xsl foo.xsl -foout foo.fo
  fop -xml - -xsl foo.xsl -pdf -
  fop foo.fo -mif foo.mif
  fop foo.fo -rtf foo.rtf
  fop foo.fo -print
  fop foo.fo -awt

As the help shows, generating RTF works just like generating PDF: simply replace -pdf with -rtf.

So I tried it:

XML_CATALOG_FILES="/home/CLi/develop/docbook/config/catalog/catalog.xml" \
    XML_DEBUG_CATALOG=1 \
    xsltproc.exe --xinclude -o ../output/fo/MPEG_VBR.fo docbook_fo_crl.xsl MPEG_VBR.xml

D:/tmp/tmp_dev_root/cgwin/home/CLi/develop/docbook/tools/fop/fop.cmd -c D:/tmp/tmp_dev_root/cgwin/home/CLi/develop/docbook/config/fop/conf/fop.xconf ../output/fo/MPEG_VBR.fo -rtf ../output/rtf/MPEG_VBR.rtf

The result: fop printed quite a few warnings at the end:

May 10, 2012 3:02:59 PM org.apache.fop.events.LoggingEventListener processEvent
WARNING: Only simple-page-masters are supported on page-sequences. Using default simple-page-master from page-sequence-master "lot". (See position 10:2150)
May 10, 2012 3:02:59 PM org.apache.fop.events.LoggingEventListener processEvent
WARNING: Only simple-page-masters are supported on page-sequences. Using default simple-page-master from page-sequence-master "front". (See position 10:79508)
May 10, 2012 3:02:59 PM org.apache.fop.events.LoggingEventListener processEvent
WARNING: Only simple-page-masters are supported on page-sequences. Using default simple-page-master from page-sequence-master "body". (See position 10:91114)
May 10, 2012 3:03:02 PM org.apache.fop.events.LoggingEventListener processEvent
WARNING: Only simple-page-masters are supported on page-sequences. Using default simple-page-master from page-sequence-master "back". (See position 97:285)

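Even so, the RTF file was generated. Opened in Word, the overall result is quite acceptable: tables, background colors inside tables, background colors of formatted elements, and table-title background colors all display correctly.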
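
Since the same warning repeats once per page-sequence-master, a per-master summary can be easier to read than the raw log. A rough sketch, assuming the fop output has been saved to a file; the function name `summarize_psm_warnings` is hypothetical, and the sample lines are copied from the log above:

```shell
#!/bin/sh
# summarize_psm_warnings LOGFILE
# Print how often each page-sequence-master triggered the fallback warning.
summarize_psm_warnings() {
    grep 'Only simple-page-masters are supported' "$1" |
        sed 's/.*page-sequence-master "\([^"]*\)".*/\1/' |
        sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Sample log built from the warnings shown above.
log=$(mktemp)
cat > "$log" <<'EOF'
May 10, 2012 3:02:59 PM org.apache.fop.events.LoggingEventListener processEvent
WARNING: Only simple-page-masters are supported on page-sequences. Using default simple-page-master from page-sequence-master "lot". (See position 10:2150)
May 10, 2012 3:02:59 PM org.apache.fop.events.LoggingEventListener processEvent
WARNING: Only simple-page-masters are supported on page-sequences. Using default simple-page-master from page-sequence-master "front". (See position 10:79508)
May 10, 2012 3:02:59 PM org.apache.fop.events.LoggingEventListener processEvent
WARNING: Only simple-page-masters are supported on page-sequences. Using default simple-page-master from page-sequence-master "body". (See position 10:91114)
May 10, 2012 3:03:02 PM org.apache.fop.events.LoggingEventListener processEvent
WARNING: Only simple-page-masters are supported on page-sequences. Using default simple-page-master from page-sequence-master "back". (See position 97:285)
EOF

summarize_psm_warnings "$log"
rm -f "$log"
```

For the log above, each of the four masters (lot, front, body, back) is listed with a count of 1.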

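However, quite a few details still need improvement:

1. There is no navigation pane on the left, i.e. no bookmarks.

2. The font is recorded as msyh, so Microsoft YaHei is not actually used.

3. Page numbers in the table of contents are displayed incorrectly.

4. Much of the index numbering is scrambled.

5. Text marked with emphasis in the source is not shown in bold.

6. Tables sometimes contain several extra blank rows.

Either way, the RTF (Word) document was generated successfully.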

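【Summary】

To make Docbook output RTF, start from the existing PDF workflow (xsltproc produces the FO file, then fop converts the FO to PDF) and have fop convert the FO to RTF instead.

The exact commands I used are: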

XML_CATALOG_FILES="/home/CLi/develop/docbook/config/catalog/catalog.xml" \
    XML_DEBUG_CATALOG=1 \
    xsltproc.exe --xinclude -o ../output/fo/MPEG_VBR.fo docbook_fo_crl.xsl MPEG_VBR.xml

D:/tmp/tmp_dev_root/cgwin/home/CLi/develop/docbook/tools/fop/fop.cmd -c D:/tmp/tmp_dev_root/cgwin/home/CLi/develop/docbook/config/fop/conf/fop.xconf ../output/fo/MPEG_VBR.fo -rtf ../output/rtf/MPEG_VBR.rtf

The remaining minor issues in the RTF will have to be resolved one by one later.
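
The two steps can also be wrapped in a small script so that the output format becomes a parameter. A minimal sketch under several assumptions: `xsltproc` and `fop` are reachable on the `PATH` (the commands above call `xsltproc.exe` and a full path to `fop.cmd`), and the names `run`, `docbook_to`, and `DRY_RUN` are my own inventions for illustration. With `DRY_RUN=1` the script only prints the commands it would execute.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# docbook_to FORMAT XML XSL OUTDIR
# Step 1: xsltproc turns the Docbook XML into an XSL-FO file.
# Step 2: fop renders that FO file as FORMAT (e.g. pdf or rtf).
docbook_to() {
    fmt=$1; xml=$2; xsl=$3; outdir=$4
    base=$(basename "$xml" .xml)
    fo="$outdir/fo/$base.fo"

    run mkdir -p "$outdir/fo" "$outdir/$fmt" &&
    run xsltproc --xinclude -o "$fo" "$xsl" "$xml" &&
    run fop "$fo" "-$fmt" "$outdir/$fmt/$base.$fmt"
}

# Dry run with the file names used in this post.
DRY_RUN=1
docbook_to rtf MPEG_VBR.xml docbook_fo_crl.xsl ../output
```

The sketch leaves out the `XML_CATALOG_FILES` environment variable and the `-c fop.xconf` option used in the commands above; both would need to be added back for a setup like mine.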
