
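[Not fully solved] Finding a suitable Python XML library for conveniently reading, writing, and creating XML files (with support for pretty-printed output, CDATA, the XML declaration, tags of the form xxx:xxx, etc.)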

Category: Python. Author: crifan.

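[Problem]

For a feature I am working on, I need a suitable Python XML library that makes it easy to create XML and to read and write the values in it.

[Solution process]

1. I first looked through the documentation bundled with Python 2.7 for anything about XML and found: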

19. Structured Markup Processing Tools

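It lists a large number of libraries, which left me rather confused. From there I ended up at:

4 Package Overview, which gives a brief introduction to these libraries, but it still does not say which one makes it easy to create and manipulate XML.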

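2. I found Python's XML Tools, which explains SAX, but I could only work out how to read and parse XML from it, not how to create XML. The rest of it did not contain any usable examples either.

3. The lxml.etree Tutorial has examples, but the API looked rather inconvenient to work with.

The example in "4. Creating a new XML document" also creates XML, but what it builds is HTML, so it is not a convenient reference.

4. Creating XML With Python, on the other hand, has a real, working example of creating XML.

It is worth trying.

5. I also found a few posts on creating XML that I can read when I have time: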

使用PYTHON创建XML文档 (Creating XML documents with Python)

Python 创建 XML文件 (Creating XML files with Python)

Later I found a very good example:

Creating XML Documents

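This is the one most worth following.

I will try it when I have time.

6. Later, after a lot of tinkering, I was finally able to generate an XML file.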

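However, ElementTree(xxx).write cannot pretty-print (prettify) when writing to a file, so I had to use a workaround: first run the tree through prettify, then replace the XML declaration (which cannot be configured during prettify), and only then write the result to the XML file. For now that gives me pretty-printed output in an XML file.

It is still awkward to use, and there is no way to control the output so that a tag whose content fits on one line is not broken across lines, and so on. I can't be bothered to tinker with it any further for now, so this will have to do.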
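One way to avoid the minidom round trip is to indent the ElementTree in place and then let write() emit the declaration itself. The following is only a minimal sketch under stated assumptions: it is written for Python 3 (the code in this post is Python 2.7), `indent_in_place` is a hypothetical helper name, and the sample elements are made up for illustration. Leaf elements keep their text on one line, which is the behaviour missing from the prettify workaround.

```python
import io
import xml.etree.ElementTree as ET


def indent_in_place(elem, level=0, space="\t"):
    """Hypothetical helper: add line breaks and indentation to elem in place."""
    pad = "\n" + level * space
    children = list(elem)
    if not children:
        return  # leaf element: its text stays on the same line as its tags
    if not (elem.text and elem.text.strip()):
        elem.text = pad + space
    for child in children:
        indent_in_place(child, level + 1, space)
        if not (child.tail and child.tail.strip()):
            child.tail = pad + space
    last = children[-1]
    if not last.tail.strip():
        last.tail = pad  # the closing tag of elem sits one level shallower


# Made-up sample data, loosely modelled on the channel/item layout used here.
root = ET.Element("channel")
ET.SubElement(root, "title").text = "Music"
item = ET.SubElement(root, "item")
ET.SubElement(item, "title").text = "About this blog"

indent_in_place(root)

# xml_declaration=True asks write() to emit the <?xml ...?> line itself.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="UTF-8", xml_declaration=True)
xml_text = buf.getvalue().decode("utf-8")
print(xml_text)
```

Note that the declaration produced this way uses the library's own quoting and spacing, so a header that must match a specific string exactly would still have to be written by hand.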

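I had also planned to use lxml, but it has to be downloaded and installed separately, which is too much hassle, so I tried to avoid it.

At present many problems remain unsolved, including:

(1) How to support tags such as content:encoded.

(2) How to support CDATA effectively.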
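For problem (1), a possible approach with the standard library is to treat content:encoded as a namespaced tag: register the prefix, then create the element with its qualified name in `{uri}local` form. This is a sketch, not a tested solution for the export file in this post. It assumes Python 3, and it assumes the prefix maps to the RSS content module URI `http://purl.org/rss/1.0/modules/content/`; the surrounding rss/channel/item elements are made up for illustration. It does not address CDATA: the text is escaped as ordinary character data.

```python
import xml.etree.ElementTree as ET

# Assumption: the "content" prefix refers to the RSS content module.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Tell the serializer to use "content" instead of an auto-generated ns0 prefix.
ET.register_namespace("content", CONTENT_NS)

rss = ET.Element("rss", {"version": "2.0"})
channel = ET.SubElement(rss, "channel")
item = ET.SubElement(channel, "item")

# A namespaced tag is written as "{namespace-uri}local-name".
encoded = ET.SubElement(item, "{%s}encoded" % CONTENT_NS)
encoded.text = "<p>Hello</p>"

xml_text = ET.tostring(rss, encoding="utf-8").decode("utf-8")
print(xml_text)
```

The serializer adds the matching xmlns:content declaration to the root element on its own, so the output stays namespace-well-formed.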

 

Here is the code as it stands for now; I will keep working on it when I have time:

# -*- coding: utf-8 -*-
# Python 2.7 code: it uses print statements and u'' literals.
import codecs
import re
from xml.etree.ElementTree import Element, SubElement, ElementTree, Comment, tostring
from xml.dom import minidom

#------------------------------------------------------------------------------
def prettify(elem):
    """Return a pretty-printed XML string for the Element."""
    # serialize with ElementTree, then re-parse with minidom, which can pretty-print
    rough_string = tostring(elem, 'utf-8')
    print "rough_string=", rough_string
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")

#------------------------------------------------------------------------------
# prettify an existing XML file in place
def prettifyXmlFile(xmlFile, encoding="UTF-8"):
    print "before re-write:", xmlFile
    # parse() returns the root element of the parsed file
    rootElement = ElementTree().parse(source=xmlFile)
    print "rootElement=", rootElement
    print "parse xml file OK"
    prettifiedXmlStr = prettify(rootElement)
    # 'a+': read, write, append
    # 'w' : truncate the file first, then write
    outputFile = codecs.open(xmlFile, 'w', encoding)
    outputFile.write(prettifiedXmlStr)
    outputFile.close()
    print "after prettify xml file, rewrite OK"

#------------------------------------------------------------------------------
def testXml():
    #from ElementTree_pretty import prettify

    print "############################"
    top = Element('channel')

    comment = Comment('Generated for PyMOTW')
    top.append(comment)

    child = SubElement(top, 'title')
    print "dir(child)=", dir(child)
    print "child.tag=", child.tag
    print "child.tail=", child.tail
    #child.text = 'This child contains text.'
    child.text = u'音乐天堂'

    item1 = SubElement(top, 'item')
    item1Tit = SubElement(item1, "title")
    item1Tit.text = u"关于本博客的介绍"
    comment1 = Comment('http://hi.baidu.com/recommend_music/blog/item/f36b071112416ac3a6ef3f0e.html')
    item1Tit.append(comment1)
    
    #item1Content = SubElement(item1, "content:encoded");
    #item1Content = SubElement(item1, "content");
    #item1Content.text = u'<![CDATA[<p><span style="background-color: #ffff99;">----------------------------------搬家声明--------------------------------------</span></p><p><span style="background-color: #ffff99;">本博客已搬家至个人网站 </span><a href="https://www.crifan.com/" target="_blank"><strong><span style="background-color: #ffff99; color: #ff0000;">在路上 - On the way</span></strong></a><span style="background-color: #ffff99;">&nbsp;下面的&nbsp;<strong><a href="https://www.crifan.com/category/recommend_music/" target="_blank"><span style="color: #ff0000;">音乐</span></a></strong>&nbsp;分类。</span></p><p><span style="background-color: #ffff99;">你可以通过点击&nbsp;<strong><a href="https://www.crifan.com/?s=%E5%85%B3%E4%BA%8E%E6%9C%AC%E5%8D%9A%E5%AE%A2%E7%9A%84%E4%BB%8B%E7%BB%8D&amp;submit=Search" target="_blank"><span style="color: #ff0000;">关于本博客的介绍</span></a></strong>&nbsp;找到当前帖子的新地址。</span></p><p><span style="background-color: #ffff99;">----------------------------------搬家声明--------------------------------------</span></p><p></p><p><span style="background-color: #ffff99;"></span></p>本博客,主要是为了记录我之前和之后喜欢的歌曲,并推荐给大家。也算给自己的音乐旅程做个记录。欢迎大家把好歌拿出来一起分享。]]>';
    
    item2 = SubElement(top, 'item')
    item2Tit = SubElement(item2, "title")
    item2Tit.text = u"【歌曲推荐】空位 - 纪如璟"
    comment2 = Comment('http://hi.baidu.com/recommend_music/blog/item/48696db15cdb5551082302e0.html')
    item2Tit.append(comment2)
    
    child_with_tail = SubElement(top, 'child_with_tail')
    child_with_tail.text = 'This child has regular text.'
    child_with_tail.tail = 'And "tail" text.'

    child_with_entity_ref = SubElement(top, 'child_with_entity_ref')
    child_with_entity_ref.text = 'This & that'

    #print tostring(top)
    #print prettify(top)
    
    print "type(top)=", type(top)
    prettifiedXmlStr = prettify(top)
    print "type(prettifiedXmlStr)=", type(prettifiedXmlStr)
    # strip the declaration generated by minidom; a custom header is written instead
    prettifiedXmlStr = re.compile(r'<\?\s*xml\s+version="1\.0".*?\?>').sub("", prettifiedXmlStr)
    print "prettifiedXmlStr=", prettifiedXmlStr
    xmlHeader = """<?xml version="1.0" encoding="UTF-8" ?>

<!--
    This is a WordPress eXtended RSS file generated by https://www.crifan.com as an export of 
    your blog. It contains information about your blog's posts, comments, and 
    categories. You may use this file to transfer that content from one site to 
    another. This file is not intended to serve as a complete backup of your 
    blog.
    
    To import this information into a WordPress blog follow these steps:
    
    1.    Log into that blog as an administrator.
    2.    Go to Manage > Import in the blog's admin.
    3.    Choose "WordPress" from the list of importers.
    4.    Upload this file using the form provided on that page.
    5.    You will first be asked to map the authors in this export file to users 
        on the blog. For each author, you may choose to map an existing user on 
        the blog or to create a new user.
    6.    WordPress will then import each of the posts, comments, and categories 
        contained in this file onto your blog.
-->

<!-- generator="https://www.crifan.com" created="2012-05-22 09:38"-->
"""

    testFile = "testXml.xml"
    # write() can emit the declaration (xml_declaration takes a boolean), but it cannot pretty-print:
    #ElementTree(top).write(testFile, encoding="UTF-8", xml_declaration=True, method="xml")

    # write the hand-made header first, then the pretty-printed body
    outputFile = codecs.open(testFile, 'w', 'utf-8')
    outputFile.write(xmlHeader)
    outputFile.write(prettifiedXmlStr)
    outputFile.close()
    
    #prettifyXmlFile(testFile);

    print "############################"
    
if __name__ == "__main__":
    testXml()

 

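What I do want to lament is this: Python has plenty of libraries that support XML, but most of them focus on parsing. Support for writing XML, outputting it to a file, pretty-printing it automatically, CDATA, and so on is really poor and inconvenient.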

As for pretty-printed output, someone has already filed a bug:

xml.etree.ElementTree: add feature to prettify XML output

But because of its low priority, it has not been implemented yet. What a pity.
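On the CDATA point, xml.dom.minidom does offer createCDATASection, so building the document with minidom instead of ElementTree may be one way around it. The following is only a sketch under stated assumptions: it is written for Python 3, `build_item_with_cdata` is a hypothetical helper name, the namespace URI for the content prefix is assumed to be the RSS content module, and the sample title and HTML are made up.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Assumption: the "content" prefix refers to the RSS content module.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def build_item_with_cdata(title, html):
    """Hypothetical helper: build <item> whose content:encoded child is CDATA."""
    doc = minidom.Document()
    item = doc.createElement("item")
    # minidom does not declare prefixes for us, so add the xmlns attribute by hand.
    item.setAttribute("xmlns:content", CONTENT_NS)
    doc.appendChild(item)

    title_el = doc.createElement("title")
    title_el.appendChild(doc.createTextNode(title))
    item.appendChild(title_el)

    content_el = doc.createElement("content:encoded")
    # The CDATA payload is written verbatim, without escaping < or &.
    content_el.appendChild(doc.createCDATASection(html))
    item.appendChild(content_el)
    return doc


doc = build_item_with_cdata("About this blog", "<p>Hello &amp; welcome</p>")
xml_text = doc.toxml()
print(xml_text)

# Round-trip check: a parser hands back the CDATA payload unchanged.
parsed = ET.fromstring(xml_text)
payload = parsed.find("{%s}encoded" % CONTENT_NS).text
```

One limitation to keep in mind: a CDATA section cannot contain the sequence `]]>`, so content that includes it would have to be split or escaped some other way.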
