[Compiled] Python Learning Notes and Lessons Learned v2011-12-31

This post has moved to: Introductions to and Summaries of Various Programming Languages - Python Learning Notes and Lessons Learned

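0. Before starting, here are some tutorials I think are worth reading:

A reminder: whenever you run into something you don't understand, consult this tutorial first:

Python Basics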

http://www.tsnc.edu.cn/default/tsnc_wgrj/doc/python/basic.htm#id2878558

Below are various functions, syntax features, modules and other things I ran into while learning and using Python, for reference.

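1. Learning Python is essentially learning how to use its modules

Python has a large number of libraries that implement all kinds of functionality for you. All you need to do is import the relevant library and call the relevant functions.

Once you have learned Python's basic syntax, programming in Python to get something done is largely a matter of knowing how to use the various libraries and modules, which can greatly improve your productivity.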

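2. Python's BeautifulSoup module helps you parse HTML and XML

For example, before migrating my blog I wanted to scrape the content of the existing one. That means opening a URL, parsing the page, finding the first permalink, then working through the HTML bit by bit to extract the content and export the XML file that WordPress needs.

The HTML analysis in this process is a job for BeautifulSoup, which is a very capable module.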
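As a rough illustration of the "find the first permalink" step, here is a minimal sketch. Note that it does not use BeautifulSoup, which is a third-party package; it swaps in the standard library's html.parser so that it runs without installing anything, and it is written for Python 3 rather than the Python 2 used elsewhere in this post. The LinkCollector class, the sample HTML and the paths in it are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page content; a real script would download it first.
sample_html = """
<html><body>
  <a href="/recommend_music/blog/item/first.html">first post</a>
  <a href="/recommend_music/blog/item/second.html">second post</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(sample_html)
first_permalink = collector.links[0]
print(first_permalink)
```

BeautifulSoup offers a much more convenient interface for the same kind of lookup, so treat this only as a picture of what "parsing the HTML to find a link" involves.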

(1) For more information, see:

Beautiful Soup Chinese documentation

Original link:

http://www.crummy.com/software/BeautifulSoup/documentation.zh.html#contents

(2) This is the official BeautifulSoup site, where you can download the latest version:

http://www.crummy.com/software/BeautifulSoup/

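3. Use add_option of OptionParser (from the optparse module) to add script arguments and help text

When writing a script, you can use add_option to declare each command-line option together with its help text. The rest, namely parsing the arguments and displaying the help, is done automatically by OptionParser.

See the following example: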

import logging
import sys
from optparse import OptionParser

def main():
    # main procedure begin
    parser = OptionParser()
    parser.add_option("-s","--source",action="store", type="string",dest="srcURL",help="source msn/live space address")
    parser.add_option("-f","--startfrom",action="store", type="string",dest="startfromURL",help="a permalink in source msn/live space address for starting with, if this is specified, srcURL will be ignored.")
    parser.add_option("-x","--proxy",action="store",type="string",dest="proxy",help="http proxy server, only for connecting live space.I don't know how to add proxy for metaWeblog yet. So this option is probably not useful...")

    (options, args) = parser.parse_args()
    # read the parsed values straight from options; exec-ing every name in
    # dir(options) into a local variable is fragile and unnecessary
    # add proxy
    if options.proxy:
        pass  # XXX
    if options.startfromURL:
        pass  # XXX
    elif options.srcURL:
        pass  # XXX
    else:
        logging.error("Error XXX")
        sys.exit(2)

if __name__ == "__main__":
    main()

We can then pass arguments when running the script, for example:

hi-baidu-mover_v20111211.py -s http://hi.baidu.com/recommend_music/blog

To see the help text, use the usual -h or --help:

hi-baidu-mover_v20111211.py -h

For more examples and explanation, see:

Learning Python modules: optparse
http://www.cnblogs.com/captain_jack/archive/2011/01/11/1933366.html
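To see what parse_args hands back without running a script from the command line, the following small sketch passes the argument list explicitly. It is written for Python 3; the build_parser helper and the example.com address are placeholders invented for illustration, while the option itself mirrors -s/--source above.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser with one string option, mirroring -s/--source above."""
    parser = OptionParser(usage="%prog [options]")
    parser.add_option("-s", "--source", action="store", type="string",
                      dest="srcURL", help="source blog address")
    return parser


parser = build_parser()
# Passing a list makes parse_args use it instead of sys.argv[1:].
options, args = parser.parse_args(["-s", "http://example.com/blog", "extra"])
print(options.srcURL)  # the value stored under dest="srcURL"
print(args)            # leftover positional arguments

# An option that was not given on the command line defaults to None.
options2, _ = parser.parse_args([])
print(options2.srcURL)
```

Because a missing option is simply None, checks such as "if options.srcURL:" work without any extra setup.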

4. Compile a .py file to .pyc

See:

http://hi.baidu.com/%C1%AC%BF%B419%BC%AF/blog/item/2e3197dd8c209be476c63825.html

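Start Python's IDE, IDLE (Python GUI)

Then type:

import py_compile

Press Enter, then type:

py_compile.compile(r"E:\dev_root\Python25\Lib\sgmllib.py")

This compiles the given .py file to .pyc. With the Python 2.5 used here, the generated sgmllib.pyc is placed in the same directory as the source (Python 3.2 and later write it to a __pycache__ subdirectory by default).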
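The same step can also be done from a script instead of IDLE. The sketch below is written for Python 3 and compiles a throwaway file in a temporary directory; the file name hello.py and its content are made up for the demonstration. In Python 3, py_compile.compile returns the path of the compiled file, so there is no need to guess where it was written.

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical source file created just for this demonstration.
    src_path = os.path.join(tmp_dir, "hello.py")
    with open(src_path, "w") as f:
        f.write("print('hello')\n")

    # doraise=True turns compile errors into py_compile.PyCompileError
    # instead of only printing them.
    pyc_path = py_compile.compile(src_path, doraise=True)
    pyc_exists = os.path.exists(pyc_path)
    print(pyc_path)
```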

5. What group means on the match object returned by re functions such as search, plus the basic syntax of the re module

(1) Meaning and usage of re.search, and what group means on the result

See:

http://www.tutorialspoint.com/python/python_reg_expressions.htm

Match object method	Description
group(num=0)	Returns the entire match (or the specific subgroup num)
groups()	Returns all matching subgroups in a tuple (empty if there weren't any)

So group(0) is the entire matched text, while group(N) is the content of the N-th subgroup, where subgroups are the parts of the pattern passed to search (or a similar function) that are enclosed in parentheses ().

Example 1:

#!/usr/bin/python
import re
line = "Cats are smarter than dogs"
# group 1 is (.*); group 2 is (\.*), which matches zero or more literal dots
matchObj = re.search(r'(.*) are(\.*)', line, re.M|re.I)
if matchObj:
  print "matchObj.group() : ", matchObj.group()
  print "matchObj.group(1) : ", matchObj.group(1)
  print "matchObj.group(2) : ", matchObj.group(2)
else:
  print "No match!!"

Output:

matchObj.group() :  Cats are
matchObj.group(1) :  Cats
matchObj.group(2) :

Example 2: given the string:

var pre = [false,'', '','\/recommend_music/blog/item/.html'];

search it with:

# the pattern has only two groups, so match.group(3) would raise IndexError
match = re.search(r"var pre = \[(.*?),.*?,.*?,'(.*?)'\]", page, re.DOTALL | re.IGNORECASE | re.MULTILINE)
print "match(0)=", match.group(0)
print "match(1)=", match.group(1)
print "match(2)=", match.group(2)

The output is:

match(0)= var pre = [false,'', '','\/recommend_music/blog/item/.html']
match(1)= false
match(2)= \/recommend_music/blog/item/.html

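(2) Summary of re module syntax

The basic syntax of the re module can be summarized as follows:

.	Matches any character except a newline (any character at all when re.DOTALL is set)
[]	Matches a character class, i.e. a set of characters you want to match; the characters in the set are alternatives (an "or" relationship)
^	(1) At the start of a pattern, matches the beginning of the string; (2) as the first character inside [], negates the class. For example, [^5] matches any character except 5, and [^^] matches any character except ^.
$	Matches the end of the string, or just before the newline at the end of the string
*	Repeats the preceding item 0 or more times
+	Repeats the preceding item 1 or more times
?	Repeats the preceding item 0 or 1 times
{m,n}	Repeats the preceding item between m and n times, where {0,} == *, {1,} == +, {0,1} == ?, and {m} repeats the preceding item exactly m times

\A	Matches only at the start of the string
\b	Matches the empty string, but only at the beginning or end of a word
\B	The opposite of \b
\d	Matches any decimal digit; equivalent to the class [0-9]
\D	Matches any non-digit character; equivalent to the class [^0-9]
\s	Matches any whitespace character; equivalent to the class [ \t\n\r\f\v]
\S	Matches any non-whitespace character; equivalent to the class [^ \t\n\r\f\v]
\w	Matches any alphanumeric character; equivalent to the class [a-zA-Z0-9_]
\W	Matches any non-alphanumeric character; equivalent to the class [^a-zA-Z0-9_]
\Z	Matches only at the end of the string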
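A few entries from the table can be tried out directly. The sketch below is written for Python 3, and the sample strings are invented purely for illustration.

```python
import re

text = "order 66 shipped on 2011-12-31, cat catalog"

# \d+ : one or more decimal digits
digits = re.findall(r"\d+", text)

# {m} : exactly m repetitions of the preceding item
date = re.search(r"\d{4}-\d{2}-\d{2}", text)

# \b : word boundary, so "cat" inside "catalog" is not matched
whole_word = re.findall(r"\bcat\b", text)

# ^ as the first character inside [] negates the class
not_five = re.findall(r"[^5]", "1525")

# ? : the preceding item is optional (0 or 1 times)
optional = re.findall(r"colou?r", "color colour")

print(digits, date.group(0), whole_word, not_five, optional)
```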

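(3) Difference between findall patterns with and without parentheses

The result of search was already explained in (1).

The following shows in detail how the results of findall differ depending on whether the pattern contains parentheses.

Parentheses mark the part of the match inside () as a group. With search, groups are read through group(N), where group(0) is the whole match and the parenthesized groups start at 1. With findall, a pattern that contains groups returns only the groups: a list of strings for a single group, or a list of tuples (indexed from 0) for several groups.

The detailed test results below make the difference clear: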

# here blogContent contains following pic url link:
# http://hiphotos.baidu.com/againinput_tmp/pic/item/069e0d89033b5bb53d07e9b536d3d539b400bce2.jpg
# http://hiphotos.baidu.com/recommend_music/pic/item/221ebedfa1a34d224954039e.jpg
# following is test result:
pic_pattern_no_parenthesis = r'http://hiphotos\.baidu\.com/\S+/[ab]{0,2}pic/item/[a-zA-Z0-9]{24,40}\.\w{3}'
picList_no_parenthesis = re.findall(pic_pattern_no_parenthesis, blogContent) # findall result is a list if matched
print 'findall no()=',picList_no_parenthesis
print 'findall no() len=',len(picList_no_parenthesis)
#print 'findall no() group=',picList_no_parenthesis.group(0) # -> cause error
pic_pattern_with_parenthesis = r'http://hiphotos\.baidu\.com/(\S+)/([ab]{0,2})pic/item/([a-zA-Z0-9]+)\.([a-zA-Z]{3})'
picList_with_parenthesis = re.findall(pic_pattern_with_parenthesis, blogContent) # findall result is a list if matched
print 'findall with()=',picList_with_parenthesis
print 'findall with() len=',len(picList_with_parenthesis)
#print 'findall with() group(0)=',picList_with_parenthesis.group(0) # -> cause error
#print 'findall with() group(1)=',picList_with_parenthesis.group(1) # -> cause error
print 'findall with() [0][0]=',picList_with_parenthesis[0][0]
print 'findall with() [0][1]=',picList_with_parenthesis[0][1]
print 'findall with() [0][2]=',picList_with_parenthesis[0][2]
print 'findall with() [0][3]=',picList_with_parenthesis[0][3]
#print 'findall with() [0][4]=',picList_with_parenthesis[0][4] # no [4] -> cause error

Test results:

findall no()= [u'http://hiphotos.baidu.com/againinput_tmp/pic/item/069e0d89033b5bb53d07e9b536d3d539b400bce2.jpg', u'http://hiphotos.baidu.com/recommend_music/pic/item/221ebedfa1a34d224954039e.jpg']
findall no() len= 2
findall with()= [(u'againinput_tmp', u'', u'069e0d89033b5bb53d07e9b536d3d539b400bce2', u'jpg'), (u'recommend_music', u'', u'221ebedfa1a34d224954039e', u'jpg')]
findall with() len= 2
findall with() [0][0]= againinput_tmp
findall with() [0][1]=
findall with() [0][2]= 069e0d89033b5bb53d07e9b536d3d539b400bce2
findall with() [0][3]= jpg
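The same behavior can be reproduced on a much smaller input. This sketch is written for Python 3 and uses a made-up list of file names. It also shows re.finditer, which yields match objects, for cases where both the whole match and the groups are needed.

```python
import re

content = "a.jpg b.png c.jpg"

# No parentheses: each item is the whole match.
no_groups = re.findall(r"\w+\.\w{3}", content)

# One group: each item is just that group's text.
one_group = re.findall(r"(\w+)\.\w{3}", content)

# Several groups: each item is a tuple of the groups.
two_groups = re.findall(r"(\w+)\.(\w{3})", content)

# finditer yields match objects, so group(0) is still available.
whole_matches = [m.group(0) for m in re.finditer(r"(\w+)\.(\w{3})", content)]

print(no_groups)
print(one_group)
print(two_groups)
print(whole_matches)
```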

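(4) Things to watch out for when using re.search

pattern = re.compile(r'HTTP Error ([0-9]{3}):.*')
matched = re.search(pattern, errStr)
if matched : # note: this failed at runtime for me; my first assumption was that the result must be read via matched.group(0), matched.group(1), etc. (corrected in the postscript below)
    print 'is http type error'
    isHttpError = True
else :
    print 'not http type error'
    isHttpError = False

My first conclusion was that using the return value matched directly after re.search fails at runtime, and that the result must instead be inspected via matched.group(0), matched.group(1), and so on. The postscript below shows that this conclusion was wrong.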

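[Postscript]

Later testing showed that the conclusion above was wrong. The error was actually caused by the errStr argument passed to search being an object rather than an ordinary str or unicode string, which made the search call itself fail.

After converting it first with errStr = str(errStr), the result of search can be used directly to test whether anything matched, or be printed. Printed, it looks something like this:

matched= <_sre.SRE_Match object at 0x02B4F1E0>

and matched.group(0) is the full text matched by this search:

HTTP Error 500: ( The specified network name is no longer available. )

[Summary]

When calling functions such as re.search, make sure the value being searched is a string (str or unicode). Otherwise, as in my case where an object was passed instead of a string, the call fails at runtime.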
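The pitfall can be reproduced in a few lines. The sketch below is written for Python 3, where passing a non-string to re.search raises TypeError. The err_obj exception is a stand-in assumed for the kind of error object described above, and http_error_code is a hypothetical helper name.

```python
import re

pattern = re.compile(r"HTTP Error ([0-9]{3}):.*")


def http_error_code(err):
    """Return the 3-digit status code as a string, or None if nothing matches."""
    # str() works for plain strings and for exception objects alike
    matched = re.search(pattern, str(err))
    if matched:  # a match object is truthy, None is falsy
        return matched.group(1)
    return None


# Hypothetical error object standing in for the one described above.
err_obj = Exception("HTTP Error 500: ( The specified network name is no longer available. )")

try:
    re.search(pattern, err_obj)  # not a string, so the call itself fails
    raised_type_error = False
except TypeError:
    raised_type_error = True

print(raised_type_error)
print(http_error_code(err_obj))
print(http_error_code("timed out"))
```

Converting inside the helper keeps callers from having to remember the str() step.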

6. Third-party Python libraries

Be sure to read carefully

http://www.elias.cn/Python/HomePage

specifically the section "3.3  常用第三方类库" (commonly used third-party libraries).

7. Chinese documentation for Python

This page:

http://www.elias.cn/Python/HomePage

lists many resources.
Among them is a Chinese version of the Python tutorial:

http://wiki.woodpecker.org.cn/moin/March_Liu/PyTutorial

More can be found here: Python

When reposting, please credit: 在路上 » [Compiled] Python Learning Notes and Lessons Learned v2011-12-31
