【Problem】
While using regular expressions in Python, I ran the following code:
#http://autoexplosion.com/cars/buy/150594.php
foundMainType = re.search("http://autoexplosion\.com/(?<mainType>\w+)/buy/(?<adId>\d+)\.php", itemLink);
It failed with this error:
Traceback (most recent call last):
80, in <module>
    main();
  File "E:\Dev_Root\freelance\Elance\projects\40377988_data_mining\40377988_data_mining\40377988_data_mining.py", line 304, in main
    itemInfoDict = processEachItem(itemLink);
  File "E:\Dev_Root\freelance\Elance\projects\40377988_data_mining\40377988_data_mining\40377988_data_mining.py", line 183, in processEachItem
    foundMainType = re.search("http://autoexplosion\.com/(?<mainType>\w+)/buy/(?<adId>\d+)\.php", itemLink);
  File "E:\dev_install_root\Python27\lib\re.py", line 142, in search
    return _compile(pattern, flags).search(string)
  File "E:\dev_install_root\Python27\lib\re.py", line 244, in _compile
    raise error, v # invalid expression
sre_constants.error: syntax error
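Because the error is raised while the pattern is being compiled, not while it is being matched, one way to catch this kind of mistake early is to compile the pattern on its own and handle re.error. The following is only a minimal sketch: the helper name try_compile is made up for illustration, and the exact wording of the error message differs between Python versions.

```python
import re

# Pattern copied from the failing call above; it uses (?<name>...),
# which Python's re module does not accept for named groups.
bad_pattern = r"http://autoexplosion\.com/(?<mainType>\w+)/buy/(?<adId>\d+)\.php"


def try_compile(pattern):
    """Hypothetical helper: return (compiled, None) on success or (None, message) on failure."""
    try:
        return re.compile(pattern), None
    except re.error as err:
        # re.error is the same exception class that Python 2.7 reports
        # as sre_constants.error in the traceback above.
        return None, str(err)


compiled, message = try_compile(bad_pattern)
print("compiled:", compiled)
print("error message:", message)
```

Running this shows that the pattern is rejected before any input string is examined, so the problem is in the pattern text itself and not in itemLink.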
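【Solution process】
1. I spent a long time debugging, but still could not find the cause of the error.
2. Later I read the syntax documentation for re and found the following description: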
Similar to regular parentheses, but the substring matched by the group is accessible within the rest of the regular expression via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named. So the group named id in the example below can also be referenced as the numbered group 1. For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be referenced by its name in arguments to methods of match objects, such as m.group('id') or m.end('id'), and also by name in the regular expression itself (using (?P=id)) and replacement text given to .sub() (using \g<id>).
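The ways of referring to a named group that the documentation lists can be tried out directly. This is a small sketch using the (?P<id>[a-zA-Z_]\w*) pattern from the quoted text; the sample strings and the group name word are invented for illustration.

```python
import re

# The pattern from the documentation excerpt: one named group called "id".
id_pattern = re.compile(r"(?P<id>[a-zA-Z_]\w*)")

m = id_pattern.search("foo_1 = 42")
by_name = m.group("id")    # access by group name
by_number = m.group(1)     # the same group, accessed by its number
end_pos = m.end("id")      # end offset of the named group

# (?P=word) refers back to whatever the group "word" matched earlier
# in the same pattern, so this finds a word repeated twice in a row.
repeated = re.search(r"(?P<word>\w+) (?P=word)", "the the cat")

# \g<id> refers to the named group inside replacement text for sub().
wrapped = id_pattern.sub(r"<\g<id>>", "foo bar")

print(by_name, by_number, end_pos)
print(repeated.group("word"))
print(wrapped)
```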
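In other words, the named group syntax in Python is:
(?P<xxx>…)
and not:
(?<xxx>…)
So I changed the code to:
#http://autoexplosion.com/cars/buy/150594.php
foundMainType = re.search("http://autoexplosion\.com/(?P<mainType>\w+)/buy/(?P<adId>\d+)\.php", itemLink);
and it worked.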
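For reference, here is a self-contained sketch of the working version using the (?P<name>…) syntax, applied to the sample URL from the comment. The variable names mainType and adId for the extracted values are assumptions added for illustration; the post itself only shows the re.search call.

```python
import re

# Sample link taken from the comment in the original code.
itemLink = "http://autoexplosion.com/cars/buy/150594.php"

# Named groups written the Python way: (?P<name>...).
# A raw string keeps the backslashes in \. \w and \d intact.
pattern = r"http://autoexplosion\.com/(?P<mainType>\w+)/buy/(?P<adId>\d+)\.php"

foundMainType = re.search(pattern, itemLink)
if foundMainType:
    mainType = foundMainType.group("mainType")
    adId = foundMainType.group("adId")
    print(mainType, adId)
    # groupdict() returns all named groups at once.
    print(foundMainType.groupdict())
```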
3. The reason I mistakenly wrote:
(?<xxx>…)
here is that
I have been writing too much C# lately and too little Python.
Note: in C# regular expressions, a named group is written (?<xxx>…)
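When patterns are carried over from C# regularly, the difference can be bridged mechanically. The following is a naive sketch only: the function name dotnet_to_python_named_groups is hypothetical, and it assumes the pattern contains no escaped literal parenthesis immediately followed by ?< and does not use the alternative .NET form (?'name'…). It leaves the lookbehind constructs (?<=…) and (?<!…) untouched, since those also start with (?<.

```python
import re

# Matches "(?<" only when it is NOT followed by "=" or "!",
# because (?<=...) and (?<!...) are lookbehinds, not named groups.
_DOTNET_NAMED_GROUP = re.compile(r"\(\?<(?![=!])")


def dotnet_to_python_named_groups(pattern):
    """Hypothetical helper: rewrite (?<name>...) as (?P<name>...)."""
    return _DOTNET_NAMED_GROUP.sub("(?P<", pattern)


csharp_style = r"http://autoexplosion\.com/(?<mainType>\w+)/buy/(?<adId>\d+)\.php"
python_style = dotnet_to_python_named_groups(csharp_style)
print(python_style)

# The converted pattern now compiles and matches in Python.
match = re.search(python_style, "http://autoexplosion.com/cars/buy/150594.php")
print(match.group("mainType"), match.group("adId"))
```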
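【Summary】
Evidently, once you stop writing a language regularly, even things you used to know well can be forgotten or mixed up with something else.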