【Solved】Error when using re.search in Python: raise error, v # invalid expression, sre_constants.error: syntax error

【Problem】

While working with regular expressions in Python, I used the following code:

#http://autoexplosion.com/cars/buy/150594.php
foundMainType = re.search("http://autoexplosion\.com/(?<mainType>\w+)/buy/(?<adId>\d+)\.php", itemLink);

It failed with this error:

Traceback (most recent call last):
  File "E:\Dev_Root\freelance\Elance\projects\40377988_data_mining\40377988_data_mining\40377988_data_mining.py", line 380, in <module>
    main();
  File "E:\Dev_Root\freelance\Elance\projects\40377988_data_mining\40377988_data_mining\40377988_data_mining.py", line 304, in main
    itemInfoDict = processEachItem(itemLink);
  File "E:\Dev_Root\freelance\Elance\projects\40377988_data_mining\40377988_data_mining\40377988_data_mining.py", line 183, in processEachItem
    foundMainType = re.search("http://autoexplosion\.com/(?<mainType>\w+)/buy/(?<adId>\d+)\.php", itemLink);
  File "E:\dev_install_root\Python27\lib\re.py", line 142, in search
    return _compile(pattern, flags).search(string)
  File "E:\dev_install_root\Python27\lib\re.py", line 244, in _compile
    raise error, v # invalid expression
sre_constants.error: syntax error
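The failure can be reproduced without the rest of the script. The following is a minimal sketch: the helper name `try_compile` is made up for illustration and is not part of the original program. It only shows that the pattern itself is rejected at compile time, before any matching happens. The exact error text may differ between Python versions (Python 2.7 reports "syntax error", as in the traceback above).

```python
import re

# The pattern from the post, using the (?<name>...) group syntax that Python rejects.
BAD_PATTERN = r"http://autoexplosion\.com/(?<mainType>\w+)/buy/(?<adId>\d+)\.php"


def try_compile(pattern):
    """Hypothetical helper: return None if the pattern compiles,
    otherwise return the error message as a string."""
    try:
        re.compile(pattern)
        return None
    except re.error as exc:
        return str(exc)


message = try_compile(BAD_PATTERN)
print(message)  # some error message; the wording depends on the Python version
```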

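【Solution process】

1. After debugging for quite a while, I still could not find the cause of the error.

2. Later I read the syntax documentation for re and found this: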

(?P<name>...)

Similar to regular parentheses, but the substring matched by the group is accessible within the rest of the regular expression via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named. So the group named id in the example below can also be referenced as the numbered group 1.

For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be referenced by its name in arguments to methods of match objects, such as m.group('id') or m.end('id'), and also by name in the regular expression itself (using (?P=id)) and replacement text given to .sub() (using \g<id>).
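The features listed in the quoted documentation can be tried out with a short sketch. The sample strings below are invented for illustration; only the pattern `(?P<id>[a-zA-Z_]\w*)` comes from the documentation.

```python
import re

# Named group "id" from the quoted documentation: matches a Python identifier.
m = re.search(r"(?P<id>[a-zA-Z_]\w*)", "  spam_1 = 42")

name_by_label = m.group("id")  # access by group name
name_by_index = m.group(1)     # the same group is also numbered group 1
end_pos = m.end("id")          # end offset of the named group

# (?P=id) refers back to whatever the group "id" matched earlier in the pattern.
doubled = re.search(r"(?P<id>[a-zA-Z_]\w*) (?P=id)", "say hello hello world")

# \g<id> inserts the named group into the replacement text of re.sub().
wrapped = re.sub(r"(?P<id>[a-zA-Z_]\w*)", r"<\g<id>>", "a b")

print(name_by_label, name_by_index, end_pos)
print(doubled.group("id"))
print(wrapped)
```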

That is, the syntax is:

(?P<xxx>…)

and not:

(?<xxx>…)

So I changed the code to:

#http://autoexplosion.com/cars/buy/150594.php

foundMainType = re.search("http://autoexplosion\.com/(?P<mainType>\w+)/buy/(?P<adId>\d+)\.php", itemLink);

and it worked.
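As a quick check, here is a self-contained sketch that runs the corrected pattern against the sample URL from the comment. The function name `parse_item_link` is hypothetical; the original script calls re.search inline inside processEachItem.

```python
import re

# Corrected pattern: Python named groups use (?P<name>...).
ITEM_LINK_PATTERN = r"http://autoexplosion\.com/(?P<mainType>\w+)/buy/(?P<adId>\d+)\.php"


def parse_item_link(item_link):
    """Hypothetical helper: return a dict of the named groups, or None if no match."""
    found = re.search(ITEM_LINK_PATTERN, item_link)
    if found is None:
        return None
    return found.groupdict()


info = parse_item_link("http://autoexplosion.com/cars/buy/150594.php")
print(info["mainType"], info["adId"])
```

A raw string (the `r` prefix) is used here so that `\w`, `\d` and `\.` reach the regex engine unchanged.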

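3. As for why I wrote it incorrectly as:

(?<xxx>…)

the reason is that I have been writing too much C# and too little Python lately.

Note: in C# regular expressions, a named group is written as (?<xxx>…)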
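When a pattern is carried over from C# code, the group syntax can be rewritten mechanically. The following is only a naive sketch under stated assumptions: `csharp_to_python_groups` is a made-up helper, it handles only plain (?<name>…) groups, and it ignores edge cases such as an escaped parenthesis or a "(?<" sequence inside a character class. Lookbehind assertions such as (?<=…) and (?<!…) are left untouched because "=" and "!" are not word characters.

```python
import re


def csharp_to_python_groups(pattern):
    """Hypothetical helper: rewrite (?<name>...) as (?P<name>...).

    Naive text substitution; does not parse the regular expression.
    """
    return re.sub(r"\(\?<(\w+)>", r"(?P<\1>", pattern)


csharp_style = r"http://autoexplosion\.com/(?<mainType>\w+)/buy/(?<adId>\d+)\.php"
python_style = csharp_to_python_groups(csharp_style)
print(python_style)

# The converted pattern now compiles and matches the sample URL from the post.
match = re.search(python_style, "http://autoexplosion.com/cars/buy/150594.php")
print(match.group("adId"))
```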

 

【Summary】

It seems that once you stop writing code in a language regularly, even things you used to know well can be forgotten or mixed up.


