【Solved】Python re: getting every value matched by a group that is captured multiple times

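【Problem】

A question someone else ran into:

Calling regex experts: how do I get all the substrings of a group that was captured multiple times?

The documentation for Match.group(i) says that
if a group is captured multiple times, the group returns the last substring captured.
For example, in "(\w)*" the quantifier after the group causes the group to be captured multiple times. I want every substring that the group captured, not only the last one. How can I do that?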
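
To make the behaviour in the question concrete, here is a minimal sketch. The input string "abc" is a made-up sample, not something from the original question.

```python
import re

# Hypothetical sample input, chosen only for illustration.
text = "abc"

# The group (\w) is repeated by *, so it is captured once per character,
# but the match object keeps only the final capture.
m = re.match(r"(\w)*", text)

whole_match = m.group(0)   # everything the pattern matched
last_capture = m.group(1)  # only the last capture of the group

print(whole_match)   # abc
print(last_capture)  # c
```

The earlier captures ('a' and 'b') are not available from the match object, which is exactly the limitation being asked about.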

【Solution process】

1. Reference:

Python regexes: How to access multiple matches of a group?

According to that answer, there are three approaches:

(1) Remove the asterisk * and use re.findall

For details, see:

http://docs.python.org/2/library/re.html#re.findall

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.

New in version 1.5.2.

Changed in version 2.4: Added the optional flags argument.

 

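For concrete usage, see my other post:

【Summary】Differences and connections between re.search and re.findall in Python + differences in re.findall among four cases: named groups, unnamed groups, non-capturing groups, and no groups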
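
As a rough illustration of approach (1), the sketch below drops the * and lets re.findall collect the group from every match. The sample strings and the second pattern are assumptions made up for this example.

```python
import re

# Hypothetical sample input, chosen only for illustration.
text = "abc"

# Without the *, the pattern matches one character at a time,
# and findall returns the group's value from every match.
chars = re.findall(r"(\w)", text)
print(chars)  # ['a', 'b', 'c']

# With more than one group in the pattern, findall returns a list of tuples.
pairs = re.findall(r"(\w)=(\d)", "a=1, b=2")
print(pairs)  # [('a', '1'), ('b', '2')]
```

Note that this scans the whole string, so it is not strictly equivalent to a repeated group inside a larger pattern. If the repeated part is embedded in a bigger expression, one option is to match the outer span first and then run findall on that substring.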

 

(2) Remove the asterisk * and use re.finditer

For details, see:

http://docs.python.org/2/library/re.html#re.finditer

re.finditer(pattern, string, flags=0)

Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.

New in version 2.2.

Changed in version 2.4: Added the optional flags argument.
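
A minimal sketch of approach (2), again with a made-up sample string. Compared with re.findall, re.finditer yields match objects, so the position of each capture is available as well as its text.

```python
import re

# Hypothetical sample input, chosen only for illustration.
text = "ab cd"

# Each iteration yields a match object for one non-overlapping match,
# so we can read both the captured text and where it was found.
captures = [(m.group(1), m.start(1), m.end(1))
            for m in re.finditer(r"(\w)", text)]

print(captures)  # [('a', 0, 1), ('b', 1, 2), ('c', 3, 4), ('d', 4, 5)]
```

This is probably the better fit when the offsets matter or when the input is large, since the matches are produced lazily.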

 

(3) (For more complex parsing tasks) use pyparsing

 

【Summary】

I had honestly never noticed this issue before.

I have also never actually used re.finditer; I will try it when I have time.


