[Q&A] Problem matching Chinese in Notepad++ with the regex [\u4e00-\u9fa5]

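[Question]

"Hello. Notepad++ is an excellent text editor, but while using it I found that matching Chinese with the regular expression [\u4e00-\u9fa5] seems to have a problem.

For example: I have an ANSI-encoded txt file that contains letters, digits, and some Chinese. [\u4e00-\u9fa5] also matches the ANSI-encoded letters and digits (I am sure these letters and digits occupy one byte each, and that the two-byte values they form with neighbouring bytes do not fall in the [\u4e00-\u9fa5] range either). Could you tell me what causes this? Many thanks for your help!"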

[Answer]

1. See:

3.4. Regular-expression search and replace in Notepad++

which cites:

[11] How to use regular expressions in Notepad++ (tutorial)

[12] Regular Expressions in SciTE

From these references we learn:

http://www.scintilla.org/SciTERegEx.html

[12] \xHH

a backslash followed by x and two hexa digits, becomes the character whose Ascii code is equal to these digits. If not followed by two digits, it is ‘x’ char itself.
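The symptom in the question is consistent with this kind of fallback. As a sketch only: assume (this is not confirmed by the sources quoted here) that the engine treats the unsupported `\u` the same way, as a literal `u`. The class `[\u4e00-\u9fa5]` would then read as the characters `u`, `4`, `e`, `0`, the range `0-u`, and `9`, `f`, `a`, `5`, and the range `0-u` covers all digits, all uppercase letters, and the lowercase letters a to u. The Python snippet below emulates that reading by writing the class out literally; the function name and sample string are made up for illustration.

```python
import re

# Assumption: the engine reads "\u" as a plain "u", so [\u4e00-\u9fa5]
# degenerates into this literal class, which contains the range 0-u.
degenerate = re.compile(r"[u4e00-u9fa5]")


def matched_chars(text):
    """Return the characters of `text` that the degenerate class matches."""
    return "".join(degenerate.findall(text))


# Letters and digits are matched, while the Chinese characters are not.
sample = "abc XYZ 123 xyz 中文"
print(matched_chars(sample))
```

Under this assumption the pattern never looks at code points U+4E00 to U+9FA5 at all, which would explain why ASCII text matches and Chinese text does not.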

http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions#Example_1

Non ASCII characters

\xnn

Specify a single character with code nn. What this stands for depends on the text encoding. For instance, \xE9 may match an é or a θ depending on the code page in an ANSI encoded document.

\x{nnnn}

Like above, but matches a full 16-bit Unicode character. If the document is ANSI encoded, this construct is invalid.
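To make the encoding dependence concrete, here is a small Python sketch that decodes the single byte 0xE9 under two single-byte code pages. The code pages chosen (`cp1252` and `cp437`) are my own illustration and not necessarily the ones the wiki has in mind. The last lines show that a Chinese character occupies two bytes in GBK, a common ANSI code page for Simplified Chinese, so a single `\xnn` byte escape cannot name it.

```python
# The same byte value, decoded under two different single-byte code pages.
raw = b"\xe9"

as_cp1252 = raw.decode("cp1252")  # Windows Western European code page
as_cp437 = raw.decode("cp437")    # original IBM PC code page

print(as_cp1252, as_cp437)

# In GBK the character U+4E2D is stored as two bytes, not one.
gbk_bytes = "\u4e2d".encode("gbk")
print(len(gbk_bytes))
```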

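So, change your

[\u4e00-\u9fa5]

to:

[\x{4e00}-\x{9fa5}]

and it will match Chinese characters. Note that, according to the wiki passage quoted above, \x{nnnn} is invalid in an ANSI-encoded document, so an ANSI file may first need to be converted to a Unicode encoding (for example UTF-8) before this pattern can be used.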
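If you have several patterns written in the `\uXXXX` style, the rewrite shown above can be automated. The following Python sketch is one possible way to do it; the helper name `to_braced_hex` is hypothetical, and it deliberately handles only the simple case of `\u` followed by exactly four hex digits (it does not try to recognise an escaped backslash such as `\\u`). The final lines use Python's own `re` module, which does understand `\u4e00`, only to confirm that the code-point range itself covers the Chinese characters in a made-up sample.

```python
import re

# Matches a backslash, "u", and exactly four hexadecimal digits.
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def to_braced_hex(pattern):
    """Rewrite \\uXXXX escapes as \\x{XXXX}, leaving everything else as-is."""
    return _U_ESCAPE.sub(lambda m: r"\x{" + m.group(1) + "}", pattern)


converted = to_braced_hex(r"[\u4e00-\u9fa5]")
print(converted)

# Sanity check of the range itself, using Python's regex engine.
han = re.compile("[\u4e00-\u9fa5]")
found = han.findall("abc123中文")
print(found)
```

The converted string is what you would paste into the Notepad++ search box; the Python check says nothing about how Notepad++ itself behaves.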

When reposting, please credit: 在路上 » [Q&A] Problem matching Chinese in Notepad++ with the regex [\u4e00-\u9fa5]
