【Solved】Java regex replacement on Android: removing the trailing backslash from multi-line macro definitions

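【Background】

I want to take the sequence that ends each continued line of an ordinary macro definition:

a backslash, then a carriage return (optional), then a line feed

that is, in the grammar rule: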

MACRO_TEXT :    (('\\' '\r'? '\n') | (~('\r'|'\n')))*;

the part:

('\\' '\r'? '\n')

and replace it.

My previous approach was crude: it simply removed every backslash:

                definedContent = definedContent.replace("\\", ""); // remove the trailing '\' of a multi-line define

So for:

#define  __LBL_TXT_l_valve_end_action    "\n Do you want \n to change valve \n low end action?"

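it misfires and also removes the backslashes of the \n escapes inside the string literal.

So I need a regex that removes only the backslash in:

('\\' '\r'? '\n')

and nothing else.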
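To make the problem concrete, here is a minimal sketch of what the blanket replace does to a macro body. The class and method names are invented for illustration; only the String.replace call comes from the post.

```java
final class NaiveReplaceDemo {
    // Removes every backslash, which is what replace("\\", "") does.
    // String.replace treats both arguments as literal text, not as regex.
    static String stripAllBackslashes(String text) {
        return text.replace("\\", "");
    }

    public static void main(String[] args) {
        // Macro body as it appears in the header file: each \n here is two
        // characters (backslash + 'n'), not a real line break.
        String macro = "\"\\n Do you want \\n to change valve\"";
        System.out.println(macro);
        System.out.println(stripAllBackslashes(macro));
    }
}
```

The second printed line has lost the backslashes of the \n escapes, which is exactly the damage described above.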

【解决过程】

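1. I first looked at my own earlier post:

【Solved】Regex replacement in Java (java.util.regex)

but that approach felt cumbersome.

2. I remembered that Java's String class itself seems to support regex replacement.

3. A quick search turned up:

replaceAll

whose documentation says: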

replaceAll
public String replaceAll(String regex,
                         String replacement)
Replaces each substring of this string that matches the given regular expression with the given replacement.

An invocation of this method of the form str.replaceAll(regex, repl) yields exactly the same result as the expression

Pattern.compile(regex).matcher(str).replaceAll(repl)

Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll. Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired.

Parameters:
regex – the regular expression to which this string is to be matched
replacement – the string to be substituted for each match
Returns:
The resulting String
Throws:
PatternSyntaxException – if the regular expression’s syntax is invalid
Since:
1.4
See Also:
Pattern

So I tried:

definedContent = definedContent.replaceAll("\\(\\r)?\\n", ""); // remove the trailing '\' of a multi-line define

The result was an exception:

Exception in thread "main" java.util.regex.PatternSyntaxException: Unmatched closing ')' near index 3

\(\r)?\n

   ^

    at java.util.regex.Pattern.error(Unknown Source)

    at java.util.regex.Pattern.compile(Unknown Source)

    at java.util.regex.Pattern.<init>(Unknown Source)

    at java.util.regex.Pattern.compile(Unknown Source)

    at java.lang.String.replaceAll(Unknown Source)
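The exception makes sense once the Java string literal has been unescaped: the regex engine receives \(\r)?\n, in which \( is an escaped, literal parenthesis, so the later ) closes nothing. The sketch below checks whether a pattern compiles; the class and method names are hypothetical, and the second pattern is only one possible way to make the group well-formed.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class SyntaxCheckDemo {
    // Returns null if the regex compiles, otherwise the engine's error description.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Engine sees \(\r)?\n : literal '(' followed by an unmatched ')'.
        System.out.println(compileError("\\(\\r)?\\n"));
        // Engine sees \\(\r)?\n : literal backslash, optional group (CR), LF.
        System.out.println(compileError("\\\\(\\r)?\\n"));
    }
}
```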

4. I changed it to:

definedContent = definedContent.replaceAll("\\\\r?\\n", ""); // remove the trailing '\' of a multi-line define

With this, the single-line case worked.

But with a multi-line test:

#define  __LBL_TXT_l_valve_end_action    "\n Do you want \n to change valve \n low end action?" \

                                            2nd line test \

                                            end line

ACKNOWLEDGE(__LBL_TXT_l_valve_end_action);

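the trailing backslashes were still not removed. (In this pattern the regex engine sees \\r?\n: a literal backslash, an optional letter r, and a line feed. A backslash followed by CR LF, which is presumably how these test lines end, therefore never matches.)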

5. Next I tried:

definedContent = definedContent.replaceAll("\\\r?\n", ""); // remove the trailing '\' of a multi-line define

This changed:

"\n Do you want \n to change valve \n low end action?" \

                                            2nd line test \

                                            end line

into:

"\n Do you want \n to change valve \n low end action?" \                                            2nd line test \                                            end line

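Only the CR LF was removed; the backslashes stayed. (Here the string that reaches the regex engine is a backslash followed by an actual CR character, so the backslash merely escapes the CR. The pattern matches an optional CR plus LF and no backslash at all.)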

6. I went back and reread the replaceAll documentation, in particular:

Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll. Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired.

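which looked as if it might be related.

Setting that aside for now, I tried:

definedContent = definedContent.replaceAll("\\r?\n", ""); // remove the trailing '\' of a multi-line define

The result was the same as above: the trailing backslashes remained and only the CR LF was removed.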

7. Then I tried:

definedContent = definedContent.replaceAll("\\r\n", ""); // remove the trailing '\' of a multi-line define

Same problem.

8. Later I tried:

definedContent = definedContent.replaceAll("\\\\\r?\n", ""); // remove the trailing '\' of a multi-line define

Surprisingly, this turned:

"\n Do you want \n to change valve \n low end action?" \

                                            2nd line test \

                                            end line

into:

"\n Do you want \n to change valve \n low end action?"                                             2nd line test                                             end line
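The attempts above differ only in what the regex engine receives after the Java compiler has processed the string literal. The following sketch applies each of them to the same input so the differences can be seen side by side. The class name, helper names, and the short CRLF-terminated input are assumptions made for illustration.

```java
final class AttemptsDemo {
    // Applies one attempted pattern, deleting every match.
    static String strip(String regex, String input) {
        return input.replaceAll(regex, "");
    }

    // Makes CR and LF visible when printing.
    static String visible(String s) {
        return s.replace("\r", "<CR>").replace("\n", "<LF>");
    }

    public static void main(String[] args) {
        // Text: first \<CR><LF>second
        String input = "first \\\r\nsecond";
        String[] attempts = {
            "\\\\r?\\n",  // engine sees: literal backslash, optional letter r, LF
            "\\\r?\n",    // engine sees: escaped CR char (optional), LF char
            "\\r?\n",     // engine sees: CR escape (optional), LF char
            "\\r\n",      // engine sees: CR escape, LF char
            "\\\\\r?\n"   // engine sees: literal backslash, optional CR char, LF char
        };
        for (String regex : attempts) {
            System.out.println(visible(regex) + "  =>  " + visible(strip(regex, input)));
        }
    }
}
```

On a standard JDK, the first pattern leaves this input untouched, the middle three remove only the line break, and only the last one removes the backslash together with the line break.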

9. Then I realized that the line breaks should be kept after the replacement, so I changed it to:

definedContent = definedContent.replaceAll("\\\\(\r?\n)", "$1"); // remove line-end '\' but retain \r?\n in a multi-line define

This turns:

"\n Do you want \n to change valve \n low end action?" \

                                            2nd line test \

                                            end line

into:

"\n Do you want \n to change valve \n low end action?"

                                            2nd line test

                                            end line
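Put together as a small self-contained program, the final replacement can be sketched as follows. The class and method names and the shortened sample text are assumptions; the replaceAll call itself is the one from the post.

```java
final class ContinuationDemo {
    // Drops the backslash that ends a continued line but keeps the line break:
    // group 1 captures the optional CR plus LF, and "$1" puts it back.
    static String removeLineEndBackslash(String definedContent) {
        return definedContent.replaceAll("\\\\(\r?\n)", "$1");
    }

    public static void main(String[] args) {
        String definedContent =
              "\"\\n Do you want \\n to change valve\" \\\r\n"
            + "    2nd line test \\\r\n"
            + "    end line";
        // The \n escapes inside the quoted text survive, because their
        // backslash is followed by the letter n rather than by a line break.
        System.out.println(removeLineEndBackslash(definedContent));
    }
}
```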

 

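【Summary】

The symptom:

In the regex argument of Java's String.replaceAll, a literal backslash character '\' has to be written in source code as four backslashes:

\\\\

Only then is it recognized.

For example, the call above:

definedContent = definedContent.replaceAll("\\\\\r?\n", ""); // remove the trailing '\' of a multi-line define

matches:

a backslash, a carriage return (optional), and a line feed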

 

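Root cause:

I did not dig into it at this point. My guess was that replaceAll internally becomes:

Pattern.compile(regex).matcher(str).replaceAll(repl)

and that the string gets interpreted more than once on its way to becoming a regex pattern, so that where two backslashes:

\\

would normally be enough to denote the backslash character itself, four are needed here.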

 

A gripe:

Java, especially where regex is concerned, is really terrible to work with.


【Postscript】

1. Reference:

String.replaceAll with Backslashes error

There I found an explanation of the cause:

As to your actual problem: the \ is an escape character in both the String and in regex. You need to re-escape it as well:

string.replaceAll("\\\\", "\\\\\\\\");

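In other words, for a backslash character in the argument, the Java string literal needs one level of escaping, giving:

\\

and the regex engine needs another level on top of that, giving:

\\\\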
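The two layers can be checked directly. The sketch below uses invented class and method names; Pattern.quote and Matcher.quoteReplacement are standard java.util.regex helpers that add the regex-level escaping, leaving only the string-literal layer to write by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeLayersDemo {
    // Doubles every backslash, as in the quoted answer.
    static String doubleBackslashes(String s) {
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // Same effect, with the library doing the regex-level escaping.
    static String doubleBackslashesQuoted(String s) {
        return s.replaceAll(Pattern.quote("\\"), Matcher.quoteReplacement("\\\\"));
    }

    public static void main(String[] args) {
        // Four backslashes in source are only two characters at run time.
        System.out.println("\\\\".length()); // prints 2
        // Input text a\b becomes a\\b with either method.
        System.out.println(doubleBackslashes("a\\b"));
        System.out.println(doubleBackslashesQuoted("a\\b"));
    }
}
```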

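2. Looking back at this line in the official documentation:

Pattern.compile(regex).matcher(str).replaceAll(repl)

it all makes more sense.

The first level of escaping is the regex syntax handled by Pattern.compile(regex): to match one '\', the pattern text itself must contain '\\'.

The second level is the Java string literal that carries this pattern text: every '\' in it is written as '\\' in source code, so '\\' becomes '\\\\'. The str passed to matcher is just the input text and undergoes no unescaping.

Anyway, my conclusion:

Java's String and regex are both awkward to use.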
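For comparison, the same clean-up can be sketched without regex at all, which leaves only the string-literal layer of escaping. This is an alternative under stated assumptions, not the approach used in the post: the class and method names are hypothetical, and line endings are assumed to be either CR LF or LF.

```java
final class LiteralReplaceDemo {
    // String.replace matches literal text, so one backslash in the input
    // is written as just "\\" in source code.
    static String removeLineEndBackslash(String text) {
        return text.replace("\\\r\n", "\r\n")  // backslash + CR LF -> CR LF
                   .replace("\\\n", "\n");     // backslash + LF    -> LF
    }

    public static void main(String[] args) {
        String text = "first \\\r\nsecond \\\nthird \\n stays";
        System.out.println(removeLineEndBackslash(text));
    }
}
```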


