【Problem】
While working on:
【Solved】antlr fails to parse double quotes: MismatchedTokenException(0!=0)
I removed the fragment in front of ID, making it:
//fragment ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
and tried it. The result was an error:
[15:01:01] error(208): DDParserDemo.g:80:1: The following token definitions can never be matched because prior tokens match the same input: HEX_DIGIT
【Solution Process】
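1. The message seems to mean:
some other token defined earlier already matches everything HEX_DIGIT matches, so HEX_DIGIT can never be produced and the grammar supposedly cannot compile cleanly.
So the thing to do is find which earlier token already covers what HEX_DIGIT matches.
Oddly, though, the grammar could still be compiled and debugged normally...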
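A rough way to see which earlier tokens error(208) is pointing at, as a sketch under a simplified model: assume the ANTLR v3 lexer picks the longest match, breaks ties in favor of the rule defined earlier in the grammar, and never lets fragment rules compete. HEX_DIGIT only ever matches one character, so it is enough to try every single character it accepts against the rules defined before it. The Python regexes are my own hand translations of ID (with fragment removed), DECIMAL_VALUE and HEX_DIGIT, in the order they appear in the file snapshot saved in step 12; `shadowed_by` and `check_single_char_rule` are hypothetical helper names, not part of ANTLR.

```python
import re
import string

# Hand-translated (assumed) versions of three lexer rules from DDParserDemo.g,
# listed in the order they appear in the grammar.
RULES = [
    ("ID", r"[a-zA-Z_][a-zA-Z0-9_]*"),  # ID with 'fragment' removed
    ("DECIMAL_VALUE", r"[0-9]+"),
    ("HEX_DIGIT", r"[0-9a-fA-F]"),
]


def shadowed_by(target, rules, text):
    """Name of the first rule defined before `target` that also matches all of
    `text`, or None if no earlier rule does (so `target` would win)."""
    for name, pattern in rules:
        if name == target:
            return None
        if re.fullmatch(pattern, text):
            return name
    raise ValueError(f"rule {target!r} not found")


def check_single_char_rule(target, rules, alphabet=string.printable):
    """For every one-character input accepted by `target`, report which earlier
    rule takes it. Only valid for rules that always match exactly one char."""
    target_pattern = dict(rules)[target]
    return {
        ch: shadowed_by(target, rules, ch)
        for ch in alphabet
        if re.fullmatch(target_pattern, ch)
    }


report = check_single_char_rule("HEX_DIGIT", RULES)
print(report["a"], report["F"], report["7"])  # ID ID DECIMAL_VALUE
print("HEX_DIGIT can win:", any(v is None for v in report.values()))  # False
```

Under this model, a-f/A-F are taken by ID and 0-9 by DECIMAL_VALUE, so nothing is left for HEX_DIGIT, which would fit the error appearing only when fragment is removed from ID. Dropping ID from the list (making it a fragment again) gives the letters a-f back to HEX_DIGIT. Feeding in the step 5 rules in grammar order (DECIMAL_VALUE, HEX_DIGIT, ID_START, ID_MIDDLE_END) flags ID_MIDDLE_END as fully shadowed in the same way.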
2. Later I noticed that I had apparently not matched the space after #include, so I changed it to:
//singleInclude : '#include' '"' ID '"' '.h';
singleInclude : '#include' BLANKS '"' ID '"' '.h';
and debugged again.
The same error as above showed up again:
[15:11:40] error(208): DDParserDemo.g:80:1: The following token definitions can never be matched because prior tokens match the same input: HEX_DIGIT
Yet I still could not find which earlier token means the same thing as HEX_DIGIT here.
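3. So I had no choice but to add the fragment back to ID:
fragment ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
With that, it compiled and could be debugged normally.
But, surprisingly, the MissingTokenException I had already run into in
【Basically solved】antlr v3: parsing fails with MissingTokenException when using {$channel=HIDDEN;}
appeared again.
I'll ignore it for now and keep debugging.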
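4. Further on, the error was still the same:
MismatchedTokenException(0!=0)
and it still could not recognize the
std_defs
in
#include "std_defs.h"
But then I noticed the rule was actually written wrong; it should be:
//singleInclude : '#include' BLANKS '"' ID '"' '.h';
singleInclude : '#include' BLANKS '"' ID '.h' '"';
Then I debugged again.
Still the same error.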
The really strange part, though, is this:
with the rules above, in particular with ID defined as
fragment ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
parsing
#include "std_defs.h"
#include "com_tbls.h"
#include "rev_defs.h"
#include "fbk_hm.h"
#include "fdiag_FBK2_Start.h"
#include "blk_err.h"
only recognized part of each ID:
#include "ddef.h"
#include "cb.h"
#include "edef.h"
#include "fb.h"
#include "fdaFB2a.h"
#include "be.h"
For example, the first one,
std_defs
was only recognized as
ddef
Very strange; I could not work out why.
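The partial names above can be reproduced with a small lexer model, sketched under the same assumptions as before: the longest match wins, ties go to the earlier rule, fragment rules never produce tokens, and a character no rule accepts is reported and skipped. This is not ANTLR's generated code; `build_rules` and `tokenize` are hypothetical names, and only ID, DECIMAL_VALUE and HEX_DIGIT are modeled, assuming none of the grammar's other tokens match inside these names.

```python
import re


# Simplified, assumed model of an ANTLR v3 lexer (not the generated code):
# - only non-fragment rules compete to produce tokens;
# - the longest match wins, and on a tie the rule defined first wins;
# - a character no rule accepts is reported as a lexer error and skipped.
def build_rules(id_is_fragment):
    rules = [
        ("ID", r"[a-zA-Z_][a-zA-Z0-9_]*", id_is_fragment),
        ("DECIMAL_VALUE", r"[0-9]+", False),
        ("HEX_DIGIT", r"[0-9a-fA-F]", False),
    ]
    return [(name, re.compile(p)) for name, p, is_fragment in rules if not is_fragment]


def tokenize(text, rules):
    tokens, skipped, pos = [], [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strictly longer only, so an earlier rule keeps any tie.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            skipped.append(text[pos])  # lexer error: drop this char
            pos += 1
        else:
            tokens.append((best_name, text[pos:best_end]))
            pos = best_end
    return tokens, skipped


names = ["std_defs", "com_tbls", "rev_defs", "fbk_hm", "fdiag_FBK2_Start", "blk_err"]
for id_is_fragment in (False, True):
    rules = build_rules(id_is_fragment)
    label = "fragment ID:" if id_is_fragment else "plain ID:"
    print(label, ["".join(t for _, t in tokenize(n, rules)[0]) for n in names])
```

With fragment ID, the model yields exactly the six strings shown above ('ddef', 'cb', 'edef', 'fb', 'fdaFB2a', 'be'): only hex letters and digits survive, as HEX_DIGIT and DECIMAL_VALUE tokens. That match suggests, though it does not prove, that the parser was never handed an ID token at all, which would also fit the MismatchedTokenException.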
5. Could the definition of ID be wrong?
So I tried writing ID a different way, by inlining it:
//singleInclude : '#include' BLANKS '"' ID '.h' '"';
singleInclude : '#include' BLANKS '"' ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* '.h' '"';
and checked the result.
This time it was a grammar error:
[15:30:51] error(170): DDParserDemo.g:123:85: the .. range operator isn't allowed in parser rules
So I changed it again to:
//singleInclude : '#include' BLANKS '"' ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* '.h' '"';
ID_START : ('a'..'z'|'A'..'Z'|'_');
ID_MIDDLE_END : ('a'..'z'|'A'..'Z'|'0'..'9'|'_');
singleInclude : '#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
Checking the result, another grammar error:
2] error(208): DDParserDemo.g:125:1: The following token definitions can never be matched because prior tokens match the same input: ID_MIDDLE_END
So I removed ID:
/*
fragment ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
*/
Still the same grammar error, so I still could not debug.
Then I changed it to:
ID_START : ('a'..'z'|'A'..'Z'|'_');
ID_MIDDLE_END : (ID_START | DIGIT);
singleInclude : '#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
Same error:
[15:43:18] error(208): DDParserDemo.g:126:1: The following token definitions can never be matched because prior tokens match the same input: ID_MIDDLE_END
In short, it is always the same problem.
6. Changed it to:
ID_START : 'a'..'z'|'A'..'Z'|'_';
//ID_MIDDLE_END : ID_START | DIGIT;
ID_MIDDLE_END : HEX_DIGIT | '_';
singleInclude : '#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
The error was still there.
7. Changed it to:
ID_START : 'a'..'z'|'A'..'Z'|'_';
//ID_MIDDLE_END : ID_START | DIGIT;
//ID_MIDDLE_END : HEX_DIGIT | '_';
//singleInclude : '#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
This at least compiles and can be debugged.
The result is strange too: the MismatchedTokenException is still reported, yet every include line is matched in full.
8. Changed it to:
ID_START : 'a'..'z'|'A'..'Z'|'_';
//ID_MIDDLE_END : ID_START | DIGIT;
//ID_MIDDLE_END : HEX_DIGIT | '_';
//singleInclude : '#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
Same as above: still an error, but the whole line is matched.
9. It looks as if, for variable-like values such as the ID defined earlier, only the first letter is matched correctly and the rest cannot be.
So, as an experiment, I dropped the repeated part:
ID_START : 'a'..'z'|'A'..'Z'|'_';
//ID_MIDDLE_END : ID_START | DIGIT;
//ID_MIDDLE_END : HEX_DIGIT | '_';
//singleInclude : '#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
singleInclude : '#include' BLANKS '"' ID_START '.h' '"';
The error was still there.
10. Added fragment to ID_START:
//ID_START : 'a'..'z'|'A'..'Z'|'_';
fragment ID_START : 'a'..'z'|'A'..'Z'|'_';
//ID_MIDDLE_END : ID_START | DIGIT;
//ID_MIDDLE_END : HEX_DIGIT | '_';
//singleInclude : '#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START '.h' '"';
With fragment, the error was the same, and on top of that only part of the content was matched.
So that is even less usable.
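11. Next I also removed the earlier fragment on DIGIT:
//fragment DIGIT : '0'..'9';
Surprisingly that did not help either; the error was unchanged, and I really had no idea what the cause was.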
12. Then I wrote a completely new definition:
//ID_START : 'a'..'z'|'A'..'Z'|'_';
//fragment ID_START : 'a'..'z'|'A'..'Z'|'_';
WHOLE_ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
//ID_MIDDLE_END : ID_START | DIGIT;
//ID_MIDDLE_END : HEX_DIGIT | '_';
//singleInclude : '#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START '.h' '"';
singleInclude : '#include' BLANKS '"' WHOLE_ID '.h' '"';
To my surprise, it now parsed correctly.
For the record, here is the content of the current file at this point:
grammar DDParserDemo;
options {
    output = AST;
    ASTLabelType = CommonTree; // type of $stat.tree ref etc...
}
//NEWLINE : '\r'? '\n' ;
//NEWLINE : '\r' '\n' ;
fragment NEWLINE : '\r'? '\n' ;
/*
fragment ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
*/
fragment FLOAT
    : ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    | '.' ('0'..'9')+ EXPONENT?
    | ('0'..'9')+ EXPONENT
    ;
COMMENT
    : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    | '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;
//fragment WS : ( ' ' | '\t' | '\r' | '\n') {skip();};
//fragment WS : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
WS : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
/*
STRING : '"' ( ESC_SEQ | ~('\\'|'"') )* '"' ;
*/
CHAR: '\'' ( ESC_SEQ | ~('\''|'\\') ) '\'' ;
fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
ESC_SEQ
    : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    | UNICODE_ESC
    | OCTAL_ESC
    ;
fragment OCTAL_ESC
    : '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    | '\\' ('0'..'7') ('0'..'7')
    | '\\' ('0'..'7')
    ;
fragment UNICODE_ESC : '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT ;
//fragment DIGIT : '0'..'9';
//FAKE_TOKEN : '1' '2' '3';
/*
DECIMAL_VALUE : '1'..'9' DIGIT*;
*/
//DECIMAL_VALUE : DIGIT*;
DECIMAL_VALUE : DIGIT+;
//HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;
HEX_VALUE : '0x' HEX_DIGIT+;
/*
fragment HEADER_FILENAME : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;
*/
/*
BLANKSPACE_TAB
//    : (' ' | '\t'){skip();};
    : (' ' | '\t') {$channel=HIDDEN;};
*/
//fragment BLANK : (' '|'\t')+ {skip();};
//BLANK : (' '|'\t') {skip();};
//BLANK : (' '|'\t');
//BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+;
//BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANK : (' '|'\t') {skip();};
BLANKS : (' '|'\t')+;
//BLANKS : (' '|'\t')+ {skip();};
//BLANKS : ' '+ {$channel=HIDDEN;};
//singleInclude : '#include' ' '+ '"' ID '.h"' ;
//singleInclude : '#include' ' '+ '"' ID+ '.h"' ;
//singleInclude : '#include' ' '+ '"' HEADER_FILENAME '.h"';
//singleInclude : '#include' ' ' '"' HEADER_FILENAME '.h"';
//singleInclude : '#include "' HEADER_FILENAME '.h"';
//fragment singleInclude : '#include' (' ')+ '"' ID '.h"';
//singleInclude : '#include' (' '|'\t')+ '""' ID '.h"';
//singleInclude : '#include' (' '|'\t')+ '"std_defs.h"';
//singleInclude : '#include' BLANKS '"' ID '"' '.h';
//singleInclude : '#include' '"' ID '"' '.h';
//singleInclude : '#include' BLANKS '"' ID '"' '.h';
//singleInclude : '#include' BLANKS '"' ID '.h' '"';
//singleInclude : '#include' BLANKS '"' ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* '.h' '"';
//ID_START : 'a'..'z'|'A'..'Z'|'_';
//fragment ID_START : 'a'..'z'|'A'..'Z'|'_';
WHOLE_ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
//ID_MIDDLE_END : ID_START | DIGIT;
//ID_MIDDLE_END : HEX_DIGIT | '_';
//singleInclude : '#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START '.h' '"';
singleInclude : '#include' BLANKS '"' WHOLE_ID '.h' '"';
//include : singleInclude WS* -> singleInclude;
include : singleInclude WS*;
//startParse : include* identification+;
//startParse : include+ identification+;
//startParse : identification+;
//startParse : manufacture deviceType deviceRevison ddRevision;
startParse : include+ manufacture deviceType deviceRevison ddRevision;
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture : 'MANUFACTURER'^ BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType : 'DEVICE_TYPE'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison : 'DEVICE_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision : 'DD_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
//identification : definiton WS* (','?)! WS* -> definiton;
//definiton : (ID)^ ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE)
//definiton : (ID)^ BLANKSPACE_TAB+ (DECIMAL_VALUE | HEX_VALUE)
//definiton : ID ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE);
13. Then I went back and, bit by bit, changed it to use separate definitions:
//WHOLE_ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
WHOLE_ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'| DIGIT)*;
//ID_MIDDLE_END : ID_START | DIGIT;
//ID_MIDDLE_END : HEX_DIGIT | '_';
//singleInclude : '#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START '.h' '"';
singleInclude : '#include' BLANKS '"' WHOLE_ID '.h' '"';
This also parsed correctly.
14. Then broke it down further:
//WHOLE_ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
//WHOLE_ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'| DIGIT)*;
WHOLE_ID : ('a'..'z'|'A'..'Z'|'_') (HEX_DIGIT|'_')*;
//ID_MIDDLE_END : ID_START | DIGIT;
//ID_MIDDLE_END : HEX_DIGIT | '_';
//singleInclude : '#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START '.h' '"';
singleInclude : '#include' BLANKS '"' WHOLE_ID '.h' '"';
This time it failed.
Only then did I notice that HEX_DIGIT here only covers the letters a to f (and A to F, plus digits), not a to z, which is why it failed.
So ANTLRWorks was right to report an error here.
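A quick check of the step 14 rule, hand-translated into Python regexes (the constant names are made up for this sketch): with (HEX_DIGIT|'_')* after the first letter, the match stops at the first letter outside a-f/A-F, so the token ends long before the '.h' the parser expects next.

```python
import re

# Step 14's WHOLE_ID, hand-translated: after the first letter, only
# HEX_DIGIT (0-9, a-f, A-F) or '_' may follow.
STEP14_WHOLE_ID = re.compile(r"[a-zA-Z_][0-9a-fA-F_]*")
# The identifier shape that was actually intended.
INTENDED_ID = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

for name in ["std_defs", "com_tbls", "fdiag_FBK2_Start"]:
    # .match() anchors at position 0 and returns the longest prefix the
    # greedy pattern can take.
    print(name, "->", STEP14_WHOLE_ID.match(name).group(),
          "vs", INTENDED_ID.match(name).group())
```

For std_defs the step 14 rule only takes "s" ('t' is not a hex digit), for com_tbls only "c", and for fdiag_FBK2_Start only "fd".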
15. So I wrote a correct, equivalent definition instead:
ID_START : 'a'..'z'|'A'..'Z'|'_';
WHOLE_ID : (ID_START) (ID_START | DIGIT)*;
//ID_MIDDLE_END : ID_START | DIGIT;
//ID_MIDDLE_END : HEX_DIGIT | '_';
//singleInclude : '#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START '.h' '"';
singleInclude : '#include' BLANKS '"' WHOLE_ID '.h' '"';
Tried again: it works.
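To double-check that step 15's WHOLE_ID describes the same strings as the original ID, here is a brute-force comparison sketch: it enumerates every string of up to 4 characters over a small sample alphabet and looks for one that only one of the two patterns accepts. The regexes are hand translations, `first_difference` is a hypothetical helper, and a search like this is evidence rather than proof; here the equivalence is also visible by eye, since both rules use the same character classes. If they are indeed equal, the difference between steps 16 and 17 comes down to the fragment keyword, not the pattern.

```python
import itertools
import re

ID_START = "[a-zA-Z_]"
DIGIT = "[0-9]"
# Step 15: WHOLE_ID : (ID_START) (ID_START | DIGIT)*;
STEP15_WHOLE_ID = re.compile(f"{ID_START}(?:{ID_START}|{DIGIT})*")
# Original: ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
ORIGINAL_ID = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def first_difference(p, q, alphabet, max_len):
    """Try every string over `alphabet` up to `max_len` chars, shortest first,
    and return the first one exactly one pattern accepts, else None."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(p.fullmatch(s)) != bool(q.fullmatch(s)):
                return s
    return None


print(first_difference(STEP15_WHOLE_ID, ORIGINAL_ID, "aZg_09.", 4))  # None
```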
16. That is odd, then: why did the earlier ID not work?
So I tried again:
//fragment ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
//ID_START : 'a'..'z'|'A'..'Z'|'_';
//WHOLE_ID : (ID_START) (ID_START | DIGIT)*;
//ID_MIDDLE_END : ID_START | DIGIT;
//ID_MIDDLE_END : HEX_DIGIT | '_';
//singleInclude : '#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START '.h' '"';
//singleInclude : '#include' BLANKS '"' WHOLE_ID '.h' '"';
singleInclude : '#include' BLANKS '"' ID '.h' '"';
This time, everything worked normally.
17. Then I tried adding fragment back:
fragment ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
//ID_START : 'a'..'z'|'A'..'Z'|'_';
//WHOLE_ID : (ID_START) (ID_START | DIGIT)*;
//ID_MIDDLE_END : ID_START | DIGIT;
//ID_MIDDLE_END : HEX_DIGIT | '_';
//singleInclude : '#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude : '#include' BLANKS '"' ID_START '.h' '"';
//singleInclude : '#include' BLANKS '"' WHOLE_ID '.h' '"';
singleInclude : '#include' BLANKS '"' ID '.h' '"';
Sure enough, this does not work; it produces the
MismatchedTokenException(0!=0)
error.
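【Summary】
I still do not fully understand how, when ANTLR reports
The following token definitions can never be matched because prior tokens match the same input
one is supposed to work out why the current token definition conflicts with an earlier one.
Here, even after rewriting the rule to match using existing tokens, the same error was still reported.
For more conclusions, see:
【Solved】antlr fails to parse double quotes: MismatchedTokenException(0!=0)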
Please credit when reposting: 在路上 » 【Not fully solved】antlr debugging error: The following token definitions can never be matched because prior tokens match the same input