【Not fully solved】ANTLR debugging error: The following token definitions can never be matched because prior tokens match the same input

【Problem】

While working through

【Solved】ANTLR error when parsing double quotes: MismatchedTokenException(0!=0)

I removed the fragment keyword in front of ID, so the rule became:

//fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

I tried it, and ANTLRWorks reported an error:

[15:01:01] error(208): DDParserDemo.g:80:1: The following token definitions can never be matched because prior tokens match the same input: HEX_DIGIT

[15:01:01] error(208): D:\DevRoot\IndustrialMobileAutomation\HandheldDataSetter\ANTLR\projects\v1.5\DDParserDemo\DDParserDemo.g:80:1: The following token definitions can never be matched because prior tokens match the same input: HEX_DIGIT

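【Solution process】

1. The message seems to mean:

Some earlier token already matches everything that HEX_DIGIT matches, so HEX_DIGIT can never be matched and the grammar does not build cleanly.

So the task is to find which earlier token covers the same input as HEX_DIGIT.

Oddly, though, the grammar could still be compiled and debugged...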
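One way to read this message, sketched in Python rather than ANTLR. It assumes the lexer prefers the longest match and, on a tie, the rule listed first, and that ID and DECIMAL_VALUE both come before HEX_DIGIT in the grammar (as in the snapshot later in this post). The helper shadowed_rules and the simplified model, which looks only at one-character inputs, are illustrative assumptions and not ANTLR's actual analysis.

```python
import string

# Single-character inputs each non-fragment lexer rule can match, in grammar order.
# Simplified model: only one-character tokens are considered.
LETTERS = set(string.ascii_letters) | {"_"}
DIGITS = set(string.digits)
HEX = DIGITS | set("abcdefABCDEF")


def shadowed_rules(rules):
    """Return names of rules whose every one-character input is claimed by earlier rules."""
    seen = set()
    shadowed = []
    for name, chars in rules:
        if chars <= seen:
            shadowed.append(name)
        seen |= chars
    return shadowed


# ID as a real token (fragment removed), listed before HEX_DIGIT.
with_id = [("ID", LETTERS), ("DECIMAL_VALUE", DIGITS), ("HEX_DIGIT", HEX)]
# ID as a fragment: it produces no token of its own, so it is left out.
without_id = [("DECIMAL_VALUE", DIGITS), ("HEX_DIGIT", HEX)]

print(shadowed_rules(with_id))     # ['HEX_DIGIT']
print(shadowed_rules(without_id))  # []
```

Under these assumptions, once ID is a real token it claims a..f and A..F, DECIMAL_VALUE claims 0..9, and nothing is left that only HEX_DIGIT can match. That would be consistent with the error appearing exactly when the fragment keyword is removed from ID.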

2. Later I noticed that the rule apparently did not match the whitespace after #include, so I changed it to:

//singleInclude	:	'#include' '"' ID '"' '.h';
singleInclude	:	'#include' BLANKS '"' ID '"' '.h';

Then I debugged again,

and the same error as above appeared:

[15:11:40] error(208): DDParserDemo.g:80:1: The following token definitions can never be matched because prior tokens match the same input: HEX_DIGIT

[15:11:40] error(208): D:\DevRoot\IndustrialMobileAutomation\HandheldDataSetter\ANTLR\projects\v1.5\DDParserDemo\DDParserDemo.g:80:1: The following token definitions can never be matched because prior tokens match the same input: HEX_DIGIT

But I still could not find which earlier token means the same thing as HEX_DIGIT here.

3. The only option left was to add the fragment back to ID:

fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

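With that, the grammar compiled and could be debugged normally.

But then the MissingTokenException showed up again, the one seen earlier in:

【Basically solved】ANTLR v3: using the {$channel=HIDDEN;} syntax causes a parse error: MissingTokenException

I ignored it for now and kept debugging.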

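4. Further on, the same error was still there:

MismatchedTokenException(0!=0)

The parser still could not recognize the std_defs part of:

#include "std_defs.h"

However, I then noticed that the rule itself was written wrong (the .h was placed after the closing quote) and should be: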

//singleInclude	:	'#include' BLANKS '"' ID '"' '.h';
singleInclude	:	'#include' BLANKS '"' ID '.h' '"';

Then I debugged again.

Still the same error.

But I noticed something very strange.

With the code above, and in particular with ID defined as:

fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

parsing this input:

#include "std_defs.h"
#include "com_tbls.h"
#include "rev_defs.h"
#include "fbk_hm.h"
#include "fdiag_FBK2_Start.h"
#include "blk_err.h"

recognized only part of each ID:

#include "ddef.h"
#include "cb.h"
#include "edef.h"
#include "fb.h"
#include "fdaFB2a.h"
#include "be.h"

For example, the first name is:

std_defs

but only this much was recognized:

ddef

Very strange; I did not understand why.
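A possible explanation, offered as a hypothesis rather than a confirmed diagnosis: every character that survives is a hexadecimal digit. With ID marked as fragment, the lexer has no token that covers a whole name, so characters may only get through where HEX_DIGIT (or DECIMAL_VALUE) can match them. The Python sketch below simply filters each name down to its hex-digit characters; keep_hex_digits is a hypothetical helper and not part of ANTLR.

```python
HEX_CHARS = set("0123456789abcdefABCDEF")


def keep_hex_digits(name):
    """Keep only characters that HEX_DIGIT ('0'..'9', 'a'..'f', 'A'..'F') would match."""
    return "".join(ch for ch in name if ch in HEX_CHARS)


names = ["std_defs", "com_tbls", "rev_defs", "fbk_hm", "fdiag_FBK2_Start", "blk_err"]
for name in names:
    print(name, "->", keep_hex_digits(name))
```

The filtered strings come out as ddef, cb, edef, fb, fdaFB2a and be, which are the same fragments listed above.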

5. It looks as if the definition of ID is wrong?

So I tried rewriting the ID definition from scratch,

by expanding ID inline:

//singleInclude	:	'#include' BLANKS '"' ID '.h' '"';
singleInclude	:	'#include' BLANKS '"' ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* '.h' '"';

That produced a syntax error:

[15:30:51] error(170): DDParserDemo.g:123:85: the .. range operator isn’t allowed in parser rules
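For context, ANTLR decides what kind of rule it is looking at from the first letter of the rule name: uppercase names are lexer rules, lowercase names are parser rules. singleInclude is therefore a parser rule, and character ranges such as 'a'..'z' belong in lexer rules. The Python sketch below only mimics that naming check; rule_kind and range_allowed are hypothetical helpers, and the regular expression is a rough approximation of a range, not ANTLR's own parser.

```python
import re


def rule_kind(name):
    """ANTLR convention: uppercase first letter = lexer rule, lowercase = parser rule."""
    return "lexer" if name[0].isupper() else "parser"


# Rough pattern for a character range such as 'a'..'z'
RANGE = re.compile(r"'[^']+'\s*\.\.\s*'[^']+'")


def range_allowed(name, body):
    """Return False when a parser rule body contains a character range."""
    return rule_kind(name) == "lexer" or RANGE.search(body) is None


print(range_allowed("singleInclude", "'#include' BLANKS '\"' ('a'..'z'|'_')* '.h' '\"'"))  # False
print(range_allowed("ID_START", "('a'..'z'|'A'..'Z'|'_')"))  # True
```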

So I changed it to:

//singleInclude	:	'#include' BLANKS '"' ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* '.h' '"';
ID_START 	:	('a'..'z'|'A'..'Z'|'_');
ID_MIDDLE_END	:	('a'..'z'|'A'..'Z'|'0'..'9'|'_');
singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';

which again produced an error:

2] error(208): DDParserDemo.g:125:1: The following token definitions can never be matched because prior tokens match the same input: ID_MIDDLE_END

[15:32:52] error(208): D:\DevRoot\IndustrialMobileAutomation\HandheldDataSetter\ANTLR\projects\v1.5\DDParserDemo\DDParserDemo.g:125:1: The following token definitions can never be matched because prior tokens match the same input: ID_MIDDLE_END

So I commented out ID:

/*
fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;
*/

Tried it; still the same error, and the grammar could not be debugged.

Changed it again to:

ID_START 	:	('a'..'z'|'A'..'Z'|'_');
ID_MIDDLE_END	:	(ID_START | DIGIT);
singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';

Tried it; same error:

[15:43:18] error(208): DDParserDemo.g:126:1: The following token definitions can never be matched because prior tokens match the same input: ID_MIDDLE_END

In short, it is the same problem every time.

6. Changed it again to:

ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//ID_MIDDLE_END	:	ID_START | DIGIT;
ID_MIDDLE_END	:	HEX_DIGIT | '_';
singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';

The error persisted.

7. Changed it to:

ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';

This at least compiled and could be debugged.

The result was odd too: the MismatchedTokenException was still reported, yet every include statement was matched as a whole line.

8. Changed it to:

ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';

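Tried it; same result as above: the error is still there, but whole lines are matched.

9. It seems that for a variable-like token such as the ID defined earlier, only the first letter is matched correctly and the rest is not.

So I deliberately dropped the rest, leaving: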

ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';

Tried it; the error persisted.

10. Added fragment to ID_START:

//ID_START 	:	'a'..'z'|'A'..'Z'|'_';
fragment ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';

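Tried it; with fragment, not only did the same error occur, but only part of the content was matched.

So that is even less usable.

11. Next I also removed the fragment from DIGIT: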

//fragment
DIGIT
	:	'0'..'9';

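Tried it; surprisingly it still did not work. The error persisted, and I really did not know what the cause was.

12. Later, I wrote a completely new definition: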

//ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//fragment ID_START 	:	'a'..'z'|'A'..'Z'|'_';
WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';

Tried it, and this time the input parsed correctly.

For reference, here is a snapshot of the current file:

grammar DDParserDemo;

options {
	output = AST;
	ASTLabelType = CommonTree; // type of $stat.tree ref etc...
}

//NEWLINE :   '\r'? '\n' ;
//NEWLINE :   '\r' '\n' ;
fragment 
NEWLINE :   '\r'? '\n' ;

/*
fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;
*/
   
fragment
FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {skip();};
//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

/*
STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;
*/

CHAR:  '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
    ;

fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;


ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

//fragment
DIGIT
	:	'0'..'9';

//FAKE_TOKEN 	:	'1' '2' '3';

/*
DECIMAL_VALUE
	:	'1'..'9' DIGIT*;
*/

//DECIMAL_VALUE	:	DIGIT*;
DECIMAL_VALUE	:	DIGIT+;

//HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;


HEX_VALUE
	:	'0x' HEX_DIGIT+;

/*
fragment
HEADER_FILENAME
	:	('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;
*/

/*
BLANKSPACE_TAB
//	:	(' ' | '\t'){skip();};
	:	(' ' | '\t')
	{$channel=HIDDEN;};
*/
//fragment BLANK	:	(' '|'\t')+ {skip();};
//BLANK	:	(' '|'\t') {skip();};
//BLANK	:	(' '|'\t');
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+;
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};
//BLANK	:	(' '|'\t') {skip();};
BLANKS	:	(' '|'\t')+;
//BLANKS	:	(' '|'\t')+ {skip();};
//BLANKS	:	' '+ {$channel=HIDDEN;};

//singleInclude	:	'#include' ' '+ '"' ID '.h"' ;
//singleInclude	:	'#include' ' '+ '"' ID+ '.h"' ;
//singleInclude	:	'#include' ' '+ '"' HEADER_FILENAME '.h"';
//singleInclude	:	'#include' ' ' '"' HEADER_FILENAME '.h"';
//singleInclude	:	'#include "' HEADER_FILENAME '.h"';
//fragment singleInclude	:	'#include' (' ')+ '"' ID '.h"';
//singleInclude	:	'#include' (' '|'\t')+ '""' ID '.h"';
//singleInclude	:	'#include' (' '|'\t')+ '"std_defs.h"';
//singleInclude	:	'#include' BLANKS  '"' ID '"' '.h';
//singleInclude	:	'#include' '"' ID '"' '.h';
//singleInclude	:	'#include' BLANKS '"' ID '"' '.h';
//singleInclude	:	'#include' BLANKS '"' ID '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* '.h' '"';
//ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//fragment ID_START 	:	'a'..'z'|'A'..'Z'|'_';
WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';

//include		:	singleInclude WS*   -> singleInclude;
include		:	singleInclude WS*;

//startParse	:	include* identification+;
//startParse	:	include+ identification+;
//startParse	:	identification+;
//startParse	:	manufacture deviceType deviceRevison ddRevision;
startParse	:	include+ manufacture deviceType deviceRevison ddRevision;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ 	(BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture	:	'MANUFACTURER'^ 	BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType	:	'DEVICE_TYPE'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison	:	'DEVICE_REVISION'^ 	BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision	:	'DD_REVISION'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
	
//identification	:	definiton WS* (','?)! WS*   -> definiton;
	
//definiton	:	(ID)^ ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE)
//definiton	:	(ID)^ BLANKSPACE_TAB+ (DECIMAL_VALUE | HEX_VALUE)
//definiton	:	ID ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE);

13. Then I went back and changed it step by step toward separate definitions:

//WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'| DIGIT)*;
//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';

Tried it; it also parsed correctly.

14. Decomposed it further:

//WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
//WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'| DIGIT)*;
WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') (HEX_DIGIT|'_')*;

//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';

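Tried it, and it failed.

Only then did I notice that HEX_DIGIT covers only the letters a to f,

not a to z as needed here, which is why it failed.

So ANTLRWorks was right to report an error here.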
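The difference between the two WHOLE_ID variants can be checked outside ANTLR. The Python sketch below uses regular expressions as stand-ins for the two lexer rules (an approximation for illustration only; the names HEX_TAIL and FULL_TAIL are made up here) and shows how far each one gets on std_defs.

```python
import re

# Stand-in for WHOLE_ID in step 14: first char is a letter or '_',
# the rest is limited to hex digits or '_'.
HEX_TAIL = re.compile(r"[A-Za-z_][0-9A-Fa-f_]*")
# Stand-in for WHOLE_ID in step 13: the rest may be any letter, digit or '_'.
FULL_TAIL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

name = "std_defs"
print(HEX_TAIL.match(name).group())   # s
print(FULL_TAIL.match(name).group())  # std_defs
```

The hex-only variant stops after the first character because t is not a hex digit, so it cannot cover the whole header name.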

15. So I wrote a proper, equivalent definition:

ID_START 	:	'a'..'z'|'A'..'Z'|'_';
WHOLE_ID	:	(ID_START) (ID_START | DIGIT)*;

//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';

Tried again; the result was OK.

16. Which is strange: why did the original ID not work earlier?

So I tried it again:

//fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;


//ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//WHOLE_ID	:	(ID_START) (ID_START | DIGIT)*;

//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
//singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';
singleInclude	:	'#include' BLANKS '"' ID '.h' '"';

Tried it, and this time everything worked normally.

17. Then I tried again with fragment added:

fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;


//ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//WHOLE_ID	:	(ID_START) (ID_START | DIGIT)*;

//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
//singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';
singleInclude	:	'#include' BLANKS '"' ID '.h' '"';

Tried it; this indeed does not work and produces the

MismatchedTokenException(0!=0)

error.
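A likely reason, stated as an assumption about ANTLR v3 rather than something verified in this post: a fragment rule is only a helper for other lexer rules and never produces a token of its own, so a parser rule that asks for ID can never receive one. The toy lexer below is written in Python to illustrate that idea; tokenize and make_rules are hypothetical names, regular expressions stand in for the lexer rules, and the matching strategy (longest match, ties to the first rule, unmatched characters skipped) is a simplification.

```python
import re


def tokenize(text, rules):
    """Toy lexer: longest match wins, ties go to the rule listed first.

    rules is a list of (name, regex, is_fragment). Fragment rules never
    produce tokens; characters that no rule can match are skipped.
    """
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern, is_fragment in rules:
            if is_fragment:
                continue
            m = re.compile(pattern).match(text, pos)
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            pos += 1  # unmatched character: a real lexer would report an error here
            continue
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


def make_rules(id_is_fragment):
    return [
        ("ID", r"[A-Za-z_][A-Za-z0-9_]*", id_is_fragment),
        ("DECIMAL_VALUE", r"[0-9]+", False),
        ("HEX_DIGIT", r"[0-9A-Fa-f]", False),
    ]


print(tokenize("std_defs", make_rules(id_is_fragment=False)))
print(tokenize("std_defs", make_rules(id_is_fragment=True)))
```

In this model the non-fragment ID yields a single ID token for std_defs, while the fragment version yields only four HEX_DIGIT tokens (d, d, e, f) and no ID at all, which would leave the parser rule with nothing to match.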

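【Summary】

I still have not fully figured out, when this error is reported:

The following token definitions can never be matched because prior tokens match the same input

how to work out why the current token definition conflicts with an earlier one.

Because here, even after rewriting the rule in terms of existing tokens, the same error was still reported.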

Please credit the source when reposting: 在路上 » 【Not fully solved】ANTLR debugging error: The following token definitions can never be matched because prior tokens match the same input
