[Not fully solved] ANTLR debugging error: The following token definitions can never be matched because prior tokens match the same input

ANTLR crifan 2283 views 0 comments

[Problem]

While working through:

[Solved] ANTLR fails to parse double quotes: MismatchedTokenException(0!=0)

I removed the fragment modifier in front of ID, so the rule became:

//fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

Trying that produced errors:

[15:01:01] error(208): DDParserDemo.g:80:1: The following token definitions can never be matched because prior tokens match the same input: HEX_DIGIT

[15:01:01] error(208): D:\DevRoot\IndustrialMobileAutomation\HandheldDataSetter\ANTLR\projects\v1.5\DDParserDemo\DDParserDemo.g:80:1: The following token definitions can never be matched because prior tokens match the same input: HEX_DIGIT

 

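[Solution process]

1. The message seems to mean:

Some earlier token already matches everything that HEX_DIGIT matches, so the grammar supposedly cannot be compiled properly.

The thing to do, then, is to find which earlier token covers the same input as HEX_DIGIT.

Oddly, though, the grammar could still be compiled and debugged despite the error.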
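
A rough way to hunt for the shadowing token is to model the lexer rules as ordered regular expressions and ask, for every input the suspect rule accepts, which rule claims it first. The sketch below is only an approximation of ANTLR's behaviour (first rule that matches the whole input wins), and the regexes are stand-ins written for this illustration. It assumes DECIMAL_VALUE is defined before HEX_DIGIT, as in the full grammar listed later in this post.

```python
import re

# Simplified stand-ins for the lexer rules, in grammar order.
# ID is listed as a real token here (its "fragment" has been removed).
RULES = [
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("DECIMAL_VALUE", r"[0-9]+"),
    ("HEX_DIGIT", r"[0-9a-fA-F]"),
]

def winning_rule(text, rules):
    """Return the name of the first rule that matches all of `text`, or None."""
    for name, pattern in rules:
        if re.fullmatch(pattern, text):
            return name
    return None

def reachable_inputs(target, rules, candidates):
    """List the candidate inputs for which `target` is the winning rule."""
    return [c for c in candidates if winning_rule(c, rules) == target]

hex_chars = "0123456789abcdefABCDEF"

# With ID as a token, every hex digit is claimed by an earlier rule.
shadowed = reachable_inputs("HEX_DIGIT", RULES, hex_chars)

# Dropping ID from the token list (roughly what "fragment" does)
# frees a-f and A-F for HEX_DIGIT again.
without_id = [rule for rule in RULES if rule[0] != "ID"]
freed = reachable_inputs("HEX_DIGIT", without_id, hex_chars)

print("reachable with ID as a token:", shadowed)
print("reachable with ID as a fragment:", "".join(freed))
```

Under these assumptions, the letters of HEX_DIGIT are taken by ID and its digits by DECIMAL_VALUE, which would explain why the error names HEX_DIGIT only when ID is a real token.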

2. I later noticed that I had apparently forgotten to match the whitespace after #include, so I changed the rule to:

//singleInclude	:	'#include' '"' ID '"' '.h';
singleInclude	:	'#include' BLANKS '"' ID '"' '.h';

Debugging again gave the same errors as above:

[15:11:40] error(208): DDParserDemo.g:80:1: The following token definitions can never be matched because prior tokens match the same input: HEX_DIGIT

[15:11:40] error(208): D:\DevRoot\IndustrialMobileAutomation\HandheldDataSetter\ANTLR\projects\v1.5\DDParserDemo\DDParserDemo.g:80:1: The following token definitions can never be matched because prior tokens match the same input: HEX_DIGIT

But I still could not work out which earlier token had the same meaning as HEX_DIGIT here.

3. The only option left was to put the fragment back on ID:

fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

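With that, the grammar compiled and could be debugged normally.

But the MissingTokenException that I had seen before in:

[Mostly solved] ANTLR v3: using the {$channel=HIDDEN;} syntax causes a parse error: MissingTokenException

showed up again:

(Screenshot: BLANKS error, MissingTokenException)

I ignored it for now and kept debugging.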

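4. Further on, the same error was still there:

MismatchedTokenException(0!=0)

The parser still could not recognize the

std_defs

part of

#include "std_defs.h"

However, I then noticed that the rule itself was written wrong (the '.h' sat outside the closing quote) and should be: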

//singleInclude	:	'#include' BLANKS '"' ID '"' '.h';
singleInclude	:	'#include' BLANKS '"' ID '.h' '"';

Then I debugged again.

The error was the same.

But something very strange showed up:

With the code above, and in particular with ID defined as:

fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

parsing this input:

#include "std_defs.h"
#include "com_tbls.h"
#include "rev_defs.h"
#include "fbk_hm.h"
#include "fdiag_FBK2_Start.h"
#include "blk_err.h"

recognized only part of each ID:

#include "ddef.h"
#include "cb.h"
#include "edef.h"
#include "fb.h"
#include "fdaFB2a.h"
#include "be.h"

For example, the first one is:

std_defs

but only this was recognized:

ddef

Very strange; I could not see why.
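
One possible reading of this output, offered here as an assumption rather than something the debugger confirms: a fragment rule never produces a token on its own, so with ID marked as fragment the only characters that still end up in a token are the ones some other lexer rule accepts, namely hex digits (HEX_DIGIT) and decimal digits (DECIMAL_VALUE). The short check below keeps just those characters from each header name and compares the result with the output shown above.

```python
import string

def surviving_chars(name):
    """Keep only the characters a hex-digit or decimal-digit rule could tokenize."""
    return "".join(ch for ch in name if ch in string.hexdigits)

# Header names from the input, without the ".h" suffix.
names = [
    "std_defs",
    "com_tbls",
    "rev_defs",
    "fbk_hm",
    "fdiag_FBK2_Start",
    "blk_err",
]

recognized = [surviving_chars(name) for name in names]

for name, kept in zip(names, recognized):
    print(f"{name} -> {kept}")
```

If this reading is right, the letters outside a to f and the underscores are simply dropped by the lexer, which matches all six lines of the output.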

5. It looked as if the definition of ID might be wrong.

So I tried writing the ID definition afresh,

by expanding ID inline:

//singleInclude	:	'#include' BLANKS '"' ID '.h' '"';
singleInclude	:	'#include' BLANKS '"' ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* '.h' '"';

That produced a syntax error:

[15:30:51] error(170): DDParserDemo.g:123:85: the .. range operator isn’t allowed in parser rules

So I changed it to:

//singleInclude	:	'#include' BLANKS '"' ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* '.h' '"';
ID_START 	:	('a'..'z'|'A'..'Z'|'_');
ID_MIDDLE_END	:	('a'..'z'|'A'..'Z'|'0'..'9'|'_');
singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';

Trying that gave yet another error:

[15:32:52] error(208): DDParserDemo.g:125:1: The following token definitions can never be matched because prior tokens match the same input: ID_MIDDLE_END

[15:32:52] error(208): D:\DevRoot\IndustrialMobileAutomation\HandheldDataSetter\ANTLR\projects\v1.5\DDParserDemo\DDParserDemo.g:125:1: The following token definitions can never be matched because prior tokens match the same input: ID_MIDDLE_END

So I removed ID:

/*
fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;
*/

Tried it: still the same error, and debugging was not possible.

Then I changed it to:

ID_START 	:	('a'..'z'|'A'..'Z'|'_');
ID_MIDDLE_END	:	(ID_START | DIGIT);
singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';

Tried it: same error:

[15:43:18] error(208): DDParserDemo.g:126:1: The following token definitions can never be matched because prior tokens match the same input: ID_MIDDLE_END

In short, it was the same problem every time.

6. Then I changed it to:

ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//ID_MIDDLE_END	:	ID_START | DIGIT;
ID_MIDDLE_END	:	HEX_DIGIT | '_';
singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';

The error remained.

7. Changed to:

ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';

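This time, at least, the grammar compiled and could be debugged.

The result was odd too: the MismatchedTokenException error was still there, yet every include statement was matched as a whole line:

(Screenshot: MismatchedTokenException still reported, but every include line is matched)

8. Changed to: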

ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';

Tried it: same as above, the error was still there but the whole lines were matched:

(Screenshot: same error as above, but the lines are matched)

 

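9. It looked as though, for a variable-like value such as the ID defined earlier, only the first letter was matched correctly and the rest was not.

So I deliberately removed the rest, leaving: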

ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';

Tried it: the error remained.

10. Added fragment to ID_START:

//ID_START 	:	'a'..'z'|'A'..'Z'|'_';
fragment ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';

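Tried it: with fragment, not only was the error the same, but only part of the content was matched:

(Screenshot: with fragment, only part of the input is matched)

So that was even less usable.

11. So I also removed the fragment from the earlier DIGIT: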

//fragment
DIGIT
	:	'0'..'9';

Tried it: still no good. The error remained, and I really did not know what the cause was.

12. Later I wrote a completely new definition:

//ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//fragment ID_START 	:	'a'..'z'|'A'..'Z'|'_';
WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';

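Tried it: this time parsing worked:

(Screenshot: with the new WHOLE_ID definition, all include lines parse correctly)

For the record, here is the content of the file at this point: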

grammar DDParserDemo;

options {
	output = AST;
	ASTLabelType = CommonTree; // type of $stat.tree ref etc...
}

//NEWLINE :   '\r'? '\n' ;
//NEWLINE :   '\r' '\n' ;
fragment 
NEWLINE :   '\r'? '\n' ;

/*
fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;
*/
   
fragment
FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {skip();};
//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

/*
STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;
*/

CHAR:  '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
    ;

fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;


ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

//fragment
DIGIT
	:	'0'..'9';

//FAKE_TOKEN 	:	'1' '2' '3';

/*
DECIMAL_VALUE
	:	'1'..'9' DIGIT*;
*/

//DECIMAL_VALUE	:	DIGIT*;
DECIMAL_VALUE	:	DIGIT+;

//HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;


HEX_VALUE
	:	'0x' HEX_DIGIT+;

/*
fragment
HEADER_FILENAME
	:	('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;
*/

/*
BLANKSPACE_TAB
//	:	(' ' | '\t'){skip();};
	:	(' ' | '\t')
	{$channel=HIDDEN;};
*/
//fragment BLANK	:	(' '|'\t')+ {skip();};
//BLANK	:	(' '|'\t') {skip();};
//BLANK	:	(' '|'\t');
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+;
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};
//BLANK	:	(' '|'\t') {skip();};
BLANKS	:	(' '|'\t')+;
//BLANKS	:	(' '|'\t')+ {skip();};
//BLANKS	:	' '+ {$channel=HIDDEN;};

//singleInclude	:	'#include' ' '+ '"' ID '.h"' ;
//singleInclude	:	'#include' ' '+ '"' ID+ '.h"' ;
//singleInclude	:	'#include' ' '+ '"' HEADER_FILENAME '.h"';
//singleInclude	:	'#include' ' ' '"' HEADER_FILENAME '.h"';
//singleInclude	:	'#include "' HEADER_FILENAME '.h"';
//fragment singleInclude	:	'#include' (' ')+ '"' ID '.h"';
//singleInclude	:	'#include' (' '|'\t')+ '""' ID '.h"';
//singleInclude	:	'#include' (' '|'\t')+ '"std_defs.h"';
//singleInclude	:	'#include' BLANKS  '"' ID '"' '.h';
//singleInclude	:	'#include' '"' ID '"' '.h';
//singleInclude	:	'#include' BLANKS '"' ID '"' '.h';
//singleInclude	:	'#include' BLANKS '"' ID '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* '.h' '"';
//ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//fragment ID_START 	:	'a'..'z'|'A'..'Z'|'_';
WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';

//include		:	singleInclude WS*   -> singleInclude;
include		:	singleInclude WS*;

//startParse	:	include* identification+;
//startParse	:	include+ identification+;
//startParse	:	identification+;
//startParse	:	manufacture deviceType deviceRevison ddRevision;
startParse	:	include+ manufacture deviceType deviceRevison ddRevision;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ 	(BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture	:	'MANUFACTURER'^ 	BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType	:	'DEVICE_TYPE'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison	:	'DEVICE_REVISION'^ 	BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision	:	'DD_REVISION'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
	
//identification	:	definiton WS* (','?)! WS*   -> definiton;
	
//definiton	:	(ID)^ ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE)
//definiton	:	(ID)^ BLANKSPACE_TAB+ (DECIMAL_VALUE | HEX_VALUE)
//definiton	:	ID ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE);

 

13. Then I went back and, step by step, changed it to use separately defined rules:

//WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'| DIGIT)*;
//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';

Tried it: parsing still worked.

14. Splitting it further:

//WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
//WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'| DIGIT)*;
WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') (HEX_DIGIT|'_')*;

//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';

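Tried it: this time it failed.

Only then did I notice that HEX_DIGIT covers only a to f:

(Screenshot: HEX_DIGIT covers only a to f)

rather than the a to z needed here, which is why it failed.

So the error reported by ANTLRWorks here was legitimate.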
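
The difference between the two character sets is easy to check outside ANTLR. The sketch below uses regular expressions as stand-ins for the two versions of WHOLE_ID (the names `narrow` and `wide` are made up for this illustration) and shows how far each one gets into a header name.

```python
import re

# Stand-in for: ('a'..'z'|'A'..'Z'|'_') (HEX_DIGIT|'_')*
narrow = re.compile(r"[A-Za-z_][0-9a-fA-F_]*")

# Stand-in for: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
wide = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def longest_prefix(pattern, text):
    """Return the prefix of `text` matched by `pattern`, or '' if none."""
    match = pattern.match(text)
    return match.group(0) if match else ""

name = "std_defs"
narrow_prefix = longest_prefix(narrow, name)
wide_prefix = longest_prefix(wide, name)

print("HEX_DIGIT-based rule matches:", repr(narrow_prefix))
print("a-z based rule matches:", repr(wide_prefix))
```

The HEX_DIGIT-based version stops at the first letter outside a to f, so it cannot cover a name like std_defs.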

15. So I wrote a proper, equivalent definition:

ID_START 	:	'a'..'z'|'A'..'Z'|'_';
WHOLE_ID	:	(ID_START) (ID_START | DIGIT)*;

//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';

Tried it again: the result was OK.

16. That made things very strange: why had the earlier ID not worked?

So I tried again:

//fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;


//ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//WHOLE_ID	:	(ID_START) (ID_START | DIGIT)*;

//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
//singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';
singleInclude	:	'#include' BLANKS '"' ID '.h' '"';

Tried it: this time everything worked normally.

17. Then I tried adding fragment back:

fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;


//ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//WHOLE_ID	:	(ID_START) (ID_START | DIGIT)*;

//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
//singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';
singleInclude	:	'#include' BLANKS '"' ID '.h' '"';

Tried it: this indeed does not work, and gives the

MismatchedTokenException(0!=0)

error.
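
Steps 16 and 17 differ only in the fragment keyword, so a toy tokenizer can illustrate what the parser probably sees in each case. This is a simplified model (longest match wins, ties go to the earlier rule, unmatched characters are dropped), not ANTLR's actual DFA-based algorithm. The token names INCLUDE, QUOTE and DOT_H are hypothetical labels for the literals '#include', '"' and '.h' in the singleInclude rule.

```python
import re

def make_rules(id_is_fragment):
    """Build (name, regex, is_fragment) triples in a fixed order."""
    return [
        ("INCLUDE", r"#include", False),
        ("QUOTE", r'"', False),
        ("DOT_H", r"\.h", False),
        ("ID", r"[A-Za-z_][A-Za-z0-9_]*", id_is_fragment),
        ("DECIMAL_VALUE", r"[0-9]+", False),
        ("HEX_DIGIT", r"[0-9a-fA-F]", False),
        ("BLANKS", r"[ \t]+", False),
    ]

def tokenize(text, rules):
    """Toy lexer: fragment rules are skipped, so they never emit tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern, is_fragment in rules:
            if is_fragment:
                continue
            match = re.compile(pattern).match(text, pos)
            if match and match.end() - pos > best_len:
                best_name, best_len = name, match.end() - pos
        if best_name is None:
            pos += 1  # nothing matches: drop the character
            continue
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

line = '#include "std_defs.h"'
with_token_id = tokenize(line, make_rules(id_is_fragment=False))
with_fragment_id = tokenize(line, make_rules(id_is_fragment=True))

names_with = [name for name, _ in with_token_id]
names_without = [name for name, _ in with_fragment_id]
hex_text = "".join(t for name, t in with_fragment_id if name == "HEX_DIGIT")

print(names_with)
print(names_without)
print(hex_text)
```

In this model the token stream contains an ID only when ID is not a fragment. With the fragment in place, singleInclude asks for an ID token that the lexer never emits, which is consistent with the MismatchedTokenException seen here.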

 

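[Summary]

I still do not fully understand how, when this error is reported:

The following token definitions can never be matched because prior tokens match the same input

one is supposed to work out why the current token definition conflicts with an earlier one.

Even after changing the rule to reuse existing tokens, the same error was still reported.

 

For more, see:

[Solved] ANTLR fails to parse double quotes: MismatchedTokenException(0!=0)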
