
[Mostly solved] ANTLR v3: a grammar using {$channel=HIDDEN;} fails to parse with MissingTokenException

ANTLR crifan

[Problem]

While working through:

[Mostly solved] ANTLR v3: a grammar containing {skip();} fails during debug parsing with org.antlr.runtime.EarlyExitException

I changed the grammar to:

BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};


startParse	:	manufacture deviceType deviceRevison ddRevision;
manufacture	:	'MANUFACTURER'^ 	BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;

 

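Numeric values were now recognized correctly, but a MissingTokenException appeared instead:

[Screenshot: channel hidden error MissingTokenException]

[Solution process]

1. Clearly, I still did not fully understand what these two lexer actions mean:

{skip();}

{$channel=HIDDEN;}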
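As background for the two actions above, here is a minimal Python sketch of the difference, not ANTLR's own code. It assumes a toy first-match lexer; the names `make_rules`, `lex` and `parser_view` are hypothetical. The point it illustrates: `skip()` throws the matched text away without emitting a token, while `$channel=HIDDEN` still emits a token but puts it on a channel the parser does not read from.

```python
import re
from dataclasses import dataclass

DEFAULT_CHANNEL = 0
HIDDEN_CHANNEL = 99  # same numeric value ANTLR v3 uses for its hidden channel

@dataclass
class Token:
    type: str
    text: str
    channel: int = DEFAULT_CHANNEL

def make_rules(blanks_action):
    """Toy rule table; blanks_action is None, "skip" or "hidden"."""
    return [
        ("KEYWORD", re.compile(r"MANUFACTURER"), None),
        ("HEX_VALUE", re.compile(r"0x[0-9a-fA-F]+"), None),
        ("BLANKS", re.compile(r"[ \t]+"), blanks_action),
    ]

def lex(text, rules):
    tokens, pos = [], 0
    while pos < len(text):
        for name, pattern, action in rules:
            m = pattern.match(text, pos)
            if m:
                if action != "skip":  # skip(): the match is consumed, no token is emitted
                    channel = HIDDEN_CHANNEL if action == "hidden" else DEFAULT_CHANNEL
                    tokens.append(Token(name, m.group(), channel))
                pos = m.end()
                break
        else:
            raise ValueError(f"no rule matches at offset {pos}")
    return tokens

def parser_view(tokens):
    """Token types on the default channel, i.e. what a parser rule gets to match."""
    return [t.type for t in tokens if t.channel == DEFAULT_CHANNEL]

text = "MANUFACTURER  0x1E6D11"
for action in (None, "skip", "hidden"):
    tokens = lex(text, make_rules(action))
    print(action, [t.type for t in tokens], parser_view(tokens))
```

In both the skip and the hidden case the parser-side view contains no BLANKS entry; only the plain rule leaves BLANKS visible to the parser.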

2. Reference:

cannot debug simple channel flag in ANTLR with Eclipse

This did not help. The poster there had simply mistyped

{$channel = HIDDEN;}

as:

($channel = HIDDEN;)

My grammar has no such syntax error.

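3. Reference:

MissingTokenException

It looks as if a poorly written grammar, one that ends up:

not context-free

can cause this kind of problem.

So I went back to re-check the grammar to see whether I could spot anything myself.

4. Changed it to: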

//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
BLANKS	:	' '+ {$channel=HIDDEN;};

startParse	:	manufacture deviceType deviceRevison ddRevision;
manufacture	:	'MANUFACTURER'^ 	BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;

Tried it; the error persisted: still MissingTokenException.

5. I suspected that

BLANK+

and BLANKS were now conflicting, so I changed the current:

BLANK	:	(' '|'\t') {$channel=HIDDEN;};

//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
BLANKS	:	' '+ {$channel=HIDDEN;};


//startParse	:	include* identification+;
//startParse	:	include+ identification+;
//startParse	:	identification+;
startParse	:	manufacture deviceType deviceRevison ddRevision;
manufacture	:	'MANUFACTURER'^ 	BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ 	(BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceType	:	'DEVICE_TYPE'^ 		BLANK+ (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison	:	'DEVICE_REVISION'^ 	BLANK+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision	:	'DD_REVISION'^ 		BLANK+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;

to:

//BLANK	:	(' '|'\t') {$channel=HIDDEN;};

//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
BLANKS	:	' '+ {$channel=HIDDEN;};


//startParse	:	include* identification+;
//startParse	:	include+ identification+;
//startParse	:	identification+;
startParse	:	manufacture deviceType deviceRevison ddRevision;
manufacture	:	'MANUFACTURER'^ 	BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ 	(BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceType	:	'DEVICE_TYPE'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison	:	'DEVICE_REVISION'^ 	BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision	:	'DD_REVISION'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;

Tried it; the error persisted.

6. Next, I switched to the skip form:

//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
BLANKS	:	(' '|'\t')+ {skip();};
//BLANKS	:	' '+ {$channel=HIDDEN;};


//startParse	:	include* identification+;
//startParse	:	include+ identification+;
//startParse	:	identification+;
startParse	:	manufacture deviceType deviceRevison ddRevision;
manufacture	:	'MANUFACTURER'^ 	BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ 	(BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceType	:	'DEVICE_TYPE'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison	:	'DEVICE_REVISION'^ 	BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision	:	'DD_REVISION'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;

The error persisted here too.

7. Removed the extra whitespace in the middle of the rule, giving:

//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
BLANKS	:	(' '|'\t')+ {skip();};
//BLANKS	:	' '+ {$channel=HIDDEN;};


//startParse	:	include* identification+;
//startParse	:	include+ identification+;
//startParse	:	identification+;
startParse	:	manufacture deviceType deviceRevison ddRevision;
manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ 	(BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceType	:	'DEVICE_TYPE'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison	:	'DEVICE_REVISION'^ 	BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision	:	'DD_REVISION'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;

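Tried it; the error persisted.

So the problem is not caused by extra spaces or tabs in the grammar source.

8. Could it be that, earlier in the grammar,

DIGIT and HEX_DIGIT conflict?

The corresponding definitions are: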

fragment
DIGIT
	:	'0'..'9';

//FAKE_TOKEN 	:	'1' '2' '3';

/*
DECIMAL_VALUE
	:	'1'..'9' DIGIT*;
*/

DECIMAL_VALUE
	:	DIGIT*;

HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

So I removed the duplicated definition, changing it to:

fragment
DIGIT
	:	'0'..'9';

//FAKE_TOKEN 	:	'1' '2' '3';

/*
DECIMAL_VALUE
	:	'1'..'9' DIGIT*;
*/

DECIMAL_VALUE
	:	DIGIT*;

//HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;


HEX_VALUE
	:	'0x' HEX_DIGIT+;

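Tried it; the error persisted.

9. Reference:

[antlr-interest] C Runtime problem with $channel=HIDDEN and SKIP()

Could it be that the Java target's

{$channel=HIDDEN;}

also has a bug, and that this is what causes the MissingTokenException?

10. I later found that MissingTokenException was newly added in version 3.1:

Errors and warnings

in order to provide more detailed error information.

11. Changed it again to: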

//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
BLANKS	:	((' '|'\t')+) {$channel=HIDDEN;};

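Tried it; the error persisted.

12. I then looked closely at how the MissingTokenException is produced:

[Screenshot: see events for error of recognition exception MissingTokenException]

It seems the MissingTokenException is raised after the value 0x1E6D11 has been examined several times.

In other words, the MissingTokenException here seems unrelated to the earlier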

BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};

and related instead to the grammar that follows it.

So I went on to study the rest of the rule:

(DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;

to see whether something there was written incorrectly.

13. First, removed the exclamation marks:

//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ 	(BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;

Tried it; the error persisted.

14. I suspected that DECIMAL_VALUE or HEX_VALUE was written incorrectly.

So I changed it to:

//DECIMAL_VALUE	:	DIGIT*;
DECIMAL_VALUE	:	DIGIT+;

Tried it; the error persisted.
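Although switching from DIGIT* to DIGIT+ did not remove the error here, the two definitions do behave differently: a rule built on `*` can succeed without consuming any input. The Python sketch below uses regular expressions as a stand-in for the two lexer rules; the regexes and the helper `match_length` are assumptions for illustration, not ANTLR's implementation.

```python
import re

DIGIT_STAR = re.compile(r"[0-9]*")  # mirrors DECIMAL_VALUE : DIGIT*;
DIGIT_PLUS = re.compile(r"[0-9]+")  # mirrors DECIMAL_VALUE : DIGIT+;

def match_length(pattern, text):
    """Length matched at the start of text, or None when the pattern does not match."""
    m = pattern.match(text)
    return None if m is None else len(m.group())

print(match_length(DIGIT_STAR, "123,"))  # 3
print(match_length(DIGIT_STAR, ",123"))  # 0: an empty match still counts as success
print(match_length(DIGIT_PLUS, ",123"))  # None: at least one digit is required
```

A token rule that can match nothing is usually best avoided, so keeping DIGIT+ seems reasonable even though it was not the cause of this particular exception.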

15. Swapped the order of HEX_VALUE and DECIMAL_VALUE:

//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture	:	'MANUFACTURER'^ BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;

Tried it; the error persisted.

16. Changed the WS action from skip to hidden:

//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {skip();};
fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

Tried it; this time the generated lexer failed to compile:

[14:05:28] D:\DevRoot\IndustrialMobileAutomation\HandheldDataSetter\ANTLR\projects\v1.5\DDParserDemo\output\DDParserDemoLexer.java:593: error: cannot find symbol

[14:05:28]             _channel=HIDDEN;

[14:05:28]             ^

[14:05:28]   symbol:   variable _channel

[14:05:28]   location: class DDParserDemoLexer

[14:05:28] 1 error

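17. So I removed the fragment keyword (presumably a fragment rule emits no token of its own, so there is no channel to set):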

//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {skip();};
//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

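Tried it; the error persisted: still MissingTokenException.

18. After further careful study, it appears that the MissingTokenException actually occurs before the number 0x1E6D11 is recognized, so BLANKS still needs work.

Changed it to: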

//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
BLANKS	:	(' '|'\t')+;

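Tried it, and at last the whitespace was recognized correctly:

[Screenshot: can recognize blanks ok when no skip or hidden]

The odd part is: why can skip() or hidden not be attached to a run of multiple spaces here?

19. So I changed BLANKS back to BLANK and added hidden: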

BLANK	:	(' '|'\t') {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+ {skip();};
//BLANKS	:	' '+ {$channel=HIDDEN;};

//startParse	:	include* identification+;
//startParse	:	include+ identification+;
//startParse	:	identification+;
startParse	:	manufacture deviceType deviceRevison ddRevision;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ 	(BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture	:	'MANUFACTURER'^ 	BLANK+ (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType	:	'DEVICE_TYPE'^ 		BLANK+ (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison	:	'DEVICE_REVISION'^ 	BLANK+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision	:	'DD_REVISION'^ 		BLANK+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;

Tried it; this brought back the original org.antlr.runtime.EarlyExitException:

[Screenshot: if use single blank with hidden then still EarlyExitException]

So this form cannot be used.

20. Then tried skip:

//BLANK	:	(' '|'\t') {$channel=HIDDEN;};
BLANK	:	(' '|'\t') {skip();};

This produced an error:

[14:27:15] error(208): DDParserDemo.g:119:1: The following token definitions can never be matched because prior tokens match the same input: BLANK

Looking into it, the existing WS rule already matches the same input, so I changed the grammar to:

/*
BLANKSPACE_TAB
//	:	(' ' | '\t'){skip();};
	:	(' ' | '\t')
	{$channel=HIDDEN;};
*/	
//fragment BLANK	:	(' '|'\t')+ {skip();};
//BLANK	:	(' '|'\t') {skip();};
//BLANK	:	(' '|'\t');
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};

//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+;
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};
//BLANK	:	(' '|'\t') {skip();};
//BLANKS	:	(' '|'\t')+ {skip();};
//BLANKS	:	' '+ {$channel=HIDDEN;};

//startParse	:	include* identification+;
//startParse	:	include+ identification+;
//startParse	:	identification+;
startParse	:	manufacture deviceType deviceRevison ddRevision;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ 	(BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture	:	'MANUFACTURER'^ 	WS+ (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType	:	'DEVICE_TYPE'^ 		WS+ (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison	:	'DEVICE_REVISION'^ 	WS+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision	:	'DD_REVISION'^ 		WS+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;

Tried it; still EarlyExitException.

So it seems skip or hidden still cannot be used here.
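The error(208) from step 20 can be pictured with a small Python sketch. It assumes every rule matches exactly one character, which holds for WS and BLANK as written above; the function `unreachable_rules` is a hypothetical helper, not part of ANTLR. A rule is reported when every input it accepts is already accepted by rules listed before it.

```python
def unreachable_rules(rules):
    """rules: list of (name, set of single characters matched), in grammar order.
    Returns the names of rules whose every input is already covered by earlier rules."""
    shadowed = []
    seen = set()
    for name, chars in rules:
        if chars and chars <= seen:  # nothing left for this rule to match
            shadowed.append(name)
        seen |= chars
    return shadowed

rules = [
    ("WS", {" ", "\t", "\r", "\n"}),  # WS : ( ' ' | '\t' | '\r' | '\n')
    ("BLANK", {" ", "\t"}),           # BLANK : (' '|'\t')
]
print(unreachable_rules(rules))  # ['BLANK']
```

Swapping the order of the two entries would leave WS reachable, because '\r' and '\n' are not covered by BLANK.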

21. In the end, I used this grammar:

grammar DDParserDemo;

options {
	output = AST;
	ASTLabelType = CommonTree; // type of $stat.tree ref etc...
}

//NEWLINE :   '\r'? '\n' ;
//NEWLINE :   '\r' '\n' ;
fragment 
NEWLINE :   '\r'? '\n' ;


fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;
    
fragment
FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {skip();};
//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;

CHAR:  '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
    ;

fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;


ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

fragment
DIGIT
	:	'0'..'9';

//FAKE_TOKEN 	:	'1' '2' '3';

/*
DECIMAL_VALUE
	:	'1'..'9' DIGIT*;
*/

//DECIMAL_VALUE	:	DIGIT*;
DECIMAL_VALUE	:	DIGIT+;

//HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;


HEX_VALUE
	:	'0x' HEX_DIGIT+;

fragment
HEADER_FILENAME
	:	('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;

/*
//singleInclude	:	'#include' ' '+ '"' ID '.h"' ;
//singleInclude	:	'#include' ' '+ '"' ID+ '.h"' ;
//singleInclude	:	'#include' ' '+ '"' HEADER_FILENAME '.h"';
//singleInclude	:	'#include' ' ' '"' HEADER_FILENAME '.h"';
//singleInclude	:	'#include "' HEADER_FILENAME '.h"';
//fragment singleInclude	:	'#include' (' ')+ '"' ID '.h"';
//singleInclude	:	'#include' (' '|'\t')+ '""' ID '.h"';
//singleInclude	:	'#include' (' '|'\t')+ '"std_defs.h"';
singleInclude	:	'#include' (' '|'\t')+  ID '.h';

include		:	singleInclude WS*   -> singleInclude;
*/



/*
BLANKSPACE_TAB
//	:	(' ' | '\t'){skip();};
	:	(' ' | '\t')
	{$channel=HIDDEN;};
*/	
//fragment BLANK	:	(' '|'\t')+ {skip();};
//BLANK	:	(' '|'\t') {skip();};
//BLANK	:	(' '|'\t');
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};

//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+;
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};
//BLANK	:	(' '|'\t') {skip();};
BLANKS	:	(' '|'\t')+;
//BLANKS	:	(' '|'\t')+ {skip();};
//BLANKS	:	' '+ {$channel=HIDDEN;};

//startParse	:	include* identification+;
//startParse	:	include+ identification+;
//startParse	:	identification+;
startParse	:	manufacture deviceType deviceRevison ddRevision;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ 	(BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture	:	'MANUFACTURER'^ 	BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType	:	'DEVICE_TYPE'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison	:	'DEVICE_REVISION'^ 	BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision	:	'DD_REVISION'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
	
//identification	:	definiton WS* (','?)! WS*   -> definiton;
	
//definiton	:	(ID)^ ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE)
//definiton	:	(ID)^ BLANKSPACE_TAB+ (DECIMAL_VALUE | HEX_VALUE)
//definiton	:	ID ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE);

to match:

MANUFACTURER      0x1E6D11,
DEVICE_TYPE       0x00FF,
DEVICE_REVISION   5,
DD_REVISION       1

and obtained the following tree structure:

[Screenshot: parse tree ok for define blanks and use blanks]

[Screenshot: use blank and blanks to parse the tree value]
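For readers without ANTLRWorks at hand, the information the grammar is meant to extract from the sample input can be cross-checked with a few lines of Python. This is only a sketch under assumptions: the regular expression `LINE` and the function `parse_identification` are hypothetical stand-ins, and the result is a flat dictionary, not the AST shown in the screenshots.

```python
import re

# Hypothetical pattern: keyword, one or more spaces/tabs, hex or decimal value, optional comma.
LINE = re.compile(r"^([A-Z_]+)[ \t]+(0x[0-9a-fA-F]+|[0-9]+),?\s*$")

SAMPLE = """MANUFACTURER      0x1E6D11,
DEVICE_TYPE       0x00FF,
DEVICE_REVISION   5,
DD_REVISION       1
"""

def parse_identification(text):
    """Map each keyword to its integer value."""
    result = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"cannot parse line: {line!r}")
        keyword, value = m.groups()
        base = 16 if value.startswith("0x") else 10
        result[keyword] = int(value, base)
    return result

print(parse_identification(SAMPLE))
```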

 

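[Summary]

1. For matching spaces or tabs, skip() or $channel=HIDDEN could not be used here; with either one, parsing failed.

2. With WS already defined, a separate single-character BLANK for space or tab cannot be defined as well; it counts as a duplicate definition and is reported as:

The following token definitions can never be matched because prior tokens match the same input: BLANK

3. In the end, the only thing that worked was defining BLANKS on its own: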

BLANKS	:	(' '|'\t')+;

and then using it in the rule:

manufacture	:	'MANUFACTURER'^ 	BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;

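With this:

  • the input, including whitespace, is recognized correctly;
  • but the recognized whitespace can no longer be hidden or skipped. For now there seems to be no way to achieve that effect.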
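One possible explanation for points 1 and 3, offered as an assumption and not verified against the original project: an ANTLR v3 parser normally reads from a token stream that delivers only default-channel tokens, so a BLANKS token that is skipped or hidden never reaches a parser rule that names BLANKS explicitly. The Python sketch below models that situation with a toy lexer and a sequence-matching "parser"; `visible_tokens`, `parse` and the error text are hypothetical and do not reproduce ANTLR's real messages.

```python
import re

TOKEN_RULES = [
    ("KEYWORD", re.compile(r"MANUFACTURER")),
    ("HEX_VALUE", re.compile(r"0x[0-9a-fA-F]+")),
    ("DECIMAL_VALUE", re.compile(r"[0-9]+")),
    ("COMMA", re.compile(r",")),
    ("BLANKS", re.compile(r"[ \t]+")),
]

def visible_tokens(text, hide_blanks):
    """Tokenize text and return only the token types a parser would see."""
    types, pos = [], 0
    while pos < len(text):
        for name, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)
            if m:
                if not (hide_blanks and name == "BLANKS"):
                    types.append(name)
                pos = m.end()
                break
        else:
            raise ValueError(f"unexpected character at offset {pos}")
    return types

def parse(types, expected):
    """Return None on success, or a string naming the first mismatch."""
    for i, want in enumerate(expected):
        got = types[i] if i < len(types) else "EOF"
        if got != want:
            return f"missing {want} at {got}"
    return None

line = "MANUFACTURER      0x1E6D11,"
rule_with_blanks = ["KEYWORD", "BLANKS", "HEX_VALUE", "COMMA"]
rule_without_blanks = ["KEYWORD", "HEX_VALUE", "COMMA"]

print(parse(visible_tokens(line, False), rule_with_blanks))    # None
print(parse(visible_tokens(line, True), rule_with_blanks))     # missing BLANKS at HEX_VALUE
print(parse(visible_tokens(line, True), rule_without_blanks))  # None
```

If this reading is right, hiding whitespace would work once the parser rules stop mentioning BLANKS, for example a hypothetical rule of the form 'MANUFACTURER'^ (HEX_VALUE | DECIMAL_VALUE) ','? that has not been tested against this grammar.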

 


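[Postscript]

1. Later I came across this:

what is wrong with this grammar

The poster's argument seems reasonable, and I suspect the same:

this MissingTokenException may be a bug in antlr (or antlrworks).

After all, the grammar appears to be fine and the code runs normally, so this error should not be reported.

Of course, whether it really is a bug still needs confirmation from someone who knows this better.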

Please credit when reposting: 在路上 » [Mostly solved] ANTLR v3: a grammar using {$channel=HIDDEN;} fails to parse with MissingTokenException
