[Solved] ANTLR fails to parse a double quote: MismatchedTokenException(0!=0)

[Problem]

The grammar is written for ANTLR v3 and debugged in ANTLRWorks.

The core part of the grammar is:

fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;
     
//singleInclude	:	'#include' BLANKS  '"' ID '"' '.h';
singleInclude	:	'#include'   '"' ID '"' '.h';

//include		:	singleInclude WS*   -> singleInclude;
include		:	singleInclude WS*;


//startParse	:	include* identification+;
//startParse	:	include+ identification+;
//startParse	:	identification+;
//startParse	:	manufacture deviceType deviceRevison ddRevision;

The input being parsed is:

/*
**********************************************************************
** Includes
**********************************************************************
*/

#include "std_defs.h"
#include "com_tbls.h"
#include "rev_defs.h"
#include "fbk_hm.h"
#include "fdiag_FBK2_Start.h"
#include "blk_err.h"

/*
**********************************************************************
********** DEVICE SECTION ********************************************
**********************************************************************
*/

MANUFACTURER      0x1E6D11,
DEVICE_TYPE       0x00FF,
DEVICE_REVISION   5,
DD_REVISION       1

Debugging fails with:

MismatchedTokenException (0!=0) for quote

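[Solution process]

1. Clearly, the double quote is not being recognized, which raises MismatchedTokenException(0!=0).

2. Reference:

构建自定义的语法分析器 (Building a custom parser)

It explains things clearly, but unfortunately it does not help with this problem.

3. Reference: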

[antlr-interest] MismatchedTokenException

I did not really follow it.

It did not help solve the problem.

4. Reference:

Antlr.Runtime.MismatchedTokenException from Envers with generic entities

Not useful.

5. Later I searched for:

antlr MismatchedTokenException(0!=0) double quote

and looked at:

ANTLR grammar how to capture all characters to end of line

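What it describes is somewhat similar to my case:

It looks as though rules such as COMMENT might conflict with matching the double quote here?

So, as an experiment, I took the original grammar: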

grammar DDParserDemo;

options {
	output = AST;
	ASTLabelType = CommonTree; // type of $stat.tree ref etc...
}

//NEWLINE :   '\r'? '\n' ;
//NEWLINE :   '\r' '\n' ;
fragment 
NEWLINE :   '\r'? '\n' ;


fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;
     
fragment
FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {skip();};
//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;

CHAR:  '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
    ;

fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;


ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

fragment
DIGIT
	:	'0'..'9';

//FAKE_TOKEN 	:	'1' '2' '3';

/*
DECIMAL_VALUE
	:	'1'..'9' DIGIT*;
*/

//DECIMAL_VALUE	:	DIGIT*;
DECIMAL_VALUE	:	DIGIT+;

//HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;


HEX_VALUE
	:	'0x' HEX_DIGIT+;

fragment
HEADER_FILENAME
	:	('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;


/*
BLANKSPACE_TAB
//	:	(' ' | '\t'){skip();};
	:	(' ' | '\t')
	{$channel=HIDDEN;};
*/
//fragment BLANK	:	(' '|'\t')+ {skip();};
//BLANK	:	(' '|'\t') {skip();};
//BLANK	:	(' '|'\t');
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+;
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};
//BLANK	:	(' '|'\t') {skip();};
BLANKS	:	(' '|'\t')+;
//BLANKS	:	(' '|'\t')+ {skip();};
//BLANKS	:	' '+ {$channel=HIDDEN;};

//singleInclude	:	'#include' ' '+ '"' ID '.h"' ;
//singleInclude	:	'#include' ' '+ '"' ID+ '.h"' ;
//singleInclude	:	'#include' ' '+ '"' HEADER_FILENAME '.h"';
//singleInclude	:	'#include' ' ' '"' HEADER_FILENAME '.h"';
//singleInclude	:	'#include "' HEADER_FILENAME '.h"';
//fragment singleInclude	:	'#include' (' ')+ '"' ID '.h"';
//singleInclude	:	'#include' (' '|'\t')+ '""' ID '.h"';
//singleInclude	:	'#include' (' '|'\t')+ '"std_defs.h"';
//singleInclude	:	'#include' BLANKS  '"' ID '"' '.h';
singleInclude	:	'#include'   '"' ID '"' '.h';

//include		:	singleInclude WS*   -> singleInclude;
include		:	singleInclude WS*;


//startParse	:	include* identification+;
//startParse	:	include+ identification+;
//startParse	:	identification+;
//startParse	:	manufacture deviceType deviceRevison ddRevision;
startParse	:	include+ manufacture deviceType deviceRevison ddRevision;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ 	(BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture	:	'MANUFACTURER'^ 	BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType	:	'DEVICE_TYPE'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison	:	'DEVICE_REVISION'^ 	BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision	:	'DD_REVISION'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
	
//identification	:	definiton WS* (','?)! WS*   -> definiton;
	
//definiton	:	(ID)^ ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE)
//definiton	:	(ID)^ BLANKSPACE_TAB+ (DECIMAL_VALUE | HEX_VALUE)
//definiton	:	ID ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE);

and commented out the STRING rule in it:

/*
STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;
*/

Debugging again, the first double quote was indeed recognized, but then another

MismatchedTokenException(0!=0)

appeared:

fix first mismatch occur another

Still, this is a big step toward finally solving the problem.

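Now I understand why the first double quote was not matched before: I had needlessly left a STRING rule in the grammar without ever using it in a parser rule.

As far as I can tell, the lexer then turns the whole quoted text, such as "std_defs.h", into a single STRING token, so the parser never receives the standalone '"' token it is waiting for.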
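The conflict with STRING can be pictured with a small Python model. This is only a sketch: it assumes that a plain longest-match tokenizer is a good enough approximation of what the generated lexer does on this input, and the rule names INCLUDE, QUOTE and DOT_H are made-up stand-ins for the literals '#include', '"' and '.h' in the grammar. It is not ANTLR's actual algorithm.

```python
import re

# Simplified stand-ins for the lexer rules (assumption: longest match wins,
# ties go to the rule listed first). Not generated by ANTLR.
RULES_WITHOUT_STRING = [
    ("INCLUDE", r"#include"),
    ("DOT_H", r"\.h"),
    ("QUOTE", r'"'),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("BLANKS", r"[ \t]+"),
]
RULES_WITH_STRING = [("STRING", r'"(?:\\.|[^\\"])*"')] + RULES_WITHOUT_STRING


def tokenize(text, rules):
    """Return (token_type, lexeme) pairs, always taking the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens


line = '#include "std_defs.h"'
with_string = tokenize(line, RULES_WITH_STRING)
without_string = tokenize(line, RULES_WITHOUT_STRING)
print(with_string)     # the quoted text arrives as one STRING token
print(without_string)  # QUOTE, ID, DOT_H, QUOTE arrive separately
```

In this model, a parser rule written as '#include' '"' ID ... can only succeed on the second token stream; with STRING present there is no separate quote token to match.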

6. This time the error is at the ID position. I suspected another redundant rule that I had defined myself:

fragment
HEADER_FILENAME
	:	('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;

So I removed it:

/*
fragment
HEADER_FILENAME
	:	('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;
*/

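and tried again. The error remained.

7. Along the way I ran into a problem that looks like a duplicate definition; see:

[Not fully solved] ANTLR debugging error: The following token definitions can never be matched because prior tokens match the same input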

 

[Summary]

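1. Do not blindly keep the sample rules that ANTLRWorks puts into a newly created .g file,

such as ID, STRING, and so on.

Otherwise they may later conflict with what you actually need to parse.

That is what happened here: the STRING rule generated by the template conflicted with the later recognition of the double quote, which led to

MismatchedTokenException(0!=0)

and parsing could not continue.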

2. The earlier ID definition is actually usable, namely:

ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

works fine.

3. However, ID must not be marked as fragment, that is, do not write:

fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

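Otherwise it raises MismatchedTokenException(0!=0), because a fragment rule is only a helper that other lexer rules can call; it never produces a token of its own for the parser to match.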
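The effect of fragment can be illustrated with a toy tokenizer in Python. This is a hedged sketch rather than ANTLR itself: it assumes longest-match tokenization, models a fragment rule simply as a rule that is never allowed to emit a token, and uses the invented names QUOTE and DOT_H for the literals '"' and '.h'.

```python
import re

ID_PATTERN = r"[A-Za-z_][A-Za-z0-9_]*"


def tokenize(text, rules):
    """Longest-match tokenizer; rules flagged as fragments never emit tokens."""
    emitting = [(name, re.compile(p)) for name, p, is_fragment in rules
                if not is_fragment]
    tokens, unmatched, pos = [], [], 0
    while pos < len(text):
        best = None
        for name, regex in emitting:
            m = regex.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            # No token rule applies: record the character and move on.
            unmatched.append(text[pos])
            pos += 1
            continue
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens, "".join(unmatched)


def rules(id_is_fragment):
    """Hypothetical rule set; only the fragment flag on ID changes."""
    return [
        ("QUOTE", r'"', False),
        ("DOT_H", r"\.h", False),
        ("ID", ID_PATTERN, id_is_fragment),
    ]


text = '"std_defs.h"'
tokens_fragment, dropped_fragment = tokenize(text, rules(id_is_fragment=True))
tokens_plain, dropped_plain = tokenize(text, rules(id_is_fragment=False))
print(tokens_fragment, repr(dropped_fragment))
print(tokens_plain, repr(dropped_plain))
```

With ID flagged as a fragment, the model produces no ID token at all and the file name characters are left unmatched, which is the situation in which a parser rule expecting ID would report a mismatch.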

4. The double quote is indeed written in the ordinary way:

'"'

and nothing more is needed.

5. The MissingTokenException still occurs here; for now I suspect it is a bug.

See:

[Mostly solved] ANTLR v3: a grammar containing {$channel=HIDDEN;} fails to parse with MissingTokenException

6. The grammar currently in use is:

grammar DDParserDemo;

options {
	output = AST;
	ASTLabelType = CommonTree; // type of $stat.tree ref etc...
}

//NEWLINE :   '\r'? '\n' ;
//NEWLINE :   '\r' '\n' ;
fragment 
NEWLINE :   '\r'? '\n' ;

   
fragment
FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {skip();};
//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

/*
STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;
*/

CHAR:  '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
    ;

fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;


ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

//fragment
DIGIT
	:	'0'..'9';

//FAKE_TOKEN 	:	'1' '2' '3';

/*
DECIMAL_VALUE
	:	'1'..'9' DIGIT*;
*/

//DECIMAL_VALUE	:	DIGIT*;
DECIMAL_VALUE	:	DIGIT+;

//HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;


HEX_VALUE
	:	'0x' HEX_DIGIT+;

/*
fragment
HEADER_FILENAME
	:	('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;
*/

/*
BLANKSPACE_TAB
//	:	(' ' | '\t'){skip();};
	:	(' ' | '\t')
	{$channel=HIDDEN;};
*/
//fragment BLANK	:	(' '|'\t')+ {skip();};
//BLANK	:	(' '|'\t') {skip();};
//BLANK	:	(' '|'\t');
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+;
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};
//BLANK	:	(' '|'\t') {skip();};
BLANKS	:	(' '|'\t')+;
//BLANKS	:	(' '|'\t')+ {skip();};
//BLANKS	:	' '+ {$channel=HIDDEN;};

//singleInclude	:	'#include' ' '+ '"' ID '.h"' ;
//singleInclude	:	'#include' ' '+ '"' ID+ '.h"' ;
//singleInclude	:	'#include' ' '+ '"' HEADER_FILENAME '.h"';
//singleInclude	:	'#include' ' ' '"' HEADER_FILENAME '.h"';
//singleInclude	:	'#include "' HEADER_FILENAME '.h"';
//fragment singleInclude	:	'#include' (' ')+ '"' ID '.h"';
//singleInclude	:	'#include' (' '|'\t')+ '""' ID '.h"';
//singleInclude	:	'#include' (' '|'\t')+ '"std_defs.h"';
//singleInclude	:	'#include' BLANKS  '"' ID '"' '.h';
//singleInclude	:	'#include' '"' ID '"' '.h';
//singleInclude	:	'#include' BLANKS '"' ID '"' '.h';
//singleInclude	:	'#include' BLANKS '"' ID '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* '.h' '"';
//ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//fragment ID_START 	:	'a'..'z'|'A'..'Z'|'_';

//WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
//WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'| DIGIT)*;
//WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') (HEX_DIGIT|'_')*;


//fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;


//ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//WHOLE_ID	:	(ID_START) (ID_START | DIGIT)*;

//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
//singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';
singleInclude	:	'#include' BLANKS '"' ID '.h' '"';


//include		:	singleInclude WS*   -> singleInclude;
include		:	singleInclude WS*;

//startParse	:	include* identification+;
//startParse	:	include+ identification+;
//startParse	:	identification+;
//startParse	:	manufacture deviceType deviceRevison ddRevision;
startParse	:	include+ manufacture deviceType deviceRevison ddRevision;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ 	(BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture	:	'MANUFACTURER'^ 	BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType	:	'DEVICE_TYPE'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison	:	'DEVICE_REVISION'^ 	BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision	:	'DD_REVISION'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
	
//identification	:	definiton WS* (','?)! WS*   -> definiton;
	
//definiton	:	(ID)^ ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE)
//definiton	:	(ID)^ BLANKSPACE_TAB+ (DECIMAL_VALUE | HEX_VALUE)
//definiton	:	ID ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE);

which is used to parse:

/*
**********************************************************************
** Includes
**********************************************************************
*/

#include "std_defs.h"
#include "com_tbls.h"
#include "rev_defs.h"
#include "fbk_hm.h"
#include "fdiag_FBK2_Start.h"
#include "blk_err.h"

/*
**********************************************************************
********** DEVICE SECTION ********************************************
**********************************************************************
*/

MANUFACTURER      0x1E6D11,
DEVICE_TYPE       0x00FF,
DEVICE_REVISION   5,
DD_REVISION       1

The corresponding screenshot:

now can use id to parse include



