【Solved】ANTLR fails to parse double quotes: MismatchedTokenException(0!=0)

【Problem】

An ANTLR v3 grammar, being debugged in ANTLRWorks.

The core part of the grammar is:

fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;
     
//singleInclude	:	'#include' BLANKS  '"' ID '"' '.h';
singleInclude	:	'#include'   '"' ID '"' '.h';

//include		:	singleInclude WS*   -> singleInclude;
include		:	singleInclude WS*;


//startParse	:	include* identification+;
//startParse	:	include+ identification+;
//startParse	:	identification+;
//startParse	:	manufacture deviceType deviceRevison ddRevision;

The input being parsed is:

/*
**********************************************************************
** Includes
**********************************************************************
*/

#include "std_defs.h"
#include "com_tbls.h"
#include "rev_defs.h"
#include "fbk_hm.h"
#include "fdiag_FBK2_Start.h"
#include "blk_err.h"

/*
**********************************************************************
********** DEVICE SECTION ********************************************
**********************************************************************
*/

MANUFACTURER      0x1E6D11,
DEVICE_TYPE       0x00FF,
DEVICE_REVISION   5,
DD_REVISION       1

Debugging it produced an error:

【Solution Process】

1. Clearly the double quote is not being recognized, which produces MismatchedTokenException(0!=0).

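2. I referred to:

构建自定义的语法分析器 (Building a custom parser)

It explains things clearly, but unfortunately it did not help with this problem.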

3. I referred to:

[antlr-interest] MismatchedTokenException

I did not really understand it, and it did not help solve the problem.

4. I referred to:

Antlr.Runtime.MismatchedTokenException from Envers with generic entities

It was not useful.

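5. Later I searched for:

antlr MismatchedTokenException(0!=0) double quote

and found:

ANTLR grammar how to capture all characters to end of line

What it describes is somewhat similar to my situation: it seemed that some other definition, such as the comment rule, was conflicting with matching the double quote here.

So I took my original grammar: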

grammar DDParserDemo;

options {
	output = AST;
	ASTLabelType = CommonTree; // type of $stat.tree ref etc...
}

//NEWLINE :   '\r'? '\n' ;
//NEWLINE :   '\r' '\n' ;
fragment 
NEWLINE :   '\r'? '\n' ;


fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;
     
fragment
FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {skip();};
//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;

CHAR:  '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
    ;

fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;


ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

fragment
DIGIT
	:	'0'..'9';

//FAKE_TOKEN 	:	'1' '2' '3';

/*
DECIMAL_VALUE
	:	'1'..'9' DIGIT*;
*/

//DECIMAL_VALUE	:	DIGIT*;
DECIMAL_VALUE	:	DIGIT+;

//HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;


HEX_VALUE
	:	'0x' HEX_DIGIT+;

fragment
HEADER_FILENAME
	:	('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;


/*
BLANKSPACE_TAB
//	:	(' ' | '\t'){skip();};
	:	(' ' | '\t')
	{$channel=HIDDEN;};
*/
//fragment BLANK	:	(' '|'\t')+ {skip();};
//BLANK	:	(' '|'\t') {skip();};
//BLANK	:	(' '|'\t');
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+;
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};
//BLANK	:	(' '|'\t') {skip();};
BLANKS	:	(' '|'\t')+;
//BLANKS	:	(' '|'\t')+ {skip();};
//BLANKS	:	' '+ {$channel=HIDDEN;};

//singleInclude	:	'#include' ' '+ '"' ID '.h"' ;
//singleInclude	:	'#include' ' '+ '"' ID+ '.h"' ;
//singleInclude	:	'#include' ' '+ '"' HEADER_FILENAME '.h"';
//singleInclude	:	'#include' ' ' '"' HEADER_FILENAME '.h"';
//singleInclude	:	'#include "' HEADER_FILENAME '.h"';
//fragment singleInclude	:	'#include' (' ')+ '"' ID '.h"';
//singleInclude	:	'#include' (' '|'\t')+ '""' ID '.h"';
//singleInclude	:	'#include' (' '|'\t')+ '"std_defs.h"';
//singleInclude	:	'#include' BLANKS  '"' ID '"' '.h';
singleInclude	:	'#include'   '"' ID '"' '.h';

//include		:	singleInclude WS*   -> singleInclude;
include		:	singleInclude WS*;


//startParse	:	include* identification+;
//startParse	:	include+ identification+;
//startParse	:	identification+;
//startParse	:	manufacture deviceType deviceRevison ddRevision;
startParse	:	include+ manufacture deviceType deviceRevison ddRevision;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ 	(BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture	:	'MANUFACTURER'^ 	BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType	:	'DEVICE_TYPE'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison	:	'DEVICE_REVISION'^ 	BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision	:	'DD_REVISION'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
	
//identification	:	definiton WS* (','?)! WS*   -> definiton;
	
//definiton	:	(ID)^ ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE)
//definiton	:	(ID)^ BLANKSPACE_TAB+ (DECIMAL_VALUE | HEX_VALUE)
//definiton	:	ID ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE);

and commented out the STRING rule in it:

/*
STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;
*/

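Debugging again, the first double quote was indeed recognized this time, but then a different

MismatchedTokenException(0!=0)

error appeared:

Still, this was a big step toward the final solution.

It showed why the first double quote had not been matched: I had defined a STRING rule for no reason and never used it.

That rule most likely made the lexer consume the whole quoted text, such as "std_defs.h", as a single STRING token, so the parser never received the bare '"' token it expected.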
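The conflict can be illustrated with a simplified model of the lexer. This is an assumption, not ANTLR's actual prediction code: the sketch tries every token rule at the current position, keeps the longest match, and breaks ties in favor of the rule listed first. Rules are approximated with Python regexes, and INCLUDE_KW, QUOTE and DOT_H are hypothetical labels for the literal tokens '#include', '"' and '.h' (ANTLR v3 itself gives such literals generated names like T__N).

```python
import re

def tokenize(text, rules):
    """Toy lexer: longest match wins, ties go to the earlier rule (an assumed model)."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # strict '>' keeps the earlier rule when two matches have equal length
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at {pos}: {text[pos]!r}")
        name, end = best
        tokens.append((name, text[pos:end]))
        pos = end
    return tokens

# Hypothetical names for the literal tokens used in the parser rules.
LITERALS = [("INCLUDE_KW", r"#include"), ("QUOTE", r'"'), ("DOT_H", r"\.h")]
WS = [("WS", r"[ \t\r\n]")]
STRING = [("STRING", r'"(?:\\.|[^\\"])*"')]   # regex version of the template STRING rule
ID = [("ID", r"[a-zA-Z_][a-zA-Z0-9_]*")]

line = '#include "std_defs.h"'
with_string = tokenize(line, LITERALS + WS + STRING + ID)
without_string = tokenize(line, LITERALS + WS + ID)
print([n for n, _ in with_string])     # ['INCLUDE_KW', 'WS', 'STRING']
print([n for n, _ in without_string])  # ['INCLUDE_KW', 'WS', 'QUOTE', 'ID', 'DOT_H', 'QUOTE']
```

With STRING present, everything from the opening quote onward becomes one STRING token, so a parser rule expecting '"' followed by ID gets the wrong token type; without it, the quoted file name splits into the tokens the parser rule asks for.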

6. The error now occurred at the ID position, which seemed to be caused by a redundant rule I had defined myself:

fragment
HEADER_FILENAME
	:	('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;

So I commented it out:

/*
fragment
HEADER_FILENAME
	:	('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;
*/

I tried again, but the error persisted.

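7. Along the way I also ran into what looked like a duplicate-definition problem; for details see:

【Partially Solved】ANTLR debugging error: The following token definitions can never be matched because prior tokens match the same input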

【Summary】

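1. Don't casually reuse the sample rules, such as ID and STRING, that ANTLRWorks includes when it creates a new .g file.

Otherwise they may later conflict with the content you actually need to parse. That is exactly what happened here: the template-generated STRING rule conflicted with recognizing the double quote, which produced

MismatchedTokenException(0!=0)

and the parse could not continue.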

2. The earlier ID definition does in fact work, i.e.:

ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

works correctly.

3. However, ID must not be marked as a fragment; that is, do not use:

fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

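otherwise you get MismatchedTokenException(0!=0). A fragment rule is only a helper for other lexer rules and is never emitted as a token by itself, so a parser rule that references ID can never match.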
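The effect of fragment can be sketched with a toy model (hypothetical rule names, regex approximations, not ANTLR's real implementation): fragment rules are skipped when the lexer chooses the next token, so ID never appears in the token stream the parser receives.

```python
import re

# Hypothetical mini-model: each rule is (name, regex, is_fragment).
RULES = [
    ("QUOTE", r'"', False),                   # stands in for the implicit '"' literal token
    ("WS", r"[ \t\r\n]", False),
    ("ID", r"[a-zA-Z_][a-zA-Z0-9_]*", True),  # marked as fragment
]

def emitted_token_types(rules):
    """Token types the lexer can actually hand to the parser."""
    return {name for name, _, is_fragment in rules if not is_fragment}

def first_token(text, rules):
    """Longest match among non-fragment rules at position 0, or None."""
    best = None
    for name, pattern, is_fragment in rules:
        if is_fragment:
            continue  # fragments are never candidates for a standalone token
        m = re.match(pattern, text)
        if m and m.end() > 0 and (best is None or m.end() > best[1]):
            best = (name, m.end())
    return best

print("ID" in emitted_token_types(RULES))  # False
print(first_token("std_defs.h", RULES))   # None: no token rule can start here

fixed = [(n, p, False) for n, p, _ in RULES]  # same rules with the fragment marker removed
print("ID" in emitted_token_types(fixed))  # True
print(first_token("std_defs.h", fixed))   # ('ID', 8)
```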

4. A double quote is indeed written the normal way in the grammar:

'"'

That is all it takes.

5. A MissingTokenException still occurs here; for now it looks like it is probably a bug.

For details see:

【Mostly Solved】ANTLR v3: parsing fails with MissingTokenException when using {$channel=HIDDEN;}
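One plausible cause, offered as an assumption rather than a confirmed diagnosis: in the current grammar both WS (sent to the hidden channel) and BLANKS (visible) match a single space, and WS is defined first. Under a longest-match, first-rule-wins model, the single space in #include "std_defs.h" becomes a hidden WS token, so the parser finds no BLANKS where singleInclude requires one, which would be consistent with a MissingTokenException. The run of spaces after MANUFACTURER is unaffected, because BLANKS matches more characters there. The sketch models only these two rules.

```python
import re

# Assumed model: longest match wins; on a tie the rule defined earlier wins.
# Rules in grammar order: WS (hidden) comes before BLANKS.
RULES = [
    ("WS", r"[ \t\r\n]", True),    # {$channel=HIDDEN;}
    ("BLANKS", r"[ \t]+", False),  # visible to the parser
]

def next_token(text, pos):
    """Return (rule name, end position, is_hidden) for the token starting at pos."""
    best = None
    for name, pattern, hidden in RULES:
        m = re.compile(pattern).match(text, pos)
        if m and m.end() > pos and (best is None or m.end() > best[1]):
            best = (name, m.end(), hidden)
    return best

one_space = next_token(' "std_defs.h"', 0)          # the gap after #include
many_spaces = next_token(" " * 6 + "0x1E6D11,", 0)  # the gap after MANUFACTURER
print(one_space)    # ('WS', 1, True): hidden, so the parser sees no BLANKS
print(many_spaces)  # ('BLANKS', 6, False)
```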

6. I am currently using the following grammar:

grammar DDParserDemo;

options {
	output = AST;
	ASTLabelType = CommonTree; // type of $stat.tree ref etc...
}

//NEWLINE :   '\r'? '\n' ;
//NEWLINE :   '\r' '\n' ;
fragment 
NEWLINE :   '\r'? '\n' ;

   
fragment
FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {skip();};
//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

/*
STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;
*/

CHAR:  '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
    ;

fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;


ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

//fragment
DIGIT
	:	'0'..'9';

//FAKE_TOKEN 	:	'1' '2' '3';

/*
DECIMAL_VALUE
	:	'1'..'9' DIGIT*;
*/

//DECIMAL_VALUE	:	DIGIT*;
DECIMAL_VALUE	:	DIGIT+;

//HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;


HEX_VALUE
	:	'0x' HEX_DIGIT+;

/*
fragment
HEADER_FILENAME
	:	('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;
*/

/*
BLANKSPACE_TAB
//	:	(' ' | '\t'){skip();};
	:	(' ' | '\t')
	{$channel=HIDDEN;};
*/
//fragment BLANK	:	(' '|'\t')+ {skip();};
//BLANK	:	(' '|'\t') {skip();};
//BLANK	:	(' '|'\t');
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+ {$channel=HIDDEN;};
//BLANKS	:	(' '|'\t')+;
//BLANK	:	(' '|'\t') {$channel=HIDDEN;};
//BLANK	:	(' '|'\t') {skip();};
BLANKS	:	(' '|'\t')+;
//BLANKS	:	(' '|'\t')+ {skip();};
//BLANKS	:	' '+ {$channel=HIDDEN;};

//singleInclude	:	'#include' ' '+ '"' ID '.h"' ;
//singleInclude	:	'#include' ' '+ '"' ID+ '.h"' ;
//singleInclude	:	'#include' ' '+ '"' HEADER_FILENAME '.h"';
//singleInclude	:	'#include' ' ' '"' HEADER_FILENAME '.h"';
//singleInclude	:	'#include "' HEADER_FILENAME '.h"';
//fragment singleInclude	:	'#include' (' ')+ '"' ID '.h"';
//singleInclude	:	'#include' (' '|'\t')+ '""' ID '.h"';
//singleInclude	:	'#include' (' '|'\t')+ '"std_defs.h"';
//singleInclude	:	'#include' BLANKS  '"' ID '"' '.h';
//singleInclude	:	'#include' '"' ID '"' '.h';
//singleInclude	:	'#include' BLANKS '"' ID '"' '.h';
//singleInclude	:	'#include' BLANKS '"' ID '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* '.h' '"';
//ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//fragment ID_START 	:	'a'..'z'|'A'..'Z'|'_';

//WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
//WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'| DIGIT)*;
//WHOLE_ID	:	('a'..'z'|'A'..'Z'|'_') (HEX_DIGIT|'_')*;


//fragment
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;


//ID_START 	:	'a'..'z'|'A'..'Z'|'_';
//WHOLE_ID	:	(ID_START) (ID_START | DIGIT)*;

//ID_MIDDLE_END	:	ID_START | DIGIT;
//ID_MIDDLE_END	:	HEX_DIGIT | '_';
//singleInclude	:	'#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude	:	'#include' BLANKS '"' ID_START '.h' '"';
//singleInclude	:	'#include' BLANKS '"' WHOLE_ID '.h' '"';
singleInclude	:	'#include' BLANKS '"' ID '.h' '"';


//include		:	singleInclude WS*   -> singleInclude;
include		:	singleInclude WS*;

//startParse	:	include* identification+;
//startParse	:	include+ identification+;
//startParse	:	identification+;
//startParse	:	manufacture deviceType deviceRevison ddRevision;
startParse	:	include+ manufacture deviceType deviceRevison ddRevision;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ 	(BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture	:	'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture	:	'MANUFACTURER'^ 	BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType	:	'DEVICE_TYPE'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison	:	'DEVICE_REVISION'^ 	BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision	:	'DD_REVISION'^ 		BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
	
//identification	:	definiton WS* (','?)! WS*   -> definiton;
	
//definiton	:	(ID)^ ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE)
//definiton	:	(ID)^ BLANKSPACE_TAB+ (DECIMAL_VALUE | HEX_VALUE)
//definiton	:	ID ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE);
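The final singleInclude rule also differs from the first attempt in where the closing quote goes: the first version expected '"' ID '"' '.h', but in "std_defs.h" the closing quote comes after .h, which is what '"' ID '.h' '"' describes. A minimal sketch comparing the two rule tails against the token types of "std_defs.h" (hypothetical token names; the lexing step is assumed, not performed here):

```python
# Token types for  "std_defs.h"  once STRING is gone (assumed lexer output).
TOKENS = ["QUOTE", "ID", "DOT_H", "QUOTE"]

FIRST_ATTEMPT_TAIL = ["QUOTE", "ID", "QUOTE", "DOT_H"]  # '"' ID '"' '.h'
FINAL_TAIL = ["QUOTE", "ID", "DOT_H", "QUOTE"]          # '"' ID '.h' '"'

def first_mismatch(expected, tokens):
    """Return (index, expected, got) at the first mismatch, or None if all match."""
    for i, want in enumerate(expected):
        got = tokens[i] if i < len(tokens) else "EOF"
        if got != want:
            return i, want, got
    return None

print(first_mismatch(FIRST_ATTEMPT_TAIL, TOKENS))  # (2, 'QUOTE', 'DOT_H')
print(first_mismatch(FINAL_TAIL, TOKENS))          # None
```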

to parse:

/*
**********************************************************************
** Includes
**********************************************************************
*/

#include "std_defs.h"
#include "com_tbls.h"
#include "rev_defs.h"
#include "fbk_hm.h"
#include "fdiag_FBK2_Start.h"
#include "blk_err.h"

/*
**********************************************************************
********** DEVICE SECTION ********************************************
**********************************************************************
*/

MANUFACTURER      0x1E6D11,
DEVICE_TYPE       0x00FF,
DEVICE_REVISION   5,
DD_REVISION       1

The corresponding screenshot is:

Please credit when reposting: 在路上 » 【Solved】ANTLR fails to parse double quotes: MismatchedTokenException(0!=0)
