【Solved】Conditional matching in an ANTLR v3 lexer

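【Background】

While working on:

【Notes】Converting the ANTLR v2 C/C++ preprocessor grammar, cpp.g, to ANTLR v3

I was using this code, carried over from the earlier ANTLR v2 grammar, as a reference: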

IDENTIFIER @init{
	List define = new ArrayList();
	List foundArgs = new ArrayList();
    
    String callArg0Text = "";
    String callArg1Text = "";
} :
    identifier=RAW_IDENTIFIER
    {
        // see if this is a macro argument
        define = (List)defineArgs.get(identifier.getText());
        if (define==null) {
            // see if this is a macro call
            define = (List)defines.get(identifier.getText());
        }
    }
    ( { (define!=null) && (define.size()>1) }? (WS|COMMENT)?
        // take in arguments if macro call requires them
        '('
        callArg0=EXPR
        {
            callArg0Text = callArg0.getText(); 
            foundArgs.add(callArg0Text);
        }
        ( COMMA callArg1=EXPR 
        {
            callArg1Text = callArg1.getText();
            foundArgs.add(callArg1Text);
        }
        )*
        { foundArgs.size()==define.size()-1 }? // better have right amount
        ')'
    | { !((define!=null) && (define.size()>1)) }?
    )

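Its purpose is to match either a macro (define) invocation or a plain identifier.

I later worked out that the conditional matching is done with:

{ (define!=null) && (define.size()>1) }?

That is, only when define is not null and its size is greater than 1 does matching continue with: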

 (WS|COMMENT)?
        // take in arguments if macro call requires them
        '('
        callArg0=EXPR
        {
            callArg0Text = callArg0.getText(); 
            foundArgs.add(callArg0Text);
        }
        ( COMMA callArg1=EXPR 
        {
            callArg1Text = callArg1.getText();
            foundArgs.add(callArg1Text);
        }
        )*
        { foundArgs.size()==define.size()-1 }? // better have right amount
        ')'

If the condition is not met, the alternative after the '|' operator is matched instead:

{ !((define!=null) && (define.size()>1)) }?
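
The condition that separates the two alternatives can be sketched in plain Java, independent of ANTLR. This is only an illustration: it assumes that element 0 of the define list is the replacement text and that any further elements are parameter names, which is what the action code later in this post suggests. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class MacroKind {
    // Assumption: define.get(0) is the replacement text, the remaining
    // elements are parameter names. A size above 1 therefore means the
    // macro takes arguments and a '(' ... ')' list must follow.
    static boolean expectsArguments(List<String> define) {
        return define != null && define.size() > 1;
    }

    // Mirrors the "better have right amount" check: one call argument
    // is required for each declared parameter.
    static boolean hasRightArgCount(List<String> define, List<String> foundArgs) {
        return foundArgs.size() == define.size() - 1;
    }

    public static void main(String[] args) {
        List<String> objectLike = Arrays.asList("42");
        List<String> functionLike = Arrays.asList("(a)+(b)", "a", "b");
        System.out.println("unknown identifier: " + expectsArguments(null));
        System.out.println("object-like macro:  " + expectsArguments(objectLike));
        System.out.println("function-like macro: " + expectsArguments(functionLike));
        System.out.println("two args supplied:  "
                + hasRightArgCount(functionLike, Arrays.asList("1", "2")));
    }
}
```

Only the function-like case should take the first alternative; a plain identifier and an object-like macro both fall through to the empty alternative.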

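【Solution process】

1. So the question becomes: how to implement conditional matching in a lexer in ANTLR v3.

2. The author of this post:

Conditional lexing

ran into a problem similar to mine.

There is no direct answer there, but the post mentions:

  • ({boolExpr}?): a disambiguating/validating semantic predicate
  • ({boolExpr}?=>): a gated semantic predicate, which is what is needed here

The code in that post: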

fragment VERSION_COMMENT_TAIL[bool matches_version]:
        {!matches_version}? => ( options { greedy = false; }: . )* '*' '/' { $type = MULTILINE_COMMENT; $channel = 98; }
        | { $type = VERSION_COMMENT; $channel = 98; }
; 

gives a hint that the form to use is

{xxx}? => yyy{do_A} | {do_B}

which is very similar to what is needed here.

3. On this topic, the official site:

http://www.antlr2.org/doc/lexer.html

i.e. the ANTLR v2 documentation, explains:

DEFINE
    :   {getColumn()==1}? "#define" ID
    ;

Semantic predicates on the left-edge of single-alternative lexical rules get hoisted into the nextToken prediction mechanism. Adding the predicate to a rule makes it so that it is not a candidate for recognition until the predicate evaluates to true. In this case, the method for DEFINE would never be entered, even if the lookahead predicted #define, if the column > 1.

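This is consistent with the above, i.e.:

for the expression xxx in

{xxx}? => yyy{do_A}

if xxx is not satisfied, the corresponding content is not a candidate for matching at all, and it stays that way until the predicate becomes true.

That is not the effect I originally wanted:

when xxx is not satisfied, skip this alternative -> and match what follows the '|' instead.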
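
The hoisting behaviour described in the ANTLR v2 quote can be pictured with a small plain-Java sketch. It is not ANTLR code: the class and method names are hypothetical, and columns are assumed to be 1-based, matching the "column > 1" wording of the quote. The point is only that the predicate takes part in deciding whether DEFINE is a candidate at all.

```java
final class ColumnGate {
    // Stand-in for a rule guarded by {getColumn()==1}? "#define".
    // The rule is a candidate only when BOTH the predicate and the
    // character lookahead agree; a false predicate hides the rule
    // instead of raising an error.
    static boolean defineIsCandidate(String line, int column) {
        boolean predicate = column == 1;                             // semantic predicate
        boolean lookahead = line.startsWith("#define", column - 1);  // syntactic prediction
        return predicate && lookahead;
    }

    public static void main(String[] args) {
        System.out.println(defineIsCandidate("#define X 1", 1));
        System.out.println(defineIsCandidate("  #define X 1", 3));
    }
}
```

In the second call the text at column 3 does look like #define, but the rule is still not a candidate because the predicate is false.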

 

4. I also looked at the official ANTLR v4 documentation:

Semantic Predicates

expr: ID '(' expr ')' // array reference (ANTLR picks this one)
    | {istype()}? ID '(' expr ')' // ctor-style typecast
    | ID '(' expr ')' // function call
    ;

and:

stat: decl | expr ;

decl: ID ID ;

expr: {istype()}? ID '(' expr ')' // ctor-style typecast
    | {isfunc()}? ID '(' expr ')' // function call
    ;

But I still did not fully understand it.

Because in ANTLR v3, this grammar fragment:

    ( { (define!=null) && (define.size()>1) }? (WS|COMMENT)?

produces the following Java code:

			switch (alt18) {
				case 1 :
					// D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}? ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')'
					{
					if ( !(( (define!=null) && (define.size()>1) )) ) {
						throw new FailedPredicateException(input, "IDENTIFIER", " (define!=null) && (define.size()>1) ");
					}
					// D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:48: ( WS | COMMENT )?
					int alt16=3;
					int LA16_0 = input.LA(1);

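Clearly, as soon as this alternative is entered with the condition

(define!=null) && (define.size()>1)

not satisfied, an exception is thrown and execution does not continue.

It does not, as I expected, go on to test and match what follows the '|' operator:

{ !((define!=null) && (define.size()>1)) }?

Which is very strange.

5. I swapped the order of the two alternatives: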

    ( { !((define!=null) && (define.size()>1)) }? 
    |
    { (define!=null) && (define.size()>1) }? (WS|COMMENT)?
        // take in arguments if macro call requires them
        '('
        callArg0=EXPR
        {
            callArg0Text = callArg0.getText(); 
            foundArgs.add(callArg0Text);
        }
        ( COMMA callArg1=EXPR 
        {
            callArg1Text = callArg1.getText();
            foundArgs.add(callArg1Text);
        }
        )*
        { foundArgs.size()==define.size()-1 }? // better have right amount
        ')'
    )

and tried again, but this did not solve the problem either. The effect was the same as before:

although this check was now passed:

			switch (alt18) {
				case 1 :
					// D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}?
					{
					if ( !(( !((define!=null) && (define.size()>1)) )) ) {
						throw new FailedPredicateException(input, "IDENTIFIER", " !((define!=null) && (define.size()>1)) ");
					}
					}
					break;

the code that follows it:

			if (define!=null) {
				String defineText = (String)define.get(0);
			    
			    if (define.size()==1) {
			        //only have one value in list -> the defineText is the define para content -> just need replace directly
			        setText(defineText);
			    } else {
			        //add new dict pair: (para, call value)
			        for (int i=0;i<foundArgs.size();++i) {
			            // treat macro arguments similar to local defines
			            List arg = new ArrayList();
			            arg.add((String)foundArgs.get(i));
			            defineArgs.put( (String)define.get(1+i), arg );
			        }
			        
			        // save current lexer's state
			        SaveStruct ss = new SaveStruct(input);
			        includes.push(ss);

			        // switch on new input stream
			        setCharStream(new ANTLRStringStream(defineText));
			        reset();
			    }
			}

still could not run, because define really is null.

So the selective-matching problem in ANTLR v3 was still unsolved.

6. Following:

Forcing an alternative in ANTLR lexer rule

I changed the grammar to the => form:

    ({ (define!=null) && (define.size()>1) }?=> (WS|COMMENT)?
        // take in arguments if macro call requires them
        '('
        callArg0=EXPR
        {
            callArg0Text = callArg0.getText(); 
            foundArgs.add(callArg0Text);
        }
        ( COMMA callArg1=EXPR 
        {
            callArg1Text = callArg1.getText();
            foundArgs.add(callArg1Text);
        }
        )*
        { foundArgs.size()==define.size()-1 }? // better have right amount
        ')'
    | { !((define!=null) && (define.size()>1)) }?=>
    )

and tried it. The generated code was still:

if ( ((LA18_0 >= '\t' && LA18_0 <= '\n')||LA18_0=='\r'||LA18_0==' '||LA18_0=='('||LA18_0=='/') && (((define!=null) && (define.size()>1)))) {
    alt18=1;
}

switch (alt18) {
    case 1 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}? => ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')'
        {
        if ( !(((define!=null) && (define.size()>1))) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", "(define!=null) && (define.size()>1)");
        }
        ......
        if ( !(( foundArgs.size()==define.size()-1 )) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", " foundArgs.size()==define.size()-1 ");
        }
        match(')'); 
        }
        break;
    case 2 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:190:7: {...}? =>
        {
        if ( !(( !((define!=null) && (define.size()>1)) )) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", " !((define!=null) && (define.size()>1)) ");
        }
        }
        break;

}

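Clearly, it still looks as if exceptions will be thrown.

At this point it seemed that this kind of semantic predicate only works in ANTLR v2?

And that in ANTLR v3 its meaning has changed into a check -> if the condition is not met, an exception is thrown???

7. Later I found:

http://www.egtry.com/tools/antlr/gated_semantic_predicate

whose example is: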

Example 2

give a sequence of digits, the first digit states how many digits to take next.

antlr grammar
@init {
  int len=0;
  int count=0;
}
: 
  d1=DIGIT {len=Integer.parseInt($d1.text); System.out.println("size of the following digits: "+len);} 
  ( { count< len }?=> d2=DIGIT {count++;System.out.println("element: "+$d2.text);}  )+ 

  (d3=DIGIT {System.out.println("Remaining Digit: "+$d3.text);})* 
  
  '\r'? '\n'
;


DIGIT: '0' .. '9';
input example
3123888
Output
size of the following digits: 3
element: 1
element: 2
element: 3
Remaining Digit: 8
Remaining Digit: 8
Remaining Digit: 8

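Clearly, this is exactly the effect we want:

the condition is tested and different branches are taken, without an exception being thrown just because the condition is false.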
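
To make the behaviour of Example 2 concrete, here is a plain-Java sketch of the same logic with no ANTLR involved. It is a simplified model, not the generated parser: the class name is hypothetical, the input is assumed to be non-empty, and the trailing line break and the "at least one element" requirement of ( ... )+ are ignored.

```java
import java.util.ArrayList;
import java.util.List;

final class GatedDigits {
    final List<Character> elements = new ArrayList<>();
    final List<Character> remaining = new ArrayList<>();

    // Matches the DIGIT rule: '0' .. '9'.
    private static boolean isDigit(String s, int pos) {
        return pos < s.length() && s.charAt(pos) >= '0' && s.charAt(pos) <= '9';
    }

    static GatedDigits parse(String input) {
        GatedDigits result = new GatedDigits();
        int len = input.charAt(0) - '0';   // d1: how many digits to take next
        int count = 0;
        int pos = 1;
        // ( {count < len}?=> d2=DIGIT )+
        // The gate is part of the loop decision, so when it turns false the
        // loop simply ends; nothing is thrown.
        while (isDigit(input, pos) && count < len) {
            result.elements.add(input.charAt(pos++));
            count++;
        }
        // ( d3=DIGIT )* picks up whatever digits are left.
        while (isDigit(input, pos)) {
            result.remaining.add(input.charAt(pos++));
        }
        return result;
    }

    public static void main(String[] args) {
        GatedDigits r = parse("3123888");
        System.out.println("elements:  " + r.elements);
        System.out.println("remaining: " + r.remaining);
    }
}
```

For the input 3123888 the first loop collects 1, 2 and 3 and the second loop collects the three 8s, which corresponds to the output listed in the example.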

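So, since that example runs correctly, the first step was to test that grammar and see whether the generated code is as expected, i.e. does not throw exceptions indiscriminately.

The test grammar is: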

grammar gatedSynmaticPredicateDemo;

options{
	language=Java;
	output = AST;
}

parseInput
@init {
  int len=0;
  int count=0;
}
: 
  d1=DIGIT {len=Integer.parseInt($d1.text); System.out.println("size of the following digits: "+len);} 
  ( { count< len }?=> d2=DIGIT {count++;System.out.println("element: "+$d2.text);}  )+ 

  (d3=DIGIT {System.out.println("Remaining Digit: "+$d3.text);})* 
  
  '\r'? '\n'
;


DIGIT: '0' .. '9';

I then located the generated code:

while (true) {
    int alt1=2;
    int LA1_0 = input.LA(1);
    if ( (LA1_0==DIGIT) ) {
        int LA1_1 = input.LA(2);
        if ( (( count< len )) ) {
            alt1=1;
        }

    }

    switch (alt1) {
    case 1 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\gatedSynmaticPredicateDemo\\gatedSynmaticPredicateDemo.g:15:5: {...}? =>d2= DIGIT
        {
        if ( !(( count< len )) ) {
            throw new FailedPredicateException(input, "parseInput", " count< len ");
        }
        d2=(Token)match(input,DIGIT,FOLLOW_DIGIT_in_parseInput53); 
        d2_tree = (Object)adaptor.create(d2);
        adaptor.addChild(root_0, d2_tree);

        count++;System.out.println("element: "+(d2!=null?d2.getText():null));
        }
        break;

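Note that this was found in gatedSynmaticPredicateDemoParser.java, not in the generated Lexer.java.

The test also ran correctly:

gatedSynmaticPredicateDemo grammar test ok

But clearly the gated semantic predicate here is written in a parser rule, not in a lexer rule.

8. I also looked at:

[antlr-interest] Semantic Predicates in a Lexer

which seems to suggest that gated semantic predicates should be used in the parser.

9. However, when I actually ran the grammar from above: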

    ( {(define!=null) && (define.size()>1)}?=> (WS|COMMENT)?
        // take in arguments if macro call requires them
        '('
        callArg0=EXPR
        {
            callArg0Text = callArg0.getText(); 
            foundArgs.add(callArg0Text);
        }
        ( COMMA callArg1=EXPR 
        {
            callArg1Text = callArg1.getText();
            foundArgs.add(callArg1Text);
        }
        )*
        { foundArgs.size()==define.size()-1 }? // better have right amount
        ')'
    | {!((define!=null) && (define.size()>1))}?=>
    )

with the code it generates:

    
// D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:5: ({...}? => ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')' |{...}? =>)
int alt18=2;
int LA18_0 = input.LA(1);
if ( ((LA18_0 >= '\t' && LA18_0 <= '\n')||LA18_0=='\r'||LA18_0==' '||LA18_0=='('||LA18_0=='/') && (((define!=null) && (define.size()>1)))) {
    alt18=1;
}

switch (alt18) {
    case 1 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}? => ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')'
        {
        if ( !(((define!=null) && (define.size()>1))) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", "(define!=null) && (define.size()>1)");
        }
        ......

        if ( !(( foundArgs.size()==define.size()-1 )) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", " foundArgs.size()==define.size()-1 ");
        }
        match(')'); 
        }
        break;
    case 2 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:190:7: {...}? =>
        {
        if ( !((!((define!=null) && (define.size()>1)))) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", "!((define!=null) && (define.size()>1))");
        }
        }
        break;

}


if (define!=null) {
    String defineText = (String)define.get(0);
    
    if (define.size()==1) {
        //only have one value in list -> the defineText is the define para content -> just need replace directly
        setText(defineText);
    } else {
        //add new dict pair: (para, call value)
        for (int i=0;i<foundArgs.size();++i) {
            // treat macro arguments similar to local defines
            List arg = new ArrayList();
            arg.add((String)foundArgs.get(i));
            defineArgs.put( (String)define.get(1+i), arg );
        }

        // save current lexer's state
        SaveStruct ss = new SaveStruct(input);
        includes.push(ss);

        // switch on new input stream
        setCharStream(new ANTLRStringStream(defineText));
        reset();
    }
}

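I set breakpoints at several places:

FailedPredicateException at first check

FailedPredicateException at second check

and they really were never hit, i.e. no exception was thrown. This fits the generated code above: with ?=> the predicate is also tested in the if statement that sets alt18, so alternative 1 is only chosen when the predicate is already true.

Execution then reached the code that actually needs to run: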

can run into real code
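
The part of that code that binds call arguments to macro parameters can be sketched on its own in plain Java. This is a hedged illustration rather than the lexer action itself: the class and method names are hypothetical, element 0 of define is assumed to be the replacement text, and the IllegalArgumentException stands in for the argument-count predicate in the grammar.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MacroBinding {
    // Returns parameter name -> single-element list holding the call
    // argument, the same shape the action stores into defineArgs.
    static Map<String, List<String>> bindArguments(List<String> define,
                                                   List<String> foundArgs) {
        int expected = define.size() - 1;  // element 0 is the macro body
        if (foundArgs.size() != expected) {
            // Stand-in for { foundArgs.size()==define.size()-1 }?
            throw new IllegalArgumentException(
                    "expected " + expected + " arguments, got " + foundArgs.size());
        }
        Map<String, List<String>> defineArgs = new HashMap<>();
        for (int i = 0; i < foundArgs.size(); ++i) {
            // treat macro arguments similar to local defines
            List<String> arg = new ArrayList<>();
            arg.add(foundArgs.get(i));
            defineArgs.put(define.get(1 + i), arg);
        }
        return defineArgs;
    }
}
```

For a define list of "(a)+(b)", "a", "b" and call arguments "1", "2", the result maps a to a list containing "1" and b to a list containing "2". The lexer then switches its input to the macro body so that a and b are looked up in defineArgs while the body is re-lexed.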

 

 

【Summary】

In an ANTLR v2 lexer, selective matching written with

 {testExpression}?

(apparently called a validating semantic predicate), as in:

( { (define!=null) && (define.size()>1) }? (WS|COMMENT)?
    // take in arguments if macro call requires them
    '('
    callArg0=EXPR
    {
        callArg0Text = callArg0.getText(); 
        foundArgs.add(callArg0Text);
    }
    ( COMMA callArg1=EXPR 
    {
        callArg1Text = callArg1.getText();
        foundArgs.add(callArg1Text);
    }
    )*
    { foundArgs.size()==define.size()-1 }? // better have right amount
    ')'
| { !((define!=null) && (define.size()>1)) }?
)

has to be changed in an ANTLR v3 lexer to the

{testExpression}?=>

form (apparently called a gated semantic predicate):

( {(define!=null) && (define.size()>1)}?=> (WS|COMMENT)?
    // take in arguments if macro call requires them
    '('
    callArg0=EXPR
    {
        callArg0Text = callArg0.getText(); 
        foundArgs.add(callArg0Text);
    }
    ( COMMA callArg1=EXPR 
    {
        callArg1Text = callArg1.getText();
        foundArgs.add(callArg1Text);
    }
    )*
    { foundArgs.size()==define.size()-1 }? // better have right amount
    ')'
| {!((define!=null) && (define.size()>1))}?=>
)

Only then does the lexer actually match the alternatives selectively.
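
Why the gated form works can be read off the generated decision code shown in step 9. The sketch below re-creates that decision in plain Java so it can be run without ANTLR. The lookahead set and the predicate are copied from the generated code in this post; the class and method names are hypothetical, and the model covers only this one decision.

```java
import java.util.List;

final class GatedPrediction {
    // Lookahead characters that can start alternative 1, as in the
    // generated condition on LA18_0.
    static boolean lookaheadStartsArgs(int la) {
        return (la >= '\t' && la <= '\n') || la == '\r'
                || la == ' ' || la == '(' || la == '/';
    }

    static boolean gate(List<String> define) {
        return define != null && define.size() > 1;
    }

    // Returns 1 for the argument-list alternative, 2 for the empty one.
    // The gated predicate is part of the prediction itself.
    static int predict(int la, List<String> define) {
        int alt = 2;
        if (lookaheadStartsArgs(la) && gate(define)) {
            alt = 1;
        }
        return alt;
    }

    // Models the check that still sits at the top of each alternative:
    // true means FailedPredicateException would be thrown there.
    static boolean checkWouldThrow(int alt, List<String> define) {
        return alt == 1 ? !gate(define) : gate(define);
    }
}
```

Whenever predict returns 1 the gate has already been tested, so the check inside alternative 1 cannot fail. Alternative 2 is only the default, though: going by the generated code as shown, a macro that expects arguments but is followed by a character outside the lookahead set (for example ';') would still land in alternative 2 and fail its check there.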


