【Solved】Conditional matching in an ANTLR v3 lexer

ANTLR crifan 3064 views 0 comments

【Background】

While working on:

【Record】Converting the ANTLR v2 C/C++ preprocessor grammar, cpp.g, to ANTLR v3

I used the earlier ANTLR v2 code as a reference:

IDENTIFIER @init{
    List define = new ArrayList();
    List foundArgs = new ArrayList();
     
    String callArg0Text = "";
    String callArg1Text = "";
} :
    identifier=RAW_IDENTIFIER
    {
        // see if this is a macro argument
        define = (List)defineArgs.get(identifier.getText());
        if (define==null) {
            // see if this is a macro call
            define = (List)defines.get(identifier.getText());
        }
    }
    ( { (define!=null) && (define.size()>1) }? (WS|COMMENT)?
        // take in arguments if macro call requires them
        '('
        callArg0=EXPR
        {
            callArg0Text = callArg0.getText();
            foundArgs.add(callArg0Text);
        }
        ( COMMA callArg1=EXPR
        {
            callArg1Text = callArg1.getText();
            foundArgs.add(callArg1Text);
        }
        )*
        { foundArgs.size()==define.size()-1 }? // better have right amount
        ')'
    | { !((define!=null) && (define.size()>1)) }?
    )

This rule matches either an invocation of a #define macro or a plain identifier.

I eventually worked out that it does so through:

{ (define!=null) && (define.size()>1) }?

which makes the match conditional: only when define is non-null and its size is greater than 1 does the rule go on to match what follows:

(WS|COMMENT)?
       // take in arguments if macro call requires them
       '('
       callArg0=EXPR
       {
           callArg0Text = callArg0.getText();
           foundArgs.add(callArg0Text);
       }
       ( COMMA callArg1=EXPR
       {
           callArg1Text = callArg1.getText();
           foundArgs.add(callArg1Text);
       }
       )*
       { foundArgs.size()==define.size()-1 }? // better have right amount
       ')'

If the condition does not hold, the rule instead matches the alternative after the '|' operator:

{ !((define!=null) && (define.size()>1)) }?
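
The two predicates effectively split identifiers into three cases. The sketch below restates that decision in plain Java. It assumes the layout of the `define` list that the action code later in this post relies on (element 0 is the replacement text, elements 1 onward are parameter names), and it looks names up in a single table rather than in both `defineArgs` and `defines`. The names `MacroLookupSketch`, `MacroKind` and `classify` are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MacroLookupSketch {
    enum MacroKind { PLAIN_IDENTIFIER, OBJECT_LIKE_MACRO, FUNCTION_LIKE_MACRO }

    // Same test as the predicate { (define!=null) && (define.size()>1) }?
    static boolean needsArguments(List<String> define) {
        return define != null && define.size() > 1;
    }

    // Assumed layout: define.get(0) is the body, define.get(1..) are parameter names.
    static MacroKind classify(Map<String, List<String>> defines, String name) {
        List<String> define = defines.get(name);
        if (define == null) {
            return MacroKind.PLAIN_IDENTIFIER;
        }
        return needsArguments(define)
                ? MacroKind.FUNCTION_LIKE_MACRO
                : MacroKind.OBJECT_LIKE_MACRO;
    }

    public static void main(String[] args) {
        Map<String, List<String>> defines = new HashMap<>();
        defines.put("PI", Arrays.asList("3.14"));
        defines.put("MAX", Arrays.asList("((a)>(b)?(a):(b))", "a", "b"));
        for (String name : new String[] {"PI", "MAX", "foo"}) {
            System.out.println(name + " -> " + classify(defines, name));
        }
    }
}
```

Only the function-like case should consume a parenthesized argument list; the other two cases correspond to the empty alternative.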

【Solution process】

1. So the question becomes: how do you implement conditional matching in a lexer in ANTLR v3?

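2. This person:

Conditional lexing

ran into a problem similar to mine.

There is no direct answer there, but the thread mentions:

  • ({boolExpr}?): a disambiguating/validating semantic predicate
  • ({boolExpr}?=>): a gated semantic predicate, which is what is needed here

The code from that thread: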

fragment VERSION_COMMENT_TAIL[bool matches_version]:
        {!matches_version}? => ( options { greedy = false; }: . )* '*' '/' { $type = MULTILINE_COMMENT; $channel = 98; }
        | { $type = VERSION_COMMENT; $channel = 98; }
;

This is a hint that the form to use is

{xxx}? => yyy{do_A} | {do_B}

which is very similar to my case.

3. On this point, the official site:

http://www.antlr2.org/doc/lexer.html

gives the relevant explanation for ANTLR v2:

DEFINE
    :   {getColumn()==1}? "#define" ID
    ;

Semantic predicates on the left-edge of single-alternative lexical rules get hoisted into the nextToken prediction mechanism. Adding the predicate to a rule makes it so that it is not a candidate for recognition until the predicate evaluates to true. In this case, the method for DEFINE would never be entered, even if the lookahead predicted #define, if the column > 1.

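This matches expectations. That is, for the expression xxx in

{xxx}? => yyy{do_A}

if xxx is false, the corresponding content is not matched, and the rule stays out of consideration until xxx becomes true.

That is not the effect I was after:

when xxx is false, I want the lexer to skip this alternative and match the alternative after the '|' instead.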

 

4. I also consulted the ANTLR v4 documentation:

Semantic Predicates

expr: ID '(' expr ')' // array reference (ANTLR picks this one)
    | {istype()}? ID '(' expr ')' // ctor-style typecast
    | ID '(' expr ')' // function call
    ;

and:

stat: decl | expr ;
decl: ID ID ;
expr: {istype()}? ID '(' expr ')' // ctor-style typecast
    | {isfunc()}? ID '(' expr ')' // function call
    ;

But I still did not fully understand it.

In ANTLR v3, the corresponding grammar:

( { (define!=null) && (define.size()>1) }? (WS|COMMENT)?

generates the following Java code:

switch (alt18) {
    case 1 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}? ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')'
        {
        if ( !(( (define!=null) && (define.size()>1) )) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", " (define!=null) && (define.size()>1) ");
        }
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:48: ( WS | COMMENT )?
        int alt16=3;
        int LA16_0 = input.LA(1);

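Clearly, as soon as the check

(define!=null) && (define.size()>1)

fails, an exception is thrown and execution stops.

The lexer does not, as I expected, go on to test and match the alternative after the '|' operator:

{ !((define!=null) && (define.size()>1)) }?

which is puzzling.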
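
One reading of the generated code is that the alternative is chosen from the lookahead character first, and the predicate is then only asserted inside the chosen alternative. The Java sketch below is a hand-written model of that control flow, not ANTLR output. `FailedPredicate` stands in for ANTLR's `FailedPredicateException`, `predictByLookahead` and `matchTail` are hypothetical helpers, and the character set is copied from the generated code shown later in this post.

```java
final class ValidatingPredicateSketch {
    // Stand-in for org.antlr.runtime.FailedPredicateException.
    static final class FailedPredicate extends RuntimeException {
        FailedPredicate(String predicateText) {
            super("failed predicate: " + predicateText);
        }
    }

    // Prediction looks at the next character only; the predicate plays no part.
    static int predictByLookahead(char next) {
        boolean startsArgumentList = (next >= '\t' && next <= '\n')
                || next == '\r' || next == ' ' || next == '(' || next == '/';
        return startsArgumentList ? 1 : 2;
    }

    // After prediction, a plain {p}? behaves like an assertion inside the alternative.
    static String matchTail(char next, boolean isFunctionLikeMacro) {
        int alt = predictByLookahead(next);
        if (alt == 1) {
            if (!isFunctionLikeMacro) {
                throw new FailedPredicate("(define!=null) && (define.size()>1)");
            }
            return "argument list";
        }
        if (isFunctionLikeMacro) {
            throw new FailedPredicate("!((define!=null) && (define.size()>1))");
        }
        return "nothing";
    }

    public static void main(String[] args) {
        System.out.println(matchTail('(', true));
        System.out.println(matchTail(';', false));
        try {
            // A plain identifier followed by '(' is predicted as alternative 1 and then fails.
            matchTail('(', false);
        } catch (FailedPredicate e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, an ordinary identifier that happens to be followed by whitespace or '(' is routed into the macro-call alternative and fails there, which is consistent with the exception described above.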

5. I swapped the order of the two alternatives:

( { !((define!=null) && (define.size()>1)) }?
|
{ (define!=null) && (define.size()>1) }? (WS|COMMENT)?
    // take in arguments if macro call requires them
    '('
    callArg0=EXPR
    {
        callArg0Text = callArg0.getText();
        foundArgs.add(callArg0Text);
    }
    ( COMMA callArg1=EXPR
    {
        callArg1Text = callArg1.getText();
        foundArgs.add(callArg1Text);
    }
    )*
    { foundArgs.size()==define.size()-1 }? // better have right amount
    ')'
)

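I tried it, but it did not solve the problem; the overall result was the same as before.

The first check now passes without throwing: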

switch (alt18) {
    case 1 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}?
        {
        if ( !(( !((define!=null) && (define.size()>1)) )) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", " !((define!=null) && (define.size()>1)) ");
        }
        }
        break;

but the code that follows:

if (define!=null) {
    String defineText = (String)define.get(0);
     
    if (define.size()==1) {
        //only have one value in list -> the defineText is the define para content -> just need replace directly
        setText(defineText);
    } else {
        //add new dict pair: (para, call value)
        for (int i=0;i<foundArgs.size();++i) {
            // treat macro arguments similar to local defines
            List arg = new ArrayList();
            arg.add((String)foundArgs.get(i));
            defineArgs.put( (String)define.get(1+i), arg );
        }
         
        // save current lexer's state
        SaveStruct ss = new SaveStruct(input);
        includes.push(ss);
 
        // switch on new input stream
        setCharStream(new ANTLRStringStream(defineText));
        reset();
    }
}

still does not run, because define really is null.

So the conditional-matching problem in ANTLR v3 remained unsolved for the moment.

6. Following:

Forcing an alternative in ANTLR lexer rule

I changed the grammar to the => form:

({ (define!=null) && (define.size()>1) }?=> (WS|COMMENT)?
    // take in arguments if macro call requires them
    '('
    callArg0=EXPR
    {
        callArg0Text = callArg0.getText();
        foundArgs.add(callArg0Text);
    }
    ( COMMA callArg1=EXPR
    {
        callArg1Text = callArg1.getText();
        foundArgs.add(callArg1Text);
    }
    )*
    { foundArgs.size()==define.size()-1 }? // better have right amount
    ')'
| { !((define!=null) && (define.size()>1)) }?=>
)

I tried it, and the generated code was still:

if ( ((LA18_0 >= '\t' && LA18_0 <= '\n')||LA18_0=='\r'||LA18_0==' '||LA18_0=='('||LA18_0=='/') && (((define!=null) && (define.size()>1)))) {
    alt18=1;
}
 
switch (alt18) {
    case 1 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}? => ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')'
        {
        if ( !(((define!=null) && (define.size()>1))) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", "(define!=null) && (define.size()>1)");
        }
        ......
        if ( !(( foundArgs.size()==define.size()-1 )) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", " foundArgs.size()==define.size()-1 ");
        }
        match(')');
        }
        break;
    case 2 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:190:7: {...}? =>
        {
        if ( !(( !((define!=null) && (define.size()>1)) )) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", " !((define!=null) && (define.size()>1)) ");
        }
        }
        break;
 
}

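Clearly, this code can still throw an exception.

At this point it looked as if the semantic predicate above only works in ANTLR v2,

and that in ANTLR v3 its meaning has changed so that a failed condition simply throws an exception?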

 

7. Later I found:

http://www.egtry.com/tools/antlr/gated_semantic_predicate

Its example:

Example 2

Given a sequence of digits, the first digit states how many digits to take next.

antlr grammar
@init {
  int len=0;
  int count=0;
}
: 
  d1=DIGIT {len=Integer.parseInt($d1.text); System.out.println("size of the following digits: "+len);} 
  ( { count< len }?=> d2=DIGIT {count++;System.out.println("element: "+$d2.text);}  )+ 

  (d3=DIGIT {System.out.println("Remaining Digit: "+$d3.text);})* 
  
  '\r'? '\n'
;


DIGIT: '0' .. '9';
input example
3123888
Output
size of the following digits: 3
element: 1
element: 2
element: 3
Remaining Digit: 8
Remaining Digit: 8
Remaining Digit: 8
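
To make the behavior of the gate concrete, here is a plain-Java simulation of the same rule. It is a sketch of the control flow, not ANTLR-generated code, and it makes two simplifying assumptions: the input contains only digit characters, and the requirement that `( ... )+` match at least once is not enforced. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class GatedDigitsSketch {
    static List<String> parse(String digits) {
        List<String> out = new ArrayList<>();
        int pos = 0;

        // d1=DIGIT : the first digit gives the number of elements to take.
        int len = Character.getNumericValue(digits.charAt(pos++));
        out.add("size of the following digits: " + len);

        // ( {count<len}?=> d2=DIGIT )+ : the gate is re-evaluated before every
        // iteration, so the loop exits quietly once count reaches len.
        int count = 0;
        while (pos < digits.length() && count < len) {
            out.add("element: " + digits.charAt(pos++));
            count++;
        }

        // ( d3=DIGIT )* : whatever digits are left over.
        while (pos < digits.length()) {
            out.add("Remaining Digit: " + digits.charAt(pos++));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : parse("3123888")) {
            System.out.println(line);
        }
    }
}
```

For the input `3123888` this prints the same seven lines as the output listed above: the loop leaves the first group because the gate turns false, not because an exception is thrown.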

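This is exactly the behavior I wanted:

evaluate a condition and run different code depending on the result, without throwing an exception whenever the condition is false.

Since that example runs correctly, I first tested whether this grammar generates the expected code, without spurious exceptions.

The test grammar: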

grammar gatedSynmaticPredicateDemo;
 
options{
    language=Java;
    output = AST;
}
 
parseInput
@init {
  int len=0;
  int count=0;
}
:
  d1=DIGIT {len=Integer.parseInt($d1.text); System.out.println("size of the following digits: "+len);}
  ( { count< len }?=> d2=DIGIT {count++;System.out.println("element: "+$d2.text);}  )+
 
  (d3=DIGIT {System.out.println("Remaining Digit: "+$d3.text);})*
   
  '\r'? '\n'
;
 
 
DIGIT: '0' .. '9';

Then I located the generated code:

while (true) {
    int alt1=2;
    int LA1_0 = input.LA(1);
    if ( (LA1_0==DIGIT) ) {
        int LA1_1 = input.LA(2);
        if ( (( count< len )) ) {
            alt1=1;
        }
 
    }
 
    switch (alt1) {
    case 1 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\gatedSynmaticPredicateDemo\\gatedSynmaticPredicateDemo.g:15:5: {...}? =>d2= DIGIT
        {
        if ( !(( count< len )) ) {
            throw new FailedPredicateException(input, "parseInput", " count< len ");
        }
        d2=(Token)match(input,DIGIT,FOLLOW_DIGIT_in_parseInput53);
        d2_tree = (Object)adaptor.create(d2);
        adaptor.addChild(root_0, d2_tree);
 
        count++;System.out.println("element: "+(d2!=null?d2.getText():null));
        }
        break;

However, I found it in gatedSynmaticPredicateDemoParser.java, not in the Lexer.java file.

The test also ran correctly:

[Screenshot: gatedSynmaticPredicateDemo grammar test ok]

But clearly the gated semantic predicate here is written in a parser rule, not in a lexer rule.

8. I then read:

[antlr-interest] Semantic Predicates in a Lexer

which suggests that gated semantic predicates are meant to be used in the parser.

9. However, when I actually ran the grammar above:

( {(define!=null) && (define.size()>1)}?=> (WS|COMMENT)?
    // take in arguments if macro call requires them
    '('
    callArg0=EXPR
    {
        callArg0Text = callArg0.getText();
        foundArgs.add(callArg0Text);
    }
    ( COMMA callArg1=EXPR
    {
        callArg1Text = callArg1.getText();
        foundArgs.add(callArg1Text);
    }
    )*
    { foundArgs.size()==define.size()-1 }? // better have right amount
    ')'
| {!((define!=null) && (define.size()>1))}?=>
)

with the generated code:

// D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:5: ({...}? => ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')' |{...}? =>)
int alt18=2;
int LA18_0 = input.LA(1);
if ( ((LA18_0 >= '\t' && LA18_0 <= '\n')||LA18_0=='\r'||LA18_0==' '||LA18_0=='('||LA18_0=='/') && (((define!=null) && (define.size()>1)))) {
    alt18=1;
}
 
switch (alt18) {
    case 1 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}? => ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')'
        {
        if ( !(((define!=null) && (define.size()>1))) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", "(define!=null) && (define.size()>1)");
        }
        ......
 
        if ( !(( foundArgs.size()==define.size()-1 )) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", " foundArgs.size()==define.size()-1 ");
        }
        match(')');
        }
        break;
    case 2 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:190:7: {...}? =>
        {
        if ( !((!((define!=null) && (define.size()>1)))) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", "!((define!=null) && (define.size()>1))");
        }
        }
        break;
 
}
 
 
if (define!=null) {
    String defineText = (String)define.get(0);
     
    if (define.size()==1) {
        //only have one value in list -> the defineText is the define para content -> just need replace directly
        setText(defineText);
    } else {
        //add new dict pair: (para, call value)
        for (int i=0;i<foundArgs.size();++i) {
            // treat macro arguments similar to local defines
            List arg = new ArrayList();
            arg.add((String)foundArgs.get(i));
            defineArgs.put( (String)define.get(1+i), arg );
        }
 
        // save current lexer's state
        SaveStruct ss = new SaveStruct(input);
        includes.push(ss);
 
        // switch on new input stream
        setCharStream(new ANTLRStringStream(defineText));
        reset();
    }
}

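I set breakpoints at several places:

[Screenshot: FailedPredicateException at first check]

[Screenshot: FailedPredicateException at second check]

Neither breakpoint was hit, meaning no exception was thrown.

Execution then reached the code that actually needs to run:

[Screenshot: can run into real code]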
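
The generated code above hints at why the checks stay silent: `alt18` defaults to 2 and becomes 1 only when both the lookahead character and the predicate hold, so the predicate is already part of the prediction. The Java sketch below models just that decision. It is a hand-written illustration under that reading, with hypothetical names, not ANTLR output.

```java
final class GatedPredicateSketch {
    // Character set copied from the generated prediction test.
    static boolean startsArgumentList(char c) {
        return (c >= '\t' && c <= '\n') || c == '\r' || c == ' ' || c == '(' || c == '/';
    }

    // With ?=> the predicate is ANDed into the prediction itself.
    static int predictGated(char next, boolean isFunctionLikeMacro) {
        return (startsArgumentList(next) && isFunctionLikeMacro) ? 1 : 2;
    }

    // Would the leftover check inside the chosen alternative fail?
    static boolean wouldThrow(char next, boolean isFunctionLikeMacro) {
        int alt = predictGated(next, isFunctionLikeMacro);
        // Alternative 1 asserts the predicate, alternative 2 asserts its negation.
        return alt == 1 ? !isFunctionLikeMacro : isFunctionLikeMacro;
    }

    public static void main(String[] args) {
        char[] lookaheads = {'(', ';'};
        boolean[] macroFlags = {true, false};
        for (char next : lookaheads) {
            for (boolean isMacro : macroFlags) {
                System.out.println("next='" + next + "' functionLikeMacro=" + isMacro
                        + " alt=" + predictGated(next, isMacro)
                        + " throws=" + wouldThrow(next, isMacro));
            }
        }
    }
}
```

In this model the check in alternative 1 can never fail, and a plain identifier always lands in the empty alternative. The one remaining failure case is a function-like macro name that is not followed by an argument list, which falls into alternative 2 and fails its negated check.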

 

 

【Summary】

In an ANTLR v2 lexer, conditional matching is written with

 {testExpression}?

(apparently called a validating semantic predicate), as in:

( { (define!=null) && (define.size()>1) }? (WS|COMMENT)?
    // take in arguments if macro call requires them
    '('
    callArg0=EXPR
    {
        callArg0Text = callArg0.getText();
        foundArgs.add(callArg0Text);
    }
    ( COMMA callArg1=EXPR
    {
        callArg1Text = callArg1.getText();
        foundArgs.add(callArg1Text);
    }
    )*
    { foundArgs.size()==define.size()-1 }? // better have right amount
    ')'
| { !((define!=null) && (define.size()>1)) }?
)

In an ANTLR v3 lexer, this must be changed to the form

{testExpression}?=>

(apparently called a gated semantic predicate):

( {(define!=null) && (define.size()>1)}?=> (WS|COMMENT)?
    // take in arguments if macro call requires them
    '('
    callArg0=EXPR
    {
        callArg0Text = callArg0.getText();
        foundArgs.add(callArg0Text);
    }
    ( COMMA callArg1=EXPR
    {
        callArg1Text = callArg1.getText();
        foundArgs.add(callArg1Text);
    }
    )*
    { foundArgs.size()==define.size()-1 }? // better have right amount
    ')'
| {!((define!=null) && (define.size()>1))}?=>
)

Only then does the lexer actually match the alternatives conditionally.
