【Background】
While working on:
[Notes] Converting the ANTLR v2 C/C++ preprocessor grammar, cpp.g, to ANTLR v3
I was referring to the earlier ANTLR v2 code:
    IDENTIFIER
    @init{
        List define = new ArrayList();
        List foundArgs = new ArrayList();
        String callArg0Text = "";
        String callArg1Text = "";
    }
    :
        identifier=RAW_IDENTIFIER
        {
            // see if this is a macro argument
            define = (List)defineArgs.get(identifier.getText());
            if (define==null) {
                // see if this is a macro call
                define = (List)defines.get(identifier.getText());
            }
        }
        ( { (define!=null) && (define.size()>1) }? (WS|COMMENT)?
            // take in arguments if macro call requires them
            '('
            callArg0=EXPR
            {
                callArg0Text = callArg0.getText();
                foundArgs.add(callArg0Text);
            }
            ( COMMA callArg1=EXPR
            {
                callArg1Text = callArg1.getText();
                foundArgs.add(callArg1Text);
            }
            )*
            { foundArgs.size()==define.size()-1 }? // better have right amount
            ')'
        | { !((define!=null) && (define.size()>1)) }?
        )
This rule is meant to match either a call of a define (macro) or an ordinary ID.
I later understood that it works through:

    { (define!=null) && (define.size()>1) }?

which makes the match conditional: only when define is not null and its size is greater than 1 does matching continue with the rest of the alternative:

    (WS|COMMENT)?
    // take in arguments if macro call requires them
    '('
    callArg0=EXPR
    {
        callArg0Text = callArg0.getText();
        foundArgs.add(callArg0Text);
    }
    ( COMMA callArg1=EXPR
    {
        callArg1Text = callArg1.getText();
        foundArgs.add(callArg1Text);
    }
    )*
    { foundArgs.size()==define.size()-1 }? // better have right amount
    ')'

If the condition is not satisfied, the alternative after the '|' operator is matched instead:

    { !((define!=null) && (define.size()>1)) }?
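The two predicates are exact complements, so for any value of define exactly one alternative is eligible. A minimal Java sketch of that decision follows. It assumes, as the expansion code further down in this post suggests, that element 0 of the list holds the macro body and the remaining elements hold the parameter names. The class name, the method names and the sample macros are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class MacroPredicateSketch {
    // Mirrors the predicate { (define!=null) && (define.size()>1) }?
    static boolean expectsArguments(List<String> define) {
        return define != null && define.size() > 1;
    }

    // Returns 1 for the argument-list alternative, 2 for the empty alternative.
    static int eligibleAlternative(List<String> define) {
        return expectsArguments(define) ? 1 : 2;
    }

    public static void main(String[] args) {
        List<String> plainId = null;                                      // not a macro at all
        List<String> objectLike = Arrays.asList("42");                    // body only, no parameters
        List<String> functionLike = Arrays.asList("((a)+(b))", "a", "b"); // body plus two parameters

        System.out.println(eligibleAlternative(plainId));      // 2
        System.out.println(eligibleAlternative(objectLike));   // 2
        System.out.println(eligibleAlternative(functionLike)); // 1
    }
}
```

Only a define with at least one parameter name should make the lexer look for '(' and the argument list.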
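【Solution process】
1. So the problem becomes: how to implement conditional matching in a lexer in ANTLR v3.
2. Someone else ran into a problem similar to mine.
Although there was no direct answer, the discussion mentioned:
- ({boolExpr}?): a disambiguating/validating semantic predicate
- ({boolExpr}?=>): a gated semantic predicate, which is what is needed here
The code from that discussion: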
    fragment VERSION_COMMENT_TAIL[bool matches_version]:
        {!matches_version}? => ( options { greedy = false; }: . )* '*' '/' { $type = MULTILINE_COMMENT; $channel = 98; }
        | { $type = VERSION_COMMENT; $channel = 98; }
    ;
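gave a hint: the form is

    {xxx}? => yyy{do_A} | {do_B}

which is very similar to my case.
3. On this point, the official site:
http://www.antlr2.org/doc/lexer.html
i.e. the corresponding explanation for ANTLR v2, says: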
    DEFINE : {getColumn()==1}? "#define" ID ;

Semantic predicates on the left-edge of single-alternative lexical rules get hoisted into the nextToken prediction mechanism. Adding the predicate to a rule makes it so that it is not a candidate for recognition until the predicate evaluates to true. In this case, the method for DEFINE would never be entered, even if the lookahead predicted #define, if the column > 1.
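This is consistent with the hint above, namely:
for

    {xxx}? => yyy{do_A}

if the expression xxx is not satisfied, the corresponding content is not matched at all, and it stays excluded until the predicate becomes true.
That is not the effect I originally wanted:
when xxx is not satisfied, skip this alternative and match what follows the '|' instead.
4. I also consulted two pages of the ANTLR v4 documentation, but still did not fully understand the behavior.
The reason is that in ANTLR v3 the grammar fragment:

    ( { (define!=null) && (define.size()>1) }? (WS|COMMENT)?

generates the following Java code: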
    switch (alt18) {
        case 1 :
            // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}? ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')'
            {
            if ( !(( (define!=null) && (define.size()>1) )) ) {
                throw new FailedPredicateException(input, "IDENTIFIER", " (define!=null) && (define.size()>1) ");
            }
            // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:48: ( WS | COMMENT )?
            int alt16=3;
            int LA16_0 = input.LA(1);
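Clearly, as soon as the check

    (define!=null) && (define.size()>1)

fails, an exception is thrown and execution does not continue.
It does not, as I expected, go on to test and match the alternative after the '|' operator:

    { !((define!=null) && (define.size()>1)) }?

which seemed very strange.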
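The behavior described above can be modelled in a few lines of plain Java. This is a simplified sketch, not ANTLR's generated code: it assumes that the alternative is picked from the next character alone and that the predicate is only checked after the alternative has been entered, which is what the observed exception suggests. The lookahead set is borrowed from the generated code quoted later in this post; the class, the method names and the exception type are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class ValidatingPredicateSketch {
    // Stand-in for ANTLR's FailedPredicateException (hypothetical, for illustration only).
    static class FailedPredicate extends RuntimeException {
        FailedPredicate(String predicateText) { super(predicateText); }
    }

    // Assumed prediction: decided by the next character only, predicate not consulted.
    static int predict(char la) {
        boolean startsArgList = (la >= '\t' && la <= '\n') || la == '\r'
                || la == ' ' || la == '(' || la == '/';
        return startsArgList ? 1 : 2;
    }

    // The predicate is validated only after the alternative has been chosen.
    static int enter(List<String> define, char la) {
        int alt = predict(la);
        boolean wantsArgs = define != null && define.size() > 1;
        if (alt == 1 && !wantsArgs) {
            throw new FailedPredicate("(define!=null) && (define.size()>1)");
        }
        if (alt == 2 && wantsArgs) {
            throw new FailedPredicate("!((define!=null) && (define.size()>1))");
        }
        return alt;
    }

    public static void main(String[] args) {
        List<String> functionLike = Arrays.asList("((a)+(b))", "a", "b");
        System.out.println(enter(functionLike, '(')); // 1: macro call with arguments
        System.out.println(enter(null, ';'));         // 2: ordinary identifier
        try {
            enter(null, '(');                          // ordinary identifier followed by '('
        } catch (FailedPredicate e) {
            System.out.println("failed predicate: " + e.getMessage());
        }
    }
}
```

In this model an ordinary identifier followed by '(' or whitespace is routed into the argument-list alternative and then rejected, instead of falling through to the empty alternative.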
5. I swapped the order of the two alternatives:
    ( { !((define!=null) && (define.size()>1)) }?
    |
    { (define!=null) && (define.size()>1) }? (WS|COMMENT)?
        // take in arguments if macro call requires them
        '('
        callArg0=EXPR
        {
            callArg0Text = callArg0.getText();
            foundArgs.add(callArg0Text);
        }
        ( COMMA callArg1=EXPR
        {
            callArg1Text = callArg1.getText();
            foundArgs.add(callArg1Text);
        }
        )*
        { foundArgs.size()==define.size()-1 }? // better have right amount
        ')'
    )
and tried it, but the problem was still not solved. The effect was the same as before.
Execution could now get past this check:
    switch (alt18) {
        case 1 :
            // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}?
            {
            if ( !(( !((define!=null) && (define.size()>1)) )) ) {
                throw new FailedPredicateException(input, "IDENTIFIER", " !((define!=null) && (define.size()>1)) ");
            }
            }
            break;
but the code that follows:
    if (define!=null) {
        String defineText = (String)define.get(0);

        if (define.size()==1) {
            //only have one value in list -> the defineText is the define para content -> just need replace directly
            setText(defineText);
        } else {
            //add new dict pair: (para, call value)
            for (int i=0;i<foundArgs.size();++i) {
                // treat macro arguments similar to local defines
                List arg = new ArrayList();
                arg.add((String)foundArgs.get(i));
                defineArgs.put( (String)define.get(1+i), arg );
            }

            // save current lexer's state
            SaveStruct ss = new SaveStruct(input);
            includes.push(ss);

            // switch on new input stream
            setCharStream(new ANTLRStringStream(defineText));
            reset();
        }
    }
still did not run, because define really was null.
So the conditional matching problem in ANTLR v3 remained unsolved for the moment.
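The argument-binding loop in the expansion code above can be tried in isolation. The sketch below is plain Java with no ANTLR dependency. It keeps the list layout used in this post (element 0 is the macro body, elements 1 onward are the parameter names); the class name, the method name and the sample macro are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MacroArgBindingSketch {
    // Binds the i-th call argument to the parameter name stored at define[1+i],
    // wrapping each argument in a one-element list like a local define.
    static Map<String, List<String>> bindArguments(List<String> define, List<String> foundArgs) {
        Map<String, List<String>> defineArgs = new HashMap<>();
        for (int i = 0; i < foundArgs.size(); ++i) {
            List<String> arg = new ArrayList<>();
            arg.add(foundArgs.get(i));
            defineArgs.put(define.get(1 + i), arg);
        }
        return defineArgs;
    }

    public static void main(String[] args) {
        List<String> define = Arrays.asList("((a)+(b))", "a", "b"); // body, then parameter names
        List<String> foundArgs = Arrays.asList("1", "x*2");         // arguments found at the call site

        Map<String, List<String>> bound = bindArguments(define, foundArgs);
        System.out.println(bound.get("a")); // [1]
        System.out.println(bound.get("b")); // [x*2]
    }
}
```

Storing each argument as a one-element list lets the lexer treat a macro parameter exactly like an object-like define when it later meets the parameter name in the macro body.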
6. Following:
Forcing an alternative in ANTLR lexer rule
I changed the grammar to the => form:
    ({ (define!=null) && (define.size()>1) }?=> (WS|COMMENT)?
        // take in arguments if macro call requires them
        '('
        callArg0=EXPR
        {
            callArg0Text = callArg0.getText();
            foundArgs.add(callArg0Text);
        }
        ( COMMA callArg1=EXPR
        {
            callArg1Text = callArg1.getText();
            foundArgs.add(callArg1Text);
        }
        )*
        { foundArgs.size()==define.size()-1 }? // better have right amount
        ')'
    | { !((define!=null) && (define.size()>1)) }?=>
    )
and tried it. The generated code was still:
    if ( ((LA18_0 >= '\t' && LA18_0 <= '\n')||LA18_0=='\r'||LA18_0==' '||LA18_0=='('||LA18_0=='/') && (((define!=null) && (define.size()>1)))) {
        alt18=1;
    }
    switch (alt18) {
        case 1 :
            // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}? => ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')'
            {
            if ( !(((define!=null) && (define.size()>1))) ) {
                throw new FailedPredicateException(input, "IDENTIFIER", "(define!=null) && (define.size()>1)");
            }
            ......
            if ( !(( foundArgs.size()==define.size()-1 )) ) {
                throw new FailedPredicateException(input, "IDENTIFIER", " foundArgs.size()==define.size()-1 ");
            }

            match(')');

            }
            break;
        case 2 :
            // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:190:7: {...}? =>
            {
            if ( !(( !((define!=null) && (define.size()>1)) )) ) {
                throw new FailedPredicateException(input, "IDENTIFIER", " !((define!=null) && (define.size()>1)) ");
            }
            }
            break;
    }
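Clearly, the checks that throw the exception are still there.
At this point it seemed that the semantic predicate above only works as intended in ANTLR v2,
and that in ANTLR v3 its meaning had changed into a check that throws an exception when the condition is not met?
7. Later I consulted:
http://www.egtry.com/tools/antlr/gated_semantic_predicate
Its example: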
Example 2
Given a sequence of digits, the first digit states how many digits to take next.

antlr grammar

    @init {
        int len=0;
        int count=0;
    }
    :
    d1=DIGIT {len=Integer.parseInt($d1.text); System.out.println("size of the following digits: "+len);}
    ( { count< len }?=> d2=DIGIT {count++;System.out.println("element: "+$d2.text);} )+
    (d3=DIGIT {System.out.println("Remaining Digit: "+$d3.text);})*
    '\r'? '\n'
    ;
    DIGIT: '0' .. '9';

input example

    3123888

Output

    size of the following digits: 3
    element: 1
    element: 2
    element: 3
    Remaining Digit: 8
    Remaining Digit: 8
    Remaining Digit: 8
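This is clearly the effect I was after:
the condition is tested and different branches are taken, without an exception being thrown whenever the condition is not met.
Since that example runs correctly, I first tested its grammar to see whether the generated code is as expected, i.e. does not throw spurious exceptions.
The test grammar is: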
    grammar gatedSynmaticPredicateDemo;

    options{
        language=Java;
        output = AST;
    }

    parseInput
    @init {
        int len=0;
        int count=0;
    }
    :
    d1=DIGIT {len=Integer.parseInt($d1.text); System.out.println("size of the following digits: "+len);}
    (
        { count< len }?=> d2=DIGIT {count++;System.out.println("element: "+$d2.text);}
    )+

    (d3=DIGIT {System.out.println("Remaining Digit: "+$d3.text);})*
    '\r'? '\n'
    ;

    DIGIT: '0' .. '9';
Then I located the generated code:
    while (true) {
        int alt1=2;
        int LA1_0 = input.LA(1);

        if ( (LA1_0==DIGIT) ) {
            int LA1_1 = input.LA(2);

            if ( (( count< len )) ) {
                alt1=1;
            }
        }
        switch (alt1) {
            case 1 :
                // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\gatedSynmaticPredicateDemo\\gatedSynmaticPredicateDemo.g:15:5: {...}? =>d2= DIGIT
                {
                if ( !(( count< len )) ) {
                    throw new FailedPredicateException(input, "parseInput", " count< len ");
                }
                d2=(Token)match(input,DIGIT,FOLLOW_DIGIT_in_parseInput53);
                d2_tree = (Object)adaptor.create(d2);
                adaptor.addChild(root_0, d2_tree);

                count++;System.out.println("element: "+(d2!=null?d2.getText():null));
                }
                break;
However, I found it in gatedSynmaticPredicateDemoParser.java, not in the Lexer.java file.
The test result was also correct.
But clearly, this gated semantic predicate is written in a parser rule, not in a lexer rule.
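To see why the gate in the digit example never triggers the exception, the loop can be hand-simulated in plain Java. This is a sketch of the behavior only, not the ANTLR-generated parser: the gate count < len is re-evaluated before every iteration, and once it is false the remaining digits go to the second loop. The class and method names are hypothetical, and the sketch does not enforce the "at least one iteration" requirement of the ( ... )+ subrule.

```java
import java.util.ArrayList;
import java.util.List;

class GatedLoopSketch {
    // Returns the lines the grammar's actions would print for a string of digits.
    static List<String> parse(String digits) {
        List<String> out = new ArrayList<>();
        int len = Character.digit(digits.charAt(0), 10); // d1=DIGIT
        out.add("size of the following digits: " + len);

        int count = 0;
        int pos = 1;
        // ( { count< len }?=> d2=DIGIT )+ : the gate is checked before each digit is consumed
        while (pos < digits.length() && count < len) {
            out.add("element: " + digits.charAt(pos));
            pos++;
            count++;
        }
        // ( d3=DIGIT )* : everything the gated loop left behind
        while (pos < digits.length()) {
            out.add("Remaining Digit: " + digits.charAt(pos));
            pos++;
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : parse("3123888")) {
            System.out.println(line);
        }
    }
}
```

Running it on 3123888 prints the same seven lines as the output quoted in the example above.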
8. See also:
[antlr-interest] Semantic Predicates in a Lexer
It seems that gated semantic predicates are meant to be used in the parser.
9. However, when I actually ran the grammar above:
    (
    {(define!=null) && (define.size()>1)}?=> (WS|COMMENT)?
        // take in arguments if macro call requires them
        '('
        callArg0=EXPR
        {
            callArg0Text = callArg0.getText();
            foundArgs.add(callArg0Text);
        }
        ( COMMA callArg1=EXPR
        {
            callArg1Text = callArg1.getText();
            foundArgs.add(callArg1Text);
        }
        )*
        { foundArgs.size()==define.size()-1 }? // better have right amount
        ')'
    | {!((define!=null) && (define.size()>1))}?=>
    )
which generates the code:
    // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:5: ({...}? => ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')' |{...}? =>)
    int alt18=2;
    int LA18_0 = input.LA(1);

    if ( ((LA18_0 >= '\t' && LA18_0 <= '\n')||LA18_0=='\r'||LA18_0==' '||LA18_0=='('||LA18_0=='/') && (((define!=null) && (define.size()>1)))) {
        alt18=1;
    }
    switch (alt18) {
        case 1 :
            // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}? => ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')'
            {
            if ( !(((define!=null) && (define.size()>1))) ) {
                throw new FailedPredicateException(input, "IDENTIFIER", "(define!=null) && (define.size()>1)");
            }
            ......
            if ( !(( foundArgs.size()==define.size()-1 )) ) {
                throw new FailedPredicateException(input, "IDENTIFIER", " foundArgs.size()==define.size()-1 ");
            }

            match(')');

            }
            break;
        case 2 :
            // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:190:7: {...}? =>
            {
            if ( !((!((define!=null) && (define.size()>1)))) ) {
                throw new FailedPredicateException(input, "IDENTIFIER", "!((define!=null) && (define.size()>1))");
            }
            }
            break;
    }

    if (define!=null) {
        String defineText = (String)define.get(0);

        if (define.size()==1) {
            //only have one value in list -> the defineText is the define para content -> just need replace directly
            setText(defineText);
        } else {
            //add new dict pair: (para, call value)
            for (int i=0;i<foundArgs.size();++i) {
                // treat macro arguments similar to local defines
                List arg = new ArrayList();
                arg.add((String)foundArgs.get(i));
                defineArgs.put( (String)define.get(1+i), arg );
            }

            // save current lexer's state
            SaveStruct ss = new SaveStruct(input);
            includes.push(ss);

            // switch on new input stream
            setCharStream(new ANTLRStringStream(defineText));
            reset();
        }
    }
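The result: I set breakpoints at several places in the generated code,
and they were really never hit, i.e. no exception was thrown any more.
Execution then reached the part of the code that was actually supposed to run.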
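The reason the checks no longer fire can be read off the generated prediction code: alt18 starts at 2 and only becomes 1 when both the lookahead and the gate agree. The plain-Java sketch below models just that decision plus the two checks that follow it. It is a simplified model of the generated code quoted above, not a replacement for it; the class name, the method names and the use of IllegalStateException in place of FailedPredicateException are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

class GatedPredicateSketch {
    static boolean wantsArgs(List<String> define) {
        return define != null && define.size() > 1;
    }

    // Mirrors the generated prediction: the gate is part of the decision itself.
    static int predict(List<String> define, char la) {
        int alt = 2;
        boolean startsArgList = (la >= '\t' && la <= '\n') || la == '\r'
                || la == ' ' || la == '(' || la == '/';
        if (startsArgList && wantsArgs(define)) {
            alt = 1;
        }
        return alt;
    }

    // Mirrors the checks at the top of case 1 and case 2.
    static int enter(List<String> define, char la) {
        int alt = predict(define, la);
        if (alt == 1 && !wantsArgs(define)) {
            throw new IllegalStateException("(define!=null) && (define.size()>1)");
        }
        if (alt == 2 && wantsArgs(define)) {
            throw new IllegalStateException("!((define!=null) && (define.size()>1))");
        }
        return alt;
    }

    public static void main(String[] args) {
        List<String> functionLike = Arrays.asList("((a)+(b))", "a", "b");
        System.out.println(enter(null, '('));         // 2: ordinary identifier, no exception
        System.out.println(enter(null, ';'));         // 2
        System.out.println(enter(functionLike, '(')); // 1: macro call with arguments
    }
}
```

In this model the check in case 1 can never fail, because case 1 is only chosen when the gate is true. The one input that still reaches a throw is a define that expects arguments but is not followed by whitespace, '/' or '(': it falls into alternative 2, whose check then fails.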
【Summary】
Code in an ANTLR v2 lexer that implements conditional matching with

    {testExpression}?

(apparently called a validating semantic predicate),
such as:
    ( { (define!=null) && (define.size()>1) }? (WS|COMMENT)?
        // take in arguments if macro call requires them
        '('
        callArg0=EXPR
        {
            callArg0Text = callArg0.getText();
            foundArgs.add(callArg0Text);
        }
        ( COMMA callArg1=EXPR
        {
            callArg1Text = callArg1.getText();
            foundArgs.add(callArg1Text);
        }
        )*
        { foundArgs.size()==define.size()-1 }? // better have right amount
        ')'
    | { !((define!=null) && (define.size()>1)) }?
    )
has to be changed in an ANTLR v3 lexer to

    {testExpression}?=>

(apparently called a gated semantic predicate),
giving:
    (
    {(define!=null) && (define.size()>1)}?=> (WS|COMMENT)?
        // take in arguments if macro call requires them
        '('
        callArg0=EXPR
        {
            callArg0Text = callArg0.getText();
            foundArgs.add(callArg0Text);
        }
        ( COMMA callArg1=EXPR
        {
            callArg1Text = callArg1.getText();
            foundArgs.add(callArg1Text);
        }
        )*
        { foundArgs.size()==define.size()-1 }? // better have right amount
        ')'
    | {!((define!=null) && (define.size()>1))}?=>
    )
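Only then does the lexer really match the corresponding content conditionally.
Please credit the source when reposting: 在路上 » [Solved] Conditional matching in the ANTLR v3 lexer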