【Background】
While working on:
【Record】Converting the antlr v2 C/C++ preprocessor grammar, cpp.g, to antlr v3
I referred to the earlier antlr v2 code:
IDENTIFIER
@init{
    List define = new ArrayList();
    List foundArgs = new ArrayList();
    String callArg0Text = "";
    String callArg1Text = "";
} :
    identifier=RAW_IDENTIFIER
    {
        // see if this is a macro argument
        define = (List)defineArgs.get(identifier.getText());
        if (define==null) {
            // see if this is a macro call
            define = (List)defines.get(identifier.getText());
        }
    }
    ( { (define!=null) && (define.size()>1) }? (WS|COMMENT)?
      // take in arguments if macro call requires them
      '('
      callArg0=EXPR
      { callArg0Text = callArg0.getText(); foundArgs.add(callArg0Text); }
      ( COMMA callArg1=EXPR
        { callArg1Text = callArg1.getText(); foundArgs.add(callArg1Text); }
      )*
      { foundArgs.size()==define.size()-1 }? // better have right amount
      ')'
    | { !((define!=null) && (define.size()>1)) }?
    )
which matches either a call of a define or a plain ID.
I later understood that the conditional matching is done via:
{ (define!=null) && (define.size()>1) }?
That is, only when define is not null and its size is greater than 1 does matching continue with what follows:
(WS|COMMENT)?
// take in arguments if macro call requires them
'('
callArg0=EXPR
{ callArg0Text = callArg0.getText(); foundArgs.add(callArg0Text); }
( COMMA callArg1=EXPR
  { callArg1Text = callArg1.getText(); foundArgs.add(callArg1Text); }
)*
{ foundArgs.size()==define.size()-1 }? // better have right amount
')'
and only if that condition is not met is the part after the alternation operator '|' matched:
{ !((define!=null) && (define.size()>1)) }?
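The intended choice between the two alternatives can be sketched in plain Java. This is only an illustration of the decision the predicate is meant to express, not the lexer itself; the class name, the `chooseAlternative` helper and the sample macros are hypothetical, and the `defines` table is assumed to map a macro name to a list of the form [body, param1, param2, ...], as the action code suggests.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MacroAlternativeSketch {
    // Hypothetical stand-in for the lexer's `defines` table:
    // macro name -> [body, param1, param2, ...]
    static final Map<String, List<String>> defines = new HashMap<>();

    // Mirrors the predicate { (define!=null) && (define.size()>1) }?
    static boolean expectsArguments(List<String> define) {
        return define != null && define.size() > 1;
    }

    // Returns a label for the alternative the rule is meant to take.
    static String chooseAlternative(String identifier) {
        List<String> define = defines.get(identifier);
        return expectsArguments(define) ? "macro-call-with-args" : "empty";
    }

    public static void main(String[] args) {
        defines.put("MAX", Arrays.asList("((a)>(b)?(a):(b))", "a", "b"));
        defines.put("VERSION", Arrays.asList("42"));
        System.out.println(chooseAlternative("MAX"));     // macro with parameters
        System.out.println(chooseAlternative("VERSION")); // macro without parameters
        System.out.println(chooseAlternative("foo"));     // plain identifier
    }
}
```

Only a macro defined with parameters should take the first alternative; a parameterless macro and a plain identifier should both fall through to the empty one.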
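【Solution process】
1. So the question becomes: in antlr v3, how do you implement conditional matching in a lexer?
2. Someone else
ran into a problem similar to mine.
There was no direct answer, but the discussion mentioned:
- ({boolExpr}?): a disambiguating/validating semantic predicate
- ({boolExpr}?=>): the gated semantic predicate, which is what is needed here
The code there: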
fragment VERSION_COMMENT_TAIL[bool matches_version]:
    {!matches_version}? => ( options { greedy = false; }: . )* '*' '/'
        { $type = MULTILINE_COMMENT; $channel = 98; }
    | { $type = VERSION_COMMENT; $channel = 98; }
;
gave a hint that the form is
{xxx}? => yyy{do_A} | {do_B}
which is very similar to the case here.
3. On this point, the official site:
http://www.antlr2.org/doc/lexer.html
i.e. the antlr v2 documentation, explains:
DEFINE : {getColumn()==1}? "#define" ID ;
Semantic predicates on the left-edge of single-alternative lexical rules get hoisted into the nextToken prediction mechanism. Adding the predicate to a rule makes it so that it is not a candidate for recognition until the predicate evaluates to true. In this case, the method for DEFINE would never be entered, even if the lookahead predicted #define, if the column > 1.
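This also matches expectations, namely:
for the expression xxx in
{xxx}? => yyy{do_A}
if xxx does not hold, the corresponding content is not matched at all, and the rule stays out of consideration until xxx becomes true.
That is not the effect originally wanted:
when xxx does not hold, skip this alternative and match what follows the alternation instead.
4. I also consulted the antlr v4 official documentation,
but still did not fully understand it.
Because in antlr v3, the grammar:
( { (define!=null) && (define.size()>1) }? (WS|COMMENT)?
generates this java code: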
switch (alt18) {
    case 1 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}? ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')'
        {
        if ( !(( (define!=null) && (define.size()>1) )) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", " (define!=null) && (define.size()>1) ");
        }
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:48: ( WS | COMMENT )?
        int alt16=3;
        int LA16_0 = input.LA(1);
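Clearly, as soon as the check
(define!=null) && (define.size()>1)
fails, an exception is thrown and execution does not continue;
it does not, as expected, go on to test and match what follows the alternation operator '|':
{ !((define!=null) && (define.size()>1)) }?
So this is quite strange.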
5. I swapped the order of the two alternatives, changing it to:
( { !((define!=null) && (define.size()>1)) }?
| { (define!=null) && (define.size()>1) }? (WS|COMMENT)?
  // take in arguments if macro call requires them
  '('
  callArg0=EXPR
  { callArg0Text = callArg0.getText(); foundArgs.add(callArg0Text); }
  ( COMMA callArg1=EXPR
    { callArg1Text = callArg1.getText(); foundArgs.add(callArg1Text); }
  )*
  { foundArgs.size()==define.size()-1 }? // better have right amount
  ')'
)
and tried it, but that did not solve the problem either. The effect was the same as before.
Execution does now get past:
switch (alt18) {
    case 1 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}?
        {
        if ( !(( !((define!=null) && (define.size()>1)) )) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", " !((define!=null) && (define.size()>1)) ");
        }
        }
        break;
but the code that follows:
if (define!=null) {
    String defineText = (String)define.get(0);
    if (define.size()==1) {
        //only have one value in list -> the defineText is the define para content -> just need replace directly
        setText(defineText);
    } else {
        //add new dict pair: (para, call value)
        for (int i=0;i<foundArgs.size();++i) {
            // treat macro arguments similar to local defines
            List arg = new ArrayList();
            arg.add((String)foundArgs.get(i));
            defineArgs.put( (String)define.get(1+i), arg );
        }
        // save current lexer's state
        SaveStruct ss = new SaveStruct(input);
        includes.push(ss);
        // switch on new input stream
        setCharStream(new ANTLRStringStream(defineText));
        reset();
    }
}
still cannot run, because define really is null.
So the selective-matching problem in antlr v3 was still unsolved for the moment.
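For reference, the argument-binding step in the action code above (pairing each macro parameter with the argument found at the call site) can be sketched on its own. This is a simplified illustration under assumptions: the class and method names are hypothetical, and each argument is stored directly as a String rather than wrapped in a one-element List as the original `defineArgs.put(...)` does.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MacroBindingSketch {
    // define    = [body, param1, param2, ...]
    // foundArgs = the call arguments, in call order
    // Returns parameter name -> call argument.
    static Map<String, String> bindArguments(List<String> define, List<String> foundArgs) {
        // Same condition as the predicate { foundArgs.size()==define.size()-1 }?
        if (foundArgs.size() != define.size() - 1) {
            throw new IllegalArgumentException("wrong number of macro arguments");
        }
        Map<String, String> bound = new LinkedHashMap<>();
        for (int i = 0; i < foundArgs.size(); ++i) {
            // define.get(0) is the macro body, so parameters start at index 1
            bound.put(define.get(1 + i), foundArgs.get(i));
        }
        return bound;
    }

    public static void main(String[] args) {
        List<String> define = Arrays.asList("((a)>(b)?(a):(b))", "a", "b");
        System.out.println(bindArguments(define, Arrays.asList("x+1", "y")));
    }
}
```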
6. Following:
Forcing an alternative in ANTLR lexer rule
I changed the grammar to the => form:
( { (define!=null) && (define.size()>1) }?=> (WS|COMMENT)?
  // take in arguments if macro call requires them
  '('
  callArg0=EXPR
  { callArg0Text = callArg0.getText(); foundArgs.add(callArg0Text); }
  ( COMMA callArg1=EXPR
    { callArg1Text = callArg1.getText(); foundArgs.add(callArg1Text); }
  )*
  { foundArgs.size()==define.size()-1 }? // better have right amount
  ')'
| { !((define!=null) && (define.size()>1)) }?=>
)
and tried it. The generated code is still:
if ( ((LA18_0 >= '\t' && LA18_0 <= '\n')||LA18_0=='\r'||LA18_0==' '||LA18_0=='('||LA18_0=='/') && (((define!=null) && (define.size()>1)))) {
    alt18=1;
}
switch (alt18) {
    case 1 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}? => ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')'
        {
        if ( !(((define!=null) && (define.size()>1))) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", "(define!=null) && (define.size()>1)");
        }
        ......
        if ( !(( foundArgs.size()==define.size()-1 )) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", " foundArgs.size()==define.size()-1 ");
        }
        match(')');
        }
        break;
    case 2 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:190:7: {...}? =>
        {
        if ( !(( !((define!=null) && (define.size()>1)) )) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", " !((define!=null) && (define.size()>1)) ");
        }
        }
        break;
}
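At first glance, this still looks like it will throw exceptions.
So far, it seemed that the semantic predicate above only works in antlr v2?
And that in antlr v3 its meaning has changed into a prediction: if the condition is not met, an exception is thrown???
7. Later I consulted:
http://www.egtry.com/tools/antlr/gated_semantic_predicate
whose example is: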
Example 2: given a sequence of digits, the first digit states how many digits to take next.
antlr grammar:
@init {
    int len=0;
    int count=0;
}
:
    d1=DIGIT {len=Integer.parseInt($d1.text); System.out.println("size of the following digits: "+len);}
    ( { count< len }?=> d2=DIGIT {count++;System.out.println("element: "+$d2.text);} )+
    (d3=DIGIT {System.out.println("Remaining Digit: "+$d3.text);})*
    '\r'? '\n'
;
DIGIT: '0' .. '9';
input example:
3123888
Output:
size of the following digits: 3
element: 1
element: 2
element: 3
Remaining Digit: 8
Remaining Digit: 8
Remaining Digit: 8
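Clearly, this is exactly the effect we want:
the condition is tested and different statements are executed accordingly, without an exception being thrown whenever the condition fails.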
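The behaviour of that example can be imitated in plain Java to make the role of the gate explicit. This is a hand-written simulation under assumptions, not ANTLR-generated code: the class and method names are hypothetical, the input is taken as a string of digits without the trailing newline, and the "at least one element" requirement of the `( ... )+` loop is not enforced.

```java
import java.util.ArrayList;
import java.util.List;

class GatedLoopSketch {
    // Returns the lines the example grammar's actions would print for `digits`.
    static List<String> parse(String digits) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        // d1=DIGIT: the first digit says how many digits to take next
        int len = Character.getNumericValue(digits.charAt(pos++));
        out.add("size of the following digits: " + len);
        int count = 0;
        // ( {count < len}?=> d2=DIGIT )+
        // The gate is part of the loop decision: once it is false the loop
        // simply exits instead of failing.
        while (pos < digits.length() && Character.isDigit(digits.charAt(pos)) && count < len) {
            out.add("element: " + digits.charAt(pos++));
            count++;
        }
        // ( d3=DIGIT )*
        while (pos < digits.length() && Character.isDigit(digits.charAt(pos))) {
            out.add("Remaining Digit: " + digits.charAt(pos++));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : parse("3123888")) {
            System.out.println(line);
        }
    }
}
```

For the input 3123888 this yields the same seven lines as the output quoted in the example.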
So, since that example runs correctly, I first tested whether the code generated from its grammar is as expected, i.e. free of stray exceptions.
The test code is:
grammar gatedSynmaticPredicateDemo;
options{
    language=Java;
    output = AST;
}
parseInput
@init {
    int len=0;
    int count=0;
}
:
    d1=DIGIT {len=Integer.parseInt($d1.text); System.out.println("size of the following digits: "+len);}
    ( { count< len }?=> d2=DIGIT {count++;System.out.println("element: "+$d2.text);} )+
    (d3=DIGIT {System.out.println("Remaining Digit: "+$d3.text);})*
    '\r'? '\n'
;
DIGIT: '0' .. '9';
Then I located the generated code:
while (true) {
    int alt1=2;
    int LA1_0 = input.LA(1);
    if ( (LA1_0==DIGIT) ) {
        int LA1_1 = input.LA(2);
        if ( (( count< len )) ) {
            alt1=1;
        }
    }
    switch (alt1) {
        case 1 :
            // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\gatedSynmaticPredicateDemo\\gatedSynmaticPredicateDemo.g:15:5: {...}? =>d2= DIGIT
            {
            if ( !(( count< len )) ) {
                throw new FailedPredicateException(input, "parseInput", " count< len ");
            }
            d2=(Token)match(input,DIGIT,FOLLOW_DIGIT_in_parseInput53);
            d2_tree = (Object)adaptor.create(d2);
            adaptor.addChild(root_0, d2_tree);
            count++;System.out.println("element: "+(d2!=null?d2.getText():null));
            }
            break;
However, this was found in gatedSynmaticPredicateDemoParser.java, not in the Lexer.java.
And the test result was normal.
But clearly, the gated semantic predicate here is written in the parser, not in the lexer.
8. I then consulted:
[antlr-interest] Semantic Predicates in a Lexer
which seems to suggest that gated semantic predicates should be used in the parser.
9. However, when I actually ran the grammar from above:
( {(define!=null) && (define.size()>1)}?=> (WS|COMMENT)?
  // take in arguments if macro call requires them
  '('
  callArg0=EXPR
  { callArg0Text = callArg0.getText(); foundArgs.add(callArg0Text); }
  ( COMMA callArg1=EXPR
    { callArg1Text = callArg1.getText(); foundArgs.add(callArg1Text); }
  )*
  { foundArgs.size()==define.size()-1 }? // better have right amount
  ')'
| {!((define!=null) && (define.size()>1))}?=>
)
with the generated code:
// D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:5: ({...}? => ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')' |{...}? =>)
int alt18=2;
int LA18_0 = input.LA(1);
if ( ((LA18_0 >= '\t' && LA18_0 <= '\n')||LA18_0=='\r'||LA18_0==' '||LA18_0=='('||LA18_0=='/') && (((define!=null) && (define.size()>1)))) {
    alt18=1;
}
switch (alt18) {
    case 1 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}? => ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')'
        {
        if ( !(((define!=null) && (define.size()>1))) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", "(define!=null) && (define.size()>1)");
        }
        ......
        if ( !(( foundArgs.size()==define.size()-1 )) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", " foundArgs.size()==define.size()-1 ");
        }
        match(')');
        }
        break;
    case 2 :
        // D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:190:7: {...}? =>
        {
        if ( !((!((define!=null) && (define.size()>1)))) ) {
            throw new FailedPredicateException(input, "IDENTIFIER", "!((define!=null) && (define.size()>1))");
        }
        }
        break;
}
if (define!=null) {
    String defineText = (String)define.get(0);
    if (define.size()==1) {
        //only have one value in list -> the defineText is the define para content -> just need replace directly
        setText(defineText);
    } else {
        //add new dict pair: (para, call value)
        for (int i=0;i<foundArgs.size();++i) {
            // treat macro arguments similar to local defines
            List arg = new ArrayList();
            arg.add((String)foundArgs.get(i));
            defineArgs.put( (String)define.get(1+i), arg );
        }
        // save current lexer's state
        SaveStruct ss = new SaveStruct(input);
        includes.push(ss);
        // switch on new input stream
        setCharStream(new ANTLRStringStream(defineText));
        reset();
    }
}
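the result was this: I set breakpoints in several places,
and they really were never hit, i.e. no exception was thrown any more. This is consistent with the generated code: the predicate is now also part of the if condition that sets alt18=1, so alternative 1 is only chosen when the predicate holds, and otherwise alt18 keeps its default value of 2.
Execution then reached the part of the code that actually needs to run.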
【Summary】
In the antlr v2 lexer, selective matching is implemented with
{testExpression}?
(apparently called a validating semantic predicate)
as in this code:
( { (define!=null) && (define.size()>1) }? (WS|COMMENT)?
  // take in arguments if macro call requires them
  '('
  callArg0=EXPR
  { callArg0Text = callArg0.getText(); foundArgs.add(callArg0Text); }
  ( COMMA callArg1=EXPR
    { callArg1Text = callArg1.getText(); foundArgs.add(callArg1Text); }
  )*
  { foundArgs.size()==define.size()-1 }? // better have right amount
  ')'
| { !((define!=null) && (define.size()>1)) }?
)
In the antlr v3 lexer, this needs to be changed to
{testExpression}?=>
(apparently called a gated semantic predicate)
in the form:
( {(define!=null) && (define.size()>1)}?=> (WS|COMMENT)?
  // take in arguments if macro call requires them
  '('
  callArg0=EXPR
  { callArg0Text = callArg0.getText(); foundArgs.add(callArg0Text); }
  ( COMMA callArg1=EXPR
    { callArg1Text = callArg1.getText(); foundArgs.add(callArg1Text); }
  )*
  { foundArgs.size()==define.size()-1 }? // better have right amount
  ')'
| {!((define!=null) && (define.size()>1))}?=>
)
Only then is the corresponding content really matched selectively.
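The difference between the two predicate forms in antlr v3 can be modelled roughly in plain Java. This is a simplified model based on the generated code quoted earlier, not ANTLR's actual runtime: the class and method names are hypothetical, IllegalStateException stands in for FailedPredicateException, `p` stands for (define!=null) && (define.size()>1), and the lookahead-only prediction in `validating` is an assumption inferred from the observed exception.

```java
class PredicatePredictionSketch {
    // Characters that can start alternative 1: whitespace, '(' or '/' (start of a comment).
    static boolean canStartArgs(int la) {
        return la == '\t' || la == '\n' || la == '\r' || la == ' ' || la == '(' || la == '/';
    }

    // {p}? form: the alternative is assumed to be predicted from the lookahead
    // alone; the predicate is only checked after the alternative was entered.
    static int validating(int la, boolean p) {
        int alt = canStartArgs(la) ? 1 : 2;
        if (alt == 1 && !p) throw new IllegalStateException("failed predicate: p");
        if (alt == 2 && p) throw new IllegalStateException("failed predicate: !p");
        return alt;
    }

    // {p}?=> form: the predicate takes part in the prediction itself,
    // as in `if ( (lookahead set) && (p) ) alt18=1;` with alt18 defaulting to 2.
    static int gated(int la, boolean p) {
        int alt = 2;
        if (canStartArgs(la) && p) alt = 1;
        // The in-body checks are still generated. The first one can no longer fail.
        if (alt == 1 && !p) throw new IllegalStateException("failed predicate: p");
        // The second mirrors the check in case 2 and can still fail when p holds
        // but no argument list follows.
        if (alt == 2 && p) throw new IllegalStateException("failed predicate: !p");
        return alt;
    }

    public static void main(String[] args) {
        // Plain identifier followed by '(': define is null, so p is false.
        System.out.println(gated('(', false)); // falls back to alternative 2
        try {
            validating('(', false);
        } catch (IllegalStateException e) {
            System.out.println("validating form: " + e.getMessage());
        }
    }
}
```

In this model the gated form falls back to the empty alternative for a plain identifier followed by '(', which is the case that raised the exception with the validating form.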
Please credit the source when reposting: 在路上 » 【Solved】Conditional matching in the antlr v3 lexer