【Problem】
While working through:
【Basically solved】ANTLR v3: a grammar containing {skip();} fails when debugging the parse: org.antlr.runtime.EarlyExitException
I changed the grammar to:
BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
startParse : manufacture deviceType deviceRevison ddRevision;
manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
With this change the numeric value was recognized correctly, but a new error appeared: MissingTokenException.
【Troubleshooting】
1. Clearly, I still had not fully understood what
{skip();}
and
{$channel=HIDDEN;}
actually mean in a grammar.
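As a point of reference for these two actions, here is a minimal sketch of how each is usually attached to a whitespace rule in an ANTLR v3 grammar with the Java target. The rule name WS and the comments are illustrative and are not taken from the grammar in this post. As far as the v3 runtime is documented, skip() makes the lexer throw the matched text away and move on to the next token, while $channel=HIDDEN still creates the token but puts it on a channel that the default token stream does not hand to the parser.

```antlr
// Variant A: discard the matched whitespace entirely; no token is created.
WS : (' ' | '\t' | '\r' | '\n')+ {skip();} ;

// Variant B (use instead of A, not together with it): the token is still
// created, but on the hidden channel, so the parser does not see it.
// WS : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;} ;
```

In both variants the parser never receives the whitespace token, so a parser rule that names that token explicitly has nothing to match.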
2. Reference:
cannot debug simple channel flag in ANTLR with Eclipse
This was not much help. The asker there had mistyped
{$channel = HIDDEN;}
as
($channel = HIDDEN;)
My grammar has no such syntax mistake.
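3. According to another reference, it looks as if a poorly written grammar, one that ends up
not context-free,
can cause this kind of problem.
So I went back to review the grammar and see whether I could spot any clues myself.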
4. Changed it to:
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
BLANKS : ' '+ {$channel=HIDDEN;};
startParse : manufacture deviceType deviceRevison ddRevision;
manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
Tried it. Same error: still MissingTokenException.
5. I suspected that
BLANK+
and
BLANKS
were now conflicting with each other, so I changed the current:
BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
BLANKS : ' '+ {$channel=HIDDEN;};
//startParse : include* identification+;
//startParse : include+ identification+;
//startParse : identification+;
startParse : manufacture deviceType deviceRevison ddRevision;
manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceType : 'DEVICE_TYPE'^ BLANK+ (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison : 'DEVICE_REVISION'^ BLANK+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision : 'DD_REVISION'^ BLANK+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
to:
//BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
BLANKS : ' '+ {$channel=HIDDEN;};
//startParse : include* identification+;
//startParse : include+ identification+;
//startParse : identification+;
startParse : manufacture deviceType deviceRevison ddRevision;
manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceType : 'DEVICE_TYPE'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison : 'DEVICE_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision : 'DD_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
Tried it. Same error.
6. Next, I switched to the skip form:
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
BLANKS : (' '|'\t')+ {skip();};
//BLANKS : ' '+ {$channel=HIDDEN;};
//startParse : include* identification+;
//startParse : include+ identification+;
//startParse : identification+;
startParse : manufacture deviceType deviceRevison ddRevision;
manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceType : 'DEVICE_TYPE'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison : 'DEVICE_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision : 'DD_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
Same error again.
7. Removed the extra spaces inside the rules, giving:
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
BLANKS : (' '|'\t')+ {skip();};
//BLANKS : ' '+ {$channel=HIDDEN;};
//startParse : include* identification+;
//startParse : include+ identification+;
//startParse : identification+;
startParse : manufacture deviceType deviceRevison ddRevision;
manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceType : 'DEVICE_TYPE'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison : 'DEVICE_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision : 'DD_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
Tried it. Same error.
So the problem is not caused by stray spaces or tabs in the grammar source.
8. Could it be that, earlier in the grammar,
DIGIT and HEX_DIGIT conflict?
The definitions were:
fragment DIGIT : '0'..'9';
//FAKE_TOKEN : '1' '2' '3';
/*
DECIMAL_VALUE : '1'..'9' DIGIT*;
*/
DECIMAL_VALUE : DIGIT*;
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
So I removed the duplicated digit range, changing them to:
fragment DIGIT : '0'..'9';
//FAKE_TOKEN : '1' '2' '3';
/*
DECIMAL_VALUE : '1'..'9' DIGIT*;
*/
DECIMAL_VALUE : DIGIT*;
//HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;
HEX_VALUE : '0x' HEX_DIGIT+;
Tried it. Same error.
9. After reading:
[antlr-interest] C Runtime problem with $channel=HIDDEN and SKIP()
I wondered whether the Java target's
{$channel=HIDDEN;}
also has a bug, and whether that is what causes the MissingTokenException.
10. Later I found out that MissingTokenException was newly added in version 3.1,
in order to provide more detailed error information.
11. Changed it again, to:
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
BLANKS : ((' '|'\t')+) {$channel=HIDDEN;};
Tried it. Same error.
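12. Later I took a close look at how the MissingTokenException actually arises.
It seemed to be raised only after the value 0x1E6D11 had been looked up several times.
In other words, the MissingTokenException appeared to have little to do with the preceding
BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
and more to do with the part of the rule that follows it.
So I went on to examine that part:
(DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
to see whether something in it was written incorrectly.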
13. First, removed the exclamation mark:
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
Tried it. Same error.
14. I suspected that DECIMAL_VALUE or HEX_VALUE was written incorrectly.
So I changed it to:
//DECIMAL_VALUE : DIGIT*;
DECIMAL_VALUE : DIGIT+;
Tried it. Same error.
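For reference, the numeric tokens after this step can be sketched as below. Marking HEX_DIGIT as a fragment is my own assumption and is not something the grammar in this post does; the idea is that a helper used only inside other lexer rules should not also produce tokens of its own, since otherwise a single digit could match both DECIMAL_VALUE and HEX_DIGIT. Using DIGIT+ rather than DIGIT* keeps DECIMAL_VALUE from matching an empty string.

```antlr
fragment DIGIT     : '0'..'9' ;
fragment HEX_DIGIT : DIGIT | 'a'..'f' | 'A'..'F' ;   // assumption: helper only

DECIMAL_VALUE : DIGIT+ ;            // at least one digit, never empty
HEX_VALUE     : '0x' HEX_DIGIT+ ;   // e.g. 0x1E6D11
```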
15. Swapped the order of HEX_VALUE and DECIMAL_VALUE:
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture : 'MANUFACTURER'^ BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
Tried it. Same error.
16. Changed WS from skip to hidden:
//fragment WS : ( ' ' | '\t' | '\r' | '\n') {skip();};
fragment WS : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
Tried it, and this time the generated code failed to compile:
[14:05:28] D:\DevRoot\IndustrialMobileAutomation\HandheldDataSetter\ANTLR\projects\v1.5\DDParserDemo\output\DDParserDemoLexer.java:593: error: cannot find symbol
[14:05:28] ^
[14:05:28] symbol: variable _channel
[14:05:28] location: class DDParserDemoLexer
[14:05:28] 1 error
17. So I removed the fragment keyword as well:
//fragment WS : ( ' ' | '\t' | '\r' | '\n') {skip();};
//fragment WS : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
WS : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
Tried it. Same error: still MissingTokenException.
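The compile error in step 16 is consistent with how fragment rules work: as far as I know, a fragment rule is only a helper that other lexer rules call and never produces a token by itself, so there is no token whose channel an action inside it could set. A minimal sketch of the usual split, where SPACE_OR_TAB is a hypothetical helper name:

```antlr
// Helper only: contributes characters to other lexer rules, emits no token.
fragment SPACE_OR_TAB : ' ' | '\t' ;

// Real token rule: this is where a channel (or skip) action belongs.
WS : (SPACE_OR_TAB | '\r' | '\n')+ {$channel=HIDDEN;} ;
```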
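18. After studying it more carefully, I found that the MissingTokenException actually seemed to occur before the number 0x1E6D11 was recognized, so BLANKS needed more work after all.
Changed it to:
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
BLANKS : (' '|'\t')+;
Tried it, and at last the whitespace was recognized correctly.
What is strange, though, is why a run of several spaces cannot be given skip() or hidden here.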
19. So I changed BLANKS back to BLANK and added hidden:
BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {skip();};
//BLANKS : ' '+ {$channel=HIDDEN;};
//startParse : include* identification+;
//startParse : include+ identification+;
//startParse : identification+;
startParse : manufacture deviceType deviceRevison ddRevision;
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture : 'MANUFACTURER'^ BLANK+ (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType : 'DEVICE_TYPE'^ BLANK+ (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison : 'DEVICE_REVISION'^ BLANK+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision : 'DD_REVISION'^ BLANK+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
Tried it, and the result was the original org.antlr.runtime.EarlyExitException again.
So it cannot be used this way.
20. Then I tried skip:
//BLANK : (' '|'\t') {$channel=HIDDEN;};
BLANK : (' '|'\t') {skip();};
which reported an error:
[14:27:15] error(208): DDParserDemo.g:119:1: The following token definitions can never be matched because prior tokens match the same input: BLANK
Looking into it, the cause is most likely that the existing WS rule already matches the same input, so I changed the grammar to:
/*
BLANKSPACE_TAB
// : (' ' | '\t'){skip();};
: (' ' | '\t') {$channel=HIDDEN;};
*/
//fragment BLANK : (' '|'\t')+ {skip();};
//BLANK : (' '|'\t') {skip();};
//BLANK : (' '|'\t');
//BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+;
//BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANK : (' '|'\t') {skip();};
//BLANKS : (' '|'\t')+ {skip();};
//BLANKS : ' '+ {$channel=HIDDEN;};
//startParse : include* identification+;
//startParse : include+ identification+;
//startParse : identification+;
startParse : manufacture deviceType deviceRevison ddRevision;
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture : 'MANUFACTURER'^ WS+ (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType : 'DEVICE_TYPE'^ WS+ (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison : 'DEVICE_REVISION'^ WS+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision : 'DD_REVISION'^ WS+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
Tried it, and the result was still the EarlyExitException.
So it seems that skip or hidden really cannot be used here.
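The error(208) from this step can be shown in miniature. In the sketch below, which reuses the WS and BLANK definitions from the steps above, WS is listed first and already matches a single space or tab, so BLANK has no input left that it alone could match. That this ordering is what triggers the report is my reading of the message, not something confirmed elsewhere in this post.

```antlr
WS    : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;

// Reported as unreachable: every input BLANK could match is already
// claimed by WS, which appears earlier in the grammar.
BLANK : (' ' | '\t') {skip();} ;
```

Keeping a single whitespace rule, rather than two that overlap, avoids the report.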
21. In the end, it worked with this grammar:
grammar DDParserDemo;

options {
    output = AST;
    ASTLabelType = CommonTree; // type of $stat.tree ref etc...
}

//NEWLINE : '\r'? '\n' ;
//NEWLINE : '\r' '\n' ;
fragment NEWLINE : '\r'? '\n' ;

fragment ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;

fragment FLOAT
    : ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    | '.' ('0'..'9')+ EXPONENT?
    | ('0'..'9')+ EXPONENT
    ;

COMMENT
    : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    | '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

//fragment WS : ( ' ' | '\t' | '\r' | '\n') {skip();};
//fragment WS : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
WS : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

STRING : '"' ( ESC_SEQ | ~('\\'|'"') )* '"' ;

CHAR: '\'' ( ESC_SEQ | ~('\''|'\\') ) '\'' ;

fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

ESC_SEQ
    : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    | UNICODE_ESC
    | OCTAL_ESC
    ;

fragment OCTAL_ESC
    : '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    | '\\' ('0'..'7') ('0'..'7')
    | '\\' ('0'..'7')
    ;

fragment UNICODE_ESC : '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT ;

fragment DIGIT : '0'..'9';
//FAKE_TOKEN : '1' '2' '3';
/*
DECIMAL_VALUE : '1'..'9' DIGIT*;
*/
//DECIMAL_VALUE : DIGIT*;
DECIMAL_VALUE : DIGIT+;
//HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;
HEX_VALUE : '0x' HEX_DIGIT+;

fragment HEADER_FILENAME : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;

/*
//singleInclude : '#include' ' '+ '"' ID '.h"' ;
//singleInclude : '#include' ' '+ '"' ID+ '.h"' ;
//singleInclude : '#include' ' '+ '"' HEADER_FILENAME '.h"';
//singleInclude : '#include' ' ' '"' HEADER_FILENAME '.h"';
//singleInclude : '#include "' HEADER_FILENAME '.h"';
//fragment singleInclude : '#include' (' ')+ '"' ID '.h"';
//singleInclude : '#include' (' '|'\t')+ '""' ID '.h"';
//singleInclude : '#include' (' '|'\t')+ '"std_defs.h"';
singleInclude : '#include' (' '|'\t')+ ID '.h';
include : singleInclude WS* -> singleInclude;
*/

/*
BLANKSPACE_TAB
// : (' ' | '\t'){skip();};
: (' ' | '\t') {$channel=HIDDEN;};
*/
//fragment BLANK : (' '|'\t')+ {skip();};
//BLANK : (' '|'\t') {skip();};
//BLANK : (' '|'\t');
//BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+;
//BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANK : (' '|'\t') {skip();};
BLANKS : (' '|'\t')+;
//BLANKS : (' '|'\t')+ {skip();};
//BLANKS : ' '+ {$channel=HIDDEN;};

//startParse : include* identification+;
//startParse : include+ identification+;
//startParse : identification+;
startParse : manufacture deviceType deviceRevison ddRevision;

//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture : 'MANUFACTURER'^ BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType : 'DEVICE_TYPE'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison : 'DEVICE_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision : 'DD_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;

//identification : definiton WS* (','?)! WS* -> definiton;
//definiton : (ID)^ ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE)
//definiton : (ID)^ BLANKSPACE_TAB+ (DECIMAL_VALUE | HEX_VALUE)
//definiton : ID ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE);
I used it to match:
MANUFACTURER 0x1E6D11, DEVICE_TYPE 0x00FF, DEVICE_REVISION 5, DD_REVISION 1
and got the resulting tree structure.
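【Summary】
1. For the spaces or tabs that a parser rule matches explicitly (BLANKS here), skip() or $channel=HIDDEN could not be used; with either one, parsing failed.
2. Once WS is defined, a separate single-character BLANK rule for space or tab cannot be defined as well. It duplicates WS and is reported as:
The following token definitions can never be matched because prior tokens match the same input: BLANK
3. In the end, the only thing that worked was to define BLANKS on its own:
BLANKS : (' '|'\t')+;
and then reference it in the rules that follow:
manufacture : 'MANUFACTURER'^ BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
With this:
- the input, including the spaces, is recognized correctly;
- but the recognized spaces can no longer be hidden or skipped. For now there seems to be no way to get that effect.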
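One arrangement that the steps above do not try is to hide the whitespace and, at the same time, stop mentioning it in the parser rules. The sketch below is untested against this project and makes several assumptions: the grammar name HiddenWsSketch is made up, the rules are cut down to the four fields from the sample input, HEX_DIGIT is turned into a fragment, and a single WS rule replaces both WS and BLANKS. The reasoning is that a hidden or skipped token never reaches the parser, so a rule that still demands BLANKS (or BLANK+) would be expected to fail with a missing-token or early-exit error, whereas a rule that does not mention whitespace has nothing to miss.

```antlr
grammar HiddenWsSketch;   // hypothetical name, not the author's grammar

options {
    output = AST;
    ASTLabelType = CommonTree;
}

startParse    : manufacture deviceType deviceRevison ddRevision ;

// No BLANKS / WS references here: whitespace between tokens is hidden
// by the lexer, so the parser only sees keywords, values and commas.
manufacture   : 'MANUFACTURER'^    (HEX_VALUE | DECIMAL_VALUE) (','?)! ;
deviceType    : 'DEVICE_TYPE'^     (HEX_VALUE | DECIMAL_VALUE) (','?)! ;
deviceRevison : 'DEVICE_REVISION'^ (HEX_VALUE | DECIMAL_VALUE) (','?)! ;
ddRevision    : 'DD_REVISION'^     (HEX_VALUE | DECIMAL_VALUE) (','?)! ;

fragment DIGIT     : '0'..'9' ;
fragment HEX_DIGIT : DIGIT | 'a'..'f' | 'A'..'F' ;   // assumption: helper only

DECIMAL_VALUE : DIGIT+ ;
HEX_VALUE     : '0x' HEX_DIGIT+ ;

// The only whitespace rule; its tokens go to the hidden channel.
WS : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;} ;
```

If this behaves as expected, the sample input from step 21 should parse whether its fields are separated by spaces, tabs or newlines.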
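【Postscript】
1. Later, I came across this:
what is wrong with this grammar
What that person says seems reasonable, and I suspect the same:
this MissingTokenException may be a bug in ANTLR (or ANTLRWorks).
After all, the grammar appears to have nothing wrong with it and the generated code runs fine, so this error should not be reported.
Of course, this still needs someone who understands it better to confirm whether it really is a bug.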
Please credit the source when reposting: 在路上 » 【Basically solved】ANTLR v3: a grammar using {$channel=HIDDEN;} fails to parse with MissingTokenException