[Solved] ANTLR fails to parse double quotes: MismatchedTokenException(0!=0)

[Problem]

The grammar is written for ANTLR v3 and debugged in ANTLRWorks.

The core part of the grammar is:

fragment
ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;
      
//singleInclude :   '#include' BLANKS  '"' ID '"' '.h';
singleInclude   :   '#include'   '"' ID '"' '.h';
 
//include       :   singleInclude WS*   -> singleInclude;
include     :   singleInclude WS*;
 
 
//startParse    :   include* identification+;
//startParse    :   include+ identification+;
//startParse    :   identification+;
//startParse    :   manufacture deviceType deviceRevison ddRevision;

The input being parsed is:

/*
**********************************************************************
** Includes
**********************************************************************
*/
 
#include "std_defs.h"
#include "com_tbls.h"
#include "rev_defs.h"
#include "fbk_hm.h"
#include "fdiag_FBK2_Start.h"
#include "blk_err.h"
 
/*
**********************************************************************
********** DEVICE SECTION ********************************************
**********************************************************************
*/
 
MANUFACTURER      0x1E6D11,
DEVICE_TYPE       0x00FF,
DEVICE_REVISION   5,
DD_REVISION       1

Debugging it, however, failed with an error.

[Solution Process]

1. Evidently the double quote could not be recognized, which produced MismatchedTokenException(0!=0).

2. I referred to:

Building a Custom Parser (构建自定义的语法分析器)

It explains things clearly, but unfortunately did not help with this problem.

3. I referred to:

[antlr-interest] MismatchedTokenException

I didn't really understand it...

It did not help solve the problem.

4. I referred to:

Antlr.Runtime.MismatchedTokenException from Envers with generic entities

Not useful.

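5. Later I searched for:

antlr MismatchedTokenException(0!=0) double quote

and found:

ANTLR grammar how to capture all characters to end of line

What it describes is somewhat similar to my situation:

Could it be that definitions such as COMMENT were conflicting with matching the double quote here?

So I tried taking the original grammar: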

grammar DDParserDemo;
 
options {
    output = AST;
    ASTLabelType = CommonTree; // type of $stat.tree ref etc...
}
 
//NEWLINE :   '\r'? '\n' ;
//NEWLINE :   '\r' '\n' ;
fragment 
NEWLINE :   '\r'? '\n' ;
 
 
fragment
ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;
      
fragment
FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;
 
COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;
 
//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {skip();};
//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
 
STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;
 
CHAR:  '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
    ;
 
fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
 
 
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;
 
fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;
 
fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;
 
fragment
DIGIT
    :   '0'..'9';
 
//FAKE_TOKEN    :   '1' '2' '3';
 
/*
DECIMAL_VALUE
    :   '1'..'9' DIGIT*;
*/
 
//DECIMAL_VALUE :   DIGIT*;
DECIMAL_VALUE   :   DIGIT+;
 
//HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;
 
 
HEX_VALUE
    :   '0x' HEX_DIGIT+;
 
fragment
HEADER_FILENAME
    :   ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;
 
 
/*
BLANKSPACE_TAB
//  :   (' ' | '\t'){skip();};
    :   (' ' | '\t')
    {$channel=HIDDEN;};
*/
//fragment BLANK    :   (' '|'\t')+ {skip();};
//BLANK :   (' '|'\t') {skip();};
//BLANK :   (' '|'\t');
//BLANK :   (' '|'\t') {$channel=HIDDEN;};
//BLANKS    :   (' '|'\t')+ {$channel=HIDDEN;};
//BLANKS    :   (' '|'\t')+ {$channel=HIDDEN;};
//BLANKS    :   (' '|'\t')+;
//BLANK :   (' '|'\t') {$channel=HIDDEN;};
//BLANK :   (' '|'\t') {skip();};
BLANKS  :   (' '|'\t')+;
//BLANKS    :   (' '|'\t')+ {skip();};
//BLANKS    :   ' '+ {$channel=HIDDEN;};
 
//singleInclude :   '#include' ' '+ '"' ID '.h"' ;
//singleInclude :   '#include' ' '+ '"' ID+ '.h"' ;
//singleInclude :   '#include' ' '+ '"' HEADER_FILENAME '.h"';
//singleInclude :   '#include' ' ' '"' HEADER_FILENAME '.h"';
//singleInclude :   '#include "' HEADER_FILENAME '.h"';
//fragment singleInclude    :   '#include' (' ')+ '"' ID '.h"';
//singleInclude :   '#include' (' '|'\t')+ '""' ID '.h"';
//singleInclude :   '#include' (' '|'\t')+ '"std_defs.h"';
//singleInclude :   '#include' BLANKS  '"' ID '"' '.h';
singleInclude   :   '#include'   '"' ID '"' '.h';
 
//include       :   singleInclude WS*   -> singleInclude;
include     :   singleInclude WS*;
 
 
//startParse    :   include* identification+;
//startParse    :   include+ identification+;
//startParse    :   identification+;
//startParse    :   manufacture deviceType deviceRevison ddRevision;
startParse  :   include+ manufacture deviceType deviceRevison ddRevision;
//manufacture   :   'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture   :   'MANUFACTURER'^     (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture   :   'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture :   'MANUFACTURER'^     BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType  :   'DEVICE_TYPE'^      BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison   :   'DEVICE_REVISION'^  BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision  :   'DD_REVISION'^      BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
     
//identification    :   definiton WS* (','?)! WS*   -> definiton;
     
//definiton :   (ID)^ ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE)
//definiton :   (ID)^ BLANKSPACE_TAB+ (DECIMAL_VALUE | HEX_VALUE)
//definiton :   ID ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE);

and commenting out its STRING rule:

/*
STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;
*/

Debugging again, sure enough, the first double quote was now recognized, but then a different MismatchedTokenException(0!=0) error appeared.

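Even so, this was a big step toward finally solving the problem.

It showed why the first double quote had not matched: I had needlessly kept a STRING rule that no parser rule used. Used or not, the lexer still applied it, so STRING consumed the opening double quote (together with the rest of the quoted file name), and the double quote the parser rule needed could no longer be matched.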
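To make the STRING effect concrete, here is a minimal Python sketch. It is not ANTLR: it is a toy lexer model that assumes the longest match wins at each position and that the earlier-listed rule wins a tie, with the implicit literal tokens from the parser rules ('#include', '"', '.h') listed first. The token names INCLUDE, QUOTE and DOT_H are hypothetical stand-ins for ANTLR's generated literal tokens, and the STRING pattern simplifies ESC_SEQ to "backslash plus any character".

```python
import re


def tokenize(text, rules):
    """Toy lexer (assumed model, not ANTLR): at each position the longest
    match wins and, on a tie, the rule listed first wins. Returns the
    (type, text) pairs on the default channel."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (end, name, hidden)
        for name, pattern, hidden in rules:
            m = re.compile(pattern).match(text, pos)
            # strict '>' keeps the earlier rule when lengths are equal
            if m and m.end() > pos and (best is None or m.end() > best[0]):
                best = (m.end(), name, hidden)
        if best is None:
            raise ValueError(f"no rule matches {text[pos]!r} at {pos}")
        end, name, hidden = best
        if not hidden:  # WS is sent to the HIDDEN channel in the grammar
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens


# Implicit literal tokens from the parser rules come first
# (INCLUDE, QUOTE, DOT_H are hypothetical names for '#include', '"', '.h').
LITERALS = [
    ("INCLUDE", r"#include", False),
    ("QUOTE", r'"', False),
    ("DOT_H", r"\.h", False),
]
ID = ("ID", r"[A-Za-z_][A-Za-z0-9_]*", False)
WS = ("WS", r"[ \t\r\n]", True)
# STRING from the ANTLRWorks template, with ESC_SEQ simplified to
# "backslash followed by any character".
STRING = ("STRING", r'"(?:\\.|[^\\"])*"', False)

line = '#include "std_defs.h"'
with_string = tokenize(line, LITERALS + [WS, STRING, ID])
without_string = tokenize(line, LITERALS + [WS, ID])
print([t for t, _ in with_string])     # types the parser would receive
print([t for t, _ in without_string])
```

In this model STRING matches all twelve characters of "std_defs.h", so it beats the one-character '"' literal and the parser receives a single STRING token where singleInclude expects '"'. Without STRING, the quote, the file name, '.h' and the closing quote arrive as separate tokens.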

6. The error now occurred at the ID position, which seemed to be caused by a redundant rule I had defined myself earlier:

fragment
HEADER_FILENAME
    :   ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;

So I commented it out:

/*
fragment
HEADER_FILENAME
    :   ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;
*/

and tried again, but the error remained.

7. Along the way I also ran into a similar duplicate-definition problem; for details see:

[Not fully solved] ANTLR debugging error: The following token definitions can never be matched because prior tokens match the same input
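Warnings like the one in step 7 come from lexer rules whose inputs overlap. Under an assumed model of the lexer (longest match wins, and the earlier rule wins a tie), this Python sketch takes the number-related rules in the order they appear in the final grammar in point 6 of the summary, where DIGIT is not marked fragment, and reports which rule claims a few sample inputs. The function name winning_rule is hypothetical; this is a model, not ANTLR's implementation.

```python
import re

# Number-related lexer rules, in the order of the final DDParserDemo grammar.
# DIGIT is not marked fragment there, so it competes as a real token.
RULES = [
    ("DIGIT", r"[0-9]"),
    ("DECIMAL_VALUE", r"[0-9]+"),
    ("HEX_DIGIT", r"[0-9a-fA-F]"),
    ("HEX_VALUE", r"0x[0-9a-fA-F]+"),
]


def winning_rule(text, rules):
    """Rule that claims the start of `text` in the assumed model:
    the longest match wins, and the earlier rule wins a tie."""
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name


for sample in ["5", "12", "a", "0x00FF"]:
    print(sample, "->", winning_rule(sample, RULES))

# Treat DIGIT as a fragment: it no longer produces tokens on its own.
no_digit = [r for r in RULES if r[0] != "DIGIT"]
print("5 ->", winning_rule("5", no_digit))
```

In this model a lone digit such as the 5 in DEVICE_REVISION 5 comes out as DIGIT rather than DECIMAL_VALUE, and a lone a..f letter as HEX_DIGIT; leaving DIGIT out of the top-level rules, which is what marking it fragment does, hands single digits back to DECIMAL_VALUE. This is a prediction of the toy model, not verified in ANTLRWorks.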

[Summary]

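1. Do not casually reuse the rules that ANTLRWorks puts into a newly created .g file, such as ID and STRING.

Otherwise they may later conflict with the input you actually need to handle.

That is exactly what happened here: the template-generated STRING rule conflicted with recognizing the double quote later on, which caused

MismatchedTokenException(0!=0)

and stopped the parse.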

2. The original ID definition itself is fine and can be used as is:

ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

3. However, ID must not be marked as fragment, that is, you cannot use:

fragment
ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

Otherwise you get the MismatchedTokenException(0!=0) error.
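As I understand ANTLR v3's fragment semantics, a fragment rule can only be called from other lexer rules and never yields a token of its own, so a parser rule that references a fragment ID waits for a token type the lexer never produces. The same semantics suggest why deleting the unused HEADER_FILENAME fragment in step 6 could not change anything. The Python sketch below models this by leaving fragment rules out of the top-level match; characters no rule matches are recorded and skipped one at a time, loosely like ANTLR v3's default lexer error recovery. INCLUDE, QUOTE and DOT_H are hypothetical names for the implicit literal tokens, and make_rules is a hypothetical helper.

```python
import re


def tokenize(text, rules):
    """Toy lexer: rules are (name, pattern, is_fragment); fragment rules
    never produce tokens. Longest match wins, earlier rule wins a tie."""
    active = [(n, p) for n, p, frag in rules if not frag]
    tokens, errors, pos = [], [], 0
    while pos < len(text):
        best = None  # (end, name)
        for name, pattern in active:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() > pos and (best is None or m.end() > best[0]):
                best = (m.end(), name)
        if best is None:
            errors.append(text[pos])  # report and skip one character
            pos += 1
            continue
        end, name = best
        if name != "WS":  # WS goes to the hidden channel
            tokens.append(name)
        pos = end
    return tokens, errors


def make_rules(id_is_fragment):
    return [
        ("INCLUDE", r"#include", False),
        ("QUOTE", r'"', False),
        ("DOT_H", r"\.h", False),
        ("WS", r"[ \t\r\n]", False),
        ("ID", r"[A-Za-z_][A-Za-z0-9_]*", id_is_fragment),
    ]


line = '#include "std_defs.h"'
normal = tokenize(line, make_rules(id_is_fragment=False))
frag = tokenize(line, make_rules(id_is_fragment=True))
print(normal)
print(frag)
```

With ID as a fragment the parser sees '#include' '"' '.h' '"' with nothing in between, so the rule fails exactly at the ID position, which is where step 6 saw the error.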

4. A double quote is indeed written the ordinary way, as a quoted literal:

'"'

That is all it takes.

5. A MissingTokenException still occurs here; for now it looks like it may be a bug.

For details see:

[Mostly solved] ANTLR v3: parsing fails with MissingTokenException when using {$channel=HIDDEN;}
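Independently of whether this is an ANTLRWorks bug, one token-level interaction may be worth ruling out (an assumption, not verified in ANTLRWorks). In the final grammar, WS matches a single space and is listed before BLANKS. Under an assumed longest-match, first-rule-wins model, the single space after #include is claimed by WS and sent to the hidden channel, so the parser never receives the BLANKS token that singleInclude requires, while the long run of spaces after MANUFACTURER does become BLANKS. The sketch models only the rules these two input lines need; INCLUDE, KW_MANUFACTURER, QUOTE, DOT_H and COMMA are hypothetical names for the implicit literal tokens.

```python
import re

# (name, pattern, hidden), in the final grammar's order: implicit literals
# first, then WS (hidden), HEX_VALUE, BLANKS, ID.
RULES = [
    ("INCLUDE", r"#include", False),
    ("KW_MANUFACTURER", r"MANUFACTURER", False),
    ("QUOTE", r'"', False),
    ("DOT_H", r"\.h", False),
    ("COMMA", r",", False),
    ("WS", r"[ \t\r\n]", True),
    ("HEX_VALUE", r"0x[0-9a-fA-F]+", False),
    ("BLANKS", r"[ \t]+", False),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*", False),
]


def default_channel(text):
    """Token types the parser would see under the assumed model:
    longest match wins, the earlier rule wins a tie, hidden tokens dropped."""
    out, pos = [], 0
    while pos < len(text):
        best = None  # (end, name, hidden)
        for name, pattern, hidden in RULES:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() > pos and (best is None or m.end() > best[0]):
                best = (m.end(), name, hidden)
        if best is None:
            raise ValueError(f"no rule matches {text[pos]!r} at {pos}")
        end, name, hidden = best
        if not hidden:
            out.append(name)
        pos = end
    return out


include_types = default_channel('#include "std_defs.h"')
manufacturer_types = default_channel("MANUFACTURER      0x1E6D11,")
print(include_types)       # no BLANKS: the single space went to WS
print(manufacturer_types)  # six spaces: BLANKS is longer than WS
```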

6. The grammar I currently use is:

grammar DDParserDemo;
 
options {
    output = AST;
    ASTLabelType = CommonTree; // type of $stat.tree ref etc...
}
 
//NEWLINE :   '\r'? '\n' ;
//NEWLINE :   '\r' '\n' ;
fragment 
NEWLINE :   '\r'? '\n' ;
 
    
fragment
FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;
 
COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;
 
//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {skip();};
//fragment WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
 
/*
STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;
*/
 
CHAR:  '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
    ;
 
fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
 
 
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;
 
fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;
 
fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;
 
//fragment
DIGIT
    :   '0'..'9';
 
//FAKE_TOKEN    :   '1' '2' '3';
 
/*
DECIMAL_VALUE
    :   '1'..'9' DIGIT*;
*/
 
//DECIMAL_VALUE :   DIGIT*;
DECIMAL_VALUE   :   DIGIT+;
 
//HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;
 
 
HEX_VALUE
    :   '0x' HEX_DIGIT+;
 
/*
fragment
HEADER_FILENAME
    :   ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;
*/
 
/*
BLANKSPACE_TAB
//  :   (' ' | '\t'){skip();};
    :   (' ' | '\t')
    {$channel=HIDDEN;};
*/
//fragment BLANK    :   (' '|'\t')+ {skip();};
//BLANK :   (' '|'\t') {skip();};
//BLANK :   (' '|'\t');
//BLANK :   (' '|'\t') {$channel=HIDDEN;};
//BLANKS    :   (' '|'\t')+ {$channel=HIDDEN;};
//BLANKS    :   (' '|'\t')+ {$channel=HIDDEN;};
//BLANKS    :   (' '|'\t')+;
//BLANK :   (' '|'\t') {$channel=HIDDEN;};
//BLANK :   (' '|'\t') {skip();};
BLANKS  :   (' '|'\t')+;
//BLANKS    :   (' '|'\t')+ {skip();};
//BLANKS    :   ' '+ {$channel=HIDDEN;};
 
//singleInclude :   '#include' ' '+ '"' ID '.h"' ;
//singleInclude :   '#include' ' '+ '"' ID+ '.h"' ;
//singleInclude :   '#include' ' '+ '"' HEADER_FILENAME '.h"';
//singleInclude :   '#include' ' ' '"' HEADER_FILENAME '.h"';
//singleInclude :   '#include "' HEADER_FILENAME '.h"';
//fragment singleInclude    :   '#include' (' ')+ '"' ID '.h"';
//singleInclude :   '#include' (' '|'\t')+ '""' ID '.h"';
//singleInclude :   '#include' (' '|'\t')+ '"std_defs.h"';
//singleInclude :   '#include' BLANKS  '"' ID '"' '.h';
//singleInclude :   '#include' '"' ID '"' '.h';
//singleInclude :   '#include' BLANKS '"' ID '"' '.h';
//singleInclude :   '#include' BLANKS '"' ID '.h' '"';
//singleInclude :   '#include' BLANKS '"' ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* '.h' '"';
//ID_START  :   'a'..'z'|'A'..'Z'|'_';
//fragment ID_START     :   'a'..'z'|'A'..'Z'|'_';
 
//WHOLE_ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
//WHOLE_ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'| DIGIT)*;
//WHOLE_ID  :   ('a'..'z'|'A'..'Z'|'_') (HEX_DIGIT|'_')*;
 
 
//fragment
ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;
 
 
//ID_START  :   'a'..'z'|'A'..'Z'|'_';
//WHOLE_ID  :   (ID_START) (ID_START | DIGIT)*;
 
//ID_MIDDLE_END :   ID_START | DIGIT;
//ID_MIDDLE_END :   HEX_DIGIT | '_';
//singleInclude :   '#include' BLANKS '"' ID_START ID_MIDDLE_END* '.h' '"';
//singleInclude :   '#include' BLANKS '"' ID_START (ID_START | DIGIT)* '.h' '"';
//singleInclude :   '#include' BLANKS '"' ID_START (ID_START | DIGIT)+ '.h' '"';
//singleInclude :   '#include' BLANKS '"' ID_START '.h' '"';
//singleInclude :   '#include' BLANKS '"' WHOLE_ID '.h' '"';
singleInclude   :   '#include' BLANKS '"' ID '.h' '"';
 
 
//include       :   singleInclude WS*   -> singleInclude;
include     :   singleInclude WS*;
 
//startParse    :   include* identification+;
//startParse    :   include+ identification+;
//startParse    :   identification+;
//startParse    :   manufacture deviceType deviceRevison ddRevision;
startParse  :   include+ manufacture deviceType deviceRevison ddRevision;
//manufacture   :   'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture   :   'MANUFACTURER'^     (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture   :   'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture :   'MANUFACTURER'^     BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType  :   'DEVICE_TYPE'^      BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison   :   'DEVICE_REVISION'^  BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision  :   'DD_REVISION'^      BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
     
//identification    :   definiton WS* (','?)! WS*   -> definiton;
     
//definiton :   (ID)^ ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE)
//definiton :   (ID)^ BLANKSPACE_TAB+ (DECIMAL_VALUE | HEX_VALUE)
//definiton :   ID ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE);

to parse:

/*
**********************************************************************
** Includes
**********************************************************************
*/
 
#include "std_defs.h"
#include "com_tbls.h"
#include "rev_defs.h"
#include "fbk_hm.h"
#include "fdiag_FBK2_Start.h"
#include "blk_err.h"
 
/*
**********************************************************************
********** DEVICE SECTION ********************************************
**********************************************************************
*/
 
MANUFACTURER      0x1E6D11,
DEVICE_TYPE       0x00FF,
DEVICE_REVISION   5,
DD_REVISION       1
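Apart from adding BLANKS, the final singleInclude also changes the token order. The first attempt expected '"' ID '"' '.h', closing the quote before .h, whereas in the input .h sits inside the quotes; the final rule expects '"' ID '.h' '"'. This small Python sketch matches a sequence of token types left to right, the way a parser rule made only of token references would, and raises on the first mismatch. The token names and the MismatchedToken class are hypothetical, and whitespace is left out of the picture.

```python
# Assumed token types for:  #include "std_defs.h"
# (STRING removed, ID not a fragment, whitespace ignored).
LEXED = ["INCLUDE", "QUOTE", "ID", "DOT_H", "QUOTE"]

# First attempt:  '#include' '"' ID '"' '.h'
FIRST_RULE = ["INCLUDE", "QUOTE", "ID", "QUOTE", "DOT_H"]
# Final version without BLANKS:  '#include' '"' ID '.h' '"'
FINAL_RULE = ["INCLUDE", "QUOTE", "ID", "DOT_H", "QUOTE"]


class MismatchedToken(Exception):
    """Hypothetical stand-in for a parser's mismatched-token error."""


def match_sequence(tokens, expected):
    """Match token types left to right; raise on the first wrong type."""
    for i, want in enumerate(expected):
        got = tokens[i] if i < len(tokens) else "EOF"
        if got != want:
            raise MismatchedToken(f"at token {i}: expected {want}, found {got}")
    return True


print(match_sequence(LEXED, FINAL_RULE))
try:
    match_sequence(LEXED, FIRST_RULE)
except MismatchedToken as err:
    print(err)
```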


When reposting, please credit: 在路上 (On the Road) » [Solved] ANTLR fails to parse double quotes: MismatchedTokenException(0!=0)
