
[Solved] Matching the asterisk '*' character itself with Java's java.util.regex

Java regex crifan

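[Problem]

I wanted to match the asterisk character itself using java.util.regex in Java, so I tried

\*

but got these errors:

illegal escape character

Dangling meta character '*'

(javac reports the first for a string literal containing \*; the regex engine reports the second when it is handed an unescaped *.)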
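The second error can be reproduced with a short sketch. The class and method names below are made up for illustration and are not from the original post; the first error cannot be shown in running code because it stops compilation, so it only appears in a comment.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class DanglingStarDemo {
    // Tries to compile a regex and returns the error description,
    // or null if the pattern is valid.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // A string literal with a single backslash before the asterisk
        // does not even compile:
        //     String bad = "\*";   // javac: illegal escape character
        // A bare asterisk is fine as Java, but the regex engine rejects it
        // because there is nothing in front of it to repeat.
        System.out.println(compileError("*"));
        // The doubled backslash yields a valid pattern, so this prints null.
        System.out.println(compileError("\\*"));
    }
}
```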

 

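[Solution process]

1. I went to the official documentation for Java regular expressions:

java.util.regex.Pattern

but could not find anything there that says how to express the asterisk character itself.

It seems that escaping is still done with a backslash.

Yet that is exactly what failed here.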

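2. Then I consulted:

Java escape characters and their use in regular expressions

It suggests that \u002A stands for the asterisk '*', but that felt odd. Surely a Unicode escape should not be the only way to write it; that would be cumbersome and unintuitive.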
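For completeness, the Unicode route does work if the escape is passed through to the regex engine. The sketch below uses illustrative names that are not from the original post. The backslash has to be doubled here too; with a single backslash, javac would translate the escape into a plain * before the regex engine sees it.

```java
import java.util.regex.Pattern;

class UnicodeEscapeDemo {
    // The doubled backslash keeps javac from translating the escape, so the
    // regex engine receives the six characters backslash, u, 0, 0, 2, A
    // and treats them as the literal character U+002A (the asterisk).
    static final Pattern STAR_BY_CODE_POINT = Pattern.compile("\\u002A");

    static boolean isSingleAsterisk(String s) {
        return STAR_BY_CODE_POINT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSingleAsterisk("*"));  // true
        System.out.println(isSingleAsterisk("a"));  // false
    }
}
```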

3. Then I consulted:

Match asterisk with regexp

and learned that the asterisk character itself is matched with:

\\*

written that way inside a Java string literal.
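A minimal sketch of the doubled backslash in use. The class and method names are made up for illustration, and Pattern.quote is shown only as an alternative that the original post does not mention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AsteriskDemo {
    // Counts how many literal asterisks occur in the input.
    static int countAsterisks(String input) {
        // The literal "\\*" is a two-character string: backslash, asterisk.
        // The regex engine reads that as "one literal asterisk".
        Matcher m = Pattern.compile("\\*").matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Alternative: Pattern.quote escapes a whole string for literal matching.
    static String[] splitOnAsterisk(String input) {
        return input.split(Pattern.quote("*"));
    }

    public static void main(String[] args) {
        System.out.println(countAsterisks("a*b**c"));                    // 3
        System.out.println(String.join(",", splitOnAsterisk("a*b*c")));  // a,b,c
    }
}
```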

The explanation is in the official documentation mentioned above, in the section:

Backslashes, escapes, and quoting

The backslash character ('\') serves to introduce escaped constructs, as defined in the table above, as well as to quote characters that otherwise would be interpreted as unescaped constructs. Thus the expression \\ matches a single backslash and \{ matches a left brace.

It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.

Backslashes within string literals in Java source code are interpreted as required by the Java Language Specification as either Unicode escapes or other character escapes. It is therefore necessary to double backslashes in string literals that represent regular expressions to protect them from interpretation by the Java bytecode compiler. The string literal "\b", for example, matches a single backspace character when interpreted as a regular expression, while "\\b" matches a word boundary. The string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.

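To be honest, I only half understood this passage.

As far as I can tell, it means the following:

In Java source code, a backslash inside a string literal introduces either a Unicode escape or one of the other character escapes.

So a backslash that is meant for the regex engine has to be written as two backslashes in the string literal; otherwise the Java compiler consumes it as one of those escapes before the regex engine ever sees it.

The example given is:

"\b" is a single (control) character, the backspace;

whereas

"\\b" is what reaches the regex engine as \b, which means a word boundary.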
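The difference between the two literals can be checked directly. This is a small sketch with illustrative names that are not from the original post; containsWholeWord is a hypothetical helper added only to show the word boundary doing something.

```java
import java.util.regex.Pattern;

class BackspaceVsBoundary {
    // "\b" is a one-character string: the backspace control character (code 8).
    static final String BACKSPACE = "\b";
    // "\\b" is a two-character string: a backslash followed by the letter b.
    static final String WORD_BOUNDARY = "\\b";

    // Hypothetical helper: true if 'word' occurs in 'text' as a whole word.
    static boolean containsWholeWord(String text, String word) {
        // Pattern.quote guards against metacharacters inside the word itself.
        return Pattern.compile(WORD_BOUNDARY + Pattern.quote(word) + WORD_BOUNDARY)
                      .matcher(text)
                      .find();
    }

    public static void main(String[] args) {
        System.out.println(BACKSPACE.length());                       // 1
        System.out.println(WORD_BOUNDARY.length());                   // 2
        // Used as a regex, the one-character string matches a backspace.
        System.out.println(Pattern.matches(BACKSPACE, "\b"));         // true
        System.out.println(containsWholeWord("a cat sat", "cat"));    // true
        System.out.println(containsWholeWord("concatenate", "cat"));  // false
    }
}
```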

 

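[Summary]

Regex syntax in Java feels a little perverse.

 

For now, it can be summed up simply:

Unless you really mean a \uXXXX Unicode escape or a genuine character escape \x (where x stands for the escaped character) in the string itself,

any \x that carries a regex meaning in other languages has to be written as \\x in a Java string literal.

Common examples: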

\d -> \\d

\w -> \\w

and so on.
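The two shorthand classes above can be sketched in use as follows. The class name, method names and sample strings are invented for illustration.

```java
import java.util.regex.Pattern;

class ShorthandClassesDemo {
    // Replaces every digit with '#'. The literal "\\d" reaches the regex
    // engine as \d, the digit class.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    // True if the whole string consists of word characters (letters,
    // digits, underscore). "\\w+" reaches the regex engine as \w+.
    static boolean isIdentifierLike(String s) {
        return Pattern.matches("\\w+", s);
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("tel 555-0199"));   // tel ###-####
        System.out.println(isIdentifierLike("user_42"));  // true
        System.out.println(isIdentifierLike("user 42"));  // false
    }
}
```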
