[Tutorial] Python regular expressions in detail: (?=…) lookahead assertion

The official Python 2.7 documentation explains it as follows:

(?=...)
Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
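The documentation's example can be tried directly. The following is a minimal sketch (written for Python 3; the variable names are only illustrative) showing that the match contains 'Isaac ' alone, and that nothing matches when a different word follows:

```python
import re

# Pattern taken from the documentation excerpt above.
pattern = re.compile(r'Isaac (?=Asimov)')

followed = pattern.search('Isaac Asimov')      # 'Asimov' follows: match
not_followed = pattern.search('Isaac Newton')  # something else follows: no match

print(repr(followed.group()))  # 'Isaac '  ('Asimov' is not part of the match)
print(followed.end())          # 6, the match ends right before 'Asimov'
print(not_followed)            # None
```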

Below is a detailed explanation of what (?=…) actually means:

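1. Before reading the explanation of (?=…), it is recommended to first read:

[Tutorial] Python regular expressions in detail: (?<=…) positive lookbehind assertion

As the counterpart of the positive lookbehind assertion, (?=…) does the following:

It checks what comes right after the current position, that is, it looks ahead in the string without moving past that text.

The equals sign means: check whether the text that follows matches …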
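A small sketch of this "look but do not consume" behaviour, written for Python 3. The sample string and the px/em units are made up for illustration. Only numbers followed by px are matched, and the px itself stays in the string:

```python
import re

text = '10px 20em 30px'  # illustrative input, not from the original article

# \d+ must be followed by 'px', but 'px' is only checked, not consumed.
numbers_before_px = re.findall(r'\d+(?=px)', text)
print(numbers_before_px)  # ['10', '30']

# Because 'px' is not part of the match, a substitution leaves it untouched.
replaced = re.sub(r'\d+(?=px)', 'N', text)
print(replaced)  # Npx 20em Npx
```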

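2. Why is a lookahead assertion needed?

The reason is similar to the one given in:

[Tutorial] Python regular expressions in detail: (?<=…) positive lookbehind assertion

For complex searches and matches, you sometimes also need to check whether the text that follows satisfies a condition before you can decide whether the string as a whole is the one you want.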
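One situation where checking without consuming matters is when neighbouring matches share a delimiter. The sketch below (Python 3, with a made-up comma-separated string) compares a pattern that consumes the commas with one that only asserts them:

```python
import re

text = ',a,b,c,'  # illustrative input, not from the original article

# Consuming both commas: the comma after 'a' is used up,
# so 'b' no longer has a leading comma available and is skipped.
consuming = re.findall(r',(\w+),', text)
print(consuming)  # ['a', 'c']

# Asserting the commas with lookbehind and lookahead: nothing is used up,
# so every item between two commas is found.
asserting = re.findall(r'(?<=,)\w+(?=,)', text)
print(asserting)  # ['a', 'b', 'c']
```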

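For an example, also refer to

[Tutorial] Python regular expressions in detail: (?<=…) positive lookbehind assertion

and modify it slightly: extract the value of src inside the img tag, which is common in HTML, that is, the string between the two double quotes that starts with http and ends with .jpg.

To make the match more precise, (?=…) can be used here as well.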

Example code:

#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
[Tutorial] Python regular expressions in detail: (?=…) lookahead assertion
https://www.crifan.com/detailed_explanation_about_python_regular_express_lookahead_assertion

Version:    2012-11-14
Author:     Crifan
"""

import re

# Note:
# Before reading this example, make sure you already understand the positive lookbehind assertion:
# [Tutorial] Python regular expressions in detail: (?<=…) positive lookbehind assertion
# https://www.crifan.com/detailed_explanation_about_python_regular_express_positive_lookbehind_assertion

reLookaheadTestStr = """
fake html begin

out of src, normal content include pic url:
some sohu blog pic url is something like this:
"http://1802.img.pp.sohu.com.cn/images/blog/2012/4/12/16/20/u173669005_13766a912eag214.jpg"
which use img.pp.sohu.com.cn as its image server.

in end, include quote mark:
<img style="text-align:center;margin:0px auto 10px;zoom:1;display:block" border="0" src="http://1821.img.pp.sohu.com.cn/images/blog/2012/4/12/16/19/u173669005_13766a7cbebg214.jpg">

in end, NOT include quote mark:
<img style="text-align:center;margin:0px auto 10px;zoom:1;display:block" border="0" src="http://1811.img.pp.sohu.com.cn/images/blog/2012/4/12/16/15/u173669005_13766a48e19g213.jpg>

fake html end
"""

# 1. (?=…) - lookahead assertion

# The regular expression below does not check whether .jpg is followed by a double quote, so it matches both src URLs above
foundAllSrcJpgUrl_lookbehind = re.findall(r'(?<=src=")(http://[\w\./]+\.jpg)', reLookaheadTestStr)
print("foundAllSrcJpgUrl_lookbehind= %s" % foundAllSrcJpgUrl_lookbehind) #foundAllSrcJpgUrl_lookbehind= ['http://1821.img.pp.sohu.com.cn/images/blog/2012/4/12/16/19/u173669005_13766a7cbebg214.jpg', 'http://1811.img.pp.sohu.com.cn/images/blog/2012/4/12/16/15/u173669005_13766a48e19g213.jpg']

# After adding a lookahead assertion that also requires a double quote right after .jpg, only the image URL followed by a double quote is matched
foundAllSrcJpgUrl_lookbehind_alsoLookahead = re.findall(r'(?<=src=")(http://[\w\./]+\.jpg)(?=")', reLookaheadTestStr)
print("foundAllSrcJpgUrl_lookbehind_alsoLookahead= %s" % foundAllSrcJpgUrl_lookbehind_alsoLookahead) #foundAllSrcJpgUrl_lookbehind_alsoLookahead= ['http://1821.img.pp.sohu.com.cn/images/blog/2012/4/12/16/19/u173669005_13766a7cbebg214.jpg']

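[Summary]

A lookahead assertion checks whether the text that follows also satisfies a given condition.

It is the counterpart of the positive lookbehind assertion.

Note that the two are not mutually exclusive, and each can be used on its own.

They are also often used together, to check that the text both before and after satisfies the required conditions, so that exactly the wanted content is matched.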
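As a compact recap of using both assertions together, here is a sketch for Python 3. The helper name find_quoted_src_jpg_urls and the example.com URLs are hypothetical and chosen only for illustration. The pattern is the same idea as in the example code, with the lookbehind checking the text before the URL and the lookahead checking the text after it:

```python
import re

# Lookbehind requires src=" before the URL, lookahead requires " after it.
# Neither assertion is part of the returned match.
SRC_JPG_PATTERN = re.compile(r'(?<=src=")http://[\w./]+\.jpg(?=")')


def find_quoted_src_jpg_urls(html):
    """Return .jpg URLs that sit between src=" and a closing double quote."""
    return SRC_JPG_PATTERN.findall(html)


sample_html = (
    '<img src="http://example.com/a/good.jpg">\n'    # properly quoted src
    '<img src="http://example.com/a/broken.jpg>\n'   # closing quote missing
    '"http://example.com/a/not_in_src.jpg"\n'        # quoted, but not a src value
)

urls = find_quoted_src_jpg_urls(sample_html)
print(urls)  # ['http://example.com/a/good.jpg']
```

Compiling the pattern once is a design choice for reuse across many inputs; calling re.findall with the pattern string directly behaves the same way.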

