Here are some notes on a lesson I learned while using regular expressions in Python.
Once, given the string
footerUni = u"分类: | 标签:"  # 分类 = "category", 标签 = "tags"
I searched it with (Python 2):
import re

foundCatZhcn = re.search(u"分类:(?P<catName>.+)|", footerUni)
print "foundCatZhcn=", foundCatZhcn
if foundCatZhcn:
    print "foundCatZhcn.group(0)=", foundCatZhcn.group(0)
    print "foundCatZhcn.group(1)=", foundCatZhcn.group(1)
    catName = foundCatZhcn.group("catName")
    print "catName=", catName
but the output was:
foundCatZhcn= <_sre.SRE_Match object at 0x027E3C20>
foundCatZhcn.group(0)=
foundCatZhcn.group(1)= None
catName= None
group(0) came back empty instead of the whole matched string I expected, and group(1) was None instead of the single space character I expected.
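After debugging for quite a while, I finally found the cause: in a regular expression the vertical bar "|" means OR (alternation), as the Python re documentation explains: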
“'|'
A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].
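”
So in my pattern the part after | is an empty branch, and an empty branch matches the empty string at any position. re.search starts at position 0; if 分类:(?P<catName>.+) does not match right there, the empty branch does, so the result is a zero-length match: group(0) is the empty string, and catName (group 1), which exists only in the left branch, is None.
This is also why, no matter how I changed the rest of the expression while testing, foundCatZhcn was never None: as long as the trailing | is there, the empty branch guarantees a match.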
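A minimal sketch of the behaviour described above, written for Python 3 (the original snippets are Python 2). The sample string is hypothetical: it puts some text in front of 分类: so that the left branch cannot match at position 0. The last line illustrates the quoted rule that | tries branches from left to right and accepts the first one that matches.

```python
import re

# Hypothetical input: extra text before "分类:" so the left branch
# cannot match at position 0.
text = u"footer 分类: | 标签:"

# Same pattern as in the post: the branch after "|" is empty.
m = re.search(u"分类:(?P<catName>.+)|", text)

# The empty branch matches at position 0, so search() still succeeds...
print(m is not None)        # True
# ...but the match is zero-length and the named group did not take part.
print(repr(m.group(0)))     # ''
print(m.span())             # (0, 0)
print(m.group("catName"))   # None

# Alternation is not greedy: the first branch that matches wins,
# even if a later branch would produce a longer match.
print(re.search("a|ab", "ab").group(0))  # a
```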
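The fix is to escape the vertical bar with a backslash so that it stands for the literal | character (the pattern also switches from .+ to the non-greedy .*?, so catName stops at the first |):
foundCatZhcn = re.search(u"分类:(?P<catName>.*?)\|", footerUni)
Only then do I get the result I actually wanted: catName is the single space between 分类: and |.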
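A Python 3 sketch of the corrected search on the same footerUni, together with two other ways to get a literal |: the character class [|] mentioned in the quoted documentation, and re.escape() from the re module. The pattern is written as a raw string so that Python passes \| to the regex engine unchanged instead of treating it as an (invalid) string escape. The names m1, m2 and m3 are only illustrative.

```python
import re

footerUni = u"分类: | 标签:"

# Escaped bar: in a raw string, "\|" reaches the regex engine as a literal "|".
m1 = re.search(r"分类:(?P<catName>.*?)\|", footerUni)
print(m1.span())                   # (0, 5)
print(repr(m1.group("catName")))   # ' '

# Equivalent: a character class containing "|".
m2 = re.search(r"分类:(?P<catName>.*?)[|]", footerUni)

# Equivalent: let re.escape() escape the literal separator.
m3 = re.search(u"分类:(?P<catName>.*?)" + re.escape(u"|"), footerUni)

# All three capture the same single space.
print(m1.group("catName") == m2.group("catName") == m3.group("catName"))  # True
```

Of the three, re.escape() is handy when the separator comes from a variable rather than being typed into the pattern by hand.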