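The result returned by search has already been explained in Section 6.2, "The meaning and usage of search in the re module, and the meaning of group after a match".
The following shows in detail how the result of findall differs depending on whether the pattern contains parentheses.
Putting part of a pattern inside () makes the content matched there a group. With search, each group is read from the match object through group(N), where group(0) is the entire match and the parenthesised groups are numbered from 1. findall does not return a match object: it returns a list, so group() is not available on its result. Without parentheses the list holds the full matched strings; with several pairs of parentheses it holds one tuple per match, with one element per group.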
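As a minimal sketch of the group numbering described above, assuming Python 3 and a made-up sample string (`sample_text`, `pic_match` and the simplified pattern are illustrative and not part of the original test), re.search returns a match object whose group(0) is the whole match and whose group(1) onward are the parenthesised groups:

```python
import re

# Hypothetical sample text; the URL is one of the two used in this section's test.
sample_text = (
    "see http://hiphotos.baidu.com/recommend_music/pic/item/"
    "221ebedfa1a34d224954039e.jpg here"
)

# Simplified pattern with three groups: user name, picture id, file suffix.
pic_match = re.search(
    r"http://hiphotos\.baidu\.com/(\S+)/pic/item/([a-zA-Z0-9]+)\.(\w{3})",
    sample_text,
)

whole_match = pic_match.group(0)  # the entire matched URL
user_name = pic_match.group(1)    # first pair of parentheses
pic_id = pic_match.group(2)       # second pair of parentheses
suffix = pic_match.group(3)       # third pair of parentheses
all_groups = pic_match.groups()   # tuple of groups 1..3, without group 0

print(whole_match)
print(user_name, pic_id, suffix)
```

Note that groups() leaves out group 0, which is the same shape as each tuple that findall returns when the pattern has several groups.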
The detailed test follows; its output makes the difference between using and not using parentheses clear:
import re

# here blogContent contains following pic url link:
# http://hiphotos.baidu.com/againinput_tmp/pic/item/069e0d89033b5bb53d07e9b536d3d539b400bce2.jpg
# http://hiphotos.baidu.com/recommend_music/pic/item/221ebedfa1a34d224954039e.jpg

# following is test result:

# pattern without parentheses: findall returns the full matched strings
pic_pattern_no_parenthesis = r'http://hiphotos.baidu.com/\S+/[ab]{0,2}pic/item/[a-zA-Z0-9]{24,40}\.\w{3}'
picList_no_parenthesis = re.findall(pic_pattern_no_parenthesis, blogContent) # findall always returns a list (empty if nothing matched)
print 'findall no()=',picList_no_parenthesis
print 'findall no() len=',len(picList_no_parenthesis)
#print 'findall no() group=',picList_no_parenthesis.group(0) # -> cause error: a list has no group()

# pattern with four groups: findall returns one tuple of groups per match
pic_pattern_with_parenthesis = r'http://hiphotos.baidu.com/(\S+)/([ab]{0,2})pic/item/([a-zA-Z0-9]+)\.([a-zA-Z]{3})'
picList_with_parenthesis = re.findall(pic_pattern_with_parenthesis, blogContent) # findall always returns a list (empty if nothing matched)
print 'findall with()=',picList_with_parenthesis
print 'findall with() len=',len(picList_with_parenthesis)
#print 'findall with() group(0)=',picList_with_parenthesis.group(0) # -> cause error: a list has no group()
#print 'findall with() group(1)=',picList_with_parenthesis.group(1) # -> cause error: a list has no group()
print 'findall with() [0][0]=',picList_with_parenthesis[0][0]
print 'findall with() [0][1]=',picList_with_parenthesis[0][1]
print 'findall with() [0][2]=',picList_with_parenthesis[0][2]
print 'findall with() [0][3]=',picList_with_parenthesis[0][3]
#print 'findall with() [0][4]=',picList_with_parenthesis[0][4] # no [4] -> cause error: only four groups
The test output is:
findall no()= [u'http://hiphotos.baidu.com/againinput_tmp/pic/item/069e0d89033b5bb53d07e9b536d3d539b400bce2.jpg', u'http://hiphotos.baidu.com/recommend_music/pic/item/221ebedfa1a34d224954039e.jpg']
findall no() len= 2
findall with()= [(u'againinput_tmp', u'', u'069e0d89033b5bb53d07e9b536d3d539b400bce2', u'jpg'), (u'recommend_music', u'', u'221ebedfa1a34d224954039e', u'jpg')]
findall with() len= 2
findall with() [0][0]= againinput_tmp
findall with() [0][1]=
findall with() [0][2]= 069e0d89033b5bb53d07e9b536d3d539b400bce2
findall with() [0][3]= jpg
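The same comparison can be reproduced as a self-contained sketch. It assumes Python 3 (so print is a function and strings show no u prefix), uses a stand-in `blog_content` holding only the two URLs listed in the test, and uses simplified patterns with escaped dots; these names and patterns are illustrative and not the original test code. It also shows two cases the test does not cover: a pattern with exactly one group, and re.finditer as one way to keep both the whole match and the groups.

```python
import re

# Assumed stand-in for blogContent: only the two URLs from the test above.
blog_content = (
    "http://hiphotos.baidu.com/againinput_tmp/pic/item/"
    "069e0d89033b5bb53d07e9b536d3d539b400bce2.jpg "
    "http://hiphotos.baidu.com/recommend_music/pic/item/"
    "221ebedfa1a34d224954039e.jpg"
)

prefix = r"http://hiphotos\.baidu\.com/"

# No group: findall returns a list of full matched strings.
no_group = re.findall(prefix + r"\S+/pic/item/[a-zA-Z0-9]+\.\w{3}", blog_content)

# Exactly one group: findall returns a list of that group's text only.
one_group = re.findall(prefix + r"\S+/pic/item/[a-zA-Z0-9]+\.(\w{3})", blog_content)

# Several groups: findall returns a list of tuples, one element per group.
multi_pattern = prefix + r"(\S+)/pic/item/([a-zA-Z0-9]+)\.(\w{3})"
multi_group = re.findall(multi_pattern, blog_content)

# finditer yields match objects, so group(0) is still available with groups.
whole_matches = [m.group(0) for m in re.finditer(multi_pattern, blog_content)]

print(no_group)
print(one_group)
print(multi_group)
print(whole_matches)
```

If both the full URL and its parts are needed from a single pattern, iterating with finditer, or wrapping the whole pattern in one extra outer group, avoids running findall twice.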