最新消息:20210816 当前crifan.com域名已被污染,为防止失联,请关注(页面右下角的)公众号

【已解决】Python中用正则re去搜索分组的集合

Python crifan 2872浏览 0评论

折腾:

【已解决】Python 3中用正则匹配多段的脚本内容

期间,想要对于这种:

去匹配到:

多个script

即分组的分组

结果用:

<code>scriptMatch = re.search("(?P&lt;scriptList&gt;(?P&lt;singleScript&gt;place.+?)\n{2,1000})+", allLine, flags=re.I | re.M | re.DOTALL)
</code>

得到的scriptList也只是第一个script,而不是以为的所有的script

python re multiple group of group

Python Re: Multiple Capturing Groups – Stack Overflow

regex – Python Regular Expression Multiple Groups – Stack Overflow

python – How to match multiple groups regex – Stack Overflow

re中去搜索到分组的集合

好像都提到了re.finditer

去找找re.finditer

https://docs.python.org/3/library/re.html#re.finditer

re.finditer(patternstringflags=0)

Return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.

Changed in version 3.7: Non-empty matches can now start just after a previous empty match.

然后试试。

Repeating a Capturing Group vs. Capturing a Repeated Group

调试了好一会,加上

re.DOTALL的flag后,终于可以搜索到了:

然后又遇到一个坑:

iterator的object,被访问一次后,就变成空了:

所以此处的:

<code>    matchIterator = re.finditer(singleScriptPattern, allLine, flags=re.I | re.M | re.DOTALL)
    print("matchIterator=%s" % matchIterator)
    if matchIterator:
      for scriptNum, eachScriptMatch in enumerate(matchIterator):
        print("[%d] eachScriptMatch=%s" % (scriptNum, eachScriptMatch))
        singleScript = eachScriptMatch.group("singleScript")
        print("singleScript=%s" % singleScript)
</code>

在:

if matchIterator:

matchIterator就是空了,后续for循环就无效了,得不到我们要的数据了。

【总结】

此处,直接用:

<code>matchIterator = re.finditer(yourPattern, toSearchStr, flags)
</code>

即可得到对应的iterator类型的变量,可以用for循环去分别获取每个值

而其中的yourPattern,如果内部带group,则每个match的值中,可以直接用.group(“xx”)去获取对应的值,比如对于:

<code>
Place: School canteen
Topic: food
Tittle:Have lunch
Age: 3-4
J: What did you have for lunch?
L: I ate rice, fish and bread.
J: Do you like rice?
L: Yes, I do.
J: Do you like fish?
L: Yes, I do.
J: Do you like bread?
L: No, I don’t.
J: What did you drink?
L: I drank milk.
J: Do you like milk?
L: Yes, I do.

Place: home
Topic: house
Tittle: Doing housework 
Age: 4-5
J: Do you like cooking, mom?
M: Yes, I do a lot. What about you?
J: Mom, you know me. I can’t cook.
M: But can you help me wash dishes?
J: Yes, I can help you.
M: Let’s make a deal, ok?
J: What kind of deal?
M: I’m going to cook.
J: And then?
M: Then you wash the dishes after the meal.
J: That’s ok. I’ d like to help you mom.
M: You are a good boy.

。。。
</code>

正则代码:

<code>    singleScriptPattern = r"(?P&lt;singleScript&gt;place:(?P&lt;place&gt;.+?)\ntopic:(?P&lt;topic&gt;.+?)\ntittle:(?P&lt;title&gt;.+?)\nage:(?P&lt;age&gt;.+?)\n(?P&lt;content&gt;.+?))\n{2,1000}"
    matchIterator = re.finditer(singleScriptPattern, allLine, flags=re.I | re.M | re.DOTALL)
    print("matchIterator=%s" % matchIterator)
    # if matchIterator:
    for scriptNum, eachScriptMatch in enumerate(matchIterator):
      print("[%d] eachScriptMatch=%s" % (scriptNum, eachScriptMatch))
      singleScript = eachScriptMatch.group("singleScript")
      print("singleScript=%s" % singleScript)
      place = eachScriptMatch.group("place")
      print("place=%s" % place)
      topic = eachScriptMatch.group("topic")
      print("topic=%s" % topic)
      title = eachScriptMatch.group("title")
      print("title=%s" % title)
      age = eachScriptMatch.group("age")
      print("age=%s" % age)
      content = eachScriptMatch.group("content")
      print("content=%s" % content)
</code>

结果:

<code>matchIterator=&lt;callable_iterator object at 0x10e3f7b70&gt;
[0] eachScriptMatch=&lt;_sre.SRE_Match object; span=(1, 309), match='Place: School canteen\nTopic: food\nTittle:Have l&gt;
singleScript=Place: School canteen
Topic: food
Tittle:Have lunch
Age: 3-4
J: What did you have for lunch?
L: I ate rice, fish and bread.
J: Do you like rice?
L: Yes, I do.
J: Do you like fish?
L: Yes, I do.
J: Do you like bread?
L: No, I don’t.
J: What did you drink?
L: I drank milk.
J: Do you like milk?
L: Yes, I do.
place= School canteen
topic= food
title=Have lunch
age= 3-4
age=J: What did you have for lunch?
L: I ate rice, fish and bread.
J: Do you like rice?
L: Yes, I do.
J: Do you like fish?
L: Yes, I do.
J: Do you like bread?
L: No, I don’t.
J: What did you drink?
L: I drank milk.
J: Do you like milk?
L: Yes, I do.
</code>

转载请注明:在路上 » 【已解决】Python中用正则re去搜索分组的集合

发表我的评论
取消评论

表情

Hi,您需要填写昵称和邮箱!

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址
82 queries in 0.176 seconds, using 22.07MB memory