【Solved】Using re in Python to search for a collection of groups

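While tinkering with a project, I wanted to take text of this kind (see the sample further below) and match the multiple scripts in it, i.e. a group of groups.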

I tried:

<code>scriptMatch = re.search("(?P&lt;scriptList&gt;(?P&lt;singleScript&gt;place.+?)\n{2,1000})+", allLine, flags=re.I | re.M | re.DOTALL)
</code>

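but the scriptList I got contained only the first script, not all of the scripts as I had assumed. The underlying limitation: a capturing group inside a repeated quantifier remembers only one repetition (Python's re keeps the last one), so a single group can never hold the whole list of scripts.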
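A minimal sketch of that limitation, using the same pattern as above on a hypothetical two-script sample (the text below is made up, shaped like the real data). The overall match spans both scripts, yet the named group ends up holding a single repetition:

```python
import re

# Hypothetical sample: two short scripts, each followed by a blank line.
text = "Place: home\nJ: Hi\n\nPlace: park\nJ: Look\n\n"

# The pattern from above: named groups inside a repeated "+" quantifier.
pattern = r"(?P<scriptList>(?P<singleScript>place.+?)\n{2,1000})+"
m = re.search(pattern, text, flags=re.I | re.M | re.DOTALL)

print(repr(m.group(0)))               # the overall match covers both scripts
print(repr(m.group("scriptList")))    # the group keeps only one repetition
print(repr(m.group("singleScript")))  # likewise for the inner group
```

On this sample the group holds the last script; whichever repetition survives, one group cannot return a list of scripts.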

Searched for:

python re multiple group of group

and found:

Python Re: Multiple Capturing Groups – Stack Overflow

regex – Python Regular Expression Multiple Groups – Stack Overflow

python – How to match multiple groups regex – Stack Overflow

Also searched for: searching for a collection of groups with re

They all seem to point to re.finditer, so I looked it up:

https://docs.python.org/3/library/re.html#re.finditer

“re.finditer(pattern, string, flags=0)

Return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.

Changed in version 3.7: Non-empty matches can now start just after a previous empty match.
”
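A short illustration of the documented behavior (the strings are arbitrary examples): finditer yields Match objects for non-overlapping matches from left to right, and empty matches are included.

```python
import re

text = "aa aaa a"

# Non-overlapping matches, scanned left to right, returned as Match objects.
for m in re.finditer(r"a+", text):
    print(m.group(), m.span())
# aa (0, 2)
# aaa (3, 6)
# a (7, 8)

# Empty matches are included: "x*" matches the empty string at every position.
empty_spans = [m.span() for m in re.finditer(r"x*", "ab")]
print(empty_spans)  # [(0, 0), (1, 1), (2, 2)]
```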

Then I gave it a try, and also read:

Repeating a Capturing Group vs. Capturing a Repeated Group
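The distinction in that title can be sketched on a toy string (an illustrative example, not the blog's data). Repeating a capturing group keeps only the last repetition; capturing a repeated group keeps the whole run, but as one string. Neither yields the individual pieces, which is what finditer is for:

```python
import re

s = "ab1ab2ab3"

# Repeating a capturing group: the group is re-filled on every repetition.
repeating = re.fullmatch(r"(ab\d)+", s)
print(repeating.group(1))  # ab3

# Capturing a repeated group: the repetition happens inside the group.
captured = re.fullmatch(r"((?:ab\d)+)", s)
print(captured.group(1))   # ab1ab2ab3

# One match per piece: iterate over all non-overlapping matches.
pieces = [m.group() for m in re.finditer(r"ab\d", s)]
print(pieces)              # ['ab1', 'ab2', 'ab3']
```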

After debugging for quite a while, I added the re.DOTALL flag, and the search finally found the scripts.
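Why re.DOTALL matters here, as a small sketch on a hypothetical sample: without it, `.` does not match `\n`, so a script body spanning several lines can never match. re.M only changes how `^` and `$` behave, so it does not help with this.

```python
import re

text = "Place: x\nline1\n\nPlace: y\nline2\n\n"
pattern = r"place:(?P<body>.+?)\n{2,}"

# Without re.DOTALL, "." stops at "\n", so a multi-line body cannot match.
print(re.search(pattern, text, flags=re.I | re.M))  # None

# With re.DOTALL, ".+?" may cross single newlines and stops lazily at the first blank line.
bodies = [m.group("body") for m in re.finditer(pattern, text, flags=re.I | re.M | re.DOTALL)]
print(bodies)  # [' x\nline1', ' y\nline2']
```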

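Then I ran into another pitfall:

an iterator object can only be traversed once; after one full pass over it, it yields nothing more.

So in this code: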

<code>    matchIterator = re.finditer(singleScriptPattern, allLine, flags=re.I | re.M | re.DOTALL)
    print("matchIterator=%s" % matchIterator)
    if matchIterator:
      for scriptNum, eachScriptMatch in enumerate(matchIterator):
        print("[%d] eachScriptMatch=%s" % (scriptNum, eachScriptMatch))
        singleScript = eachScriptMatch.group("singleScript")
        print("singleScript=%s" % singleScript)
</code>

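I believed that the line

if matchIterator:

had already emptied matchIterator, so the following for loop did nothing and the data I wanted never came out. In fact, that check consumes nothing: the callable_iterator returned by re.finditer defines neither __bool__ nor __len__, so it is always truthy, even when nothing matches. The check is merely useless (it can never detect "no match"). What empties the iterator is an earlier pass over it, for example calling list(matchIterator) or looping over it once already.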
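A quick check of both points on toy input: truth-testing the iterator consumes nothing, a second pass over it is empty, and the object is truthy even with zero matches. One common way to test for "no match", or to loop more than once, is to materialize the matches into a list first:

```python
import re

it = re.finditer(r"\d", "a1b2")

# Truth-testing does not consume the iterator; the object is simply truthy.
print(bool(it))                      # True
first_pass = [m.group() for m in it]
second_pass = [m.group() for m in it]
print(first_pass, second_pass)       # ['1', '2'] []

# Truthy even when nothing matches, so "if matchIterator:" cannot detect "no match".
print(bool(re.finditer(r"\d", "abc")))  # True

# Materialize the matches to test for emptiness or to iterate more than once.
matches = list(re.finditer(r"\d", "abc"))
if not matches:
    print("no match")
```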

【Summary】

In short, just use:

<code>matchIterator = re.finditer(yourPattern, toSearchStr, flags)
</code>

to get an iterator, then use a for loop to fetch each match in turn.
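A variant of the same loop, sketched with an illustrative pattern and text: when a pattern is reused, it can be compiled once with re.compile, and the compiled pattern has its own finditer() method. Here re.M is what lets `^` and `$` match at each line.

```python
import re

# Illustrative pattern: re.M makes ^ and $ match at the start/end of every line.
PLACE_PATTERN = re.compile(r"^place:\s*(?P<place>.+)$", flags=re.I | re.M)

text = "Place: School canteen\nTopic: food\n\nPlace: home\nTopic: house\n"

for num, match in enumerate(PLACE_PATTERN.finditer(text)):
    print("[%d] place=%s" % (num, match.group("place")))
# [0] place=School canteen
# [1] place=home
```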

If yourPattern contains named groups, you can read each group's value straight from every match object with .group("xx"). For example, given:

<code>
Place: School canteen
Topic: food
Tittle:Have lunch
Age: 3-4
J: What did you have for lunch?
L: I ate rice, fish and bread.
J: Do you like rice?
L: Yes, I do.
J: Do you like fish?
L: Yes, I do.
J: Do you like bread?
L: No, I don’t.
J: What did you drink?
L: I drank milk.
J: Do you like milk?
L: Yes, I do.

Place: home
Topic: house
Tittle: Doing housework 
Age: 4-5
J: Do you like cooking, mom?
M: Yes, I do a lot. What about you?
J: Mom, you know me. I can’t cook.
M: But can you help me wash dishes?
J: Yes, I can help you.
M: Let’s make a deal, ok?
J: What kind of deal?
M: I’m going to cook.
J: And then?
M: Then you wash the dishes after the meal.
J: That’s ok. I’ d like to help you mom.
M: You are a good boy.

。。。
</code>

The regex code:

<code>    singleScriptPattern = r"(?P&lt;singleScript&gt;place:(?P&lt;place&gt;.+?)\ntopic:(?P&lt;topic&gt;.+?)\ntittle:(?P&lt;title&gt;.+?)\nage:(?P&lt;age&gt;.+?)\n(?P&lt;content&gt;.+?))\n{2,1000}"
    matchIterator = re.finditer(singleScriptPattern, allLine, flags=re.I | re.M | re.DOTALL)
    print("matchIterator=%s" % matchIterator)
    # no "if matchIterator:" check: the iterator object is always truthy, so it cannot detect "no match"
    for scriptNum, eachScriptMatch in enumerate(matchIterator):
      print("[%d] eachScriptMatch=%s" % (scriptNum, eachScriptMatch))
      singleScript = eachScriptMatch.group("singleScript")
      print("singleScript=%s" % singleScript)
      place = eachScriptMatch.group("place")
      print("place=%s" % place)
      topic = eachScriptMatch.group("topic")
      print("topic=%s" % topic)
      title = eachScriptMatch.group("title")
      print("title=%s" % title)
      age = eachScriptMatch.group("age")
      print("age=%s" % age)
      content = eachScriptMatch.group("content")
      print("content=%s" % content)
</code>
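For reference, a self-contained sketch of the same approach that can be run as-is. Assumptions, not the original code: the input is two scripts shortened from the sample above; parse_scripts and SINGLE_SCRIPT_PATTERN are hypothetical names; and the terminator `\n{2,1000}` is swapped for `(?:\n{2,}|\n*\Z)` so the last script also matches when no blank lines follow it. match.groupdict() collects all named groups at once.

```python
import re

# Two scripts shortened from the sample above; the data spells "Tittle".
ALL_LINE = """Place: School canteen
Topic: food
Tittle:Have lunch
Age: 3-4
J: What did you have for lunch?
L: I ate rice, fish and bread.

Place: home
Topic: house
Tittle: Doing housework
Age: 4-5
J: Do you like cooking, mom?
M: Yes, I do a lot. What about you?"""

# Same groups as the pattern above; the terminator is an assumed tweak that
# also accepts the end of the string after the last script.
SINGLE_SCRIPT_PATTERN = (
    r"(?P<singleScript>place:(?P<place>.+?)\ntopic:(?P<topic>.+?)"
    r"\ntittle:(?P<title>.+?)\nage:(?P<age>.+?)\n(?P<content>.+?))"
    r"(?:\n{2,}|\n*\Z)"
)


def parse_scripts(all_line):
    """Return one dict per script, with surrounding whitespace stripped."""
    scripts = []
    for match in re.finditer(SINGLE_SCRIPT_PATTERN, all_line, flags=re.I | re.M | re.DOTALL):
        fields = match.groupdict()
        del fields["singleScript"]  # keep only the individual fields
        scripts.append({name: value.strip() for name, value in fields.items()})
    return scripts


for num, script in enumerate(parse_scripts(ALL_LINE)):
    print("[%d] %s / %s / %s" % (num, script["place"], script["title"], script["age"]))
# [0] School canteen / Have lunch / 3-4
# [1] home / Doing housework / 4-5
```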

Result:

<code>matchIterator=&lt;callable_iterator object at 0x10e3f7b70&gt;
[0] eachScriptMatch=&lt;_sre.SRE_Match object; span=(1, 309), match='Place: School canteen\nTopic: food\nTittle:Have l&gt;
singleScript=Place: School canteen
Topic: food
Tittle:Have lunch
Age: 3-4
J: What did you have for lunch?
L: I ate rice, fish and bread.
J: Do you like rice?
L: Yes, I do.
J: Do you like fish?
L: Yes, I do.
J: Do you like bread?
L: No, I don’t.
J: What did you drink?
L: I drank milk.
J: Do you like milk?
L: Yes, I do.
place= School canteen
topic= food
title=Have lunch
age= 3-4
content=J: What did you have for lunch?
L: I ate rice, fish and bread.
J: Do you like rice?
L: Yes, I do.
J: Do you like fish?
L: Yes, I do.
J: Do you like bread?
L: No, I don’t.
J: What did you drink?
L: I drank milk.
J: Do you like milk?
L: Yes, I do.
</code>

When reposting, please credit: 在路上 » 【Solved】Using re in Python to search for a collection of groups
