| Started by | gintare <g.statkute@gmail.com> |
|---|---|
| First post | 2014-07-07 00:19 -0700 |
| Last post | 2014-07-07 21:38 -0600 |
| Articles | 2 — 2 participants |
finditer gintare <g.statkute@gmail.com> - 2014-07-07 00:19 -0700
Re: finditer Jason Friedman <jsf80238@gmail.com> - 2014-07-07 21:38 -0600
| From | gintare <g.statkute@gmail.com> |
|---|---|
| Date | 2014-07-07 00:19 -0700 |
| Subject | finditer |
| Message-ID | <d580e76b-793e-435d-917b-613ae912a93f@googlegroups.com> |
If somebody has time, maybe you could advise me on how to accomplish this task in a faster way.
I have a text = """ word{vb}
wordtransl {vb}
sent1.
sent1trans.
sent2
sent2trans... """
I need to match wordtransl once, and then match the repeating pattern of sent followed by senttrans as many times as it occurs.
The way I achieved this is surely not the most efficient one:
import re

sw = word  # I know the word
stry = r'\s*' + sw + r'\s*.*\{vb\}\n+'
stry = stry + r'(?P<Wtrans>.*)\{vb\}\n+'
stryc = re.compile(stry, re.UNICODE)
LtryM = re.search(stryc, linef)  # here I find wordtransl
part = re.split(stryc, linef)  # here I split the searched text to get the part with the repeating sent and senttrans
stry2 = '(?:'
stry2 = stry2 + r'\s*' + sw + r'\s*.*\{vb\}\n+'
stry2 = stry2 + r'(?P<Wtrans>.*)\{vb\}\n+'
stry2 = stry2 + ')*'
stry2 = stry2 + '('
stry2 = stry2 + r'(?P<SVsent>.*)\n+'
stry2 = stry2 + r'(?P<SVtrans>.*)\n+'
stry2 = stry2 + ')'
stryc2 = re.compile(stry2, re.UNICODE)
LtryM = re.finditer(stryc2, part[2])  # here I find the text pieces consisting of sent and senttrans
for item in LtryM:
    stry3 = ''
    stry3 = stry3 + r'(?P<SVsent>.*)\n+'
    stry3 = stry3 + r'(?P<SVtrans>.*)\n+'
    stryc3 = re.compile(stry3, re.UNICODE)
    LtryM3 = re.search(stryc3, item.group())  # here I find sent and senttrans
    print(LtryM3.groupdict())
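The same extraction can probably be done with one header match followed by a single finditer pass, without re.split or a regex compiled inside the loop. What follows is only a minimal sketch, assuming the layout of the sample text: one line with the known word and {vb}, one line with the translation and {vb}, then alternating sentence and translation lines that may be separated by blank lines. The name parse_entry and its return shape are hypothetical, not something from the post.

```python
import re

# Sample text from the post, with blank lines between entries (assumed layout).
text = """ word{vb}
wordtransl {vb}

sent1.

sent1trans.

sent2

sent2trans... """


def parse_entry(text, word):
    """Hypothetical helper: return (word translation, [(sentence, translation), ...])."""
    # Header: the known word line, then the translation line, both ending in {vb}.
    header = re.compile(
        r"[ \t]*" + re.escape(word) + r".*?\{vb\}[ \t]*\n+"
        r"[ \t]*(?P<Wtrans>.*?)[ \t]*\{vb\}[ \t]*\n+"
    )
    m = header.match(text)
    if m is None:
        return None, []

    # One pair: a non-blank line, one or more newlines, another non-blank line.
    pair = re.compile(
        r"[ \t]*(?P<SVsent>\S.*?)[ \t]*\n+"
        r"[ \t]*(?P<SVtrans>\S.*?)[ \t]*(?:\n+|$)"
    )
    rest = text[m.end():]  # everything after the two header lines
    pairs = [(p.group("SVsent"), p.group("SVtrans")) for p in pair.finditer(rest)]
    return m.group("Wtrans"), pairs


print(parse_entry(text, "word"))
```

Both patterns are compiled once, and each finditer match already carries the SVsent and SVtrans groups, so no second search per match is needed.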
| From | Jason Friedman <jsf80238@gmail.com> |
|---|---|
| Date | 2014-07-07 21:38 -0600 |
| Message-ID | <mailman.11615.1404790710.18130.python-list@python.org> |
| In reply to | #74080 |
On Mon, Jul 7, 2014 at 1:19 AM, gintare <g.statkute@gmail.com> wrote:
> If smbd has time, maybe you could advice how to accomplish this task in faster way.
>
> I have a text = """ word{vb}
> wordtransl {vb}
>
> sent1.
>
> sent1trans.
>
> sent2
>
> sent2trans... """
>
> I need to match once wordtransl, and than many times repeating patterns consisting of sent and senttrans.
You might try itertools.groupby
(https://docs.python.org/3/library/itertools.html#module-itertools).
text = """ word{vb}
wordtransl {vb}
sent1
sent1trans
sent2
sent2trans
"""
import itertools
import re
result_list = list()
lines = text.split("\n")
for line in lines[:]:
    if line.startswith("sent"):
        break
    lines.pop(0)
def is_start(x):
    pattern = re.compile(r"sent\d+$")
    if re.search(pattern, x):
        return True
for key, mygroup in itertools.groupby(lines, is_start):
    result_list.append(list(mygroup))
print(result_list)
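If the goal is to get each sentence together with its translation, a different technique from groupby may be simpler: zip one iterator with itself. This is only a sketch, assuming that the first two non-blank lines are the {vb} header lines and that the remaining non-blank lines strictly alternate between sentence and translation. The name pair_lines and the header_lines parameter are hypothetical.

```python
text = """ word{vb}
wordtransl {vb}
sent1
sent1trans
sent2
sent2trans
"""


def pair_lines(text, header_lines=2):
    """Hypothetical helper: skip the header lines, then pair up what remains."""
    # Strip surrounding whitespace and drop blank lines.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    body = iter(lines[header_lines:])
    # zip() pulls two items per step from the same iterator: (sent, senttrans).
    return list(zip(body, body))


print(pair_lines(text))
# prints [('sent1', 'sent1trans'), ('sent2', 'sent2trans')]
```

Note that zip silently drops a trailing sentence that has no translation, so this only fits input where the alternation really holds.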