| Started by | "Marc" <marc@marcd.org> |
|---|---|
| First post | 2013-10-27 17:09 -0400 |
| Last post | 2013-10-28 10:34 +1100 |
| Articles | 4 — 4 participants |
Parsing multiple lines from text file using regex "Marc" <marc@marcd.org> - 2013-10-27 17:09 -0400
Re: Parsing multiple lines from text file using regex "Rhodri James" <rhodri@wildebst.demon.co.uk> - 2013-10-27 22:19 +0000
Re: Parsing multiple lines from text file using regex Roy Smith <roy@panix.com> - 2013-10-27 18:43 -0400
Re: Parsing multiple lines from text file using regex Ben Finney <ben+python@benfinney.id.au> - 2013-10-28 10:34 +1100
| From | "Marc" <marc@marcd.org> |
|---|---|
| Date | 2013-10-27 17:09 -0400 |
| Subject | Parsing multiple lines from text file using regex |
| Message-ID | <mailman.1665.1382911575.18130.python-list@python.org> |
Hi,

I am having an issue with something that would seem to have an easy solution, but which escapes me. I have configuration files that I would like to parse. The data I am having an issue with is a multi-line attribute that has the following structure:

    banner <option> <banner text delimiter>
    Banner text
    Banner text
    Banner text
    ...
    <banner text delimiter>

The regex 'banner\s+(\w+)\s+(.+)' captures the command nicely and banner.group(2) captures the delimiter nicely.

My issue is that I need to capture the lines between the delimiters (both delimiters are the same).

I have tried various permutations of

    Delimiter=banner.group(2)
    re.findall(Delimiter'(.*?)'Delimiter, line, re.DOTALL|re.MULTILINE)

with no luck.

Examples I have found online all assume that the starting and ending delimiters are different and are defined directly in re.findall(). If possible, I would like to reuse the delimiter already extracted by the original regex as banner.group(2).

Any help in pointing me in the right direction would be most appreciated.

Thank you,
Marc
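A minimal sketch of reusing the captured delimiter, assuming the whole configuration file has been read into one string: a single `line` cannot contain text that spans several lines, so a per-line findall() has nothing to find. The second pattern has to be built with `+`, because Python does not allow a variable and a string literal to be written side by side (`Delimiter'(.*?)'` is a syntax error). `re.escape()` is used because typical delimiters such as `^C` contain regex metacharacters. The sample text and the helper name `extract_banner` are hypothetical.

```python
import re

# Hypothetical sample configuration; the banner spans several lines.
CONFIG = """hostname router1
banner motd ^C
Banner text
Banner text
^C
interface eth0
"""

# The pattern from the question: group(1) is the option, group(2) the delimiter.
BANNER_RE = re.compile(r'banner\s+(\w+)\s+(.+)')


def extract_banner(text):
    """Return (option, banner_lines) for the first banner in text, or None."""
    banner = BANNER_RE.search(text)
    if banner is None:
        return None
    delimiter = banner.group(2).strip()
    # Build the second pattern from the captured delimiter. re.escape() makes
    # characters such as ^ or # match literally; DOTALL lets . cross newlines.
    body_re = re.compile(
        re.escape(delimiter) + r'(.*?)' + re.escape(delimiter), re.DOTALL
    )
    # Start searching at the opening delimiter so it is matched first.
    body = body_re.search(text, banner.start(2))
    if body is None:
        return None  # no closing delimiter found
    return banner.group(1), body.group(1).strip('\n').splitlines()


print(extract_banner(CONFIG))
# ('motd', ['Banner text', 'Banner text'])
```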
| From | "Rhodri James" <rhodri@wildebst.demon.co.uk> |
|---|---|
| Date | 2013-10-27 22:19 +0000 |
| Message-ID | <op.w5mwa3iaa8ncjz@gnudebeest> |
| In reply to | #57767 |
On Sun, 27 Oct 2013 21:09:46 -0000, Marc <marc@marcd.org> wrote:
> Hi,
> I am having an issue with something that would seem to have an easy
> solution, but which escapes me. I have configuration files that I would
> like to parse. The data I am having issue with is a multi-line attribute
> that has the following structure:
>
> banner <option> <banner text delimiter>
> Banner text
> Banner text
> Banner text
> ...
> <banner text delimiter>
>
> The regex 'banner\s+(\w+)\s+(.+)' captures the command nicely and
> banner.group(2) captures the delimiter nicely.
>
> My issue is that I need to capture the lines between the delimiters (both
> delimiters are the same).
I really, really wouldn't do this with a single regexp. You'll get a much
easier to understand program if you implement a small state machine
instead. In rough pseudocode:

    collecting_banner = False
    for line in configuration_file:
        if not collecting_banner:
            if found banner start:
                get delimiter
                collecting_banner = True
                banner_lines = []
            elif found other stuff:
                do other stuff
        elif found delimiter:
            collecting_banner = False
        else:
            banner_lines.append(line)
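
One way to turn that pseudocode into runnable Python, as a sketch: the start pattern is adapted from the question but uses `\S+` for the delimiter so trailing whitespace is not captured, and it assumes the `banner` command starts at the beginning of its line. The function name `parse_banners` and the sample lines are hypothetical. Since iterating over an open file yields its lines, a file object can be passed in directly. If the input ends before the closing delimiter, this sketch silently drops the unfinished banner.

```python
import re

# Adapted from the question's pattern: group(1) is the option, group(2) the delimiter.
BANNER_START = re.compile(r'banner\s+(\w+)\s+(\S+)')


def parse_banners(lines):
    """Return a dict mapping each banner option (e.g. 'motd') to its lines."""
    banners = {}
    collecting_banner = False
    option = None
    delimiter = None
    banner_lines = []
    for line in lines:
        line = line.rstrip('\n')
        if not collecting_banner:
            match = BANNER_START.match(line)
            if match:
                # Banner start: remember the delimiter and begin collecting.
                option, delimiter = match.group(1), match.group(2)
                collecting_banner = True
                banner_lines = []
            # Other configuration statements would be handled here.
        elif line.strip() == delimiter:
            # Closing delimiter: store the banner, return to normal parsing.
            banners[option] = banner_lines
            collecting_banner = False
        else:
            banner_lines.append(line)
    return banners


SAMPLE_LINES = [
    "hostname router1\n",
    "banner motd ^C\n",
    "Banner text\n",
    "More text\n",
    "^C\n",
    "interface eth0\n",
]
print(parse_banners(SAMPLE_LINES))
# {'motd': ['Banner text', 'More text']}
```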
--
Rhodri James *-* Wildebeest Herder to the Masses
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2013-10-27 18:43 -0400 |
| Message-ID | <roy-A61323.18431627102013@news.panix.com> |
| In reply to | #57769 |
In article <op.w5mwa3iaa8ncjz@gnudebeest>,
 "Rhodri James" <rhodri@wildebst.demon.co.uk> wrote:

> I really, really wouldn't do this with a single regexp. You'll get a much
> easier to understand program if you implement a small state machine
> instead.

And what is a regex if not a small state machine?
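
To make the single-regex side concrete: Python's `re` module supports named backreferences, so one pattern can require the closing delimiter to repeat exactly the text captured as the opening one, which is the "both delimiters are the same" case from the question. This is a sketch assuming the delimiter is one whitespace-free token at the end of the `banner` line and stands alone on its closing line; the sample configuration is hypothetical. (Strictly, backreferences go beyond what a finite state machine can recognize.) Because a backreference matches the captured text literally, no `re.escape()` is needed.

```python
import re

# Hypothetical sample configuration with two banners and different delimiters.
CONFIG = """banner motd ^C
Banner text
Banner text
^C
banner login ##
Authorized users only
##
"""

# (?P<delim>\S+) captures the opening delimiter; (?P=delim) then requires the
# same text again at the start of a later line. MULTILINE makes ^ and $ work
# per line; DOTALL lets the lazy body span newlines.
BANNER_RE = re.compile(
    r'^banner\s+(?P<option>\w+)\s+(?P<delim>\S+)[ \t]*\n'
    r'(?P<body>.*?)'
    r'^(?P=delim)[ \t]*$',
    re.DOTALL | re.MULTILINE,
)

banners = [
    (m.group('option'), m.group('body').splitlines())
    for m in BANNER_RE.finditer(CONFIG)
]
print(banners)
# [('motd', ['Banner text', 'Banner text']), ('login', ['Authorized users only'])]
```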
| From | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| Date | 2013-10-28 10:34 +1100 |
| Message-ID | <mailman.1671.1382917205.18130.python-list@python.org> |
| In reply to | #57771 |
Roy Smith <roy@panix.com> writes:

> In article <op.w5mwa3iaa8ncjz@gnudebeest>,
>  "Rhodri James" <rhodri@wildebst.demon.co.uk> wrote:
>
> > I really, really wouldn't do this with a single regexp. You'll get a much
> > easier to understand program if you implement a small state machine
> > instead.
>
> And what is a regex if not a small state machine?

Regex is not a state machine implemented by the original poster :-)

Or, in other words, I interpret Rhodri as saying that the right way to do this is by implementing a *different* small state machine, which will address the task better than the small state machine of regex.

-- 
 \        “Pinky, are you pondering what I'm pondering?” “I think so, |
  `\         Brain, but three round meals a day wouldn't be as hard to |
_o__)                                  swallow.” —_Pinky and The Brain_ |
Ben Finney