Groups > comp.lang.python > #57767 > unrolled thread

Parsing multiple lines from text file using regex

Started by	"Marc" <marc@marcd.org>
First post	2013-10-27 17:09 -0400
Last post	2013-10-28 10:34 +1100
Articles	4 — 4 participants

  Parsing multiple lines from text file using regex "Marc" <marc@marcd.org> - 2013-10-27 17:09 -0400
    Re: Parsing multiple lines from text file using regex "Rhodri James" <rhodri@wildebst.demon.co.uk> - 2013-10-27 22:19 +0000
      Re: Parsing multiple lines from text file using regex Roy Smith <roy@panix.com> - 2013-10-27 18:43 -0400
        Re: Parsing multiple lines from text file using regex Ben Finney <ben+python@benfinney.id.au> - 2013-10-28 10:34 +1100

#57767 — Parsing multiple lines from text file using regex

From	"Marc" <marc@marcd.org>
Date	2013-10-27 17:09 -0400
Subject	Parsing multiple lines from text file using regex
Message-ID	<mailman.1665.1382911575.18130.python-list@python.org>

Hi,
I am having an issue with something that would seem to have an easy
solution, but which escapes me.  I have configuration files that I would
like to parse.  The data I am having an issue with is a multi-line attribute
that has the following structure:

banner <option> <banner text delimiter>
Banner text
Banner text
Banner text
...
<banner text delimiter>

The regex 'banner\s+(\w+)\s+(.+)' captures the command nicely and
banner.group(2) captures the delimiter nicely.

My issue is that I need to capture the lines between the delimiters (both
delimiters are the same).

I have tried various permutations of 

Delimiter=banner.group(2)
re.findall(Delimiter'(.*?)'Delimiter, line, re.DOTALL|re.MULTILINE)

with no luck

Examples I have found online all assume that the starting and ending
delimiters are different and are defined directly in re.findall().  I would
like to use the original regex extracting the banner.group(2), since it is
already done, if possible.  

Any help in pointing me in the right direction would be most appreciated.

Thank you,

Marc

#57769

From	"Rhodri James" <rhodri@wildebst.demon.co.uk>
Date	2013-10-27 22:19 +0000
Message-ID	<op.w5mwa3iaa8ncjz@gnudebeest>
In reply to	#57767

On Sun, 27 Oct 2013 21:09:46 -0000, Marc <marc@marcd.org> wrote:

> Hi,
> I am having an issue with something that would seem to have an easy
> solution, but which escapes me.  I have configuration files that I would
> like to parse.  The data I am having an issue with is a multi-line attribute
> that has the following structure:
>
> banner <option> <banner text delimiter>
> Banner text
> Banner text
> Banner text
> ...
> <banner text delimiter>
>
> The regex 'banner\s+(\w+)\s+(.+)' captures the command nicely and
> banner.group(2) captures the delimiter nicely.
>
> My issue is that I need to capture the lines between the delimiters (both
> delimiters are the same).

I really, really wouldn't do this with a single regexp.  You'll get a much  
easier to understand program if you implement a small state machine  
instead.  In rough pseudocode:

collecting_banner = False
for line in configuration_file:
    if not collecting_banner:
        if found banner start:
            get delimiter
            collecting_banner = True
            banner_lines = []
        elif found other stuff:
            do other stuff
    elif found delimiter:
        collecting_banner = False
    else:
        banner_lines.append(line)
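
A runnable Python sketch of the pseudocode above might look roughly like
this.  It is only an illustration under some assumptions: the header regex
is the one from the original question, the closing delimiter is assumed to
sit on a line by itself, and the names collect_banners, BANNER_START and
the sample configuration are made up for the example.

```python
import re

# Header shape taken from the regex in the original question.
BANNER_START = re.compile(r'banner\s+(\w+)\s+(.+)')


def collect_banners(lines):
    """Return a list of (option, banner_lines) tuples found in `lines`."""
    banners = []
    delimiter = None  # None means "not currently inside a banner"
    for raw in lines:
        line = raw.rstrip('\n')
        if delimiter is None:
            match = BANNER_START.match(line)
            if match:
                option = match.group(1)
                delimiter = match.group(2).strip()
                banner_lines = []
            # Other configuration lines would be handled here.
        elif line.strip() == delimiter:
            # Assumption: the closing delimiter is alone on its line.
            banners.append((option, banner_lines))
            delimiter = None
        else:
            banner_lines.append(line)
    return banners


# Hypothetical sample configuration, invented for this sketch.
sample = """hostname demo
banner motd ^C
Authorized users only.
Activity is logged.
^C
interface eth0
""".splitlines(keepends=True)

print(collect_banners(sample))
```

Using the delimiter variable itself as the state (None outside a banner)
replaces the separate collecting_banner flag; keeping the flag would work
just as well.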

-- 
Rhodri James *-* Wildebeest Herder to the Masses

#57771

From	Roy Smith <roy@panix.com>
Date	2013-10-27 18:43 -0400
Message-ID	<roy-A61323.18431627102013@news.panix.com>
In reply to	#57769

In article <op.w5mwa3iaa8ncjz@gnudebeest>,
 "Rhodri James" <rhodri@wildebst.demon.co.uk> wrote:

> I really, really wouldn't do this with a single regexp.  You'll get a much  
> easier to understand program if you implement a small state machine  
> instead.

And what is a regex if not a small state machine?
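
For comparison, the single-regex route asked about in the original post
could be sketched as below.  This is a guess at what the attempt was
aiming for, not a tested solution for any particular configuration format:
the delimiter is passed through re.escape() so that characters such as ^
are matched literally, the pieces are joined with +, and the closing
delimiter is assumed to start a line of its own.  The names banner_text,
BANNER and the sample text are invented for the example.

```python
import re

# The header regex from the original question, reused unchanged.
BANNER = re.compile(r'banner\s+(\w+)\s+(.+)')


def banner_text(config):
    """Return the text between the banner delimiters, or None."""
    banner = BANNER.search(config)
    if banner is None:
        return None
    # Escape the delimiter so regex metacharacters in it match literally.
    delimiter = re.escape(banner.group(2).strip())
    # Lazily capture everything up to the first line that starts with
    # the delimiter (assumed to be alone on that line).
    body = re.compile(r'\n(.*?)^' + delimiter + r'[ \t]*$',
                      re.DOTALL | re.MULTILINE)
    # Start matching where the header line ended.
    found = body.match(config, banner.end())
    return found.group(1) if found else None


# Hypothetical sample configuration, invented for this sketch.
sample_text = (
    "hostname demo\n"
    "banner motd ^C\n"
    "Authorized users only.\n"
    "Activity is logged.\n"
    "^C\n"
    "interface eth0\n"
)

print(repr(banner_text(sample_text)))
```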

#57775

From	Ben Finney <ben+python@benfinney.id.au>
Date	2013-10-28 10:34 +1100
Message-ID	<mailman.1671.1382917205.18130.python-list@python.org>
In reply to	#57771

Roy Smith <roy@panix.com> writes:

> In article <op.w5mwa3iaa8ncjz@gnudebeest>,
>  "Rhodri James" <rhodri@wildebst.demon.co.uk> wrote:
>
> > I really, really wouldn't do this with a single regexp.  You'll get a much  
> > easier to understand program if you implement a small state machine  
> > instead.
>
> And what is a regex if not a small state machine?

Regex is not a state machine implemented by the original poster :-)

Or, in other words, I interpret Rhodri as saying that the right way to
do this is by implementing a *different* small state machine, which will
address the task better than the small state machine of regex.

-- 
 \         “Pinky, are you pondering what I'm pondering?” “I think so, |
  `\         Brain, but three round meals a day wouldn't be as hard to |
_o__)                                 swallow.” —_Pinky and The Brain_ |
Ben Finney
