Re: Parse bug text file

References <de64e370-d83a-4919-863b-e744ad20b62a@googlegroups.com>
Date 2014-07-28 04:17 +1000
Subject Re: Parse bug text file
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.12364.1406485035.18130.python-list@python.org>

On Mon, Jul 28, 2014 at 4:08 AM, CM <cmpython@gmail.com> wrote:
> I can go through it with opening the text file and reading in the lines, and if the first character is a "-" then count that as the start of a bug block, but I am not sure how to find the last line of a bug block...it would be the line before the first line of the next bug block, but not sure the best way to go about it.
>
> There must be a rather standard way to do something like this in Python, and I'm requesting pointers toward that standard way (or what this type of task is usually called).  Thanks.

This is a fairly standard sort of job, but there's not really a
ready-to-go bit of code for it. It's just straightforward text
processing.

What I'd do is a stateful parser. Something like this:

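block = None
with open("bugs.txt", encoding="utf-8") as f:
    for line in f:
        if line.startswith("- "):
            # Start of a new bug: emit the previous block, if any.
            if block: save_to_database(block)
            block = line
        elif block is not None:
            # Continuation line; it already ends with its own newline.
            # Lines before the first "- " header are skipped.
            block += line
if block: save_to_database(block) # don't forget to grab that last one!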

This is extremely simple, and you might want to use a regex to look
for the upper-case word and date as well (as written, the check would
falsely treat any description line that happens to begin with a hyphen
and a space as the start of a new block). But the basic idea is:
initialize an accumulator to a null state; whenever you find the
beginning of something, emit the previous block and reset the
accumulator; otherwise, add the line to the accumulator. At the end,
emit any current block.
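
For what it's worth, here is a rough sketch of the regex variant as a
generator. The header format is purely an assumption (a hyphen and a
space, an upper-case word, then a YYYY-MM-DD date), since the real file
layout isn't shown in this thread, and HEADER, iter_blocks and the
sample lines are made-up names and data for illustration. Adjust the
pattern to whatever your file actually contains.

```python
import re

# ASSUMED header format: "- ", an upper-case word, then an ISO-style date.
# This is a guess for illustration; change it to match the real file.
HEADER = re.compile(r"^- [A-Z]+ \d{4}-\d{2}-\d{2}\b")

def iter_blocks(lines):
    """Yield each bug block as one string, header line first."""
    block = None  # accumulator; None until the first header is seen
    for line in lines:
        if HEADER.match(line):
            if block is not None:
                yield block   # emit the previous block
            block = line      # reset the accumulator
        elif block is not None:
            block += line     # continuation line (keeps its own "\n")
    if block is not None:
        yield block           # emit the final block

# Made-up sample data; an open file object works the same way.
sample = [
    "- BUG 2014-07-27 Crash on startup\n",
    "Steps to reproduce:\n",
    "- not a header, just a list item\n",
    "- FIXED 2014-07-28 Typo in menu\n",
    "One-line description.\n",
]
blocks = list(iter_blocks(sample))
```

Making it a generator keeps the parsing separate from whatever you do
with each block, so the caller can loop over iter_blocks(f) and save
each one to the database.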

ChrisA


