| References | <de64e370-d83a-4919-863b-e744ad20b62a@googlegroups.com> |
|---|---|
| Date | 2014-07-28 04:17 +1000 |
| Subject | Re: Parse bug text file |
| From | Chris Angelico <rosuav@gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.12364.1406485035.18130.python-list@python.org> |
On Mon, Jul 28, 2014 at 4:08 AM, CM <cmpython@gmail.com> wrote:
> I can go through it with opening the text file and reading in the lines, and if the first character is a "-" then count that as the start of a bug block, but I am not sure how to find the last line of a bug block...it would be the line before the first line of the next bug block, but not sure the best way to go about it.
>
> There must be a rather standard way to do something like this in Python, and I'm requesting pointers toward that standard way (or what this type of task is usually called). Thanks.
This is a fairly standard sort of job, but there's not really a
ready-to-go bit of code for it. It's just straightforward text
processing.
What I'd do is a stateful parser. Something like this:
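    # save_to_database() is a placeholder for whatever you do with each block.
    block = None
    with open("bugs.txt", encoding="utf-8") as f:
        for line in f:
            if line.startswith("- "):
                # A new bug starts here: emit the previous block, if any.
                if block is not None:
                    save_to_database(block)
                block = line
            elif block is not None:
                # Continuation line; it already ends with its own "\n".
                block += line
            # Lines before the first "- " header are skipped.
    if block is not None:
        save_to_database(block)  # don't forget to grab that last one!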
This is extremely simple, and you might want to use a regex to look
for the upper-case word and date as well (this would falsely notice
any description line that happens to begin with a hyphen and a space).
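As a rough sketch of that regex check: the exact header layout isn't shown in this thread, so the pattern below assumes a header of the form "- WORD YYYY-MM-DD ...". Both that layout and the names HEADER_RE and is_bug_header are made up for illustration; adjust the pattern to whatever the real file uses.

```python
import re

# Assumed header shape (hypothetical, not taken from the actual bug file):
#   "- CRASH 2014-07-27 some title"
# i.e. hyphen, space, an upper-case word, space, an ISO-style date.
HEADER_RE = re.compile(r"^- [A-Z]+ \d{4}-\d{2}-\d{2}\b")

def is_bug_header(line):
    """Return True only for lines that look like the start of a bug block."""
    return HEADER_RE.match(line) is not None
```

You would then call is_bug_header(line) in place of line.startswith("- "), so that a description line which merely begins with a hyphen and a space is treated as part of the current block.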
But the basic idea is: initialize an accumulator to a null state;
whenever you find the beginning of something, emit the previous and
reset the accumulator; otherwise, add to the accumulator. At the end,
emit any current block.
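The same idea can be packaged as a generator, which keeps the accumulator logic separate from whatever you do with each block. This is only one possible shape; iter_blocks and its is_start parameter are names invented here, and the sample lines are made up.

```python
def iter_blocks(lines, is_start):
    """Yield each block as one string; a block begins where is_start(line) is true."""
    block = None  # accumulator in its null state
    for line in lines:
        if is_start(line):
            if block is not None:
                yield block      # emit the previous block
            block = line         # reset the accumulator
        elif block is not None:
            block += line        # add to the accumulator
        # lines seen before the first start line are ignored
    if block is not None:
        yield block              # emit the final block

# Made-up sample input; a real run would pass the open file object instead.
sample = [
    "- first bug\n",
    "  details\n",
    "- second bug\n",
]
blocks = list(iter_blocks(sample, lambda s: s.startswith("- ")))
```

Because a file object is itself an iterable of lines, passing the open file as the first argument works the same way as passing the list.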
ChrisA