
Re: Parse bug text file

From Terry Reedy <tjreedy@udel.edu>
Subject Re: Parse bug text file
Date Sun, 27 Jul 2014 15:15:56 -0400
Newsgroups comp.lang.python
Message-ID <mailman.12368.1406488571.18130.python-list@python.org> (permalink)



On 7/27/2014 2:08 PM, CM wrote:
> I have a big text file of bugs that I want to use Python to parse
> such that the bugs can be neatly filed into a database. I can bumble
> toward a solution with looping but feel this is a classic example of
> reinventing the wheel, and yet I'm finding it hard to Google for.
>
> Basically the file is structured like this (silly examples, of
> course); let's call each of these three a "bug block":
>
>
> - BUG 2.13.14  When you wear a purple hat, the application locks up.
> If you sing the theme to "The Love Boat", the application becomes
> available again.
>
> - ISSUE 2.13.14  During thunderstorms, the application runs
> backwards.
>
> - BUG/OPTIMIZE 11.12.12:  Sometimes the application is really slow.
> That's too bad.
>
>
> Generally, every bug block starts with a "-" as the first character,

I will assume 'always'

> then some words in all caps, a date in that format, and then the
> descriptive text. There is always a blank line in between bug blocks,
> but sometimes there may be a blank line within the bug description as
> well.
>
> The goal is to grab each bug block, clean up that text (there are CRs
> in it, etc., but I can do that), and dump it into a database record
> (the db stuff I can do).  Grabbing the date along the way would be
> wonderful as well.
>
> I can go through it with opening the text file and reading in the
> lines, and if the first character is a "-" then count that as the
> start of a bug block, but I am not sure how to find the last line of
> a bug block...it would be the line before the first line of the next
> bug block, but not sure the best way to go about it.
>
> There must be a rather standard way to do something like this in
> Python, and I'm requesting pointers toward that standard way (or what
> this type of task is usually called).  Thanks.

Split the processing into two phases: generating individual bugs and 
processing each bug. Here is a prototype.

with open(bugfile) as f:
    for bug in bugs(f):
        process(bug)

Here are two examples of the first phase. Use the second for a big file.
(If individual bugs are more than a few lines, I would collect the lines
in a list inside the generator and use ''.join(<list>).)

bugtext = '''\
- BUG 2.13.14  When you wear a purple hat, the application locks up.
If you sing the theme to "The Love Boat",
the application becomes available again.

- ISSUE 2.13.14  During thunderstorms, the application runs backwards.

- BUG/OPTIMIZE 11.12.12:  Sometimes the application is really slow.
That's too bad
'''

buglist1 = [bug.strip().replace('\n', '')
            for bug in bugtext[1:].split('\n-')]
for bug in buglist1:
    print(bug)

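def bugs(lines):
    """Yield each bug block as one stripped string; blocks start with '-'."""
    lines = iter(lines)
    bug = next(lines)[1:]  # first line starts the first bug; drop its '-'
    for line in lines:
        if line[:1] != '-':
            bug += line  # continuation (or blank) line of the current bug
        else:
            yield bug.strip()  # a new '-' line ends the previous bug
            bug = line[1:]
    yield bug.strip()  # the last bug has no following '-' line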


buglist2 = list(bugs(bugtext.splitlines()))
for bug in buglist2:
    print(bug)
print(buglist1 == buglist2)

>>>
BUG 2.13.14  When you wear a purple hat, the application locks up.If you sing the theme to "The Love Boat",the application becomes available again.
ISSUE 2.13.14  During thunderstorms, the application runs backwards.
BUG/OPTIMIZE 11.12.12:  Sometimes the application is really slow.That's too bad
BUG 2.13.14  When you wear a purple hat, the application locks up.If you sing the theme to "The Love Boat",the application becomes available again.
ISSUE 2.13.14  During thunderstorms, the application runs backwards.
BUG/OPTIMIZE 11.12.12:  Sometimes the application is really slow.That's too bad
True
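
Note that both versions above run sentences together ("locks up.If you"), because the line breaks are removed without anything put in their place. A possible variation, sketched here under the same assumption that every bug starts with '-' in the first column, collects the pieces of each bug in a list and joins them with a single space. The name bugs_joined is made up for this sketch, and joining with ' ' rather than '' is a choice of the sketch, not part of the code above.

```python
def bugs_joined(lines):
    """Yield one string per bug block, with its lines joined by spaces.

    Works on any iterable of lines (an open file or str.splitlines()).
    Assumes every bug block starts with '-' as the first character.
    """
    parts = None  # pieces of the current bug; None until the first '-' line
    for line in lines:
        if line.startswith('-'):
            if parts is not None:
                yield ' '.join(parts)  # previous bug is complete
            parts = []
            line = line[1:]  # drop the leading '-'
        # Skip blank lines; ignore anything before the first bug block.
        if parts is not None and line.strip():
            parts.append(line.strip())
    if parts is not None:
        yield ' '.join(parts)  # last bug has no following '-' line
```

Because each line is stripped before joining, this gives the same result whether the lines still carry their trailing newlines (a file) or not (splitlines), and blank lines inside a description simply disappear.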

Now write process(bug)
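
As a starting point for that second phase, here is a minimal sketch of one way process(bug) could pull out the label and the date with a regular expression. The pattern is an assumption based only on the three sample bugs: a label in capitals (optionally slash-separated, as in BUG/OPTIMIZE), a dotted date, an optional colon, then the description. The date is kept as a string because the samples do not say which field is the month. Returning a tuple (rather than writing to the database) is also just for illustration.

```python
import re

# Assumed layout: LABEL[/LABEL...] d.d.d[:] description
BUG_RE = re.compile(
    r'(?P<kind>[A-Z]+(?:/[A-Z]+)*)\s+'        # e.g. BUG, ISSUE, BUG/OPTIMIZE
    r'(?P<date>\d{1,2}\.\d{1,2}\.\d{2,4}):?'  # e.g. 2.13.14, optional colon
    r'\s*(?P<text>.*)',                       # the rest is the description
    re.DOTALL,
)

def process(bug):
    """Split one bug string into (kind, date, text); None if it does not match."""
    m = BUG_RE.match(bug)
    if m is None:
        return None  # caller decides what to do with unparseable blocks
    return m.group('kind'), m.group('date'), m.group('text')
```

Blocks that do not fit the pattern come back as None, so they can be logged and checked by hand instead of being silently mangled.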

-- 
Terry Jan Reedy
