| Path | csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.xs4all.nl!newsfeed4.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <python-python-list@m.gmane.org> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| X-Spam-Status | OK 0.000 |
| X-Injected-Via-Gmane | http://gmane.org/ |
| To | python-list@python.org |
| From | Terry Reedy <tjreedy@udel.edu> |
| Subject | Re: Parse bug text file |
| Date | Sun, 27 Jul 2014 15:15:56 -0400 |
| References | <de64e370-d83a-4919-863b-e744ad20b62a@googlegroups.com> |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset=UTF-8; format=flowed |
| Content-Transfer-Encoding | 7bit |
| X-Gmane-NNTP-Posting-Host | pool-71-175-90-87.phlapa.fios.verizon.net |
| User-Agent | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 |
| In-Reply-To | <de64e370-d83a-4919-863b-e744ad20b62a@googlegroups.com> |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.15 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.12368.1406488571.18130.python-list@python.org> |
| Lines | 105 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1406488571 news.xs4all.nl 2859 [2001:888:2000:d::a6]:58827 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:75286 |
On 7/27/2014 2:08 PM, CM wrote:
> I have a big text file of bugs that I want to use Python to parse
> such that the bugs can be neatly filed into a database. I can bumble
> toward a solution with looping but feel this is a classic example of
> reinventing the wheel, and yet I'm finding it hard to Google for.
>
> Basically the file is structured like this (silly examples, of
> course); let's call each of these three a "bug block":
>
>
> - BUG 2.13.14 When you wear a purple hat, the application locks up.
> If you sing the theme to "The Love Boat", the application becomes
> available again.
>
> - ISSUE 2.13.14 During thunderstorms, the application runs
> backwards.
>
> - BUG/OPTIMIZE 11.12.12: Sometimes the application is really slow.
> That's too bad.
>
>
> Generally, every bug block starts with a "-" as the first character,
I will assume 'always'.
> then some words in all caps, a date in that format, and then the
> descriptive text. There is always a blank line in between bug blocks,
> but sometimes there may be a blank line within the bug description as
> well.
>
> The goal is to grab each bug block, clean up that text (there are CRs
> in it, etc., but I can do that), and dump it into a database record
> (the db stuff I can do). Grabbing the date along the way would be
> wonderful as well.
>
> I can go through it by opening the text file and reading in the
> lines, and if the first character is a "-" then count that as the
> start of a bug block, but I am not sure how to find the last line of
> a bug block...it would be the line before the first line of the next
> bug block, but I'm not sure of the best way to go about it.
>
> There must be a rather standard way to do something like this in
> Python, and I'm requesting pointers toward that standard way (or what
> this type of task is usually called). Thanks.
Split the processing into two phases: generating individual bugs and
processing each bug. Here is a prototype.
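```python
with open(bugfile) as f:    # bugfile: path to the text file of bugs
    for bug in bugs(f):     # phase 1: split the file into bug blocks
        process(bug)        # phase 2: handle one bug (parse, store in db)
```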
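Here are two implementations of the first phase. Use the second for a
big file: the first needs the whole text in memory as one string, while
the generator handles one line at a time. (If individual bugs are more
than a few lines long, I would collect the lines in a list inside the
generator and join them once at the end, rather than concatenating
strings repeatedly.)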
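A sketch of the list-collecting variant from the parenthetical, assuming (as above) that a "-" in the first column always starts a bug. The names collect_bugs and _clean are made up for illustration. Besides joining each bug only once, it skips any lines before the first bug and treats blank lines and stray CRs inside a bug as ordinary whitespace; io.StringIO stands in for an open file.

```python
import io


def _clean(parts):
    """Join the collected lines and collapse every whitespace run
    (newlines, CRs, blank lines) into a single space."""
    return ' '.join(' '.join(parts).split())


def collect_bugs(lines):
    """Yield one cleaned bug block at a time from an iterable of lines.

    Assumes a '-' in the first column always starts a new bug; lines
    before the first bug are ignored.
    """
    parts = []                      # lines of the bug being collected
    for line in lines:
        if line.startswith('-'):    # start of a new bug block
            if parts:
                yield _clean(parts)
            parts = [line[1:]]      # drop the leading '-'
        elif parts:                 # continuation line (may be blank)
            parts.append(line)
    if parts:                       # last bug in the input
        yield _clean(parts)


# io.StringIO stands in for open(bugfile); with its default newline='\n'
# it returns the '\r' characters untranslated.
sample = io.StringIO(
    '- BUG 2.13.14 When you wear a purple hat, the application locks up.\r\n'
    '\r\n'
    'If you sing the theme to "The Love Boat", the application becomes\r\n'
    'available again.\r\n'
    '\r\n'
    '- ISSUE 2.13.14 During thunderstorms, the application runs backwards.\r\n'
)
for bug in collect_bugs(sample):
    print(bug)
```

Because it holds only one bug in memory at a time, it can be handed an open file directly, as in the prototype: `for bug in collect_bugs(f)`.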
```python
bugtext = '''\
- BUG 2.13.14 When you wear a purple hat, the application locks up.
If you sing the theme to "The Love Boat",
the application becomes available again.
- ISSUE 2.13.14 During thunderstorms, the application runs backwards.
- BUG/OPTIMIZE 11.12.12: Sometimes the application is really slow.
That's too bad
'''

# First version: drop the first '-', split at every newline followed
# by '-', then collapse each bug's internal whitespace to single spaces.
buglist1 = [' '.join(bug.split()) for bug in bugtext[1:].split('\n-')]
for bug in buglist1:
    print(bug)

# Second version: a generator that reads one line at a time.
def bugs(lines):
    lines = iter(lines)
    first = next(lines, None)
    if first is None:          # empty input: no bugs
        return
    bug = first[1:]            # drop the leading '-'
    for line in lines:
        if line[:1] != '-':    # continuation line: keep a space between lines
            bug += ' ' + line
        else:                  # next bug starts: emit the finished one
            yield ' '.join(bug.split())
            bug = line[1:]
    yield ' '.join(bug.split())

buglist2 = list(bugs(bugtext.splitlines()))
for bug in buglist2:
    print(bug)
print(buglist1 == buglist2)
```

Output:

```
BUG 2.13.14 When you wear a purple hat, the application locks up. If you sing the theme to "The Love Boat", the application becomes available again.
ISSUE 2.13.14 During thunderstorms, the application runs backwards.
BUG/OPTIMIZE 11.12.12: Sometimes the application is really slow. That's too bad
BUG 2.13.14 When you wear a purple hat, the application locks up. If you sing the theme to "The Love Boat", the application becomes available again.
ISSUE 2.13.14 During thunderstorms, the application runs backwards.
BUG/OPTIMIZE 11.12.12: Sometimes the application is really slow. That's too bad
True
```
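As for what this kind of task is usually called: grouping consecutive lines into records is one common description, and the standard library's itertools.groupby can do the grouping if the key function returns a block number that goes up whenever a line starts with "-". This is a sketch of an alternative to the bugs() generator, not a replacement for it; grouped_bugs is a hypothetical name, and the same "-" in column one assumption applies.

```python
import itertools


def grouped_bugs(lines):
    """Yield cleaned bug blocks using itertools.groupby.

    The key function returns a block number that increases every time a
    line starts with '-', so all lines of one bug share the same key.
    """
    count = 0

    def block_number(line):
        nonlocal count
        if line.startswith('-'):
            count += 1
        return count

    for number, group in itertools.groupby(lines, key=block_number):
        if number == 0:             # lines before the first '-'
            continue
        text = ' '.join(group)      # the group's first line starts with '-'
        yield ' '.join(text[1:].split())


bugtext = '''\
- BUG 2.13.14 When you wear a purple hat, the application locks up.
If you sing the theme to "The Love Boat",
the application becomes available again.
- ISSUE 2.13.14 During thunderstorms, the application runs backwards.
- BUG/OPTIMIZE 11.12.12: Sometimes the application is really slow.
That's too bad
'''
for bug in grouped_bugs(bugtext.splitlines()):
    print(bug)
```

The counter lives inside grouped_bugs, so each call starts numbering afresh; the trade-off against the explicit generator is brevity versus a key function with hidden state.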
Now write process(bug).
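A minimal sketch of the second phase that also grabs the date along the way, as CM asked. The header pattern is an assumption drawn from the three examples (all-caps words joined by "/", a dotted date, an optional colon). process() here takes a hypothetical extra conn argument for an sqlite3 connection; HEADER, parse_bug and the bugs table are likewise made-up names. The date is kept as text because the thread does not say which field is the month.

```python
import re
import sqlite3

# Assumed header layout: one or more all-caps words joined by '/',
# a dotted date such as 2.13.14, an optional colon, then the description.
HEADER = re.compile(
    r'([A-Z]+(?:/[A-Z]+)*)\s+(\d{1,2}\.\d{1,2}\.\d{2,4}):?\s*(.*)',
    re.DOTALL,
)


def parse_bug(bug):
    """Split one cleaned bug block into (kind, date, text).

    If the header does not match, kind and date are None and the whole
    block is returned as the text.
    """
    match = HEADER.match(bug)
    if match is None:
        return None, None, bug
    return match.groups()


def process(bug, conn):
    """Hypothetical second phase: parse one bug and store it as a row."""
    kind, date, text = parse_bug(bug)
    conn.execute(
        'INSERT INTO bugs (kind, date, text) VALUES (?, ?, ?)',
        (kind, date, text),
    )


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE bugs (kind TEXT, date TEXT, text TEXT)')
for bug in [
    'BUG 2.13.14 When you wear a purple hat, the application locks up.',
    'ISSUE 2.13.14 During thunderstorms, the application runs backwards.',
    'BUG/OPTIMIZE 11.12.12: Sometimes the application is really slow.',
]:
    process(bug, conn)
conn.commit()
for row in conn.execute('SELECT kind, date, text FROM bugs ORDER BY rowid'):
    print(row)
```

If the dates do turn out to be month.day.year, `datetime.datetime.strptime(date, '%m.%d.%y')` should convert them. A block whose header does not match is stored with kind and date set to NULL rather than rejected.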
--
Terry Jan Reedy
Parse bug text file CM <cmpython@gmail.com> - 2014-07-27 11:08 -0700
Re: Parse bug text file Chris Angelico <rosuav@gmail.com> - 2014-07-28 04:17 +1000
Re: Parse bug text file Terry Reedy <tjreedy@udel.edu> - 2014-07-27 15:15 -0400
Re: Parse bug text file wxjmfauth@gmail.com - 2014-07-27 13:55 -0700
Re: Parse bug text file CM <cmpython@gmail.com> - 2014-07-28 11:37 -0700
Re: Parse bug text file wxjmfauth@gmail.com - 2014-07-29 00:48 -0700