
Re: Picking apart a text line

From Russell Owen <rowen@uw.edu>
Subject Re: Picking apart a text line
Date 2015-03-02 12:25 -0800
References <mcopnu$s6t$1@ger.gmane.org>
Newsgroups comp.lang.python
Message-ID <mailman.67.1425327960.13471.python-list@python.org>


On 2/26/15 7:53 PM, memilanuk wrote:
> So... okay.  I've got a bunch of PDFs of tournament reports that I want
> to sift thru for information.  Ended up using 'pdftotext -layout
> file.pdf file.txt' to extract the text from the PDF.  Still have a few
> little glitches to iron out there, but I'm getting decent enough results
> for the moment to move on.
>
...
> So back to the lines of text I have stored as strings in a list.  I
> think I want to convert that to a list of lists, i.e. split each line
> up, store that info in another list and ditch the whitespace.  Or would
> I be better off using dicts?  Originally I was thinking of how to
> process each line and split them up based on what information was
> where - some sort of nested for/if mess.  Now I'm starting to think that
> the lines of text are pretty uniform in structure i.e. the same field is
> always in the same location, and that list slicing might be the way to
> go, if a bit tedious to set up initially...?
>
> Any thoughts or suggestions from people who've gone down this particular
> path would be greatly appreciated.  I think I have a general
> idea/direction, but I'm open to other ideas if the path I'm on is just
> blatantly wrong.

It sounds to me as if the best way to handle all this is to keep the
information in a database, preferably one available over the network
and centrally managed, so that whoever enters the information in the
first place enters it there. But I admit that setting such a thing up
requires some overhead.

Simpler alternatives include using SQLite, a simple file-based database
system, or numpy structured arrays (arrays with named fields). Python's
standard library includes the sqlite3 module, and numpy is easy to
install.
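
As a rough sketch of the SQLite route, something like the following might
work. It assumes the fixed-column layout you described, so each line can be
sliced by position. The field names, column offsets, table name and sample
lines are all made up for illustration; the real ones depend on what
pdftotext produces for your reports.

```python
import sqlite3

# Hypothetical fixed-width layout: (field name, start column, end column).
# The real offsets have to be read off the pdftotext output.
FIELDS = [
    ("place", 0, 4),
    ("name", 4, 24),
    ("score", 24, 30),
]


def parse_line(line):
    """Slice one fixed-width line into (place, name, score)."""
    place, name, score = (line[start:end].strip() for _, start, end in FIELDS)
    return int(place), name, int(score)


def load_results(lines, db_path=":memory:"):
    """Store the parsed lines in a SQLite table and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(place INTEGER, name TEXT, score INTEGER)"
    )
    # Skip blank lines; every other line is assumed to follow the layout.
    rows = [parse_line(line) for line in lines if line.strip()]
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


# Made-up sample lines, built with str.format so the columns line up
# with the offsets in FIELDS.
sample = [
    "{:<4}{:<20}{:<6}".format(1, "Jane Doe", 598),
    "{:<4}{:<20}{:<6}".format(2, "John Smith", 595),
]

conn = load_results(sample)
for name, score in conn.execute(
    "SELECT name, score FROM results ORDER BY score DESC"
):
    print(name, score)
```

Passing a file name instead of ":memory:" keeps the data on disk between
runs, and once it is in a table the sorting and filtering can be done in SQL
rather than with nested loops.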

-- Russell
