| From | Russell Owen <rowen@uw.edu> |
|---|---|
| Subject | Re: Picking apart a text line |
| Date | 2015-03-02 12:25 -0800 |
| References | <mcopnu$s6t$1@ger.gmane.org> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.67.1425327960.13471.python-list@python.org> |

On 2/26/15 7:53 PM, memilanuk wrote:
> So... okay. I've got a bunch of PDFs of tournament reports that I want
> to sift thru for information. Ended up using 'pdftotext -layout
> file.pdf file.txt' to extract the text from the PDF. Still have a few
> little glitches to iron out there, but I'm getting decent enough results
> for the moment to move on.
> ...
> So back to the lines of text I have stored as strings in a list. I
> think I want to convert that to a list of lists, i.e. split each line
> up, store that info in another list and ditch the whitespace. Or would
> I be better off using dicts? Originally I was thinking of how to
> process each line and split it them up based on what information was
> where - some sort of nested for/if mess. Now I'm starting to think that
> the lines of text are pretty uniform in structure i.e. the same field is
> always in the same location, and that list slicing might be the way to
> go, if a bit tedious to set up initially...?
>
> Any thoughts or suggestions from people who've gone down this particular
> path would be greatly appreciated. I think I have a general
> idea/direction, but I'm open to other ideas if the path I'm on is just
> blatantly wrong.

It sounds to me as if the best way to handle all this is to keep the information in a database, preferably one that is available over the network and centrally managed, so that whoever enters the information in the first place enters it there. But I admit that setting such a thing up requires some overhead.

Simpler alternatives include SQLite, a simple file-based database system, or numpy structured arrays (arrays with named fields). Python includes a standard library module for SQLite (sqlite3), and numpy is easy to install.

-- Russell
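
As a rough illustration of the SQLite suggestion combined with the slicing idea from the quoted question, here is a minimal sketch using the standard library's sqlite3 module. The column layout in FIELDS, the table name "results", the field names, and the sample lines are all hypothetical; the real positions depend on what pdftotext produces for the actual reports.

```python
import sqlite3

# Hypothetical fixed-width layout: (field name, start column, end column).
# These positions are assumptions, not taken from the real reports.
FIELDS = [("place", 0, 4), ("name", 4, 24), ("score", 24, 30)]


def parse_line(line):
    """Slice one fixed-width line into a dict of whitespace-stripped strings."""
    return {name: line[start:end].strip() for name, start, end in FIELDS}


def load_results(lines, db_path=":memory:"):
    """Parse the lines and store them in an SQLite table named 'results'."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(place INTEGER, name TEXT, score INTEGER)"
    )
    # Skip blank lines, then convert the numeric fields explicitly.
    parsed = [parse_line(line) for line in lines if line.strip()]
    rows = [(int(p["place"]), p["name"], int(p["score"])) for p in parsed]
    conn.executemany(
        "INSERT INTO results (place, name, score) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


# Made-up sample lines built to match the assumed layout above.
sample = [
    "{:<4}{:<20}{:>6}".format(1, "Jane Doe", 198),
    "{:<4}{:<20}{:>6}".format(2, "John Smith", 195),
]

conn = load_results(sample)
for name, score in conn.execute(
    "SELECT name, score FROM results ORDER BY score DESC"
):
    print(name, score)
```

Passing a file name instead of ":memory:" would keep the data on disk between runs, which is probably what you would want once the parsing is sorted out.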