| Date | 2015-05-14 08:58 -0700 |
|---|---|
| From | 20/20 Lab <lab@pacbell.net> |
| Organization | 20/20 Optometric |
| Subject | Re: Looking for direction |
| References | <5553DD2E.2080600@pacbell.net> <5553E72B.5000309@davea.name> <5553F695.9040903@davea.name> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.27.1431674916.17265.python-list@python.org> |
On 05/13/2015 06:12 PM, Dave Angel wrote:
> On 05/13/2015 08:45 PM, 20/20 Lab wrote:
>
> You accidentally replied to me, rather than the mailing list. Please
> use reply-list, or if your mailer can't handle that, do a Reply-All,
> and remove the parts you don't want.
>
> >
> > On 05/13/2015 05:07 PM, Dave Angel wrote:
> >> On 05/13/2015 07:24 PM, 20/20 Lab wrote:
> >>> I'm a beginner to python. Reading here and there. Written a couple of
> >>> short and simple programs to make life easier around the office.
> >>>
> >> Welcome to Python, and to this mailing list.
> >>
> >>> That being said, I'm not even sure what I need to ask for. I've never
> >>> worked with external data before.
> >>>
> >>> I have a LARGE csv file that I need to process. 110+ columns, 72k
> >>> rows.
> >>
> >> That's not very large at all.
> >>
> > In the grand scheme, I guess not. However I'm currently doing this
> > whole process using office. So it can be a bit daunting.
>
> I'm not familiar with the "office" operating system.
>
> >>> I managed to write enough to reduce it to a few hundred rows, and
> >>> the five columns I'm interested in.
> >>
> >>>
> >>> Now is where I have my problem:
> >>>
> >>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
> >>>            [72976, "YYY", "Item", "Qty", "Noise"],
> >>>            [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
> >>>
> >>
> >> It'd probably be useful to identify names for your columns, even if
> >> it's just in a comment. Guessing from the paragraph below, I figure
> >> the first two columns are "account" & "staff"
> >
> > The columns that I pull are Account, Staff, Item Sold, Quantity sold,
> > and notes about the sale (notes aren't particularly needed, but the
> > higher ups would like them in the report)
> >>
> >>> Basically, I need to check for rows with duplicate accounts (row[0]) and
> >>> staff (row[1]), and if so, remove that row, and add its Qty to the
> >>> original row.
> >>
> >> And which column is that supposed to be? Shouldn't there be a number
> >> there, rather than a string?
> >>
> >>> I really don't have a clue how to go about this. The
> >>> number of rows changes based on which run it is, so I couldn't even get
> >>> away with using hundreds of compare loops.
> >>>
> >>> If someone could point me to some documentation on the functions I would
> >>> need, or a tutorial, it would be a great help.
> >>>
> >>
> >> Is the order significant? Do you have to preserve the order that the
> >> accounts appear? I'll assume not.
> >>
> >> Have you studied dictionaries? Seems to me the way to handle the
> >> problem is to read in a row, create a dictionary with key of (account,
> >> staff), and data of the rest of the line.
> >>
> >> Each time you read a row, you check if the key is already in the
> >> dictionary. If not, add it. If it's already there, merge the data as
> >> you say.
> >>
> >> Then when you're done, turn the dict back into a list of lists.
> >>
> > The order is irrelevant. No, I've not really studied dictionaries, but
> > a few people have mentioned it. I'll have to read up on them and, more
> > importantly, their applications. Seems that they are more versatile
> > than I thought.
> >
> > Thank you.
>
> You have to realize that a tuple can be used as a key, in your case a
> tuple of Account and Staff.
>
> You'll have to decide how you're going to merge the ItemSold,
> QuantitySold, and notes.
>

Tells you how often I actually talk in mailing lists. My apologies, and thank you again.
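
For reference, the dictionary approach suggested in the quoted text might look roughly like the sketch below. It is only an illustration under stated assumptions: the quantity column is assumed to hold integers (the thread shows only the placeholder string "Qty"), the function name `merge_rows` is made up for this example, and the merge policy (keep the first row's item and notes, sum the quantities) is one possible choice, since the thread leaves that decision open.

```python
def merge_rows(rows):
    """Combine rows that share (account, staff), summing the quantity column.

    Assumption: each row is [account, staff, item, qty, notes] and qty is an int.
    Assumption: the first row seen for a key keeps its item and notes.
    """
    merged = {}
    for account, staff, item, qty, notes in rows:
        key = (account, staff)  # a tuple can serve as a dictionary key
        if key in merged:
            merged[key][3] += qty  # add this row's quantity to the original row
        else:
            merged[key] = [account, staff, item, qty, notes]
    # Turn the dict back into a list of lists.
    return list(merged.values())


# Sample data modelled on the rows in the post, with made-up integer quantities.
my_list = [
    [123, "XXX", "Item", 2, "Noise"],
    [72976, "YYY", "Item", 1, "Noise"],
    [123, "XXX", "ItemTypo", 3, "Noise"],
]
result = merge_rows(my_list)
```

With this sample data, the two rows for account 123 and staff "XXX" collapse into a single row whose quantity is the sum of both. If the notes or item names from duplicate rows also need to be kept, the `if key in merged` branch is where that merging would go.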