| Header | Value |
|---|---|
| Date | Thu, 14 May 2015 09:57:48 -0700 |
| From | 20/20 Lab <lab@pacbell.net> |
| Organization | 20/20 Optometric |
| To | python-list@python.org |
| Subject | Re: Looking for direction |
| References | <mailman.465.1431559626.12865.python-list@python.org> <5553f8fe$0$13012$c3e8da3$5496439d@news.astraweb.com> |
| In-Reply-To | <5553f8fe$0$13012$c3e8da3$5496439d@news.astraweb.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.5.1431622814.17265.python-list@python.org> |
On 05/13/2015 06:23 PM, Steven D'Aprano wrote:
> On Thu, 14 May 2015 09:24 am, 20/20 Lab wrote:
>
>> I'm a beginner to Python, reading here and there. I've written a couple of
>> short and simple programs to make life easier around the office.
>>
>> That being said, I'm not even sure what I need to ask for. I've never
>> worked with external data before.
>>
>> I have a LARGE csv file that I need to process. 110+ columns, 72k
>> rows. I managed to write enough to reduce it to a few hundred rows, and
>> the five columns I'm interested in.
> That's not large. Large is millions of rows, or tens of millions if you have
> enough memory. What's large to you and me is usually small to the computer.
>
> You should use the csv module for handling the CSV file, if you aren't
> already doing so. Do you need a url to the docs?
>
I actually stumbled across the csv module only after I had written enough
code to build a list of lists myself. That is the main reason I approached
the list: there is nothing like spending hours (or days) coding something
that already exists and that I just don't know about.
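
A minimal sketch of reading such a file with the csv module and keeping
only a handful of columns. The column names (account, staff, item, qty,
noise) and the in-memory sample are assumptions made up for illustration;
they are not taken from the real file.

```python
import csv
import io

# Hypothetical sample standing in for the real export; the column names
# are assumptions for illustration only.
SAMPLE = """account,staff,item,qty,noise,extra
123,XXX,Widget,2,a,ignored
72976,YYY,Gadget,5,b,ignored
123,XXX,Widgit,3,c,ignored
"""

# The five columns to keep, in the order they should appear in each row.
WANTED = ["account", "staff", "item", "qty", "noise"]


def load_rows(handle, wanted=WANTED):
    """Return one list per CSV row, keeping only the wanted columns."""
    reader = csv.DictReader(handle)  # first line is used as the header
    rows = []
    for record in reader:
        row = [record[name] for name in wanted]
        # csv yields every field as a string, so convert numbers explicitly.
        row[0] = int(row[0])
        row[3] = int(row[3])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE))
print(rows)
```

With a real file, the handle would come from open(path, newline="")
instead of io.StringIO; the csv documentation recommends newline="" so
the module can handle line endings itself.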
>> Now is where I have my problem:
>>
>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>>            [72976, "YYY", "Item", "Qty", "Noise"],
>>            [123, "XXX", "ItemTypo", "Qty", "Noise"] ]
>>
>> Basically, I need to check for rows with duplicate accounts (row[0]) and
>> staff (row[1]), and if so, remove that row, and add its Qty to the
>> original row. I really don't have a clue how to go about this.
> Is the order of the rows important? If not, the problem is simpler.
>
>
> processed = {}  # hold the processed data in a dict
>
> for row in myList:
>     account, staff = row[0:2]
>     key = (account, staff)  # Put them in a tuple.
>     if key in processed:
>         # We've already seen this combination.
>         processed[key][3] += row[3]  # Add the quantities.
>     else:
>         # Never seen this combination before.
>         processed[key] = row
>
> newlist = list(processed.values())
>
>
> Does that help?
>
>
>
It does, immensely. I'll make this work. Thank you again for the link
from yesterday and apologies for hitting the wrong reply button. I'll
have to study more on the usage and implementations of dictionaries and
tuples.
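
A small runnable sketch of the dict-keyed-by-tuple approach quoted above,
under two assumptions that are not in the original post: the quantities
are already integers (the sample values 2, 5 and 3 are invented), and
each first-seen row is copied so the input list is left unmodified. The
function name merge_duplicates is hypothetical.

```python
def merge_duplicates(rows):
    """Combine rows that share (account, staff), summing the Qty column."""
    processed = {}  # maps (account, staff) -> the merged row
    for row in rows:
        key = (row[0], row[1])  # tuples are hashable, so they work as keys
        if key in processed:
            # Seen this combination before: add this row's quantity.
            processed[key][3] += row[3]
        else:
            # First occurrence: store a copy so the input is not mutated.
            processed[key] = list(row)
    return list(processed.values())


# Invented sample data shaped like the rows in the post.
my_list = [
    [123, "XXX", "Item", 2, "Noise"],
    [72976, "YYY", "Item", 5, "Noise"],
    [123, "XXX", "ItemTypo", 3, "Noise"],
]

merged = merge_duplicates(my_list)
print(merged)
```

On Python 3.7 and later a plain dict keeps insertion order, so the merged
rows come out in first-seen order; on older versions
collections.OrderedDict would be needed if the order matters.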
Looking for direction 20/20 Lab <lab@pacbell.net> - 2015-05-13 16:24 -0700
Re: Looking for direction Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-14 11:23 +1000
Re: Looking for direction 20/20 Lab <lab@pacbell.net> - 2015-05-14 09:57 -0700
Re: Looking for direction Tim Chase <python.list@tim.thechases.com> - 2015-05-14 12:17 -0500
Re: Looking for direction Ziqi Xiong <xiongziqi84@gmail.com> - 2015-05-15 03:31 +0000
Re: Looking for direction darnold <darnold992000@yahoo.com> - 2015-05-20 05:50 -0700
Re: Looking for direction 20/20 Lab <lab@pacbell.net> - 2015-05-20 14:18 -0700