

Groups > comp.lang.python > #68578

Re: Dictionaries

To python-list@python.org
From Peter Otten <__peter__@web.de>
Subject Re: Dictionaries
Date Thu, 20 Mar 2014 15:08:31 +0100
Organization None
References <CANXBEFogXsze6WByg_iCit-oLEQK-RXsgJCDM=ocZ7JPNgdb8g@mail.gmail.com> <cc9658ed24ed4af1b65b5098a00f9aac@home.minuskel.de>
Newsgroups comp.lang.python
Message-ID <mailman.8301.1395324531.18130.python-list@python.org> (permalink)



ishish wrote:

> This might sound weird, but is there a limit to how many dictionaries one
> can create/use in a single script?

No.
 
> My reason for asking is I split a 2-column-csv (phone#, ref#) file into 
> a dict and am trying to put duplicated phone numbers with different ref 
> numbers into new dictionaries. The script deducts the duplicated 46 
> numbers but it only creates batch1.csv. Since I obviously can't see the 
> wood for the trees here, can someone pls punch me into the right 
> direction....
> ...(No has_key is fine, its python 2.7)
> 
> f = open("file.csv", 'r')

Consider a csv with the lines

Number...
123,first
123,second
456,third
 
> myDict = {}
> Batch1 = {}
> Batch2 = {}
> Batch3 = {}
> 
> for line in f:
>         if line.startswith('Number' ):
>                 print "First line ignored..."
>         else:
>                 k, v = line.split(',')
>                 myDict[k] = v

The first time around, the assignment is

myDict["123"] = "first\n"

The second time, it is

myDict["123"] = "second\n"

That is, each assignment overwrites the previous value, so the dict keeps 
only the value from the last occurrence of each key.
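
To see the overwrite in isolation, here is a minimal sketch. The rows list 
and the name my_dict are stand-ins introduced for illustration; they replace 
the file so the snippet runs without any csv on disk.

```python
# Hypothetical in-memory rows standing in for the data lines of the csv.
rows = ["123,first\n", "123,second\n", "456,third\n"]

my_dict = {}
for line in rows:
    k, v = line.split(",")
    my_dict[k] = v  # a repeated key replaces the value stored earlier

# Only two keys survive; "123" now maps to the value seen last.
print(my_dict)
```

Three rows go in, but only two entries come out, which is why the duplicates 
can no longer be found afterwards.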

A good approach to solve the problem of keeping an arbitrary number of 
values per key is to make the dict value a list:

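myDict = {}
with open("data.csv") as f:
    next(f)  # skip the header line
    for line in f:
        k, v = line.split(",")  # v keeps its trailing "\n"
        # create an empty list the first time k is seen, then append to it
        myDict.setdefault(k, []).append(v)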

For the sample file above, this produces the following myDict:
{
   "123": ["first\n", "second\n"],
   "456": ["third\n"]
}
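
As an aside, the same grouping can be written with collections.defaultdict 
from the standard library instead of setdefault(). This is only a sketch of 
an alternative; the lines iterator and the name grouped are made up for 
illustration and stand in for the open file.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the lines of the csv file.
lines = iter(["Number...\n", "123,first\n", "123,second\n", "456,third\n"])

grouped = defaultdict(list)  # a missing key starts out as an empty list
next(lines)  # skip the header line
for line in lines:
    k, v = line.split(",")
    grouped[k].append(v)

print(dict(grouped))
```

Both variants build the same mapping; which one reads better is mostly a 
matter of taste.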

The number of batches you need is the largest number of values stored 
under any single key:

num_batches = max(len(v) for v in myDict.values())
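
A small worked example of that expression, using the sample data as a literal 
dict (my_dict is a stand-in name). Note that max() raises ValueError for an 
empty sequence, so the sketch guards against an empty dict; that guard is an 
addition for illustration.

```python
# Sample data as it looks after grouping (hypothetical literal).
my_dict = {"123": ["first\n", "second\n"], "456": ["third\n"]}

# The longest list has two entries, so two batch files are needed.
num_batches = max(len(v) for v in my_dict.values()) if my_dict else 0
print(num_batches)  # 2

# With no data at all, the guard yields zero batches instead of an error.
empty = {}
print(max(len(v) for v in empty.values()) if empty else 0)  # 0
```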

Now write the files:

for index in range(num_batches):
    with open("batch%s.csv" % (index+1), "w") as f:
        for key, values in myDict.items():
            if len(values) > index:  # this key has an entry for this batch
                # values[index] still ends in "\n", so no newline is added
                f.write("%s,%s" % (key, values[index]))
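
Putting the writing step into a function makes it easy to try out. The 
following is a sketch under a few assumptions: write_batches() is a made-up 
helper name, the files go into a temporary directory rather than the current 
one, and the keys are sorted so the output order is predictable.

```python
import os
import tempfile


def write_batches(my_dict, directory):
    """Write batch1.csv, batch2.csv, ... into directory; return the paths."""
    num_batches = max(len(v) for v in my_dict.values()) if my_dict else 0
    paths = []
    for index in range(num_batches):
        path = os.path.join(directory, "batch%s.csv" % (index + 1))
        with open(path, "w") as f:
            for key in sorted(my_dict):
                values = my_dict[key]
                if len(values) > index:  # key has an entry for this batch
                    f.write("%s,%s" % (key, values[index]))
        paths.append(path)
    return paths


# Sample data as produced by the grouping step (values keep their "\n").
my_dict = {"123": ["first\n", "second\n"], "456": ["third\n"]}
paths = write_batches(my_dict, tempfile.mkdtemp())
for path in paths:
    with open(path) as f:
        print("%s: %r" % (os.path.basename(path), f.read()))
```

With the sample data, batch1.csv gets the first ref for every number and 
batch2.csv gets only the second ref for 123.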
