Re: Dictionaries

Started by	Peter Otten <__peter__@web.de>
First post	2014-03-20 15:08 +0100
Last post	2014-03-20 15:08 +0100
Articles	1 — 1 participant

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

#68578 — Re: Dictionaries

From	Peter Otten <__peter__@web.de>
Date	2014-03-20 15:08 +0100
Subject	Re: Dictionaries
Message-ID	<mailman.8301.1395324531.18130.python-list@python.org>

ishish wrote:

> This might sound weird, but is there a limit how many dictionaries a 
> can create/use in a single script?

No.
 
> My reason for asking is I split a 2-column-csv (phone#, ref#) file into 
> a dict and am trying to put duplicated phone numbers with different ref 
> numbers into new dictionaries. The script deducts the duplicated 46 
> numbers but it only creates batch1.csv. Since I obviously can't see the 
> wood for the trees here, can someone pls punch me into the right 
> direction....
> ...(No has_key is fine, its python 2.7)
> 
> f = open("file.csv", 'r')

Consider a csv with the lines

Number...
123,first
123,second
456,third
 
> myDict = {}
> Batch1 = {}
> Batch2 = {}
> Batch3 = {}
> 
> for line in f:
>         if line.startswith('Number' ):
>                 print "First line ignored..."
>         else:
>                 k, v = line.split(',')
>                 myDict[k] = v

the first time around the assignment is

myDict["123"] = "first\n"

the second time it is

myDict["123"] = "second\n"

i.e. you are overwriting the previous value and keep only the value 
corresponding to the last occurrence of a key.
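
To see the overwriting in isolation, here is a minimal sketch that feeds the three sample data rows through the same assignment. The names sample_lines and overwritten are made up for illustration and stand in for the open file and myDict:

```python
# Hypothetical in-memory stand-in for the data rows of the sample csv.
sample_lines = ["123,first\n", "123,second\n", "456,third\n"]

overwritten = {}
for line in sample_lines:
    k, v = line.split(",")
    # A later row with the same key replaces the earlier value.
    overwritten[k] = v

print(overwritten)
```

Only two entries survive, and "first\n" is gone, which would explain why nothing is left over to go into a second batch.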

A good way to keep an arbitrary number of values per key is to make 
each dict value a list:

myDict = {}
with open("data.csv") as f:
    next(f) # skip first line
    for line in f:
        k, v = line.split(",")
        myDict.setdefault(k, []).append(v)

For the sample data above this will produce the following myDict:
{
   "123": ["first\n", "second\n"],
   "456": ["third\n"]
}
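
If the setdefault() call reads awkwardly, collections.defaultdict from the standard library does the same job. This is only a sketch of an alternative, not a correction; group_rows is a name invented here, and it takes any iterable of lines so it can be tried without a file:

```python
from collections import defaultdict

def group_rows(lines):
    """Collect every value seen for a key, in input order."""
    grouped = defaultdict(list)  # a missing key starts out as an empty list
    for line in lines:
        k, v = line.split(",")
        grouped[k].append(v)
    return dict(grouped)  # plain dict, so later lookups cannot add keys by accident

print(group_rows(["123,first\n", "123,second\n", "456,third\n"]))
```

As in the setdefault() version, the trailing newline stays on each value, so the values can be written back out unchanged.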

You can then find out the number of batches, which is the largest number 
of values stored for any one key:

num_batches = max(len(v) for v in myDict.values())

Now write the files:

for index in range(num_batches):
    with open("batch%s.csv" % (index+1), "w") as f:
        for key, values in myDict.items():
            if len(values) > index: # this key has a value for batch index+1
                f.write("%s,%s" % (key, values[index]))
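
For experimenting, the counting and writing steps can be wrapped in one function and pointed at a scratch directory. This is a sketch under a few assumptions of mine: write_batches and its directory parameter are hypothetical names, the guard for an empty dict is an addition (max() raises ValueError on an empty sequence), and every value is assumed to still end in a newline:

```python
import os
import tempfile

def write_batches(grouped, directory):
    """Write batch1.csv, batch2.csv, ... into directory and return the paths."""
    # Guard added here: max() of an empty sequence raises ValueError.
    num_batches = max(len(v) for v in grouped.values()) if grouped else 0
    paths = []
    for index in range(num_batches):
        path = os.path.join(directory, "batch%s.csv" % (index + 1))
        with open(path, "w") as f:
            for key, values in grouped.items():
                if len(values) > index:
                    # values[index] still carries its trailing newline
                    f.write("%s,%s" % (key, values[index]))
        paths.append(path)
    return paths

grouped = {"123": ["first\n", "second\n"], "456": ["third\n"]}
out_dir = tempfile.mkdtemp()  # scratch directory, so no real files are touched
for path in write_batches(grouped, out_dir):
    with open(path) as f:
        print("%s: %s" % (os.path.basename(path), sorted(f.read().splitlines())))
```

With the sample data this gives two files: batch1.csv holds the first value of every key, and batch2.csv holds only the second value of "123". One thing to watch with real data: if the last line of the input file has no trailing newline, that value will run into the next row when written.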
