comp.lang.python
Re: Dictionaries Peter Otten <__peter__@web.de> - 2014-03-20 15:08 +0100
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2014-03-20 15:08 +0100 |
| Subject | Re: Dictionaries |
| Message-ID | <mailman.8301.1395324531.18130.python-list@python.org> |
ishish wrote:
> This might sound weird, but is there a limit on how many dictionaries I
> can create/use in a single script?
No.
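Dicts are ordinary objects, so the only practical bound is available memory. Instead of one named variable per batch (Batch1, Batch2, Batch3 in the code quoted below), a list can hold as many dicts as the data needs. A minimal sketch; the name `batches` and the count 1000 are illustrative assumptions, and it runs under Python 2.7 or 3:

```python
# Create an arbitrary number of dicts; there is no language-level cap,
# only available memory. `batches` is a hypothetical name.
batches = [{} for _ in range(1000)]

# Each element is an independent dict, so filling one leaves the others empty.
batches[0]["123"] = "first"
batches[999]["456"] = "third"

print(len(batches))  # 1000
print(batches[1])    # {}
```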
> My reason for asking is I split a 2-column CSV (phone#, ref#) file into
> a dict and am trying to put duplicated phone numbers with different ref
> numbers into new dictionaries. The script detects the 46 duplicated
> numbers but it only creates batch1.csv. Since I obviously can't see the
> wood for the trees here, can someone please point me in the right
> direction....
> ...(No, has_key is fine, it's Python 2.7)
>
> f = open("file.csv", 'r')
Consider a CSV file with the lines

    Number...
    123,first
    123,second
    456,third

> myDict = {}
> Batch1 = {}
> Batch2 = {}
> Batch3 = {}
>
> for line in f:
>     if line.startswith('Number' ):
>         print "First line ignored..."
>     else:
>         k, v = line.split(',')
>         myDict[k] = v
The first time around, the assignment is

    myDict["123"] = "first\n"

and the second time it is

    myDict["123"] = "second\n"

i.e. you are overwriting the previous value and keep only the value
corresponding to the last occurrence of a key.
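To see the overwriting in isolation, here is the quoted loop run over the three sample data lines. A list named `lines` (a stand-in assumed for illustration) replaces the file, and the header line is left out:

```python
# The three data lines from the sample file, header omitted.
lines = ["123,first\n", "123,second\n", "456,third\n"]

myDict = {}
for line in lines:
    k, v = line.split(",")
    myDict[k] = v  # a repeated key replaces the earlier value

print(repr(myDict["123"]))  # 'second\n'; the value 'first\n' is gone
print(len(myDict))          # 2 keys, although there were 3 lines
```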
A good approach to solve the problem of keeping an arbitrary number of
values per key is to make the dict value a list:

    myDict = {}
    with open("data.csv") as f:
        next(f)  # skip the header line
        for line in f:
            k, v = line.split(",")
            myDict.setdefault(k, []).append(v)

For the sample file above this produces

    {
        "123": ["first\n", "second\n"],
        "456": ["third\n"]
    }
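A variant of the same grouping, which swaps in `collections.defaultdict(list)` for `setdefault` and the `csv` module for the manual `split(",")`. The module removes the trailing newlines and handles quoted fields. This is a sketch for Python 3: `io.StringIO` stands in for data.csv so it runs without the file, and the header text `Number,Ref` is a placeholder assumption because the original header is elided. With a real file you would open it with `open("data.csv", newline="")`.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-in for data.csv; the header row text is a placeholder.
data = io.StringIO("Number,Ref\n123,first\n123,second\n456,third\n")

reader = csv.reader(data)
next(reader)  # skip the header row

myDict = defaultdict(list)
for phone, ref in reader:
    myDict[phone].append(ref)  # a missing key starts as an empty list

print(dict(myDict))  # {'123': ['first', 'second'], '456': ['third']}
```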
Next, find the number of batches, i.e. the largest number of values stored
under any one key:

    num_batches = max(len(v) for v in myDict.values())

Now write the files:

    for index in range(num_batches):
        with open("batch%s.csv" % (index+1), "w") as f:
            for key, values in myDict.items():
                if len(values) > index:  # this key has a value for batch index+1
                    # each value still carries the trailing "\n" from its input line
                    f.write("%s,%s" % (key, values[index]))
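The same batching logic can be split into a function that returns the batches as lists of (key, value) pairs, so it can be checked without touching the disk, and a second function that writes batch1.csv, batch2.csv, ... as above. A sketch for Python 3; `make_batches` and `write_batches` are hypothetical names, and the values are assumed to be already stripped (as `csv.reader` yields them), so the writer adds the newline itself:

```python
import os
import tempfile


def make_batches(groups):
    """Turn {key: [v1, v2, ...]} into batches; batch i holds the i-th value of every key that has one."""
    num_batches = max((len(v) for v in groups.values()), default=0)
    batches = []
    for index in range(num_batches):
        batches.append([(key, values[index])
                        for key, values in groups.items()
                        if len(values) > index])  # key has at least index+1 values
    return batches


def write_batches(batches, directory):
    """Write batch number n (counting from 1) to <directory>/batch<n>.csv; return the paths."""
    paths = []
    for number, batch in enumerate(batches, start=1):
        path = os.path.join(directory, "batch%d.csv" % number)
        with open(path, "w") as f:
            for key, value in batch:
                f.write("%s,%s\n" % (key, value))
        paths.append(path)
    return paths


groups = {"123": ["first", "second"], "456": ["third"]}
batches = make_batches(groups)
print(batches)  # [[('123', 'first'), ('456', 'third')], [('123', 'second')]]

# Write into a throwaway directory and show what ended up in each file.
with tempfile.TemporaryDirectory() as tmp:
    for path in write_batches(batches, tmp):
        with open(path) as f:
            print(os.path.basename(path), f.read().splitlines())
```

Keeping the grouping, the batching and the file output in separate steps means the batch contents can be inspected before any file is created.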