Newsgroups: comp.lang.python
Date: Tue, 18 Jun 2013 02:01:49 -0700 (PDT)
In-Reply-To: <mailman.3515.1371544747.3114.python-list@python.org>
References: <8390d9db-a670-4f39-81cb-34c14b59d29b@googlegroups.com> <mailman.3515.1371544747.3114.python-list@python.org>
Message-ID: <10d49cd4-90a9-4a9f-822e-207c87dafa50@googlegroups.com>
Subject: Re: Help me with the script? How to find items in csv file A and not in file B and vice versa
From: alonnirs@gmail.com

Hi Peter,
First off - many (many!) thanks.

There's some error I don't understand.
Here's the amended script I used:

import csv

#open CSV's and read the first column (product IDs) into sets
with open("Afile.csv", "rb") as f: 
    a = {row[0] for row in csv.reader(f)}
with open("Bfile.csv", "rb") as g: 
    b = {row[0] for row in csv.reader(g)} 

#create sets with the product IDs found only in A and only in B, respectively
in_a_not_b = a-b 
in_b_not_a = b-a 

print in_a_not_b
print in_b_not_a

with open("inAnotB.csv", "wb") as f: 
    writer = csv.writer(f) 
    writer.writerows([item] for item in_a_not_b)

with open("inAnotB.csv", "wb") as g: 
    writer = csv.writer(g) 
    writer.writerows([item] for item in_b_not_a)

print "done!" 



When I run it, I get an "invalid syntax" error, and (as a true newbie, I used a GUI) in_a_not_b is highlighted in this part:

with open("inAnotB.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows([item] for item in_a_not_b)

Could you please point out what I'm doing wrong?

Thanks again :)
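For reference: the parser stops at in_a_not_b because a generator expression needs the keyword in followed by the iterable, i.e. [item] for item in in_a_not_b. The writerows() snippet quoted below drops one of the two in's, and the same typo ended up in the second block (item in_b_not_a). Separately, the second with block opens "inAnotB.csv" again, so it would overwrite the first result instead of creating inBnotA.csv. In the Python 2 script above, fixing those two spots is enough. Below is a minimal sketch of the corrected script in Python 3 syntax (the script above is Python 2, hence the print statements and "rb"/"wb" modes). The helper names first_column, write_column and compare are hypothetical; sorting the output and skipping blank rows are added assumptions, not part of the original.

```python
import csv


def first_column(path):
    """Return the set of values found in the first column of a CSV file."""
    # Python 3: open text files for the csv module with newline=""
    with open(path, newline="") as f:
        # Assumption: blank rows may occur; csv.reader yields them as [], so skip them
        return {row[0] for row in csv.reader(f) if row}


def write_column(path, items):
    """Write one item per row, in the first column, to a new CSV file."""
    # The with block closes the file (flushing its buffer) when it ends
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # "for item in items": the keyword in, then the iterable.
        # sorted() only makes the row order predictable.
        writer.writerows([item] for item in sorted(items))


def compare(a_path, b_path, a_not_b_path="inAnotB.csv", b_not_a_path="inBnotA.csv"):
    """Write the IDs unique to each file and return them as two sets."""
    a = first_column(a_path)
    b = first_column(b_path)
    in_a_not_b = a - b
    in_b_not_a = b - a
    write_column(a_not_b_path, in_a_not_b)
    # A different output file, so the first result is not overwritten
    write_column(b_not_a_path, in_b_not_a)
    return in_a_not_b, in_b_not_a
```

Usage: compare("Afile.csv", "Bfile.csv") reads both files from the current directory and writes inAnotB.csv and inBnotA.csv next to them.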



On Tuesday, June 18, 2013 11:39:41 AM UTC+3, Peter Otten wrote:
> Alan Newbie wrote:
> 
> > Hello,
> > Let's say I want to compare two csv files: file A and file B. They are
> > both similarly built - the first column has product IDs (one product per
> > row) and the columns provide some stats about the products such as sales
> > in # and $.
> > 
> > I want to compare these files - see which product IDs appear in the first
> > column of file A and not in B, and which in B and not A. Finally, it would
> > be very great if the result could be written into two new CSV files - one
> > product ID per row in the first column. (no other data in the other
> > columns needed)
> > 
> > This is the script I tried:
> > ==========================
> > 
> > import csv
> > 
> > #open CSV's and read first column with product IDs into variables pointing
> > #to lists
> > A = [line.split(',')[0] for line in open('Afile.csv')]
> > B = [line.split(',')[0] for line in open('Bfile.csv')]
> > 
> > #create variables pointing to lists with unique product IDs in A and B
> > #respectively
> > inAnotB = list(set(A)-set(B))
> > inBnotA = list(set(B)-set(A))
> > 
> > print inAnotB
> > print inBnotA
> > 
> > c = csv.writer(open("inAnotB.csv", "wb"))
> > c.writerow([inAnotB])
> > 
> > 
> > d = csv.writer(open("inBnotA.csv", "wb"))
> > d.writerow([inBnotA])
> > 
> > print "done!"
> > 
> > =====================================================
> > 
> > But it doesn't produce the required results.
> > It prints IDs in this format:
> > 247158132\n
> 
> Python reads lines from a file with the trailing newline included, and 
> line.split(",") with only one column (i. e. no comma) keeps the whole line. 
> As you already know about the csv module you should use it to read your 
> data, e. g. instead of
> 
> > A = [line.split(',')[0] for line in open('Afile.csv')]
> 
> try
> 
> with open("Afile.csv", "rb") as f:
>     a = {row[0] for row in csv.reader(f)}
> ...
> 
> I used {...} instead of [...], so a is already a set and you can proceed:
> 
> in_a_not_b = a - b
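A small Python 3 sketch of the two points above, with io.StringIO standing in for one-column Afile.csv and Bfile.csv files (the sample IDs are made up): iterating over a file keeps each line's trailing newline, split(",") on a line without a comma returns the whole line, while csv.reader strips the line ending, and a set comprehension turns the comparison into a plain set difference.

```python
import csv
import io

# Made-up one-column data standing in for Afile.csv and Bfile.csv
A_TEXT = "247158132\n111111111\n"
B_TEXT = "247158132\n222222222\n"

# Original approach: each line still ends in "\n", and split(",") on a
# comma-free line returns the whole line, newline included
naive_a = [line.split(",")[0] for line in io.StringIO(A_TEXT)]
print(naive_a)

# csv.reader removes the line terminator and splits the fields;
# {...} builds a set directly
a = {row[0] for row in csv.reader(io.StringIO(A_TEXT))}
b = {row[0] for row in csv.reader(io.StringIO(B_TEXT))}

# Set difference: IDs present in one file but not in the other
in_a_not_b = a - b
in_b_not_a = b - a
print(in_a_not_b)
print(in_b_not_a)
```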
> 
> Finally, as a shortcut for
> 
> for item in in_a_not_b:
>     writer.writerow([item])
> 
> use the writerows() method to write your data:
> 
> with open("inAnotB.csv", "wb") as f:
>     writer = csv.writer(f)
>     writer.writerows([item] for item in_a_not_b)
> 
> Note that I'm wrapping every item in the set rather than the complete set as 
> a whole. If you wanted to be clever you could spell that even more succinctly 
> as
> 
>     writer.writerows(zip(in_a_not_b))
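To compare the original c.writerow([inAnotB]) with the per-item writerows() call and the zip() variant, here is a Python 3 sketch that writes to an in-memory buffer instead of a file. The IDs are made up, the helper name render is hypothetical, and lineterminator="\n" is set only to keep the output easy to compare (csv.writer defaults to "\r\n"). Wrapping the whole list produces a single row whose one cell is the list's str(); wrapping each item gives one ID per row; and zip() over a single iterable yields 1-tuples, which produce the same rows. In Python 2 zip() returns a list and in Python 3 an iterator; writerows() accepts either.

```python
import csv
import io

# Made-up IDs; a list rather than a set so the row order is predictable
in_a_not_b = ["111", "333"]


def render(write_rows):
    """Apply a writing strategy to an in-memory CSV writer and return the text."""
    buf = io.StringIO()
    write_rows(csv.writer(buf, lineterminator="\n"))
    return buf.getvalue()


# Original script: the whole list is one field, so one row with one quoted cell
whole = render(lambda w: w.writerow([in_a_not_b]))

# One-element list per item (note "for item in in_a_not_b"): one ID per row
per_item = render(lambda w: w.writerows([item] for item in in_a_not_b))

# zip() over a single iterable yields ("111",), ("333",): same rows as above
zipped = render(lambda w: w.writerows(zip(in_a_not_b)))

print(repr(whole))
print(repr(per_item))
print(per_item == zipped)
```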
> 
> > and nothing to the csv files.
> > 
> > You could probably tell I'm a newbie.
> > Could you help me out?
> > 
> > here's some dummy data:
> > 
> > https://docs.google.com/file/d/0BwziqsHUZOWRYU15aEFuWm9fajA/edit?usp=sharing
> > 
> > https://docs.google.com/file/d/0BwziqsHUZOWRQVlTelVveEhsMm8/edit?usp=sharing
> > 
> > Thanks a bunch in advance! :)