From: alonnirs@gmail.com
Newsgroups: comp.lang.python
Subject: Re: Help me with the script? How to find items in csv file A and not in file B and vice versa
Date: Tue, 18 Jun 2013 02:01:49 -0700 (PDT)
Message-ID: <10d49cd4-90a9-4a9f-822e-207c87dafa50@googlegroups.com>
References: <8390d9db-a670-4f39-81cb-34c14b59d29b@googlegroups.com>

Hi Peter,

First off - many (many!) thanks. There's some error I don't understand.
Here's the amended script I used:

    import csv

    #open CSV's and read first column with product IDs into variables pointing to lists
    with open("Afile.csv", "rb") as f:
        a = {row[0] for row in csv.reader(f)}

    with open("Bfile.csv", "rb") as g:
        b = {row[0] for row in csv.reader(g)}

    #create variables pointing to lists with unique product IDs in A and B respectively
    in_a_not_b = a-b
    in_b_not_a = b-a

    print in_a_not_b
    print in_b_not_a

    with open("inAnotB.csv", "wb") as f:
        writer = csv.writer(f)
        writer.writerows([item] for item in_a_not_b)

    with open("inAnotB.csv", "wb") as g:
        writer = csv.writer(g)
        writer.writerows([item] for item in_b_not_a)

    print "done!"

When I run it I get an invalid syntax error, and (as a true newbie I used a GUI) in_a_not_b is highlighted in this part:

    with open("inAnotB.csv", "wb") as f:
        writer = csv.writer(f)
        writer.writerows([item] for item in_a_not_b)

Could you please point out what I'm doing wrong?

Thanks again :)

On Tuesday, June 18, 2013 11:39:41 AM UTC+3, Peter Otten wrote:
> Alan Newbie wrote:
>
> > Hello,
> > Let's say I want to compare two csv files: file A and file B. They are
> > both similarly built - the first column has product IDs (one product per
> > row) and the columns provide some stats about the products such as sales
> > in # and $.
> >
> > I want to compare these files - see which product IDs appear in the first
> > column of file A and not in B, and which in B and not A. Finally, it would
> > be very great if the result could be written into two new CSV files - one
> > product ID per row in the first column.
> > (no other data in the other columns needed)
> >
> > This is the script I tried:
> > ==========================
> >
> > import csv
> >
> > #open CSV's and read first column with product IDs into variables pointing
> > #to lists
> > A = [line.split(',')[0] for line in open('Afile.csv')]
> > B = [line.split(',')[0] for line in open('Bfile.csv')]
> >
> > #create variables pointing to lists with unique product IDs in A and B
> > #respectively
> > inAnotB = list(set(A)-set(B))
> > inBnotA = list(set(B)-set(A))
> >
> > print inAnotB
> > print inBnotA
> >
> > c = csv.writer(open("inAnotB.csv", "wb"))
> > c.writerow([inAnotB])
> >
> >
> > d = csv.writer(open("inBnotA.csv", "wb"))
> > d.writerow([inBnotA])
> >
> > print "done!"
> >
> > =====================================================
> >
> > But it doesn't produce the required results.
> > It prints IDs in this format:
> > 247158132\n
>
> Python reads lines from a file with the trailing newline included, and
> line.split(",") with only one column (i. e. no comma) keeps the whole line.
> As you already know about the csv module you should use it to read your
> data, e. g. instead of
>
> > A = [line.split(',')[0] for line in open('Afile.csv')]
>
> try
>
> with open("Afile.csv", "rb") as f:
>     a = {row[0] for row in csv.reader(f)}
> ...
>
> I used {...} instead of [...], so a is already a set and you can proceed:
>
> in_a_not_b = a - b
>
> Finally as a shortcut for
>
> for item in in_a_not_b:
>     writer.writerow([item])
>
> use the writerows() method to write your data:
>
> with open("inAnotB.csv", "wb") as f:
>     writer = csv.writer(f)
>     writer.writerows([item] for item in_a_not_b)
>
> Note that I'm wrapping every item in the set rather than the complete set as
> a whole.
> If you wanted to be clever you could spell that even more succinct
> as
>
> writer.writerows(zip(in_a_not_b))
>
> > and nothing to the csv files.
> >
> > You could probably tell I'm a newbie.
> > Could you help me out?
> >
> > here's some dummy data:
> > https://docs.google.com/file/d/0BwziqsHUZOWRYU15aEFuWm9fajA/edit?usp=sharing
> >
> > https://docs.google.com/file/d/0BwziqsHUZOWRQVlTelVveEhsMm8/edit?usp=sharing
> >
> > Thanks a bunch in advance! :)
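
A note on the syntax error asked about above: in writer.writerows([item] for item in_a_not_b) the keyword "in" seems to be missing before the set name, so Python reads "for item in_a_not_b" with no "in" at all. It presumably should read "for item in in_a_not_b" (and likewise "for item in in_b_not_a"). The amended script also opens "inAnotB.csv" for both outputs, so the second write would overwrite the first.

What follows is only a minimal sketch of the approach discussed in this thread, not a definitive version. It assumes Python 3, where csv files are opened in text mode with newline="" rather than "rb"/"wb" as in the Python 2 code above. The function names read_first_column, write_single_column and compare_ids are made up for illustration. Skipping empty rows and sorting the output are extra choices that are not in the original script, and a header row, if the real files have one, is not treated specially.

```python
import csv
import os
import tempfile


def read_first_column(path):
    """Return the set of values found in the first column of a CSV file."""
    # Python 3: text mode with newline="" (the thread's Python 2 code uses "rb").
    with open(path, newline="") as f:
        # "if row" skips completely empty lines (an assumption, see above).
        return {row[0] for row in csv.reader(f) if row}


def write_single_column(path, items):
    """Write one item per row into the first column of a new CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Each item is wrapped in its own list so it becomes one row.
        # Sorting only makes the output order predictable.
        writer.writerows([item] for item in sorted(items))


def compare_ids(path_a, path_b, out_a_not_b, out_b_not_a):
    """Write the IDs unique to each file and return both sets."""
    a = read_first_column(path_a)
    b = read_first_column(path_b)
    in_a_not_b = a - b  # set difference: in A but not in B
    in_b_not_a = b - a  # set difference: in B but not in A
    write_single_column(out_a_not_b, in_a_not_b)
    write_single_column(out_b_not_a, in_b_not_a)
    return in_a_not_b, in_b_not_a


if __name__ == "__main__":
    # Tiny made-up data written to a temporary directory, for demonstration.
    workdir = tempfile.mkdtemp()
    file_a = os.path.join(workdir, "Afile.csv")
    file_b = os.path.join(workdir, "Bfile.csv")
    with open(file_a, "w", newline="") as f:
        csv.writer(f).writerows([["101", "5"], ["102", "7"], ["103", "1"]])
    with open(file_b, "w", newline="") as f:
        csv.writer(f).writerows([["102", "3"], ["104", "9"]])

    only_a, only_b = compare_ids(
        file_a,
        file_b,
        os.path.join(workdir, "inAnotB.csv"),
        os.path.join(workdir, "inBnotA.csv"),
    )
    print(sorted(only_a))
    print(sorted(only_b))
```

With real data, compare_ids("Afile.csv", "Bfile.csv", "inAnotB.csv", "inBnotA.csv") would be the equivalent of the script above, with the second output going to its own file.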