Groups > comp.lang.python > #48605 > unrolled thread

Help me with the script? How to find items in csv file A and not in file B and vice versa

Started by	Alan Newbie <alonnirs@gmail.com>
First post	2013-06-18 01:01 -0700
Last post	2013-06-18 03:48 -0700
Articles	5 — 4 participants

Back to article view | Back to comp.lang.python

  Help me with the script? How to find items in csv file A and not in file B and vice versa Alan Newbie <alonnirs@gmail.com> - 2013-06-18 01:01 -0700
    Re: Help me with the script? How to find items in csv file A and not in file B and vice versa Peter Otten <__peter__@web.de> - 2013-06-18 10:39 +0200
      Re: Help me with the script? How to find items in csv file A and not in file B and vice versa alonnirs@gmail.com - 2013-06-18 02:01 -0700
        Re: Help me with the script? How to find items in csv file A and not in file B and vice versa Andreas Perstinger <andipersti@gmail.com> - 2013-06-18 12:14 +0200
          Re: Help me with the script? How to find items in csv file A and not in file B and vice versa Alan Newbie <alonnirs@gmail.com> - 2013-06-18 03:48 -0700

#48605 — Help me with the script? How to find items in csv file A and not in file B and vice versa

From	Alan Newbie <alonnirs@gmail.com>
Date	2013-06-18 01:01 -0700
Subject	Help me with the script? How to find items in csv file A and not in file B and vice versa
Message-ID	<8390d9db-a670-4f39-81cb-34c14b59d29b@googlegroups.com>

Hello,
Let's say I want to compare two csv files: file A and file B. They are both similarly built - the first column has product IDs (one product per row) and the columns provide some stats about the products such as sales in # and $.

I want to compare these files - see which product IDs appear in the first column of file A and not in B, and which in B and not A.
Finally, it would be very great if the result could be written into two new CSV files - one product ID per row in the first column. (no other data in the other columns needed)

This is the script I tried:
==========================

import csv

#open CSV's and read first column with product IDs into variables pointing to lists
A = [line.split(',')[0] for line in open('Afile.csv')]
B = [line.split(',')[0] for line in open('Bfile.csv')]

#create variables pointing to lists with unique product IDs in A and B respectively 
inAnotB = list(set(A)-set(B))
inBnotA = list(set(B)-set(A))

print inAnotB
print inBnotA

c = csv.writer(open("inAnotB.csv", "wb"))
c.writerow([inAnotB])


d = csv.writer(open("inBnotA.csv", "wb"))
d.writerow([inBnotA])

print "done!" 

=====================================================

But it doesn't produce the required results.
It prints IDs in this format:
247158132\n

and nothing to the csv files.

You could probably tell I'm a newbie.
Could you help me out?

here's some dummy data:
https://docs.google.com/file/d/0BwziqsHUZOWRYU15aEFuWm9fajA/edit?usp=sharing

https://docs.google.com/file/d/0BwziqsHUZOWRQVlTelVveEhsMm8/edit?usp=sharing

Thanks a bunch in advance! :)

[toc] | [next] | [standalone]

#48608

From	Peter Otten <__peter__@web.de>
Date	2013-06-18 10:39 +0200
Message-ID	<mailman.3515.1371544747.3114.python-list@python.org>
In reply to	#48605

Alan Newbie wrote:

> Hello,
> Let's say I want to compare two csv files: file A and file B. They are
> both similarly built - the first column has product IDs (one product per
> row) and the columns provide some stats about the products such as sales
> in # and $.
> 
> I want to compare these files - see which product IDs appear in the first
> column of file A and not in B, and which in B and not A. Finally, it would
> be very great if the result could be written into two new CSV files - one
> product ID per row in the first column. (no other data in the other
> columns needed)
> 
> This is the script I tried:
> ==========================
> 
> import csv
> 
> #open CSV's and read first column with product IDs into variables pointing
> #to lists
> A = [line.split(',')[0] for line in open('Afile.csv')]
> B = [line.split(',')[0] for line in open('Bfile.csv')]
> 
> #create variables pointing to lists with unique product IDs in A and B
> #respectively
> inAnotB = list(set(A)-set(B))
> inBnotA = list(set(B)-set(A))
> 
> print inAnotB
> print inBnotA
> 
> c = csv.writer(open("inAnotB.csv", "wb"))
> c.writerow([inAnotB])
> 
> 
> d = csv.writer(open("inBnotA.csv", "wb"))
> d.writerow([inBnotA])
> 
> print "done!"
> 
> =====================================================
> 
> But it doesn't produce the required results.
> It prints IDs in this format:
> 247158132\n

Python reads lines from a file with the trailing newline included, and 
line.split(",") with only one column (i. e. no comma) keeps the whole line. 
As you already know about the csv module you should use it to read your 
data, e. g. instead of

> A = [line.split(',')[0] for line in open('Afile.csv')]

try

with open("Afile.csv", "rb") as f:
    a = {row[0] for row in csv.reader(f)}
...

I used {...} instead of [...], so a is already a set and you can proceed:


in_a_not_b = a - b

Finally as a shortcut for

for item in in_a_not_b:
    writer.writerow([item])

use the writerows() method to write your data:

with open("inAnotB.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows([item] for item in_a_not_b)

Note that I'm wrapping every item in the set rather than the complete set as 
a whole. If you wanted to be clever you could spell that even more succinct 
as

    writer.writerows(zip(in_a_not_b))

> and nothing to the csv files.
> 
> You could probably tell I'm a newbie.
> Could you help me out?
> 
> here's some dummy data:
> 
https://docs.google.com/file/d/0BwziqsHUZOWRYU15aEFuWm9fajA/edit?usp=sharing
> 
> 
https://docs.google.com/file/d/0BwziqsHUZOWRQVlTelVveEhsMm8/edit?usp=sharing
> 
> Thanks a bunch in advance! :)

[toc] | [prev] | [next] | [standalone]

#48610

From	alonnirs@gmail.com
Date	2013-06-18 02:01 -0700
Message-ID	<10d49cd4-90a9-4a9f-822e-207c87dafa50@googlegroups.com>
In reply to	#48608

Hi Peter,
First off - many (many!) thanks.

There's some error I don't understand.
Here's the amended script I used:

import csv

#open CSV's and read first column with product IDs into variables pointing to lists
with open("Afile.csv", "rb") as f: 
    a = {row[0] for row in csv.reader(f)}
with open("Bfile.csv", "rb") as g: 
    b = {row[0] for row in csv.reader(g)} 

#create variables pointing to lists with unique product IDs in A and B respectively 
in_a_not_b = a-b 
in_b_not_a = b-a 

print in_a_not_b
print in_b_not_a

with open("inAnotB.csv", "wb") as f: 
    writer = csv.writer(f) 
    writer.writerows([item] for item in_a_not_b)

with open("inAnotB.csv", "wb") as g: 
    writer = csv.writer(g) 
    writer.writerows([item] for item in_b_not_a)

print "done!" 



and when I run it I get an invalid syntex error and (as a true newbie I used a GUI)in_a_not_b is highlighted in the 
with open("inAnotB.csv", "wb") as f: 
    writer = csv.writer(f) 
    writer.writerows([item] for item in_a_not_b)

part.

Could you please point our what I'm doing wrong?

Thanks again :)



On Tuesday, June 18, 2013 11:39:41 AM UTC+3, Peter Otten wrote:
> Alan Newbie wrote:
> 
> 
> 
> > Hello,
> 
> > Let's say I want to compare two csv files: file A and file B. They are
> 
> > both similarly built - the first column has product IDs (one product per
> 
> > row) and the columns provide some stats about the products such as sales
> 
> > in # and $.
> 
> > 
> 
> > I want to compare these files - see which product IDs appear in the first
> 
> > column of file A and not in B, and which in B and not A. Finally, it would
> 
> > be very great if the result could be written into two new CSV files - one
> 
> > product ID per row in the first column. (no other data in the other
> 
> > columns needed)
> 
> > 
> 
> > This is the script I tried:
> 
> > ==========================
> 
> > 
> 
> > import csv
> 
> > 
> 
> > #open CSV's and read first column with product IDs into variables pointing
> 
> > #to lists
> 
> > A = [line.split(',')[0] for line in open('Afile.csv')]
> 
> > B = [line.split(',')[0] for line in open('Bfile.csv')]
> 
> > 
> 
> > #create variables pointing to lists with unique product IDs in A and B
> 
> > #respectively
> 
> > inAnotB = list(set(A)-set(B))
> 
> > inBnotA = list(set(B)-set(A))
> 
> > 
> 
> > print inAnotB
> 
> > print inBnotA
> 
> > 
> 
> > c = csv.writer(open("inAnotB.csv", "wb"))
> 
> > c.writerow([inAnotB])
> 
> > 
> 
> > 
> 
> > d = csv.writer(open("inBnotA.csv", "wb"))
> 
> > d.writerow([inBnotA])
> 
> > 
> 
> > print "done!"
> 
> > 
> 
> > =====================================================
> 
> > 
> 
> > But it doesn't produce the required results.
> 
> > It prints IDs in this format:
> 
> > 247158132\n
> 
> 
> 
> Python reads lines from a file with the trailing newline included, and 
> 
> line.split(",") with only one column (i. e. no comma) keeps the whole line. 
> 
> As you already know about the csv module you should use it to read your 
> 
> data, e. g. instead of
> 
> 
> 
> > A = [line.split(',')[0] for line in open('Afile.csv')]
> 
> 
> 
> try
> 
> 
> 
> with open("Afile.csv", "rb") as f:
> 
>     a = {row[0] for row in csv.reader(f)}
> 
> ...
> 
> 
> 
> I used {...} instead of [...], so a is already a set and you can proceed:
> 
> 
> 
> 
> 
> in_a_not_b = a - b
> 
> 
> 
> Finally as a shortcut for
> 
> 
> 
> for item in in_a_not_b:
> 
>     writer.writerow([item])
> 
> 
> 
> use the writerows() method to write your data:
> 
> 
> 
> with open("inAnotB.csv", "wb") as f:
> 
>     writer = csv.writer(f)
> 
>     writer.writerows([item] for item in_a_not_b)
> 
> 
> 
> Note that I'm wrapping every item in the set rather than the complete set as 
> 
> a whole. If you wanted to be clever you could spell that even more succinct 
> 
> as
> 
> 
> 
>     writer.writerows(zip(in_a_not_b))
> 
> 
> 
> > and nothing to the csv files.
> 
> > 
> 
> > You could probably tell I'm a newbie.
> 
> > Could you help me out?
> 
> > 
> 
> > here's some dummy data:
> 
> > 
> 
> https://docs.google.com/file/d/0BwziqsHUZOWRYU15aEFuWm9fajA/edit?usp=sharing
> 
> > 
> 
> > 
> 
> https://docs.google.com/file/d/0BwziqsHUZOWRQVlTelVveEhsMm8/edit?usp=sharing
> 
> > 
> 
> > Thanks a bunch in advance! :)

[toc] | [prev] | [next] | [standalone]

#48616

From	Andreas Perstinger <andipersti@gmail.com>
Date	2013-06-18 12:14 +0200
Message-ID	<mailman.3522.1371550472.3114.python-list@python.org>
In reply to	#48610

alonnirs@gmail.com wrote:
>and when I run it I get an invalid syntex error and (as a true newbie
>I used a GUI)in_a_not_b is highlighted in the with open("inAnotB.csv",
>"wb") as f: 
>    writer = csv.writer(f) 
>    writer.writerows([item] for item in_a_not_b)
                             ^^^^^^^^^^^^^^^^^^^^

The syntax for the for-clause in a comprehension is

for x in something

thus you are missing the "in" keyword:

writer.writerows([item] for item in in_a_not_b)

Bye, Andreas

[toc] | [prev] | [next] | [standalone]

#48620

From	Alan Newbie <alonnirs@gmail.com>
Date	2013-06-18 03:48 -0700
Message-ID	<0f500f71-fd95-4501-9994-cc1929bdad8d@googlegroups.com>
In reply to	#48616

thanks a lot :)

[toc] | [prev] | [standalone]

csiph-web

Help me with the script? How to find items in csv file A and not in file B and vice versa

Contents

#48605 — Help me with the script? How to find items in csv file A and not in file B and vice versa

#48608

#48610

#48616

#48620