


Re: Help me with the script? How to find items in csv file A and not in file B and vice versa

To python-list@python.org
From Peter Otten <__peter__@web.de>
Subject Re: Help me with the script? How to find items in csv file A and not in file B and vice versa
Date Tue, 18 Jun 2013 10:39:41 +0200
Organization None
References <8390d9db-a670-4f39-81cb-34c14b59d29b@googlegroups.com>
Newsgroups comp.lang.python
Message-ID <mailman.3515.1371544747.3114.python-list@python.org>



Alan Newbie wrote:

> Hello,
> Let's say I want to compare two csv files: file A and file B. They are
> both similarly built - the first column has product IDs (one product per
> row) and the columns provide some stats about the products such as sales
> in # and $.
> 
> I want to compare these files - see which product IDs appear in the first
> column of file A and not in B, and which in B and not A. Finally, it would
> be very great if the result could be written into two new CSV files - one
> product ID per row in the first column. (no other data in the other
> columns needed)
> 
> This is the script I tried:
> ==========================
> 
> import csv
> 
> #open CSV's and read first column with product IDs into variables pointing
> #to lists
> A = [line.split(',')[0] for line in open('Afile.csv')]
> B = [line.split(',')[0] for line in open('Bfile.csv')]
> 
> #create variables pointing to lists with unique product IDs in A and B
> #respectively
> inAnotB = list(set(A)-set(B))
> inBnotA = list(set(B)-set(A))
> 
> print inAnotB
> print inBnotA
> 
> c = csv.writer(open("inAnotB.csv", "wb"))
> c.writerow([inAnotB])
> 
> 
> d = csv.writer(open("inBnotA.csv", "wb"))
> d.writerow([inBnotA])
> 
> print "done!"
> 
> =====================================================
> 
> But it doesn't produce the required results.
> It prints IDs in this format:
> 247158132\n

Python reads lines from a file with the trailing newline included, and 
line.split(",") on a line with only one column (i.e. no comma) returns the 
whole line, newline and all, as its first item. Since you already import 
the csv module, you should use it to read your data too, e.g. instead of

> A = [line.split(',')[0] for line in open('Afile.csv')]

try

with open("Afile.csv", "rb") as f:
    a = {row[0] for row in csv.reader(f)}
...

I used a set comprehension {...} instead of a list comprehension [...], so a 
is already a set, and once you have built b the same way you can proceed:


in_a_not_b = a - b
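For anyone following along on Python 3, here is a minimal sketch of the reading and comparison steps. It assumes Python 3, where CSV files are opened in text mode with newline="" rather than the "rb"/"wb" binary modes used with Python 2's csv module. The sample rows and the helper first_column_ids() are hypothetical, and io.StringIO stands in for Afile.csv and Bfile.csv so the example runs without any files.

```python
import csv
import io

# Hypothetical sample data standing in for Afile.csv and Bfile.csv:
# product ID in the first column, a couple of stats columns after it.
A_TEXT = "247158132,10,99.50\n111111111,3,12.00\n222222222,7,40.25\n"
B_TEXT = "247158132,8,80.00\n333333333,1,5.00\n"


def first_column_ids(f):
    """Return the set of first-column values, skipping blank rows."""
    return {row[0] for row in csv.reader(f) if row}


# The original approach: with no comma on the line, the newline survives.
single_column = io.StringIO("247158132\n")
print([line.split(",")[0] for line in single_column])  # ['247158132\n']

a = first_column_ids(io.StringIO(A_TEXT))
b = first_column_ids(io.StringIO(B_TEXT))

in_a_not_b = a - b  # IDs that appear only in A
in_b_not_a = b - a  # IDs that appear only in B
print(sorted(in_a_not_b))  # ['111111111', '222222222']
print(sorted(in_b_not_a))  # ['333333333']
```

With real files, something like `with open("Afile.csv", newline="") as f: a = first_column_ids(f)` should work the same way. The `if row` filter is an extra precaution that is not in the post: csv.reader yields an empty list for a blank line, and row[0] would raise IndexError on it.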

Finally, as a shortcut for

for item in in_a_not_b:
    writer.writerow([item])

use the writerows() method to write your data:

with open("inAnotB.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows([item] for item in in_a_not_b)

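Note that I'm wrapping every item of the set in its own list rather than 
wrapping the complete set as a whole; your writerow([inAnotB]) call put the 
entire list into a single cell of a single row. If you wanted to be clever 
you could spell that even more succinctly (zip() with a single argument 
yields one-element tuples) as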

    writer.writerows(zip(in_a_not_b))
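Here is a Python 3 sketch of the writing step, again under some assumptions: the set contents are made up, write_ids() is a hypothetical helper, and io.StringIO replaces the output files so the result can be inspected. It contrasts the original writerow([list]) call with writerows() and the zip() spelling. Note that csv.writer ends each row with "\r\n" by default.

```python
import csv
import io


def write_ids(f, ids):
    """Write one ID per row to an already-open text file.

    Sorting is an extra step (not in the post) so the output order is
    stable; sets have no defined order.
    """
    csv.writer(f).writerows([item] for item in sorted(ids))


in_a_not_b = {"111111111", "222222222"}  # hypothetical result of a - b

# The original script: the row has one field holding the whole list,
# so everything lands in a single cell on a single line.
buf = io.StringIO()
csv.writer(buf).writerow([sorted(in_a_not_b)])
single_cell = buf.getvalue()

# writerows() with each item wrapped in its own list: one ID per row.
buf = io.StringIO()
write_ids(buf, in_a_not_b)
one_per_row = buf.getvalue()  # '111111111\r\n222222222\r\n'

# The zip() spelling: zip() over one iterable yields 1-tuples like ('111111111',).
buf = io.StringIO()
csv.writer(buf).writerows(zip(sorted(in_a_not_b)))
assert buf.getvalue() == one_per_row

print(repr(single_cell))
print(repr(one_per_row))
```

To write the real files on Python 3, open them in text mode with newline="" (for example `with open("inAnotB.csv", "w", newline="") as f: write_ids(f, in_a_not_b)`). The with statement also closes the file, which flushes its contents, when the block ends.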

> and nothing to the csv files.
> 
> You could probably tell I'm a newbie.
> Could you help me out?
> 
> here's some dummy data:
> 
> https://docs.google.com/file/d/0BwziqsHUZOWRYU15aEFuWm9fajA/edit?usp=sharing
> 
> https://docs.google.com/file/d/0BwziqsHUZOWRQVlTelVveEhsMm8/edit?usp=sharing
> 
> Thanks a bunch in advance! :)



Thread

Help me with the script? How to find items in csv file A and not in file B and vice versa Alan Newbie <alonnirs@gmail.com> - 2013-06-18 01:01 -0700
  Re: Help me with the script? How to find items in csv file A and not in file B and vice versa Peter Otten <__peter__@web.de> - 2013-06-18 10:39 +0200
    Re: Help me with the script? How to find items in csv file A and not in file B and vice versa alonnirs@gmail.com - 2013-06-18 02:01 -0700
      Re: Help me with the script? How to find items in csv file A and not in file B and vice versa Andreas Perstinger <andipersti@gmail.com> - 2013-06-18 12:14 +0200
        Re: Help me with the script? How to find items in csv file A and not in file B and vice versa Alan Newbie <alonnirs@gmail.com> - 2013-06-18 03:48 -0700
