Re: Compare list entry from csv files

Date Mon, 26 Nov 2012 17:08:37 -0500
From Dave Angel <d@davea.name>
To Anatoli Hristov <tolidtm@gmail.com>
Subject Re: Compare list entry from csv files

On 11/26/2012 04:08 PM, Anatoli Hristov wrote:
> Hello,
>
> I'm trying to complete a namebook CSV file with missing phone numbers
> which are in another CSV file.
> the namebook file is structured:
> First name;Lastname; Address; City; Country; Phone number, where the
> phone number is missing.
>
> The phonebook file is structured as:
> Name; phone, where the name shows first and last name and sometimes
> they are written together like "BillGates" or "Billgatesmicrosoft".
>
> I'm importing the files as lists, e.g. phonelist = [["First name",
> "Last name", "address", "City", "Country", "phone"], [etc...]].
> In the loop I can check whether an entry such as "Bill Gates" appears
> in a field like "BillGatesmicrosoft", but I can't index it, so I can't
> take just the phone number from the phones file and insert it into the
> matching field in the namebook. Can you please give me some advice?
>
> Thanks
>
>
> import csv
>
> origf = open('c:/Working/Test_phonebook.csv', 'rt')
> phonelist = []
>
> try:
>     reader = csv.reader(origf, delimiter=';')
>     for row in reader:
>         phonelist.append(row)
> finally:
>     origf.close()
>
> secfile = open('c:/Working/phones.csv', 'rt')
> phones = []
>
> try:
>     readersec = csv.reader(secfile, delimiter=';')
>     for row in readersec:
>         phones.append(row)
> finally:
>     secfile.close()

You're trying to merge information from a second file into a first one,
where the shared key is only a little bit similar.  Good luck.

For example, in the first file it might say Susan; Gatley and in the
other file it might say Mom.  Good luck coming up with an algorithm to
match those.

Now if you are assured that the two will be identical except for spaces,
then you could reduce both keys to the same format and then match them.
Or you could treat two names as matching if they're within a Soundex
definition of each other, or if they have the same words in them, but
not necessarily in the same order.
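
A minimal sketch of the first and third of those ideas, assuming names
differ only in case, spacing and punctuation, or only in word order.
The helper names squash() and word_set() are made up for illustration,
and Soundex is not covered here.

```python
import re

def squash(name):
    """Reduce a name to one format: lowercase, letters and digits only."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def word_set(name):
    """Order-insensitive key: the set of lowercase words in the name."""
    return frozenset(name.lower().split())

# Identical except for spaces and case
print(squash("Bill Gates") == squash("BillGates"))

# Same words, different order
print(word_set("Gates Bill") == word_set("Bill Gates"))

# A looser (and riskier) test for entries like "Billgatesmicrosoft"
print(squash("Billgatesmicrosoft").startswith(squash("Bill Gates")))
```

The prefix test in the last line will also produce false matches (any
longer name that happens to start with a shorter one), so it is only
useful for proposing candidates, not for deciding them.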

But if these files are really as randomly connected as you say, then the
best you can probably do is to write two programs.  The first takes the
names from each file and produces a 3rd file associating the ones that
are obvious (according to some algorithm), and also builds a list of
exceptions.  Then allow a human being to edit that file.  Then the
second program uses it to merge the first two files for your final pass.
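
A rough sketch of that two-pass idea, not a finished tool.  It assumes
the row layouts from the original post (six fields in the namebook, two
in the phonebook) and uses "identical once spaces and case are removed"
as the obvious-match rule.  The function names and the sample rows are
invented for illustration, and the association is kept in memory here
rather than written to a 3rd file, to keep the example short.

```python
import re

def squash(name):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def propose_matches(namebook, phones):
    """Pass 1: pair the obvious matches, collect the rest as exceptions.

    namebook rows: [first, last, address, city, country, phone]
    phones rows:   [name, phone]
    Returns (matches, exceptions), where matches maps a namebook row
    index to a phones row index and exceptions lists unmatched indexes.
    """
    by_key = {}
    for j, row in enumerate(phones):
        by_key.setdefault(squash(row[0]), []).append(j)

    matches, exceptions = {}, []
    for i, row in enumerate(namebook):
        candidates = by_key.get(squash(row[0] + row[1]), [])
        if len(candidates) == 1:
            matches[i] = candidates[0]   # exactly one hit: obvious
        else:
            exceptions.append(i)         # none or ambiguous: ask a human
    return matches, exceptions

def merge(namebook, phones, matches):
    """Pass 2: copy phone numbers using the (hand-edited) matches."""
    merged = [list(row) for row in namebook]
    for i, j in matches.items():
        merged[i][5] = phones[j][1]
    return merged

# Made-up sample data
namebook = [
    ["Bill", "Gates", "1 Example Way", "Redmond", "USA", ""],
    ["Susan", "Gatley", "2 Example Rd", "Springfield", "USA", ""],
]
phones = [["BillGates", "555-0100"], ["Mom", "555-0199"]]

matches, exceptions = propose_matches(namebook, phones)
print(exceptions)        # rows a human needs to look at

matches[1] = 1           # the human decides that "Mom" is Susan Gatley
merged = merge(namebook, phones, matches)
print(merged)
```

In a real run, pass 1 would write the matches and exceptions out with
csv.writer, and pass 2 would read the edited file back in before
merging.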


-- 

DaveA
