finding data from two different files.

Started by: "torque.india@gmail.com" <torque.india@gmail.com>
First post: 2013-10-18 07:18 +0530
Last post: 2013-10-18 15:54 -0700
Articles: 4 (4 participants)

Contents

  finding data from two different files. "torque.india@gmail.com" <torque.india@gmail.com> - 2013-10-18 07:18 +0530
    Re: finding data from two different files. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-10-18 02:24 +0000
    Re: finding data from two different files. Roy Smith <roy@panix.com> - 2013-10-18 08:31 -0400
    Re: finding data from two different files. Jim Gibson <jimsgibson@gmail.com> - 2013-10-18 15:54 -0700

#57015 — finding data from two different files.

From: "torque.india@gmail.com" <torque.india@gmail.com>
Date: 2013-10-18 07:18 +0530
Subject: finding data from two different files.
Message-ID: <mailman.1193.1382062311.18130.python-list@python.org>

Hi all,

I am new to Python and am looking for the logic needed to write code for the scenario below.

I have a file (filea) with multiple columns and another file (fileb), also with multiple columns. I want to use column 2 of fileb as a search expression to find the matching value in column 3 of filea, and print it along with the rest of the matching row of filea.

filea:
a 1 ab
b 2 bc
d 3 de
e 4 ef
.
.
.

fileb
z ab 24
y bc 85
x ef 123
w de 33 

Regards../ omps


#57018

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-10-18 02:24 +0000
Message-ID: <52609bd3$0$29981$c3e8da3$5496439d@news.astraweb.com>
In reply to: #57015

On Fri, 18 Oct 2013 07:18:49 +0530, torque.india@gmail.com wrote:

> Hi all,
> 
> I am new to python, just was looking for logic to understand to write
> code in the below scenario.
> 
> I am having a file (filea) with multiple columns, and another
> file(fileb) with again multiple columns, but say i want to use column2
> of fileb as a search expression to search for similar value in column3
> of filea. and print it with value of rows of filea.
> 
> filea:
> a 1 ab
> b 2 bc
> d 3 de
> e 4 ef
> .
> .
> .
> 
> fileb
> z ab 24
> y bc 85
> x ef 123
> w de 33


Can you explain your problem a little better? You've shown some example 
data, which is great, but what are we supposed to do with it? Given the 
data shown above, what result would you expect to get?

My guess is that you want to do something like this:

* walk through fileB, extract each line in turn
* extract the second column
* then search fileA for lines where column 3 matches
* then... I don't know, maybe print the match?


Repeatedly walking through fileA will be slow. So it is better to do this 
only once, ahead of time. I suggest that you probably want to use the csv 
module to read the data, but because I'm lazy, I'm going to do it by hand:

# Prepare fileA for later searches
data = {}  # use a dict to map column 3 to the rest of the data
with open("fileA") as f:
    for line in f:
        fields = line.split()  # split on whitespace
        col3 = fields[2]  # remember fields are numbered from 0, not 1
        data[col3] = line


The above assumes that each item in column 3 is unique. If it isn't, 
you'll need a different strategy.
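
One possible strategy for the non-unique case, as a rough sketch only: map each column 3 value to a list of lines instead of a single line. The sample data and the name rows_by_col3 are made up for illustration, and io.StringIO stands in for the real fileA so the snippet runs on its own.

```python
from collections import defaultdict
import io

# Hypothetical stand-in for fileA; "ab" appears twice in column 3.
sample_a = io.StringIO("a 1 ab\nb 2 bc\nc 9 ab\n")

rows_by_col3 = defaultdict(list)  # column 3 value -> every line that has it
for line in sample_a:
    fields = line.split()
    if len(fields) < 3:
        continue  # skip blank or short lines
    rows_by_col3[fields[2]].append(line.rstrip("\n"))

print(rows_by_col3["ab"])
```

A lookup then returns every matching row rather than only the last one read.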

Now on to the second part:


with open("fileB") as f:
    for line in f:
        col2 = line.split()[1]
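        # print() is called as a function, as Python 3 requires
        # (in Python 2, write: print col2, data.get(...)).
        # The stored line keeps its newline, so strip it before printing.
        print(col2, data.get(col2, '***no match***').rstrip())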
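
For comparison, here is roughly how the same two steps might look with the csv module mentioned earlier. This is a sketch under assumptions: the columns are taken to be separated by exactly one space, the in-memory file_a and file_b are hypothetical stand-ins for the real files, and the "zz" row is added only to show a miss. Unlike str.split(), csv.reader with a single-space delimiter treats runs of spaces as empty fields, so it suits cleanly delimited data best.

```python
import csv
import io

# Hypothetical in-memory stand-ins for fileA and fileB.
file_a = io.StringIO("a 1 ab\nb 2 bc\nd 3 de\ne 4 ef\n")
file_b = io.StringIO("z ab 24\ny bc 85\nx ef 123\nw de 33\nv zz 7\n")

# Index the fileA rows by their third column.
rows_by_col3 = {}
for row in csv.reader(file_a, delimiter=" "):
    if len(row) >= 3:
        rows_by_col3[row[2]] = row

# Look up the second column of each fileB row in that index.
results = []
for row in csv.reader(file_b, delimiter=" "):
    if len(row) >= 2:
        results.append((row[1], rows_by_col3.get(row[1])))

for key, match in results:
    print(key, " ".join(match) if match else "***no match***")
```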


Does this help?


-- 
Steven


#57042

From: Roy Smith <roy@panix.com>
Date: 2013-10-18 08:31 -0400
Message-ID: <roy-F77370.08312718102013@news.panix.com>
In reply to: #57015

In article <mailman.1193.1382062311.18130.python-list@python.org>,
 "torque.india@gmail.com" <torque.india@gmail.com> wrote:

> Hi all,
> 
> I am new to python, just was looking for logic to understand to write code in 
> the below scenario.
> 
> I am having a file (filea) with multiple columns, and another file(fileb) 
> with again multiple columns, but say i want to use column2 of fileb as a 
> search expression to search for similar value in column3 of filea. and print 
> it with value of rows of filea.
> 
> filea:
> a 1 ab
> b 2 bc
> d 3 de
> e 4 ef
> .
> .
> .
> 
> fileb
> z ab 24
> y bc 85
> x ef 123
> w de 33 
> 
> Regards../ omps

Start by breaking this down into small tasks.  The first thing you need 
to be able to do is open filea, read it, and split each line up into 
columns.  You're going to want something along the lines of:

for line in open("filea"):
   col1, col2, col3 = line.split()

Play with that for a while and make sure you understand what's going on.  
There's the iteration over the lines of a file, the splitting of each 
line into a list of fields, and the unpacking of that list into three 
variables.  Each of those is a very common operation that you'll be 
using often.
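
A small sketch of the splitting and unpacking steps on a single hard-coded line, in case it helps to try them outside the loop. It uses Python 3 print syntax, and the sample strings are made up for illustration.

```python
line = "a 1 ab\n"            # one line, as produced by iterating over a file

fields = line.split()         # split on whitespace; the newline is discarded
col1, col2, col3 = fields     # unpack the three-element list into three names
print(fields, col3)

# Unpacking needs exactly three fields; any other count raises ValueError.
try:
    x, y, z = "only two".split()
except ValueError as exc:
    error = str(exc)
    print("unpacking failed:", error)
```

The ValueError case is worth knowing about, since a blank or malformed line in the real file would trigger it.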

At some point, you're going to want to say, "I've got a line from fileb 
whose column 2 is 'ab'; what line from filea has 'ab' in column 3?"  
That calls for a map.  In Python, it's called a dictionary.  As you read 
filea, you'll want to build a map, something like:

map = {}
for line in open("filea"):
   col1, col2, col3 = line.split()
   map[col3] = line

Once you've done that, try:

>>> print map

and see what it gives you.  Then, read up on dictionaries

http://docs.python.org/2/tutorial/datastructures.html#dictionaries

and see if the hints I've given you are enough to get the rest of the 
way yourself.  If not, come back and ask more questions.

Oh, also, you didn't say what version of Python you're using.  My 
examples above assumed Python 2.  If you're using Python 3, some minor 
details may change, so let us know which you're using.
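
For reference, a sketch of how the map-building snippet might look under Python 3. The main visible change is that print becomes a function. The dictionary is named lookup here (my choice, not required) so that it does not shadow the built-in map(), and io.StringIO is a hypothetical stand-in for filea so the snippet runs on its own.

```python
import io

# Hypothetical in-memory stand-in for filea.
filea = io.StringIO("a 1 ab\nb 2 bc\nd 3 de\ne 4 ef\n")

lookup = {}  # column 3 value -> the whole line it came from
for line in filea:
    col1, col2, col3 = line.split()
    lookup[col3] = line

print(lookup)                  # Python 3: print needs parentheses
print(lookup["ab"], end="")    # the stored line still carries its newline
```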


#57083

From: Jim Gibson <jimsgibson@gmail.com>
Date: 2013-10-18 15:54 -0700
Message-ID: <181020131554447773%jimsgibson@gmail.com>
In reply to: #57015

In article <mailman.1193.1382062311.18130.python-list@python.org>,
<"torque.india@gmail.com"> wrote:

> Hi all,
> 
> I am new to python, just was looking for logic to understand to write code in
> the below scenario.
> 
> I am having a file (filea) with multiple columns, and another file(fileb)
> with again multiple columns, but say i want to use column2 of fileb as a
> search expression to search for similar value in column3 of filea. and print
> it with value of rows of filea.
> 
> filea:
> a 1 ab
> b 2 bc
> d 3 de
> e 4 ef
> .
> .
> .
> 
> fileb
> z ab 24
> y bc 85
> x ef 123
> w de 33 
> 
> Regards../ omps

Interestingly, somebody named "Om Prakash Singh" asked the identical
question on the perl beginners list, except with the word "perl"
substituted for "python". Is this a homework problem? Are you unsure
about which language to use? Are you comparison shopping?

-- 
Jim Gibson
