Problem while reading files from hdfs using python

Started by: Shalini Ravishankar <shalini.ravishankar@gmail.com>
First post: 2015-01-25 12:23 -0800
Last post: 2015-02-03 18:02 -0500
Articles: 2 (2 participants)

Contents

  Problem while reading files from hdfs using python Shalini Ravishankar <shalini.ravishankar@gmail.com> - 2015-01-25 12:23 -0800
    Re: Problem while reading files from hdfs using python Dave Angel <davea@davea.name> - 2015-02-03 18:02 -0500

#84579 — Problem while reading files from hdfs using python

From: Shalini Ravishankar <shalini.ravishankar@gmail.com>
Date: 2015-01-25 12:23 -0800
Subject: Problem while reading files from hdfs using python
Message-ID: <19c7923f-2f26-4752-ac2b-1b6dfc093631@googlegroups.com>

Hello Everyone,

I am trying to read (open) and write files in HDFS from inside a Python script, but it is not working. Can someone tell me what is wrong here?

Code (full): sample.py
    
    #!/usr/bin/python
    

    from subprocess import Popen, PIPE
    
    print "Before Loop"
    
    cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
                stdout=PIPE)
    put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
                stdin=PIPE)
    for line in cat.stdout:
        line += "Blah"
        print line
        put.stdin.write(line)
    
    cat.stdout.close()
    cat.wait()
    put.stdin.close()
    put.wait()

When I execute:

    hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar -file ./sample.py -mapper './sample.py' -input sample.txt -output fileRead

It executes without errors, but I can't find the file it is supposed to create in HDFS (modifiedfile.txt).

And when I execute:

     hadoop fs -getmerge ./fileRead/ file.txt

Inside file.txt, I got:

    Before Loop	
    Before Loop

Can someone please tell me what I am doing wrong here? I don't think it reads from sample.txt.

I would really appreciate the help.


--
Thanks & Regards,
Shalini Ravishankar.

#85183

From: Dave Angel <davea@davea.name>
Date: 2015-02-03 18:02 -0500
Message-ID: <mailman.18448.1423004594.18130.python-list@python.org>
In reply to: #84579

On 01/25/2015 03:23 PM, Shalini Ravishankar wrote:
> Hello Everyone,
>
> I am trying to read(open) and write files in hdfs inside a python script. But having error.

Please copy/paste the full error traceback.

> Can someone tell me what is wrong here.
>
> Code (full): sample.py
>
>      #!/usr/bin/python
>
>
>      from subprocess import Popen, PIPE
>
>      print "Before Loop"
>
>      cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
>                  stdout=PIPE)

I don't know anything about hadoop, and when you ran it separately, you 
used different parameters.  But you can do a lot towards testing it yourself.

Start by running hadoop fs -cat ...  from shell to see whether it 
displays anything.  You should be able to use exactly the same arguments 
as you use in the Popen call.
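
One way to be sure the shell test really uses the same arguments is to print the Popen argument list as a shell command line and paste that. This is only a minimal sketch, assuming Python 3 (the script in this thread is Python 2); `as_shell_command` is a hypothetical helper name, not part of the original script.

```python
import shlex


def as_shell_command(argv):
    # Hypothetical helper: quote each argument so the result can be
    # pasted into a POSIX shell without changing its meaning.
    return " ".join(shlex.quote(arg) for arg in argv)


# The same argument list that the script passes to Popen.
print(as_shell_command(["hadoop", "fs", "-cat", "./sample.txt"]))
# prints: hadoop fs -cat ./sample.txt
```

Arguments containing spaces or other special characters come out quoted, so the pasted command stays equivalent to the list form.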

Then if that seems to work as you expect, comment out your 'put' code 
below, and add some prints to the loop.  Does that look reasonable?
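
The read-only check might look something like the following. It is a sketch under stated assumptions, not a definitive version: it assumes Python 3, and `run_and_capture` is a made-up helper name. Besides the output lines, it collects stderr and the exit code, since a failing command may report its problem there rather than on stdout.

```python
import subprocess


def run_and_capture(cmd):
    """Run cmd and return (exit_code, stdout_lines, stderr_text).

    Hypothetical helper for testing the read side on its own.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads both pipes to the end and waits for the process,
    # which avoids blocking on a full pipe buffer.
    out, err = proc.communicate()
    return proc.returncode, out.decode().splitlines(), err.decode()


# Usage (needs hadoop on PATH, so it is left as a comment here):
#   code, lines, err = run_and_capture(["hadoop", "fs", "-cat", "./sample.txt"])
#   print(code, len(lines), err)
```

If the exit code is non-zero or no lines come back, the problem is on the reading side and the 'put' half has not even been reached.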

At that point, if both look reasonable, then try the inverse.  Write 
some known data to the 'put' command, and see if it makes it into the 
appropriate file.  Once again, you should also be able to test the 
program parameters and behavior from the shell, typing manually into stdin.
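
The inverse test could be sketched like this: send a few known lines to a command's stdin and check its exit code. Again this assumes Python 3, and `feed_known_data` is a hypothetical helper name; the command itself is passed in, so the same function can be tried first against something harmless before pointing it at hadoop.

```python
import subprocess


def feed_known_data(cmd, lines):
    """Write the given lines to cmd's stdin and return its exit code.

    Hypothetical helper for testing the write side on its own.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    payload = "".join(line + "\n" for line in lines).encode()
    # communicate() sends the data, closes stdin, and waits for exit.
    proc.communicate(payload)
    return proc.returncode


# Usage (needs hadoop on PATH, so it is left as a comment here):
#   feed_known_data(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
#                   ["line one", "line two"])
```

Afterwards, reading the target file back and comparing it with the known lines shows whether the write side works independently of the read side.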


>      put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
>                  stdin=PIPE)
>      for line in cat.stdout:
>          line += "Blah"
>          print line
>          put.stdin.write(line)
>
>      cat.stdout.close()
>      cat.wait()
>      put.stdin.close()
>      put.wait()
>
> When I execute :
>
>      hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar -file ./sample.py -mapper './sample.py' -input sample.txt -output fileRead
>
> It executes properly I couldn't find the file which supposed to create in hdfs modifiedfile
>
> And When I execute :
>
>       hadoop fs -getmerge ./fileRead/ file.txt
>
> Inside the file.txt, I got :
>
>      Before Loop	
>      Before Loop
>
> Can someone please tell me what I am doing wrong here ?? I dont think it reads from the sample.txt
>
> I would really appreciate the help.
>
>
> --
> Thanks & Regards,
> Shalini Ravishankar.
>


-- 
DaveA
