Re: Help regarding urllib

From Dave Angel <davea@davea.name>
Subject Re: Help regarding urllib
Date 2013-08-24 13:07 +0000
References <31b7c124-d5f3-485d-a838-a083773c6c31@googlegroups.com> <f6cc8018-989f-43b9-8380-4a2aa23b189c@googlegroups.com>
Newsgroups comp.lang.python
Message-ID <mailman.194.1377349701.19984.python-list@python.org>


malhar vora wrote:

> On Saturday, August 24, 2013 4:15:01 PM UTC+5:30, malhar vora wrote:
>> Hello All,
>>
>> I am simply fetching data from robots.txt of a url. Below is my code.
>>
>> siteurl = siteurl.rstrip("/")
>
> Sorry for last complete. It was sent by mistake.
>
> Here is my code.
>
> siteurl = siteurl.rstrip("/")
> roboturl = siteurl + r'/robots.txt'
> robotdata = urllib.urlopen(roboturl).read() # Reading robots.txt of given url
> print robotdata
>
> In above code siteurl is fetched simply from local text file.

Why aren't you showing us what is in that local text file?  Or, more
specifically, what siteurl turns out to be?  I suspect it's missing the
http:// prefix.

    <snip>
> IOError: [Errno 2] The system cannot find the path specified: 'www.bestrecipes.com.au\\robots.txt'
>

Looks to me like it decided this URL referred to a local file.  That's
the default behavior when you don't specify the scheme identifier
(e.g. 'http').
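
As a rough sketch of the fix, assuming the text file holds bare host names such as www.bestrecipes.com.au: prepend a scheme before building the robots.txt URL. The helper name build_robots_url and the default of "http" are my own choices for illustration, not anything from the original code, and the sketch is written for Python 3 (the posted code looks like Python 2).

```python
def build_robots_url(siteurl, default_scheme="http"):
    """Return the robots.txt URL for siteurl (hypothetical helper).

    If siteurl carries no scheme, default_scheme is prepended so the
    result is not mistaken for a local file path.
    """
    siteurl = siteurl.strip().rstrip("/")
    if "://" not in siteurl:
        # Assumption: a bare host name in the text file means a web site.
        siteurl = default_scheme + "://" + siteurl
    return siteurl + "/robots.txt"


print(build_robots_url("www.bestrecipes.com.au"))
# -> http://www.bestrecipes.com.au/robots.txt
```

The resulting string can then be handed to urllib.request.urlopen() in Python 3, or to urllib.urlopen() as in the posted code.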

It would also have helped to specify what Python version and OS you're
running this on.  For example, the backslash as a path separator is
specific to Windows.  (The doubling is presumably an artifact of how
the error message is displayed, e.g. look at how repr() displays
strings.)
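
A small illustration of the repr() point, using a path string I typed in by hand to mirror the one in the error message: the string holds a single backslash, but repr() shows it escaped.

```python
# One backslash in the actual string; the literal needs "\\" to spell it.
path = "www.bestrecipes.com.au\\robots.txt"

print(path)        # www.bestrecipes.com.au\robots.txt
print(repr(path))  # 'www.bestrecipes.com.au\\robots.txt'
```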


-- 
DaveA


Thread

Help regarding urllib malhar vora <malhar.v.ntech@gmail.com> - 2013-08-24 03:45 -0700
  Re: Help regarding urllib malhar vora <malhar.v.ntech@gmail.com> - 2013-08-24 03:49 -0700
    Re: Help regarding urllib Dave Angel <davea@davea.name> - 2013-08-24 13:07 +0000
