Groups > comp.lang.python > #52934 > unrolled thread
| Started by | malhar vora <malhar.v.ntech@gmail.com> |
|---|---|
| First post | 2013-08-24 03:45 -0700 |
| Last post | 2013-08-24 13:07 +0000 |
| Articles | 3 — 2 participants |
Help regarding urllib malhar vora <malhar.v.ntech@gmail.com> - 2013-08-24 03:45 -0700
Re: Help regarding urllib malhar vora <malhar.v.ntech@gmail.com> - 2013-08-24 03:49 -0700
Re: Help regarding urllib Dave Angel <davea@davea.name> - 2013-08-24 13:07 +0000
| From | malhar vora <malhar.v.ntech@gmail.com> |
|---|---|
| Date | 2013-08-24 03:45 -0700 |
| Subject | Help regarding urllib |
| Message-ID | <31b7c124-d5f3-485d-a838-a083773c6c31@googlegroups.com> |
Hello All,
I am simply fetching data from robots.txt of a url. Below is my code.
siteurl = siteurl.rstrip("/")
| From | malhar vora <malhar.v.ntech@gmail.com> |
|---|---|
| Date | 2013-08-24 03:49 -0700 |
| Message-ID | <f6cc8018-989f-43b9-8380-4a2aa23b189c@googlegroups.com> |
| In reply to | #52934 |
On Saturday, August 24, 2013 4:15:01 PM UTC+5:30, malhar vora wrote:
> Hello All,
>
> I am simply fetching data from robots.txt of a url. Below is my code.
>
> siteurl = siteurl.rstrip("/")
Sorry for the last incomplete post. It was sent by mistake.
Here is my code.
siteurl = siteurl.rstrip("/")
roboturl = siteurl + r'/robots.txt'
robotdata = urllib.urlopen(roboturl).read() # Reading robots.txt of given url
print robotdata
In the above code, siteurl is read from a local text file.
Whenever I run it, the url in the error message shows "\\" in place of the "/" before robots.txt. The error is given below.
This is main function
Main URL : www.bestrecipes.com.au
$$$$$$$$$$:www.bestrecipes.com.au
###########-->www.bestrecipes.com.au/robots.txt
Traceback (most recent call last):
  File "dataintegrator.py", line 104, in <module>
    main()
  File "dataintegrator.py", line 81, in main
    print "Sitemap Url : " + getSiteMapUrl(i)
  File "D:\Malhar Data\Projects\Data Parsing\My Code\Final Part\libs\datareader.py", line 50, in getSiteMapUrl
    robotdata = urllib.urlopen(roboturl).read() # Reading robots.txt of given url
  File "C:\Python26\lib\urllib.py", line 87, in urlopen
    return opener.open(url)
  File "C:\Python26\lib\urllib.py", line 203, in open
    return getattr(self, name)(url)
  File "C:\Python26\lib\urllib.py", line 461, in open_file
    return self.open_local_file(url)
  File "C:\Python26\lib\urllib.py", line 475, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] The system cannot find the path specified: 'www.bestrecipes.com.au\\robots.txt'
I am new to Python and not able to figure out this problem. Please help me.
Thank you,
Malhar Vora
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-08-24 13:07 +0000 |
| Message-ID | <mailman.194.1377349701.19984.python-list@python.org> |
| In reply to | #52935 |
malhar vora wrote:
> On Saturday, August 24, 2013 4:15:01 PM UTC+5:30, malhar vora wrote:
>> Hello All,
>>
>> I am simply fetching data from robots.txt of a url. Below is my code.
>>
>> siteurl = siteurl.rstrip("/")
>
> Sorry for the last incomplete post. It was sent by mistake.
>
> Here is my code.
>
> siteurl = siteurl.rstrip("/")
> roboturl = siteurl + r'/robots.txt'
> robotdata = urllib.urlopen(roboturl).read() # Reading robots.txt of given url
> print robotdata
>
> In the above code, siteurl is read from a local text file.
Why aren't you showing us what is in that local text file? Or more
specifically, what siteurl turns out to be? I suspect it's missing the
http:// prefix.
<snip>
> IOError: [Errno 2] The system cannot find the path specified: 'www.bestrecipes.com.au\\robots.txt'
>
Looks to me like it decided this url referred to a file. That's the
default behavior when you don't specify the scheme identifier (e.g.
'http').
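A minimal sketch of the fix suggested here, assuming the lines in the local text file are bare host names such as www.bestrecipes.com.au. The thread's code is Python 2.6; this sketch uses Python 3, where `urllib.request.urlopen` is the corresponding call. The helper names `build_robots_url` and `fetch_robots` and the `default_scheme` parameter are made up for illustration and do not come from the original post.

```python
from urllib.request import urlopen


def build_robots_url(siteurl, default_scheme="http"):
    """Return the robots.txt URL for siteurl, adding a scheme if none is given.

    Assumption: an entry without "://" is a bare host name, not a local path.
    """
    siteurl = siteurl.strip().rstrip("/")
    if "://" not in siteurl:
        # Without a scheme the string is not a usable http URL.
        siteurl = default_scheme + "://" + siteurl
    return siteurl + "/robots.txt"


def fetch_robots(siteurl):
    """Download robots.txt for siteurl and return it as text (needs network)."""
    with urlopen(build_robots_url(siteurl)) as response:
        return response.read().decode("utf-8", errors="replace")


# Building the URL needs no network access.
print(build_robots_url("www.bestrecipes.com.au"))
```

Calling `fetch_robots("www.bestrecipes.com.au")` would then request the file over HTTP instead of looking for a local path.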
It would also have helped to say which Python version and OS you're
running this on. For example, the single backslash character is
specific to Windows. (The doubling presumably is an artifact of how the
error message is displayed, e.g. look at how repr() displays strings.)
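The repr() point can be checked with a short sketch. The `path` variable below is only an illustration built from the file name shown in the error message; it holds a single backslash, and only its repr() shows two.

```python
# In source code, "\\" is the escape for one backslash character.
path = "www.bestrecipes.com.au\\robots.txt"

# Number of backslash characters actually stored in the string.
backslashes = path.count("\\")

print(path)        # www.bestrecipes.com.au\robots.txt
print(repr(path))  # 'www.bestrecipes.com.au\\robots.txt'
print(backslashes) # 1
```

The error message prints the file name through repr(), which is why the backslash appears doubled there.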
--
DaveA