Re: HTTPConncetion - HEAD request

Date Fri, 17 Jun 2011 18:44:37 +1000
Subject Re: HTTPConncetion - HEAD request
From Chris Angelico <rosuav@gmail.com>
To python-list@python.org
Newsgroups comp.lang.python
Message-ID <mailman.68.1308300280.1164.python-list@python.org>


On Fri, Jun 17, 2011 at 6:19 PM, gervaz <gervaz@gmail.com> wrote:
> The fact is that I have a list of urls and I wanted to retrieve the
> minimum necessary information in order to understand if the link is a
> valid html page or e.g. a picture or something else. As far as I
> understood here http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
> the HEAD command is the one that let you do this. But it seems it
> doesn't work.

It's not working because of a few issues.

Twitter doesn't accept requests that come without a Host: header, so
you'll need to provide that. Also, your "HTTP 1.0" is going as the
body of the request, which is quite unnecessary. What you were getting
was a 301 redirect, as you can confirm thus:

>>> r.getcode()
301
>>> r.getheaders()
[('Date', 'Fri, 17 Jun 2011 08:31:31 GMT'), ('Server', 'Apache'),
('Location', 'http://twitter.com/'), ('Cache-Control', 'max-age=300'),
('Expires', 'Fri, 17 Jun 2011 08:36:31 GMT'), ('Vary',
'Accept-Encoding'), ('Connection', 'close'), ('Content-Type',
'text/html; charset=iso-8859-1')]

(Note the Location header - the server's asking you to go to
twitter.com by name.)

So send the Host header explicitly, with None as the body:

>>> h.request("HEAD","/",None,{"Host":"twitter.com"})
>>> r=h.getresponse()

Now we have a request that the server's prepared to answer:

>>> r.getcode()
200

The headers are numerous, so I won't quote them here, but you get a
Content-Length which tells you the size of the page that you would
get, plus a few others that may be of interest. But note that there's
still no body on a HEAD request:

>>> r.read()
b''
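
To pull out only the headers that matter for this kind of check, something along these lines might do. It is a minimal sketch: `head_summary` is a name made up for illustration, and it takes an already-created connection so that it assumes nothing about how you connect (by IP address or by name).

```python
import http.client


def head_summary(conn, host, path="/"):
    """Send a HEAD request and return (status, content_type, content_length).

    `conn` is an http.client.HTTPConnection (or any object with the same
    request()/getresponse() methods). `host` is sent as the Host: header.
    """
    conn.request("HEAD", path, None, {"Host": host})
    resp = conn.getresponse()
    resp.read()  # empty for HEAD, but marks the response as finished

    length = resp.getheader("Content-Length")
    if length is not None and length.isdigit():
        length = int(length)
    else:
        length = None  # header missing or not a plain number

    return resp.status, resp.getheader("Content-Type"), length


# Hypothetical usage (needs network access):
# h = http.client.HTTPConnection("twitter.com")
# print(head_summary(h, "twitter.com"))
```

The Content-Length is returned as None when the server doesn't send one, which does happen, so don't rely on it being there.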

If you want to check validity, the most important part is the code:

>>> h.request("HEAD","/aasdfadefa",None,{"Host":"twitter.com"})
>>> r=h.getresponse()
>>> r.getcode()
404

Twitter might be a bad example for this, though, as the above call
will succeed if there is a user of that name (for instance, replacing
"/aasdfadefa" with "/rosuav" changes the response to a 200). You also
have to contend with the possibility that the server won't allow HEAD
requests at all, in which case just fall back on GET.
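
The HEAD-then-GET fallback could be sketched roughly like this. The helper name `probe` is hypothetical, and the sketch assumes the server signals its refusal with 405 (Method Not Allowed) or 501 (Not Implemented); a server that refuses HEAD in some other way would need its own handling.

```python
import http.client

# Status codes assumed to mean "this server won't do HEAD":
# 405 Method Not Allowed, 501 Not Implemented.
HEAD_REFUSED = {405, 501}


def probe(conn, host, path="/"):
    """Return (method_used, status), trying HEAD first and GET as a fallback.

    `conn` is an http.client.HTTPConnection or an object with the same
    request()/getresponse() methods.
    """
    headers = {"Host": host}

    conn.request("HEAD", path, None, headers)
    resp = conn.getresponse()
    resp.read()  # finish the response before sending another request
    if resp.status not in HEAD_REFUSED:
        return "HEAD", resp.status

    # HEAD was refused, so ask for the real thing.
    conn.request("GET", path, None, headers)
    resp = conn.getresponse()
    resp.read()  # downloads the whole body; that's the cost of falling back
    return "GET", resp.status


# Hypothetical usage (needs network access):
# h = http.client.HTTPConnection("twitter.com")
# print(probe(h, "twitter.com", "/rosuav"))
```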

Even so, none of this is certain. Some misconfigured servers actually
send a 200 response when a page doesn't exist. You can probably ignore
those sorts of hassles, though, and just code to the standard.
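
For the original goal (is the link an HTML page, a picture, or something else?), the status code and the Content-Type header from the HEAD response could be combined along these lines. This is only a sketch: `classify` and its labels are invented for illustration, and it trusts whatever Content-Type the server reports.

```python
def classify(status, content_type):
    """Roughly label a HEAD result: 'redirect', 'missing', 'html', 'image' or 'other'."""
    if 300 <= status < 400:
        return "redirect"  # e.g. 301: look at the Location header and try again
    if status != 200:
        return "missing"   # 404 and friends, all treated alike in this sketch

    # Content-Type may carry parameters, e.g. "text/html; charset=iso-8859-1",
    # so keep only the part before the first semicolon.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        return "html"
    if media_type.startswith("image/"):
        return "image"
    return "other"


print(classify(200, "text/html; charset=iso-8859-1"))
print(classify(200, "image/png"))
print(classify(404, None))
```

A server that wrongly sends a 200 for a missing page would slip through as "html" here, which is the kind of hassle mentioned above.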

Hope that helps!

Chris Angelico



Thread

HTTPConncetion - HEAD request gervaz <gervaz@gmail.com> - 2011-06-16 15:43 -0700
  Re: HTTPConncetion - HEAD request Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-16 17:00 -0600
    Re: HTTPConncetion - HEAD request gervaz <gervaz@gmail.com> - 2011-06-17 01:19 -0700
      Re: HTTPConncetion - HEAD request Chris Angelico <rosuav@gmail.com> - 2011-06-17 18:44 +1000
  Re: HTTPConncetion - HEAD request Adam Tauno Williams <awilliam@whitemice.org> - 2011-06-17 06:14 -0400
    Re: HTTPConncetion - HEAD request gervaz <gervaz@gmail.com> - 2011-06-17 10:53 -0700
      Re: HTTPConncetion - HEAD request "Elias Fotinis" <efotinis@yahoo.com> - 2011-06-19 16:21 +0300
