From: Laszlo Nagy
Date: Sat, 16 Nov 2013 20:38:43 +0100
To: python-list@python.org
Subject: Why tornado.web.RequestHandler.arguments.get is binary?
Newsgroups: comp.lang.python

I believe most data passed in URLs is character data. RFC 3986 also suggests that such data should be percent-encoded UTF-8:

> The generic URI syntax mandates that new URI schemes that provide for
> the representation of character data in a URI must, in effect,
> represent characters from the unreserved set without translation, and
> should convert all other characters to bytes according to UTF-8,
> and then percent-encode those values. This requirement was introduced
> in January 2005 with the publication of RFC 3986. URI schemes
> introduced before this date are not affected. [1]

It is somewhat confusing that a URI may be used to represent binary data. More specifically, http and https URLs contain textual data in almost all cases, and when the data is textual, it must be UTF-8 (as dictated by the RFC). So what is the reason for arguments.get returning binary data?
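A minimal sketch of the quoted encoding rule, using only Python's standard `urllib.parse` module rather than Tornado. It shows that undoing percent-encoding by itself yields bytes, and that those bytes are text only if they happen to be valid UTF-8: `%FF` is a legal percent-encoded byte but not valid UTF-8 on its own. The wrapper names (`percent_encode`, `percent_decode_bytes`, `percent_decode_text`) and the `decode_arguments` helper are hypothetical; `decode_arguments` assumes arguments are stored as a dict mapping names to lists of byte strings, and is not Tornado's API.

```python
from __future__ import annotations

from urllib.parse import quote, unquote, unquote_to_bytes


def percent_encode(text: str) -> str:
    """Quoted rule: unreserved characters stay as-is; every other
    character becomes UTF-8 bytes and each byte is percent-encoded."""
    return quote(text, safe="", encoding="utf-8")


def percent_decode_bytes(component: str) -> bytes:
    """Undo only the percent-encoding; the result is raw bytes."""
    return unquote_to_bytes(component)


def percent_decode_text(component: str) -> str:
    """Undo the percent-encoding and decode the bytes as UTF-8.
    Raises UnicodeDecodeError if the bytes are not valid UTF-8."""
    return unquote(component, encoding="utf-8", errors="strict")


def decode_arguments(args: dict[str, list[bytes]]) -> dict[str, list[str]]:
    """Hypothetical helper: decode a mapping of argument names to lists
    of byte strings into a mapping of names to lists of UTF-8 text."""
    return {name: [value.decode("utf-8") for value in values]
            for name, values in args.items()}


encoded = percent_encode("café")
print(encoded)                        # caf%C3%A9
print(percent_decode_bytes(encoded))  # b'caf\xc3\xa9'
print(percent_decode_text(encoded))   # café

# "%FF" is a legal percent-encoded byte, but not valid UTF-8 by itself.
print(percent_decode_bytes("%FF"))    # b'\xff'
try:
    percent_decode_text("%FF")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)

print(decode_arguments({"q": [b"caf\xc3\xa9"]}))  # {'q': ['café']}
```

The split between a bytes-level step and a text-level step is the design choice at issue: keeping raw bytes never fails, while decoding to text fails on URLs that carry non-UTF-8 data.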
[1] http://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_in_a_URI