From: Chris Angelico
Cc: "python-list@python.org"
Newsgroups: comp.lang.python
Subject: Re: Devanagari int literals [was Re: Should non-security 2.7 bugs be fixed?]
Date: Mon, 20 Jul 2015 04:07:58 +1000
In-Reply-To: <20150719075601.779a4edb@bigbox.christie.dr>
References: <7083e494-6192-4acb-aea9-216d858171bc@googlegroups.com> <55ab2b57$0$1664$c3e8da3$5496439d@news.astraweb.com> <20150719075601.779a4edb@bigbox.christie.dr>

On Sun, Jul 19, 2015 at 10:56 PM, Tim Chase wrote:
> Agreed that it's pretty awesome. It seems to have some holes though:
>
> Python 3.4.2 (default, Oct 8 2014, 10:45:20)
> [GCC 4.9.1] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> print('\N{VULGAR FRACTION ONE EIGHTH}')
> ⅛
> >>> print(float('\N{VULGAR FRACTION ONE EIGHTH}'))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: could not convert string to float: '⅛'
> >>> print('\N{ROMAN NUMERAL NINE}')
> Ⅸ
> >>> int('\N{ROMAN NUMERAL NINE}')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: invalid literal for int() with base 10: 'Ⅸ'
> >>> print('\N{ROMAN NUMERAL TEN THOUSAND}')
> ↂ
> >>> int('\N{ROMAN NUMERAL TEN THOUSAND}')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: invalid literal for int() with base 10: 'ↂ'

The int() and float() functions accept, if I'm not mistaken, anything
with Unicode category "Nd" (Number, decimal digit). In your examples,
the fraction (U+215B) is category "No", and the Roman numerals (U+2168,
U+2182) are category "Nl", so they're not supported. Adding support for
these forms might be accepted as a feature request, but it's not a bug.

(I may be wrong about the definition being based on category. It may
be based on the "Numeric type" of each character. But again, the
characters that are accepted would be those which have a Digit type,
not merely Numeric, and again, it'd be a feature request to expand
that.)

ChrisA
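
For anyone who wants to check the categories mentioned above, here is a minimal sketch using the standard unicodedata module. The helper name describe() and the choice of sample characters are only for illustration, and the Devanagari digit is added as an "Nd" example for contrast. It reports each character's general category, its numeric value if it has one, and whether int() accepts it.

```python
import unicodedata


def describe(ch):
    """Return (category, numeric value or None, int() result or None)."""
    category = unicodedata.category(ch)
    # unicodedata.numeric() returns the supplied default (None here)
    # when the character has no numeric value at all.
    numeric = unicodedata.numeric(ch, None)
    try:
        as_int = int(ch)
    except ValueError:
        # int() rejects the character even if it has a numeric value.
        as_int = None
    return category, numeric, as_int


samples = [
    "\N{DEVANAGARI DIGIT NINE}",       # illustrative "Nd" character
    "\N{VULGAR FRACTION ONE EIGHTH}",  # from the quoted session
    "\N{ROMAN NUMERAL NINE}",
    "\N{ROMAN NUMERAL TEN THOUSAND}",
]

for ch in samples:
    print(unicodedata.name(ch), describe(ch))
```

If the numeric value of a fraction or Roman numeral character is what you are after, unicodedata.numeric() already provides it, even though int() and float() refuse the character.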