Subject: Re: Devanagari int literals [was Re: Should non-security 2.7 bugs be fixed?]
To: python-list@python.org
References: <moenuj$f11$1@ger.gmane.org> <moeqtn$mh0$1@ger.gmane.org> <moeurf$a6f$1@ger.gmane.org> <mailman.695.1437273248.3674.python-list@python.org> <7083e494-6192-4acb-aea9-216d858171bc@googlegroups.com> <55ab2b57$0$1664$c3e8da3$5496439d@news.astraweb.com> <20150719075601.779a4edb@bigbox.christie.dr> <CAPTjJmoV1MV4Fyr39iYL0s0o_sXYQvZmkgrLMcDA1UfsvyFf8w@mail.gmail.com> <20150719145520.5888c9e1@bigbox.christie.dr> <CAPTjJmqN+mxwU6eSnbZZXYxSaxNZQWuj7KiZ4J4wrOaZpMa6hQ@mail.gmail.com>
From: MRAB <python@mrabarnett.plus.com>
Date: Sun, 19 Jul 2015 23:13:48 +0100
Newsgroups: comp.lang.python
Message-ID: <mailman.746.1437344032.3674.python-list@python.org>

On 2015-07-19 22:16, Chris Angelico wrote:
> On Mon, Jul 20, 2015 at 5:55 AM, Tim Chase
> <python.list@tim.thechases.com> wrote:
>> On 2015-07-20 04:07, Chris Angelico wrote:
>>> The int() and float() functions accept, if I'm not mistaken,
>>> anything with Unicode category "Nd" (Number, decimal digit). In
>>> your examples, the fraction (U+215B) is No, and the Roman numerals
>>> (U+2168, U+2182) are Nl, so they're not supported. Adding support
>>> for these forms might be accepted as a feature request, but it's
>>> not a bug.
>>
>> Ah, that makes sense.  Some simple testing (thanks, unicodedata
>> module) supports your conjecture.
>>
>> It's not a particularly big deal so not really worth the brain-cycles
>> to add support for them.  Just upon hearing "Python's int() does
>> smart things with Unicode characters", those were some of my first
>> characters to try.  The failure struck me as odd until you explained
>> the simple difference.
>
> The other part of the problem is: What should float("2⅛3") be? Should
> it be equal to 21.0/83.0? Should the first part be parsed as a classic
> mixed number (2 + 1/8), and then what should the 3 mean? While it's
> easy to see what an individual character should represent (just check
> unicodedata.numeric(ch) - for ⅛ it's 0.125), the true meaning of a
> string of such characters is less than clear. Similarly, Roman
> numerals aren't meant to be used after the decimal point, so "Ⅸ.Ⅴ"
> does not normally mean nine and a half... not to mention the confusing
> situation that "ⅠⅤ" would naively parse as 15 but "Ⅳ" is definitely 4.
> Since these kinds of complexities exist, it's safest to reserve this
> level of parsing for a special-purpose function. If someone can come
> up with a really strong argument for the float() and int()
> constructors interpreting these, I'd expect to see it deployed as a
> third-party module first, before being pointed out as "see, you can
> use float() for all these, but if you want to use those, you should
> use Float() instead". (Incidentally, I fully expect to see, some day,
> pytz.localize() semantics brought into the standard library
> datetime.datetime class, for precisely this reason.)
>
> Unicode is awesome, but it's not a panacea :)
>
What's the result of, say, float('1e.3')?

It raises an exception (ValueError), because the string isn't a well-formed float literal.

So float("2⅛3") should also raise an exception.
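For anyone who wants to check this at the prompt, here is a rough sketch, assuming Python 3. The helper name try_float is made up purely for illustration; everything else is the built-in float()/int() and the standard unicodedata module. It shows the malformed ASCII literal and the vulgar-fraction string both being rejected, while Devanagari digits (category Nd) are accepted:

```python
import unicodedata


def try_float(text):
    """Hypothetical helper: return float(text), or None if float() rejects it."""
    try:
        return float(text)
    except ValueError:
        return None


# A malformed ASCII literal is rejected.
print(try_float('1e.3'))                  # None

# U+215B VULGAR FRACTION ONE EIGHTH is category No, not Nd, so "2⅛3" is rejected too.
print(try_float('2\u215b3'))              # None

# Devanagari digits are category Nd, so float() and int() accept them.
print(try_float('\u0967\u0968.\u096b'))   # 12.5
print(int('\u0967\u0968\u0969'))          # 123

# Per-character data: one eighth, Roman numeral nine, Devanagari digit one.
for ch in '\u215b\u2168\u0967':
    print(unicodedata.name(ch), unicodedata.category(ch), unicodedata.numeric(ch))
```

The per-character values are easy to look up; it's only the meaning of a whole string of them that is ambiguous, which is why rejecting the string seems the consistent choice.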