From: Roy Smith
Newsgroups: comp.lang.python
Subject: Re: Performance of int/long in Python 3
Date: Wed, 03 Apr 2013 20:46:18 -0400
Organization: PANIX Public Access Internet and UNIX, NYC

rusi wrote:

> On Apr 3, 6:43 pm, Roy Smith wrote:
> > This has to inspect the entire string, no?  I posted (essentially) this
> > a few days ago:
> >
> >        if all(ord(c) <= 0xffff for c in s):
> >            return "it's all bmp"
> >        else:
> >            return "it's got astral crap in it"
>
> Astral crap? CRAP?
> Verily sir I am offended!
> [...]
> You are American!

This is true.  But, to be fair, of the roughly 200 million records (I
don't have the exact number here) in our recent big data import job, I
found exactly FOUR strings with astral characters.  Those boiled down to
two versions of each of two different song titles.  One had a Unicode
Character 'BALLOON' (U+1F388).  The other had some heart symbol (sorry,
I don't remember the exact code point).

These hardly seem a matter of national pride.

And, if you don't believe there is astral crap, how do you explain
U+1F4A9?
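
For anyone who wants to run the same kind of check on their own data, here is a minimal sketch of the test quoted above, plus a helper that reports which astral characters turned up. The function names `is_all_bmp` and `astral_chars` are made up for illustration, and this assumes Python 3.3 or later (or a wide build), where iterating over a str yields whole code points rather than surrogate pairs.

```python
import unicodedata


def is_all_bmp(s):
    """Return True if every code point in s lies in the Basic Multilingual Plane."""
    # Has to inspect the entire string in the worst case (no astral characters).
    return all(ord(c) <= 0xFFFF for c in s)


def astral_chars(s):
    """Return (character, 'U+XXXX' label, Unicode name) for each astral character in s."""
    return [
        (c, "U+%04X" % ord(c), unicodedata.name(c, "<unnamed>"))
        for c in s
        if ord(c) > 0xFFFF
    ]


if __name__ == "__main__":
    title = "99 Red \U0001F388s"  # hypothetical song title, not one from the import job
    print(is_all_bmp(title))
    for char, label, name in astral_chars(title):
        print(label, name)
```

On a narrow build of Python 2 the `ord(c) <= 0xffff` test would never fail, since astral characters show up there as two surrogate code units, so treat the above as a sketch for current Python 3 only.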