From: Chris Angelico
Cc: "python-list@python.org"
Newsgroups: comp.lang.python
Subject: Re: Unicode 7
Date: Fri, 2 May 2014 19:08:24 +1000

On Fri, May 2, 2014 at 6:45 PM, Steven D'Aprano wrote:
>> - unicode 'number-boxes' (what are these called?)
>
> They are missing character glyphs, and they have nothing to do with
> Unicode. They are due to deficiencies in the text font you are using.
>
> Admittedly with Unicode's 0x10FFFF possible characters (actually more,
> since a single code point can have multiple glyphs) it isn't surprising
> that most font designers have neither the time, skill or desire to create
> a glyph for every single code point. But then the same applies even for
> more restrictive 8-bit encodings -- sometimes font designers don't even
> bother providing glyphs for *ASCII* characters.
>
> (E.g. they may only provide glyphs for uppercase A...Z, not lowercase.)

This is another area where Unicode has given us "a great improvement
over the old method of giving satisfaction". Back in the 1990s on
OS/2, DOS, and Windows, a missing glyph might be (a) blank, (b) a
simple square with no information, or (c) copied from some other font
(common with dingbats fonts). With Unicode, the usual convention is to
show a little box *with the hex digits in it*. Granted, those boxes are
a LOT more readable for BMP characters than for SMP ones (unless your
text is huge, six digits in the space of one character will make them
pretty tiny), and a "Unicode" font will generally include all (or at
least most) of the BMP, but it's still better than having no
information at all.

ChrisA
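
To make the hex-digit boxes concrete, here is a minimal Python 3 sketch of how the digits drawn in such a box could be derived from a character. The function name missing_glyph_label and the choice of four digits for the BMP versus six digits above it are assumptions made for this illustration; real renderers pick their own layout.

```python
def missing_glyph_label(ch):
    """Return the hex digits a fallback renderer might draw in the box.

    Hypothetical helper for illustration only, not any renderer's real API.
    """
    cp = ord(ch)
    # Assumption: BMP code points (up to U+FFFF) get four hex digits,
    # anything in a higher plane is padded to six.
    width = 4 if cp <= 0xFFFF else 6
    return format(cp, "0{}X".format(width))


for ch in ["A", "\u20ac", "\U0001F600"]:
    print(missing_glyph_label(ch))
# prints 0041, 20AC and 01F600, one per line
```

The six-digit label for the non-BMP character is the case described above: half as many pixels per digit again, in the same one-character box.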