Re: Different names for Unicode codepoint

From Eli Zaretskii <eliz@gnu.org>
Newsgroups comp.lang.python
Subject Re: Different names for Unicode codepoint
Date Thu, 21 Apr 2016 22:40:17 +0300
Message-ID <mailman.27.1461269623.23626.python-list@python.org>
References <87wpnqsrzz.fsf@metapensiero.it> <83h9eu699a.fsf@gnu.org>
In-reply-to <87wpnqsrzz.fsf@metapensiero.it> (message from Lele Gaifax on Thu, 21 Apr 2016 21:04:32 +0200)


> From: Lele Gaifax <lele@metapensiero.it>
> Date: Thu, 21 Apr 2016 21:04:32 +0200
> Cc: python-list@python.org
> 
> is there a particular reason for the slightly different names that Emacs
> (version 25.0.92) and Python (version 3.6.0a0) give to a single Unicode entity?

They don't.

> Just to mention one codepoint, ⋖ is called "LESS THAN WITH DOT" according to
> Emacs' C-x 8 RET TAB menu, while in Python:
> 
>     >>> import unicodedata
>     >>> unicodedata.name('⋖')
>     'LESS-THAN WITH DOT'
>     >>> print("\N{LESS THAN WITH DOT}")
>       File "<stdin>", line 1
>     SyntaxError: (unicode error) ...: unknown Unicode character name

Emacs shows both the "Name" and the "Old Name" properties of
characters as completion candidates, while Python evidently supports
only "Name".  If you type "C-x 8 RET LESS TAB", then you will see
among the completion candidates both "LESS THAN WITH DOT" and
"LESS-THAN WITH DOT".  The former is the "old name" of this character,
according to the Unicode Character Database (which is where Emacs
obtains the names and other properties of characters).
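
To see the same distinction from the Python side, here is a minimal
sketch, not a definitive recipe.  It uses only unicodedata.name() and
unicodedata.lookup() from the standard library.  The helper names
(lookup_or_none, old_name_table, lookup_with_old_names) are
hypothetical, and SAMPLE_RECORD is an illustrative line in the
UnicodeData.txt layout, where I am assuming field 1 holds the "Name"
and field 10 the "old name"; check it against the real file before
relying on it.

```python
import unicodedata

LESS_THAN_WITH_DOT = "\u22d6"  # the character ⋖ discussed above

# The current "Name" property, as Python reports it.
current_name = unicodedata.name(LESS_THAN_WITH_DOT)


def lookup_or_none(name):
    """Return the character for a current Unicode name, or None if unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None


# Illustrative record in the UnicodeData.txt layout (semicolon-separated).
# Assumption: field 1 is the Name, field 10 the Unicode 1.0 ("old") name.
SAMPLE_RECORD = "22D6;LESS-THAN WITH DOT;Sm;0;ON;;;;;Y;LESS THAN WITH DOT;;;;"


def old_name_table(records):
    """Map old names to characters (hypothetical helper, not a stdlib API)."""
    table = {}
    for line in records:
        fields = line.split(";")
        if len(fields) > 10 and fields[10]:
            table[fields[10]] = chr(int(fields[0], 16))
    return table


def lookup_with_old_names(name, table):
    """Try the current name first, then fall back to the old-name table."""
    found = lookup_or_none(name)
    return found if found is not None else table.get(name)


OLD_NAMES = old_name_table([SAMPLE_RECORD])

print(current_name)                          # the hyphenated current name
print(lookup_or_none("LESS THAN WITH DOT"))  # old name is not known to Python
print(lookup_with_old_names("LESS THAN WITH DOT", OLD_NAMES) == LESS_THAN_WITH_DOT)
```

In real use the table would be built from a full copy of
UnicodeData.txt rather than from one sample line; the fallback order
(current name first, old name second) is a design choice of this
sketch, not something Emacs or Python prescribes.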
