From: Eli Zaretskii
Newsgroups: comp.lang.python
Subject: Re: Different names for Unicode codepoint
Date: Thu, 21 Apr 2016 22:40:17 +0300

> From: Lele Gaifax
> Date: Thu, 21 Apr 2016 21:04:32 +0200
> Cc: python-list@python.org
>
> is there a particular reason for the slightly different names that Emacs
> (version 25.0.92) and Python (version 3.6.0a0) give to a single Unicode entity?

They don't.

> Just to mention one codepoint, ⋖ is called "LESS THAN WITH DOT" accordingly to
> Emacs' C-x 8 RET TAB menu, while in Python:
>
>   >>> import unicodedata
>   >>> unicodedata.name('⋖')
>   'LESS-THAN WITH DOT'
>   >>> print("\N{LESS THAN WITH DOT}")
>   File "", line 1
>   SyntaxError: (unicode error) ...: unknown Unicode character name

Emacs shows both the "Name" and the "Old Name" properties of characters
as completion candidates, while Python evidently supports only "Name".
If you type "C-x 8 RET LESS TAB", then you will see among the
completion candidates both "LESS THAN WITH DOT" and "LESS-THAN WITH
DOT".  The former is the "old name" of this character, according to
the Unicode Character Database (which is where Emacs obtains the names
and other properties of characters).
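
To check the Python side of this, here is a minimal sketch using only the
standard unicodedata module.  The helper resolve() is a hypothetical name
introduced for illustration.  It assumes that unicodedata.lookup() raises
KeyError for a name it does not know, which matches the SyntaxError shown
above for the "\N{...}" escape.

```python
import unicodedata

ch = "\u22d6"  # the character from the question, U+22D6

# The "Name" property, which is what Python exposes.
official = unicodedata.name(ch)


def resolve(name):
    """Hypothetical helper: return the character for `name`, or None
    if Python's name table does not contain it."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None


print(official)
print(resolve("LESS-THAN WITH DOT") == ch)   # current name, with hyphen
print(resolve("LESS THAN WITH DOT"))         # old name, without hyphen
```

If Python indeed supports only "Name", the last call should give None, while
Emacs would still offer both spellings as completion candidates.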