From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: unicodedata with chr() not the same between python 3.4 and 3.5
Date: Wed, 23 Dec 2015 02:42:11 +1100

On Wed, Dec 23, 2015 at 2:27 AM, Vincent Davis wrote:
> I was expecting the code below to be the same between python3.4 and 3.5. I
> need a mapping between the integers and unicode that is consistent between
> 3.4 and 3.5
>
> >>> import unicodedata
> >>> u = ''.join(chr(i) for i in range(65536) if (unicodedata.category(chr(i))
> in ('Lu', 'Ll')))[945:965]

Not sure why you're slicing it like this, but it makes little
difference. The significant thing here is that the newer Pythons are
shipping newer Unicode data files, and some code points have changed
category.

rosuav@sikorsky:~$ python3.4
Python 3.4.2 (default, Oct 8 2014, 10:45:20)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.unidata_version
'6.3.0'
>>>
rosuav@sikorsky:~$ python3.5
Python 3.5.0b1+ (default:7255af1a1c50+, May 26 2015, 00:39:06)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.unidata_version
'7.0.0'
>>>
rosuav@sikorsky:~$ python3.6
Python 3.6.0a0 (default:6e114c4023f5, Dec 20 2015, 19:15:28)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.unidata_version
'8.0.0'
>>>

Have a read here of what changed in those two major versions:

http://unicode.org/versions/Unicode7.0.0/
http://unicode.org/versions/Unicode8.0.0/

I'm not sure what the best way is to create the mapping you want, but
I would advise freezing it to a specific set of codepoints in your
source code, rather than depending on something external.

ChrisA
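
For what it's worth, here is a rough sketch of what "freezing" could look like. It is one possible approach, not necessarily the best one. All the names (FROZEN_CODEPOINTS, INT_TO_CHAR, CHAR_TO_INT, letters_by_category, as_source_literal) are made up for illustration, and the Greek lowercase range is an arbitrary stand-in for whatever characters you actually need.

```python
import unicodedata

# Hypothetical frozen set: explicit code points written into the source.
# U+03B1..U+03C9 (Greek lowercase alpha..omega) is only an example range.
FROZEN_CODEPOINTS = tuple(range(0x03B1, 0x03CA))

# Integer <-> character mapping built purely from the frozen code points,
# so it does not consult the Unicode database at all.
INT_TO_CHAR = {i: chr(cp) for i, cp in enumerate(FROZEN_CODEPOINTS)}
CHAR_TO_INT = {ch: i for i, ch in INT_TO_CHAR.items()}


def letters_by_category(limit=0x10000, categories=("Lu", "Ll")):
    """Version-dependent selection, as in the quoted code.

    The result can differ between interpreters because it depends on
    unicodedata.unidata_version.
    """
    return "".join(
        chr(i) for i in range(limit)
        if unicodedata.category(chr(i)) in categories
    )


def as_source_literal(text):
    """Render text as an ASCII-only string literal for pasting into source."""
    return ascii(text)


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    # Run once on a reference interpreter, then paste the output into your
    # source as the frozen string instead of recomputing it at import time.
    print(as_source_literal(letters_by_category()[945:965]))
```

The idea is that the category-based selection gets run once, on whichever Python you treat as the reference, and its output is pasted into the source. After that, the mapping no longer changes when a newer Python ships newer Unicode data files.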