From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: What's the best/neatest way to get Unicode data from a database into a grid cell?
Date: Sun, 7 Feb 2016 22:56:07 +1100

On Sun, Feb 7, 2016 at 10:19 PM, wrote:
> My system (xubuntu 15.10) is all
> UTF8 so accented characters are handled by the display, in terminals,
> etc. correctly. I'm currently using python 2.7 for this but would be
> quite happy to move to 3.4 if this handles UTF8 better (I seem to
> remember it does maybe).

As a general rule, yes, Python 3 handles Unicode somewhat better than Python 2 does. The main reason is that the native string type is now a Unicode string rather than a byte string. So "abcd" is not a string of bytes that happen to be encoded as UTF-8; it is actually a string of Unicode characters. For the most part, you'll be able to ignore character encodings and such, and just work with text.

So the question then becomes: under Python 3, can you use regular string objects for both of the things you're working with (the database and the GUI)? I can confirm that the built-in sqlite3 module works just fine with Unicode text, so all you need to do is try out your GUI code under Python 3. I've no idea how good wx support in Py3 is, so you might find you need to switch GUI toolkits to get everything working; but whether it's with wxWidgets, GTK, Qt, or some other library, you should be able to put something together under Python 3 that "just works" as regards Unicode. Caveat: I haven't done any serious GUI programming using Python.

(Note that "just works" can actually be a lot of hassle if you truly want to support *all* of Unicode. I've seen programs that don't behave correctly in the presence of combining characters, right-to-left text, zero-width characters, or non-BMP characters; but you can get far better support for the same amount of effort if you use Py3 and Unicode than if you try to do things manually under Py2.)

ChrisA
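A minimal sketch of the database side under Python 3, assuming an in-memory SQLite database and a hypothetical single-column table called `names` (both made up for illustration). It shows that ordinary `str` values, including a non-BMP character, go into sqlite3 and come back out as equal `str` objects with no manual encoding. It also shows the combining-character issue from the note above, using the standard `unicodedata` module. The GUI half is deliberately left out, since the toolkit choice is still open.

```python
import sqlite3
import unicodedata


def round_trip(values):
    """Store str values in an in-memory SQLite table and read them back.

    The table name `names` is hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE names (value TEXT)")
        # Parameters are ordinary str objects; sqlite3 handles the encoding.
        conn.executemany("INSERT INTO names (value) VALUES (?)",
                         [(v,) for v in values])
        rows = conn.execute("SELECT value FROM names ORDER BY rowid").fetchall()
        # TEXT columns come back as str by default (not bytes).
        return [row[0] for row in rows]
    finally:
        conn.close()


# Accented Latin, Greek, CJK, and a non-BMP emoji (U+1F600).
samples = ["caf\u00e9", "na\u00efve", "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac",
           "\u65e5\u672c\u8a9e", "\U0001F600"]
result = round_trip(samples)
print(result == samples)              # True
print(type(result[0]).__name__)       # str

# "é" can be one code point (U+00E9) or "e" plus a combining acute (U+0301).
composed = "caf\u00e9"
decomposed = "cafe\u0301"
print(composed == decomposed)         # False: different code point sequences
print(len(composed), len(decomposed))  # 4 5
# NFC normalization composes "e" + U+0301 into the single code point U+00E9.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

Whatever comes out of `round_trip` is a plain `str`, which is what you'd hand to the grid cell in whichever toolkit you end up using. Normalizing to one form (NFC here) before storing or comparing is one way to sidestep the combining-character surprises; whether that's appropriate depends on the application.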