Date: Sat, 7 Mar 2015 22:33:58 +1100
Subject: Re: Newbie question about text encoding
From: Chris Angelico
Newsgroups: comp.lang.python

On Sat, Mar 7, 2015 at 10:09 PM, Steven D'Aprano wrote:
> Stop using MySQL, which is a joke of a database[1], and use Postgres which
> does not have this problem.

I agree with the recommendation, though to be fair to MySQL, it is now
possible to store full Unicode. Though personally, I think the whole
"UTF8MB3 vs UTF8MB4" split is an embarrassment and should be abolished
*immediately* - not "we may change the meaning of UTF8 to be an alias
for UTF8MB4 in the future", just completely abolish the distinction
right now. (And deprecate the longer words.) There should be no reason
to build any kind of "UTF-8 but limited to three bytes" encoding for
anything. Ever.

But at least you can, if you configure things correctly, store any
Unicode character in your TEXT field.

ChrisA
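
For anyone wondering what "UTF-8 but limited to three bytes" actually excludes, here is a minimal Python 3 sketch. It relies only on how UTF-8 itself works: code points up to U+FFFF encode in at most three bytes, and anything above that needs four. The function names are made up for illustration and are not part of MySQL or any library.

```python
def utf8_length(ch):
    """Return how many bytes the single character ch takes in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_four_bytes(text):
    """Return the characters in text that lie above U+FFFF.

    These are the ones a three-byte-limited UTF-8 cannot represent.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


def fits_three_byte_utf8(text):
    """True if every character in text encodes in three bytes or fewer."""
    return not needs_four_bytes(text)


if __name__ == "__main__":
    # One sample character for each UTF-8 length, 1 through 4 bytes.
    for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
        print("U+%04X takes %d byte(s)" % (ord(ch), utf8_length(ch)))
```

A check along the lines of `fits_three_byte_utf8(value)` before an insert would flag text that a three-byte column cannot hold, though the real fix is still to configure the four-byte character set.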