From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Micro Python -- a lean and efficient implementation of Python 3
Date: Fri, 6 Jun 2014 01:33:04 +1000

On Thu, Jun 5, 2014 at 11:59 PM, Roy Smith wrote:
> It turns out, we could have upgraded to a newer version of MySQL, which
> did handle astral characters correctly. But, what we did was discarded
> the records containing non-BMP data. Of course, that's a decision that
> can only be made when you understand the business requirements. In our
> case, discarding those four records had no impact on our business, so it
> made sense. For other people, not having the full dataset might have
> been a fatal problem.
>
> This was just one of many MySQL problems we ran into. Eventually, we
> decided it wasn't worth fighting with what was obviously a brain-dead
> system, and switched databases.

Point to note: It's not just "Avoid MySQL version x.y.z, it's buggy",
but "Make sure you're on a sufficiently new version of MySQL *and then
use these settings*".
For instance, the MySQL "utf8" locale/collation/charset (not
sure what it calls it) supports only the BMP; you have to use "utf8mb4",
which is UTF-8 that's allowed to go as far as four bytes long. What were
they thinking? What, were they thinking? I understand there's now an
alias "utf8mb3" for the buggy utf8, with some theory that some future
version of MySQL might make utf8 become an alias for utf8mb4. But when
would you ever actually *demand* this buggy behaviour? Why not just say
"as of this version, utf8 is identical to utf8mb4, which was a superset
thereof", and if anything changes or breaks, just acknowledge that it
used to be buggy?

Use PostgreSQL.

ChrisA
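
To make the three-byte versus four-byte distinction concrete, here is a
minimal Python 3 sketch. The helper names (utf8_len, has_astral,
fits_utf8mb3) are made up for illustration, and the code only models the
"at most three bytes per character" limit described above; it doesn't
talk to MySQL at all.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes one character needs when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def has_astral(text: str) -> bool:
    """True if text contains any code point outside the BMP (above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)


def fits_utf8mb3(text: str) -> bool:
    """True if every character encodes to at most three UTF-8 bytes.

    Hypothetical check modelling the three-byte limit; not a MySQL API.
    """
    return all(utf8_len(ch) <= 3 for ch in text)


samples = [
    "plain ASCII",        # one byte per character
    "na\u00efve",         # U+00EF needs two bytes
    "\u20ac100",          # U+20AC (euro sign) needs three bytes
    "smile \U0001F600",   # U+1F600 is astral and needs four bytes
]

for s in samples:
    widest = max(utf8_len(ch) for ch in s)
    print(ascii(s), "widest char:", widest, "bytes;",
          "fits in three bytes:", fits_utf8mb3(s))
```

Every BMP character fits in three bytes or fewer, so for well-formed
text the two checks are complementary: a string passes fits_utf8mb3
exactly when has_astral is false. A check along these lines is one way
to find the affected records before deciding whether to discard them.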