Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.xs4all.nl!newsfeed4a.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.000 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'warnings': 0.04; 'anyway.': 0.05; 'explicitly': 0.05; 'output': 0.05; '-*-': 0.07; '21,': 0.07; 'nasty': 0.07; 'problem:': 0.07; "subject:' ": 0.07; 'utf-8': 0.07; '[0]': 0.09; 'encoding:': 0.09; 'exception.': 0.09; 'imply': 0.09; 'input,': 0.09; 'omit': 0.09; 'spaces': 0.09; 'trailing': 0.09; 'worse': 0.09; 'cc:addr:python-list': 0.11; 'changes': 0.15; '(1,': 0.16; '(2,': 0.16; '(3,': 0.16; '__future__': 0.16; 'charset': 0.16; 'columns': 0.16; 'created.': 0.16; 'declared': 0.16; 'definitions,': 0.16; 'displaying': 0.16; 'enum': 0.16; 'exist.': 0.16; 'finney': 0.16; 'from:addr:rosuav': 0.16; 'from:name:chris angelico': 0.16; 'imo,': 0.16; 'imo.': 0.16; 'truncation': 0.16; 'worse.': 0.16; 'sat,': 0.16; 'wrote:': 0.18; 'addition,': 0.20; 'feb': 0.22; 'import': 0.22; 'cc:addr:python.org': 0.22; "aren't": 0.24; 'char': 0.24; 'exists': 0.24; 'of.': 0.24; 'refers': 0.24; 'url:02': 0.24; 'url:dev': 0.24; 'fine': 0.24; 'cc:2**0': 0.24; 'handling': 0.26; 'primary': 0.26; 'values': 0.27; 'gets': 0.27; 'header:In-Reply- To:1': 0.27; 'tried': 0.27; 'skip:p 30': 0.29; 'character': 0.29; 'words': 0.29; "doesn't": 0.30; 'characters': 0.30; 'message- id:@mail.gmail.com': 0.30; 'that.': 0.31; 'minor': 0.31; 'produces': 0.31; 'skip:m 60': 0.31; 'beginning': 0.33; 'table': 0.34; "i'd": 0.34; 'could': 0.34; "can't": 0.35; 'display': 0.35; 'common': 0.35; 'something': 0.35; 'no,': 0.35; 'requirement': 0.35; 'test': 0.35; 'but': 0.35; 'received:google.com': 0.35; 'version': 0.36; 'consist': 0.36; 'url:downloads': 0.36; 'two': 0.37; 'ben': 0.38; 'pm,': 0.38; 'does': 0.39; 'skip:p 20': 0.39; 'even': 0.60; 'skip:u 10': 0.60; 'problems.': 0.60; 'most': 0.60; 'url:5': 0.61; 'new': 0.61; 'first': 0.61; 'back': 0.62; 'box,': 0.64; 'more': 0.64; 'url:pdf': 0.68; '8bit%:84': 0.69; 'default': 0.69; 'brand': 0.72; 'links,': 0.74; '"not': 0.84; '(id': 0.84; '2015': 0.84; 'change"': 0.84; 'loses': 0.84; 'silently': 0.84; 'strings)': 0.84; 'trick,': 0.84; 'forgotten': 0.91; 'good,': 0.91; 'to:none': 0.92; 'silent': 0.95 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:cc :content-type:content-transfer-encoding; bh=43AXydbgNf44KQoy/srDSL2ZrfhqXA7Sd6DIMI/W4Yw=; b=04BO7yjaPL0A3cYdyYG39fNUBcj9i+UMa4Nto8+hIa6AHEDq/m4y/LKDVepxSsHEr0 hgqow5ip4iJcwvh/Ccx3jfvI8xa+PaOCG6jNDTBvgmDTj6Hhcadlgw96QDsdWWWm0GLz o+sUyA4rY8qm+bNade4ovIa+Ee7FDLGj+MAZV/kR9c/pUGKxdGHGK+KansydRdb4gjs5 dPpt+64HzLxiXexPSTJ4HRj9J5NbhZjbYhFpiCPW2lPjnzUiZcyPkeadODpDhIyBT1DA V3ofu4zLXIP73KNXRmiam2byFHvKNB9vOMTkdUyad+xEl5G7BSzGR9fiAhnMJNk5NJkG 8mtw== MIME-Version: 1.0 X-Received: by 10.107.33.11 with SMTP id h11mr1731280ioh.53.1424497144026; Fri, 20 Feb 2015 21:39:04 -0800 (PST) In-Reply-To: <85a9087y9c.fsf_-_@benfinney.id.au> References: <4154cc37-0bb0-4bf2-a52c-b728c737357c@googlegroups.com> <54E517B4.4000409@stoneleaf.us> <87a908e0zh.fsf@jester.gateway.pace.com> <85a9087y9c.fsf_-_@benfinney.id.au> Date: Sat, 21 Feb 2015 16:39:03 +1100 Subject: Re: 'Lite' Databases From: Chris Angelico Cc: "python-list@python.org" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 72 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1424497146 news.xs4all.nl 2867 [2001:888:2000:d::a6]:40052 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:86014 On Sat, Feb 21, 2015 at 2:13 PM, Ben Finney wr= ote: > In addition, MySQL silently [0] loses data in many common situations. > > > > > Right, I'd forgotten about silent truncations. Though to be fair, the first link refers to some changes which aren't a problem: > Columns that are part of a PRIMARY KEY are made NOT NULL even if not decl= ared that way. The requirement that a PK consist of non-nullable columns is part of other databases, too, and having the words "PRIMARY KEY" imply "NOT NULL PRIMARY KEY" is fine IMO. Back when I used DB2 version 5, I used to use "ID SMALLINT NOT NULL PRIMARY KEY" at the beginning of most of my table definitions, and all this "silent change" does is allow you to abbreviate that. > Trailing spaces are automatically deleted from ENUM and SET member values= when the table is created. Which means they're built on top of CHAR columns, not VARCHAR. Not a huge deal IMO, though definitely something to be aware of. Some of the other issues are more concerning, but these two don't bother me, anyway. The other two links, though? Definitely problems. Silent truncation of data is a pest, and it gets even worse than that. I tried to put together a test-case to see if I could do the classic "break two UTF-16 strings and construct a brand new character out of them" trick, but found something even worse. # -*- encoding: UTF-8 -*- from __future__ import print_function import MySQLdb # If you don't explicitly say charset=3D"UTF8", you get some default that y= ou may # not be able to trust. On my test box, it gave me Latin-1. Plus, you can't= say # charset=3D"UTF-8" - you have to omit the hyphen. Not good, but not a gotc= ha as # you get an immediate exception. con =3D MySQLdb.connect("localhost","demo","demo","demodb",charset=3D"UTF8"= ) cur =3D con.cursor() # Minor nastiness: This produces a warning if the table doesn't exist. # So automatically displaying warnings will produce annoying noise. cur.execute("drop table if exists nasty") # Don't forget, "utf8" doesn't mean UTF-8... no, you have to say "utf8mb4"! cur.execute("create table nasty (id smallint primary key, payload char(8) charset utf8mb4 not null)") strings =3D (u"English: safe", u"=CE=B5=CE=BB=CE=BB=CE=B7=CE=BD=CE=B9=CE=BA= =CE=AC: safe", u"\U0001F4A9: accurate") cur.execute("insert into nasty values (1, %s), (2, %s), (3, %s)", strings) cur.execute("select payload from nasty order by id") print("Input\t\tOutput") for input, output in zip(strings, cur): print(input,output[0],sep=3D"\t") Your terminal may or may not be able to display U+1F4A9, but it's an accurate description of MySQL's handling of astral characters in this demo. I don't even know what's going on here. Tested on MySQL 5.5 on Debian Wheezy. ChrisA