Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.freenet.ag!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.032 X-Spam-Evidence: '*H*': 0.94; '*S*': 0.00; 'essentially': 0.04; 'argument': 0.05; 'purpose.': 0.07; 'delimited': 0.09; 'mysql.': 0.09; 'newline': 0.09; 'statements': 0.09; 'subject:How': 0.10; 'creates': 0.14; '(everything': 0.16; '(postgresql': 0.16; 'comparisons,': 0.16; 'dump': 0.16; 'dump,': 0.16; 'from:addr:rosuav': 0.16; 'from:name:chris angelico': 0.16; 'roy': 0.16; 'worst': 0.16; 'sat,': 0.16; 'wrote:': 0.18; '(but': 0.19; 'memory': 0.22; 'case.': 0.24; 'initial': 0.24; 'versions': 0.24; 'nearly': 0.26; 'header:In-Reply-To:1': 0.27; 'appear': 0.29; 'testing': 0.29; 'chris': 0.29; 'am,': 0.29; 'strongly': 0.30; 'message-id:@mail.gmail.com': 0.30; 'went': 0.31; 'code': 0.31; '(maybe': 0.31; 'away.': 0.31; 'probably': 0.32; 'compatible': 0.32; "we're": 0.32; 'worked': 0.33; 'maybe': 0.34; "i'd": 0.34; 'could': 0.34; 'subject:with': 0.35; 'received:209.85': 0.35; 'received:209.85.220': 0.35; 'transaction': 0.35; 'something': 0.35; 'but': 0.35; 'received:google.com': 0.35; 'there': 0.35; 'doing': 0.36; 'two': 0.37; 'project': 0.37; 'received:209': 0.37; 'performance': 0.37; 'server': 0.38; 'to:addr:python-list': 0.38; 'pm,': 0.38; 'does': 0.39; 'bad': 0.39; 'help,': 0.39; 'skip:. 10': 0.39; 'delete': 0.39; 'sure': 0.39; 'to:addr:python.org': 0.39; 'either': 0.39; 'how': 0.40; 'even': 0.60; 'back': 0.62; "you'll": 0.62; 'complete': 0.62; 'day.': 0.63; 'more': 0.64; 'chance': 0.65; '30,': 0.65; 'mar': 0.68; 'smith': 0.68; 'guaranteed': 0.75; 'article': 0.77; 'gain': 0.79; 'fails,': 0.84; 'partial': 0.84; 'subject:find': 0.84; 'try,': 0.84; 'differences': 0.93; 'hand,': 0.93; '2013': 0.98 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:content-type; bh=Wd4s1pLUROqV56fha6S/ZqYyYIcbdhzvZfswlfOaAYo=; b=SRm2flJddhAabv2jgKCe+jU0IsrQctQGG+8XI5TXtmlylLwcMwD976ox+eUuEmO2I7 +S9MeRwo7SHt/yauHsIyhixyjoWa/KzkAZ3arK5RgTbmEmTGiEsSugRgSBY1lHa96xcd GYi3P+nmJ9GCZgUUIUAL2UXCZd7/oOmeBWWvZbqobKE16YbGAaKzo6nJaPOTLD3chP48 NlRXjgFlS98TqIx84YdhiMABW3a6lpdHiPdqVyg/TGb4Uu2df9UIxHWazFn1jsw6deGx kknMURhV1Uof5R4AkeRgK+TIX2DjSUBrNwRURdRtUoRvwheppc09oKcMXDZl7mFWIDIV W6fQ== MIME-Version: 1.0 X-Received: by 10.220.103.7 with SMTP id i7mr3469697vco.7.1364609108957; Fri, 29 Mar 2013 19:05:08 -0700 (PDT) In-Reply-To: References: <5155E32A.1000403@davea.name> Date: Sat, 30 Mar 2013 13:05:08 +1100 Subject: Re: How to find bad row with db api executemany()? From: Chris Angelico To: python-list@python.org Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 45 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1364609118 news.xs4all.nl 6884 [2001:888:2000:d::a6]:59517 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:42300 On Sat, Mar 30, 2013 at 12:19 PM, Roy Smith wrote: > In article , > Chris Angelico wrote: > >> On Sat, Mar 30, 2013 at 11:41 AM, Roy Smith wrote: >> > In article , >> > Dennis Lee Bieber wrote: >> > >> >> If using MySQLdb, there isn't all that much difference... MySQLdb is >> >> still compatible with MySQL v4 (and maybe even v3), and since those >> >> versions don't have "prepared statements", .executemany() essentially >> >> turns into something that creates a newline delimited "list" of >> >> "identical" (but for argument substitution) statements and submits that >> >> to MySQL. >> > >> > Shockingly, that does appear to be the case. I had thought during my >> > initial testing that I was seeing far greater throughput, but as I got >> > more into the project and started doing some side-by-side comparisons, >> > it the differences went away. >> >> How much are you doing per transaction? The two extremes (everything >> in one transaction, or each line in its own transaction) are probably >> the worst for performance. See what happens if you pepper the code >> with 'begin' and 'commit' statements (maybe every thousand or ten >> thousand rows) to see if performance improves. >> >> ChrisA > > We're doing it all in one transaction, on purpose. We start with an > initial dump, then get updates about once a day. We want to make sure > that the updates either complete without errors, or back out cleanly. > If we ever had a partial daily update, the result would be a mess. > > Hmmm, on the other hand, I could probably try doing the initial dump the > way you describe. If it fails, we can just delete the whole thing and > start again. One transaction for the lot isn't nearly as bad as one transaction per row, but it can consume a lot of memory on the server - or at least, that's what I found last time I worked with MySQL. (PostgreSQL works completely differently, and I'd strongly recommend doing it all as one transaction if you switch.) It's not guaranteed to help, but if it won't hurt to try, there's a chance you'll gain some performance. ChrisA