Re: How to find bad row with db api executemany()?

Date Fri, 29 Mar 2013 14:53:30 -0400
From Dave Angel <davea@davea.name>
Newsgroups comp.lang.python
Message-ID <mailman.3961.1364583228.2939.python-list@python.org> (permalink)



On 03/29/2013 10:48 AM, Roy Smith wrote:
> I'm inserting a gazillion rows into a MySQL database using MySQLdb and cursor.executemany() for efficiency.  Every once in a while, I get a row which violates some kind of database constraint and raises Error.
>
> I can catch the exception, but don't see any way to tell which row caused the problem.  Is this information obtainable, short of retrying each row one by one?
>

I don't know the direct answer, or even if there is one (way to get 
MySQL to tell you which one failed), but ...

Assuming that executemany() is much cheaper than a million calls to 
execute(), there are two cases worth considering.

  -- single bad row --
If you have a million items, and you know exactly one is different, you 
can narrow it down more quickly than just sequencing through them.  You 
can do half of them at a time, carefully choosing which subset of the 
total you use each time.  After 20 such calls, you can then calculate 
exactly which one is different.  Standard CS algorithm.
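
A minimal sketch of the single-bad-row case, read here as a plain binary search (one interpretation of the halving described above, not necessarily the exact subset scheme intended). The name batch_ok is hypothetical: it stands for any callable that tries a batch, for example by calling cursor.executemany() inside try/except and rolling back afterwards, and reports whether the batch was clean. Whether a rollback really undoes the trial inserts depends on the table's storage engine, so treat that part as an assumption.

```python
def find_single_bad_row(rows, batch_ok):
    """Return (index, calls) for the one bad row, assuming exactly one exists.

    batch_ok is a hypothetical callable: it takes a list of rows and returns
    True if the whole batch would insert cleanly, without leaving any rows
    behind (for example by rolling back after each trial).
    """
    lo, hi = 0, len(rows)      # invariant: the bad row is in rows[lo:hi]
    calls = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        calls += 1
        if batch_ok(rows[lo:mid]):
            lo = mid           # first half is clean, so look in the second
        else:
            hi = mid           # first half failed, so look inside it
    return lo, calls


# Simulated check standing in for a real executemany() trial.
rows = list(range(1_000_000))
bad_value = 765_432
index, calls = find_single_bad_row(rows, lambda batch: bad_value not in batch)
print(index, calls)
```

Because 2**20 is just over a million, the loop needs at most 20 trial batches for a million rows.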

  -- sparse set of rows --
If you know that it's at least one, but still less than a dozen or so, 
it's a little trickier, but you should still converge on a final list 
pretty quickly.  Each time you do half, you also do the complementary 
half.  If either of them has no "differences", you can then eliminate 
half the cases.
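
The sparse case can be sketched as a recursive halving: test a batch, and if it fails, test each half in turn, dropping any half that comes back clean. This is only an illustration under the same assumptions as before; batch_ok is a hypothetical side-effect-free check (a rolled-back executemany() trial, say), and the function name is made up for the example.

```python
def find_bad_rows(rows, batch_ok, offset=0):
    """Return the indexes of all bad rows, in ascending order.

    batch_ok is a hypothetical callable that returns True when every row
    in the batch is acceptable and leaves nothing inserted behind.
    """
    if not rows or batch_ok(rows):
        return []              # whole batch is clean: eliminated in one call
    if len(rows) == 1:
        return [offset]        # a failing batch of one is a bad row
    mid = len(rows) // 2
    # Test the half and its complementary half; clean halves return [] at once.
    return (find_bad_rows(rows[:mid], batch_ok, offset)
            + find_bad_rows(rows[mid:], batch_ok, offset + mid))


# Simulated data: three rows violate a made-up constraint.
rows = list(range(1000))
bad_values = {3, 500, 999}
bad_indexes = find_bad_rows(
    rows, lambda batch: not any(r in bad_values for r in batch))
print(bad_indexes)
```

With only a handful of bad rows, most halves come back clean and are discarded after a single trial, so the number of trials stays far below one per row.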

If you don't get a specific answer where MySQL can tell you the bad row, 
and if you don't know what I'm talking about, ask and I'll try to 
elaborate on one of the two above cases.

-- 
DaveA
