From: Dave Angel
Date: Fri, 29 Mar 2013 14:53:30 -0400
To: python-list@python.org
Subject: Re: How to find bad row with db api executemany()?
Newsgroups: comp.lang.python

On 03/29/2013 10:48 AM, Roy Smith wrote:
> I'm inserting a gazillion rows into a MySQL database using MySQLdb and
> cursor.executemany() for efficiency. Every once in a while, I get a row
> which violates some kind of database constraint and raises Error.
>
> I can catch the exception, but don't see any way to tell which row
> caused the problem. Is this information obtainable, short of retrying
> each row one by one?

I don't know the direct answer, or even whether there is one (a way to
get MySQL to tell you which row failed), but what follows assumes that
executemany() is much cheaper than a million calls to execute().

-- single bad row --

If you have a million items, and you know exactly one is bad, you can
narrow it down more quickly than by sequencing through them one at a
time. Submit half of them at a time, carefully choosing which subset of
the total you use each time. After about 20 such calls (2**20 is just
over a million), you can work out exactly which row is bad. This is the
standard binary-search algorithm.
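
A minimal sketch of that halving search, assuming exactly one bad row. The name batch_ok is hypothetical: it stands for any callable that tries executemany() on a sub-list, rolls the transaction back, and returns True if no error was raised. It is not part of MySQLdb.

```python
def find_single_bad_row(rows, batch_ok):
    """Return the index of the bad row, assuming exactly one row is bad.

    batch_ok(sub_rows) is a hypothetical helper: True if the whole
    sub-list inserts cleanly, False if the insert raises.
    """
    lo, hi = 0, len(rows)            # invariant: the bad row is in rows[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if batch_ok(rows[lo:mid]):
            lo = mid                 # first half is clean, so it is in the second
        else:
            hi = mid                 # first half failed, so it is in there
    return lo
```

Each pass halves the candidate range, so a million rows need at most 20 calls to batch_ok.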
-- sparse set of rows --

If you know that there's at least one bad row, but still fewer than a
dozen or so, it's a little trickier, but you should still converge on a
final list pretty quickly. Each time you try one half, you also try the
complementary half. If either of them has no "differences", you can
eliminate that half of the cases.

If you don't get a specific answer where MySQL can tell you the bad row,
and if you don't know what I'm talking about, ask and I'll try to
elaborate on one of the two above cases.

--
DaveA
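
For the sparse case, a rough sketch of the "try both halves" idea as a recursive search. As before, batch_ok is a hypothetical callable (True if the sub-list inserts cleanly, with the transaction rolled back afterwards). The sketch also assumes each row fails on its own account; a conflict that only shows up when two particular rows land in the same batch would not be found this way.

```python
def find_bad_rows(rows, batch_ok, offset=0):
    """Return the indexes (into the original list) of every bad row.

    batch_ok(sub_rows) is a hypothetical helper: True if the whole
    sub-list inserts cleanly, False if the insert raises.
    """
    if not rows or batch_ok(rows):
        return []                    # nothing bad here: drop this whole range
    if len(rows) == 1:
        return [offset]              # a failing batch of one row is the culprit
    mid = len(rows) // 2
    # Check both halves; any clean half is eliminated in a single call.
    return (find_bad_rows(rows[:mid], batch_ok, offset)
            + find_bad_rows(rows[mid:], batch_ok, offset + mid))
```

With only a handful of bad rows, most halves come back clean early, so the number of batch calls stays far below one call per row.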