Groups > comp.lang.python > #35199 > unrolled thread

Python3 + sqlite3: Where's the bug?

Started by	Johannes Bauer <dfnsonfsduifb@gmx.de>
First post	2012-12-20 15:52 +0100
Last post	2012-12-20 10:57 -0500
Articles	6 — 4 participants

Back to article view | Back to comp.lang.python

  Python3 + sqlite3: Where's the bug? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-12-20 15:52 +0100
    Re: Python3 + sqlite3: Where's the bug? Chris Angelico <rosuav@gmail.com> - 2012-12-21 02:05 +1100
      Re: Python3 + sqlite3: Where's the bug? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-12-20 16:20 +0100
        Re: Python3 + sqlite3: Where's the bug? Chris Angelico <rosuav@gmail.com> - 2012-12-21 02:55 +1100
        Re: Python3 + sqlite3: Where's the bug? Hans Mulder <hansmu@xs4all.nl> - 2012-12-20 17:35 +0100
    Re: Python3 + sqlite3: Where's the bug? inq1ltd <inq1ltd@inqvista.com> - 2012-12-20 10:57 -0500

#35199 — Python3 + sqlite3: Where's the bug?

From	Johannes Bauer <dfnsonfsduifb@gmx.de>
Date	2012-12-20 15:52 +0100
Subject	Python3 + sqlite3: Where's the bug?
Message-ID	<kav8ni$f38$1@news.albasani.net>

Hi group,

I've run into a problem using Python3.2 and sqlite3 db access that I
can't quite wrap my head around. I'm pretty sure there's a bug in my
program, but I can't see where. Help is greatly appreciated. I've
created a minimal example to demonstrate the phaenomenon (attached at
bottom).

First, the program creates a db and inits two tables "foo" and "bar",
which both only have a "int" value. Then "foo" is populated with unique
ints.

A fetchmanychks function is supposed to have the same behavior as
fetchall(), but instead perform the operation in many subsequent
fetchmany() chunks.

When I traverse the foo table using cursor cur1 and insert into the bar
table using cursor cur2, I receive at some point:

Traceback (most recent call last):
  File "y.py", line 25, in <module>
    cur2.execute("INSERT INTO bar (id) VALUES (?);", (v,))
sqlite3.IntegrityError: PRIMARY KEY must be unique

Which means that the fetchmany() read returns the *same* value again!
How is this possible? If I either

- Remove the "db.commit()"
- Replace fetchmanychks(cur1) by cur1.fetchall()

it works without error -- but I want neither (I want regular commits
because sqlite3 becomes horribly slow when the journal becomes large and
the tables nothing to do with each other anyways and atomicity is not
needed in my case).

Do I grossly misunderstand fetchmany() or where's my bug here?

Thanks in advance,
Joe




#!/usr/bin/python3.2
import sqlite3

db = sqlite3.connect("foobar.sqlite")
cur1 = db.cursor()
cur2 = db.cursor()

def fetchmanychks(cursor):
	cursor.execute("SELECT id FROM foo;")
	while True:
		result = cursor.fetchmany()
		if len(result) == 0:
			break
		for x in result:
			yield x

cur1.execute("CREATE TABLE foo (id integer PRIMARY KEY);")
cur1.execute("CREATE TABLE bar (id integer PRIMARY KEY);")
for i in range(0, 200000, 5):
	cur1.execute("INSERT INTO foo VALUES (?);", (i,))
db.commit()

ctr = 0
for (v, ) in fetchmanychks(cur1):
	cur2.execute("INSERT INTO bar (id) VALUES (?);", (v,))
	ctr += 1
	if ctr == 100:
		db.commit()
		ctr = 0



-- 
>> Wo hattest Du das Beben nochmal GENAU vorhergesagt?
> Zumindest nicht öffentlich!
Ah, der neueste und bis heute genialste Streich unsere großen
Kosmologen: Die Geheim-Vorhersage.
 - Karl Kaos über Rüdiger Thomas in dsa <hidbv3$om2$1@speranza.aioe.org>

[toc] | [next] | [standalone]

#35201

From	Chris Angelico <rosuav@gmail.com>
Date	2012-12-21 02:05 +1100
Message-ID	<mailman.1100.1356015940.29569.python-list@python.org>
In reply to	#35199

On Fri, Dec 21, 2012 at 1:52 AM, Johannes Bauer <dfnsonfsduifb@gmx.de> wrote:
> def fetchmanychks(cursor):
>         cursor.execute("SELECT id FROM foo;")
>         while True:
>                 result = cursor.fetchmany()
>                 if len(result) == 0:
>                         break
>                 for x in result:
>                         yield x

I'm not familiar with sqlite, but from working with other databases,
I'm wondering if possibly your commits are breaking the fetchmany.

Would it spoil your performance improvements to do all the fetchmany
calls before yielding anything? Alternatively, can you separate the
two by opening a separate database connection for the foo-reading (so
it isn't affected by the commit)?

ChrisA

[toc] | [prev] | [next] | [standalone]

#35202

From	Johannes Bauer <dfnsonfsduifb@gmx.de>
Date	2012-12-20 16:20 +0100
Message-ID	<kavab7$ihd$1@news.albasani.net>
In reply to	#35201

On 20.12.2012 16:05, Chris Angelico wrote:
> On Fri, Dec 21, 2012 at 1:52 AM, Johannes Bauer <dfnsonfsduifb@gmx.de> wrote:
>> def fetchmanychks(cursor):
>>         cursor.execute("SELECT id FROM foo;")
>>         while True:
>>                 result = cursor.fetchmany()
>>                 if len(result) == 0:
>>                         break
>>                 for x in result:
>>                         yield x
> 
> I'm not familiar with sqlite, but from working with other databases,
> I'm wondering if possibly your commits are breaking the fetchmany.

Hmm, but this:

def fetchmanychks(cursor):
	cursor.execute("SELECT id FROM foo;")
	while True:
		result = cursor.fetchone()
		if result is not None:
			yield result
		else:
			break

Works nicely -- only the fetchmany() makes the example break.

> Would it spoil your performance improvements to do all the fetchmany
> calls before yielding anything?

Well this would effectively then be a fetchall() call -- this is
problematic since the source data is LARGE (spekaing of gigabytes of
data here).

> Alternatively, can you separate the
> two by opening a separate database connection for the foo-reading (so
> it isn't affected by the commit)?

At that point in the code I don't actually have a filename anymore,
merely the connection. But shouldn't the cursor actually be the
"correct" solution? I.e. in theory, should the example work at all or am
I thinking wrong?

Because if I'm approaching this from the wrong angle, I'll have no
choice but to change all that code to open separate connections to the
same file (something that currently are no provisions for).

Best regards,
Johannes

-- 
>> Wo hattest Du das Beben nochmal GENAU vorhergesagt?
> Zumindest nicht öffentlich!
Ah, der neueste und bis heute genialste Streich unsere großen
Kosmologen: Die Geheim-Vorhersage.
 - Karl Kaos über Rüdiger Thomas in dsa <hidbv3$om2$1@speranza.aioe.org>

[toc] | [prev] | [next] | [standalone]

#35203

From	Chris Angelico <rosuav@gmail.com>
Date	2012-12-21 02:55 +1100
Message-ID	<mailman.1101.1356018959.29569.python-list@python.org>
In reply to	#35202

On Fri, Dec 21, 2012 at 2:20 AM, Johannes Bauer <dfnsonfsduifb@gmx.de> wrote:
> Hmm, but this:
>
>                 result = cursor.fetchone()
>                         yield result
>
> Works nicely -- only the fetchmany() makes the example break.

Okay, now it's sounding specific to sqlite. I'll bow out. :)

>
>> Would it spoil your performance improvements to do all the fetchmany
>> calls before yielding anything?
>
> Well this would effectively then be a fetchall() call -- this is
> problematic since the source data is LARGE (spekaing of gigabytes of
> data here).

That would be a "yes", then. Scratch that!

>> Alternatively, can you separate the
>> two by opening a separate database connection for the foo-reading (so
>> it isn't affected by the commit)?
>
> At that point in the code I don't actually have a filename anymore,
> merely the connection. But shouldn't the cursor actually be the
> "correct" solution? I.e. in theory, should the example work at all or am
> I thinking wrong?

You say "db.commit()", not "cur2.commit()", so I don't see that a
cursor would un-break what part-way commits is breaking.

> Because if I'm approaching this from the wrong angle, I'll have no
> choice but to change all that code to open separate connections to the
> same file (something that currently are no provisions for).

Is that an sqlite limitation, or just one of your code?

I poked around at the sqlite3 docs, but didn't find any obvious
"clone" option on the connection, nor a way to retrieve the file name.
That would have been fairly convenient. Oh well.

ChrisA

[toc] | [prev] | [next] | [standalone]

#35204

From	Hans Mulder <hansmu@xs4all.nl>
Date	2012-12-20 17:35 +0100
Message-ID	<50d33e3a$0$6889$e4fe514c@news2.news.xs4all.nl>
In reply to	#35202

On 20/12/12 16:20:13, Johannes Bauer wrote:
> On 20.12.2012 16:05, Chris Angelico wrote:
>> On Fri, Dec 21, 2012 at 1:52 AM, Johannes Bauer <dfnsonfsduifb@gmx.de> wrote:
>>> def fetchmanychks(cursor):
>>>         cursor.execute("SELECT id FROM foo;")
>>>         while True:
>>>                 result = cursor.fetchmany()
>>>                 if len(result) == 0:
>>>                         break
>>>                 for x in result:
>>>                         yield x
>>
>> I'm not familiar with sqlite, but from working with other databases,
>> I'm wondering if possibly your commits are breaking the fetchmany.

Yes, that's what it looks like.

I think that should be considered a bug in fetchmany.

> Hmm, but this:
> 
> def fetchmanychks(cursor):
> 	cursor.execute("SELECT id FROM foo;")
> 	while True:
> 		result = cursor.fetchone()
> 		if result is not None:
> 			yield result
> 		else:
> 			break
> 
> Works nicely -- only the fetchmany() makes the example break.

On my system, fetchmany() defaults to returning only one row.

The documentation says that the default should be the optimal
number of rows per chunk for the underlying database engine.
If the optimum is indeed fetchone one row at a time, then
maybe you could consider using fetchone() as a work-around.

>> Would it spoil your performance improvements to do all the
>> fetchmany calls before yielding anything?
> 
> Well this would effectively then be a fetchall() call -- this is
> problematic since the source data is LARGE (speaking of gigabytes
> of data here).
> 
>> Alternatively, can you separate the
>> two by opening a separate database connection for the foo-reading
>> (so it isn't affected by the commit)?
> 
> At that point in the code I don't actually have a filename anymore,
> merely the connection. But shouldn't the cursor actually be the
> "correct" solution? I.e. in theory, should the example work at all
> or am I thinking wrong?

I think you're right and that fetchmany is broken.

> Because if I'm approaching this from the wrong angle, I'll have no
> choice but to change all that code to open separate connections to
> the same file (something that currently are no provisions for).


Hope this helps,

-- HansM

[toc] | [prev] | [next] | [standalone]

#35206

From	inq1ltd <inq1ltd@inqvista.com>
Date	2012-12-20 10:57 -0500
Message-ID	<mailman.1102.1356024160.29569.python-list@python.org>
In reply to	#35199

[Multipart message — attachments visible in raw view] — view raw

On Thursday, December 20, 2012 03:52:39 PM Johannes Bauer wrote:
> Hi group,
> 
> I've run into a problem using Python3.2 and sqlite3 db access that I
> can't quite wrap my head around. I'm pretty sure there's a bug in my
> program, but I can't see where. Help is greatly appreciated. I've
> created a minimal example to demonstrate the phaenomenon (attached at
> bottom).
> 
> First, the program creates a db and inits two tables "foo" and "bar",
> which both only have a "int" value. Then "foo" is populated with unique
> ints.
> 
> A fetchmanychks function is supposed to have the same behavior as
> fetchall(), but instead perform the operation in many subsequent
> fetchmany() chunks.
> 
> When I traverse the foo table using cursor cur1 and insert into the bar
> table using cursor cur2, I receive at some point:
> 
> Traceback (most recent call last):
>   File "y.py", line 25, in <module>
>     cur2.execute("INSERT INTO bar (id) VALUES (?);", (v,))
> sqlite3.IntegrityError: PRIMARY KEY must be unique
> 
> Which means that the fetchmany() read returns the *same* value again!
> How is this possible? If I either
> 
> - Remove the "db.commit()"
> - Replace fetchmanychks(cur1) by cur1.fetchall()
> 
> it works without error -- but I want neither (I want regular commits
> because sqlite3 becomes horribly slow when the journal becomes large and
> the tables nothing to do with each other anyways and atomicity is not
> needed in my case).
> 
> Do I grossly misunderstand fetchmany() or where's my bug here?
> 
> Thanks in advance,
> Joe
> 
> 
Joe,

Both of the following addresses will get you to the same place.

You will get an answer from the sqlite help site.

sqlite-users@sqlite.org

General Discussion of SQLite Database <sqlite-users@sqlite.org>


jd
inqvista.com


> 
> 
> #!/usr/bin/python3.2
> import sqlite3
> 
> db = sqlite3.connect("foobar.sqlite")
> cur1 = db.cursor()
> cur2 = db.cursor()
> 
> def fetchmanychks(cursor):
> 	cursor.execute("SELECT id FROM foo;")
> 	while True:
> 		result = cursor.fetchmany()
> 		if len(result) == 0:
> 			break
> 		for x in result:
> 			yield x
> 
> cur1.execute("CREATE TABLE foo (id integer PRIMARY KEY);")
> cur1.execute("CREATE TABLE bar (id integer PRIMARY KEY);")
> for i in range(0, 200000, 5):
> 	cur1.execute("INSERT INTO foo VALUES (?);", (i,))
> db.commit()
> 
> ctr = 0
> for (v, ) in fetchmanychks(cur1):
> 	cur2.execute("INSERT INTO bar (id) VALUES (?);", (v,))
> 	ctr += 1
> 	if ctr == 100:
> 		db.commit()
> 		ctr = 0
> 
> >> Wo hattest Du das Beben nochmal GENAU vorhergesagt?
> > 
> > Zumindest nicht öffentlich!
> 
> Ah, der neueste und bis heute genialste Streich unsere großen
> Kosmologen: Die Geheim-Vorhersage.
>  - Karl Kaos über Rüdiger Thomas in dsa <hidbv3$om2$1@speranza.aioe.org>

[toc] | [prev] | [standalone]

csiph-web

Python3 + sqlite3: Where's the bug?

Contents

#35199 — Python3 + sqlite3: Where's the bug?

#35201

#35202

#35203

#35204

#35206