Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #35199 > unrolled thread
| Started by | Johannes Bauer <dfnsonfsduifb@gmx.de> |
|---|---|
| First post | 2012-12-20 15:52 +0100 |
| Last post | 2012-12-20 10:57 -0500 |
| Articles | 6 — 4 participants |
Back to article view | Back to comp.lang.python
Python3 + sqlite3: Where's the bug? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-12-20 15:52 +0100
Re: Python3 + sqlite3: Where's the bug? Chris Angelico <rosuav@gmail.com> - 2012-12-21 02:05 +1100
Re: Python3 + sqlite3: Where's the bug? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-12-20 16:20 +0100
Re: Python3 + sqlite3: Where's the bug? Chris Angelico <rosuav@gmail.com> - 2012-12-21 02:55 +1100
Re: Python3 + sqlite3: Where's the bug? Hans Mulder <hansmu@xs4all.nl> - 2012-12-20 17:35 +0100
Re: Python3 + sqlite3: Where's the bug? inq1ltd <inq1ltd@inqvista.com> - 2012-12-20 10:57 -0500
| From | Johannes Bauer <dfnsonfsduifb@gmx.de> |
|---|---|
| Date | 2012-12-20 15:52 +0100 |
| Subject | Python3 + sqlite3: Where's the bug? |
| Message-ID | <kav8ni$f38$1@news.albasani.net> |
Hi group,
I've run into a problem using Python3.2 and sqlite3 db access that I
can't quite wrap my head around. I'm pretty sure there's a bug in my
program, but I can't see where. Help is greatly appreciated. I've
created a minimal example to demonstrate the phaenomenon (attached at
bottom).
First, the program creates a db and inits two tables "foo" and "bar",
which both only have a "int" value. Then "foo" is populated with unique
ints.
A fetchmanychks function is supposed to have the same behavior as
fetchall(), but instead perform the operation in many subsequent
fetchmany() chunks.
When I traverse the foo table using cursor cur1 and insert into the bar
table using cursor cur2, I receive at some point:
Traceback (most recent call last):
File "y.py", line 25, in <module>
cur2.execute("INSERT INTO bar (id) VALUES (?);", (v,))
sqlite3.IntegrityError: PRIMARY KEY must be unique
Which means that the fetchmany() read returns the *same* value again!
How is this possible? If I either
- Remove the "db.commit()"
- Replace fetchmanychks(cur1) by cur1.fetchall()
it works without error -- but I want neither (I want regular commits
because sqlite3 becomes horribly slow when the journal becomes large and
the tables nothing to do with each other anyways and atomicity is not
needed in my case).
Do I grossly misunderstand fetchmany() or where's my bug here?
Thanks in advance,
Joe
#!/usr/bin/python3.2
import sqlite3
db = sqlite3.connect("foobar.sqlite")
cur1 = db.cursor()
cur2 = db.cursor()
def fetchmanychks(cursor):
cursor.execute("SELECT id FROM foo;")
while True:
result = cursor.fetchmany()
if len(result) == 0:
break
for x in result:
yield x
cur1.execute("CREATE TABLE foo (id integer PRIMARY KEY);")
cur1.execute("CREATE TABLE bar (id integer PRIMARY KEY);")
for i in range(0, 200000, 5):
cur1.execute("INSERT INTO foo VALUES (?);", (i,))
db.commit()
ctr = 0
for (v, ) in fetchmanychks(cur1):
cur2.execute("INSERT INTO bar (id) VALUES (?);", (v,))
ctr += 1
if ctr == 100:
db.commit()
ctr = 0
--
>> Wo hattest Du das Beben nochmal GENAU vorhergesagt?
> Zumindest nicht öffentlich!
Ah, der neueste und bis heute genialste Streich unsere großen
Kosmologen: Die Geheim-Vorhersage.
- Karl Kaos über Rüdiger Thomas in dsa <hidbv3$om2$1@speranza.aioe.org>
[toc] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-12-21 02:05 +1100 |
| Message-ID | <mailman.1100.1356015940.29569.python-list@python.org> |
| In reply to | #35199 |
On Fri, Dec 21, 2012 at 1:52 AM, Johannes Bauer <dfnsonfsduifb@gmx.de> wrote:
> def fetchmanychks(cursor):
> cursor.execute("SELECT id FROM foo;")
> while True:
> result = cursor.fetchmany()
> if len(result) == 0:
> break
> for x in result:
> yield x
I'm not familiar with sqlite, but from working with other databases,
I'm wondering if possibly your commits are breaking the fetchmany.
Would it spoil your performance improvements to do all the fetchmany
calls before yielding anything? Alternatively, can you separate the
two by opening a separate database connection for the foo-reading (so
it isn't affected by the commit)?
ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Johannes Bauer <dfnsonfsduifb@gmx.de> |
|---|---|
| Date | 2012-12-20 16:20 +0100 |
| Message-ID | <kavab7$ihd$1@news.albasani.net> |
| In reply to | #35201 |
On 20.12.2012 16:05, Chris Angelico wrote:
> On Fri, Dec 21, 2012 at 1:52 AM, Johannes Bauer <dfnsonfsduifb@gmx.de> wrote:
>> def fetchmanychks(cursor):
>> cursor.execute("SELECT id FROM foo;")
>> while True:
>> result = cursor.fetchmany()
>> if len(result) == 0:
>> break
>> for x in result:
>> yield x
>
> I'm not familiar with sqlite, but from working with other databases,
> I'm wondering if possibly your commits are breaking the fetchmany.
Hmm, but this:
def fetchmanychks(cursor):
cursor.execute("SELECT id FROM foo;")
while True:
result = cursor.fetchone()
if result is not None:
yield result
else:
break
Works nicely -- only the fetchmany() makes the example break.
> Would it spoil your performance improvements to do all the fetchmany
> calls before yielding anything?
Well this would effectively then be a fetchall() call -- this is
problematic since the source data is LARGE (spekaing of gigabytes of
data here).
> Alternatively, can you separate the
> two by opening a separate database connection for the foo-reading (so
> it isn't affected by the commit)?
At that point in the code I don't actually have a filename anymore,
merely the connection. But shouldn't the cursor actually be the
"correct" solution? I.e. in theory, should the example work at all or am
I thinking wrong?
Because if I'm approaching this from the wrong angle, I'll have no
choice but to change all that code to open separate connections to the
same file (something that currently are no provisions for).
Best regards,
Johannes
--
>> Wo hattest Du das Beben nochmal GENAU vorhergesagt?
> Zumindest nicht öffentlich!
Ah, der neueste und bis heute genialste Streich unsere großen
Kosmologen: Die Geheim-Vorhersage.
- Karl Kaos über Rüdiger Thomas in dsa <hidbv3$om2$1@speranza.aioe.org>
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-12-21 02:55 +1100 |
| Message-ID | <mailman.1101.1356018959.29569.python-list@python.org> |
| In reply to | #35202 |
On Fri, Dec 21, 2012 at 2:20 AM, Johannes Bauer <dfnsonfsduifb@gmx.de> wrote: > Hmm, but this: > > result = cursor.fetchone() > yield result > > Works nicely -- only the fetchmany() makes the example break. Okay, now it's sounding specific to sqlite. I'll bow out. :) > >> Would it spoil your performance improvements to do all the fetchmany >> calls before yielding anything? > > Well this would effectively then be a fetchall() call -- this is > problematic since the source data is LARGE (spekaing of gigabytes of > data here). That would be a "yes", then. Scratch that! >> Alternatively, can you separate the >> two by opening a separate database connection for the foo-reading (so >> it isn't affected by the commit)? > > At that point in the code I don't actually have a filename anymore, > merely the connection. But shouldn't the cursor actually be the > "correct" solution? I.e. in theory, should the example work at all or am > I thinking wrong? You say "db.commit()", not "cur2.commit()", so I don't see that a cursor would un-break what part-way commits is breaking. > Because if I'm approaching this from the wrong angle, I'll have no > choice but to change all that code to open separate connections to the > same file (something that currently are no provisions for). Is that an sqlite limitation, or just one of your code? I poked around at the sqlite3 docs, but didn't find any obvious "clone" option on the connection, nor a way to retrieve the file name. That would have been fairly convenient. Oh well. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Hans Mulder <hansmu@xs4all.nl> |
|---|---|
| Date | 2012-12-20 17:35 +0100 |
| Message-ID | <50d33e3a$0$6889$e4fe514c@news2.news.xs4all.nl> |
| In reply to | #35202 |
On 20/12/12 16:20:13, Johannes Bauer wrote:
> On 20.12.2012 16:05, Chris Angelico wrote:
>> On Fri, Dec 21, 2012 at 1:52 AM, Johannes Bauer <dfnsonfsduifb@gmx.de> wrote:
>>> def fetchmanychks(cursor):
>>> cursor.execute("SELECT id FROM foo;")
>>> while True:
>>> result = cursor.fetchmany()
>>> if len(result) == 0:
>>> break
>>> for x in result:
>>> yield x
>>
>> I'm not familiar with sqlite, but from working with other databases,
>> I'm wondering if possibly your commits are breaking the fetchmany.
Yes, that's what it looks like.
I think that should be considered a bug in fetchmany.
> Hmm, but this:
>
> def fetchmanychks(cursor):
> cursor.execute("SELECT id FROM foo;")
> while True:
> result = cursor.fetchone()
> if result is not None:
> yield result
> else:
> break
>
> Works nicely -- only the fetchmany() makes the example break.
On my system, fetchmany() defaults to returning only one row.
The documentation says that the default should be the optimal
number of rows per chunk for the underlying database engine.
If the optimum is indeed fetchone one row at a time, then
maybe you could consider using fetchone() as a work-around.
>> Would it spoil your performance improvements to do all the
>> fetchmany calls before yielding anything?
>
> Well this would effectively then be a fetchall() call -- this is
> problematic since the source data is LARGE (speaking of gigabytes
> of data here).
>
>> Alternatively, can you separate the
>> two by opening a separate database connection for the foo-reading
>> (so it isn't affected by the commit)?
>
> At that point in the code I don't actually have a filename anymore,
> merely the connection. But shouldn't the cursor actually be the
> "correct" solution? I.e. in theory, should the example work at all
> or am I thinking wrong?
I think you're right and that fetchmany is broken.
> Because if I'm approaching this from the wrong angle, I'll have no
> choice but to change all that code to open separate connections to
> the same file (something that currently are no provisions for).
Hope this helps,
-- HansM
[toc] | [prev] | [next] | [standalone]
| From | inq1ltd <inq1ltd@inqvista.com> |
|---|---|
| Date | 2012-12-20 10:57 -0500 |
| Message-ID | <mailman.1102.1356024160.29569.python-list@python.org> |
| In reply to | #35199 |
[Multipart message — attachments visible in raw view] — view raw
On Thursday, December 20, 2012 03:52:39 PM Johannes Bauer wrote:
> Hi group,
>
> I've run into a problem using Python3.2 and sqlite3 db access that I
> can't quite wrap my head around. I'm pretty sure there's a bug in my
> program, but I can't see where. Help is greatly appreciated. I've
> created a minimal example to demonstrate the phaenomenon (attached at
> bottom).
>
> First, the program creates a db and inits two tables "foo" and "bar",
> which both only have a "int" value. Then "foo" is populated with unique
> ints.
>
> A fetchmanychks function is supposed to have the same behavior as
> fetchall(), but instead perform the operation in many subsequent
> fetchmany() chunks.
>
> When I traverse the foo table using cursor cur1 and insert into the bar
> table using cursor cur2, I receive at some point:
>
> Traceback (most recent call last):
> File "y.py", line 25, in <module>
> cur2.execute("INSERT INTO bar (id) VALUES (?);", (v,))
> sqlite3.IntegrityError: PRIMARY KEY must be unique
>
> Which means that the fetchmany() read returns the *same* value again!
> How is this possible? If I either
>
> - Remove the "db.commit()"
> - Replace fetchmanychks(cur1) by cur1.fetchall()
>
> it works without error -- but I want neither (I want regular commits
> because sqlite3 becomes horribly slow when the journal becomes large and
> the tables nothing to do with each other anyways and atomicity is not
> needed in my case).
>
> Do I grossly misunderstand fetchmany() or where's my bug here?
>
> Thanks in advance,
> Joe
>
>
Joe,
Both of the following addresses will get you to the same place.
You will get an answer from the sqlite help site.
sqlite-users@sqlite.org
General Discussion of SQLite Database <sqlite-users@sqlite.org>
jd
inqvista.com
>
>
> #!/usr/bin/python3.2
> import sqlite3
>
> db = sqlite3.connect("foobar.sqlite")
> cur1 = db.cursor()
> cur2 = db.cursor()
>
> def fetchmanychks(cursor):
> cursor.execute("SELECT id FROM foo;")
> while True:
> result = cursor.fetchmany()
> if len(result) == 0:
> break
> for x in result:
> yield x
>
> cur1.execute("CREATE TABLE foo (id integer PRIMARY KEY);")
> cur1.execute("CREATE TABLE bar (id integer PRIMARY KEY);")
> for i in range(0, 200000, 5):
> cur1.execute("INSERT INTO foo VALUES (?);", (i,))
> db.commit()
>
> ctr = 0
> for (v, ) in fetchmanychks(cur1):
> cur2.execute("INSERT INTO bar (id) VALUES (?);", (v,))
> ctr += 1
> if ctr == 100:
> db.commit()
> ctr = 0
>
> >> Wo hattest Du das Beben nochmal GENAU vorhergesagt?
> >
> > Zumindest nicht öffentlich!
>
> Ah, der neueste und bis heute genialste Streich unsere großen
> Kosmologen: Die Geheim-Vorhersage.
> - Karl Kaos über Rüdiger Thomas in dsa <hidbv3$om2$1@speranza.aioe.org>
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web