Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #73781

Searching for lots of similar strings (filenames) in sqlite3 database

From Adam Funk <a24061@ducksburg.com>
Newsgroups comp.lang.python
Subject Searching for lots of similar strings (filenames) in sqlite3 database
Date 2014-07-01 12:26 +0100
Organization $CABAL
Message-ID <tgia8bx2sf.ln2@news.ducksburg.com> (permalink)

Show all headers | View raw


I have some code that reads files in a leafnode2 news spool & needs to
check for new files periodically.  The full paths are all like
'/var/spool/news/message.id/345/<123456@example.com>' with a 3-digit
subdirectory & a Message-ID for the filename itself.  I'm using Python
3 & sqlite3 in the standard library.

I have a table of filenames created with the following command:

   cursor.execute('CREATE TABLE files (filename TEXT PRIMARY KEY, used INTEGER)')

To check for new files in one of the subdirectories, I run A then
either B or C below (I've tried both).

A.
    listing1 = os.listdir(directory)
    listing [os.path.join(directory, x) for x in listing1]

B.
    cursor = db_conn.cursor()
    for filename in listing:
        cursor.execute('SELECT filename FROM files WHERE filename IS ?', (filename,))
        row = cursor.fetchone()
        if not row:
            cursor.execute('INSERT INTO files VALUES (?, ?)', (filename, 0) )
            files_new += 1
    db_conn.commit()

C.
    cursor = db_conn.cursor()
    subdir_like = directory + '/%'
    cursor.execute('SELECT filename FROM files WHERE filename LIKE ?', (subdir_like,))
    rows = cursor.fetchall()
    known_files =  [row[0] for row in rows]
    for filename in listing:
        if filename not in known_files:
            cursor.execute('INSERT INTO files VALUES (?, ?)', (filename, 0) )
            files_new += 1
    db_conn.commit()

A+B was the first method I came up with, because it looks like the
"keep it simple & let the database do its job" approach, but it was
very time-consuming, so I tested A+C out.  A is quick (a second); B
can drag on for over an hour to check 2000 filenames (for example) in
a subdirectory; C always takes less than a minute.  So C is much
better than B, but it looks (to me) like one of those attempts to
bypass & ignore the database's built-in optimizations.

Comments?


-- 
No sport is less organized than Calvinball!

Back to comp.lang.python | Previous | NextNext in thread | Find similar | Unroll thread


Thread

Searching for lots of similar strings (filenames) in sqlite3 database Adam Funk <a24061@ducksburg.com> - 2014-07-01 12:26 +0100
  Re: Searching for lots of similar strings (filenames) in sqlite3 database Chris Angelico <rosuav@gmail.com> - 2014-07-01 22:06 +1000
    Re: Searching for lots of similar strings (filenames) in sqlite3 database Adam Funk <a24061@ducksburg.com> - 2014-07-01 16:15 +0100
      Re: Searching for lots of similar strings (filenames) in sqlite3 database Chris Angelico <rosuav@gmail.com> - 2014-07-02 01:57 +1000
        Re: Searching for lots of similar strings (filenames) in sqlite3 database Adam Funk <a24061@ducksburg.com> - 2014-07-02 10:30 +0100
        Re: Searching for lots of similar strings (filenames) in sqlite3 database Adam Funk <a24061@ducksburg.com> - 2014-07-02 10:32 +0100
          Re: Searching for lots of similar strings (filenames) in sqlite3 database Chris Angelico <rosuav@gmail.com> - 2014-07-02 20:03 +1000
            Re: Searching for lots of similar strings (filenames) in sqlite3 database Adam Funk <a24061@ducksburg.com> - 2014-07-02 13:30 +0100
  Re: Searching for lots of similar strings (filenames) in sqlite3 database MRAB <python@mrabarnett.plus.com> - 2014-07-01 13:13 +0100
  Re: Searching for lots of similar strings (filenames) in sqlite3 database Chris Angelico <rosuav@gmail.com> - 2014-07-02 00:02 +1000

csiph-web