Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #73795

Re: Searching for lots of similar strings (filenames) in sqlite3 database

References <tgia8bx2sf.ln2@news.ducksburg.com> <mailman.11362.1404216365.18130.python-list@python.org> <3uva8bx3dn.ln2@news.ducksburg.com>
Date 2014-07-02 01:57 +1000
Subject Re: Searching for lots of similar strings (filenames) in sqlite3 database
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.11375.1404230249.18130.python-list@python.org> (permalink)

Show all headers | View raw


On Wed, Jul 2, 2014 at 1:15 AM, Adam Funk <a24061@ducksburg.com> wrote:
> On 2014-07-01, Chris Angelico wrote:
>
>> On Tue, Jul 1, 2014 at 9:26 PM, Adam Funk <a24061@ducksburg.com> wrote:
>>>         cursor.execute('SELECT filename FROM files WHERE filename IS ?', (filename,))
>>
>> Shouldn't this be an equality check rather than IS, which normally I'd
>> expect to be "IS NULL" or "IS NOT NULL"?
>
> Oh, it probably should be in "heavy" SQL.  In SQLite, '==', '=', &
> 'IS' are interchangeable.
>
> http://www.tutorialspoint.com/sqlite/sqlite_operators.htm

Ah, okay. In that case, I'd advise going with either == for
consistency with the rest of Python, or (preferably) = for consistency
with other SQL engines. You wouldn't use "is" to test if two Python
strings are equal, so there's no particular reason to use it here :)

> Oh, even better:
>
>     add_files = listing - known_files
>     delete_files = known_files - listing
>
> and then I can remove files that have disappeared off the spool from
> the table.  Thanks very much!

Ah! Didn't know that was a valuable feature for you, but getting that
"for free" is an extra little bonus, so that's awesome!

>> There is one critical consideration, though. What happens if the
>> directory name contains an underscore or percent sign? Or can you
>> absolutely guarantee that they won't? You may need to escape them, and
>> I'm not sure how SQLite handles that. (Possibly \_ will match literal
>> _, and \\ will match literal \, or something like that.)
>
> I can guarantee that the directory names are all
> '/var/spool/news/message.id/' then 3 digits.  (The filenames are
> pretty wild, since they are MIDs.)  AIUI, using the '?' substitution
> in the sqlite3 library is supposed to be safe.

This is nothing to do with question-mark substitution. There are two
separate levels of character significance here - it's like a quoted
string with a regex. Suppose you want to make a regex that searches
for an apostrophe. If you try to define that in a single-quoted
string, you need to escape it:

regex = '^\'$'

However, if you ask the user to enter a regex, that wouldn't be necessary:

regex = input("Enter a pattern: ") # raw_input in Python 2
Enter a pattern: ^'$

This is what the question mark substitution is like - it avoids the
need to carefully manage string delimiters and so on. However, if you
want to make a regex that searches for a backslash, then you need to
escape it, because the backslash is important to the regex itself. In
the same way, the underscore and percent sign are significant to the
LIKE operator. If it were possible to have a directory name with a
percent sign in it, it would match far too much - because you'd
construct a LIKE pattern something like (ahem)
"/var/spool/news/message%20id/142/%" - and as you can see, the percent
sign at the end is no different from the percent sign in the middle.

But you're safe because you know your data, unrelated to your
substitution method. Possibly merits a comment... but possibly not
worth it.

ChrisA

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Searching for lots of similar strings (filenames) in sqlite3 database Adam Funk <a24061@ducksburg.com> - 2014-07-01 12:26 +0100
  Re: Searching for lots of similar strings (filenames) in sqlite3 database Chris Angelico <rosuav@gmail.com> - 2014-07-01 22:06 +1000
    Re: Searching for lots of similar strings (filenames) in sqlite3 database Adam Funk <a24061@ducksburg.com> - 2014-07-01 16:15 +0100
      Re: Searching for lots of similar strings (filenames) in sqlite3 database Chris Angelico <rosuav@gmail.com> - 2014-07-02 01:57 +1000
        Re: Searching for lots of similar strings (filenames) in sqlite3 database Adam Funk <a24061@ducksburg.com> - 2014-07-02 10:30 +0100
        Re: Searching for lots of similar strings (filenames) in sqlite3 database Adam Funk <a24061@ducksburg.com> - 2014-07-02 10:32 +0100
          Re: Searching for lots of similar strings (filenames) in sqlite3 database Chris Angelico <rosuav@gmail.com> - 2014-07-02 20:03 +1000
            Re: Searching for lots of similar strings (filenames) in sqlite3 database Adam Funk <a24061@ducksburg.com> - 2014-07-02 13:30 +0100
  Re: Searching for lots of similar strings (filenames) in sqlite3 database MRAB <python@mrabarnett.plus.com> - 2014-07-01 13:13 +0100
  Re: Searching for lots of similar strings (filenames) in sqlite3 database Chris Angelico <rosuav@gmail.com> - 2014-07-02 00:02 +1000

csiph-web