Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #73833

Re: Searching for lots of similar strings (filenames) in sqlite3 database

From Adam Funk <a24061@ducksburg.com>
Newsgroups comp.lang.python
Subject Re: Searching for lots of similar strings (filenames) in sqlite3 database
Date 2014-07-02 10:32 +0100
Organization $CABAL
Message-ID <770d8bxunj.ln2@news.ducksburg.com> (permalink)
References <tgia8bx2sf.ln2@news.ducksburg.com> <mailman.11362.1404216365.18130.python-list@python.org> <3uva8bx3dn.ln2@news.ducksburg.com> <mailman.11375.1404230249.18130.python-list@python.org>

Show all headers | View raw


On 2014-07-01, Chris Angelico wrote:

> On Wed, Jul 2, 2014 at 1:15 AM, Adam Funk <a24061@ducksburg.com> wrote:
>> On 2014-07-01, Chris Angelico wrote:

>>> There is one critical consideration, though. What happens if the
>>> directory name contains an underscore or percent sign? Or can you
>>> absolutely guarantee that they won't? You may need to escape them, and
>>> I'm not sure how SQLite handles that. (Possibly \_ will match literal
>>> _, and \\ will match literal \, or something like that.)
>>
>> I can guarantee that the directory names are all
>> '/var/spool/news/message.id/' then 3 digits.  (The filenames are
>> pretty wild, since they are MIDs.)  AIUI, using the '?' substitution
>> in the sqlite3 library is supposed to be safe.
>
> This is nothing to do with question-mark substitution. There are two
> separate levels of character significance here - it's like a quoted
> string with a regex. Suppose you want to make a regex that searches
> for an apostrophe. If you try to define that in a single-quoted
> string, you need to escape it:
>
> regex = '^\'$'
>
> However, if you ask the user to enter a regex, that wouldn't be necessary:
>
> regex = input("Enter a pattern: ") # raw_input in Python 2
> Enter a pattern: ^'$
>
> This is what the question mark substitution is like - it avoids the
> need to carefully manage string delimiters and so on. However, if you
> want to make a regex that searches for a backslash, then you need to
> escape it, because the backslash is important to the regex itself. In
> the same way, the underscore and percent sign are significant to the
> LIKE operator. If it were possible to have a directory name with a
> percent sign in it, it would match far too much - because you'd
> construct a LIKE pattern something like (ahem)
> "/var/spool/news/message%20id/142/%" - and as you can see, the percent
> sign at the end is no different from the percent sign in the middle.
>
> But you're safe because you know your data, unrelated to your
> substitution method. Possibly merits a comment... but possibly not
> worth it.

Well, I've changed it to the following anyway.

        subdir_glob = subdir + '/*'
        cursor.execute('SELECT filename FROM files WHERE filename GLOB ?',
                       (subdir_glob,))
        rows = cursor.fetchall()
        known_files = {row[0] for row in rows}

I see what you mean about paths containing '%', but I don't see why
you were concerned about underscores, though.


-- 
You know, there are many people in the country today who, through no
fault of their own, are sane. Some of them were born sane. Some of
them became sane later in their lives.            --― Graham Chapman

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Searching for lots of similar strings (filenames) in sqlite3 database Adam Funk <a24061@ducksburg.com> - 2014-07-01 12:26 +0100
  Re: Searching for lots of similar strings (filenames) in sqlite3 database Chris Angelico <rosuav@gmail.com> - 2014-07-01 22:06 +1000
    Re: Searching for lots of similar strings (filenames) in sqlite3 database Adam Funk <a24061@ducksburg.com> - 2014-07-01 16:15 +0100
      Re: Searching for lots of similar strings (filenames) in sqlite3 database Chris Angelico <rosuav@gmail.com> - 2014-07-02 01:57 +1000
        Re: Searching for lots of similar strings (filenames) in sqlite3 database Adam Funk <a24061@ducksburg.com> - 2014-07-02 10:30 +0100
        Re: Searching for lots of similar strings (filenames) in sqlite3 database Adam Funk <a24061@ducksburg.com> - 2014-07-02 10:32 +0100
          Re: Searching for lots of similar strings (filenames) in sqlite3 database Chris Angelico <rosuav@gmail.com> - 2014-07-02 20:03 +1000
            Re: Searching for lots of similar strings (filenames) in sqlite3 database Adam Funk <a24061@ducksburg.com> - 2014-07-02 13:30 +0100
  Re: Searching for lots of similar strings (filenames) in sqlite3 database MRAB <python@mrabarnett.plus.com> - 2014-07-01 13:13 +0100
  Re: Searching for lots of similar strings (filenames) in sqlite3 database Chris Angelico <rosuav@gmail.com> - 2014-07-02 00:02 +1000

csiph-web