From: Lawrence D’Oliveiro <ldo@nz.invalid>
Newsgroups: comp.os.linux.advocacy,comp.lang.python
Subject: Re: Get to know your files and folders!
Followup-To: comp.lang.python
Date: Tue, 27 Jan 2026 21:02:55 -0000 (UTC)
Organization: A noiseless patient Spider
Message-ID: <10lb95u$3suu7$1@dont-email.me>
References: <10l8bnv$2o524$1@dont-email.me>

On Mon, 26 Jan 2026 13:28:24 -0500, DFS wrote:

> Here's some Python code I wrote to capture file metadata (name,
> location, date created, date modified, and size) in a SQLite
> database.

I would consider this a waste of time. There are already standard *nix
commands (e.g. du(1) <https://manpages.debian.org/du(1)>) for
obtaining this information directly from the filesystem, without the
extra steps of collecting the info in a database and having to keep
that up to date.

> Tested on Windows and Linux/WSL.

But not on native Linux? Because WSL forces Linux to go through the
filesystem-handling bottleneck that is the Windows kernel whenever it
touches Windows-hosted files.

Just some thoughts:

    cSQL =  " CREATE TABLE Files "
    cSQL += " ( "
    cSQL += "   FileID       INTEGER NOT NULL PRIMARY KEY, "
    cSQL += "   FolderID     INTEGER REFERENCES Folders (FolderID), "
    cSQL += "   Folder       TEXT    NOT NULL, "
    cSQL += "   FileName     TEXT    NOT NULL, "
    cSQL += "   FileCreated  NUMBER  NOT NULL, "
    cSQL += "   FileModified NUMBER  NOT NULL, "
    cSQL += "   FileSizeKB   NUMBER  NOT NULL "
    cSQL += " );"

Did you know Python does implicit string concatenation, like C and
C++?
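
For instance, here is a sketch of the same statement built from
adjacent string literals inside parentheses. The name
create_files_sql is mine, not from your code; the columns are copied
from your post.

```python
# Adjacent string literals are joined by the compiler into one string,
# so no repeated += is needed. The parentheses only allow the line breaks.
create_files_sql = (
    "CREATE TABLE Files "
    "( "
    "  FileID       INTEGER NOT NULL PRIMARY KEY, "
    "  FolderID     INTEGER REFERENCES Folders (FolderID), "
    "  Folder       TEXT    NOT NULL, "
    "  FileName     TEXT    NOT NULL, "
    "  FileCreated  NUMBER  NOT NULL, "
    "  FileModified NUMBER  NOT NULL, "
    "  FileSizeKB   NUMBER  NOT NULL "
    ");"
)
```

A triple-quoted string would also work if you want the layout kept
verbatim in the SQL text.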

Also, I notice you are assuming each file has only one parent folder.
You do know *nix systems are not restricted like this, right? Hard
links let the same file appear under several directories.
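
A small demonstration, assuming a filesystem that supports hard
links. The directory and file names are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Two separate directories under a scratch location.
    dir_a = os.path.join(top, "a")
    dir_b = os.path.join(top, "b")
    os.mkdir(dir_a)
    os.mkdir(dir_b)

    first_name = os.path.join(dir_a, "report.txt")
    second_name = os.path.join(dir_b, "same-report.txt")
    with open(first_name, "w") as f:
        f.write("hello\n")

    # Give the existing file a second name in the other directory.
    os.link(first_name, second_name)

    info_a = os.lstat(first_name)
    info_b = os.lstat(second_name)

    # Same device and inode number means both names refer to one file.
    same_file = (info_a.st_dev, info_a.st_ino) == (info_b.st_dev, info_b.st_ino)
    # st_nlink counts how many directory entries point at the file.
    link_count = info_a.st_nlink
```

A table keyed on (Folder, FileName) would record this as two files
and count the size twice.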

    filesize   = round(os.path.getsize(root + '/' + file)/1000,1)
    filecreate = os.path.getctime(root + '/' + file)
    filecreate = str(datetime.datetime.fromtimestamp(filecreate))[0:19]
    filemod    = os.path.getmtime(root + '/' + file)

How many different file-info lookups do you need to do on each file?
How do you handle symlinks? (Yes, even Windows has those now.)

The usual way to get this info is with os.lstat()
<https://docs.python.org/3/library/os.html#os.lstat>, which returns it
all with a single OS call.
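
Something along these lines, as a sketch. The function name
file_info and the dictionary keys are my own invention. Note that on
*nix st_ctime is the time of the last inode change, not a creation
time, so I have labelled it "changed" rather than "created".

```python
import datetime
import os
import stat
import tempfile


def file_info(path):
    """Collect size, timestamps and symlink status with one os.lstat() call."""
    st = os.lstat(path)  # lstat reports on a symlink itself, not its target

    def as_text(timestamp):
        # Local time, to the second, e.g. "2026-01-27 21:02:55".
        moment = datetime.datetime.fromtimestamp(timestamp)
        return moment.isoformat(sep=" ", timespec="seconds")

    return {
        "size_kb": round(st.st_size / 1000, 1),
        "modified": as_text(st.st_mtime),
        "changed": as_text(st.st_ctime),
        "is_symlink": stat.S_ISLNK(st.st_mode),
    }


# Usage on a scratch file of known size.
with tempfile.TemporaryDirectory() as top:
    sample = os.path.join(top, "sample.bin")
    with open(sample, "wb") as f:
        f.write(b"x" * 1500)
    info = file_info(sample)
```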

> The major slowdown is one cartesian/update query - used to summarize
> data in all subdirectories - for which I haven't been able to figure
> out a decent workaround.

As I said, your problem is using a DBMS in the first place. You are
doing a cross-join of *all* files against *all* folders. But in the
real filesystem, it would be unheard of for *all* files to be present
in *all* folders -- or indeed, for many files to be present in more
than one folder.

Also, I notice your database structure does not reflect the folder
hierarchy -- where do you record parent-child relationships between
folders?

In short, take more account of the actual filesystem hierarchy in your
database structure.
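
If you insist on the database, here is one way that might look. This
is a sketch, not a drop-in replacement: the ParentID, Name and
FileSize columns and the Ancestry query are my assumptions, only the
Files/Folders/FolderID names come from your schema, and hard links
are ignored for brevity. Each folder records its parent, and a
recursive query walks only real parent-child links instead of
cross-joining everything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Folders (
        FolderID INTEGER PRIMARY KEY,
        ParentID INTEGER REFERENCES Folders (FolderID),  -- NULL for the top folder
        Name     TEXT NOT NULL
    );
    CREATE TABLE Files (
        FileID   INTEGER PRIMARY KEY,
        FolderID INTEGER NOT NULL REFERENCES Folders (FolderID),
        FileName TEXT NOT NULL,
        FileSize INTEGER NOT NULL
    );
    """
)

# Made-up sample tree: top/{docs/{old}, src}
conn.executemany(
    "INSERT INTO Folders VALUES (?, ?, ?)",
    [(1, None, "top"), (2, 1, "docs"), (3, 2, "old"), (4, 1, "src")],
)
conn.executemany(
    "INSERT INTO Files VALUES (?, ?, ?, ?)",
    [(1, 1, "readme", 100), (2, 2, "a.txt", 20),
     (3, 3, "b.txt", 3), (4, 4, "main.py", 4000)],
)

# Ancestry pairs every folder with itself and with each of its descendants,
# following ParentID links one level at a time.
subtree_totals_sql = """
    WITH RECURSIVE Ancestry (AncestorID, FolderID) AS (
        SELECT FolderID, FolderID FROM Folders
        UNION ALL
        SELECT Ancestry.AncestorID, Folders.FolderID
        FROM Ancestry
        JOIN Folders ON Folders.ParentID = Ancestry.FolderID
    )
    SELECT Ancestry.AncestorID, COALESCE(SUM(Files.FileSize), 0)
    FROM Ancestry
    LEFT JOIN Files ON Files.FolderID = Ancestry.FolderID
    GROUP BY Ancestry.AncestorID
    ORDER BY Ancestry.AncestorID
"""

# Maps each FolderID to the total size of all files at or below it.
totals = dict(conn.execute(subtree_totals_sql).fetchall())
```

The work done is proportional to the number of (ancestor, descendant)
pairs that actually exist, not to files times folders.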