| From | Lawrence D’Oliveiro <ldo@nz.invalid> |
|---|---|
| Newsgroups | comp.os.linux.advocacy, comp.lang.python |
| Subject | Re: Get to know your files and folders! |
| Followup-To | comp.lang.python |
| Date | 2026-01-27 21:02 +0000 |
| Organization | A noiseless patient Spider |
| Message-ID | <10lb95u$3suu7$1@dont-email.me> (permalink) |
| References | <10l8bnv$2o524$1@dont-email.me> |
On Mon, 26 Jan 2026 13:28:24 -0500, DFS wrote:
> Here's some Python code I wrote to capture file metadata (name,
> location, date created, date modified, and size) in a SQLite
> database.
I would consider this a waste of time. There are already standard *nix
commands (e.g. du(1) <https://manpages.debian.org/du(1)>) for
obtaining this information directly from the filesystem, without the
extra steps of collecting the info in a database and having to keep
that up to date.
> Tested on Windows and Linux/WSL.
But not on native Linux? Because WSL forces the Linux kernel to go
through the filesystem-handling bottleneck that is the Windows kernel.
Just some thoughts:
    cSQL = " CREATE TABLE Files "
    cSQL += " ( "
    cSQL += " FileID INTEGER NOT NULL PRIMARY KEY, "
    cSQL += " FolderID INTEGER REFERENCES Folders (FolderID), "
    cSQL += " Folder TEXT NOT NULL, "
    cSQL += " FileName TEXT NOT NULL, "
    cSQL += " FileCreated NUMBER NOT NULL, "
    cSQL += " FileModified NUMBER NOT NULL, "
    cSQL += " FileSizeKB NUMBER NOT NULL "
    cSQL += " );"
Did you know Python does implicit concatenation of adjacent string
literals, like C and C++?
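For illustration, here is a sketch of the same statement built from
adjacent literals inside parentheses. The name CREATE_FILES_SQL is
made up for this example; the column definitions are copied from the
quoted code.

```python
# Adjacent string literals are joined by the compiler, so the statement
# can be spread over several lines without any "+=".
CREATE_FILES_SQL = (
    "CREATE TABLE Files "
    "( "
    "FileID INTEGER NOT NULL PRIMARY KEY, "
    "FolderID INTEGER REFERENCES Folders (FolderID), "
    "Folder TEXT NOT NULL, "
    "FileName TEXT NOT NULL, "
    "FileCreated NUMBER NOT NULL, "
    "FileModified NUMBER NOT NULL, "
    "FileSizeKB NUMBER NOT NULL "
    ");"
)
```

A triple-quoted string would also work, if the embedded newlines do
not bother you.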
Also, I notice you are assuming each file has only one parent folder.
You do know *nix systems are not restricted like this, right? A hard
link gives the same file a second directory entry, possibly in a
different folder.
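A small sketch of what that looks like from Python, assuming a
filesystem that supports hard links (most native Linux ones do). The
function name and the file names are invented for the example.

```python
import os
import tempfile


def hard_link_demo():
    """Make one file reachable from two folders and compare the entries."""
    with tempfile.TemporaryDirectory() as top:
        dir_a = os.path.join(top, "a")
        dir_b = os.path.join(top, "b")
        os.mkdir(dir_a)
        os.mkdir(dir_b)

        first = os.path.join(dir_a, "data.txt")
        second = os.path.join(dir_b, "same-data.txt")
        with open(first, "w") as f:
            f.write("hello")
        os.link(first, second)  # second directory entry, same inode

        st_first = os.lstat(first)
        st_second = os.lstat(second)
        # Same (device, inode) pair means both names refer to one file.
        same_file = (st_first.st_dev, st_first.st_ino) == (
            st_second.st_dev,
            st_second.st_ino,
        )
        return same_file, st_first.st_nlink


same_file, link_count = hard_link_demo()
```

A table keyed on (st_dev, st_ino) rather than on folder plus name is
one way to avoid counting such a file twice.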
    filesize = round(os.path.getsize(root + '/' + file)/1000,1)
    filecreate = os.path.getctime(root + '/' + file)
    filecreate = str(datetime.datetime.fromtimestamp(filecreate))[0:19]
    filemod = os.path.getmtime(root + '/' + file)
How many different file-info lookups do you need to do on each file?
How do you handle symlinks? (Yes, even Windows has those now.)
The usual way to get this info is with os.lstat()
<https://docs.python.org/3/library/os.html#os.lstat>, which returns it
all with a single OS call.
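Something along these lines, as a rough sketch. The function name
file_info and the dictionary keys are my own invention; the size and
timestamp formatting follow the quoted code. Note that on *nix
st_ctime is the time of the last inode change, not a creation time.

```python
import datetime
import os
import stat
import tempfile


def file_info(path):
    """Collect size, timestamps and type for path with one lstat() call."""
    st = os.lstat(path)  # does not follow symlinks
    return {
        "size_kb": round(st.st_size / 1000, 1),
        "modified": str(datetime.datetime.fromtimestamp(st.st_mtime))[0:19],
        "ctime": str(datetime.datetime.fromtimestamp(st.st_ctime))[0:19],
        "is_symlink": stat.S_ISLNK(st.st_mode),
        "is_dir": stat.S_ISDIR(st.st_mode),
    }


# Try it on a throwaway 1500-byte file.
with tempfile.TemporaryDirectory() as top:
    sample = os.path.join(top, "sample.bin")
    with open(sample, "wb") as f:
        f.write(b"x" * 1500)
    info = file_info(sample)
```

Because lstat() reports on the link itself, a symlink shows up with
is_symlink set instead of silently standing in for its target.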
> The major slowdown is one cartesian/update query - used to summarize
> data in all subdirectories - for which I haven't been able to figure
> out a decent workaround.
As I said, your problem is using a DBMS in the first place. You are
doing a cross-join of *all* files against *all* folders. But in the
real filesystem, it would be unheard of for *all* files to be present
in *all* folders -- or indeed, for many files to be present in more
than one folder.
Also, I notice your database structure does not reflect the folder
hierarchy -- where do you record parent-child relationships between
folders?
In short, take more account of the actual filesystem hierarchy in your
database structure.
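If you insist on a database, here is one possible sketch, not a
drop-in replacement for your schema. The ParentID column, the FileSize
column, the Subtree name and the sample rows are all assumptions made
up for the example. It records each folder's parent and uses a
recursive query to total the sizes under every folder, so files are
only ever joined to folders that actually contain them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Folders (
        FolderID INTEGER PRIMARY KEY,
        ParentID INTEGER REFERENCES Folders (FolderID),  -- NULL for the top
        Name     TEXT NOT NULL
    );
    CREATE TABLE Files (
        FileID   INTEGER PRIMARY KEY,
        FolderID INTEGER NOT NULL REFERENCES Folders (FolderID),
        FileName TEXT NOT NULL,
        FileSize INTEGER NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO Folders VALUES (?, ?, ?)",
    [(1, None, "top"), (2, 1, "docs"), (3, 2, "drafts"), (4, 1, "src")],
)
conn.executemany(
    "INSERT INTO Files (FolderID, FileName, FileSize) VALUES (?, ?, ?)",
    [(1, "readme", 100), (2, "a.txt", 200), (3, "b.txt", 300), (4, "main.py", 400)],
)

# Pair every folder with itself and each of its descendants, then sum
# the sizes of the files found in those descendants.
SUBTREE_TOTALS = """
    WITH RECURSIVE Subtree (Ancestor, Descendant) AS (
        SELECT FolderID, FolderID FROM Folders
        UNION ALL
        SELECT s.Ancestor, f.FolderID
        FROM Subtree AS s
        JOIN Folders AS f ON f.ParentID = s.Descendant
    )
    SELECT s.Ancestor, COALESCE(SUM(fi.FileSize), 0) AS TotalSize
    FROM Subtree AS s
    LEFT JOIN Files AS fi ON fi.FolderID = s.Descendant
    GROUP BY s.Ancestor
    ORDER BY s.Ancestor
"""
totals = dict(conn.execute(SUBTREE_TOTALS).fetchall())
```

With the sample rows, "top" totals 1000, "docs" 500, "drafts" 300 and
"src" 400. Whether this beats your current update query on a real tree
is something you would have to measure.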