From: Lawrence D’Oliveiro
Newsgroups: comp.os.linux.advocacy,comp.lang.python
Subject: Re: Get to know your files and folders!
Followup-To: comp.lang.python
Date: Tue, 27 Jan 2026 21:02:55 -0000 (UTC)

On Mon, 26 Jan 2026 13:28:24 -0500, DFS wrote:

> Here's some Python code I wrote to capture file metadata (name,
> location, date created, date modified, and size) in a SQLite
> database.

I would consider this a waste of time. There are already standard *nix
commands (e.g. du(1)) for obtaining this information directly from the
filesystem, without the extra steps of collecting the info in a
database and having to keep that up to date.

> Tested on Windows and Linux/WSL.

But not on native Linux? Because WSL forces the Linux kernel to go
through the filesystem-handling bottleneck that is the Windows kernel.

Just some thoughts:

    cSQL =  " CREATE TABLE Files "
    cSQL += " ( "
    cSQL += "   FileID       INTEGER NOT NULL PRIMARY KEY, "
    cSQL += "   FolderID     INTEGER REFERENCES Folders (FolderID), "
    cSQL += "   Folder       TEXT    NOT NULL, "
    cSQL += "   FileName     TEXT    NOT NULL, "
    cSQL += "   FileCreated  NUMBER  NOT NULL, "
    cSQL += "   FileModified NUMBER  NOT NULL, "
    cSQL += "   FileSizeKB   NUMBER  NOT NULL "
    cSQL += " );"

Did you know Python does implicit string concatenation, like C and
C++?

Also, I notice you are assuming each file has only one parent folder.
You do know *nix systems are not restricted like this, right?

    filesize = round(os.path.getsize(root + '/' + file)/1000,1)
    filecreate = os.path.getctime(root + '/' + file)
    filecreate = str(datetime.datetime.fromtimestamp(filecreate))[0:19]
    filemod = os.path.getmtime(root + '/' + file)

How many different file-info lookups do you need to do on each file?
How do you handle symlinks? (Yes, even Windows has those now.)

The usual way to get this info is with os.lstat(), which returns it
all with a single OS call.

> The major slowdown is one cartesian/update query - used to summarize
> data in all subdirectories - for which I haven't been able to figure
> out a decent workaround.

As I said, your problem is using a DBMS in the first place. You are
doing a cross-join of *all* files against *all* folders. But in the
real filesystem, it would be unheard of for *all* files to be present
in *all* folders -- or indeed, for many files to be present in more
than one folder.

Also, I notice your database structure does not reflect the folder
hierarchy -- where do you record parent-child relationships between
folders?

In short, take more account of the actual filesystem hierarchy in your
database structure.
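
To make the os.lstat() point concrete, here is a minimal sketch of
gathering the same fields with one call per file. The helper name
file_info() and the dictionary keys are made up for illustration and
are not from the quoted script. Note that on *nix st_ctime is the
inode change time, not a creation time.

```python
import datetime
import os
import stat


def _fmt(timestamp):
    """Render a POSIX timestamp as 'YYYY-MM-DD HH:MM:SS' in local time."""
    stamp = datetime.datetime.fromtimestamp(timestamp)
    return stamp.isoformat(sep=" ", timespec="seconds")


def file_info(path):
    """Hypothetical helper: all metadata for path from a single lstat()."""
    st = os.lstat(path)  # one OS call; does not follow symlinks
    return {
        "size_kb": round(st.st_size / 1000, 1),
        "modified": _fmt(st.st_mtime),
        # On *nix this is the inode change time, not a creation time.
        "changed": _fmt(st.st_ctime),
        "is_symlink": stat.S_ISLNK(st.st_mode),
        # More than 1 means the file has other hard links elsewhere.
        "nlink": st.st_nlink,
    }
```

Because lstat() reports on the link itself, a symlink shows up as a
symlink instead of being silently counted as its target, and st_nlink
gives a hint when a file has more than one parent folder.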
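
If a database is kept anyway, one way to record the folder hierarchy
is a parent pointer on each folder, with a recursive query doing the
per-subtree totals instead of a cross-join. The following is only a
sketch under assumptions: the ParentID and FileSize columns and the
names SCHEMA, SUBTREE_TOTALS, subtree_totals() and demo() are
hypothetical, not taken from the quoted schema. It also shows adjacent
string literals being joined implicitly, with no "+=" needed.

```python
import sqlite3

# Adjacent string literals are concatenated by the compiler.
SCHEMA = (
    "CREATE TABLE Folders ("
    " FolderID INTEGER NOT NULL PRIMARY KEY,"
    # Assumed column: parent folder, NULL for the root.
    " ParentID INTEGER REFERENCES Folders (FolderID),"
    " Name TEXT NOT NULL"
    ");"
    "CREATE TABLE Files ("
    " FileID INTEGER NOT NULL PRIMARY KEY,"
    " FolderID INTEGER NOT NULL REFERENCES Folders (FolderID),"
    " FileName TEXT NOT NULL,"
    " FileSize INTEGER NOT NULL"
    ");"
)

# Pair each folder with itself and all of its descendants, then sum the
# sizes of the files found in those folders.
SUBTREE_TOTALS = (
    "WITH RECURSIVE Tree (Ancestor, FolderID) AS ("
    "  SELECT FolderID, FolderID FROM Folders"
    "  UNION ALL"
    "  SELECT Tree.Ancestor, Folders.FolderID"
    "  FROM Tree JOIN Folders ON Folders.ParentID = Tree.FolderID"
    ") "
    "SELECT Tree.Ancestor, COALESCE(SUM(Files.FileSize), 0) "
    "FROM Tree LEFT JOIN Files ON Files.FolderID = Tree.FolderID "
    "GROUP BY Tree.Ancestor ORDER BY Tree.Ancestor"
)


def subtree_totals(conn):
    """Return {FolderID: total size of all files at or below that folder}."""
    return dict(conn.execute(SUBTREE_TOTALS).fetchall())


def demo():
    """Build a tiny tree (root > a > b, root > c) and total each subtree."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO Folders VALUES (?, ?, ?)",
        [(1, None, "root"), (2, 1, "a"), (3, 2, "b"), (4, 1, "c")],
    )
    conn.executemany(
        "INSERT INTO Files (FolderID, FileName, FileSize) VALUES (?, ?, ?)",
        [(1, "r.txt", 10), (2, "a.txt", 20), (3, "b.txt", 30), (3, "b2.txt", 5)],
    )
    return subtree_totals(conn)
```

The intermediate Tree relation holds one row per ancestor/descendant
pair, so its size grows with the depth of the tree, not with the
product of all files and all folders.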