
Re: Is it more CPU-efficient to read/write config file or read/write sqlite database?

Started by	Chris Angelico <rosuav@gmail.com>
First post	2013-12-18 21:50 +1100
Last post	2013-12-18 20:06 -0600
Articles	7 — 4 participants


This thread begins before the indexed window, so earlier articles aren't shown. The article labeled "Started by" above is the oldest one visible, not the original post.

  Re: Is it more CPU-efficient to read/write config file or read/write sqlite database? Chris Angelico <rosuav@gmail.com> - 2013-12-18 21:50 +1100
    Re: Is it more CPU-efficient to read/write config file or read/write sqlite database? dick <encore1@cox.net> - 2013-12-18 09:49 -0800
      Re: Is it more CPU-efficient to read/write config file or read/write sqlite database? Tim Chase <python.list@tim.thechases.com> - 2013-12-18 12:00 -0600
        Re: Is it more CPU-efficient to read/write config file or read/write sqlite database? dick <encore1@cox.net> - 2013-12-18 15:14 -0800
          Re: Is it more CPU-efficient to read/write config file or read/write sqlite database? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-12-18 19:31 -0500
          Re: Is it more CPU-efficient to read/write config file or read/write sqlite database? Chris Angelico <rosuav@gmail.com> - 2013-12-19 11:56 +1100
          Re: Is it more CPU-efficient to read/write config file or read/write sqlite database? Tim Chase <python.list@tim.thechases.com> - 2013-12-18 20:06 -0600

#62292 — Re: Is it more CPU-efficient to read/write config file or read/write sqlite database?

From	Chris Angelico <rosuav@gmail.com>
Date	2013-12-18 21:50 +1100
Subject	Re: Is it more CPU-efficient to read/write config file or read/write sqlite database?
Message-ID	<mailman.4344.1387363803.18130.python-list@python.org>

On Wed, Dec 18, 2013 at 9:31 PM, Cameron Simpson <cs@zip.com.au> wrote:
> On 18Dec2013 14:35, Chris Angelico <rosuav@gmail.com> wrote:
>> An SQL database *is* a different form of storage. It's storing tabular
>> data, not a stream of bytes in a file. You're supposed to be able to
>> treat it as an efficient way to locate a particular tuple based on a
>> set of rules, not a different way to format a file on the disk.
>
> Shrug. It's all just data to me. I don't _care_ about the particular
> internal storage format.

Then use a file, because you want file semantics. That's why you have
both options available.

> Commit() is a logical operation saying this SQL changeset is now
> part of the global state.

The global state is defined by what's on the disk. Specifically, by
what would be read if the power failed right at that moment. In the
case of PostgreSQL, a commit doesn't actually write the table pages -
it just writes the WAL (Write-Ahead Log), which is used to recreate
the transaction. If something fails hard, the WAL replay will apply
the change perfectly. That's the global state. It's not there till the
WAL's been fsync'd.
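
To make the "commit writes the log, not the table pages" idea concrete, here is a toy sketch in Python. It is not how PostgreSQL implements its WAL; the class name TinyWAL, the JSON-lines log format and the commit() method are all assumptions made up for illustration. The only point is the ordering: a change counts as committed once its log record has been fsync'd, and recovery is a replay of the log.

```python
import json
import os


class TinyWAL:
    """Toy key/value store (hypothetical, for illustration only).

    A change is treated as committed once its log record has been fsync'd.
    """

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}
        self._replay()
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        # Crash recovery: rebuild the state from whatever reached the log.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                try:
                    record = json.loads(line)
                except ValueError:
                    # Torn final record from an interrupted write. A real
                    # WAL would also truncate this tail before appending.
                    break
                self.state[record["key"]] = record["value"]

    def commit(self, key, value):
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()             # Python's buffer -> OS cache
        os.fsync(self._log.fileno())  # OS cache -> storage device
        self.state[key] = value       # only now treat the change as committed

    def close(self):
        self._log.close()
```

Opening a second TinyWAL on the same log path after a simulated crash should reproduce every change whose commit() had returned, which is the property being argued about in this thread.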

>> Also: the filesystem layer doesn't guarantee integrity. If you don't
>> fsync() or fdatasync() or some other equivalent [1], it's not on the
>> disk yet, so you can't trust it.
>
> Course I can. There's plenty of scope within the disc physical layer
> (buffering, caching, RAID card buffering) for an fsync() to return
> _before_ the data are written to ferrous oxide (or whatever) because
> the OS DOES NOT KNOW.

The theory of fsync is that it's actually written. If it's been
written to a battery-backed cache that will be flushed to platters
successfully even if the power fails, then it's been fsync'd. That's
not a problem. It *is* a problem if it's been written to a volatile
cache on an SSD that holds more data than the drive can flush when the
power fails. That's why there are only two lines of SSD (Intel 320
and 710 series) that are recommended for use with PGSQL.

> All that has happened after an fsync() is that the OS has taken your
> SQL changeset that you committed to the OS data abstraction and
> pushed it one layer lower into the "disk" abstraction. There's more
> going on in there.

Not just pushed it one layer lower; the point of fsync is that it's
been pushed all the way down. See its man page [1]:

"""fsync() transfers ("flushes") all modified in-core data ... to the
disk ... so that all changed information can be retrieved even after
the system crashed or was rebooted."""

It's fundamentally about crash recovery, not about "passing it to a
lower abstraction". Of course, the OS isn't always *able* to guarantee
things (NFS shares are notoriously hard to pin down), but the
intention of fsync is that it won't return (and therefore the COMMIT
operation won't finish) until the data can be read back reliably even
in the event of a major failure.
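
For the plain-file side of the argument, a minimal sketch of what "not returning until it's fsync'd" might look like from Python. The function name durable_write is hypothetical, and the sketch assumes a POSIX-like system for the directory fsync. As discussed above, whether the bytes truly reach stable storage still depends on the drive and any caches below the OS.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data` (bytes); return only after fsync."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # Python's buffer -> OS cache
            os.fsync(tmp.fileno())  # OS cache -> storage device
        os.replace(tmp_path, path)  # atomic swap of old and new contents
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # On POSIX, fsync the directory too so the rename itself is durable.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

A reader of the file then sees either the old contents or the new ones, never a half-written mixture.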

Databases protect against that. If you want that protection, use a
database. If you don't, use a file. There's nothing wrong with either
option.
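
Since the thread subject is about SQLite, here is a rough sketch of the database option using Python's sqlite3 module. The table name settings and the helper names are assumptions for illustration. PRAGMA synchronous = FULL asks SQLite to sync to disk at the critical points of a commit.

```python
import sqlite3


def open_settings(path):
    """Open (or create) a small key/value settings database."""
    conn = sqlite3.connect(path)
    # Ask SQLite to sync to disk at critical moments of each commit.
    conn.execute("PRAGMA synchronous = FULL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings ("
        "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def set_setting(conn, key, value):
    # `with conn` commits on success and rolls back on an exception.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            (key, value),
        )


def get_setting(conn, key, default=None):
    row = conn.execute(
        "SELECT value FROM settings WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default
```

Each set_setting() call is its own transaction here, so every change pays for a commit; batching several changes inside one `with conn:` block would trade some of that cost away.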

ChrisA

[1] on the web here, for those who don't have them handy:
http://linux.die.net/man/2/fsync


#62319

From	dick <encore1@cox.net>
Date	2013-12-18 09:49 -0800
Message-ID	<iln3b99rqvioro188l8d7tivtluhse44mt@4ax.com>
In reply to	#62292

On Wed, 18 Dec 2013 21:50:00 +1100, Chris Angelico <rosuav@gmail.com>
wrote:

>On Wed, Dec 18, 2013 at 9:31 PM, Cameron Simpson <cs@zip.com.au> wrote:
>> On 18Dec2013 14:35, Chris Angelico <rosuav@gmail.com> wrote:
>>> An SQL database *is* a different form of storage. It's storing tabular
>>> data, not a stream of bytes in a file. You're supposed to be able to
>>> treat it as an efficient way to locate a particular tuple based on a
>>> set of rules, not a different way to format a file on the disk.
>>
>> Shrug. It's all just data to me. I don't _care_ about the particular
>> internal storage format.
>
>Then use a file, because you want file semantics. That's why you have
>both options available.
>
>> Commit() is a logical operation saying this SQL changeset is now
>> part of the global state.
>
>The global state is defined by what's on the disk. Specifically, by
>what would be read if the power failed right at that moment. In the
>case of PostgreSQL, a commit doesn't actually write the table pages -
>it just writes the WAL (Write-Ahead Log), which is used to recreate
>the transaction. If something fails hard, the WAL replay will apply
>the change perfectly. That's the global state. It's not there till the
>WAL's been fsync'd.
>
  <snip>
>
>> All that has happened after an fsync() is that the OS has taken your
>> SQL changeset that you committed to the OS data abstraction and
>> pushed it one layer lower into the "disk" abstraction. There's more
>> going on in there.
>
>Not just pushed it one layer lower; the point of fsync is that it's
>been pushed all the way down. See its man page [1]:
>
>"""fsync() transfers ("flushes") all modified in-core data ... to the
>disk ... so that all changed information can be retrieved even after
>the system crashed or was rebooted."""
>
>It's fundamentally about crash recovery, not about "passing it to a
>lower abstraction". Of course, the OS isn't always *able* to guarantee
>things (NFS shares are notoriously hard to pin down), but the
>intention of fsync is that it won't return (and therefore the COMMIT
>operation won't finish) until the data can be read back reliably even
>in the event of a major failure.
>
>Databases protect against that. If you want that protection, use a
>database. If you don't, use a file. There's nothing wrong with either
>option.
>
>ChrisA
>
>[1] on the web here, for those who don't have them handy:
>http://linux.die.net/man/2/fsync

Don't forget that most hard disks have an option to cache the write
data. This is a 'feature' that allows the manufacturers to claim
better write performance. You can't be sure when the data is written
to the disk if that option is in play.

Dick


#62322

From	Tim Chase <python.list@tim.thechases.com>
Date	2013-12-18 12:00 -0600
Message-ID	<mailman.4368.1387389583.18130.python-list@python.org>
In reply to	#62319

On 2013-12-18 09:49, dick wrote:
> Don't forget that most hard disks have an option to cache the write
> data. This is a 'feature' that allows the manufacturers to claim
> better write performance. You can't be sure when the data is written
> to the disk if that option is in play.

However, my understanding is that they have a small on-drive
battery/capacitor that stores sufficient energy for the cached
write(s) to complete in the event the system's power abruptly cuts
off.

Granted, this is purely hearsay, as it's been a long time since I
mucked around with hardware much.

-tkc


#62346

From	dick <encore1@cox.net>
Date	2013-12-18 15:14 -0800
Message-ID	<iva4b9d7ip3u3g75kamttpl81c39hvehu0@4ax.com>
In reply to	#62322

On Wed, 18 Dec 2013 12:00:50 -0600, Tim Chase
<python.list@tim.thechases.com> wrote:

>On 2013-12-18 09:49, dick wrote:
>> Don't forget that most hard disks have an option to cache the write
>> data. This is a 'feature' that allows the manufacturers to claim
>> better write performance. You can't be sure when the data is written
>> to the disk if that option is in play.
>
>However, my understanding is that they have a small on-drive
>battery/capacitor that stores sufficient energy for the cached
>write(s) to complete in the event the system's power abruptly cuts
>off.
>
>Granted, this is purely hearsay, as it's been a long time since I
>mucked around with hardware much.
>
>-tkc
>
>
The drives may have something like that now, but they didn't have any
power down flush capability when I was working for WD. Of course, that
was 15 years ago...

Dick


#62350

From	Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date	2013-12-18 19:31 -0500
Message-ID	<mailman.4392.1387413308.18130.python-list@python.org>
In reply to	#62346

On Wed, 18 Dec 2013 15:14:55 -0800, dick <encore1@cox.net> declaimed the
following:

>The drives may have something like that now, but they didn't have any
>power down flush capability when I was working for WD. Of course, that
>was 15 years ago...
>
	And I remember days when I used to run a bat file to park the drive
heads before power down.
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/


#62353

From	Chris Angelico <rosuav@gmail.com>
Date	2013-12-19 11:56 +1100
Message-ID	<mailman.4394.1387414611.18130.python-list@python.org>
In reply to	#62346

On Thu, Dec 19, 2013 at 11:31 AM, Dennis Lee Bieber
<wlfraed@ix.netcom.com> wrote:
> On Wed, 18 Dec 2013 15:14:55 -0800, dick <encore1@cox.net> declaimed the
> following:
>
>>The drives may have something like that now, but they didn't have any
>>power down flush capability when I was working for WD. Of course, that
>>was 15 years ago...
>>
>         And I remember days when I used to run a bat file to park the drive
> heads before power down.

I grew up with that!

Heads parked on one drive.

ChrisA


#62357

From	Tim Chase <python.list@tim.thechases.com>
Date	2013-12-18 20:06 -0600
Message-ID	<mailman.4395.1387418726.18130.python-list@python.org>
In reply to	#62346

On 2013-12-18 15:14, dick wrote:
>>However, my understanding is that they have a small on-drive
>>battery/capacitor that stores sufficient energy for the cached
>>write(s) to complete in the event the system's power abruptly cuts
>>off.
>>
>>Granted, this is purely hearsay, as it's been a long time since I
>>mucked around with hardware much.
>>  
> The drives may have something like that now, but they didn't have
> any power down flush capability when I was working for WD. Of
> course, that was 15 years ago...

<old_fart>
Indeed, I certainly remember launching park.exe on my DOS & Win95
machines to flush write-caches and park the heads in preparation for
power-down.
</old_fart>
I recall being told by multiple hardware professionals
since 2000 that such wasn't needed any more.  I don't have reason to
doubt them, especially as I no longer see references to parking heads
in any OS or add-on utility.

At least on linux (possibly the BSDs too), one can specify that
particular mounts are done with the "sync" option to force all writes
to make it to the metal/EEPROM before returning, though I don't think
this is the default.
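
The "sync" mount option applies to a whole filesystem. A per-file analogue, sketched here in Python, is to open the file with the O_SYNC flag so that each write() blocks until the OS reports the data as written. The function name write_synchronously is hypothetical, and os.O_SYNC is assumed to exist only on Unix-like systems (the sketch falls back to an ordinary open elsewhere). It gives no guarantee about what the drive's own cache does.

```python
import os


def write_synchronously(path, data):
    """Write `data` (bytes) with O_SYNC where the platform supports it."""
    # O_SYNC is missing on some platforms; fall back to 0 (no extra flag).
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)
```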

-tkc

