


APFS and software updates

Started by: JF Mezei <jfmezei.spamnot@vaxination.ca>
First post: 2017-05-20 20:07 -0400
Last post: 2017-05-23 21:18 +0000
Articles: 20 on this page of 21, 4 participants



Contents

  APFS and software updates JF Mezei <jfmezei.spamnot@vaxination.ca> - 2017-05-20 20:07 -0400
    Re: APFS and software updates dempson@actrix.gen.nz (David Empson) - 2017-05-21 17:59 +1200
      Re: APFS and software updates Jolly Roger <jollyroger@pobox.com> - 2017-05-21 16:26 +0000
      Re: APFS and software updates JF Mezei <jfmezei.spamnot@vaxination.ca> - 2017-05-21 13:00 -0400
        Re: APFS and software updates dempson@actrix.gen.nz (David Empson) - 2017-05-22 12:02 +1200
          Re: APFS and software updates JF Mezei <jfmezei.spamnot@vaxination.ca> - 2017-05-21 23:36 -0400
            Re: APFS and software updates Jolly Roger <jollyroger@pobox.com> - 2017-05-22 04:35 +0000
              Re: APFS and software updates Jolly Roger <jollyroger@pobox.com> - 2017-05-22 15:18 +0000
            Re: APFS and software updates dempson@actrix.gen.nz (David Empson) - 2017-05-23 14:11 +1200
              Re: APFS and software updates JF Mezei <jfmezei.spamnot@vaxination.ca> - 2017-05-22 23:15 -0400
                Re: APFS and software updates dempson@actrix.gen.nz (David Empson) - 2017-05-24 23:27 +1200
                  Re: APFS and software updates JF Mezei <jfmezei.spamnot@vaxination.ca> - 2017-05-24 15:57 -0400
                    Re: APFS and software updates dempson@actrix.gen.nz (David Empson) - 2017-05-25 12:37 +1200
                      Re: APFS and software updates JF Mezei <jfmezei.spamnot@vaxination.ca> - 2017-05-24 20:51 -0400
                        Re: APFS and software updates dempson@actrix.gen.nz (David Empson) - 2017-05-26 13:44 +1200
                          Re: APFS and software updates Jolly Roger <jollyroger@pobox.com> - 2017-05-26 02:11 +0000
                          Re: APFS and software updates JF Mezei <jfmezei.spamnot@vaxination.ca> - 2017-05-26 00:08 -0400
                    Re: APFS and software updates Lewis <g.kreme@gmail.com.dontsendmecopies> - 2017-05-26 01:27 +0000
              Re: APFS and software updates Lewis <g.kreme@gmail.com.dontsendmecopies> - 2017-05-23 09:36 +0000
                Re: APFS and software updates dempson@actrix.gen.nz (David Empson) - 2017-05-24 01:42 +1200
                  Re: APFS and software updates Lewis <g.kreme@gmail.com.dontsendmecopies> - 2017-05-23 21:18 +0000



#107091 — APFS and software updates

From: JF Mezei <jfmezei.spamnot@vaxination.ca>
Date: 2017-05-20 20:07 -0400
Subject: APFS and software updates
Message-ID: <5920da3b$0$17396$c3e8da3$dd9697d2@news.astraweb.com>

With a totally new way to update files being used for APFS with
snapshots etc, does this mean that a system update/upgrade could be done
on a running system with a new version of files being created while the
running system still has files opened pointing to the old version?

(with old versions then deleted at reboot since the system would then
boot with most recent version of files) ?


In a different vein, if file1 is some indexed database file, and
process1 updates records in it while process2 accesses records,  will
process2 continue to access its "snapshot" of the file at the time it
opened it, or would it get updated records (written in different blocks) ?

Or put it another way, would an application specify, when it opens a
file, whether it wants a static snapshot at time of opening versus
dynamic access to the file as it is being modified by others?



#107098

From: dempson@actrix.gen.nz (David Empson)
Date: 2017-05-21 17:59 +1200
Message-ID: <1n6dxd5.1eiz2hs1avbau9N%dempson@actrix.gen.nz>
In reply to: #107091

JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:

> With a totally new way to update files being used for APFS with
> snapshots etc, does this mean that a system update/upgrade could be done
> on a running system with a new version of files being created while the
> running system still has files opened pointing to the old version?

I think you need to go away and read/view the APFS presentation at last
year's WWDC again. It would also help if you stop obsessing about APFS
being some magic that will completely change the way things work.

New features of APFS have little to do with software updates.

The only way snapshots figure in software updates would be to save the
state of the file system so an update can be rolled back if something
goes wrong while it is being installed, or if the user decides they
don't like the result and want to revert to the prior state, or to allow
mounting the prior state (read only) for comparison.

Clones might help updates by reducing the amount of data needing to be
copied when a file is patched (assuming the patch is replacing a block
of data the same size, which is probably rare).

> (with old versions then deleted at reboot since the system would then
> boot with most recent version of files) ?

Unix file systems (including HFS+) can already do that.
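
The existing Unix behaviour referred to here can be sketched in a few lines. This is an illustration only, not anything from Apple's updater: the file name `libfoo.txt` and the function name are made up, and it assumes a POSIX system where a file can be renamed over while it is open.

```python
import os
import tempfile


def replace_while_open_demo():
    """Replace a file on disk while an existing handle still has it open."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "libfoo.txt")  # hypothetical file name
        with open(path, "w") as f:
            f.write("version 1")

        # Stands in for a running process that opened the old version.
        running = open(path)
        try:
            # Write the new version beside the old one, then rename it
            # over the old name in a single step.
            new_path = path + ".new"
            with open(new_path, "w") as f:
                f.write("version 2")
            os.replace(new_path, path)

            old_view = running.read()      # still the old, now unnamed, file
            with open(path) as f:
                new_view = f.read()        # a fresh open gets the new file
        finally:
            running.close()
        return old_view, new_view
```

The old data is released once the last open handle is closed, which is why no file system feature beyond ordinary rename semantics is needed for this part.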

The problem with OS updates is nothing to do with how files are
replaced, but with coordinating all parts of the system to use the same
versions of a related set of files. The easiest way to ensure that is to
restart.

> In a different vein, if file1 is some indexed database file, and
> process1 updates records in it while process2 accesses records,  will
> process2 continue to access its "snapshot" of the file at the time it
> opened it, or would it get updated records (written in different blocks) ?

Existing databases can already do that without file system help, e.g.
SQLite in WAL mode: if reader A starts a transaction, then writer B
starts a transaction, makes changes and commits, reader A continues to
see the old state of the database until its transaction ends. The
changes are visible to reader A when it starts its next transaction.
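
A minimal sketch of the WAL behaviour described above, using Python's standard `sqlite3` module. The table name `t`, the file name `demo.db` and the function name are illustrative assumptions; both connections live in one process here purely to keep the example short.

```python
import os
import sqlite3
import tempfile


def wal_snapshot_demo():
    """Show what a reader sees before, during and after a concurrent commit."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        # isolation_level=None: BEGIN/COMMIT are issued explicitly below.
        writer = sqlite3.connect(path, isolation_level=None)
        writer.execute("PRAGMA journal_mode=WAL")
        writer.execute("CREATE TABLE t (v INTEGER)")
        writer.execute("INSERT INTO t VALUES (1)")

        reader = sqlite3.connect(path, isolation_level=None)
        reader.execute("BEGIN")
        # The first read inside the transaction fixes the reader's snapshot.
        before = reader.execute("SELECT v FROM t").fetchall()[0][0]

        # Writer B changes the row and commits while reader A is still open.
        writer.execute("BEGIN")
        writer.execute("UPDATE t SET v = 2")
        writer.execute("COMMIT")

        during = reader.execute("SELECT v FROM t").fetchall()[0][0]
        reader.execute("COMMIT")
        # A new (implicit) read transaction sees the committed change.
        after = reader.execute("SELECT v FROM t").fetchall()[0][0]

        reader.close()
        writer.close()
        return before, during, after
```

The isolation comes entirely from the database engine's journal, which is the point being made: no file system snapshot or clone is involved.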

The clone feature in APFS is somewhat similar but would require the
database software to close and reopen the database to see changes done
in a clone which then replaced the original file.

This would certainly be faster, as it avoids double writing new/updated
database pages (initially to the WAL file, then later to the main
database at a checkpoint).

Supporting this would require a new journalling method and updated
database software, but old journalling methods would still need to be
supported for other file systems, therefore it increases the complexity
of cross platform database engines for a feature only available on a
limited number of systems. That makes it less likely to be supported by
cross-platform database engines like SQLite, unless Apple did a
platform-specific branch.

> Or put it another way, would an application specify, when it opens a
> file, whether it wants a static snapshot at time of opening versus
> dynamic access to the file as it is being modified by others?

Snapshots have nothing to do with opening files. They are for saving the
state of entire volumes.

-- 
David Empson
dempson@actrix.gen.nz



#107099

From: Jolly Roger <jollyroger@pobox.com>
Date: 2017-05-21 16:26 +0000
Message-ID: <eodtdjFreetU1@mid.individual.net>
In reply to: #107098

On 2017-05-21, David Empson <dempson@actrix.gen.nz> wrote:
> JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:
>
>> With a totally new way to update files being used for APFS with
>> snapshots etc, does this mean that a system update/upgrade could be done
>> on a running system with a new version of files being created while the
>> running system still has files opened pointing to the old version?
>
> I think you need to go away and read/view the APFS presentation at last
> year's WWDC again. It would also help if you stop obsessing about APFS
> being some magic that will completely change the way things work.

You have to wonder how many times JF has to be asked nicely to RTFM
before he'll actually do it. Maybe the answer is: never. 

-- 
E-mail sent to this address may be devoured by my ravenous SPAM filter.
I often ignore posts from Google. Use a real news client instead.

JR



#107100

From: JF Mezei <jfmezei.spamnot@vaxination.ca>
Date: 2017-05-21 13:00 -0400
Message-ID: <5921c795$0$12341$b1db1813$65575428@news.astraweb.com>
In reply to: #107098

On 2017-05-21 01:59, David Empson wrote:



> Snapshots have nothing to do with opening files. They are for saving the
> state of entire volumes.

certain types of links to files act as snapshots. If file1 is made to
point to file2, it initially acts as a hard link. (same blocks).

But as file2 is modified, it contains new blocks, and file1 continues to
point to the old blocks, so file1 is now different content from file2.

I was wondering if a process having a file open would have similar
"snapshot" behaviour or if by default, they always point to the current
"live" file.


Consider an 86 block file, with a continuous extent from blocks 700 to 786.

Process 1 rewrites bytes 1024 to 2047 and those 2 blocks get written to
blocks 3456-3457 (APFS never overwrites blocks).

The file now has 3 extents:
700-701, 3456-3457, 704-786.

If process 2 has the same file open and wants to read 1024 to 2047 at the same
time, depending on how the file pointers/caches etc are handled, it
could still see the file as having one extent from 700-786 or the new
fragmented one.

(Whether they are "extents" or just a glorified linked list, it is the
same question: whether different processes retain a coherent view of the
file as byte ranges inside the file get moved to new blocks.)



#107105

From: dempson@actrix.gen.nz (David Empson)
Date: 2017-05-22 12:02 +1200
Message-ID: <1n6f8l5.11tc1uh1qikfpyN%dempson@actrix.gen.nz>
In reply to: #107100

JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:

> On 2017-05-21 01:59, David Empson wrote:
> 
> 
> 
> > Snapshots have nothing to do with opening files. They are for saving the
> > state of entire volumes.
> 
> certain types of links to files act as snapshots.

No they don't. In APFS, a snapshot is the state of the file system on
the entire volume at a point in time. It is mountable as a read only
copy of the volume as at that point, and can be used to roll back the
entire volume to that point.

As I said but you snipped out:

I think you need to go away and read/view the APFS presentation at last
year's WWDC again.

> If file1 is made to point to file2, it initially acts as a hard link.
> (same blocks).
> 
> But as file2 is modified, it contains new blocks, and file1 continues to
> point to the old blocks, so file1 is now different content from file2.

That is a clone. It is not a hard link, nor is it a snapshot.

> I was wondering if a process having a file open would have similar
> "snapshot" behaviour or if by default, they always point to the current
> "live" file.

Clones are two completely different files as far as applications are
concerned.

The purpose of clones is to allow file duplicate operations to take
almost no time or disk space, because they don't need to copy any data,
just create new entries in the directory structures. A clone behaves
like a copy of the original file.

Subsequent modifications to either the original file or the clone are
not visible in the other file. Under the hood, they share disk storage
for the unmodified portions, but applications have no way to access that
level of detail.

This is not the same as a hard link, because a hard link does see
changes made via other hard links to the same file.

In traditional Unix file system terms:

- Hard links are directory entries (pathnames) pointing to the same
inode, which points to the data blocks. Any change to the inode is
visible to all the hard links.

- Clones are directory entries pointing to separate inodes which
initially point to the same data blocks, but changes made via one inode
result in that inode pointing to different data blocks for modified
parts of the file; other inodes in the set of clones still point to the
original data blocks so see no changes.
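
The difference as seen by an application can be sketched as follows. This is a rough illustration under stated assumptions: `shutil.copyfile` makes an ordinary full copy and is used only as a stand-in for a clone, since a clone behaves like an independent copy from the application's point of view. The file names and the function name are made up.

```python
import os
import shutil
import tempfile


def link_vs_copy_demo():
    """Modify a file, then look at it through a hard link and through a copy."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")
        hard = os.path.join(tmp, "hardlink.txt")
        copy = os.path.join(tmp, "copy.txt")

        with open(original, "w") as f:
            f.write("old")

        os.link(original, hard)          # second directory entry, same inode
        shutil.copyfile(original, copy)  # separate inode (stand-in for a clone)

        # Rewrite the original in place (same inode, new contents).
        with open(original, "w") as f:
            f.write("new")

        same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
        with open(hard) as f:
            via_hard_link = f.read()
        with open(copy) as f:
            via_copy = f.read()
        return same_inode, via_hard_link, via_copy
```

The hard link reports the new contents because it is the same file; the copy keeps the old contents. Whether the copy shares storage under the hood, as an APFS clone would, is not visible at this level.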

(Clones can also be used to quickly duplicate an entire folder, not just
a single file, with the same properties that the original and clone
folders subsequently behave like independent copies, not showing any
changes made to the other.)

> Consider an 86 block file, with a continuous extent from blocks 700 to 786.

[snip]

No. You have a completely wrong concept of the purpose of clones.

-- 
David Empson
dempson@actrix.gen.nz



#107106

From: JF Mezei <jfmezei.spamnot@vaxination.ca>
Date: 2017-05-21 23:36 -0400
Message-ID: <59225cc4$0$41892$b1db1813$2411a48f@news.astraweb.com>
In reply to: #107105

On 2017-05-21 20:02, David Empson wrote:

> That is a clone. It is not a hard link, nor is it a snapshot.

But snapshots use the same underlying technology as a clone. And it is
the term I was looking for.

> Clones are two completely different files as far as applications are
> concerned.

Yes. But what I am asking here: if process1 has file1 opened for read,
and process2 opens file1 for read-write, do both processes access the
same file, or does the system generate a temporary clone so that process1
gets a static snapshot (for lack of a better word) of the file structure
at the time it opened it?

During the presentation, Apple mentioned that APFS was designed for
desktops, not large servers, so I wonder about concurrency issues when
an update to a file causes the file structure to change.


There is also a security issue.

Process1 is reading the file sequentially and reads a block just as process2
rewrites that block, causing the original one to be deallocated from the
file and moved to the free block list, and a new block to be allocated to
contain the updated data.

Process1 could end up reading a block which is no longer part of that
file and has already been allocated to another file that this user does
not have access to.

If a "clone" is made, then the system knows that the blocks deallocated
by process2 are still in use by process1 and would keep them intact
as part of the version of the file seen by process1 at the time it opened it.

If a clone is not made, then the OS and file system will have work to do
for cache/file system coherence to ensure that 2 processes accessing the
same file never get blocks that no longer belong to the file.

I am sure Apple has thought of this, but I am curious on how they solved
it. That was not mentioned in the presentation.



#107107

From: Jolly Roger <jollyroger@pobox.com>
Date: 2017-05-22 04:35 +0000
Message-ID: <eof83oF6qbuU1@mid.individual.net>
In reply to: #107106

JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:
> On 2017-05-21 20:02, David Empson wrote:
> 
>> That is a clone. It is not a hard link, nor is it a snapshot.
> 
> But snapshots use the same underlying technology as a clone. 

Show the group where Apple states that APFS snapshots use the same
underlying technology as a clone.

[foolish ramblings rightfully ignored]

-- 
E-mail sent to this address may be devoured by my ravenous SPAM filter.
I often ignore posts from Google. Use a real news client instead.

JR



#107112

From: Jolly Roger <jollyroger@pobox.com>
Date: 2017-05-22 15:18 +0000
Message-ID: <eogdqiFf9fnU1@mid.individual.net>
In reply to: #107107

On 2017-05-22, Jolly Roger <jollyroger@pobox.com> wrote:
> JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:
>> On 2017-05-21 20:02, David Empson wrote:
>> 
>>> That is a clone. It is not a hard link, nor is it a snapshot.
>> 
>> But snapshots use the same underlying technology as a clone. 
>
> Show the group where Apple states that APFS snapshots use the same
> underlying technology as a clone.

*crickets chirping*...

-- 
E-mail sent to this address may be devoured by my ravenous SPAM filter.
I often ignore posts from Google. Use a real news client instead.

JR



#107138

From: dempson@actrix.gen.nz (David Empson)
Date: 2017-05-23 14:11 +1200
Message-ID: <1n6hbwa.1dwmt491syhev2N%dempson@actrix.gen.nz>
In reply to: #107106

JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:

> On 2017-05-21 20:02, David Empson wrote:
> 
> > That is a clone. It is not a hard link, nor is it a snapshot.
> 
> But snapshots use the same underlying technology as a clone. And it is
> the term I was looking for.

I'm not bothering to answer your text in detail because you have
everything completely wrong.

Again, an APFS "snapshot" has nothing to do with individual files. It is
a read-only reference to the state of an entire volume at a point in
time.

An APFS "clone" is a method of copying a file without using disk space,
resulting in two separate files which happen to share storage until
either one is modified, and that modification is invisible to the other
file.

A hard link is something that looks like a separate file but actually
points to the same file as the original (also a hard link), and changes
in either hard link are seen via the other one. This is the same
behaviour as existing file systems.

If none of these methods are used, and two processes just happen to have
the same file open, then writes by one process will be seen immediately
by the other process when it reads the modified portion of the file.
This is the same behaviour as existing file systems.

You seem to be inventing a "version" mechanism where two open instances
of the same file start to have diverging content just because one
process happens to have the file open while another process is writing
to that file. No such mechanism exists in APFS as described at WWDC
2016.

A "version" mechanism would have to be implemented at a higher level,
e.g. by an application or the OS explicitly cloning the file to preserve
a reference to its current state. That results in another file with a
different pathname which can be opened separately. The clone happens to
initially share storage with the original file, but it won't see any
subsequent changes to the original file.

For example, this could be used by the Autosave and Versions mechanism,
or by Time Machine Local Storage backups.

-- 
David Empson
dempson@actrix.gen.nz



#107147

From: JF Mezei <jfmezei.spamnot@vaxination.ca>
Date: 2017-05-22 23:15 -0400
Message-ID: <5923a964$0$61860$c3e8da3$e074e489@news.astraweb.com>
In reply to: #107138

On 2017-05-22 22:11, David Empson wrote:

> Again, an APFS "snapshot" has nothing to do with individual files. It is
> a read-only reference to the state of an entire volume at a point in
> time.

The way I saw it, cloning used the same underlying mechanism as snapshot
to share unchanged blocks and when one is changed, the changed file
links to the changed blocks, and the unchanged file links to the
unchanged blocks.



> If none of these methods are used, and two processes just happen to have
> the same file open, then writes by one process will be seen immediately
> by the other process when it reads the modified portion of the file.

So you're saying Apple has solved the issue of processes getting file
structure when they open the file, and those on process-memory
structures get dynamically changed whenever another process updates the
file causing the list of blocks containing current data to change.

Note: in many systems, a process opens a file and gets the end of file
marker and file size, and those remain static even if another process
appends to the file. The "synch()" routine in C was developed to force
those structures to be re-read so the process has an updated view of the
file's structure.

In the case of APFS, it isn't just end of file that changes, but also
the list of already allocated blocks because when you rewrite a block,
it gets written elsewhere. If process1 doesn't get updated information,
it will read the block at the old location and get the old data.


> You seem to be inventing a "version" mechanism where two open instances

The underlying structure of APFS is "version" based because an update of
a record never overwrites, it is always written elsewhere and the file
allocation list changes to include that new block instead of the old block.

A process whose in-memory structures don't get dynamically changed will
still have the old "view" of which blocks contain current data and will
end up reading unupdated data.



#107187

From: dempson@actrix.gen.nz (David Empson)
Date: 2017-05-24 23:27 +1200
Message-ID: <1n6hi75.15jb9r61ue6veeN%dempson@actrix.gen.nz>
In reply to: #107147

JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:

> On 2017-05-22 22:11, David Empson wrote:
> 
> > If none of these methods are used, and two processes just happen to have
> > the same file open, then writes by one process will be seen immediately
> > by the other process when it reads the modified portion of the file.
> 
> So you're saying Apple has solved the issue of processes getting file
> structure when they open the file,

Processes do not get file structures when they open files. They get a
descriptor to the file, and the internal details of the on-disk
structures are managed by the kernel and file system driver.

> and those on process-memory structures get dynamically changed whenever
> another process updates the file causing the list of blocks containing
> current data to change.

The kernel data structures get updated, which is not changing - it was
already done that way with HFS+.

You seem to be imagining a nonexistent problem.

> Note: in many systems, a process opens a file and gets the end of file
> marker and file size, and those remain static even if another process
> appends to the file. The "synch()" routine in C was developed to force
> those structures to be re-read so the process has an updated view of the
> file's structure.

I don't care what some obscure or obsolete system you've dealt with
does. There is no such "synch()" function or anything resembling it in
standard C, and that isn't how the BSD file API behaves on any operating
system I've worked on. It doesn't matter which file system you are
using.

If process 1 and process 2 both have the same file open, and process 2
writes to the file changing its length, the next call by process 1 to
the BSD API fstat() or similar will see the updated file length.

If process 1 reads the file length into a variable and then keeps
referring to its own variable, then it won't know about the changed
length, but that isn't an OS or file system issue - it is an application
design issue.

The same principle applies to the content of files, not just the length.
If process 1 is repeatedly reading a particular area in a file, and
process 2 has the same file open and writes to that area of the file,
the next read by process 1 will return the data that was written by
process 2.
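
A minimal sketch of both points (length and content). Two separate `open()` calls in one process stand in for the two processes, which is an assumption made to keep the example self-contained; the file name and function name are made up, and `os.pread`/`os.pwrite` require a Unix-like system.

```python
import os
import tempfile


def shared_file_demo():
    """Write through one descriptor; observe size and data through another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.bin")
        with open(path, "wb") as f:
            f.write(b"AAAAAAAA")

        reader = os.open(path, os.O_RDONLY)  # plays the role of process 1
        writer = os.open(path, os.O_WRONLY)  # plays the role of process 2
        try:
            size_before = os.fstat(reader).st_size

            os.pwrite(writer, b"BBBB", 2)     # overwrite bytes 2..5 in place
            os.lseek(writer, 0, os.SEEK_END)
            os.write(writer, b"CCCC")         # append, changing the length

            # No fsync() or reopen: the reader just asks the kernel again.
            size_after = os.fstat(reader).st_size
            content = os.pread(reader, size_after, 0)
        finally:
            os.close(reader)
            os.close(writer)
        return size_before, size_after, content
```

If the reader had stored `size_before` in a variable and kept using it, it would be stale, which is the application design issue mentioned above rather than anything the file system does.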

> In the case of APFS, it isn't just end of file that changes, but also
> the list of already allocated blocks because when you rewrite a block,
> it gets written elsewhere. If process1 doesn't get updated information,
> it will read the block at the old location and get the old data.

Process 1 has absolutely no clue this has happened. It just sees changed
data in the file the next time it reads the area that has been modified.

It is all handled inside the kernel, which knows there are two open
references to the same file, and updates caches and other in-memory data
structures accordingly.

> > You seem to be inventing a "version" mechanism where two open instances
> 
> The underlying structure of APFS is "version" based because an update of
> a record never overwrites, it is always written elsewhere and the file
> allocation list changes to include that new block instead of the old block.
> 
> A process whose in-memory structures don't get dynamically changed will
> still have the old "view" of which blocks contain current data and will
> end up reading unupdated data.

No it won't, because the "process" doesn't have access to that level of
detail.

-- 
David Empson
dempson@actrix.gen.nz



#107194

From: JF Mezei <jfmezei.spamnot@vaxination.ca>
Date: 2017-05-24 15:57 -0400
Message-ID: <5925e5ae$0$38718$b1db1813$19ace300@news.astraweb.com>
In reply to: #107187

On 2017-05-24 07:27, David Empson wrote:

> Processes do not get file structures when they open files. They get a
> descriptor to the file, and the internal details of the on-disk
> structures are managed by the kernel and file system driver.

There is a lot of file context kept in process memory (for instance,
where in the file the current "pointer" is for the next read, and many
file attributes). Not sure whether read-ahead buffers are in process or
kernel memory. (If process 1 reads the first 12 bytes of a file, it is
likely that the first read pulled 2048 or more bytes into a read-ahead
buffer.) And that matters because if process 2 modifies some of those
bytes already in a read-ahead buffer, a buffer in process memory won't
get updated, but one in kernel memory might.

> You seem to be imagining a nonexistent problem.

Since it is a new file system with a new way to store data (moving it
around to different blocks with every rewrite operation), asking
questions on how it works is normal.

As I said before, the WWDC presentation explicitly skipped over
concurrency issues (hint of unresolved issues).


> I don't care what some obscure or obsolete system you've dealt with
> does. There is no such "synch()" function or anything resembling it in
> standard C,

man fsync

FSYNC(2)                  BSD System Calls Manual                 FSYNC(2)

NAME
     fsync -- synchronize a file's in-core state with that on disk


And there is a good reason for that call: if you do mass writes to
append to the file, an implicit fsync after every write would kill
performance; you are much better off stacking many writes for efficiency
and having the implicit (or explicit) fsync happen at regular
intervals or when you're done.

Even in Finder, you will find cases where the number of bytes used
remains at 0 while another process writes to it, but bytes allocated
increases and only once file closed is the bytes used updated.

Applications that write to log files want an explicit or implicit fsync
to happen after every write since they don't know when the next write
will happen and it isn't a performance hit if the next write won't
happen for another few seconds.


Apple may or may not have implemented live updates to such structures
and this wasn't mentioned in the WWDC presentation which didn't deal
with concurrency. So I don't think my question is unwarranted.


> If process 1 and process 2 both have the same file open, and process 2
> writes to the file changing its length, the next call by process 1 to
> the BSD API fstat() or similar will see the updated file length.

does fstat cause an implicit fsync?

But if you read sequentially, while another process rewrites random areas
of the file, concurrency starts to matter.


> It is all handled inside the kernel, which knows there are two open
> references to the same file, and updates caches and other in-memory data
> structures accordingly.

Well, the "all handled inside the kernel" is the big question.  If you
make low level calls, sure, but even "low level" calls in C for instance
still have the C run-time in between the programmer and the kernel.

It is quite likely that Apple has solved the problem. And you may very
well be right, but just stating that it is solved doesn't give me a
pointer to where Apple explains HOW it has handled concurrency for APFS.



#107195

From: dempson@actrix.gen.nz (David Empson)
Date: 2017-05-25 12:37 +1200
Message-ID: <1n6ksac.1u9iucwum32c2N%dempson@actrix.gen.nz>
In reply to: #107194

JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:

> On 2017-05-24 07:27, David Empson wrote:
> 
> > Processes do not get file structures when they open files. They get a
> > descriptor to the file, and the internal details of the on-disk
> > structures are managed by the kernel and file system driver.
> 
> There is a lot of file context kept in process memory (for instance,
> where in the file the current "pointer" is for the next read,

You have this whole concept completely wrong.

A process uses system calls like read() and write() which only specify a
file descriptor previously returned by the kernel via the open() system
call. The file position is managed by the kernel on the other side of
the system call. (Whether the kernel stores the file state in
process-specific or kernel-specific memory is not relevant, because that
data is managed by the kernel and protected so the process has no access
to it.)

The library code running in the process does not keep track of the
current file position, nor does it need to specify the position when
doing read/write calls.
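
One way to see that the position lives on the kernel side is to duplicate a descriptor. This is a small sketch, assuming a Unix-like system; the file name and function name are made up. A descriptor made with `os.dup()` refers to the same open file and so shares its position, while a second `os.open()` gets its own.

```python
import os
import tempfile


def shared_offset_demo():
    """Read via one descriptor and check the position seen by two others."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(b"0123456789")

        fd = os.open(path, os.O_RDONLY)
        dup_fd = os.dup(fd)                    # same open file, same position
        other_fd = os.open(path, os.O_RDONLY)  # separate open, own position
        try:
            first = os.read(fd, 4)             # no position argument passed
            dup_offset = os.lseek(dup_fd, 0, os.SEEK_CUR)
            other_offset = os.lseek(other_fd, 0, os.SEEK_CUR)
            second = os.read(dup_fd, 3)        # continues where fd stopped
        finally:
            for d in (fd, dup_fd, other_fd):
                os.close(d)
        return first, dup_offset, other_offset, second
```

Nothing in the calling code stores or passes the position; `read()` takes only the descriptor, a buffer size, and returns data from wherever the kernel says the file position currently is.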

> and many file attributes). Not sure whether read-ahead buffers are in
> process or kernel memory.

Higher level APIs like the standard C library fopen(), fread() etc. can
do an extra level of buffering, but any buffering done at the block
level is managed by the kernel, which is able to correctly update those
buffers if two processes have the same file open and one of them
modifies it.
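
The extra, user-space level of buffering can be demonstrated with Python's buffered file objects, which play a role similar to C's `fopen()`/`fread()`. This is a sketch under assumptions: it relies on the default buffer being larger than the 8-byte test file (true for CPython's defaults), and the file name and function name are made up.

```python
import os
import tempfile


def stale_buffer_demo():
    """Compare a library-buffered read with a direct read after a rewrite."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "buffered.bin")
        with open(path, "wb") as f:
            f.write(b"AAAABBBB")

        with open(path, "rb") as buffered:   # library-level buffering
            head = buffered.read(4)          # pulls the whole file into the buffer

            fd = os.open(path, os.O_RDWR)
            try:
                os.pwrite(fd, b"XXXX", 4)          # another writer changes bytes 4..7
                from_kernel = os.pread(fd, 4, 4)   # direct system call: new data
            finally:
                os.close(fd)

            from_buffer = buffered.read(4)   # served from the process's own buffer
        return head, from_buffer, from_kernel
```

The stale bytes come from the library's buffer inside the process, not from the kernel or the file system, so this behaves the same way regardless of which file system holds the file.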

> > You seem to be imagining a nonexistent problem.
> 
> Since it is a new file system with a new way to store data (moving it
> around to different blocks with every rewrite operation), asking
> questions on how it works is normal.

It is not relevant to how processes see files. It is a kernel
implementation issue.

> As I said before, the WWDC presentation explicitly skipped over
> concurrency issues (hint of unresolved issues).
> 
> > I don't care what some obscure or obsolete system you've dealt with
> > does. There is no such "synch()" function or anything resembling it in
> > standard C,
> 
> man fsync
> 
> FSYNC(2)                  BSD System Calls Manual                 FSYNC(2)
> 
> NAME
>      fsync -- synchronize a file's in-core state with that on disk

If you had bothered to read the first paragraph of DESCRIPTION section
you would see that fsync() is for the purpose of making sure data has
been written to disk (at least as far as the drive's internal cache)
rather than just to in-memory cache managed by the kernel.

fsync() has nothing to do with reading files, and neither process with
the same open file needs to do an fsync() for a write by process 2 to be
readable by process 1.

This is not your imagined synch() call. No such call exists in the BSD
API.

> And there is a good reason for that call: if you do mass writes to
> append to the file, an implicit fsync after every write would kill
> performance; you are much better off stacking many writes for efficiency
> and having the implicit (or explicit) fsync happen at regular
> intervals or when you're done.
> 
> Even in Finder, you will find cases where the number of bytes used
> remains at 0 while another process writes to it, but bytes allocated
> increases and only once file closed is the bytes used updated.

That has nothing to do with multiple processes having the same file
open, nor does it have anything to do with fsync().

Finder doesn't have the file open.

Finder reads the directory at the point you open the window, reporting
the current size and other metadata for each file in the directory. It
doesn't update its view of the directory until it is told by the system
that the directory has changed. If a file in the directory is being
modified, that does not signal a change to the directory until the
modified file is closed.

This is easily tested by having something periodically write to a file
without closing it, and comparing what you see in Finder vs ls -l in
Terminal. ls looks at the current state. Finder only updates its
reported file size when the file is closed (or if Finder refreshes its
view of the directory for some other reason such as quit and relaunch of
Finder).

I've written some test programs to prove this. ls shows the current
state as an open file is periodically written via write() with no other
system calls, but Finder shows the initial file size unchanged until the
file is closed. Adding an fsync() after the write makes no difference.
To get Finder to update dynamically it is necessary to close() the file
after the write() (then open it again prior to the next write).

A companion test program which opens the file and monitors its current
length via fstat(), printing changes, correctly shows the file size at
the moment it is written by the write() call, indicating that the
in-memory state of the file is being updated without needing to flush
anything to disk, with nothing resembling a sync call in either the
reader or writer.
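
A minimal sketch of such a test, assuming a POSIX system. This is not one
of the test programs mentioned above: the helper name
size_seen_after_write() is hypothetical, and it uses two descriptors in
one process as a stand-in for two separate processes. The writer side
calls only write(), and the reader side asks for the length via fstat()
on its own descriptor.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open the same file twice, write nbytes of zeros
 * through one descriptor, and return the size that fstat() reports on
 * the other descriptor. No fsync() is called anywhere.
 * Returns -1 on any error. */
off_t size_seen_after_write(const char *path, size_t nbytes)
{
    int writer = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (writer < 0)
        return -1;

    int reader = open(path, O_RDONLY);
    if (reader < 0) {
        close(writer);
        return -1;
    }

    off_t result = -1;
    char *buf = calloc(nbytes ? nbytes : 1, 1);
    if (buf != NULL && write(writer, buf, nbytes) == (ssize_t)nbytes) {
        struct stat st;
        /* Ask the kernel for the current length via the second descriptor. */
        if (fstat(reader, &st) == 0)
            result = st.st_size;
    }

    free(buf);
    close(reader);
    close(writer);
    return result;
}
```

Called on a scratch file with nbytes set to 100, this should return 100,
since the kernel's in-memory view of the file is shared by both
descriptors.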

> Applications that write to log files want an explicit or implicit fsync
> to happen after every write since they don't know when the next write
> will happen and it isn't a performance hit if the next write won't
> happen for another few seconds.

That's because they don't want to lose data that didn't get written to
disk if the system happened to crash or there is a power cut shortly
after the write.
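
As a rough illustration of that use of fsync(), here is a sketch assuming
a POSIX system. The helper name log_line_durably() is hypothetical, and a
real logger would normally keep the descriptor open rather than reopening
the file for every line. The fsync() here is only about surviving a crash
or power cut (as far as the drive's own cache); other processes can read
the new line as soon as write() returns.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append one line to a log file and ask the kernel
 * to push it out to the drive before returning.
 * Returns 0 on success, -1 on error. */
int log_line_durably(const char *path, const char *line)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(line);
    int ok = write(fd, line, len) == (ssize_t)len
          && write(fd, "\n", 1) == 1
          && fsync(fd) == 0;  /* durability only, not visibility */

    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```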

> Apple may or may not have implemented live updates to such structures
> and this wasn't mentioned in the WWDC presentation which didn't deal
> with concurrency. So I don't think my question is unwarranted.

It is unwarranted, because current OS versions and file systems do not
behave the way you describe.

> > If process 1 and process 2 both have the same file open, and process 2
> > writes to the file changing its length, the next call by process 1 to
> > the BSD API fstat() or similar will see the updated file length.
> 
> does fstat cause an implicit fsync?

No. Why would it? fstat() reads information about a file, much of which
will be cached in memory, whereas fsync() flushes data from the kernel
caches to disk.

If fstat() forced a sync operation it would kill performance, because
fstat() is heavily used to poll for changes to an open file.

> But if you read sequentially, while another process rewrites random areas
> of file, concurrency starts to matter.

Yes, and the kernel takes care of that.

> > It is all handled inside the kernel, which knows there are two open
> > references to the same file, and updates caches and other in-memory data
> > structures accordingly.
> 
> Well, the "all handled inside the kernel" is the big question.  If you
> make low level calls, sure, but even "low level" calls in C for instance
> still have the C-run time in between the programmer and the kernel.

Calls like open(), read(), write() etc. are calls to the kernel on UNIX
systems. The only support code in the C run-time library is to deal with
things like dynamic loading, parameter passing, etc., and nothing to do
with the state of the file which is being accessed.

I can confirm this for macOS because I did an assembly level single-step
through my test program calling write(). It did a lot of mucking around
with dynamic loading and symbols, then ultimately executed a syscall
instruction.

Other platforms such as Windows need a translation layer in the C
run-time library to convert the BSD file API calls into the native API,
including extra processing for things like files open in text mode with
end of line character translation (which involves extra buffering), and
converting error results to standardised values. Even then, the C
run-time code for these calls doesn't do any file state management - it
is handled in the system call.

For example, on Windows, the read() function in the C library ends up
calling the Windows API ReadFile(). The C library code doesn't remember
the file position. The kernel deals with that.

> It is quite likely that Apple has solved the problem. And you may very
> well be right. But just stating that it is solved doesn't give me a
> pointer where Apple explains HOW it has handled concurrency for APFS.

macOS already handles concurrency for "two open instances of the same
file" in existing OS versions on HFS+ and other file systems, so there
will not be any change in this area for APFS.

-- 
David Empson
dempson@actrix.gen.nz



#107196

From: JF Mezei <jfmezei.spamnot@vaxination.ca>
Date: 2017-05-24 20:51 -0400
Message-ID: <59262a85$0$22714$c3e8da3$33881b6a@news.astraweb.com>
In reply to: #107195
On 2017-05-24 20:37, David Empson wrote:

> Other platforms such as Windows need a translation layer in the C
> run-time library to convert the BSD file API calls into the native API,

In terms of C on OS-X, does the fact that C calls are translated to HFS
calls not mean there is a translation layer ?

> For example, on Windows, the read() function in the C library ends up
> calling the Windows API ReadFile(). The C library code doesn't remember
> the file position. The kernel deals with that.

when you do random read/write, the higher level language deals with file
position. And in VMS for instance, you can specify your own RAB/FAB
*process storage* to hold the open file context.  Since not all OS are
alike it is a fair question to ask.

You seem to have answered it with the info that for OS-X, it is all in
kernel and it manages multiple "opens" to the same file and keeps them
coherent with each other.

Thanks. Sorry if it took a while.



#107202

From: dempson@actrix.gen.nz (David Empson)
Date: 2017-05-26 13:44 +1200
Message-ID: <1n6l1cr.tlzxi31j4665pN%dempson@actrix.gen.nz>
In reply to: #107196
JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:

> On 2017-05-24 20:37, David Empson wrote:
> 
> > Other platforms such as Windows need a translation layer in the C
> > run-time library to convert the BSD file API calls into the native API,
> 
> In terms of C on OS-X, does the fact that C calls are translated to HFS
> calls not mean there is a translation layer ?

C calls are not "translated to HFS calls".

C calls to the BSD file API are handled by the kernel. The BSD API
defines a standard interface for how to access files, independent of the
underlying file system.

The kernel is responsible for using the appropriate file system driver
to deal with the specifics of the file system on which the file resides,
whether it be UFS, HFS+, FAT, APFS, some other installed file system, or
a request forwarded to a server for a networked file system.

The HFS+ file system driver implements the kernel's internal file I/O
operations as appropriate for the HFS+ file system, e.g. dealing with
the catalog tree and other structures, extra layers for Core Storage if
appropriate, etc.

> > For example, on Windows, the read() function in the C library ends up
> > calling the Windows API ReadFile(). The C library code doesn't remember
> > the file position. The kernel deals with that.
> 
> when you do random read/write, the higher level language deals with file
> position.

Yes, by calling the BSD API lseek() which sets the file position (as
managed by the kernel) to whatever the application wants. lseek() also
returns the file position, so the application can find the current
position if it needs to know.
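
A small sketch of that, assuming a POSIX system; the helper name
position_after_read_at() is hypothetical. It sets the position with
lseek(), lets read() advance it, and then asks the kernel where the
position now is. The application never stores the offset itself.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one byte at the given offset, then report
 * the file position the kernel now holds for this descriptor.
 * Returns the new position, or -1 on error or end of file. */
off_t position_after_read_at(int fd, off_t offset)
{
    char byte;

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)  /* set the position */
        return -1;
    if (read(fd, &byte, 1) != 1)                   /* kernel advances it */
        return -1;
    return lseek(fd, 0, SEEK_CUR);                 /* query without moving */
}
```

On a file of at least five bytes, calling this with offset 4 should
return 5.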

> And in VMS for instance you can specify your own RAB/FAB *process
> storage* to hold the open file context. 

Irrelevant. This thread is about macOS (and by extension, UNIX systems
which support the same file I/O API).

I could comment on how ProDOS on the Apple II implements file buffering
in memory supplied by the application, but it would be equally
irrelevant to how macOS and the BSD API manage file state and buffering,
or how APFS will behave.

> Since not all OS are alike it is a fair question to ask.

You assumed that file buffering and state was managed by processes,
expressed it as a fact, and jumped to multiple wrong conclusions as a
result. I don't call that a "question".

> You seem to have answered it with the info that for OS-X, it is all in
> kernel and it manages multiple "opens" to the same file and keeps them
> coherent with each other.
> 
> Thanks. Sorry if it took a while.

-- 
David Empson
dempson@actrix.gen.nz



#107203

From: Jolly Roger <jollyroger@pobox.com>
Date: 2017-05-26 02:11 +0000
Message-ID: <eoph5vFj238U1@mid.individual.net>
In reply to: #107202
On 2017-05-26, David Empson <dempson@actrix.gen.nz> wrote:
> JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:
>
>> Since not all OS are alike it is a fair question to ask.
>
> You assumed that file buffering and state was managed by processes,
> expressed it as a fact, and jumped to multiple wrong conclusions as a
> result. I don't call that a "question".

You have way more patience than most, David!

-- 
E-mail sent to this address may be devoured by my ravenous SPAM filter.
I often ignore posts from Google. Use a real news client instead.

JR



#107204

From: JF Mezei <jfmezei.spamnot@vaxination.ca>
Date: 2017-05-26 00:08 -0400
Message-ID: <5927aa47$0$61830$c3e8da3$e074e489@news.astraweb.com>
In reply to: #107202
On 2017-05-25 21:44, David Empson wrote:

> Irrelevant. This thread is about macOS (and by extension, UNIX systems
> which support the same file I/O API).


It may be irrelevant to you because you knew the answer, but it is not
irrelevant to someone who only knows that OS-X is on a Mach kernel and
that there are different Unix kernels out there and different
implementations of file systems that happen to share the same higher
level APIs to make them "Unix".

BTW, VMS got Unix certification before Solaris. And OS-X also got Unix
certification at some point. Not all Unixes are alike.

With regard to "fsync": when you have multiple hosts accessing a shared
file system in a disk array (not a file server), the flushing to disk is
the only means to signal changes to files so the other computers can
see those changes.

This is why the WWDC message about APFS being designed for desktops and
not necessarily optimal for servers raised my questions.

(I am not debating your answer, just explaining why I had the question).



#107201

From: Lewis <g.kreme@gmail.com.dontsendmecopies>
Date: 2017-05-26 01:27 +0000
Message-ID: <slrnoif1ar.1hf.g.kreme@snow.local>
In reply to: #107194
In message <5925e5ae$0$38718$b1db1813$19ace300@news.astraweb.com> JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:
> On 2017-05-24 07:27, David Empson wrote:

>> Processes do not get file structures when they open files. They get a
>> descriptor to the file, and the internal details of the on-disk
>> structures are managed by the kernel and file system driver.

> There is a lot of file context kept in process memory (for instance
> where in the file the "current pointer" is for the next read, and many
> file attributes).

No. You are entirely wrong.

> Since it is a new file system with a new way to store data (moving it
> around to different blocks with every rewrite operation), asking
> questions on how it works is normal.

That's not what you are doing, you are making up inane 'problems' that
don't exist and 'asking' how APFS deals with them. These are problems
that don't exist in any modern file system, because you don't understand
how computers, kernels, drivers, and file systems work. At all.

> As I said before, the WWDC presentation explicitly skipped over
> concurrency issues (hint of unresolved issues).

No, you are making shit up.

> man fsync

> FSYNC(2)                BSD System Calls Manual                FSYNC(2)

> NAME
>      fsync -- synchronize a file's in-core state with that on disk

Do you know what "in-core" means?

Did you READ the man page?

"Note that while fsync() will flush all data from the host to the drive
(i.e. the "permanent storage device"), the drive itself may not
physically write the data to the platters for quite some time and it may
be written in an out-of-order sequence."

fsync is a flush command, and isn't doing at all what you are implying.

> And there is a good reason for that call: if you do  mass writes to
> append to the file, an implicit fsync after every write would kill
> performance, you are much better off stacking many writes for efficiency
> and either having the implicit fsync (or explicit) at regular
> intervals or when you're done.

That is not how fsync is used, if it is used.

> Applications that write to log files want an explicit or implicit fsync

No.


-- 
And what group was that, Gail?
The Menstrual Cycles.



#107154

From: Lewis <g.kreme@gmail.com.dontsendmecopies>
Date: 2017-05-23 09:36 +0000
Message-ID: <slrnoi80rg.2umq.g.kreme@snow.local>
In reply to: #107138
In message <1n6hbwa.1dwmt491syhev2N%dempson@actrix.gen.nz> David Empson <dempson@actrix.gen.nz> wrote:
> A "version" mechanism would have to be implemented at a higher level,
> e.g. by an application or the OS explicitly cloning the file to preserve
> a reference to its current state. That results in another file with a
> different pathname which can be opened separately. The clone happens to
> initially share storage with the original file, but it won't see any
> subsequent changes to the original file.

> For example, this could be used by the Autosave and Versions mechanism,
> or by Time Machine Local Storage backups.

I know at least some people are assuming that the APFS version of Time
Machine will do exactly this. I am not so sure, but I've not really read
up that much on APFS (only enough to know JF was wrong again, but not
enough to point out the exact error in cloned files as you did).

If it does, it means a TM backup disk is going to be able to hold way more
history than it does now.

-- 
You came in that thing? You're braver than I thought!



#107155

From: dempson@actrix.gen.nz (David Empson)
Date: 2017-05-24 01:42 +1200
Message-ID: <1n6i8ei.6zwet78mqwtwN%dempson@actrix.gen.nz>
In reply to: #107154
Lewis <g.kreme@gmail.com.dontsendmecopies> wrote:

> In message <1n6hbwa.1dwmt491syhev2N%dempson@actrix.gen.nz> David Empson
> <dempson@actrix.gen.nz> wrote:
> > A "version" mechanism would have to be implemented at a higher level,
> > e.g. by an application or the OS explicitly cloning the file to preserve
> > a reference to its current state. That results in another file with a
> > different pathname which can be opened separately. The clone happens to
> > initially share storage with the original file, but it won't see any
> > subsequent changes to the original file.
> 
> > For example, this could be used by the Autosave and Versions mechanism,
> > or by Time Machine Local Storage backups.
> 
> I know at least some people are assuming that the APFS version of Time
> Machine will do exactly this. I am not so sure, but I've not really read
> up that much on APFS (only enough to know JF was wrong again, but not
> enough to point out the exact error in cloned files as you did).
> 
> If it does, it means a TM backup disk is going to be able to hold way more
> history than it does now.

Apple didn't say anything last year about Time Machine and APFS, but it
makes perfect sense to use clones to implement the Local Backup
mechanism. (Keeping snapshots instead would be overkill because they
would require keeping the state of the entire volume, whereas local
backups are only used for selected files.)

The Autosave and Versions mechanism wouldn't work so well with clones,
because it needs to track insertions and deletions, not just identically
sized replacements.

As for Time Machine on a separate disk, with both disks using APFS, that
is crying out for use of snapshots to get a frozen sample of the source
drive. How to do the backup raises questions: there could be new
features not yet described (e.g. backing up snapshots with structure
retention, including deltas between snapshots), or they could just use
clones on the backup drive.

WWDC is going to be interesting.

-- 
David Empson
dempson@actrix.gen.nz


