From: Martin Gregorie
Newsgroups: comp.lang.java.programmer
Subject: Re: A proposal to handle file encodings
Date: Sun, 2 Dec 2012 19:36:12 +0000 (UTC)
Organization: UK Free Software Network

On Sun, 02 Dec 2012 13:02:27 +0100, Peter J. Holzer wrote:

> On 2012-11-29 02:22, Martin Gregorie wrote:
>> (2) alternatively it may be possible to do the job by adding a mode or
>> two to the file opening operations.
>
> You mean an optional 4th parameter to open(2)?
>
No, what I said - an extra mode or two. If you didn't want the defaults
you'd OR them with the other modes.

> I still don't see how that could work. That implies that the kernel
> somehow guesses that you want to use the metadata from some file you
> opened for reading for the file you are just opening for writing. While
> that would be the right behaviour for "cp" or similar programs, I doubt
> it would be right for the majority of programs.
>
It obviously wouldn't apply if the other file was stdin/stdout/stderr.
In fact many (most) programs that have one file open for reading and
another for writing would probably want to copy the metadata, unless the
program was a compiler or something else that applies major
transformations to the data it's handling: in these cases you'd expect
to specify the metadata explicitly or to use an OS-predefined metadata
set.
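To make the metadata-copying case concrete, here is a minimal Java
sketch. It assumes that user-defined extended attributes (reached through
java.nio's UserDefinedFileAttributeView) stand in for the richer metadata
discussed above. The class name MetadataCopy, its helper methods and the
"charset" attribute name are invented for illustration; nothing here is
an existing OS or JDK facility for inheriting metadata automatically.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.UserDefinedFileAttributeView;

// Hypothetical helper: copies user-defined attributes from an input file
// to an output file, which is roughly what a "copy the metadata" open
// mode would have to do on the program's behalf.
final class MetadataCopy {

    // True if the file store holding this file supports user-defined attributes.
    static boolean supportsUserAttributes(Path file) throws IOException {
        return Files.getFileStore(file)
                .supportsFileAttributeView(UserDefinedFileAttributeView.class);
    }

    // Stores a text value under the given attribute name.
    static void setAttribute(Path file, String name, String value) throws IOException {
        UserDefinedFileAttributeView view =
                Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
        view.write(name, StandardCharsets.UTF_8.encode(value));
    }

    // Reads a text value back from the given attribute name.
    static String getAttribute(Path file, String name) throws IOException {
        UserDefinedFileAttributeView view =
                Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
        ByteBuffer buffer = ByteBuffer.allocate(view.size(name));
        view.read(name, buffer);
        buffer.flip();
        return StandardCharsets.UTF_8.decode(buffer).toString();
    }

    // Copies every user-defined attribute from source to target.
    // Returns the number copied, or -1 if either file store lacks support.
    static int copyUserAttributes(Path source, Path target) throws IOException {
        if (!supportsUserAttributes(source) || !supportsUserAttributes(target)) {
            return -1;
        }
        UserDefinedFileAttributeView in =
                Files.getFileAttributeView(source, UserDefinedFileAttributeView.class);
        UserDefinedFileAttributeView out =
                Files.getFileAttributeView(target, UserDefinedFileAttributeView.class);
        int copied = 0;
        for (String name : in.list()) {
            ByteBuffer value = ByteBuffer.allocate(in.size(name));
            in.read(name, value);
            value.flip();
            out.write(name, value);
            copied++;
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        Path input = Files.createTempFile("input", ".txt");
        Path output = Files.createTempFile("output", ".txt");
        try {
            if (supportsUserAttributes(input) && supportsUserAttributes(output)) {
                setAttribute(input, "charset", "UTF-8");
            }
            int copied = copyUserAttributes(input, output);
            System.out.println(copied < 0
                    ? "user-defined attributes are not supported here"
                    : "copied " + copied + " attribute(s)");
        } finally {
            Files.deleteIfExists(input);
            Files.deleteIfExists(output);
        }
    }
}
```

Returning -1 when the file store has no support is one possible answer
to the "what if the file system doesn't support them" objection; a
stricter design would throw instead, so that the open fails outright.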
> It also raises the question of what the kernel should do if the process
> doesn't have the necessary privileges to set some xattrs (or if the file
> system doesn't support them). Fail?
>
Why would that be treated any differently to access privileges? If the
requested combination of attributes is nonsensical (e.g. trying to write
a binary stream to a file of keyed records) or violates an OS-defined
rule, the file simply wouldn't open.

> That again makes no sense at the unix system call interface which deals
> only with byte streams.
>
But, by definition, if you were using metadata to control the character
encoding (which is where this discussion started) or to define the file
as containing keyed, fixed field records, you would not be trying to
write a byte stream. If you tried something like that I'd expect either
a compile-time error or for the file management subsystem to throw an
error at runtime. The compile-time error would be preferable and is more
or less what Java does.

Equally, if you were just diddling with the character encoding, that
should just work unless you were attempting to use an unsupported or
nonsensical conversion. For instance:

- ASCII to one of the Windows code pages would leave 0x00 to 0x7f
  unchanged (though the high order bits may need to be modified) and
  simply change the metadata to tell consumers of the file what encoding
  to use.

- ASCII->EBCDIC and EBCDIC->ASCII would have to recode every byte,
  except that there are some characters ('{' and '}') which, IIRC, are
  not part of the EBCDIC character set in at least some dialects.

- some transforms would be one way: ASCII to UTF-8 is OK, but IIRC the
  reverse would fail, and ISO 6 bit or Baudot to anything else should
  work but the reverse is probably not possible.
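The one-way transforms in the list above can be demonstrated with the
standard java.nio.charset classes. This is only a sketch: the class
StrictRecode and its methods are invented names, and it assumes that
"fail" means reporting an error instead of silently substituting a
replacement character.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: recodes bytes between charsets and refuses any
// conversion that would lose information.
final class StrictRecode {

    // Decodes input as 'from' and re-encodes it as 'to', throwing if any
    // byte is malformed or any character has no mapping in the target.
    static byte[] recode(byte[] input, Charset from, Charset to)
            throws CharacterCodingException {
        CharBuffer chars = from.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(input));
        ByteBuffer encoded = to.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(chars);
        byte[] result = new byte[encoded.remaining()];
        encoded.get(result);
        return result;
    }

    // True if the conversion succeeds without loss.
    static boolean canRecode(byte[] input, Charset from, Charset to) {
        try {
            recode(input, from, to);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] ascii = "plain text".getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // ASCII is a subset of UTF-8, so this direction always works.
        System.out.println(canRecode(ascii,
                StandardCharsets.US_ASCII, StandardCharsets.UTF_8)); // true

        // U+00E9 has no ASCII mapping, so the reverse direction fails here.
        System.out.println(canRecode(utf8,
                StandardCharsets.UTF_8, StandardCharsets.US_ASCII)); // false
    }
}
```

Note that UTF-8 to ASCII fails only when the data actually contains a
character outside the ASCII range; UTF-8 text that happens to be pure
ASCII converts back unchanged.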
>> Thinking about it a little more, (2) is definitely the best solution
>> because it would be rather useful to be able to default the metadata
>> applied to a new file with a similar mechanism to that used for the
>> permission bits.
>
> umask(2) is actually pretty broken IMHO.
>
IME it has few surprises unless you're moving files between users with
different umasks.

I don't know if you've used OSen that support the sort of extreme
metadata I'm talking about. I have and it can be rather convenient.
Here are a couple of nice examples:

- use the metadata to set the backup frequency for a file, the number of
  generations of the backup to be kept, and the number of parallel
  backups to be done.

- (for a print file) use metadata to specify the printer capabilities
  needed to print the file and the type of paper required. This could be
  used by the program to match its output to the available paper size
  (think A4 vs US Letter) as well as making sure that the output is sent
  to a printer with the right paper and capabilities to output it.

-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |