From: news@zzo38computer.org.invalid
Newsgroups: comp.infosystems,comp.protocols.misc
Subject: Re: Request for comments: Scorpion protocol/file-format
Date: Wed, 10 Apr 2024 16:01:25 -0700
Organization: A noiseless patient Spider
Lines: 145
Message-ID: <1712684411.bystand@zzo38computer.org>
References: <1712084972.bystand@zzo38computer.org> <uv03l6$3bfj5$1@dont-email.me> <1712562630.bystand@zzo38computer.org> <uv2es4$os1$1@dont-email.me>

sean@conman.org wrote:
> > Thank you; I will write about it. (In this context, ULFI is short for
> > "Unordered Labels File Identification".)
> 
>   You go into some detail, but not enough to answer "what is this for?"  It
> looks like it's supposed to replace MIME but ... how?

I explain more below, in reply to your later comment on the same point.

> > I thought you could use recv with the MSG_PEEK flag. (However, I did not
> > actually try that (yet). If I am wrong, then you can tell me what is wrong
> > with that, please.)
> 
>   You aren't wrong, but such a method isn't mentioned that much (if at all)
> in most networking tutorials, and if you are going for implementation
> simplicity (which you haven't explicitly stated) then yes, this is "more
> technically difficult than you may think."  I would try an implementation
> before pushing for this myself.  This was never done for HTTP---I wonder
> why?

Implementation simplicity is more important for mandatory parts than for
optional parts. Of course TLS is itself complicated, which is one of the
reasons it is optional (although there are other reasons too).

I will try the implementation; so far I have not implemented TLS on the
server side at all. However, this will require more changes just to make
it work with TLS at all, so it might take a while before I manage to
implement this. (Other people are free to write their own implementations,
and if they want to implement TLS, then they can try this instead.)

(I have once accessed an HTTPS server that actually does support this,
although after I sent an unencrypted HTTP request on port 443, the valid
HTTP response I received was just an error message saying that
unencrypted requests on port 443 are not allowed.)
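
A minimal sketch of the MSG_PEEK idea, in Python, might look like the
following. It is only an illustration, not a tested server: it assumes
that a plaintext request never begins with the byte 0x16 (the content
type of a TLS handshake record), and the function name is made up.

```python
import socket

TLS_HANDSHAKE = 0x16  # first byte of a TLS handshake record (ClientHello)

def looks_like_tls(conn: socket.socket) -> bool:
    """Peek at the first byte of the connection without consuming it."""
    first = conn.recv(1, socket.MSG_PEEK)
    # An empty result means the peer closed before sending anything.
    # Assumption: no plaintext request starts with byte 0x16.
    return len(first) == 1 and first[0] == TLS_HANDSHAKE
```

If it returns True, the connection can be handed to a TLS library (for
example ssl.SSLContext.wrap_socket with server_side=True), which still
sees the whole ClientHello because the peek consumed nothing; otherwise
the server reads the request as plaintext from the same socket.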

> >>   Fourth:  impose a hard limit on clients following redirects ...
> > OK. I added it.
> 
>   Not strong enough.  In RFC-speak, MUST is stronger (mandatory) than
> SHOULD.

It says:
  If the number of consecutive redirects exceeds the limit (which MUST be
  not more than five by default, although it may be configurable by the
  user), then the client MUST NOT automatically follow further redirects.

It does say MUST.
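
As a rough sketch of how a client might enforce this, assuming the limit
is read as the maximum number of redirects followed automatically: the
fetch callable and its return shape here are hypothetical, since the
actual response format is not shown in this message.

```python
from typing import Callable, Optional, Tuple

# Hypothetical: fetch(url) returns (body, redirect_target), where
# redirect_target is None if the response is not a redirect.
Fetch = Callable[[str], Tuple[bytes, Optional[str]]]

class TooManyRedirects(Exception):
    """Raised with the target that was not followed automatically."""

def fetch_following_redirects(fetch: Fetch, url: str,
                              limit: int = 5) -> Tuple[str, bytes]:
    followed = 0
    while True:
        body, target = fetch(url)
        if target is None:
            return url, body
        if followed >= limit:
            # MUST NOT follow automatically; the user may still decide to.
            raise TooManyRedirects(target)
        followed += 1
        url = target
```

A user interface could catch TooManyRedirects and ask the user whether
to continue, which keeps the limit configurable without removing it.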

> >>   Sixth:  For the sub-protocol I ...
> > OK, I will add that; it is a good idea.
>   You still lack a description of what this is used for.  

The document does explain what it is used for. (If it is unclear, then
hopefully someone who can explain it better will do so.)

> > Its use is that links to files can specify the hash so that you can verify
> > on the client side that the file has not changed (and that spies have not
> > tampered with it, if the source of the hash is trustworthy).
> 
>   "Spies" that tamper with the file on the server can just as easily tamper
> with the hash.  Tampering in transit is protected though.

That depends on where (and when) you got the hash from.

(The hash is useful for other purposes too: for caching; for finding
another copy of the file (if someone has it indexed by its hash, then you
can verify that the copy is correct); for verifying that a file you linked
to has not been changed since then; etc.)
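
For illustration, the client-side check could be as small as the sketch
below. The choice of SHA-256 and of a hex-encoded hash are assumptions
made for the example, not something stated here, and the function name
is made up.

```python
import hashlib
import hmac

def matches_link_hash(data: bytes, expected_hex: str,
                      algorithm: str = "sha256") -> bool:
    """Return True if data hashes to the value given in the link."""
    actual = hashlib.new(algorithm, data).hexdigest()
    # Assumption: the link carries the hash as hexadecimal text.
    return hmac.compare_digest(actual, expected_hex.lower())
```

The same function serves the other uses: a cache keyed by the hash, or a
copy obtained from a different server, is accepted only if it matches.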

TLS does not prevent the server operator from changing the files to
malicious ones, nor does it address some other problems; TLS does not
(and cannot) solve everything.

>   Yes, most networking protocols for the Internet are big-endian, but man,
> do people complain about it now that Intel won.  Besides, there are file
> formats, like ZIP files, that are little-endian in nature.  I'm not arguing
> for little-endian (like I said, I like big-endian myself).  I'm just saying
> be prepared for pushback on this.

I understand, but it isn't really a significant issue.
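
To show why, here is a sketch of reading a big-endian field by shifting
bytes, which gives the same result on any host regardless of its native
byte order. The 32-bit width and the function name are assumptions made
for the example only.

```python
def read_u32_be(buf: bytes, offset: int = 0) -> int:
    """Decode an unsigned 32-bit big-endian integer from buf."""
    b = buf[offset:offset + 4]
    if len(b) != 4:
        raise ValueError("need 4 bytes")
    # Most significant byte first; no dependence on host endianness.
    return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]
```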

>   And yet, in ULFI section, you have people parsing
> 
> 	a.b:c	SAME AS	c:a.b
> 	a:b+c:d	SAME AS a:b:b.c:d
> 
> and you say "text-based format would be much more difficult for the client
> to parse" with a straight face?

It is not generally necessary for implementations to compare ULFI for
equality. If you are looking for a piece with a specific name then you
will find it. (This is also true if you are looking for multiple names.)

It is possible that there are multiple names that an implementation will
recognize, with different meanings in each case (it is also possible that
it will treat multiples with the same meanings), and it might define the
priorities to decide which one to use (or use them together if they can).

As one example where it might use multiples at once, an EPUB file is also
a ZIP file, and you can easily specify both, so that an implementation that
can open ZIP archives but not EPUB can still display the ZIP archive (there
might also be some command for the user to select explicitly which one to
use if both are implemented, but usually the implementation would give one
of them priority). MIME does have such a mechanism too, but it seems to be
just "added on" and is not a clean way to do it, in my opinion.
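
The priority rule can be sketched roughly as follows. The strings "epub"
and "zip" are placeholders, not real ULFI names, and the function is
hypothetical; the point is only that the client looks for names it knows
and never needs to compare two whole identifications for equality.

```python
from typing import Iterable, Optional, Sequence

def choose_handler(file_names: Iterable[str],
                   preferred: Sequence[str]) -> Optional[str]:
    """Pick the first name, in the client's priority order, that the
    file is labelled with; None if the client recognizes none of them."""
    present = set(file_names)
    for name in preferred:
        if name in present:
            return name
    return None
```

A client that implements both would list "epub" before "zip"; one that
implements only ZIP would list just "zip" and still open the file.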

Another alternative to MIME is UTI (used by Apple), which can specify
that a type conforms to one or more other types. However, this has its own
problems: you need all of the definitions in order to compare them, there
are no parameters, and a file always has exactly one type that conforms to
one or more others (doing it this way is sometimes wrong; e.g. a PostScript
file can be text or binary and can be considered as a document or as a
program).

>   From arguments I've seen about binary-data in otherwise text documents
> [3], if it can't be done in an existing editor, it's a non-starter.

That is only because such an editor has not been written yet; people can
write one if they like. A converter program does already exist, so that is
another way to do it.

> >>   Tenth:  What is the purpose of ".special/conversion"?  What file formats
> >> to what file formats?  
> > 
> > Any file formats to any file formats.
> 
>   Why is that a part of a *protocol* specification?

It is both the protocol specification and file format specification.

> [3]	Whenever CSV (Comma separated values) files come up on Hacker News
> 	or Lobste.rs, inevitably, someone will mention that ASCII defines
> 	four explicit separator characters, FS (File Separator), GS (Group
> 	Separator), RS (Record Separator) and US (Unit Separator) and the
> 	use of those will fix most problems with CSV.  The pushback comes
> 	when opponents of ASCII separators claim a file that uses such
> 	characters can't be edited in a normal text editor so STFU!  It's so
> 	bad that people who push for TSV (Tab separated values) will get
> 	pushback for the (ab)use of tabs in a text file.

I am aware of this, and I am one of the people who have suggested the use
of ASCII separated values.
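
For anyone who has not seen it, a minimal sketch of ASCII separated
values follows, using US (0x1F) between fields and RS (0x1E) between
records. The function names are made up, and this version simply rejects
fields that contain a separator rather than defining any escaping.

```python
US = "\x1f"  # ASCII Unit Separator: between fields
RS = "\x1e"  # ASCII Record Separator: between records

def dump_asv(rows):
    """Encode a list of records (each a list of strings) as one string."""
    for fields in rows:
        for field in fields:
            if US in field or RS in field:
                raise ValueError("field contains a separator character")
    return RS.join(US.join(fields) for fields in rows)

def load_asv(text):
    """Decode a string produced by dump_asv; empty text means no records."""
    if not text:
        return []
    return [record.split(US) for record in text.split(RS)]
```

Commas, quotation marks, tabs and line breaks inside fields need no
quoting at all, which is the main advantage over CSV.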

-- 
Don't laugh at the moon when it is day time in France.