
Request for comments: Scorpion protocol/file-format

Started by	news@zzo38computer.org.invalid
First post	2024-04-07 18:04 -0700
Last post	2024-04-10 16:01 -0700
Articles	5 — 2 participants


  Request for comments: Scorpion protocol/file-format news@zzo38computer.org.invalid - 2024-04-07 18:04 -0700
    Re: Request for comments: Scorpion protocol/file-format sean@conman.org - 2024-04-08 06:42 +0000
      Re: Request for comments: Scorpion protocol/file-format news@zzo38computer.org.invalid - 2024-04-08 16:06 -0700
        Re: Request for comments: Scorpion protocol/file-format sean@conman.org - 2024-04-09 04:06 +0000
          Re: Request for comments: Scorpion protocol/file-format news@zzo38computer.org.invalid - 2024-04-10 16:01 -0700

#140 — Request for comments: Scorpion protocol/file-format

From	news@zzo38computer.org.invalid
Date	2024-04-07 18:04 -0700
Subject	Request for comments: Scorpion protocol/file-format
Message-ID	<1712084972.bystand@zzo38computer.org>

I would like to hear other people's criticism of the Scorpion protocol
and file format that I have made up. It is an alternative to HTTP/HTML,
Gemini, Gopher, Spartan, etc.

Note that it won't (and is not intended to) replace any of those; you can
even link between them easily (which is intentional). (Gopher requires the
use of a hack to handle this properly, but nevertheless it works OK.)

You can access the specification document by:
  echo 'R scorpion://zzo38computer.org/specification.txt' | nc zzo38computer.org 1517 | less

Alternatively, it can be accessed by GitHub:
  https://github.com/zzo38/scorpion/blob/trunk/Specification

The document may be changed in the future if something in it is wrong
(or missing). (I can also add more FAQ entries if you have other
frequent questions.)

If you want to criticize this, then in addition to the above document,
you should also be familiar with section 7 of the Gemini protocol FAQ:
  echo 'gemini://geminiprotocol.net/docs/faq-section-7.gmi' | ncat --ssl geminiprotocol.net 1965 | less

My process is different from that described in the Gemini FAQ in many ways,
although there are some similarities, and much of what is described there
is still relevant to what I am doing.

Scorpion protocol/file-format is not intended to be a strict subset or
strict superset of anything else. However, it is intended to be simpler
and less messy than the alternatives, in many ways.

-- 
Don't laugh at the moon when it is day time in France.


#141

From	sean@conman.org
Date	2024-04-08 06:42 +0000
Message-ID	<uv03l6$3bfj5$1@dont-email.me>
In reply to	#140

In comp.infosystems news@zzo38computer.org.invalid wrote:
> I would like to hear other people's criticism of the Scorpion protocol
> and file format that I have made up. It is an alternative to HTTP/HTML,
> Gemini, Gopher, Spartan, etc.

  My initial response to the specification:

  First, what is ULFI?  All I bring up when I search on that is "Upper Limb
Functional Index"---I can't seem to locate anything that is close to MIME. 
If you do use TLAs [1] and ETLAs [2], please define it somewhere in the
document for those who are unfamiliar with it.

  Second, URL support ... do you expect people to follow RFC-3986? 
RFC-3987?  Or the WHATWG living specification?

  Third: On TLS, methinks you underestimate how difficult it is to check
the first byte of a request is 0x16 and have an existing TLS library take
over the connection if it is.  I'm not saying it's impossible, just more
technically difficult than you may think.  Have you implemented a server
that supports both TLS and non-TLS support on the same port?

  Third the second:  More TLS---those who like TLS might take offence at
support for non-TLS---an attacker can easily MITM [3] requests to force
non-TLS requests, thus defeating the purpose of TLS in the first place.

  Third the third:  There will be a subset of people who hate TLS, and
demand that you don't use it, but use some other, possibly bespoke,
encryption system instead.  Before taking these people seriously, demand a
proof-of-concept and an analysis by real cryptographers before you engage
with them.  It'll save time.

  Third the fourth:  What's with the weird SNI support?  The client should
use it, but the server should not?  What?

  Third the fifth:  What do you mean by "clients SHOULD allow to use the
system's DNS services to implement encrypted Client Hello"?  And what's with
the following?  "if implemented, there MUST be an option to disable this
feature."

  Fourth:  impose a hard limit on clients following redirects.  I know from
experience that if this isn't mandatory, no one will implement it.  Even if
it is mandatory, some won't implement it, but hopefully it'll be a smaller
subset who ignore this.

  Fifth:  Some server implementor will hard code a 2147483647 on a 4x reply,
which is about 68 years.  Clients will obviously ignore such a silly request,
leading to an arms race.  Don't bother with a timeout value.  
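
For anyone who wants to check the arithmetic on that value, here is a
minimal sketch. It assumes the number is a count of seconds and uses a
Julian year of 365.25 days; both are assumptions, not something taken
from the specification.

```python
# Convert the largest signed 32-bit integer, read as seconds, into years.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year (assumption)

max_int32 = 2**31 - 1  # 2147483647
years = max_int32 / SECONDS_PER_YEAR

print(f"{max_int32} seconds is roughly {years:.1f} years")
```

The result lands just over 68 years, which is far longer than any client
would plausibly wait before retrying.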

  Sixth:  For the sub-protocol I, please use BNF for capability codes.  And
what's with terminal emulators?

  Seventh:  The Hashed URI section---what?  You first said relative URLs
aren't allowed in a request, so is this meant for documents?  What does the
hash buy you here?  And why number the hash algorithms instead of just
listing their names?  This is getting complicated, quickly.

  Eighth:  oh, a new document format.  Nice.  Binary HTML.  Even better. 
Big endian---I don't mind, but it's not fashionable among kids today (because
Intel won; Motorola lost and get over it Boomer!) and will be complained
about.  And by "nice" I mean "oh god!" You'll get people bitching about not
being able to include control data with their favorite editors and besides,
you're redefining well defined control codes.  You are NOT going to get
acceptance of this, or the following database file format.

  Ninth:  ".special/crawl"?  Really?  Not "/robots.txt"?  Or
"/.well-known/robots.txt"?  Sigh.  Even Gemini repurposed "/robots.txt", a
well known and supported format.  But if you insist on a new format, perhaps
an example (or four) could be included?

  Tenth:  What is the purpose of ".special/conversion"?  What file formats
to what file formats?  

  Thus ends my initial reaction to the specification.

  -spc

[1]	Three Letter Acronym

[2]	Extended Three Letter Acronym

[3]	Man-in-the-Middle


#142

From	news@zzo38computer.org.invalid
Date	2024-04-08 16:06 -0700
Message-ID	<1712562630.bystand@zzo38computer.org>
In reply to	#141

Thank you for your comments. I will try to respond to them the best that I
can, and will add whatever is necessary to the FAQ as well as to modify
other parts of the document as appropriate.

Some of the changes mentioned below I have done; others I have partially
done or not added yet. I will continue to work on it later, though.

sean@conman.org wrote:
>   First, what is ULFI?  All I bring up when I search on that is "Upper Limb
> Functional Index"---I can't seem to locate anything that is close to MIME. 
> If you do use TLAs [1] and ETLAs [2], please define it somewhere in the
> document for those who are unfamiliar with it.

Thank you; I will write about it. (In this context, ULFI is short for
"Unordered Labels File Identification".)

>   Second, URL support ... do you expect people to follow RFC-3986? 
> RFC-3987?  Or the WHATWG living specification?

RFC 3986. (However, the "hashed:" scheme has its own rules.)

>   Third: On TLS, methinks you underestimate how difficult it is to check
> the first byte of a request is 0x16 and have an existing TLS library take
> over the connection if it is.  I'm not saying it's impossible, just more
> technically difficult than you may think.  Have you implemented a server
> that supports both TLS and non-TLS support on the same port?

I thought you could use recv with the MSG_PEEK flag. (However, I did not
actually try that (yet). If I am wrong, then you can tell me what is wrong
with that, please.)
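
A minimal sketch of the MSG_PEEK idea, in Python rather than C. It only
shows the detection step: peek at the first byte and compare it with
0x16 without consuming it. The helper names `looks_like_tls` and `demo`,
and the example.org request line, are made up for illustration; the
demo uses a local socket pair instead of a real listening server.

```python
import socket

TLS_HANDSHAKE = 0x16  # content type of a TLS handshake record


def looks_like_tls(conn: socket.socket) -> bool:
    """Peek at the first byte; it stays in the receive queue."""
    first = conn.recv(1, socket.MSG_PEEK)
    return len(first) == 1 and first[0] == TLS_HANDSHAKE


def demo() -> list:
    """Try one TLS-looking payload and one plain request line."""
    payloads = (
        b"\x16\x03\x01\x00\x05hello",      # starts like a TLS record
        b"R scorpion://example.org/\n",    # hypothetical plain request
    )
    results = []
    for payload in payloads:
        client, server = socket.socketpair()
        try:
            client.sendall(payload)
            results.append(looks_like_tls(server))
            # The peeked byte was not consumed, so a normal read
            # still sees the whole payload.
            assert server.recv(len(payload)) == payload
        finally:
            client.close()
            server.close()
    return results


print(demo())
```

In a real server the TLS branch would then hand the untouched socket to
a TLS library (in Python, something like
`ssl.SSLContext.wrap_socket(conn, server_side=True)`); that part is not
shown here and is where the practical difficulty is more likely to be.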

>   Third the second:  More TLS---those who like TLS might take offence at
> support for non-TLS---an attacker can easily MITM [3] requests to force
> non-TLS requests, thus defeating the purpose of TLS in the first place.

An implementation may allow the user to configure it to not use non-TLS for
some (or all) servers. (This is similar to "HTTPS-Everywhere", but it is
not specific to HTTP(S).)

Additionally, the client is supposed to display a warning message if a
redirect from TLS to non-TLS (or vice-versa) occurs.

I think non-TLS has benefits such as improved simplicity and improved
energy efficiency. However, sometimes encryption is desirable, so TLS
is permitted, too.

>   Third the third:  There will be a subset of people who hate TLS, and
> demand that you don't use it, but use some other, possibly bespoke,
> encryption system instead.  Before taking these people seriously, demand a
> proof-of-concept and an analysis by real cryptographers before you engage
> with them.  It'll save time.

I have considered that, and have decided against it (at least for now), for
the reasons you specify, and for reasons mentioned in the Gemini FAQ (see
section 4.5.3). So, for now, it uses TLS.

>   Third the fourth:  What's with the weird SNI support?  The client should
> use it, but the server should not?  What?

Maybe it is unclear. What I mean is that the server shouldn't require SNI
since the host name is included in the request anyways.

However, possibly SNI might be needed for the server to present the proper
certificate to the client; if that is the case, then the server may present
an invalid certificate when the wrong (or no) SNI is used.

>   Third the fifth:  What do you mean by "clients SHOULD allow to use the
> system's DNS services to implement encrypted Client Hello"?  And what's with
> the following?  "if implemented, there MUST be an option to disable this
> feature."

Perhaps my specification is unclear. However, I am not sure how to write it
more clearly.

>   Fourth:  impose a hard limit on clients following redirects.  I know from
> experience that if this isn't mandatory, no one will implement it.  Even if
> it is mandatory, some won't implement it, but hopefully it'll be a smaller
> subset who ignore this.

OK. I added it.

>   Fifth:  Some server implementor will hard code a 2147483647 on a 4x reply,
> which is about 68 years.  Clients will obviously ignore such a silly request,
> leading to an arms race.  Don't bother with a timeout value.  

OK, it is a good point. Even in Gemini protocol they suggested removing the
time specification in a 4x reply.

>   Sixth:  For the sub-protocol I, please use BNF for capability codes.  And
> what's with terminal emulators?

OK, I will add that; it is a good idea.

>   Seventh:  The Hashed URI section---what?  You first said relative URLs
> aren't allowed in a request, so is this meant for documents?  What does the
> hash buy you here?  And why number the hash algorithms instead of just
> listing their names?  This is getting complicated, quickly.

That is correct that relative URLs aren't allowed in a request, although
hashed: URLs are not necessarily relative (although they can be). Anyways,
it isn't useful to be used in a request (although some servers might allow
them in proxied requests (if the URL after the comma is absolute), but this
is generally discouraged).

Its use is that links to files can specify the hash so that you can verify
on the client side that the file has not changed (and that spies have not
tampered with it, if the source of the hash is trustworthy).
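
The client-side check could look something like the sketch below. It
does not parse the "hashed:" scheme itself (its syntax is defined in the
specification and is not reproduced here); it only assumes that the
client has somehow obtained an expected digest and the fetched bytes.
SHA-256 and the function name `matches_digest` are assumptions for
illustration.

```python
import hashlib
import hmac


def matches_digest(data: bytes, expected_hex: str,
                   algorithm: str = "sha256") -> bool:
    """Return True if the data hashes to the expected hex digest."""
    actual = hashlib.new(algorithm, data).hexdigest()
    # compare_digest avoids leaking where the two strings differ.
    return hmac.compare_digest(actual, expected_hex.lower())


# The digest would come from the link; here it is computed locally.
original = b"hello scorpion\n"
digest_from_link = hashlib.sha256(original).hexdigest()

print(matches_digest(original, digest_from_link))
print(matches_digest(original + b"tampered", digest_from_link))
```

The check is only as trustworthy as the place the digest came from.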

>   Eighth:  oh, a new document format.  Nice.  Binary HTML.  Even better. 
> Big endian---I don't mind, but it's not fasionable among kids today (because
> Intel won; Motorola lost and get over it Boomer!) and will be complained
> about.  And by "nice" I mean "oh god!" You'll get people bitching about not
> being able to include control data with their favorite editors and besides,
> you're redefining well defined control codes.  You are NOT going to get
> acceptance of this, or the following database file format.

The Internet is supposed to be big-endian, isn't it? Although I think that
little-endian is better (regardless of which computers use it), I do not
think the difference is significant enough to justify violating the
convention of the Internet in this way. (Also, uxn is big-endian.)
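
To make the byte-order difference concrete, here is a small sketch using
Python's struct module. The 32-bit value is arbitrary and is not a field
from the Scorpion file format.

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)     # big-endian ("network order")
little = struct.pack("<I", value)  # little-endian

print(big.hex(), little.hex())

# Reading back requires knowing which order was used when writing.
assert int.from_bytes(big, "big") == value
assert int.from_bytes(little, "little") == value
```

Either order costs a parser about the same; the choice mostly matters
for consistency with other Internet protocols.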

A text-based format would be much more difficult for the client to parse, to
have to handle difficult escaping and nesting and other stuff like that. A
binary format will be simpler, especially a "flat" one such as this one,
rather than being nested like HTML and XML.

There are a few possibilities for how to write the document, such as using
a specialized editor, or using a converter or a static site generator.

>   Ninth:  ".special/crawl"?  Really?  Not "/robots.txt"?  Or
> "/.well-known/robots.txt"?  Sigh.  Even Gemini repurposed "/robots.txt", a
> well known and supported format.  But if you insist on a new format, perhaps
> an example (or four) could be included?

I think that there are problems with the robots.txt format, including a
possible confusion of what is mandatory and optional.

I will add an example because you are correct it is a good idea to do so.
(I did not add it yet; sorry. I will do so later.)

>   Tenth:  What is the purpose of ".special/conversion"?  What file formats
> to what file formats?  

Any file formats to any file formats.

-- 
Don't laugh at the moon when it is day time in France.


#143

From	sean@conman.org
Date	2024-04-09 04:06 +0000
Message-ID	<uv2es4$os1$1@dont-email.me>
In reply to	#142

In comp.infosystems news@zzo38computer.org.invalid wrote:
> Thank you for your comments. I will try to respond to them the best that I
> can, and will add whatever is necessary to the FAQ as well as to modify
> other parts of the document as appropriate.
> 
> Some of the changes mentioned below I have done; others I have partially
> done or not added yet. I will continue to work on it later, though.
> 
> sean@conman.org wrote:
>>   First, what is ULFI?  All I bring up when I search on that is "Upper Limb
>> Functional Index"---I can't seem to locate anything that is close to MIME. 
>> If you do use TLAs [1] and ETLAs [2], please define it somewhere in the
>> document for those who are unfamiliar with it.
> 
> Thank you; I will write about it. (In this context, ULFI is short for
> "Unordered Labels File Identification".)

  You go into some detail, but not enough to answer "what is this for?"  It
looks like it's supposed to replace MIME but ... how?  There are no
examples, and a web search only brings up references to unordered lists in
HTML.

>>   Third: On TLS, methinks you underestimate how difficult it is to check
>> the first byte of a request is 0x16 and have an existing TLS library take
>> over the connection if it is.  I'm not saying it's impossible, just more
>> technically difficult than you may think.  Have you implemented a server
>> that supports both TLS and non-TLS support on the same port?
> 
> I thought you could use recv with the MSG_PEEK flag. (However, I did not
> actually try that (yet). If I am wrong, then you can tell me what is wrong
> with that, please.)

  You aren't wrong, but such a method isn't mentioned that much (if at all)
in most networking tutorials, and if you are going for implementation
simplicity (which you haven't explicitly stated) then yes, this is "more
technically difficult than you may think."  I would try an implementation
before pushing for this myself.  This was never done for HTTP---I wonder
why?

>>   Fourth:  impose a hard limit on clients following redirects.  I know from
>> experience that if this isn't mandatory, no one will implement it.  Even if
>> it is mandatory, some won't implement it, but hopefully it'll be a smaller
>> subset who ignore this.
> 
> OK. I added it.

  Not strong enough.  In RFC-speak, MUST is stronger (mandatory) than
SHOULD.

>>   Sixth:  For the sub-protocol I, please use BNF for capability codes.  And
>> what's with terminal emulators?
> 
> OK, I will add that; it is a good idea.

  You still lack a description of what this is used for.  

>>   Seventh:  The Hashed URI section---what?  You first said relative URLs
>> aren't allowed in a request, so is this meant for documents?  What does the
>> hash buy you here?  And why number the hash algorithms instead of just
>> listing their names?  This is getting complicated, quickly.
> 
> Its use is that links to files can specify the hash so that you can verify
> on the client side that the file has not changed (and that spies have not
> tampered with it, if the source of the hash is trustworthy).

  "Spies" that tamper with the file on the server can just as easily tamper
with the hash.  Tampering in transit is protected though.

>>   Eighth:  oh, a new document format.  Nice.  Binary HTML.  Even better. 
>> Big endian---I don't mind, but it's not fashionable among kids today (because
>> Intel won; Motorola lost and get over it Boomer!) and will be complained
>> about.  And by "nice" I mean "oh god!" You'll get people bitching about not
>> being able to include control data with their favorite editors and besides,
>> you're redefining well defined control codes.  You are NOT going to get
>> acceptance of this, or the following database file format.
> 
> The Internet is supposed to be big-endian, isn't it? Although I think that
> little-endian is better (regardless of which computers use it), I do not
> think the difference is significant enough to justify violating the
> convention of the Internet in this way. (Also, uxn is big-endian.)

  Yes, most networking protocols for the Internet are big-endian, but man,
do people complain about it now that Intel won.  Besides, there are file
formats, like ZIP files, that are little-endian in nature.  I'm not arguing
for little-endian (like I said, I like big-endian myself).  I'm just saying
be prepared for pushback on this.

> A text-based format would be much more difficult for the client to parse, to

  And yet, in ULFI section, you have people parsing

	a.b:c	SAME AS	c:a.b
	a:b+c:d	SAME AS a:b:b.c:d

and you say "text-based format would be much more difficult for the client
to parse" with a straight face?  

> have to handle difficult escaping and nesting and other stuff like that. A
> binary format will be simpler, especially a "flat" one such as this one,
> rather than being nested like HTML and XML.

  One of the big complaints about text/gemini is the lack of nested lists.

> There are a few possibilities for how to write the document, such as using
> a specialized editor, or using a converter or a static site generator.

  From arguments I've seen about binary-data in otherwise text documents
[3], if it can't be done in an existing editor, it's a non-starter.

>>   Tenth:  What is the purpose of ".special/conversion"?  What file formats
>> to what file formats?  
> 
> Any file formats to any file formats.

  Why is that a part of a *protocol* specification?

 -spc

[1]	Three Letter Acronym

[2]	Extended Three Letter Acronym

[3]	Whenever CSV (Comma separated values) files come up on Hacker News
	or Lobste.rs, inevitably, someone will mention that ASCII defines
	four explicit separator characters, FS (File Separator), GS (Group
	Separator), RS (Record Separator) and US (Unit Separator) and the
	use of those will fix most problems with CSV.  The pushback comes
	when opponents of ASCII separators claim a file that uses such
	characters can't be edited in a normal text editor so STFU!  It's so
	bad that people who push for TSV (Tab separated values) will get
	pushback for the (ab)use of tabs in a text file.


#144

From	news@zzo38computer.org.invalid
Date	2024-04-10 16:01 -0700
Message-ID	<1712684411.bystand@zzo38computer.org>
In reply to	#143

sean@conman.org wrote:
> > Thank you; I will write about it. (In this context, ULFI is short for
> > "Unordered Labels File Identification".)
> 
>   You go into some detail, but not enough to answer "what is this for?"  It
> looks like it's supposed to replace MIME but ... how?

I explained more below, because you had written another comment below.

> > I thought you could use recv with the MSG_PEEK flag. (However, I did not
> > actually try that (yet). If I am wrong, then you can tell me what is wrong
> > with that, please.)
> 
>   You aren't wrong, but such a method isn't mentioned that much (if at all)
> in most networking tutorials, and if you are going for implementation
> simplicity (which you haven't explicitly stated) then yes, this is "more
> technically difficult than you may think."  I would try an implementation
> before pushing for this myself.  This was never done for HTTP---I wonder
> why?

Implementation simplicity is more important for mandatory parts than for
optional parts. Of course TLS is itself complicated, which is one of the
reasons for being made optional (although there are other reasons too).

I will try the implementation; so far I have not implemented TLS on the
server side at all. However, this will require more changes just to make
it work with TLS at all, so it might take a while before I manage to
implement this. (Other people are free to write their own implementations,
and if they want to implement TLS, then they can try this instead.)

(I have once accessed an HTTPS server that does support this actually,
although after sending an unencrypted HTTP request on port 443, I received
a valid HTTP response but it was just an error message that says that
unencrypted requests on port 443 are not allowed.)

> >>   Fourth:  impose a hard limit on clients following redirects ...
> > OK. I added it.
> 
>   Not strong enough.  In RFC-speak, MUST is stronger (mandatory) than
> SHOULD.

It says:
  If the number of consecutive redirects exceed the limit (which MUST be
  not more than five by default, although it may be configurable by the
  user), then the client MUST NOT automatically follow further redirects.

It does say MUST.
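
One way a client might enforce that rule is sketched below. The `fetch`
callable is hypothetical: it stands in for whatever transport the client
uses and is assumed to return a redirect target (or None) plus a body.
Only the default of five comes from the text quoted above; the names
`fetch_following_redirects` and `TooManyRedirects` are made up.

```python
from typing import Callable, Optional, Tuple

MAX_REDIRECTS = 5  # default limit from the quoted specification text

# Hypothetical transport: returns (redirect_target_or_None, body).
Fetch = Callable[[str], Tuple[Optional[str], bytes]]


class TooManyRedirects(Exception):
    """Raised instead of automatically following another redirect."""


def fetch_following_redirects(fetch: Fetch, url: str,
                              limit: int = MAX_REDIRECTS
                              ) -> Tuple[str, bytes]:
    followed = 0
    while True:
        target, body = fetch(url)
        if target is None:
            return url, body          # final, non-redirect response
        if followed >= limit:
            # The limit is used up: stop and let the user decide.
            raise TooManyRedirects(f"stopped at {url} -> {target}")
        followed += 1
        url = target
```

With the default limit, a chain of five redirects is followed and a
sixth raises the exception, which also stops redirect loops.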

> >>   Sixth:  For the sub-protocol I ...
> > OK, I will add that; it is a good idea.
>   You still lack a description of what this is used for.  

The document does explain what it is used for. (If it is unclear, then
hopefully someone who can explain it better, is able to do so.)

> > Its use is that links to files can specify the hash so that you can verify
> > on the client side that the file has not changed (and that spies have not
> > tampered with it, if the source of the hash is trustworthy).
> 
>   "Spies" that tamper with the file on the server can just as easily tamper
> with the hash.  Tampering in transit is protected though.

That depends on where (and when) you got the hash from.

(The hash is useful for other purposes too, such as for caching, for finding
another copy of the file (if someone has it indexed by its hash then you can
verify that it is correct), verifying that if you linked to a file that the
file has not been changed since then, etc.)

TLS does not prevent the server operator from changing the files to
malicious ones, nor does it prevent some other stuff; TLS does not (and
cannot) solve everything.

>   Yes, most networking protocols for the Internet are big-endian, but man,
> do people complain about it now that Intel won.  Besides, there are file
> formats, like ZIP files, that are little-endian in nature.  I'm not arguing
> for little-endian (like I said, I like big-endian myself).  I'm just saying
> be prepared for pushback on this.

I understand, but it isn't really a significant issue.

>   And yet, in ULFI section, you have people parsing
> 
> 	a.b:c	SAME AS	c:a.b
> 	a:b+c:d	SAME AS a:b:b.c:d
> 
> and you say "text-based format would be much more difficult for the client
> to parse" with a straight face?

It is not generally necessary for implementations to compare ULFI for
equality. If you are looking for a piece with a specific name then you
will find it. (This is also true if you are looking for multiple names.)

It is possible that there are multiple names that an implementation will
recognize, with different meanings in each case (it is also possible that
it will treat multiples with the same meanings), and it might define the
priorities to decide which one to use (or use them together if they can).

As one example where it might use multiples at once, an EPUB file is also
a ZIP file, and you can easily specify both, so that an implementation that
can open ZIP archives but not EPUB can still display the ZIP archive (there
might also be some command for the user to select explicitly which one to
use if both are implemented, but usually the implementation would make one
of them to have priority). MIME does have such a mechanism too, but it
seems to be just "added on" and is not a clean way to do it, in my opinion.
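
As a rough illustration of "one file, more than one type", the sketch
below reports every container type it recognizes for a blob of bytes.
The label strings "zip" and "epub" are placeholders and are not ULFI
syntax; the EPUB test relies on the `mimetype` entry that EPUB archives
carry, and the function name `container_labels` is made up.

```python
import io
import zipfile


def container_labels(data: bytes) -> list:
    """Return every label (placeholder names) that applies to the data."""
    labels = []
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        labels.append("zip")
        with zipfile.ZipFile(buf) as archive:
            # EPUB archives contain a "mimetype" entry with this value.
            if "mimetype" in archive.namelist():
                content = archive.read("mimetype").strip()
                if content == b"application/epub+zip":
                    labels.append("epub")
    return labels


# Build a tiny EPUB-like archive in memory to try it out.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as z:
    z.writestr("mimetype", "application/epub+zip")
print(container_labels(sample.getvalue()))
```

A program that only understands ZIP can stop at the first label, while
an EPUB reader can prefer the second.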

Another alternative to MIME is UTI (used by Apple), which can specify
that a type conforms to one or more other types. However, this has its own
problems: you need all of the definitions in order to compare them, there
are no parameters, and a file must always have exactly one type that
conforms to one or more others (doing it this way is sometimes wrong;
e.g. a PostScript file can be text or binary and can be considered as a
document or as a program).

>   From arguments I've seen about binary-data in otherwise text documents
> [3], if it can't be done in an existing editor, it's a non-starter.

That is only because such an editor has not been written yet; people are
welcome to write one if they like. A converter program already exists, so
that is another way to do it.

> >>   Tenth:  What is the purpose of ".special/conversion"?  What file formats
> >> to what file formats?  
> > 
> > Any file formats to any file formats.
> 
>   Why is that a part of a *protocol* specification?

It is both the protocol specification and file format specification.

> [3]	Whenever CSV (Comma separated values) files come up on Hacker News
> 	or Lobste.rs, inevitably, someone will mention that ASCII defines
> 	four explicit separator characters, FS (File Separator), GS (Group
> 	Separator), RS (Record Separator) and US (Unit Separator) and the
> 	use of those will fix most problems with CSV.  The pushback comes
> 	when opponents of ASCII separators claim a file that uses such
> 	characters can't be edited in a normal text editor so STFU!  It's so
> 	bad that people who push for TSV (Tab separated values) will get
> 	pushback for the (ab)use of tabs in a text file.

I am aware of this, and I am one of the people who have suggested the use
of ASCII separated values.
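
For readers who have not seen it, a minimal sketch of ASCII separated
values follows. It uses only US (0x1F) between fields and RS (0x1E)
between records, and assumes the field text never contains those two
control characters. The function names are made up for illustration.

```python
US = "\x1f"  # Unit Separator: between fields
RS = "\x1e"  # Record Separator: between records


def dumps(rows):
    """Join rows of string fields; no quoting or escaping is needed."""
    return RS.join(US.join(fields) for fields in rows)


def loads(text):
    """Split the text back into rows of fields."""
    if not text:
        return []
    return [record.split(US) for record in text.split(RS)]


rows = [
    ["name", "note"],
    ["a,b", 'say "hi"\nand bye'],  # commas, quotes, newlines are plain data
]
assert loads(dumps(rows)) == rows
```

The parser is two calls to split, which is the usual argument in favor;
the usual argument against is that most text editors do not display or
insert these characters conveniently.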

-- 
Don't laugh at the moon when it is day time in France.
