Re: no-https: a plain-HTTP to HTTPS proxy

From	Ivan Shmakov <ivan@siamics.net>
Newsgroups	comp.infosystems.www.misc, comp.misc
Subject	Re: no-https: a plain-HTTP to HTTPS proxy
Date	2018-09-18 13:10 +0000
Organization	Aioe.org NNTP Server
Message-ID	<87r2hrdk8b.fsf@violet.siamics.net>
References	<87d0tdgbt4.fsf@violet.siamics.net> <eli$1809161636@qaz.wtf>

>>>>> Eli the Bearded <*@eli.users.panix.com> writes:
>>>>> In comp.infosystems.www.misc, Ivan Shmakov <ivan@siamics.net> wrote:

 >> It took me about a day to write a crude but apparently (more or
 >> less) working HTTP to HTTPS proxy.  (That I hope to beat into shape
 >> and release via news:alt.sources around next Wednesday or so.
 >> FTR, the code is currently under 600 LoC long, or 431 LoC excluding
 >> comments and empty lines.)  Some design notes are below.

 > What language?

	Perl 5.  It seems the most apt for the task among the five
	general-purpose languages I use regularly these days.  (The
	others being Emacs Lisp, Shell, Awk; and C, though that's mostly
	limited to occasional embedded programming.)

 >> The basic algorithm is as follows:

 >> 1. receive a request header from the client; we only allow GET and
 >> HEAD requests for now, as we do not support request /bodies/ as of yet;

 > No POST requests will stop a lot of forms.

	My intent was to support Web /reading/ over plain HTTP specifically
	-- which is something that shouldn't involve forms IMO.  That said,
	I suppose there can be any number of resources that use POST for
	/search/ forms, which is something that may be worth supporting.

 >  HEAD is an easy case, but largely unused.

	Easy, indeed, and I do use it myself, so the question of whether
	to implement its handling or not wasn't really considered.

[...]

 >> 6. strip certain headers (such as Strict-Transport-Security: and
 >> Upgrade:, but also Set-Cookie:) off the response and send the result
 >> to the client;

 > That probably covers it.  If you change HTTP/1.1 to HTTP/1.0 on the
 > requests, then 1% of servers will have issues and 50% fewer servers
 > will send chunked requests.  (Numbers made up, based on my experiences.)

	The idea was to require the barest minimum of mangling in the
	code, so as to leave as many choices as possible up to the user.
	As such, HTTP/1.1 and chunked encoding appear well worth supporting.
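
	For illustration only, step 6 quoted above amounts to something
	like the following.  This is a Python sketch, not the actual
	(Perl) code; the function name and the list-of-pairs header
	representation are made up for the example.  Note that
	Transfer-Encoding: passes through untouched.

```python
# Hypothetical sketch; names and data layout are assumptions, not the proxy's code.
# Headers named in the post as ones to strip from the response.
STRIPPED = {"strict-transport-security", "upgrade", "set-cookie"}


def strip_response_headers(headers):
    """Return the headers minus the stripped ones.

    `headers` is a list of (name, value) pairs; header names are
    compared case-insensitively, as HTTP requires.
    """
    return [(name, value) for name, value in headers
            if name.lower() not in STRIPPED]


filtered = strip_response_headers([
    ("Content-Type", "text/html"),
    ("Strict-Transport-Security", "max-age=31536000"),
    ("Set-Cookie", "id=1"),
    ("Transfer-Encoding", "chunked"),
])
```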

 > You can also drop Accept-Encoding: if you want to avoid dealing with
 > compressed responses.

	Per RFC 7231, Accept-Encoding: identity communicates the client's
	preference for "no encoding."  Omitting the header, OTOH, means
	"no preference":

    5.3.4.  Accept-Encoding

    [...]

    A request without an Accept-Encoding header field implies that the
    user agent has no preferences regarding content-codings.  Although
    this allows the server to use any content-coding in a response, it
    does not imply that the user agent will be able to correctly process
    all encodings.

	That said, I do wish for the user to have the choice of having
	/both/ compression and transformations available.  And while I'm
	not constrained much by bandwidth, some of the future users of
	this code may be.
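
	A rough sketch of what leaving that choice to the user could look
	like (Python rather than the Perl of the actual code; the
	function and the force_identity switch are hypothetical):

```python
# Hypothetical sketch; the force_identity option is an assumption for illustration.
def set_accept_encoding(headers, force_identity):
    """Optionally replace any Accept-Encoding: header with 'identity'.

    With force_identity false, the client's headers are passed on
    unchanged, so compression stays available.  Simply dropping the
    header would mean "no preference", not "no encoding".
    """
    if not force_identity:
        return list(headers)
    kept = [(name, value) for name, value in headers
            if name.lower() != "accept-encoding"]
    kept.append(("Accept-Encoding", "identity"))
    return kept


request = [("Host", "example.com"), ("accept-encoding", "gzip, br")]
forced = set_accept_encoding(request, force_identity=True)
untouched = set_accept_encoding(request, force_identity=False)
```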

[...]

 >> There was an idea of transparently replacing https: references in
 >> HTML and XML attributes with scheme-relative ones (like, e. g.,
 >> https://example.com/ to //example.com/.)  So far, that fails more
 >> often than it works, for two primary reasons: compression (although
 >> that can be solved by forcing Accept-Encoding: identity in requests)
 >> -- and the fact that by the time such filtering can take place,
 >> we've already sent the Content-Length: (if any) for the original
 >> (unaltered) body to the client!

 > You can fix that with whitespace padding.

 >    <img src="https://qaz.wtf/tmp/chree.png" ...>
 >    <img src="//qaz.wtf/tmp/chree.png" ...>

	Yes, I've tried it (alongside Accept-Encoding: identity); it
	worked, but I don't like it for its lack of generality.
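
	For the record, the padding trick is roughly as follows.  Again a
	Python sketch with made-up names, not the code I've tried; it
	handles only the double-quoted form and assumes an uncompressed
	body.  The six octets of "https:" removed from each reference
	are given back as blanks after the closing quote, so the body
	length (and thus Content-Length:) stays the same.

```python
import re

# Hypothetical sketch: only double-quoted https: references are handled.
HTTPS_REF = re.compile(rb'"https://([^"\s]*)"')


def pad_rewrite(body):
    """Make quoted https: references scheme-relative, preserving length.

    Each match loses len(b"https:") == 6 bytes, which are replaced by
    six spaces right after the closing quote.
    """
    return HTTPS_REF.sub(
        lambda m: b'"//' + m.group(1) + b'"' + b" " * 6, body)


body = b'<img src="https://qaz.wtf/tmp/chree.png" alt="">'
padded = pad_rewrite(body)
```

	The matching is blind to context: a quoted https: URI in running
	text or in a script gets the same treatment, which is part of why
	the approach lacks generality.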

 > Beware of parsing issues.

	Other than those shown in the examples below?

 > Real world HTML usually looks like one of the first two but may
 > sometimes look like one of second two of these:

 > <img src="https://qaz.wtf/tmp/chree.png" ...>
 > <img src='https://qaz.wtf/tmp/chree.png' ...>
 > <img src=https://qaz.wtf/tmp/chree.png ...>
 > <img src = "https://qaz.wtf/tmp/chree.png" ...>

 > (And that's ignoring case.)

	Indeed; and case and lack of quotes will require special-casing
	for HTML.  (I aim to support XML applications as well, which
	fortunately are somewhat simpler in this respect.)

	OTOH, I don't think I've ever seen the " = " form; do the blanks
	around the equals sign even conform to any HTML version?
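
	Whatever the answer, one pattern can cover all four forms, plus
	case.  A Python sketch follows (not the actual Perl code); the
	choice of src and href as the attributes to look at is an
	assumption for the example, and no length padding is attempted.

```python
import re

# Hypothetical sketch; the attribute list (src, href) is an assumption.
ATTR_HTTPS = re.compile(
    r"""(\b(?:src|href)\s*=\s*)   # attribute name, optional blanks around the equals sign
        (["']?)                   # optional opening quote, either kind
        https:(?=//)              # the scheme to drop, only if // follows
    """,
    re.IGNORECASE | re.VERBOSE)


def to_scheme_relative(html):
    """Drop 'https:' from the start of src/href attribute values."""
    return ATTR_HTTPS.sub(r"\1\2", html)


samples = [
    '<img src="https://qaz.wtf/tmp/chree.png" ...>',
    "<img src='https://qaz.wtf/tmp/chree.png' ...>",
    '<img src=https://qaz.wtf/tmp/chree.png ...>',
    '<img src = "https://qaz.wtf/tmp/chree.png" ...>',
    '<IMG SRC="HTTPS://QAZ.WTF/TMP/CHREE.PNG" ...>',
]
rewritten = [to_scheme_relative(s) for s in samples]
```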

[...]

 >> Thoughts?

 > Are you going to fix Referer: headers to use the https: version when
 > communicating with an https site?  I think you probably should.

	I guess I'll leave it up to the user.  In my experience (with
	copying Web pages using Wget), resources requiring Referer: are
	the exception rather than the rule, but still.
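
	Should the user opt in, the rewrite itself is simple enough.  A
	Python sketch (not the actual code; the function and its switch
	are hypothetical, and it assumes that every http: URI the client
	sends corresponds to an https: one upstream):

```python
# Hypothetical sketch; the rewrite_referer option is an assumption for illustration.
def fix_referer(headers, rewrite_referer):
    """Optionally turn an http: Referer: value into its https: form.

    `headers` is a list of (name, value) pairs; all other headers,
    and Referer: values that are not http:, pass through unchanged.
    """
    out = []
    for name, value in headers:
        if (rewrite_referer and name.lower() == "referer"
                and value.startswith("http://")):
            value = "https://" + value[len("http://"):]
        out.append((name, value))
    return out


request = [("Host", "qaz.wtf"), ("Referer", "http://qaz.wtf/tmp/")]
fixed = fix_referer(request, rewrite_referer=True)
as_is = fix_referer(request, rewrite_referer=False)
```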

 > Elijah ------ only forces https on his site for the areas that
 > require login

	And that's a sensible approach.

-- 
FSF associate member #7257  http://am-1.org/~ivan/
