| From | Ivan Shmakov <ivan@siamics.net> |
|---|---|
| Newsgroups | comp.infosystems.www.misc, comp.misc |
| Subject | Re: no-https: a plain-HTTP to HTTPS proxy |
| Date | 2018-09-18 13:10 +0000 |
| Organization | Aioe.org NNTP Server |
| Message-ID | <87r2hrdk8b.fsf@violet.siamics.net> (permalink) |
| References | <87d0tdgbt4.fsf@violet.siamics.net> <eli$1809161636@qaz.wtf> |
Cross-posted to 2 groups.
>>>>> Eli the Bearded <*@eli.users.panix.com> writes:
>>>>> In comp.infosystems.www.misc, Ivan Shmakov <ivan@siamics.net> wrote:
>> It took me about a day to write a crude but apparently (more or
>> less) working HTTP to HTTPS proxy. (That I hope to beat into shape
>> and release via news:alt.sources around next Wednesday or so.
>> FTR, the code is currently under 600 LoC long, or 431 LoC excluding
>> comments and empty lines.) Some design notes are below.
> What language?
Perl 5. Of the five general-purpose languages I use regularly
these days, it appears to be the most apt for the task. (The others
being Emacs Lisp, Shell, Awk; and C, though that's mostly limited
to occasional embedded programming.)
>> The basic algorithm is as follows:
>> 1. receive a request header from the client; we only allow GET and
>> HEAD requests for now, as we do not support request /bodies/ as of yet;
> No POST requests will stop a lot of forms.
My intent was to support Web /reading/ over plain HTTP specifically
-- which is something that shouldn't involve forms IMO. That said,
I suppose there can be any number of resources that use POST for
/search/ forms, which is something that may be worth supporting.
> HEAD is an easy case, but largely unused.
Easy, indeed, and I do use it myself, so whether to implement
its handling was never really in question.
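For concreteness, step 1 (accepting only GET and HEAD) might look
roughly like the Python sketch below. The proxy itself is in Perl 5;
the function name parse_request_head and the choice of a 501 status
for any other method are assumptions, not taken from that code.

```python
# Sketch only: parse a proxy request header block and accept GET/HEAD.
ALLOWED_METHODS = {"GET", "HEAD"}


def parse_request_head(raw: bytes):
    """Return (method, target, version, headers) or raise ValueError.

    `raw` is everything up to and including the blank line that ends
    the request header; request bodies are not supported.
    """
    lines = raw.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3:
        raise ValueError("400 Bad Request: malformed request line")
    method, target, version = parts
    if method not in ALLOWED_METHODS:
        # Assumed choice: answer 501 Not Implemented for anything else
        raise ValueError("501 Not Implemented: " + method)
    headers = []
    for line in lines[1:]:
        if not line:
            break  # an empty line terminates the header block
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("400 Bad Request: malformed header line")
        headers.append((name.strip(), value.strip()))
    return method, target, version, headers
```

A real proxy would turn the ValueError into a status line sent back
to the client rather than letting it propagate.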
[...]
>> 6. strip certain headers (such as Strict-Transport-Security: and
>> Upgrade:, but also Set-Cookie:) off the response and send the result
>> to the client;
> That probably covers it. If you change HTTP/1.1 to HTTP/1.0 on the
> requests, then 1% of servers will have issues and 50% fewer servers
> will send chunked responses. (Numbers made up, based on my experiences.)
The idea was to keep the mangling done by the code to the barest
minimum, so as to leave as many choices as possible to the user.
As such, supporting HTTP/1.1 and chunked encoding appears worthwhile.
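Keeping HTTP/1.1 means the proxy has to cope with chunked transfer
coding whenever it needs the actual body (say, for filtering), in
addition to dropping the headers listed in step 6. A rough Python
sketch of both chores follows; the names STRIPPED, strip_headers and
decode_chunked are hypothetical, chunk extensions and trailer fields
are simply discarded, and a plain relay could instead pass chunked
bodies through untouched.

```python
# Sketch only: response-side header stripping and chunked decoding.
STRIPPED = {"strict-transport-security", "upgrade", "set-cookie"}


def strip_headers(headers):
    """Drop unwanted response headers (field names are case-insensitive)."""
    return [(n, v) for (n, v) in headers if n.lower() not in STRIPPED]


def decode_chunked(stream):
    """Read a chunked body from a binary file-like object; return bytes.

    Chunk extensions are ignored and trailer fields are discarded.
    """
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("unexpected end of chunked body")
        # chunk-size is hexadecimal, optionally followed by ";extension"
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # last-chunk reached
        chunk = stream.read(size)
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        body += chunk
        stream.read(2)  # the CRLF following the chunk data
    # Skip any trailer fields up to the final empty line
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(body)
```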
> You can also drop Accept-Encoding: if you want to avoid dealing with
> compressed responses.
Per RFC 7231, Accept-Encoding: identity communicates the client's
preference for "no encoding." Omitting the header, OTOH, means
"no preference":
5.3.4. Accept-Encoding
[...]
A request without an Accept-Encoding header field implies that the
user agent has no preferences regarding content-codings. Although
this allows the server to use any content-coding in a response, it
does not imply that the user agent will be able to correctly process
all encodings.
That said, I do wish for the user to have the choice of having
/both/ compression and transformations available. And while I'm
not constrained much by bandwidth, some of the future users of
this code may be.
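One way to offer both choices, sketched here in Python, is to force
Accept-Encoding: identity only when the user asks for body rewriting,
and otherwise let the proxy undo gzip or deflate itself before any
filtering. The rewrite_bodies switch and both function names are
hypothetical; "deflate" is treated as the zlib-wrapped format HTTP
specifies, so servers sending raw deflate are not handled.

```python
import gzip
import zlib


def upstream_headers(client_headers, rewrite_bodies):
    """Build upstream request headers; `rewrite_bodies` is hypothetical."""
    if not rewrite_bodies:
        return list(client_headers)  # pass the client's preference through
    out = [(n, v) for (n, v) in client_headers
           if n.lower() != "accept-encoding"]
    # Explicitly ask for no content-coding rather than omitting the
    # header, which would mean "no preference" (RFC 7231, 5.3.4).
    out.append(("Accept-Encoding", "identity"))
    return out


def decode_body(body, content_encoding):
    """Undo a gzip or (zlib-wrapped) deflate coding before filtering."""
    coding = (content_encoding or "identity").strip().lower()
    if coding == "identity":
        return body
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if coding == "deflate":
        return zlib.decompress(body)  # raw-deflate senders not handled
    raise ValueError("unsupported content-coding: " + coding)
```

Decoding changes the body length, so this runs straight into the
Content-Length: problem discussed further on.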
[...]
>> There was an idea of transparently replacing https: references in
>> HTML and XML attributes with scheme-relative ones (like, e. g.,
>> https://example.com/ to //example.com/.) So far, that fails more
>> often than it works, for two primary reasons: compression (although
>> that can be solved by forcing Accept-Encoding: identity in requests)
>> -- and the fact that by the time such filtering can take place,
>> we've already sent the Content-Length: (if any) for the original
>> (unaltered) body to the client!
> You can fix that with whitespace padding.
> <img src="https://qaz.wtf/tmp/chree.png" ...>
> <img src="//qaz.wtf/tmp/chree.png"       ...>
Yes, I've tried it (alongside Accept-Encoding: identity), and it
worked, but I don't like it for its lack of generality.
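The padding trick can be sketched in Python as below (rewrite_padded
is a hypothetical name). It assumes an uncompressed, fully buffered
body and handles only quoted src/href values: each removed "https:"
(six bytes) becomes six spaces after the closing quote, so a
Content-Length: already sent stays valid.

```python
import re

# Quoted src=/href= values starting with https: (case-insensitive);
# group 3 is the value with the scheme removed ("//host/path...").
_HTTPS_ATTR = re.compile(rb'(\s(?:src|href)=)(["\'])https:(//[^"\']*)\2',
                         re.IGNORECASE)


def rewrite_padded(body: bytes) -> bytes:
    """Make https: attribute values scheme-relative without changing
    len(body): the six bytes of "https:" become six trailing spaces."""
    def repl(m):
        return m.group(1) + m.group(2) + m.group(3) + m.group(2) + b" " * 6
    return _HTTPS_ATTR.sub(repl, body)
```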
> Beware of parsing issues.
Other than those shown in the examples below?
> Real-world HTML usually looks like one of the first two but may
> sometimes look like one of the second two of these:
> <img src="https://qaz.wtf/tmp/chree.png" ...>
> <img src='https://qaz.wtf/tmp/chree.png' ...>
> <img src=https://qaz.wtf/tmp/chree.png ...>
> <img src = "https://qaz.wtf/tmp/chree.png" ...>
> (And that's ignoring case.)
Indeed; and case and the lack of quotes will require special-casing
for HTML (I aim to support XML applications as well, which
fortunately are somewhat simpler in this respect.)
OTOH, I don't think I've ever seen the " = " form; do the blanks
around the equals sign even conform to any HTML version?
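For what it's worth, the current HTML syntax (the WHATWG HTML Living
Standard) does allow whitespace on either side of the "=" in an
attribute. A single case-insensitive regular expression can cover all
four forms above; the Python sketch below (make_scheme_relative is a
hypothetical name) is only an approximation, since a regex knows
nothing about comments or <script> contents, and unlike the padding
approach it does not preserve the body length.

```python
import re

# All four attribute forms quoted above, in any letter case.
_ATTR = re.compile(
    r"""(?P<pre>\s(?:src|href)\s*=\s*)              # name, optional blanks, equals sign
        (?:(?P<q>["'])https:(?P<qv>//[^"']*)(?P=q)  # quoted value
          |https:(?P<uv>//[^\s"'=<>`]+))           # unquoted value
    """,
    re.IGNORECASE | re.VERBOSE,
)


def make_scheme_relative(html: str) -> str:
    """Rewrite https: src/href values to the scheme-relative // form."""
    def repl(m):
        if m.group("q"):
            q = m.group("q")
            return m.group("pre") + q + m.group("qv") + q
        return m.group("pre") + m.group("uv")
    return _ATTR.sub(repl, html)
```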
[...]
>> Thoughts?
> Are you going to fix Referer: headers to use the https: version when
> communicating with an https site? I think you probably should.
I guess I'll leave it up to the user. In my experience (copying
Web pages with Wget), resources requiring Referer: are the
exception rather than the rule, but still.
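Should such an option be added, it might look like the following
Python sketch, where the enabled flag stands in for a hypothetical
user setting: when the upstream connection is https, an http:
Referer: is rewritten to https:, and everything else passes through
unchanged.

```python
from urllib.parse import urlsplit, urlunsplit


def fix_referer(headers, upstream_scheme, enabled):
    """Switch an http: Referer to https: for https upstreams (opt-in)."""
    if not enabled or upstream_scheme != "https":
        return list(headers)
    out = []
    for name, value in headers:
        if name.lower() == "referer":
            parts = urlsplit(value)
            if parts.scheme == "http":
                # Only the scheme changes; an explicit :80 port is kept as-is
                value = urlunsplit(parts._replace(scheme="https"))
        out.append((name, value))
    return out
```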
> Elijah ------ only forces https on his site for the areas that
> require login
And that's a sensible approach.
--
FSF associate member #7257 http://am-1.org/~ivan/