Re: no-https: a plain-HTTP to HTTPS proxy

From Ivan Shmakov <ivan@siamics.net>
Newsgroups comp.infosystems.www.misc, comp.misc
Subject Re: no-https: a plain-HTTP to HTTPS proxy
Date 2018-09-25 18:39 +0000
Organization Aioe.org NNTP Server
Message-ID <87tvmdctgb.fsf@violet.siamics.net> (permalink)
References <87d0tdgbt4.fsf@violet.siamics.net>

Cross-posted to 2 groups.


>>>>> Ivan Shmakov <ivan@siamics.net> writes:

 > It took me about a day to write a crude but apparently (more or less)
 > working HTTP to HTTPS proxy.  (That I hope to beat into shape and
 > release via news:alt.sources around next Wednesday or so.  FTR, the
 > code is currently under 600 LoC long, or 431 LoC excluding comments
 > and empty lines.)  Some design notes are below.

	It took much longer (of course), and the code has by now expanded
	about threefold.  The HTTP/1 support is much improved, however;
	for instance, request bodies and chunked coding should now be
	fully supported.  Moreover, the relevant code (currently about
	a third of the total) was split off into a separate
	HTTP1::MessageStream push-mode parser module, allowing it to be
	used in other applications.
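A "push-mode" parser is one that the caller feeds with data as it arrives, rather than one that reads from a socket itself.  The following is a minimal Python sketch of that idea for the start line and header only.  The class PushHeaderParser and its feed() method are hypothetical and do not reflect the actual HTTP1::MessageStream (Perl) API.

```python
class PushHeaderParser:
    """Hypothetical push-mode parser for an HTTP/1 start line and header.

    The caller hands in data in arbitrary pieces as it arrives; the
    parser buffers it until the empty line ending the header is seen.
    Obsolete line folding is not handled in this sketch.
    """

    def __init__(self):
        self.buffer = b""
        self.start_line = None
        self.fields = None          # list of (name, value) once complete

    def feed(self, data):
        """Append data; return True once the whole header is available."""
        self.buffer += data
        if self.fields is not None:
            return True             # header already parsed; data is body
        end = self.buffer.find(b"\r\n\r\n")
        if end < 0:
            return False            # need more input
        head, self.buffer = self.buffer[:end], self.buffer[end + 4:]
        lines = head.decode("iso-8859-1").split("\r\n")
        self.start_line = lines[0]
        self.fields = []
        for line in lines[1:]:
            name, _, value = line.partition(":")
            self.fields.append((name, value.strip()))
        return True
```

Whatever follows the header (the start of the body) is left in `buffer` for the next stage to consume.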

	The no-https.perl code proper still needs some clean-up after
	all the modifications it got.

	The command-line interface is roughly as follows.  (Not all of
	the options have been thoroughly tested yet, though.)

Usage:
  $ no-https 
        [-d|--[no-]debug] [--listen=BIND|-l BIND] [--mangle=MANGLE] 
        [--connect=COMMAND] [--ssl-connect=COMMAND] 
  $ no-https {-h|--help} 

BIND is either [HOST:]PORT or, if it includes a /, a file name for a
Unix socket to create and listen on.  The default is 8080.

COMMAND will have %%, %h, %p replaced with a literal %, target host
and TCP port, respectively.  Also, %s and %t are replaced respectively
with a space and a TAB.

MANGLE can be minimal, header, or the name of an App::NoHTTPS::Mangle::
package to require and use.  If not specified, default is tried
first, falling back to the (internally implemented) header.
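The BIND and COMMAND rules in the help text above can be illustrated with a small Python sketch.  The function names parse_bind and expand_command are hypothetical, and returning None for an unspecified HOST is this sketch's own choice; the help text does not say which address is used in that case.

```python
import re

def parse_bind(bind="8080"):
    """BIND is [HOST:]PORT, or a Unix socket file name if it has a "/"."""
    if "/" in bind:
        return ("unix", bind)
    host, sep, port = bind.rpartition(":")
    # No colon: rpartition gives ("", "", bind), so HOST is left as None
    # (which address is then used is not specified here).
    return ("tcp", host if sep else None, int(port))

_EXPANSIONS = {"%": "%", "s": " ", "t": "\t"}

def expand_command(template, host, port):
    """Expand %%, %h, %p, %s and %t in a --connect=/--ssl-connect= COMMAND."""
    table = dict(_EXPANSIONS, h=host, p=str(port))
    # One left-to-right pass, so "%%h" yields a literal "%h" rather than
    # "%" followed by the host name.  Other %-sequences are left as-is.
    return re.sub(r"%([%hpst])", lambda m: table[m.group(1)], template)
```

Doing the substitution in a single regular-expression pass (rather than several successive string replacements) is what keeps %% from interacting with the other escapes.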

	The --connect= and --ssl-connect= options should make it possible
	to use a parent proxy, including a SOCKS one such as that
	provided by Tor, like:
	--connect="socat STDIO SOCKS4:localhost:%h:%p,socksport=9050".
	For --ssl-connect=, a tsocks(1)-wrapped gnutls-cli(1) may be an
	option.
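The socat example suggests that the connection is made through the child command's standard input and output.  The following is a minimal Python sketch of that arrangement.  It assumes the already expanded command is split shell-style and run without a shell; open_via_command is a hypothetical name, not part of no-https.

```python
import shlex
import subprocess

def open_via_command(command):
    """Start an (already expanded) --connect= COMMAND as the transport.

    Assumption: the command is split shell-style and the proxy then
    talks to the remote through the child's stdin and stdout, as with
    "socat STDIO ...".
    """
    return subprocess.Popen(shlex.split(command),
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
```

Substituting "cat" for socat is a convenient way to check the plumbing locally: whatever is written to the child's stdin comes straight back on its stdout.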

 > Basics

 > The basic algorithm is as follows:

 > 1. receive a request header from the client; we only allow GET and
 > HEAD requests for now, as we do not support request /bodies/ as of yet;

	RFC 7230 section 3.3 actually provides simple criteria for
	determining whether the request has a body:

    The presence of a message body in a request is signaled by a
    Content-Length or Transfer-Encoding header field.  Request message
    framing is independent of method semantics, even if the method does
    not define any use for a message body.

	As such, and given that message passing has been "symmetrized,"
	the code now allows any request method except CONNECT.
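The RFC 7230 rule quoted above reduces to a header-field lookup that ignores the method.  The following Python sketch expresses that rule together with the CONNECT restriction.  Both function names are hypothetical, and header fields are assumed to be (name, value) pairs.

```python
def request_has_body(fields):
    """RFC 7230 section 3.3: a request has a body iff Content-Length or
    Transfer-Encoding is present, whatever the method."""
    names = {name.lower() for name, _ in fields}  # field names: case-insensitive
    return "content-length" in names or "transfer-encoding" in names

def method_allowed(request_line):
    """Any request method except CONNECT is accepted."""
    method = request_line.split(" ", 1)[0]
    return method != "CONNECT"                   # methods are case-sensitive
```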

 > 2. decide the server and connect there;

 > 3. send the header to the server;

	Preceded by the request line, obviously.  (It was considered
	a part of the header in the original version of the code.)

 > 4. receive the response header;

	(Same here, for the status line.)

	We also pass any number of "100 Continue" messages here from
	server to client before the "payload" response.
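Relaying the interim responses amounts to a loop that forwards every "100 Continue" and stops at the first other status.  The following Python sketch assumes a hypothetical iterator of already parsed (status_line, fields) pairs.  Because the text only mentions 100 Continue, other 1xx codes are treated as final here.

```python
def relay_until_final(responses, send_to_client):
    """Forward "100 Continue" responses to the client; return the first
    other response (status_line, fields) from the server."""
    for status_line, fields in responses:
        code = int(status_line.split(" ", 2)[1])   # "HTTP/1.1 100 Continue"
        if code != 100:
            return status_line, fields
        send_to_client(status_line, fields)        # 100 carries no body
    raise EOFError("server closed before a final response")
```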

 > 5. if that's an https: redirect:

 > 5.1. connect over TLS, alter the request (Host:, "request target")
 > accordingly, go to step 3;

	A Host: header is prepended to the request header if the
	original has none.
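Step 5.1 and the Host: rule can be sketched in Python as a single rewrite of the request line and header.  Several details are assumptions of this sketch: the function name, the use of an origin-form target (path plus query) on the TLS connection, and updating an existing Host: field.

```python
from urllib.parse import urlsplit

def retarget_request(request_line, fields, location):
    """Aim a request at the https: URI from a redirect.

    Returns the new request line and header fields; Host: is updated
    if present and prepended if the original has none.
    """
    method, _target, version = request_line.split(" ", 2)
    url = urlsplit(location)
    target = (url.path or "/") + ("?" + url.query if url.query else "")
    host = url.netloc
    new_fields = [(n, host if n.lower() == "host" else v) for n, v in fields]
    if not any(n.lower() == "host" for n, _ in fields):
        new_fields.insert(0, ("Host", host))
    return f"{method} {target} {version}", new_fields
```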

 > 6. strip certain headers (such as Strict-Transport-Security: and
 > Upgrade:, but also Set-Cookie:) off the response and send the result
 > to the client;

	Both the decision whether to "eat up" the redirect and how to
	alter the header and body of the messages (requests and responses
	alike) are left to the "mangler" object, which is expected to
	implement the following methods.

	$ma->message_mangler (PARSER, URI)
	    Return a new mangler object for the given HTTP1::MessageStream
	    parser state (either request or response) and request URI.

	    Alternatively, return a URI of the resource to transparently
	    request instead of the given one.

	    Return undef if this mangler has nothing to do with the
	    given parser state and URI.

	$ma->parser ([PARSER]), $ma->uri ([URI]),
	$ma->start_line ([START-LINE]), $ma->header ([HEADER])
	    Get or set the HTTP1::MessageStream object, URI, HTTP/1
	    start line and HTTP/1 header, respectively, associated with
	    the particular request.

	$ma->chunked_p ()
	    Return a true value if the body is to be transmitted
	    to the remote using chunked coding.  (The associated header
	    is set up accordingly.)

	$ma->get_mangled_body_part ()
	    Return the next part of the (possibly modified) HTTP/1
	    message body.  This will typically involve a call to the
	    parser object to interpret the portion of the message
	    currently in its own buffer.
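For readers more at home in Python than in Perl, the mangler interface above might be rendered as an abstract base class.  This is a sketch only.  The method names follow the text, but plain attributes stand in for the get/set accessors and None stands in for undef.  NullMangler and dispatch are hypothetical helpers that show the three kinds of result message_mangler can return.

```python
from abc import ABC, abstractmethod

class Mangler(ABC):
    """Python rendition of the (Perl) mangler interface described above."""

    def __init__(self, parser=None, uri=None):
        self.parser = parser        # stands in for $ma->parser
        self.uri = uri              # stands in for $ma->uri
        self.start_line = None      # stands in for $ma->start_line
        self.header = None          # stands in for $ma->header

    @abstractmethod
    def message_mangler(self, parser, uri):
        """Return a new Mangler, a URI (str) to fetch instead, or None."""

    @abstractmethod
    def chunked_p(self):
        """True if the body is to be sent using chunked coding."""

    @abstractmethod
    def get_mangled_body_part(self):
        """Return the next (possibly modified) part of the message body."""


class NullMangler(Mangler):
    """Hypothetical example: has nothing to do with any message."""

    def message_mangler(self, parser, uri):
        return None

    def chunked_p(self):
        return False

    def get_mangled_body_part(self):
        return b""


def dispatch(prototype, parser, uri):
    """How a caller might act on message_mangler's three kinds of result."""
    result = prototype.message_mangler(parser, uri)
    if result is None:
        return ("pass-through", None)
    if isinstance(result, str):
        return ("fetch-instead", result)
    return ("mangle", result)
```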

	Two such classes are currently implemented, "minimal" and
	"header", and I believe that the above interface can be used to
	implement rather arbitrary HTTP message filters.

	The "minimal" class removes Upgrade and Proxy-Connection headers
	from the messages (requests and responses alike) and causes the
	calling code to transparently replace all the https: redirects
	with requested resources.

	The "header" class also strips Strict-Transport-Security and
	Set-Cookie from the responses.  (The former should have no effect
	anyway, as user agents ignore it when received over plain HTTP.)
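The header-field part of the two classes, as described above, comes down to a case-insensitive filter.  The following is a Python sketch of that rule alone.  The function and constant names are hypothetical, and the real classes also decide how redirects are handled.

```python
MINIMAL_DROP = {"upgrade", "proxy-connection"}              # both directions
HEADER_DROP = {"strict-transport-security", "set-cookie"}   # responses only

def filter_fields(fields, is_response, mangle="header"):
    """Drop header fields the way "minimal"/"header" are described to."""
    drop = set(MINIMAL_DROP)
    if mangle == "header" and is_response:
        drop |= HEADER_DROP
    return [(n, v) for n, v in fields if n.lower() not in drop]
```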

	There's a minor issue with the handling of https: redirects.
	When http://example.com/ redirects to https://example.com/foo/bar,
	for instance, the links in the latter document will be resolved
	relative to the former URI (unless the 'base' URI is explicitly
	given in the document); thus <a href="baz" /> will point to
	/baz instead of the intended /foo/baz.  A likely solution is to
	eat up only http:SAME to https:SAME redirects, and to rewrite an
	http:SOME to https:OTHER redirect to point to http:OTHER instead
	(which will then likely result in a redirect to https:OTHER, in
	turn eaten up by the mangler).
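The proposed fix can be sketched in Python with urllib.parse.  The function name and the three result labels are hypothetical.  The sketch resolves a relative Location against the request URI first.  It treats "SAME" as "identical except for the scheme", which is one possible reading.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

def handle_https_redirect(request_uri, location):
    """Eat up only http:SAME -> https:SAME; send the client to http:OTHER
    for any other https: redirect."""
    target = urljoin(request_uri, location)        # resolve relative Location
    req, loc = urlsplit(request_uri), urlsplit(target)
    if loc.scheme != "https":
        return ("pass", target)                    # not an https: redirect
    if req._replace(scheme="https") == loc:
        return ("eat", target)                     # fetch https:SAME ourselves
    # http:OTHER will likely redirect to https:OTHER, eaten up in turn.
    return ("rewrite", urlunsplit(loc._replace(scheme="http")))
```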

 > 7. copy up to Content-Length: octets from the server to the client --
 > or all the remaining data if no Content-Length: is given; (somewhat
 > surprisingly, this seems to also work with the "chunked" coding not
 > otherwise considered in the code);

	Both chunked coding and client-to-server body passing should
	now be supported (although POST requests remain untested).
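Chunked coding (RFC 7230 section 4.1) fits the push-mode style naturally: the decoder is fed whatever bytes have arrived and returns the body data they complete.  The following is a simplified Python sketch, not the HTTP1::MessageStream code.  It ignores chunk extensions, skips (without validating) the CRLF after each chunk, and discards trailer fields.

```python
class ChunkedDecoder:
    """Push-mode decoder for HTTP/1.1 chunked coding (simplified)."""

    def __init__(self):
        self.buf = b""
        self.remaining = None   # bytes left in current chunk; None = size line
        self.in_trailer = False
        self.done = False

    def feed(self, data):
        """Add data; return the body bytes it completes (may be b"")."""
        self.buf += data
        out = []
        while not self.done:
            if self.remaining is None:             # size line or trailer line
                eol = self.buf.find(b"\r\n")
                if eol < 0:
                    break
                line, self.buf = self.buf[:eol], self.buf[eol + 2:]
                if self.in_trailer:
                    self.done = (line == b"")      # empty line ends trailer
                    continue
                size = int(line.split(b";", 1)[0], 16)  # drop extensions
                if size == 0:
                    self.in_trailer = True         # last-chunk seen
                else:
                    self.remaining = size
            elif self.remaining > 0:               # chunk data
                part = self.buf[:self.remaining]
                self.buf = self.buf[len(part):]
                self.remaining -= len(part)
                out.append(part)
                if not self.buf:
                    break
            else:                                  # CRLF after chunk data
                if len(self.buf) < 2:
                    break
                self.buf = self.buf[2:]
                self.remaining = None
        return b"".join(out)


def encode_chunk(data):
    """Encode one chunk; an empty part yields the final "0\\r\\n\\r\\n"."""
    return b"%X\r\n%s\r\n" % (len(data), data)
```

Any bytes after the end of the chunked body stay in `buf`, which matters for persistent connections where the next message follows immediately.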

 > 8. close the connection to the server and repeat from step 1 so long
 > as the client connection remains active.

[...]

-- 
FSF associate member #7257  http://am-1.org/~ivan/
