RE: Module urljoin does not appear to work with scheme Gemini

From Anders Munch <ajm@flonidan.dk>
Newsgroups comp.lang.python
Subject RE: Module urljoin does not appear to work with scheme Gemini
Date 2025-04-24 08:36 +0000
Message-ID <mailman.30.1745483776.3008.python-list@python.org> (permalink)
References <20250421083845.5e5b6d69@workstation.localdomain> <f5bikmwp8of.fsf@lochinver.inf.ed.ac.uk> <20250422182253.00b9327d@workstation.localdomain> <f5bwmbbnexs.fsf@lochinver.inf.ed.ac.uk> <VI1PR05MB106805790212AE1658621959AB4852@VI1PR05MB10680.eurprd05.prod.outlook.com>



Henry S. Thompson wrote:
> Some approach to support future-proofing in general would seem to be in order. 
> Given some other precedents, adding a boolean argument called either 'strict' or 'lax' would be my preference.

An alternative would be to refactor urllib.parse to use strategy objects
for schemes.

parse.py contains a number of lists of scheme names that act as flags to
control parsing behaviour:
        uses_relative, uses_netloc, uses_params, non_hierarchical, uses_query and uses_fragment.
(If written today they would be sets, but this is very old code that predates sets!)
Group that information by scheme instead of by flag name, in e.g. a dataclass, and
you have made yourself a strategy object lookup table:

scheme_options = {
   'https': SchemeOptions(uses_relative=True,  uses_netloc=True, uses_params=True),
   'git':   SchemeOptions(uses_relative=False, uses_netloc=True, uses_params=False),
   ...
}

Once you have that, you can add the strategy object as an optional argument to
functions.  If the argument is not given, you find a strategy object from
scheme_options to use. If the argument is given, you use that.
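
To make the idea concrete, here is a minimal sketch of what such a table could look like if it were built from the flag lists that exist in urllib.parse today. SchemeOptions, scheme_options and resolve_options are hypothetical names for illustration only; none of them are part of urllib.parse, and the sketch covers only the three flags shown above.

```python
from dataclasses import dataclass
from urllib import parse


@dataclass(frozen=True)
class SchemeOptions:
    """Hypothetical per-scheme flags; not part of urllib.parse."""
    uses_relative: bool = False
    uses_netloc: bool = False
    uses_params: bool = False


# Regroup the existing flag lists by scheme instead of by flag name.
_known = set(parse.uses_relative) | set(parse.uses_netloc) | set(parse.uses_params)
scheme_options = {
    scheme: SchemeOptions(
        uses_relative=scheme in parse.uses_relative,
        uses_netloc=scheme in parse.uses_netloc,
        uses_params=scheme in parse.uses_params,
    )
    for scheme in _known
    if scheme  # the lists also contain '' for scheme-less URLs
}


def resolve_options(scheme, options=None):
    """Return the caller's options if given, else the table entry."""
    if options is not None:
        return options
    # Schemes that appear in none of the lists get all-False flags.
    return scheme_options.get(scheme, SchemeOptions())
```

A function taking the optional argument would call resolve_options(scheme, options) once at the top and consult the returned object instead of testing membership in the module-level lists.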

The best part of this approach is that you now have a way of saying "treat this
scheme exactly like https":

   from urllib import parse
   parse.urljoin('sptth://...', '../one-level-up', options=parse.scheme_options['https'])
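
Neither the options argument nor parse.scheme_options exists in urllib.parse today, so the call above is a proposal and will not run as written. As a rough illustration of the intended effect, the sketch below borrows the https behaviour by swapping the scheme before the join and restoring it afterwards. urljoin_like is a hypothetical helper and example.org is a placeholder host; this is a workaround under those assumptions, not the refactoring described here.

```python
from urllib import parse


def urljoin_like(base, url, like="https"):
    """Join url onto base as if base's scheme were `like` (hypothetical helper)."""
    if parse.urlsplit(url).scheme:
        # The reference carries its own scheme, so it is already absolute.
        return url
    parts = parse.urlsplit(base)
    # Temporarily present the base under a scheme that urljoin resolves relatively.
    stand_in = parts._replace(scheme=like).geturl()
    joined = parse.urlsplit(parse.urljoin(stand_in, url))
    # Put the original scheme back on the joined result.
    return joined._replace(scheme=parts.scheme).geturl()


base = "sptth://example.org/a/b/c"

# urljoin hands the relative reference back unchanged for an unlisted scheme.
plain = parse.urljoin(base, "../one-level-up")

# With the borrowed https behaviour the reference is resolved against the base.
borrowed = urljoin_like(base, "../one-level-up")
```

Here plain is '../one-level-up', while borrowed is 'sptth://example.org/a/one-level-up'.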

Note: I wrote this before I realised that the lists non_hierarchical, uses_query
and uses_fragment are not used.  With only three options instead of six, making
a strategy object is not quite as attractive, but it is still worth considering.

regards, Anders 
