
[OT] absolute vs. relative URI

Started by	Grant Edwards <invalid@invalid.invalid>
First post	2015-01-23 14:47 +0000
Last post	2015-01-23 18:21 +0000
Articles	11 — 7 participants


  [OT] absolute vs. relative URI Grant Edwards <invalid@invalid.invalid> - 2015-01-23 14:47 +0000
    Re: [OT] absolute vs. relative URI Marko Rauhamaa <marko@pacujo.net> - 2015-01-23 17:18 +0200
      Re: [OT] absolute vs. relative URI Grant Edwards <invalid@invalid.invalid> - 2015-01-23 15:40 +0000
        Re: [OT] absolute vs. relative URI Chris Warrick <kwpolska@gmail.com> - 2015-01-23 17:00 +0100
          Re: [OT] absolute vs. relative URI Grant Edwards <invalid@invalid.invalid> - 2015-01-23 17:10 +0000
        Re: [OT] absolute vs. relative URI Chris Angelico <rosuav@gmail.com> - 2015-01-24 04:48 +1100
          Re: [OT] absolute vs. relative URI Rick Johnson <rantingrickjohnson@gmail.com> - 2015-01-23 11:02 -0800
            Re: [OT] absolute vs. relative URI Chris Angelico <rosuav@gmail.com> - 2015-01-24 06:46 +1100
            Re: [OT] absolute vs. relative URI Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-01-23 20:22 +0000
    Re: [OT] absolute vs. relative URI Chris Angelico <rosuav@gmail.com> - 2015-01-24 04:55 +1100
    Re: [OT] absolute vs. relative URI Tony the Tiger <tony@tiger.invalid> - 2015-01-23 18:21 +0000

#84344 — [OT] absolute vs. relative URI

From	Grant Edwards <invalid@invalid.invalid>
Date	2015-01-23 14:47 +0000
Subject	[OT] absolute vs. relative URI
Message-ID	<m9tmt8$hbn$1@reader1.panix.com>

I'm maintaining a web app where the original author(s) went to a little
bit of trouble to always use absolute URIs in links in the pages.

First, the code checks the port number the server is listening on and
extrapolates the protocol being used (http or https).  Then it grabs
the server value from the request and sets a more-or-less global
variable 'wwwroot' something like this:

   if port in (443,8433):
     proto = 'https'
   else:
     proto = 'http'
     
   wwwroot = "%s://%s/"  % (proto,server)

If the user originally entered a URL with the literal IP address
10.0.0.99, then wwwroot ends up with a value of "http://10.0.0.99/"

Then, throughout the rest of the code that variable is used so that
all links are absolute (including protocol, host, and absolute path):

  "<a src='%sWhatever>Whatever</a>" % wwwroot

Why do they go to the extra work of constructing the value for wwwroot
and then inserting it later?
  
I'm not an HTML/HTTP guru, but I've tinkered with web pages for 20+
years, and for links within sites, I've always used links either
relative to the current location or an absolute _path_ relative to the
current server:

  <a src='/Whatever'>Whatever</a>

I've never had any problems with links like that.  Is there some case
where that doesn't work right and I've just been stupidly lucky?
  
-- 
Grant Edwards               grant.b.edwards        Yow! It don't mean a
                                  at               THING if you ain't got
                              gmail.com            that SWING!!


#84345

From	Marko Rauhamaa <marko@pacujo.net>
Date	2015-01-23 17:18 +0200
Message-ID	<87twzh34r6.fsf@elektro.pacujo.net>
In reply to	#84344

Grant Edwards <invalid@invalid.invalid>:

> I'm not an HTML/HTTP guru, but I've tinkered with web pages for 20+
> years, and for links within sites, I've always used links either
> relative to the current location or an absolute _path_ relative to the
> current server:
>
>   <a src='/Whatever'>Whatever</a>
>
> I've never had any problems with links like that.  Is there some case
> where that doesn't work right and I've just been stupidly lucky?

An ancient HTML spec (<URL: https://tools.ietf.org/html/rfc1866>)
specifies:

    HREF
        gives the URI of the head anchor of a hyperlink.

It refers to the URI spec (<URL: https://tools.ietf.org/html/rfc1630>):

    Partial (relative) form
       Within a object whose URI is well defined, the URI of another
       object may be given in abbreviated form, where parts of the two
       URIs are the same. This allows objects within a group to refer to
       each other without requiring the space for a complete reference,
       and it incidentally allows the group of objects to be moved
       without changing any references.

       [...]

       The rules for the use of a partial name relative to the URI of
       the context are: [...]

Bottom line: you are safe.
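
A minimal sketch of the resolution rules quoted above, using Python's
standard urllib.parse.urljoin. The page URL is hypothetical (it reuses
the 10.0.0.99 address from the original post); this only illustrates
how a browser-style resolver treats the three kinds of reference.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page that contains the links.
base = "http://10.0.0.99/reports/index.html"

# Absolute path: scheme and host come from the base, the path is replaced.
absolute_path = urljoin(base, "/Whatever")

# Document-relative: resolved against the directory of the current page.
relative = urljoin(base, "Whatever")

# Parent-relative: ".." climbs one directory before resolving.
parent = urljoin(base, "../Whatever")

print(absolute_path)  # http://10.0.0.99/Whatever
print(relative)       # http://10.0.0.99/reports/Whatever
print(parent)         # http://10.0.0.99/Whatever
```

In every case the scheme and host are inherited from the page itself,
which is why no 'wwwroot' prefix is needed.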


Marko


#84347

From	Grant Edwards <invalid@invalid.invalid>
Date	2015-01-23 15:40 +0000
Message-ID	<m9tq24$1pu$1@reader1.panix.com>
In reply to	#84345

On 2015-01-23, Marko Rauhamaa <marko@pacujo.net> wrote:
> Grant Edwards <invalid@invalid.invalid>:
>
>> I'm not an HTML/HTTP guru, but I've tinkered with web pages for 20+
>> years, and for links within sites, I've always used links either
>> relative to the current location or an absolute _path_ relative to the
>> current server:
>>
>>   <a src='/Whatever'>Whatever</a>
>>
>> I've never had any problems with links like that.  Is there some case
>> where that doesn't work right and I've just been stupidly lucky?
>
> An ancient HTML spec (<URL: https://tools.ietf.org/html/rfc1866>)
> specifies:
[...]
> It refers to the URI spec (<URL: https://tools.ietf.org/html/rfc1630>):
[...]
>
> Bottom line: you are safe.

Thanks, I was pretty sure that was the case. But, I'm still baffled
why the original author(s) went to the extra work to always generate
absolute URIs.  The pages were originally developed by a web
development company we contracted to do the initial design for us. We
were _assuming_ they knew more about that sort of thing than we
old-school EE types.

[I must admit that I have learned a lot from their code about how to
use CSS to avoid putting layout/presentation info directly in the HTML
tags the way we did in days of yore.]

-- 
Grant Edwards               grant.b.edwards        Yow! I have a TINY BOWL in
                                  at               my HEAD
                              gmail.com


#84349

From	Chris Warrick <kwpolska@gmail.com>
Date	2015-01-23 17:00 +0100
Message-ID	<mailman.18035.1422028823.18130.python-list@python.org>
In reply to	#84347

On Fri, Jan 23, 2015 at 4:40 PM, Grant Edwards <invalid@invalid.invalid> wrote:
> On 2015-01-23, Marko Rauhamaa <marko@pacujo.net> wrote:
>> Grant Edwards <invalid@invalid.invalid>:
>>
>>> I'm not an HTML/HTTP guru, but I've tinkered with web pages for 20+
>>> years, and for links within sites, I've always used links either
>>> relative to the current location or an absolute _path_ relative to the
>>> current server:
>>>
>>>   <a src='/Whatever'>Whatever</a>
>>>
>>> I've never had any problems with links like that.  Is there some case
>>> where that doesn't work right and I've just been stupidly lucky?
>>
>> An ancient HTML spec (<URL: https://tools.ietf.org/html/rfc1866>)
>> specifies:
> [...]
>> It refers to the URI spec (<URL: https://tools.ietf.org/html/rfc1630>):
> [...]
>>
>> Bottom line: you are safe.

Technically, there is one way to break things:
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base

However, nobody really uses that.  Determining the protocol and server
URL is a lot of effort, but it does not give any advantages over plain
"/Whatever".  Moreover, it would be safer and more future-proof to use
a protocol-relative "//example.com/Whatever" URL instead of
determining the protocol by ports (why 8433?  I can serve (insecure)
HTTP there; hell: I can be a complete jerk and swap ports 80 and
443!).

But, this webapp completely ignores a pitfall in the process: it
assumes the app lives in the web server root.  You can easily change
this via your favorite HTTP daemon.
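
To make the two points above concrete, here is a rough sketch with
urllib.parse.urljoin. The example.com host comes from the paragraph
above; the "/app/" mount point is an assumption made up for
illustration, not something taken from the actual web app.

```python
from urllib.parse import urljoin

# Hypothetical pages: the same app reached over HTTPS and over HTTP,
# mounted under an assumed "/app/" prefix instead of the server root.
page_https = "https://example.com/app/index.html"
page_http = "http://example.com/app/index.html"

# A protocol-relative reference inherits whatever scheme the page used,
# so no port-based guessing is needed.
secure = urljoin(page_https, "//example.com/Whatever")
plain = urljoin(page_http, "//example.com/Whatever")

# An absolute path ignores the mount point; a document-relative
# reference stays inside it.
escapes_prefix = urljoin(page_https, "/Whatever")
stays_in_app = urljoin(page_https, "Whatever")

print(secure)          # https://example.com/Whatever
print(plain)           # http://example.com/Whatever
print(escapes_prefix)  # https://example.com/Whatever
print(stays_in_app)    # https://example.com/app/Whatever
```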

> Thanks, I was pretty sure that was the case. But, I'm still baffled
> why the original author(s) went to the extra work to always generate
> absolute URIs.  The pages were originally developed by a web
> development company we contracted to do the initial design for us. We
> were _assuming_ they knew more about that sort of thing than we
> old-school EE types.

Hah!  Those people certainly don’t look “experienced”.

   "<a src='%sWhatever>Whatever</a>" % wwwroot

0. This should be href=, but this is probably an error with retyping.
(use copy-paste next time.)
1. "double" quotes should be used,
2. and on both sides of the URL.
3. This should be handled in (Jinja2) templates,
4. which should not involve string formatting;
5. especially old-style %-based string formatting!
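
For points 0 to 2, a small standard-library sketch of a link with
href=, double quotes on both sides, and escaped values. It is only a
stand-in for what a template engine's autoescaping would do, not
Jinja2 itself, and the helper name 'link' is made up for this example.

```python
from html import escape


def link(href, text):
    """Return an <a> element with a double-quoted, escaped href."""
    # escape() with quote=True also escapes " and ', which keeps the
    # attribute value from terminating early.
    return '<a href="{}">{}</a>'.format(escape(href, quote=True), escape(text))


print(link("/Whatever", "Whatever"))  # <a href="/Whatever">Whatever</a>
```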

-- 
Chris Warrick <https://chriswarrick.com/>
PGP: 5EAAEA16


#84358

From	Grant Edwards <invalid@invalid.invalid>
Date	2015-01-23 17:10 +0000
Message-ID	<m9tvao$gn9$1@reader1.panix.com>
In reply to	#84349

On 2015-01-23, Chris Warrick <kwpolska@gmail.com> wrote:
>
> Hah!  Those people certainly don’t look “experienced”.
>
>    "<a src='%sWhatever>Whatever</a>" % wwwroot
>
> 0. This should be href=, but this is probably an error with retyping.
> (use copy-paste next time.)
> 1. "double" quotes should be used,
> 2. and on both sides of the URL.

My bad. Those three were typos in my example.  Real examples that I
could cut/paste all had way too much other cruft involved to serve as
useful illustrations of my question.

> 3. This should be handled in (Jinja2) templates,
> 4. which should not involve string formatting;
> 5. especially old-style %-based string formatting!

The code I posted wasn't from the actual project -- it was just an
illustration to show what they were doing rather than exactly how they
were doing it.

-- 
Grant Edwards               grant.b.edwards        Yow! All this time I've
                                  at               been VIEWING a RUSSIAN
                              gmail.com            MIDGET SODOMIZE a HOUSECAT!


#84367

From	Chris Angelico <rosuav@gmail.com>
Date	2015-01-24 04:48 +1100
Message-ID	<mailman.18045.1422035333.18130.python-list@python.org>
In reply to	#84347

On Sat, Jan 24, 2015 at 3:00 AM, Chris Warrick <kwpolska@gmail.com> wrote:
> 5. especially old-style %-based string formatting!

Please. There's nothing wrong with %-style formatting. It's not
deprecated, and never will be; and it has the advantage of being
cross-language compatible. I was speaking with a Python student
yesterday who didn't understand the {} notation, but grokked "%d + %d
= %d" % (x, y, x+y) instantly, thanks to experience with other
languages.

Use of % formatting is not a bug.
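
For comparison, the same expression in both notations; the values of
x and y are arbitrary and chosen only for the illustration.

```python
x, y = 2, 3

# printf-style formatting, familiar from C and many other languages.
percent_style = "%d + %d = %d" % (x, y, x + y)

# The equivalent str.format() call with {} placeholders.
format_style = "{} + {} = {}".format(x, y, x + y)

print(percent_style)  # 2 + 3 = 5
print(format_style)   # 2 + 3 = 5
```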

ChrisA


#84386

From	Rick Johnson <rantingrickjohnson@gmail.com>
Date	2015-01-23 11:02 -0800
Message-ID	<53547444-2dd1-4b50-a5ad-0fd162a51ab1@googlegroups.com>
In reply to	#84367

On Friday, January 23, 2015 at 11:49:05 AM UTC-6, Chris Angelico wrote:
> On Sat, Jan 24, 2015 at 3:00 AM, Chris Warrick  wrote:
> > 5. especially old-style %-based string formatting!
> 
> Please. There's nothing wrong with %-style formatting. 

*BALD-FACED-PARTISAN-LIE*! 

If there is *NOTHING* wrong with %formatting then why did we
violate our philosophy of "there should be one way to do
it..." and introduce the string.format() method?

The new string.format() method is not merely *ANOTHER* way
to do the same thing Chris, it's first and foremost a bug-
fix for the limited capabilities of the legacy %formatting.

But string.format() is so much more than a mere bug-fix,
Chris, since it not only offers a richer set of tools, you
can even create your own custom extension of the formatter:

    https://docs.python.org/2/library/string.html#string-formatting

...try to do all that with %formatting! And let's not forget
that a *FORMAT-METHOD* aligns itself nicely with the wise
philosophy of OOP! (oh boy, i'm going to catch flak for that one from
the OOP hating religious nutters!)
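
A minimal sketch of the kind of custom extension the linked page
describes, by subclassing string.Formatter. The class name and the
"upper" format spec are invented for this example; they are not part
of the standard library.

```python
import string


class UpperFormatter(string.Formatter):
    """Formatter that understands a made-up ':upper' format spec."""

    def format_field(self, value, format_spec):
        # Handle the hypothetical spec ourselves, defer everything else
        # to the normal format() machinery.
        if format_spec == "upper":
            return str(value).upper()
        return super().format_field(value, format_spec)


result = UpperFormatter().format("{0:upper} and {1}", "spam", "eggs")
print(result)  # SPAM and eggs
```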

> It's not deprecated, and never will be;

Chris, what do you call a statement that is based on an
un-provable premise? Oh and, GvR told me to tell you that he
wants his time machine back, and if it has even one dent
you're going to be in some serious trouble!

> and it has the advantage of being cross-language
> compatible. I was speaking with a Python student yesterday
> who didn't understand the {} notation, but grokked "%d +
> %d = %d" % (x, y, x+y) instantly, thanks to experience
> with other languages.

So now you finally admit to us that you base your decisions
on emotion and ignore facts. This was something that i had
suspected for some time and i'm glad that you can finally
admit the truth.

So even though string.format() is highly superior to the
legacy %format crap, you will happily ignore the
advancements and cling to your instinctual emotions like the
religious nutters clinging to a bible.

> Use of % formatting is not a bug.

"not a bug"? Another ridiculous judgment! I believe the
description you're looking for is: "is a foolish
consistency".


#84390

From	Chris Angelico <rosuav@gmail.com>
Date	2015-01-24 06:46 +1100
Message-ID	<mailman.18056.1422042405.18130.python-list@python.org>
In reply to	#84386

On Sat, Jan 24, 2015 at 6:02 AM, Rick Johnson
<rantingrickjohnson@gmail.com> wrote:
>> It's not deprecated, and never will be;
>
> Chris, what do you call a statement that is based on an
> un-provable premise? Oh and, GvR told me to tell you that he
> wants his time machine back, and if it has even one dent
> you're going to be in some serious trouble!

Statements from the BDFL committing to maintaining it *at least* until
Python 4000, with clear indication that it is most definitely not
deprecated. Or, looking at it another way: If you want to disagree
with me, find a statement in current, official documentation that says
that it's deprecated.

ChrisA


#84397

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2015-01-23 20:22 +0000
Message-ID	<mailman.18060.1422044574.18130.python-list@python.org>
In reply to	#84386

On 23/01/2015 19:46, Chris Angelico wrote:
> On Sat, Jan 24, 2015 at 6:02 AM, Rick Johnson
> <rantingrickjohnson@gmail.com> wrote:
>>> It's not deprecated, and never will be;
>>
>> Chris, what do you call a statement that is based on an
>> un-provable premise? Oh and, GvR told me to tell you that he
>> wants his time machine back, and if it has even one dent
>> you're going to be in some serious trouble!
>
> Statements from the BDFL committing to maintaining it *at least* until
> Python 4000, with clear indication that it is most definitely not
> deprecated. Or, looking at it another way: If you want to disagree
> with me, find a statement in current, official documentation that says
> that it's deprecated.
>
> ChrisA
>

This http://bugs.python.org/issue14123 seems relevant.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


#84369

From	Chris Angelico <rosuav@gmail.com>
Date	2015-01-24 04:55 +1100
Message-ID	<mailman.18047.1422035741.18130.python-list@python.org>
In reply to	#84344

On Sat, Jan 24, 2015 at 1:47 AM, Grant Edwards <invalid@invalid.invalid> wrote:
> I'm maintaining a web app where the original author(s) went to a little
> bit of trouble to always use absolute URIs in links in the pages.

The advantage is that someone who downloads the bare page will still
be referencing images, CSS, other links, etc, from the original
server. The disadvantage is... exactly the same. These days, it's MUCH
better to use relative links, and then let something like wget rewrite
them as necessary. In fact, if all your links are relative (not even
the leading slash - use ../../../ as many times as is necessary), they
don't even need rewriting, and you can test your web site by just
pointing your browser at the file system.
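
One way to compute such fully relative links is posixpath.relpath from
the standard library. This is just a sketch: the helper name and the
site paths below are hypothetical.

```python
import posixpath


def relative_link(from_page, to_target):
    """Return to_target expressed relative to the directory of from_page.

    Both arguments are site-absolute paths such as "/docs/intro.html".
    """
    start = posixpath.dirname(from_page)
    return posixpath.relpath(to_target, start)


print(relative_link("/docs/guide/intro.html", "/static/site.css"))
# ../../static/site.css
print(relative_link("/index.html", "/static/site.css"))
# static/site.css
```

Links built this way resolve the same whether the pages are served
over HTTP or opened straight from a directory on disk.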

The original code seems rather fragile. The port number isn't
incorporated, so incoming requests on 8443 will end up going through
to https:// with the default port of 443. Any other port will be sent
back through 80. Strongly recommend not doing this.
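
If an absolute root really were needed, a less fragile version would
have to carry the port along. The sketch below assumes 'server' is a
bare host name without a port, which the original post does not
actually confirm, and the function name and signature are invented
for this example.

```python
DEFAULT_PORTS = {"http": 80, "https": 443}


def wwwroot(proto, server, port):
    """Build an absolute root URL, keeping any non-default port."""
    if DEFAULT_PORTS.get(proto) == port:
        # The default port for the scheme can be left implicit.
        return "%s://%s/" % (proto, server)
    return "%s://%s:%d/" % (proto, server, port)


print(wwwroot("https", "10.0.0.99", 8443))  # https://10.0.0.99:8443/
print(wwwroot("http", "10.0.0.99", 80))     # http://10.0.0.99/
```

Relative links avoid the problem entirely, since the browser reuses
the scheme, host, and port of the page it already has.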

ChrisA


#84381

From	Tony the Tiger <tony@tiger.invalid>
Date	2015-01-23 18:21 +0000
Message-ID	<piwww.319381$Um.5741@fx22.am4>
In reply to	#84344

On Fri, 23 Jan 2015 14:47:04 +0000, Grant Edwards wrote:

> Why do they go to the extra work of constructing the value for wwwroot
> and then inserting it later?

1) Maybe their tools didn't allow them to use the ./ ../ syntax and to be 
able to debug the site, so they chose the next best thing?

2) The coder didn't know any better?

3) Bad code day?

4) Murphy's law?

5) The code was written using recycled electrons?

(Prrrrrrrrrrrreeeeeeze, don't tell me you're a golfer, and I loathe jazz.)

 /Grrr
-- 
          ___                  ___
 (\_--_/)  | _ ._    _|_|_  _   |o _  _ ._
 ( 9  9 )  |(_)| |\/  |_| |(/_  ||(_|(/_|
 stripes are forever - as overripe ferrets
