URIs within URIs: google.com/url?q= et al.

From Ivan Shmakov <ivan@siamics.netREMOVE.invalid>
Newsgroups comp.misc, comp.infosystems.www.misc
Subject URIs within URIs: google.com/url?q= et al.
Followup-To comp.misc
Date 2024-12-20 18:42 +0000
Organization Dbus-free station.
Message-ID <7CehOkmKaRKK7ejb@violet.siamics.net>
References (3 earlier) <87ed2yjkl8.fsf@tilde.institute> <55db8483-58f0-c3dc-de0b-7f44881fa180@example.net> <87jzcp4pzy.fsf@enoch.nodomain.nowhere> <4875e490-ad30-d644-345f-4a09c1935c6b@example.net> <87frnb52zf.fsf@enoch.nodomain.nowhere>


>>>>> On 2024-11-28, Mike Spencer wrote:

	[Cross-posting to news:comp.infosystems.www.misc just in case,
	but setting Followup-To: comp.misc still.  Feel free to disregard,
	though; if anything, I'll be monitoring both groups for some
	time for responses.]

 > Here's a curiosity:

 > Google also sends all of your clicks on search results back through
 > Google.  I assume y'all knew that.

 > If you search for (say):

 >   leon "the professional"

 > you get:

 > https://www.google.com/url
 > ?q=https://en.wikipedia.org/wiki/L%25C3%25A9on:_The_Professional
 > &sa=U&ved=2ahUKEwi [snip tracking hentracks/data]

 > Note that the "real" URL which Google proposes to proxy for you
 > contains non-ASCII characters:

 >   en.wikipedia.org/wiki/L%25C3%25A9on:_The_Professional

 > Wikipedia does *not* *have* a page connected to that URL!  But if you
 > click the link and send it back through Google, you reach the right
 > Wikipedia page that *does* exist:

 >   en.wikipedia.org/wiki/Leon:_The_Professional

	And this page clearly states (search for "Redirected from" there)
	that it was reached via an alias.  If you follow the "Article"
	link from there, it'll lead you to .../L%C3%A9on:_The_Professional
	instead, which is the proper URI for that Wikipedia article.
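
	The unwrapping that takes place when such a link is followed can
	be sketched in a few lines of Python.  This is an illustration
	only: it assumes the standard urllib.parse module, embedded_target
	is a made-up name, the tracking parameters are left out, and
	nothing here claims to be what Google actually runs.

```python
from urllib.parse import urlsplit, parse_qs

# The redirect link from the quoted example, tracking data omitted.
redirect = ("https://www.google.com/url"
            "?q=https://en.wikipedia.org/wiki/L%25C3%25A9on:_The_Professional"
            "&sa=U")

def embedded_target(uri):
    """Return the URI carried in the q= parameter, decoded exactly once."""
    query = urlsplit(uri).query
    # parse_qs undoes one level of %-encoding: each %25 becomes a literal %.
    return parse_qs(query)["q"][0]

print(embedded_target(redirect))
# prints https://en.wikipedia.org/wiki/L%C3%A9on:_The_Professional
```

	One level of decoding yields the %C3%A9 form, which is the proper
	URI mentioned above; the %25C3%25A9 form only makes sense while
	it is still inside the q= parameter.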

	Think of it.  Suppose that Google has to return something like
	http://example.com/?o=p&q=http://example.net/ as one of the
	results.  Can you just put it after google.com/url?q= directly
	without ambiguity?  You'd get:

http://google.com/url?q=http://example.com/?o=p&q=http://example.net/&...
                                               ^1                    ^2

	Normally, the URI would start after ?q= and run until the first
	occurrence of & (^1), but in this case it is actually the second
	one (^2) that terminates the intended URI.  Naturally, Google
	avoids the ambiguity by %-encoding the ?s and &s, like:

http://google.com/url?q=http://example.com/%3fo=p%26q=http://example.net/&...

	By the same token, they need to escape the %s themselves, should
	the original URI contain any, so e. g. http://example.com/%d1%8a
	becomes .../url?q=http://example.com/%25d1%258a&... .
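
	The ambiguity and its fix can be checked mechanically.  A minimal
	Python sketch, assuming the standard urllib.parse module; wrap and
	unwrap are illustrative names, and leaving "/", ":" and "="
	unescaped merely mirrors the example above, not any documented
	Google behaviour.  (quote emits upper-case hex digits, which is
	equivalent to the lower-case form shown above.)

```python
from urllib.parse import quote, urlsplit, parse_qs

def wrap(target, base="http://google.com/url"):
    """Embed target as the q= parameter of base, %-encoding it first."""
    # "?", "&" and "%" are escaped; "/", ":" and "=" are left alone.
    return base + "?q=" + quote(target, safe="/:=")

def unwrap(uri):
    """Return the first q= value of uri, decoded once."""
    return parse_qs(urlsplit(uri).query)["q"][0]

inner = "http://example.com/?o=p&q=http://example.net/"

# Pasting the inner URI in directly: the value is cut short at the first &.
naive = "http://google.com/url?q=" + inner
print(unwrap(naive))        # http://example.com/?o=p

# Escaping first keeps the inner URI in one piece.
print(wrap(inner))
print(unwrap(wrap(inner)) == inner)

# A % in the original is itself escaped to %25.
print(wrap("http://example.com/%d1%8a"))
```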

	Of course, Google didn't invent any of this: unless I be mistaken,
	that's how HTML <form method="get" />s have worked from the get-go.
	And you /do/ need something like Hello%3f%20%20Anybody%20home%3f
	to put it after /guestbook?comment=.
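
	For comparison, here is roughly what a form submission amounts to,
	sketched in Python with the standard urllib.parse.urlencode.  The
	/guestbook path and the comment field are the hypothetical ones
	from the paragraph above.

```python
from urllib.parse import urlencode, quote

comment = "Hello?  Anybody home?"

# quote_via=quote writes spaces as %20, matching the example above.
query = urlencode({"comment": comment}, quote_via=quote)
print("/guestbook?" + query)
# prints /guestbook?comment=Hello%3F%20%20Anybody%20home%3F

# The default (quote_plus) follows the form convention of "+" for spaces.
print("/guestbook?" + urlencode({"comment": comment}))
# prints /guestbook?comment=Hello%3F++Anybody+home%3F
```

	Both spellings decode to the same comment under the form
	encoding rules; browsers typically produce the "+" variant.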

	FWIW, I tend to use the following Perl bits for %-encoding and
	decoding, respectively:

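# Encode: replace everything in $_ except [0-9A-Za-z/_.-] with %xx.
s {([^0-9A-Za-z/_.-])}{sprintf ("%%%02x", ord ($1))}ge;
# Decode: turn each %xx escape in $_ back into the character it stands for.
s {%([0-9a-fA-F]{2})}{chr (hex ($1))}ge;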
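
	A rough Python counterpart of the two Perl substitutions above,
	for those who prefer it.  One assumption is added that the Perl
	lines leave to the caller: the text is converted to UTF-8 bytes
	first, so that each non-ASCII character becomes one %xx per byte.
	The names pct_encode and pct_decode are made up for this sketch.

```python
import re

def pct_encode(text):
    """%-encode every byte outside [0-9A-Za-z/_.-], lower-case hex."""
    data = text.encode("utf-8")
    out = re.sub(rb"[^0-9A-Za-z/_.-]",
                 lambda m: b"%%%02x" % m.group()[0],
                 data)
    return out.decode("ascii")

def pct_decode(text):
    """Turn each %xx escape back into a byte, then decode as UTF-8."""
    data = re.sub(rb"%([0-9a-fA-F]{2})",
                  lambda m: bytes([int(m.group(1), 16)]),
                  text.encode("ascii"))
    return data.decode("utf-8")

print(pct_encode("Hello?  Anybody home?"))
# prints Hello%3f%20%20Anybody%20home%3f
print(pct_decode("L%c3%a9on"))
```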

 > AFAICT, when spidering the net, Google finds the page that *does*
 > exist, modifies it according to (opaque, unknown) rules of orthography
 > and delivers that to you.  When you send that link back through
 > Google, Google silently reverts the imposed orthographic "correction"
 > so that the link goes to an existing page.

 > Isn't that weird?

	There's this bit near the end of the .../Leon:_The_Professional
	page (lines split for readability):

<script type="application/ld+json">{
"@context":"https:\/\/schema.org",
"@type":"Article",
"name":"L\u00e9on: The Professional",
"url":"https:\/\/en.wikipedia.org\/wiki\/L%C3%A9on:_The_Professional",
[...]

	I'm pretty certain that Google /does/ parse JSON-LD like in the
	above, so I can only presume that when it finds a Web document
	that points to a different "url": in this way, it (sometimes?)
	uses the latter in preference to the original URI.
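
	Reading that "url" out of a page takes little code.  A sketch in
	Python using only the standard html.parser and json modules; the
	page text below is an abridged stand-in for the real one (the
	elided part is simply dropped), JsonLdCollector is a made-up
	name, and how Google itself processes JSON-LD is not something
	this shows.

```python
import json
from html.parser import HTMLParser

class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        # Script text may arrive in several pieces; gather them all.
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False

# Abridged sample; a raw string keeps the JSON backslash escapes intact.
page = r'''<html><body><script type="application/ld+json">{
"@context":"https:\/\/schema.org",
"@type":"Article",
"name":"L\u00e9on: The Professional",
"url":"https:\/\/en.wikipedia.org\/wiki\/L%C3%A9on:_The_Professional"
}</script></body></html>'''

collector = JsonLdCollector()
collector.feed(page)
collector.close()
print(collector.items[0]["url"])
# prints https://en.wikipedia.org/wiki/L%C3%A9on:_The_Professional
```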

	I've been thinking of adopting JSON-LD for my own Web pages
	(http://am-1.org/~ivan/ , http://users.am-1.org/~ivan/ , etc.),
	but so far have only used the (arguably more readable)
	http://microformats.org/wiki/microformats2 (which I hope search
	engines will add support for at some point).  Consider, e. g.:

http://pin13.net/mf2/?url=http://am-1.org/~ivan/qinp-2024/112.l-system.en.xhtml

	Note that ?url= above needs the exact same %-treatment as does
	Google's /url?q=.  Naturally, the HTML form at http://pin13.net/mf2/
	will do it for you.  (Or, rather: instruct your Web user agent
	to do so.)
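
	The unescaped link above works only because the page address in
	it happens to contain no ?, & or # of its own.  A short Python
	sketch of the difference, assuming the standard urllib.parse
	module; parser_link and carried are made-up names, and the
	"tricky" address is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SERVICE = "http://pin13.net/mf2/"

def parser_link(page):
    """Build a link to the service with page safely %-encoded."""
    return SERVICE + "?" + urlencode({"url": page})

def carried(link):
    """Return the url= value a server would see in link."""
    return parse_qs(urlsplit(link).query)["url"][0]

plain = "http://am-1.org/~ivan/qinp-2024/112.l-system.en.xhtml"
tricky = plain + "?lang=en&view=raw#top"   # hypothetical address

# No reserved characters inside: pasting the address in directly is fine.
print(carried(SERVICE + "?url=" + plain) == plain)

# With ?, & and # inside, the directly pasted address gets truncated ...
print(carried(SERVICE + "?url=" + tricky))

# ... while the encoded one survives the round trip.
print(carried(parser_link(tricky)) == tricky)
```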
