| From | Mike Spencer <mds@bogus.nodomain.nowhere> |
|---|---|
| Newsgroups | comp.misc |
| Subject | Re: URIs within URIs: google.com/url?q= et al. |
| Date | 2024-12-22 01:39 -0400 |
| Organization | Bridgewater Institute for Advanced Study - Blacksmith Shop |
| Message-ID | <87wmfsxoms.fsf@enoch.nodomain.nowhere> |
| References | (4 earlier) <55db8483-58f0-c3dc-de0b-7f44881fa180@example.net> <87jzcp4pzy.fsf@enoch.nodomain.nowhere> <4875e490-ad30-d644-345f-4a09c1935c6b@example.net> <87frnb52zf.fsf@enoch.nodomain.nowhere> <7CehOkmKaRKK7ejb@violet.siamics.net> |
[ Top-posting because this is brief and adds no new interspersed text...]
Thank you very much, Ivan, for redirecting ;-) my lagging attention to
the %25 hex encoded chars prefixed to the already hex encoded chars
and Google's pages/methods for dealing with them. Your detailed reply
is much appreciated.
I'll read your comments more carefully and see if I can't tweak my
Perl script, the behavior of which led to my original comments on
this, to Do The Right Thing.
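
A minimal sketch of the decoding step, in Python rather than Perl, assuming the only thing wanted from a google.com/url link is the target carried in q=. The function name target_of is made up for illustration, and the sample link keeps only the &sa=U parameter from the example quoted below. The point is that the q= value is %-decoded exactly once, so %25C3%25A9 becomes %C3%A9 and stays a valid URI.

```python
from urllib.parse import urlsplit, parse_qs

def target_of(redirect_url):
    """Return the URI carried in q=, %-decoded exactly once (hypothetical helper)."""
    query = urlsplit(redirect_url).query
    # parse_qs splits the query on & first and only then %-decodes each
    # value, so an escaped %25 turns back into a literal % and no further.
    values = parse_qs(query).get("q")
    return values[0] if values else None

link = ("https://www.google.com/url"
        "?q=https://en.wikipedia.org/wiki/L%25C3%25A9on:_The_Professional"
        "&sa=U")
print(target_of(link))
# prints https://en.wikipedia.org/wiki/L%C3%A9on:_The_Professional
```

Decoding a second time would turn %C3%A9 into a raw non-ASCII character, which is presumably not what a script that goes on to fetch the URI wants.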
[ Previous exchange left unaltered for the record.]
Ivan Shmakov <ivan@siamics.netREMOVE.invalid> writes:
> >>>>> On 2024-11-28, Mike Spencer wrote:
>
> [Cross-posting to news:comp.infosystems.www.misc just in case,
> but setting Followup-To: comp.misc still. Feel free to disregard,
> though; if anything, I'll be monitoring both groups for some
> time for responses.]
>
> > Here's a curiosity:
>
> > Google also sends all of your clicks on search results back through
> > Google. I assume y'all knew that.
>
> > If you search for (say):
>
> > leon "the professional"
>
> > you get:
>
> > https://www.google.com/url
> > ?q=https://en.wikipedia.org/wiki/L%25C3%25A9on:_The_Professional
> > &sa=U&ved=2ahUKEwi [snip tracking hentracks/data]
>
> > Note that the "real" URL which Google proposes to proxy for you
> > contains non-ASCII characters:
>
> > en.wikipedia.org/wiki/L%25C3%25A9on:_The_Professional
>
> > Wikipedia does *not* *have* a page connected to that URL! But if you
> > click the link and send it back through Google, you reach the right
> > Wikipedia page that *does* exist:
>
> > en.wikipedia.org/wiki/Leon:_The_Professional
>
> And this page clearly states (search for "Redirected from" there)
> that it was reached via an alias. If you follow the "Article"
> link from there, it'll lead you to .../L%C3%A9on:_The_Professional
> instead, which is the proper URI for that Wikipedia article.
>
> Think of it. Suppose that Google has to return something like
> http://example.com/?o=p&q=http://example.net/ as one of the
> results. Can you just put it after google.com/url?q= directly
> without ambiguity? You'd get:
>
> http://google.com/url?q=http://example.com/?o=p&q=http://example.net/&...
>                                                ^1                    ^2
>
> Normally, the URI would start after ?q= and go until the first ^1
> occurrence of &, but in this case, it'd be actually the second ^2
> that terminates the intended URI. Naturally, Google avoids it
> by %-encoding the ?s and &s, like:
>
> http://google.com/url?q=http://example.com/%3fo=p%26q=http://example.net/&...
>
> By the same token, they need to escape %s themselves, should
> the original URI contain any, so e. g. http://example.com/%d1%8a
> becomes .../url?q=http://example.com/%25d1%258a&... .
>
> Of course, Google didn't invent any of this: unless I be mistaken,
> that's how HTML <form method="get" />s have worked from the get-go.
> And you /do/ need something like Hello%3f%20%20Anybody%20home%3f
> to put it after /guestbook?comment=.
>
> FWIW, I tend to use the following Perl bits for %-encoding and
> decoding, respectively:
>
> s {[^0-9A-Za-z/_.-]}{${ \sprintf ("%%%02x", ord ($&)); }}g;
> s {%([0-9a-fA-F]{2})}{${ \chr (hex ($1)); }}g;
>
> > AFAICT, when spidering the net, Google finds the page that *does*
> > exist, modifies it according to (opaque, unknown) rules of orthography
> > and delivers that to you. When you send that link back through
> > Google, Google silently reverts the imposed orthographic "correction"
> > so that the link goes to an existing page.
>
> > Isn't that weird?
>
> There's this bit near the end of the .../Leon:_The_Professional
> (line split for readability):
>
> <script type="application/ld+json">{
> "@context":"https:\/\/schema.org",
> "@type":"Article",
> "name":"L\u00e9on: The Professional",
> "url":"https:\/\/en.wikipedia.org\/wiki\/L%C3%A9on:_The_Professional",
> [...]
>
> I'm pretty certain that Google /does/ parse JSON-LD like in the
> above, so I can only presume that when it finds a Web document
> that points to a different "url": in this way, it (sometimes?)
> uses the latter in preference to the original URI.
>
> I've been thinking of adopting JSON-LD for my own Web pages
> (http://am-1.org/~ivan/ , http://users.am-1.org/~ivan/ , etc.),
> but so far have only used (arguably more readable)
> http://microformats.org/wiki/microformats2 (that I hope search
> engines will at some point add support for.) Consider, e. g.:
>
> http://pin13.net/mf2/?url=http://am-1.org/~ivan/qinp-2024/112.l-system.en.xhtml
>
> Note that ?url= above needs the exact same %-treatment as does
> Google's /url?q=. Naturally, the HTML form at http://pin13.net/mf2/
> will do it for you. (Or, rather: instruct your Web user agent
> to do so.)
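
The nesting argument in the quoted article can be checked mechanically. What follows is a small Python sketch under stated assumptions: wrap and unwrap are hypothetical helper names, and the choice to leave /, : and = unescaped is made only so that the output resembles the examples quoted above (Python emits upper-case hex digits where the quoted text uses lower case). It is not a claim about how Google builds its links.

```python
from urllib.parse import quote, parse_qs, urlsplit

def wrap(inner, base="http://google.com/url"):
    # Escape ?, & and % (among others) so that nothing inside the inner
    # URI can be mistaken for a delimiter or an escape of the outer one.
    return base + "?q=" + quote(inner, safe="/:=")

def unwrap(outer):
    # Take the first q= value of the outer URI, %-decoded once.
    return parse_qs(urlsplit(outer).query)["q"][0]

inner = "http://example.com/?o=p&q=http://example.net/"

# Pasting the inner URI in unescaped cuts it off at its first &.
naive = "http://google.com/url?q=" + inner
print(unwrap(naive))         # http://example.com/?o=p

# Escaped, it survives the round trip intact.
print(wrap(inner))           # http://google.com/url?q=http://example.com/%3Fo=p%26q=http://example.net/
print(unwrap(wrap(inner)))   # http://example.com/?o=p&q=http://example.net/

# A % already present in the inner URI is itself escaped as %25.
print(wrap("http://example.com/%d1%8a"))
# http://google.com/url?q=http://example.com/%25d1%258a
```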
--
Mike Spencer Nova Scotia, Canada