From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: WP-A: A New URL Shortener
Date: Wed, 16 Mar 2016 10:55:52 +1100

On Wed, Mar 16, 2016 at 10:38 AM, Thomas 'PointedEars' Lahn wrote:
> Chris Angelico wrote:
>
>> On Wed, Mar 16, 2016 at 9:53 AM, Thomas 'PointedEars' Lahn
>> wrote:
>
>>> […] I cannot be sure because I have not thought this through, but with
>                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> aliases for common second-level domains, and with text compression, it
>>> should be possible to do this without a database.
>>
>> How? If you shorten URLs, you have to be able to reconstruct the long
>> ones. Compression can't do that to arbitrary lengths. Somewhere there
>> needs to be the rest of the information.
>
> First of all, you quoted me out of context.

I trimmed the context. You got a problem with that?

> Second, do you even read what you reply to? See the markings above.

Instead of thinking about URL shorteners specifically, think generally
about information theory. You cannot, fundamentally, shorten all URLs
arbitrarily. There just isn't enough room to store the information.

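The counting argument can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the 62-symbol code
alphabet, the 7-character code length, and the figure of 80 usable
characters per URL position are all made up for the example, and the
helper names count_codes and min_code_length are hypothetical.

```python
import string

# Assumption: short codes are built from letters and digits only (62 symbols).
ALPHABET = string.ascii_letters + string.digits


def count_codes(max_len, alphabet_size=len(ALPHABET)):
    """Number of distinct non-empty codes of length 1..max_len."""
    return sum(alphabet_size ** n for n in range(1, max_len + 1))


def min_code_length(n_urls, alphabet_size=len(ALPHABET)):
    """Smallest maximum code length that gives every URL its own code."""
    length = 1
    while count_codes(length, alphabet_size) < n_urls:
        length += 1
    return length


# Assumption: a long URL is exactly 80 characters, each drawn from
# roughly 80 usable characters.
long_urls = 80 ** 80
short_codes = count_codes(7)

# Far more possible long URLs than 7-character codes, so no fixed
# scheme can map every long URL to a distinct short code.
print(long_urls > short_codes)

# Code length needed to cover every such URL without a lookup table.
print(min_code_length(long_urls))
```

With these numbers, giving every possible 80-character URL its own code
needs codes at least as long as the URLs themselves, which is why the
missing information has to be stored somewhere.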
> And as for second-level domains, consider for example “t.c” instead of
> “twitter.com” as part of the short URI.

That'll work only for the ones that you code in specifically, and
that's only shortening your URL by 8 characters. A typical URL needing
shortening is over 80 characters - maybe several hundred. You need to
cut that down to a manageable length. That fundamentally cannot be
reversed without readding information.

>>> And with the exception of Twitter-ish sites that place a limit on message
>>> length, there really is *no need* for shorter URIs nowadays. (HTTP)
>>> clients and servers are capable of processing really long ones [1];
>>> electronic communications media and related software, too [2]. And data
>>> storage space as well as data transmission has become exceptionally
>>> inexpensive. A few less bytes there do not count.
>>
>> There are many places where there are limits (hard or soft) on message
>> lengths. Some of us still use MUDs and 80-character line limits.
>
> See above. Covered by [2].

Unrelated. Not covered by that link. Go use a MUD some time.

> But speaking of length limits, the lines in your postings are too long,
> according to Usenet convention. I had to correct the quotations so that
> they remained readable when word-wrapped.

Oh, so you'd rather the lines be cut to... I dunno, 80 characters?
Might be a good reason to use a URL shortener.

>> Business cards or other printed media need to be transcribed by hand.
>> Dictation of URLs becomes virtually impossible when they're
>> arbitrarily long.
>
> (You are not reading at all, are you?) This is covered by that:
>
>>> Instead, there *is* a need for *concise*, *semantic* URIs that Web
>>> (service) users can *easily* *remember*.
>>> It is the duty of the original Web authors∕developers to make sure
>>> that there are, and I think that no kind of automation is going to
>>> ease or replace thoughtful path design anytime soon (but please,
>>> prove me wrong):
>>
>> Sure...... if you control the destination server. What if you're
>> engaging in scholarly discussion about someone else's content? You
>> can't change the canonical URLs, and you can't simply copy their
>> content to your own server (either for licensing reasons or to
>> guarantee that the official version hasn't been tampered with).
>
> That is why I said it is the duty of the original authors/developers. It is
> a community effort, and it is not going to happen overnight. But evading
> the problem with unreliable replacements such as “short URLs” is not going
> to solve it either.

So, you can go fight an unwinnable battle against literally every web
creator in the world. Meanwhile, I'll keep on using URL shorteners.

ChrisA