From: Gene Heskett
Newsgroups: comp.lang.python
Subject: Re: WP-A: A New URL Shortener
Date: Tue, 15 Mar 2016 22:34:18 -0400
References: <1637296.ljfaO6m7tu@PointedEars.de>

On Tuesday 15 March 2016 19:55:52 Chris Angelico wrote:

> On Wed, Mar 16, 2016 at 10:38 AM, Thomas 'PointedEars' Lahn
> wrote:
> > Chris Angelico wrote:
> >> On Wed, Mar 16, 2016 at 9:53 AM, Thomas 'PointedEars' Lahn
> >> wrote:
> >>> […] I cannot be sure because I have not thought this through, but
> >>> with
> >
> >                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> >>> aliases for common second-level domains, and with
> >>> text compression, it should be possible to do this without a
> >>> database.
> >>
> >> How? If you shorten URLs, you have to be able to reconstruct the
> >> long ones. Compression can't do that to arbitrary lengths.
> >> Somewhere there needs to be the rest of the information.
> >
> > First of all, you quoted me out of context.
>
> I trimmed the context. You got a problem with that?
>
> > Second, do you even read what you reply to? See the markings above.
>
> Instead of thinking about URL shorteners specifically, think generally
> about information theory. You cannot, fundamentally, shorten all URLs
> arbitrarily. There just isn't enough room to store the information.
>
> > And as for second-level domains, consider for example “t.c” instead
> > of “twitter.com” as part of the short URI.
>
> That'll work only for the ones that you code in specifically, and
> that's only shortening your URL by 8 characters. A typical URL needing
> shortening is over 80 characters - maybe several hundred. You need to
> cut that down to a manageable length. That fundamentally cannot be
> reversed without readding information.

And I submit that putting someone in charge of the drive's
organization, and of the database on that drive that the URL has to dig
through, can make a huge difference in the length of the resultant URL.

> >>> And with the exception of Twitter-ish sites that place a limit on
> >>> message length, there really is *no need* for shorter URIs
> >>> nowadays. (HTTP) clients and servers are capable of processing
> >>> really long ones [1]; electronic communications media and related
> >>> software, too [2]. And data storage space as well as data
> >>> transmission has become exceptionally inexpensive. A few less
> >>> bytes there do not count.

They may not count for that much in terms of what the user pays for
bandwidth, but see below. And some users are probably still paying for
their internet access by the minute in some locales.
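
The exchange quoted above about domain aliases and text compression can
be tried out in a few lines of Python. This is only a rough sketch under
stated assumptions: the alias table, the helper names alias_domains()
and expand_aliases(), and the example URL are all made up for
illustration, and zlib merely stands in for whatever "text compression"
was meant.

```python
import zlib

# Hypothetical alias table: only domains coded in by hand get shortened.
DOMAIN_ALIASES = {"twitter.com": "t.c"}


def alias_domains(url: str) -> str:
    """Replace known domains with their short aliases."""
    for domain, alias in DOMAIN_ALIASES.items():
        url = url.replace("//" + domain + "/", "//" + alias + "/", 1)
    return url


def expand_aliases(url: str) -> str:
    """Reverse alias_domains() using the same table."""
    for domain, alias in DOMAIN_ALIASES.items():
        url = url.replace("//" + alias + "/", "//" + domain + "/", 1)
    return url


# Made-up example URL, longer than 80 characters.
long_url = (
    "https://twitter.com/some_account/status/709912345678901234"
    "?ref_src=newsletter&utm_campaign=spring"
)

aliased = alias_domains(long_url)
compressed = zlib.compress(long_url.encode("utf-8"), 9)

print(len(long_url), "characters originally")
print(len(aliased), "characters after aliasing the domain")
print(len(compressed), "bytes after zlib, before any text-safe encoding")

# Pigeonhole check: there are far fewer 6-character codes over
# [A-Za-z0-9] than there are 80-character strings over the same
# alphabet, so most long strings cannot map to a distinct short code
# without a lookup table somewhere.
short_codes = 62 ** 6
long_strings = 62 ** 80
print("more long strings than short codes:", long_strings > short_codes)
```

Both transformations are reversible without a database, but neither
comes close to the handful of characters a shortener hands back. The
alias only trims the domain, and zlib has little redundancy to work
with in a single short string.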

> >> There are many places where there are limits (hard or soft) on
> >> message lengths. Some of us still use MUDs and 80-character line
> >> limits.
> >
> > See above. Covered by [2].
>
> Unrelated. Not covered by that link. Go use a MUD some time.
>
> > But speaking of length limits, the lines in your postings are too
> > long, according to Usenet convention. I had to correct the
> > quotations so that they remained readable when word-wrapped.
>
> Oh, so you'd rather the lines be cut to... I dunno, 80 characters?
> Might be a good reason to use a URL shortener.

Usenet generally encourages us to set our word wrap at 72 to 73
characters so there is room for the inevitable additions of the ">"
quote character, so we can track who said what. That is just common
good practice.

> >> Business cards or other printed media need to be transcribed by
> >> hand. Dictation of URLs becomes virtually impossible when they're
> >> arbitrarily long.

OTOH, URLs in excess of 250 characters exist only to polish the egos
of the people involved, or to demonstrate that they could not organize
a company picnic in a 4-person company.

Few enough recognize that problem and post their URLs in the form
<url>, which most email agents recognize as a URL. Before presenting it
to a browser when you click on it, the agent will go through it,
stripping out the line feeds and carriage returns, so that the
original, as pasted and then wrecked by the emailer's word wrapping, is
restored and has at least a snowball's chance in hell of working.

But you can't teach a winderz user to do that any better than you can
break them from top posting.

> > (You are not reading at all, are you?) This is covered by that:
> >>> Instead, there *is* a need for *concise*, *semantic* URIs that Web
> >>> (service) users can *easily* *remember*.
> >>> It is the duty of the original Web authors∕developers to make
> >>> sure that there are, and I think that no kind of automation is
> >>> going to ease or replace thoughtful path design anytime soon (but
> >>> please, prove me wrong):
> >>
> >> Sure...... if you control the destination server. What if you're
> >> engaging in scholarly discussion about someone else's content? You
> >> can't change the canonical URLs, and you can't simply copy their
> >> content to your own server (either for licensing reasons or to
> >> guarantee that the official version hasn't been tampered with).
> >
> > That is why I said it is the duty of the original
> > authors/developers. It is a community effort, and it is not going
> > to happen overnight. But evading the problem with unreliable
> > replacements such as “short URLs” is not going to solve it either.

True, it's fixing the wrong end of the problem.

> So, you can go fight an unwinnable battle against literally every web
> creator in the world. Meanwhile, I'll keep on using URL shorteners.
>
> ChrisA

Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page