From: Chris Kaynor
Date: Tue, 13 Nov 2012 16:26:19 -0800
Subject: Re: Generate unique ID for URL
To: Richard
Cc: "python-list@python.org"
Newsgroups: comp.lang.python

One option would be to use a hash: Python's built-in hash(), a 32-bit CRC,
128-bit MD5, 256-bit SHA, or one of the many others that exist, depending
on your needs. Higher bit counts reduce the odds of accidental collisions;
use a cryptographically secure hash if outside attacks matter. A hash is
one-way, so you'd have to roll your own means of mapping the hash back to
the original string if you ever need it for debugging, and there is always
the possibility of collisions.

A similar solution would be to generate a pseudo-random GUID using the URL
as the seed.

You could use a counter if all IDs are generated by a single process (and
even in other cases, with some work).

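As a rough sketch of the hash option above (not a definitive
implementation), something like the following might do. It assumes
Python 3; the function names url_id and crc_id and the digest_bytes
parameter are made up for illustration. It truncates a SHA-256 digest
rather than using the built-in hash(), because hash() of a string is
salted per process by default from Python 3.3 on, so its values cannot be
stored and compared across runs.

```python
import base64
import hashlib
import zlib


def url_id(url, digest_bytes=16):
    """Return a URL-safe ID built from the first digest_bytes of SHA-256(url).

    digest_bytes is a hypothetical tuning knob: fewer bytes give a shorter
    ID but raise the odds of an accidental collision.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()[:digest_bytes]
    # urlsafe_b64encode emits only A-Z, a-z, 0-9, '-' and '_'. The '='
    # padding carries no information here, so it is stripped.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def crc_id(url):
    """Return an 8-hex-digit ID from the 32-bit CRC of the URL."""
    # The mask keeps the value unsigned on every Python version.
    return format(zlib.crc32(url.encode("utf-8")) & 0xFFFFFFFF, "08x")


if __name__ == "__main__":
    example = "docs.python.org/library/uuid.html"
    print(url_id(example))      # 128 bits of SHA-256
    print(url_id(example, 8))   # 64 bits: shorter, more collision-prone
    print(crc_id(example))      # 32 bits: only for small sets of URLs
```

With the default of 16 bytes the ID is 22 characters long, against 44 for
the base64 of the example URL, and the length no longer grows with the
URL. Neither function is reversible, so keep a table from ID back to URL
if you need the original for debugging.
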
If you want to be able to go both ways, base64 encoding is probably your
best bet, though you might get some benefit from compressing the URL
before encoding it.

Chris

On Tue, Nov 13, 2012 at 3:56 PM, Richard wrote:
> Good point - one way encoding would be fine.
>
> Also this is performed millions of times so ideally efficient.
>
> On Wednesday, November 14, 2012 10:34:03 AM UTC+11, John Gordon wrote:
>> In <0692e6a2-343c-4eb0-be57-fe5c815efb99@googlegroups.com> Richard writes:
>>
>> > I want to create a URL-safe unique ID for URL's.
>> > Currently I use:
>> > url_id = base64.urlsafe_b64encode(url)
>>
>> > >>> base64.urlsafe_b64encode('docs.python.org/library/uuid.html')
>> > 'ZG9jcy5weXRob24ub3JnL2xpYnJhcnkvdXVpZC5odG1s'
>>
>> > I would prefer more concise ID's.
>> > What do you recommend? - Compression?
>>
>> Does the ID need to contain all the information necessary to recreate the
>> original URL?