Subject: Re: Coding challenge: Optimise a custom string encoding
From: Chris Angelico
Date: Tue, 19 Aug 2014 09:28:08 +1000
Newsgroups: comp.lang.python

On Tue, Aug 19, 2014 at 5:16 AM, Alex Willmer wrote:
> Back story:
> Last week we needed a custom encoding to store unicode usernames in a config file that only allowed mixed case ascii, digits, underscore, dash, at-sign and plus sign. We also wanted to keep the encoded usernames somewhat human readable.
>

If you can drop the "somewhat human readable" requirement, this fits perfectly into a Base 64 encoding. All you need to do is this:

>>> import base64
>>> base64.b64encode("alic€123".encode(), b"+@").replace(b'=', b'-')
b'YWxpY+KCrDEyMw--'

The second argument specifies that, instead of the usual + and / for the last two characters of the alphabet, + and @ are used. (The last step is needed because Python's b64encode doesn't allow customization of the padding character, so each trailing = is swapped for a dash. Alternatively, you could simply rstrip() them, and reinstate them when decoding by padding the encoded string out to a multiple of four characters.)

Decoding is, obviously, the reverse:

>>> base64.b64decode(_.replace(b'-', b'='), b"+@").decode()
'alic€123'

This is done in Python 3, not Python 2.
But I expect it'll work the same way in 2.7.

ChrisA
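For reference, here is a minimal Python 3 sketch that wraps the two REPL one-liners above into reusable functions, plus the rstrip() variant mentioned in the parenthesis. The function names (encode_username, decode_username, encode_unpadded, decode_unpadded) are hypothetical, and the sketch assumes UTF-8 for the username text and "-" as the stand-in for the "=" padding character, as in the example above.

```python
import base64

# "+" and "@" replace the standard "+" and "/" as the last two alphabet chars.
ALTCHARS = b"+@"

def encode_username(name: str) -> str:
    """Encode text using only A-Z, a-z, 0-9, '+', '@' and '-' (for padding)."""
    raw = base64.b64encode(name.encode("utf-8"), ALTCHARS)
    # b64encode always pads with '=', which the config file doesn't allow.
    return raw.replace(b"=", b"-").decode("ascii")

def decode_username(token: str) -> str:
    """Reverse encode_username(): restore '=' padding, then decode."""
    raw = token.encode("ascii").replace(b"-", b"=")
    return base64.b64decode(raw, ALTCHARS).decode("utf-8")

def encode_unpadded(name: str) -> str:
    """Variant: strip the padding entirely instead of substituting it."""
    raw = base64.b64encode(name.encode("utf-8"), ALTCHARS)
    return raw.rstrip(b"=").decode("ascii")

def decode_unpadded(token: str) -> str:
    """Restore padding so the length is a multiple of 4, then decode."""
    raw = token.encode("ascii")
    raw += b"=" * (-len(raw) % 4)  # adds 0, 1 or 2 '=' characters
    return base64.b64decode(raw, ALTCHARS).decode("utf-8")

if __name__ == "__main__":
    print(encode_username("alic\u20ac123"))  # YWxpY+KCrDEyMw--
    print(encode_unpadded("alic\u20ac123"))  # YWxpY+KCrDEyMw
```

The padded form round-trips with a plain string replace; the unpadded form saves up to two characters per name at the cost of recomputing the padding on the way back.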