| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Subject | Re: Coding challenge: Optimise a custom string encoding |
| Date | 2014-08-18 16:16 -0400 |
| References | <6e869040-98e9-437b-b024-4ffe7abc3054@googlegroups.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.13113.1408393206.18130.python-list@python.org> (permalink) |
On 8/18/2014 3:16 PM, Alex Willmer wrote:
> A challenge, just for fun. Can you speed up this function?
You should give a specification here, with examples. You should perhaps
be using .maketrans and .translate.
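
A minimal sketch of the .maketrans/.translate idea, written for Python 3 (the quoted code below is Python 2). It assumes the encoding described in the quoted post: UTF-8 bytes, with every byte outside letters, digits, '@', '_' and '-' written as '+' plus two hex digits. The names SAFE, TABLE and plus_encode_translate are hypothetical, and this is an illustration rather than a benchmarked replacement.

```python
import string

# Characters that pass through unchanged (same set as in the quoted post).
SAFE = set(string.ascii_letters + string.digits + '@_-')

# Translation table: every unsafe byte value 0..255 maps to '+' followed by
# two lowercase hex digits. Safe values are absent, so they are left alone.
TABLE = str.maketrans(
    {i: '+%02x' % i for i in range(256) if chr(i) not in SAFE}
)

def plus_encode_translate(s):
    """Hypothetical helper: plus-encode a str via str.translate."""
    # Decoding the UTF-8 bytes as latin-1 yields one character per byte,
    # so the table can be keyed on byte values.
    raw = s.encode('utf-8').decode('latin-1')
    return raw.translate(TABLE)

print(plus_encode_translate('alic\u20ac123'))  # prints alic+e2+82+ac123
```

The per-byte work happens inside translate() instead of a Python-level loop; whether that is actually faster for short usernames would need to be measured with timeit.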
> import string
>
> charset = set(string.ascii_letters + string.digits + '@_-')
> byteseq = [chr(i) for i in xrange(256)]
> bytemap = {byte: byte if byte in charset else '+' + byte.encode('hex')
>            for byte in byteseq}
>
> def plus_encode(s):
>     """Encode a unicode string with only ascii letters, digits, _, -, @, +
>     """
>     bytemap_ = bytemap  # local alias avoids a global lookup per byte
>     s_utf8 = s.encode('utf-8')
>     return ''.join([bytemap_[byte] for byte in s_utf8])
>
> On my machine (Ubuntu 14.04, CPython 2.7.6, PyPy 2.2.1) this gets
>
> alex@martha:~$ python -m timeit -s 'import plus_encode' 'plus_encode.plus_encode(u"""qwertyuiop1234567890!"£$%^&*()€""")'
> 100000 loops, best of 3: 2.96 usec per loop
>
> alex@martha:~$ pypy -m timeit -s 'import plus_encode' 'plus_encode.plus_encode(u"""qwertyuiop1234567890!"£$%^&*()€""")'
> 1000000 loops, best of 3: 1.24 usec per loop
>
> Back story:
> Last week we needed a custom encoding to store unicode usernames in a config file that only allowed mixed-case ascii, digits, underscore, dash, at-sign and plus sign. We also wanted to keep the encoded usernames somewhat human readable.
>
> My design was utf-8 and a variant of %-escaping, using the plus symbol. So u'alic€123' would be encoded as b'alic+e2+82+ac123'. This evening as a learning exercise I've tried to make it fast. This is the result.
>
> This challenge is just for fun. The chosen solution ended up being
>
> import re
>
> def name_encode(s):
>     return '%s_%s' % (s.encode('utf-8').encode('hex'),
>                       re.sub('[A-Za-z0-9]', '', s))
>
> Regards, Alex
>
--
Terry Jan Reedy