| From | Laszlo Nagy <gandalf@shopzeus.com> |
|---|---|
| Date | Thu, 26 Jul 2012 14:26:16 +0200 |
| To | python-list@python.org |
| Subject | Generating valid identifiers |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.2604.1343305588.4697.python-list@python.org> |
I have a program that creates various database objects in PostgreSQL.
There is a DOM, and for each element in the DOM, a database object is
created (schema, table, field, index and tablespace).
I do not want this program to generate very long identifiers. They would
increase SQL parsing time, and they don't look good. Let's just say that
the limit should be 32 characters. But I also want to be able to
recognize the original identifiers when I look at their
modified/truncated names.
So I have come up with this solution:
- I have restricted the original identifiers so that they cannot contain
the dollar sign. They can only contain [A-Z], [a-z], [0-9] and the
underscore. Here is a valid example:
"group1_group2_group3_some_field_name"
- I'm trying to use a hash function to reduce the length of the
identifier when it is too long:
import base64
import hashlib

class Connection(object):
    # ... more code here
    @classmethod
    def makename(cls, basename):
        if len(basename)>32:
            h = hashlib.sha256()
            h.update(basename)
            # altchars "_$" replaces "+" and "/" in the Base64 alphabet
            tail = base64.b64encode(h.digest(),"_$")[:10]
            return basename[:30]+"$"+tail
        else:
            return basename
Here is the result:
print repr(Connection.makename("some_field_name"))
'some_field_name'
print repr(Connection.makename("group1_group2_group3_some_field_name"))
'group1_group2_group3_some_fiel$AyQVQUXoyf'
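The character restriction described above can be checked before any hashing takes place. The following is a minimal sketch, assuming Python 3; the names `ORIGINAL_IDENTIFIER`, `is_original_identifier` and `is_shortened` are hypothetical and do not appear in the original program.

```python
import re

# Assumed pattern for "original" identifiers as described in the post:
# only ASCII letters, digits and the underscore, so never a dollar sign.
ORIGINAL_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def is_original_identifier(name):
    """Return True if name uses only the characters the post allows."""
    return ORIGINAL_IDENTIFIER.fullmatch(name) is not None


def is_shortened(name):
    """A "$" can only come from shortening, never from an original name."""
    return "$" in name


print(is_original_identifier("group1_group2_group3_some_field_name"))  # True
print(is_original_identifier("bad$name"))  # False
print(is_shortened("group1_group2_group3_some_fiel$AyQVQUXoyf"))  # True
```

Because original names never contain "$", its presence is enough to tell a shortened name from an untouched one.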
So, if the identifier is too long, I use a modified version that should
be unique and similar to the original name. Let's suppose that nobody
tries to produce a colliding name on purpose.
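Note that the shortened example above is 41 characters long (30 from the original name, one "$" and 10 from the digest), which is more than the stated limit of 32. The following is a minimal sketch of a variant that stays within the limit, assuming Python 3; the function name `make_name` and the parameters `limit` and `tail_len` are hypothetical, not part of the original code.

```python
import base64
import hashlib


def make_name(basename, limit=32, tail_len=10):
    """Shorten basename so the result never exceeds `limit` characters.

    Hypothetical variant of the post's makename(); the parameter names
    and defaults are assumptions, not part of the original code.
    """
    if len(basename) <= limit:
        return basename
    digest = hashlib.sha256(basename.encode("ascii")).digest()
    # Same alphabet as the post: "+" and "/" become "_" and "$".
    tail = base64.b64encode(digest, altchars=b"_$").decode("ascii")[:tail_len]
    # Reserve room for the "$" separator and the tail.
    head = basename[:limit - tail_len - 1]
    return head + "$" + tail


short = make_name("some_field_name")
long_ = make_name("group1_group2_group3_some_field_name")
print(short)       # some_field_name
print(len(long_))  # 32
```

The trade-off is that only 21 characters of the original name survive with these defaults, so the result is harder to recognize than the 30-character prefix used in the post.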
And now, the questions:
* Would it be a problem to use CRC32 instead of SHA? (Security is not a
concern here, and CRC32 is faster.)
* I'm truncating the digest value to 10 characters. Is that safe enough?
I don't want to use more than 10 characters, because then it wouldn't be
possible to recognize the original name.
* Can somebody think of a better algorithm that would give a bigger
chance of recognizing the original identifier from the modified one?
Thanks,
Laszlo
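
The first two questions above come down to how many bits the tail carries: 10 Base64 characters hold 60 bits, while a CRC32 value holds 32. The following is a rough sketch, assuming Python 3 and assuming the tails behave like uniformly random values (which a CRC only approximates). It applies the birthday-bound estimate to both sizes; `collision_probability`, `crc32_tail` and the count `n` are hypothetical, chosen only for illustration.

```python
import math
import zlib


def collision_probability(n, bits):
    """Birthday-bound estimate that n random tails of `bits` bits collide."""
    space = 2.0 ** bits
    return -math.expm1(-n * (n - 1) / (2.0 * space))


def crc32_tail(basename):
    """Hypothetical CRC32-based tail: 8 hex digits, i.e. 32 bits."""
    return "%08x" % (zlib.crc32(basename.encode("ascii")) & 0xFFFFFFFF)


n = 10000  # assumed number of long identifiers, for illustration only
print("60-bit tail (10 base64 chars): %.3g" % collision_probability(n, 60))
print("32-bit tail (CRC32):           %.3g" % collision_probability(n, 32))
print(crc32_tail("group1_group2_group3_some_field_name"))
```

Under these assumptions, 10,000 long names give roughly a 1 percent chance of at least one collision with a 32-bit tail, and a chance on the order of 1e-11 with a 60-bit tail.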