| Date | 2012-07-26 14:26 +0200 |
|---|---|
| From | Laszlo Nagy <gandalf@shopzeus.com> |
| Subject | Generating valid identifiers |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.2604.1343305588.4697.python-list@python.org> |
I have a program that creates various database objects in PostgreSQL.
There is a DOM, and for each element in the DOM, a database object is
created (schema, table, field, index and tablespace).
I do not want this program to generate very long identifiers. They would
increase SQL parsing time, and they do not look good. Let's just say that
the limit should be 32 characters. But I also want to be able to recognize
the identifiers when I look at their modified/truncated names.
So I have come up with this solution:
- I have restricted original identifiers not to contain the dollar sign.
They can only contain [A-Z] or [a-z] or [0-9] and the underscore. Here
is a valid example:
"group1_group2_group3_some_field_name"
- I'm trying to use a hash function to reduce the length of the
identifier when it is too long:
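    import base64
    import hashlib

    class Connection(object):
        # ... more code here
        @classmethod
        def makename(cls, basename):
            if len(basename) > 32:
                h = hashlib.sha256()
                h.update(basename)
                # First 10 base64 characters of the digest; the alternative
                # alphabet "_$" replaces "+" and "/".
                tail = base64.b64encode(h.digest(), "_$")[:10]
                return basename[:30] + "$" + tail
            else:
                return basename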
Here is the result:
    print repr(Connection.makename("some_field_name"))
    'some_field_name'
    print repr(Connection.makename("group1_group2_group3_some_field_name"))
    'group1_group2_group3_some_fiel$AyQVQUXoyf'
So, if the identifier is too long, I use a modified version that should
be unique and similar to the original name. Let's suppose that nobody
wants to produce a collision in this modified hash on purpose.
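
One detail of the scheme above is worth noting: a 30-character prefix, the "$" separator and a 10-character tail add up to 41 characters, which is more than the 32-character limit stated earlier. The following is only a sketch, written for Python 3 rather than the Python 2 of the code above, of how the prefix could be shortened so that the whole name fits. The standalone function and the names MAX_LEN, TAIL_LEN, max_len and tail_len are assumptions made for illustration, not part of the original program.

```python
import base64
import hashlib

MAX_LEN = 32   # assumed overall limit, taken from the text above
TAIL_LEN = 10  # number of digest characters kept, as in the original code


def makename(basename, max_len=MAX_LEN, tail_len=TAIL_LEN):
    """Return basename unchanged if it fits, else prefix + '$' + hash tail."""
    if len(basename) <= max_len:
        return basename
    # Identifiers are restricted to ASCII letters, digits and underscore.
    digest = hashlib.sha256(basename.encode("ascii")).digest()
    # altchars replaces '+' and '/' so the tail only uses [A-Za-z0-9_$].
    tail = base64.b64encode(digest, altchars=b"_$").decode("ascii")[:tail_len]
    # Shorten the prefix so that prefix + '$' + tail is exactly max_len long.
    prefix = basename[: max_len - 1 - tail_len]
    return prefix + "$" + tail


if __name__ == "__main__":
    print(makename("some_field_name"))
    print(makename("group1_group2_group3_some_field_name"))
```

With these defaults only 21 characters of the original name survive, so the trade-off between a recognizable prefix and a long hash tail becomes more visible than in the 41-character version.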
And now, the questions:
* Would it be a problem to use CRC32 instead of SHA? (Since security is
not a problem, and CRC32 is faster.)
* I'm truncating the digest value to 10 characters. Is it safe enough?
I don't want to use more than 10 characters, because then it wouldn't be
possible to recognize the original name.
* Can somebody think of a better algorithm, that would give a bigger
chance of recognizing the original identifier from the modified one?
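
The first two questions can be put into numbers with the usual birthday-bound approximation. Ten base64 characters carry 60 bits, while CRC32 yields 32 bits. The sketch below (Python 3) assumes that the hash values behave like uniformly random numbers, which is only an approximation, especially for CRC32. The helper names collision_probability and crc32_tail are hypothetical and were introduced only for this illustration.

```python
import math
import zlib


def collision_probability(n, bits):
    """Approximate chance that n random values of `bits` bits contain a collision."""
    # Birthday bound: p ~ 1 - exp(-n * (n - 1) / 2**(bits + 1))
    return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))


def crc32_tail(basename):
    """Hypothetical CRC32-based tail: 8 lowercase hexadecimal digits."""
    return format(zlib.crc32(basename.encode("ascii")) & 0xFFFFFFFF, "08x")


if __name__ == "__main__":
    for n in (1000, 100000):
        print(n, collision_probability(n, 32), collision_probability(n, 60))
    print(crc32_tail("group1_group2_group3_some_field_name"))
```

Under that assumption, a 32-bit value gives a collision chance on the order of 1e-4 for 1,000 identifiers and well above one half for 100,000, whereas 60 bits stay below 1e-8 even for 100,000 identifiers. An 8-digit hexadecimal CRC32 tail would also stay inside the allowed character set.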
Thanks,
Laszlo