| Path | csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!eternal-september.org!feeder.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail |
|---|---|
| From | Dan Sommers <dan@tombstonezero.net> |
| Newsgroups | comp.lang.python |
| Subject | Re: Python's re module and genealogy problem |
| Date | Sat, 14 Jun 2014 05:14:50 +0000 (UTC) |
| Organization | A noiseless patient Spider |
| Lines | 34 |
| Message-ID | <lnglo9$pg4$1@dont-email.me> (permalink) |
| References | <bvr01iFu926U1@mid.individual.net> <c00ivgF5cjpU1@mid.individual.net> |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset=UTF-8 |
| Content-Transfer-Encoding | 8bit |
| Injection-Date | Sat, 14 Jun 2014 05:14:50 +0000 (UTC) |
| Injection-Info | mx05.eternal-september.org; posting-host="7afa85ad1f051434641aa58aa18198d3"; logging-data="26116"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/0bwI80oskhmD7fiP4g93Fs3heob6dFPk=" |
| User-Agent | Pan/0.139 (Sexual Chocolate; GIT bf56508 git://git.gnome.org/pan2) |
| Cancel-Lock | sha1:fFT5ReZMgyGIsU7wFaV3wiUI5Dw= |
| Xref | csiph.com comp.lang.python:73274 |
On Fri, 13 Jun 2014 17:17:06 +0200, BrJohan wrote:
> Or to put the namevariants in some sequence of sets having elements
> like: ("Kristina", "Christina", "Cristine", "Kristine")
> Matching is then just applying the 'in' operator.
That's definitely a better approach, for the reasons you mentioned.
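A minimal sketch of the quoted idea, sets of name variants checked with `in`, follows. It assumes hypothetical group data (`VARIANT_GROUPS`) and a hypothetical helper `same_name`. Spellings are case-folded because transcribed records vary in capitalization. Each spelling is indexed to the groups that contain it, so a lookup is a dictionary access rather than a scan over every set.

```python
from collections import defaultdict

# Illustrative variant groups (hypothetical data; a real tool would load curated lists).
VARIANT_GROUPS = [
    ("Kristina", "Christina", "Cristine", "Kristine"),
    ("Johan", "Johann", "Johannes"),
]

# Map each case-folded spelling to the ids of the groups containing it.
# A spelling may legitimately appear in more than one group, hence a set of ids.
_GROUPS_BY_NAME = defaultdict(set)
for group_id, group in enumerate(VARIANT_GROUPS):
    for spelling in group:
        _GROUPS_BY_NAME[spelling.casefold()].add(group_id)


def same_name(a: str, b: str) -> bool:
    """True if a and b are the same spelling or share at least one variant group."""
    a, b = a.casefold(), b.casefold()
    if a == b:
        return True
    # .get() avoids inserting empty entries into the defaultdict for unknown names.
    return bool(_GROUPS_BY_NAME.get(a, set()) & _GROUPS_BY_NAME.get(b, set()))
```

With this data, `same_name("kristina", "Cristine")` is `True` and `same_name("Kristina", "Johan")` is `False`.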
> Comments?
A Soundex (or similar) algorithm will be better in the long run for the
less common but more often misspelled names. It's fairly simple to
guess at a number of spellings for names that *you* think are common
now, but what about names that run in families that aren't yours, names
that aren't common outside of one family, or names that were wildly
popular a couple of hundred years ago but have since fallen out of favor?
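Python's standard library has no Soundex function. Below is a minimal sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to digits, drop vowels, collapse adjacent equal codes (including codes separated only by H or W), and pad or truncate the result to four characters. The function name `soundex` is hypothetical. Treating non-ASCII letters as uncoded, like vowels, is an assumption of this sketch.

```python
# American Soundex digit classes; vowels, Y, H and W get no digit.
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code of name, or '' if it has no letters."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    # The first letter's own code suppresses an identical code right after it.
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        # H and W do not separate letters with the same code; vowels do.
        if ch not in "hw":
            prev = code
    return result.ljust(4, "0")
```

For example, `soundex("Robert")` and `soundex("Rupert")` both give `"R163"`. Because the first letter is kept verbatim, `soundex("Kristina")` is `"K623"` but `soundex("Christina")` is `"C623"`. Comparing only the digits, or combining Soundex with hand-made variant sets, is one way around that.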
My wife's ancestors (she's the genealogist, I just get to hear the
horror stories) are notorious for being somewhat illiterate; for
changing their names, on purpose, after a feud, in order to "distance"
themselves from their relatives; and also for using not-common-now (or
even not-so-common-then) names. Add in somewhat illiterate record
keepers and hospital workers (or midwives or neighbors), not to mention
bad copies of bad copies of centuries-old smudged documents, and you
have an instant soup of names that sound alike but are spelled
differently in ways you cannot guess ahead of time.
Your users will appreciate *some* sort of fuzzy matching, or runtime
extensibility, atop the "obvious" spellings you take the time to include
in your software. And that's *not* a comment on your abilities; it's a
comment on the abilities and creativity of their ancestors.
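The post doesn't name a specific fuzzy-matching technique. As one hedged illustration, the sketch below combines curated variant groups that users can extend at runtime with the standard library's `difflib.get_close_matches` as a spelling-similarity fallback. Note that difflib measures character overlap, not pronunciation. `NameMatcher`, its methods, and the 0.75 cutoff are hypothetical choices, not a tuned recommendation.

```python
import difflib


class NameMatcher:
    """Hypothetical helper: curated variant groups, user-extensible at runtime,
    with difflib similarity as a fuzzy fallback."""

    def __init__(self, groups=()):
        # Store every group case-folded so lookups ignore capitalization.
        self._groups = [{n.casefold() for n in g} for g in groups]

    def add_variants(self, *names):
        """Record spellings a user has found to belong together."""
        self._groups.append({n.casefold() for n in names})

    def is_listed_variant(self, a, b):
        """True if a and b are the same spelling or share a variant group."""
        a, b = a.casefold(), b.casefold()
        return a == b or any(a in g and b in g for g in self._groups)

    def candidates(self, name, known_names, cutoff=0.75):
        """Listed variants of name first, then other close spellings by difflib ratio."""
        listed = [k for k in known_names if self.is_listed_variant(name, k)]
        by_folded = {k.casefold(): k for k in known_names}
        close = difflib.get_close_matches(
            name.casefold(), list(by_folded), n=5, cutoff=cutoff
        )
        fuzzy = [by_folded[c] for c in close if by_folded[c] not in listed]
        return listed + fuzzy
```

With `matcher = NameMatcher([("Kristina", "Christina")])`, calling `matcher.candidates("Kristina", ["Christina", "Kristine", "Margareta"])` returns `["Christina", "Kristine"]`. The first result is a listed variant. The second is a fuzzy match that a user could later confirm with `matcher.add_variants("Kristina", "Kristine")`.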
Dan
Python's re module and genealogy problem BrJohan <brjohan@gmail.com> - 2014-06-11 14:23 +0200
Re: Python's re module and genealogy problem Robert Kern <robert.kern@gmail.com> - 2014-06-11 14:26 +0100
Re: Python's re module and genealogy problem Mark H Harris <harrismh777@gmail.com> - 2014-06-11 09:08 -0500
Re: Python's re module and genealogy problem Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de> - 2014-06-11 15:55 +0200
Re: Python's re module and genealogy problem Michael Torrie <torriem@gmail.com> - 2014-06-11 09:34 -0600
Re: Python's re module and genealogy problem Nick Cash <nick.cash@npcinternational.com> - 2014-06-11 16:21 +0000
Re: Python's re module and genealogy problem Simon Ward <simon@bleah.co.uk> - 2014-06-11 18:21 +0100
Re: Python's re module and genealogy problem Vlastimil Brom <vlastimil.brom@gmail.com> - 2014-06-11 20:09 +0200
Re: Python's re module and genealogy problem BrJohan <brjohan@gmail.com> - 2014-06-13 17:17 +0200
Re: Python's re module and genealogy problem Peter Otten <__peter__@web.de> - 2014-06-13 18:26 +0200
Re: Python's re module and genealogy problem Dan Sommers <dan@tombstonezero.net> - 2014-06-14 05:14 +0000
Re: Python's re module and genealogy problem Tony the Tiger <tony@tiger.invalid> - 2014-06-14 08:35 +0000