| From | Dan Sommers <dan@tombstonezero.net> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: Python's re module and genealogy problem |
| Date | 2014-06-14 05:14 +0000 |
| Organization | A noiseless patient Spider |
| Message-ID | <lnglo9$pg4$1@dont-email.me> |
| References | <bvr01iFu926U1@mid.individual.net> <c00ivgF5cjpU1@mid.individual.net> |

On Fri, 13 Jun 2014 17:17:06 +0200, BrJohan wrote:

> Or to put the namevariants in some sequence of sets having elements
> like: ("Kristina", "Christina", "Cristine", "Kristine")
> Matching is then just applying the 'in' operator.

That's definitely a better approach, for the reasons you mentioned.
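
For illustration, the set-based idea quoted above could be sketched roughly as follows. The first group is the one from the quoted post; the second group and the names `NAME_VARIANTS` and `same_name` are made up for this example.

```python
# Groups of spellings treated as "the same name".
NAME_VARIANTS = [
    # Taken from the quoted post.
    frozenset({"Kristina", "Christina", "Cristine", "Kristine"}),
    # Hypothetical extra group, not from the thread.
    frozenset({"Johan", "Johann", "John"}),
]


def same_name(a, b, groups=NAME_VARIANTS):
    """Return True if a and b are identical or share a variant group."""
    if a == b:
        return True
    # Matching is just the 'in' operator applied to each set.
    return any(a in group and b in group for group in groups)


print(same_name("Kristina", "Cristine"))  # True
print(same_name("Kristina", "Johan"))     # False
```

The lookup is exact, so any spelling that nobody thought to list simply fails to match.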

> Comments?

A soundex (or similar) algorithm will be better in the long run for the
less common, but more often misspelled names. It's fairly simple to
guess at a number of common spellings for names that *you* think are
common now, but what about names that run in families that aren't yours,
or aren't that common outside of that family, or were wildly popular a
couple of hundred years ago but have fallen out of favor now?
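
As a rough idea of what that means in code, here is a minimal sketch of the classic American Soundex rules. It is an assumption that this particular variant is what one would want; it only handles the letters A-Z (so it would need extending for names with characters such as Å or Ö), and the function name `soundex` is just a label for the example.

```python
# Map each coded consonant to its Soundex digit.
_CODES = {}
for _digit, _letters in enumerate(
    ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1
):
    for _letter in _letters:
        _CODES[_letter] = str(_digit)


def soundex(name):
    """Return the four-character Soundex code for name ('' if no A-Z letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    result = [first]
    prev = _CODES.get(first, "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(c, "")
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    # Pad with zeros and truncate to one letter plus three digits.
    return ("".join(result) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))        # R163 R163
print(soundex("Kristina"), soundex("Kristine"))    # K623 K623
print(soundex("Christina"))                        # C623
```

Note that plain Soundex keeps the first letter, so "Kristina" and "Christina" still come out different. That is one reason to look at the "or similar" algorithms, or to combine a phonetic code with an explicit list of variants.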

My wife's ancestors (she's the genealogist, I just get to hear the
horror stories) are notorious for being somewhat illiterate; for
changing their names, on purpose, after a feud, in order to "distance"
themselves from their relatives; and also for using not-common-now (or
even not-so-common-then) names. Add in somewhat illiterate record
keepers and hospital workers (or midwives or neighbors), not to mention
bad copies of bad copies of centuries-old smudged documents, and you
have an instant soup of names that sound alike but are spelled
differently in ways you cannot guess ahead of time.

Your users will appreciate *some* sort of fuzzy matching, or runtime
extensibility, atop the "obvious" spellings you take the time to include
in your software. And that's *not* a comment on your abilities; it's a
comment on the abilities and creativity of their ancestors.
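
One possible shape for that combination, sketched with the standard library's `difflib.get_close_matches`: built-in spellings are matched exactly, near misses fall back to fuzzy matching, and users can register new variants at runtime. The class `NameIndex`, its methods, and the 0.8 cutoff are all assumptions made for this example, not anything from the thread.

```python
import difflib


class NameIndex:
    """Maps known spellings to a canonical name, with a fuzzy fallback."""

    def __init__(self, groups):
        # groups: {canonical name: iterable of variant spellings}
        self._canonical = {}
        for canonical, variants in groups.items():
            self.add_variant(canonical, canonical)
            for variant in variants:
                self.add_variant(canonical, variant)

    def add_variant(self, canonical, variant):
        """Register a spelling at runtime (the 'extensibility' part)."""
        self._canonical[variant.lower()] = canonical

    def lookup(self, name, cutoff=0.8):
        """Return the canonical name for name, or None if nothing is close."""
        key = name.lower()
        if key in self._canonical:
            return self._canonical[key]
        # Fall back to the closest known spelling, if it is similar enough.
        close = difflib.get_close_matches(
            key, list(self._canonical), n=1, cutoff=cutoff
        )
        return self._canonical[close[0]] if close else None


index = NameIndex({"Kristina": ["Christina", "Cristine", "Kristine"]})
print(index.lookup("Christina"))   # exact match on a listed variant
print(index.lookup("Kristinna"))   # misspelling caught by the fuzzy fallback
index.add_variant("Kristina", "Stina")  # user-supplied spelling
print(index.lookup("Stina"))
```

The cutoff is a trade-off: lower values catch more odd spellings but also produce more false matches, so it probably wants to be user-tunable as well.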

Dan