Re: Python's re module and genealogy problem

From Dan Sommers <dan@tombstonezero.net>
Newsgroups comp.lang.python
Subject Re: Python's re module and genealogy problem
Date 2014-06-14 05:14 +0000
Organization A noiseless patient Spider
Message-ID <lnglo9$pg4$1@dont-email.me>
References <bvr01iFu926U1@mid.individual.net> <c00ivgF5cjpU1@mid.individual.net>

On Fri, 13 Jun 2014 17:17:06 +0200, BrJohan wrote:

> Or to put the namevariants in some sequence of sets having elements
> like:  ("Kristina", "Christina", "Cristine", "Kristine")

> Matching is then just applying the 'in' operator.

That's definitely a better approach, for the reasons you mentioned.
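
A minimal sketch of that approach, assuming the variants are kept as a
list of sets.  The first group is the one from the quoted example; the
second group and the names NAME_VARIANTS and same_name are made up for
illustration.

```python
# Hypothetical variant groups; only the first comes from the thread.
NAME_VARIANTS = [
    frozenset({"Kristina", "Christina", "Cristine", "Kristine"}),
    frozenset({"Johan", "Johann", "John"}),  # invented for illustration
]


def same_name(a, b, groups=NAME_VARIANTS):
    """Return True if a and b are identical or share a variant group."""
    if a == b:
        return True
    # Matching is just the 'in' operator applied to each group.
    return any(a in group and b in group for group in groups)


if __name__ == "__main__":
    print(same_name("Kristina", "Cristine"))
    print(same_name("Kristina", "Johan"))
```

The lookup is exact, so a spelling that is in no group matches only
itself.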

> Comments?

A Soundex (or similar phonetic) algorithm will be better in the long run
for the less common, but more often misspelled, names.  It's fairly
simple to guess at a number of common spellings for names that *you*
think are common now, but what about names that run in families that
aren't yours, or aren't that common outside of that family, or were
wildly popular a couple of hundred years ago but have fallen out of
favor now?
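
For reference, here is a rough sketch of classic American Soundex in
plain Python.  It is an illustration, not a recommendation of one
particular variant: the function name soundex is my own, and the sketch
assumes names in the unaccented Latin alphabet (any other character is
simply dropped).

```python
# Letter-to-digit table for American Soundex.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the four-character Soundex code of name, or '' if empty."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]                  # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")  # its digit still suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue                   # H and W do not separate equal codes
        digit = _CODES.get(ch, "")     # vowels and Y have no digit
        if digit and digit != prev:
            code += digit
        prev = digit                   # a vowel resets the repeat check
    return (code + "000")[:4]          # pad or truncate to four characters


if __name__ == "__main__":
    for n in ("Kristina", "Kristine", "Christina", "Robert", "Rupert"):
        print(n, soundex(n))
```

With this sketch "Kristina" and "Kristine" both come out as K623, but
"Christina" comes out as C623, because Soundex always keeps the first
letter.  That is one reason to combine a phonetic code with explicit
variant sets rather than rely on either alone.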

My wife's ancestors (she's the genealogist, I just get to hear the
horror stories) are notorious for being somewhat illiterate; for
changing their names, on purpose, after a feud, in order to "distance"
themselves from their relatives; and also for using not-common-now (or
even not-so-common-then) names.  Add in somewhat illiterate record
keepers and hospital workers (or midwives or neighbors), not to mention
bad copies of bad copies of centuries-old smudged documents, and you
have an instant soup of names that sound alike but are spelled
differently in ways you cannot guess ahead of time.

Your users will appreciate *some* sort of fuzzy matching, or runtime
extensibility, atop the "obvious" spellings you take the time to include
in your software.  And that's *not* a comment on your abilities; it's a
comment on the abilities and creativity of their ancestors.
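
One possible shape for that, sketched with the standard library's
difflib.  The class NameIndex, its method names, and the 0.75 cutoff are
all assumptions for illustration; a real program would want to tune the
cutoff against real records.

```python
import difflib


class NameIndex:
    """Known spellings plus fuzzy lookup; extensible at run time."""

    def __init__(self, known_names):
        self._names = set(known_names)

    def add(self, name):
        """Let the user register a spelling the author never thought of."""
        self._names.add(name)

    def candidates(self, name, n=3, cutoff=0.75):
        """Return up to n known names that resemble name, best first."""
        return difflib.get_close_matches(
            name, sorted(self._names), n=n, cutoff=cutoff)


if __name__ == "__main__":
    index = NameIndex(["Kristina", "Christina", "Cristine", "Kristine"])
    print(index.candidates("Kristyna"))
    index.add("Krystyna")              # added by the user at run time
    print(index.candidates("Kristyna"))
```

difflib compares character sequences, not sounds, so it complements a
phonetic code rather than replacing it.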

Dan

Thread

Python's re module and genealogy problem BrJohan <brjohan@gmail.com> - 2014-06-11 14:23 +0200
  Re: Python's re module and genealogy problem Robert Kern <robert.kern@gmail.com> - 2014-06-11 14:26 +0100
    Re: Python's re module and genealogy problem Mark H Harris <harrismh777@gmail.com> - 2014-06-11 09:08 -0500
  Re: Python's re module and genealogy problem Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de> - 2014-06-11 15:55 +0200
  Re: Python's re module and genealogy problem Michael Torrie <torriem@gmail.com> - 2014-06-11 09:34 -0600
  Re: Python's re module and genealogy problem Nick Cash <nick.cash@npcinternational.com> - 2014-06-11 16:21 +0000
  Re: Python's re module and genealogy problem Simon Ward <simon@bleah.co.uk> - 2014-06-11 18:21 +0100
  Re: Python's re module and genealogy problem Vlastimil Brom <vlastimil.brom@gmail.com> - 2014-06-11 20:09 +0200
  Re: Python's re module and genealogy problem BrJohan <brjohan@gmail.com> - 2014-06-13 17:17 +0200
    Re: Python's re module and genealogy problem Peter Otten <__peter__@web.de> - 2014-06-13 18:26 +0200
    Re: Python's re module and genealogy problem Dan Sommers <dan@tombstonezero.net> - 2014-06-14 05:14 +0000
  Re: Python's re module and genealogy problem Tony the Tiger <tony@tiger.invalid> - 2014-06-14 08:35 +0000
