From: Vlastimil Brom
To: python
Newsgroups: comp.lang.python
Subject: Re: Python's re module and genealogy problem
Date: Wed, 11 Jun 2014 20:09:07 +0200

2014-06-11 14:23 GMT+02:00 BrJohan:
> For some genealogical purposes I consider using Python's re module.
>...
>
> Now, my problem: Is there a way to decide whether any two - or more - of
> those regular expressions will match the same string?
>
> Or, stated a little differently:
>
> Can it, for a pair of regular expressions be decided whether at least one
> string matching both of those regular expressions, can be constructed?
> --
> https://mail.python.org/mailman/listinfo/python-list

Hi,
I guess you could reuse one of the available generators of strings matching a given regular expression, see e.g.:
http://stackoverflow.com/questions/492716/reversing-a-regular-expression-in-python/
for example a pyparsing recipe:
http://stackoverflow.com/questions/492716/reversing-a-regular-expression-in-python/5006339#5006339
which might be general enough for your needs. Of course, you cannot use unbounded quantifiers, backreferences, etc. with it.

Then you can test for identical strings in the generated outputs, e.g. using set(...) and its intersection method.

You might also check the much more powerful regex library
https://pypi.python.org/pypi/regex
which, beyond other features, also supports the mentioned fuzzy matches, cf.
>>> import regex
>>> regex.findall(r"\bSm(?:ith){e<3}\b", "Smith Smithe Smyth Smythe Smijth")
['Smith', 'Smithe', 'Smyth', 'Smythe', 'Smijth']
>>>

(but, of course, you will have to be careful with this feature in order to reduce false positives)

hth,
 vbr
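
A minimal sketch of the set-intersection test suggested above. It does not use the pyparsing recipe from the linked answer; instead it brute-forces every string over a small alphabet up to a length bound and keeps those that re.fullmatch accepts, which is only workable for short names. The function names, the two surname patterns, the alphabet and the length bound are all made up for illustration.

```python
import re
from itertools import product


def matching_strings(pattern, alphabet, max_len):
    """Return all strings over `alphabet` of length <= max_len that `pattern` fully matches.

    Brute force: the number of candidates grows as len(alphabet) ** max_len,
    so keep both small.
    """
    compiled = re.compile(pattern)
    found = set()
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if compiled.fullmatch(candidate):
                found.add(candidate)
    return found


def common_matches(pattern_a, pattern_b, alphabet, max_len):
    """Return the strings (within the bounds) matched by both patterns."""
    set_a = matching_strings(pattern_a, alphabet, max_len)
    set_b = matching_strings(pattern_b, alphabet, max_len)
    return set_a.intersection(set_b)


# Hypothetical surname patterns, not taken from the original question.
smith_like = r"Sm[iy]the?"
smyth_like = r"Smy(?:th|the|t)"

shared = common_matches(smith_like, smyth_like, alphabet="Smiythe", max_len=6)
print(sorted(shared))  # -> ['Smyth', 'Smythe']
```

An empty result only says that no common string exists within the chosen alphabet and length bound; it does not prove that the two expressions can never match the same string.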