Date: Tue, 19 Nov 2013 20:26:18 +1100
Subject: Re: Automation
From: Chris Angelico <rosuav@gmail.com>
To: python-list@python.org
Newsgroups: comp.lang.python

On Tue, Nov 19, 2013 at 7:53 PM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> Aoilegpos for aidnoptg a cdocianorttry vwpiienot but, ttoheliacrley
> spkeaing, lgitehnneng the words can mnartafucue an iocnuurgons
> samenttet that is vlrtiauly isbpilechmoenrne.

isbpilechmoenrne. I totally want to find an excuse to use that word
somewhere. It just looks awesome.

Paradoxically, it's actually more likely that a computer can figure
out what you're saying here. In fact, I could easily write a little
script that reads /usr/share/dict/words (or equivalent) and attempts
to decode your paragraph. Hmm. You know what, I think I will. It's now
0958 UTC, let's see how long this takes me.

Meh. I did something stupid and decided to use a regular expression.
It's now 1020 UTC, so that's 21 minutes of figuring out what I was
doing wrong with the regex and 1 minute solving the original problem.
But here's your translated paragraph:

-- cut --
Interestingly I'm studying this controversial phenomenon at the
Department of Linguistics at Absytrytewh University and my
extraordinary discoveries wholeheartedly contradict the picsbeliud
findings regarding the relative difficulty of instantly translating
sentences. My researchers developed a convenient contraption at
hnasoa/tw.nartswdbvweos/utrtek:p./il that demonstrates that the
hypothesis uniquely warrants credibility if the assumption that the
preponderance of your words is not extended is unquestionable.
Apologies for adopting a contradictory viewpoint but, theoretically
speaking, lengthening the words can manufacture an incongruous
statement that is virtually incomprehensible.
-- cut --

It couldn't figure out "Absytrytewh", "picsbeliud", or
"hnasoa/tw.nartswdbvweos/utrtek:p./il". That's not a bad result. (And
as a human, I'm guessing that the second one isn't an English word -
maybe it's Scots?) Here's the code:

import re

def transform(word):
    # Keep the first and last letters, sort the ones in between, so every
    # scrambling of a word ends up with the same key.
    return word if len(word)==1 else word[0]+''.join(sorted(word[1:-1]))+word[-1]

# Map each key to the set of dictionary words that produce it, in both
# lowercase and capitalized form.
words = {}
for word in open("/usr/share/dict/words"):
    word=word.strip().lower()
    transformed = transform(word)
    words.setdefault(transformed,set()).add(word)
    words.setdefault(transformed.capitalize(),set()).add(word.capitalize())

for line in open("input"):
    line=line.strip()
    # The capturing group keeps the separators in the list; they just fail
    # the lookup below and are left alone.
    for word in re.split(r"(\W+)",line):
        try:
            realword=words[transform(word)]
            # More than one candidate: show the whole set rather than guess.
            if len(realword)>1: realword=repr(realword)
            else: realword=next(iter(realword))
            line=line.replace(word,realword)
        except LookupError: # empty token or unknown word: don't translate anything
            pass
    print(line)


Yeah, it's not the greatest code, but it works :)
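
For anyone who wants to try the idea without a system word list, here's a
minimal sketch of the same first-and-last-letter trick run against a tiny
made-up dictionary. The names sort_key, build_index and decode, and the
sample word list, are assumptions for illustration only; they aren't from
the script above, and this version marks ambiguous words as {a|b} instead
of printing a set.

```python
import re

def sort_key(word):
    """Keep the first and last letters; sort the interior letters."""
    if len(word) <= 2:
        return word
    return word[0] + "".join(sorted(word[1:-1])) + word[-1]

def build_index(wordlist):
    """Map each key to the set of lowercase words that share it."""
    index = {}
    for w in wordlist:
        w = w.strip().lower()
        if w:
            index.setdefault(sort_key(w), set()).add(w)
    return index

def decode(text, index):
    """Replace each run of letters with its dictionary match, if any."""
    def fix(match):
        token = match.group(0)
        candidates = index.get(sort_key(token.lower()))
        if not candidates:
            return token  # unknown word: leave it untouched
        if len(candidates) > 1:
            return "{" + "|".join(sorted(candidates)) + "}"  # ambiguous
        word = next(iter(candidates))
        return word.capitalize() if token[0].isupper() else word
    return re.sub(r"[A-Za-z]+", fix, text)

# Hypothetical stand-in for /usr/share/dict/words.
SAMPLE_WORDS = ["lengthening", "the", "words", "can", "manufacture",
                "form", "from"]
index = build_index(SAMPLE_WORDS)
print(decode("Lgitehnneng the wdors can mnartafucue xyzzy.", index))
```

Doing the substitution per match with re.sub avoids replacing a token that
happens to appear inside some other word on the same line.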

ChrisA