From: Chris Angelico
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: Automation
Date: Tue, 19 Nov 2013 20:26:18 +1100

On Tue, Nov 19, 2013 at 7:53 PM, Ian Kelly wrote:
> Aoilegpos for aidnoptg a cdocianorttry vwpiienot but, ttoheliacrley
> spkeaing, lgitehnneng the words can mnartafucue an iocnuurgons
> samenttet that is vlrtiauly isbpilechmoenrne.

isbpilechmoenrne. I totally want to find an excuse to use that word
somewhere.. It just looks awesome.

Paradoxically, it's actually more likely that a computer can figure
out what you're saying here. In fact, I could easily write a little
script that reads /usr/share/dict/words (or equivalent) and attempts
to decode your paragraph. Hmm. You know what, I think I will. It's now
0958 UTC, let's see how long this takes me.

Meh.
I did something stupid and decided to use a regular expression. It's
now 1020 UTC, so that's 21 minutes of figuring out what I was doing
wrong with the regex and 1 minute solving the original problem. But
here's your translated paragraph:

-- cut --
Interestingly I'm studying this controversial phenomenon at the
Department of Linguistics at Absytrytewh University and my
extraordinary discoveries wholeheartedly contradict the picsbeliud
findings regarding the relative difficulty of instantly translating
sentences. My researchers developed a convenient contraption at
hnasoa/tw.nartswdbvweos/utrtek:p./il that demonstrates that the
hypothesis uniquely warrants credibility if the assumption that the
preponderance of your words is not extended is unquestionable.
Apologies for adopting a contradictory viewpoint but, theoretically
speaking, lengthening the words can manufacture an incongruous
statement that is virtually incomprehensible.
-- cut --

It couldn't figure out "Absytrytewh", "picsbeliud", or
"hnasoa/tw.nartswdbvweos/utrtek:p./il". That's not a bad result. (And
as a human, I'm guessing that the second one isn't an English word -
maybe it's Scots?)

Here's the code:

import re

# Map each scrambled key (first letter + sorted middle + last letter)
# to the set of dictionary words that share it.
words = {}
for word in open("/usr/share/dict/words"):
    word=word.strip().lower()
    transformed = word if len(word)==1 else word[0]+''.join(sorted(word[1:-1]))+word[-1]
    words.setdefault(transformed,set()).add(word)
    words.setdefault(transformed.capitalize(),set()).add(word.capitalize())

for line in open("input"):
    line=line.strip()
    # The capturing group keeps the separators in the split result.
    for word in re.split(r"(\W+)",line):
        try:
            transformed = word if len(word)==1 else word[0]+''.join(sorted(word[1:-1]))+word[-1]
            realword=words[transformed]
            # Ambiguous keys are shown as the repr of the candidate set.
            if len(realword)>1: realword=repr(realword)
            else: realword=next(iter(realword))
            line=line.replace(word,realword)
        except LookupError:
            # catches three errors, all of which mean we shouldn't translate anything
            pass
    print(line)

Yeah, it's not the greatest code, but it works :)

ChrisA
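
For readers who want to try the idea without a system word list, here is a minimal sketch of the same lookup technique: index each dictionary word under its first letter, sorted interior letters, and last letter, then look scrambled tokens up by that key. It is a sketch under stated assumptions, not the script from the post: the names scramble_key, build_index and decode are illustrative, SAMPLE_WORDS is a tiny stand-in for /usr/share/dict/words, and it rebuilds each token with re.sub rather than calling str.replace on the whole line.

```python
import re

def scramble_key(word):
    """Return first letter + sorted interior letters + last letter."""
    if len(word) <= 3:
        # Nothing to reorder: the interior has at most one letter.
        return word
    return word[0] + "".join(sorted(word[1:-1])) + word[-1]

def build_index(wordlist):
    """Map each scramble key to the set of words that produce it."""
    index = {}
    for word in wordlist:
        word = word.strip().lower()
        if not word:
            continue  # skip blank lines
        index.setdefault(scramble_key(word), set()).add(word)
    return index

def decode(text, index):
    """Replace each alphabetic token with its dictionary match, if any."""
    def fix(match):
        token = match.group(0)
        candidates = index.get(scramble_key(token.lower()))
        if not candidates:
            return token  # unknown token: leave it untouched
        if len(candidates) > 1:
            # Ambiguous key: show every candidate instead of guessing.
            return "{" + "|".join(sorted(candidates)) + "}"
        word = next(iter(candidates))
        # Restore a leading capital if the scrambled token had one.
        return word.capitalize() if token[0].isupper() else word
    return re.sub(r"[A-Za-z]+", fix, text)

# Assumption: a tiny stand-in dictionary instead of /usr/share/dict/words.
SAMPLE_WORDS = ["apologies", "the", "words", "can", "that", "is",
                "virtually", "lengthening", "speaking"]
index = build_index(SAMPLE_WORDS)
print(decode("lgitehnneng the words", index))  # lengthening the words
```

Substituting per token means a short scrambled word cannot clobber a substring of a longer one, which a whole-line str.replace can do. Words that share a key (for example "form" and "from") are reported together rather than resolved.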