From: Stefan Behnel
Newsgroups: comp.lang.python
Subject: Re: Snowball to Python compiler
Date: Fri, 22 Apr 2011 09:50:04 +0200
References: <7xei4vqgi5.fsf@ruckus.brouhaha.com>

Terry Reedy, 22.04.2011 05:48:
> On 4/21/2011 8:25 PM, Paul Rubin wrote:
>> Matt Chaput writes:
>>> I'm looking for some code that will take a Snowball program and
>>> compile it into a Python script. Or, less ideally, a Snowball
>>> interpreter written in Python.
>>>
>>> (http://snowball.tartarus.org/)
>>>
>>> Anyone heard of such a thing?
>>
>> I never saw snowball before, it looks kind of interesting, and it
>> looks like it already has a way to compile to C. If you're using
>> it for IR on any scale, you're surely much better off using the C
>> routines with a C API wrapper,
>
> If the C routines are in a shared library, you should be able to write the
> interface in Python with ctypes.

Since it appears that the code has to get compiled anyway, Cython is likely
a better option, as it makes it easier to write a fast and Pythonic wrapper.

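For comparison, the ctypes route Terry mentions might look roughly like the
sketch below. It assumes the generated C code has been built as a shared
libstemmer library exposing the sb_stemmer_* functions and that
find_library("stemmer") can locate it; the helper names load_libstemmer()
and stem() are made up for illustration. Note the encode/decode round trip
on every call, which is the overhead a tighter wrapper would try to avoid.

```python
import ctypes
import ctypes.util


def load_libstemmer():
    """Return a configured ctypes handle for libstemmer, or None if absent."""
    name = ctypes.util.find_library("stemmer")  # assumed library name
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
        # Declare the signatures of the four libstemmer calls used below.
        lib.sb_stemmer_new.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
        lib.sb_stemmer_new.restype = ctypes.c_void_p
        lib.sb_stemmer_stem.argtypes = [ctypes.c_void_p, ctypes.c_char_p,
                                        ctypes.c_int]
        lib.sb_stemmer_stem.restype = ctypes.c_void_p
        lib.sb_stemmer_length.argtypes = [ctypes.c_void_p]
        lib.sb_stemmer_length.restype = ctypes.c_int
        lib.sb_stemmer_delete.argtypes = [ctypes.c_void_p]
        lib.sb_stemmer_delete.restype = None
    except (OSError, AttributeError):
        # Library failed to load or does not export the expected symbols.
        return None
    return lib


def stem(lib, word, algorithm="english"):
    """Stem one word, paying for a UTF-8 encode and decode on each call."""
    stemmer = lib.sb_stemmer_new(algorithm.encode("ascii"), b"UTF_8")
    if not stemmer:
        raise ValueError("unknown algorithm: %r" % algorithm)
    try:
        data = word.encode("utf-8")
        result = lib.sb_stemmer_stem(stemmer, data, len(data))
        if not result:
            raise MemoryError("sb_stemmer_stem failed")
        # The result buffer belongs to the stemmer, so copy it before freeing.
        length = lib.sb_stemmer_length(stemmer)
        return ctypes.string_at(result, length).decode("utf-8")
    finally:
        lib.sb_stemmer_delete(stemmer)
```

With the library installed, stem(load_libstemmer(), "running") should hand
back the English stem; creating a new stemmer per call keeps the sketch
short but would be wasteful in real use.
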
From a quick look, Snowball also has a "-widechar" option that could allow
interfacing directly with Python's Unicode strings in 16-bit Unicode builds
(but not 32-bit builds!). That would provide for really fast wrappers that
do not even need an intermediate encoding step.

And PEP 393 would eventually allow including both a UTF-8 and a 16-bit
version of the (prefixed) Snowball code, and choosing between them
depending on the internal layout of the processed string, with the obvious
fallback to UTF-8 encoding only for strings that really exceed the lower
16-bit Unicode range.

That sounds like a really nice project.

Stefan
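
P.S. The PEP 393 dispatch described above could be sketched in pure Python
as follows. This is only an illustration of the decision, not a wrapper:
the function name and the "widechar"/"utf8" labels are hypothetical, and a
real C-level wrapper would presumably inspect the string's internal kind
(e.g. via PyUnicode_KIND) instead of scanning the characters.

```python
def pick_stemmer_variant(word):
    """Say which (hypothetical) Snowball build would suit `word`."""
    # Strings whose code points all fit in 16 bits are candidates for the
    # 16-bit ("-widechar") build; an empty string trivially qualifies.
    if not word or max(map(ord, word)) <= 0xFFFF:
        return "widechar"
    # Anything beyond U+FFFF falls back to encoding as UTF-8 first.
    return "utf8"


for sample in ("running", "\u00fcber", "\U0001F600"):
    print(repr(sample), "->", pick_stemmer_variant(sample))
```
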