Path: csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail
From: Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de>
Newsgroups: comp.lang.python
Subject: Re: non printable (moving away from Perl)
Date: Fri, 11 Mar 2016 17:34:14 +0100
Lines: 27
Message-ID: <mailman.26.1457714063.26429.python-list@python.org>
References: <nbt27u$fe7$1@gioia.aioe.org> <mailman.17.1457698399.26429.python-list@python.org> <nbukcd$gs2$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
In-Reply-To: <nbukcd$gs2$1@gioia.aioe.org>
Precedence: list
Xref: csiph.com comp.lang.python:104623

On 11.03.2016 15:23, Fillmore wrote:
> On 03/11/2016 07:13 AM, Wolfgang Maier wrote:
>> One lesson for Perl regex users is that in Python many things can be
>> solved without regexes.
>> How about defining:
>>
>> printable = {chr(n) for n in range(32, 127)}
>>
>> then using:
>>
>> if (set(my_string) - set(printable)):
>>      break
>
> seems computationally heavy. I have a file with about 70k lines, of
> which only 20 contain "funny" chars.
>

Not sure what you call computationally heavy. I just test-parsed a 30 MB 
file (28k lines) with:

with open(my_file) as i:
    for line in i:
        # strip the trailing newline first: '\n' is not in printable
        if set(line.rstrip('\n')) - printable:
            continue

and it finished in less than a second.
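
For what it's worth, here is a minimal sketch of how the same set-difference
test could also report which lines are affected and which characters triggered
the match. The function name find_funny_lines and the in-memory sample are
made up for illustration, and I'm assuming "printable" means ASCII 32 to 126
as in the set defined earlier. The line terminator is stripped before the
check, since '\n' and '\r' fall outside that range.

```python
import io

# Printable ASCII: space (32) through tilde (126), as in the set defined earlier.
printable = {chr(n) for n in range(32, 127)}


def find_funny_lines(lines):
    """Yield (line_number, offending_chars) for lines with non-printable chars.

    Hypothetical helper, not part of any library.
    """
    for lineno, line in enumerate(lines, start=1):
        # Drop the line terminator first: '\n' and '\r' are outside 32..126.
        extra = set(line.rstrip('\r\n')) - printable
        if extra:
            yield lineno, extra


# Hypothetical in-memory sample standing in for the real file.
sample = io.StringIO("plain ascii\nna\u00efve caf\u00e9\ttabbed\nanother clean line\n")
for lineno, extra in find_funny_lines(sample):
    print(lineno, sorted(extra))
```

Any iterable of strings works as input, so an open file object can be passed
in directly. Building one small set per line keeps the work proportional to
the file size, which is why this stays cheap even for tens of thousands of
lines.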