Re: non printable (moving away from Perl)

From Ian Kelly <ian.g.kelly@gmail.com>
Newsgroups comp.lang.python
Subject Re: non printable (moving away from Perl)
Date Fri, 11 Mar 2016 10:08:07 -0700

On Fri, Mar 11, 2016 at 9:34 AM, Wolfgang Maier
<wolfgang.maier@biologie.uni-freiburg.de> wrote:
> On 11.03.2016 15:23, Fillmore wrote:
>>
>> On 03/11/2016 07:13 AM, Wolfgang Maier wrote:
>>>
>>> One lesson for Perl regex users is that in Python many things can be
>>> solved without regexes.
>>> How about defining:
>>>
>>> printable = {chr(n) for n in range(32, 127)}
>>>
>>> then using:
>>>
>>> if (set(my_string) - set(printable)):
>>>      break
>>
>>
>> seems computationally heavy. I have a file with about 70k lines, of
>> which only 20 contain "funny" chars.
>>
>
> Not sure what you call computationally heavy. I just test-parsed a 30 MB
> file (28k lines) with:
>
> with open(my_file) as i:
>     for line in i:
>         if set(line) - printable:
>             continue
>
> and it finished in less than a second.

Did your test file contain on the order of 100 unique characters, or
on the order of 100,000?  Granted, most input data would likely fall
into the former category.
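
To make the question concrete, here is a minimal sketch, not taken from the thread, that puts the quoted set-difference check next to a plain character scan. The function names and the sample strings are hypothetical. The set version has to build a set of every distinct character in a line before subtracting, so its intermediate set grows with the number of unique characters; the scan needs no intermediate set and stops at the first offending character. One assumption worth stating: lines read by iterating over a file keep their trailing newline, which lies outside the 32..126 range, so the sketch strips it before checking.

```python
# Illustrative sketch only; the function names are hypothetical, not from the thread.
PRINTABLE = frozenset(chr(n) for n in range(32, 127))  # printable ASCII, as in the quoted post


def has_nonprintable_set(line):
    """Set-difference check: first builds a set of every distinct character."""
    return bool(set(line) - PRINTABLE)


def has_nonprintable_scan(line):
    """Character scan: stops at the first character outside PRINTABLE."""
    return any(ch not in PRINTABLE for ch in line)


def flagged_lines(lines, check=has_nonprintable_scan):
    """Return the lines that contain a non-printable character.

    Lines read from a file keep their trailing newline, which is itself
    outside the 32..126 range, so it is stripped before checking.
    """
    return [line for line in lines if check(line.rstrip("\n"))]


# Two made-up inputs of the kind the question distinguishes:
few_unique = "abc" * 10_000  # 3 distinct characters
many_unique = "".join(chr(n) for n in range(0x4E00, 0x4E00 + 20_000))  # 20,000 distinct

for text in (few_unique, many_unique):
    # Both checks give the same answer; only the work done to reach it differs.
    assert has_nonprintable_set(text) == has_nonprintable_scan(text)
    print(len(text), len(set(text)))
```

Timing the two checks on one's own data (for example with the standard timeit module) would show whether the difference matters in practice; for mostly-ASCII input it may well not.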
