Regular expressions

Started by: Seymore4Head <Seymore4Head@Hotmail.invalid>
First post: 2015-11-02 20:09 -0500
Last post: 2015-11-03 22:15 +0000
Articles 6 on this page of 106 — 30 participants

#98232 — Re: Irregular last line in a text file, was Re: Regular expressions

From: Tim Chase <python.list@tim.thechases.com>
Date: 2015-11-04 09:33 -0600
Subject: Re: Irregular last line in a text file, was Re: Regular expressions
Message-ID: <mailman.20.1446651776.16136.python-list@python.org>
In reply to: #98205
On 2015-11-04 14:39, Steven D'Aprano wrote:
> On Wednesday 04 November 2015 03:56, Tim Chase wrote:
>> Or even more valuable to me:
>> 
>>   with open(..., newline="strip") as f:
>>     assert all(not line.endswith(("\n", "\r")) for line in f)
> 
> # Works only on Windows text files.
> def chomp(lines):
>     for line in lines:
>         yield line.rstrip('\r\n')

.rstrip() takes a string that is a set of characters, so it will
remove any \r or \n at the end of the string (so it works with
both Windows & *nix line-endings) whereas just using .rstrip()
without a parameter can throw away data you might want:

  >>> "hello \r\n\r\r\n\n\n".rstrip("\r\n")
  'hello '
  >>> "hello \r\n\r\r\n\n\n".rstrip()
  'hello'

-tkc
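
As a minimal sketch of the behaviour described above (the `chomp` name follows the quoted post; the sample strings are invented for illustration): the argument to `str.rstrip` is treated as a set of characters rather than a literal suffix, so one call handles LF, CR/LF and bare CR endings while leaving trailing spaces alone.

```python
def chomp(lines):
    """Yield each line with trailing CR/LF characters removed."""
    for line in lines:
        # "\r\n" is a set of characters to strip, not a suffix to match.
        yield line.rstrip("\r\n")


# Invented sample data: Unix, Windows, bare-CR, and unterminated lines.
samples = ["unix \n", "windows \r\n", "bare cr \r", "no terminator "]
stripped = list(chomp(samples))
print(stripped)

# The order of the characters in the argument does not matter.
assert "text\n\r".rstrip("\r\n") == "text\n\r".rstrip("\n\r") == "text"
```

One consequence worth keeping in mind: a run of several terminators at the end of a string is removed in full, so this is not the same as removing exactly one line ending.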





#98179 — Re: Irregular last line in a text file, was Re: Regular expressions

From: Peter Otten <__peter__@web.de>
Date: 2015-11-03 18:44 +0100
Subject: Re: Irregular last line in a text file, was Re: Regular expressions
Message-ID: <mailman.40.1446572684.8789.python-list@python.org>
In reply to: #98163
Tim Chase wrote:

> On 2015-11-03 16:35, Peter Otten wrote:
>> I wish there were a way to prohibit such files. Maybe a special
>> value
>> 
>> with open(..., newline="normalize") as f:
>>     assert all(line.endswith("\n") for line in f)
>> 
>> to ensure that all lines end with "\n"?
> 
> Or even more valuable to me:
> 
>   with open(..., newline="strip") as f:
>     assert all(not line.endswith(("\n", "\r")) for line in f)
> 
> because I have countless loops that look something like
> 
>   with open(...) as f:
>     for line in f:
>       line = line.rstrip('\r\n')
>       process(line)

Indeed. It's obvious now you're saying it...



#98183 — Re: Irregular last line in a text file, was Re: Regular expressions

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2015-11-03 11:33 -0700
Subject: Re: Irregular last line in a text file, was Re: Regular expressions
Message-ID: <mailman.44.1446575677.8789.python-list@python.org>
In reply to: #98163
On Tue, Nov 3, 2015 at 9:56 AM, Tim Chase <python.list@tim.thechases.com> wrote:
> On 2015-11-03 16:35, Peter Otten wrote:
>> I wish there were a way to prohibit such files. Maybe a special
>> value
>>
>> with open(..., newline="normalize") as f:
>>     assert all(line.endswith("\n") for line in f)
>>
>> to ensure that all lines end with "\n"?
>
> Or even more valuable to me:
>
>   with open(..., newline="strip") as f:
>     assert all(not line.endswith(("\n", "\r")) for line in f)
>
> because I have countless loops that look something like
>
>   with open(...) as f:
>     for line in f:
>       line = line.rstrip('\r\n')
>       process(line)

What would happen if you read a file opened like this without
iterating over lines?
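
To make the question concrete, here is a rough sketch. `newline="strip"` is only a proposal in this thread and does not exist in `open()`, so the second half merely emulates what a whole-file read might look like if the terminators were dropped; the byte string is invented sample data.

```python
import io

raw = b"one\r\ntwo\nthree"  # invented sample data with mixed line endings

# Default text mode (newline=None): line endings are translated to "\n"
# and read() returns them as part of one string.
with io.TextIOWrapper(io.BytesIO(raw), encoding="ascii") as f:
    whole = f.read()

# Hypothetical emulation of newline="strip" applied to a whole-file read:
# with the terminators gone, nothing marks where one line ends.
with io.TextIOWrapper(io.BytesIO(raw), encoding="ascii") as f:
    stripped_whole = "".join(line.rstrip("\r\n") for line in f)

print(repr(whole))
print(repr(stripped_whole))
```

Under this emulation the line boundaries are lost, which is presumably the ambiguity the question points at: such a mode would need a defined behaviour for `read()` as well as for iteration.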



#98184 — Re: Irregular last line in a text file, was Re: Regular expressions

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2015-11-03 11:39 -0700
Subject: Re: Irregular last line in a text file, was Re: Regular expressions
Message-ID: <mailman.45.1446576019.8789.python-list@python.org>
In reply to: #98163
On Tue, Nov 3, 2015 at 11:33 AM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> On Tue, Nov 3, 2015 at 9:56 AM, Tim Chase <python.list@tim.thechases.com> wrote:
>> Or even more valuable to me:
>>
>>   with open(..., newline="strip") as f:
>>     assert all(not line.endswith(("\n", "\r")) for line in f)
>>
>> because I have countless loops that look something like
>>
>>   with open(...) as f:
>>     for line in f:
>>       line = line.rstrip('\r\n')
>>       process(line)
>
> What would happen if you read a file opened like this without
> iterating over lines?

I think I'd go with this:

>>> def strip_newlines(iterable):
...     for line in iterable:
...         yield line.rstrip('\r\n')
...
>>> list(strip_newlines(['one\n', 'two\r', 'three']))
['one', 'two', 'three']

Or if I care about optimizing the for loop (but we're talking about
file I/O, so probably not), this might be faster:

>>> import operator
>>> def strip_newlines(iterable):
...     return map(operator.methodcaller('rstrip', '\r\n'), iterable)
...
>>> list(strip_newlines(['one\n', 'two\r', 'three']))
['one', 'two', 'three']

Then the iteration is just:
    for line in strip_newlines(f):
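
For completeness, a self-contained sketch of that iteration against a file-like object. The in-memory `io.BytesIO` source and its contents are assumptions for illustration; `newline=""` is used so that the original terminators reach `strip_newlines` untranslated.

```python
import io
import operator


def strip_newlines(iterable):
    # Lazily apply line.rstrip("\r\n") to every line.
    return map(operator.methodcaller("rstrip", "\r\n"), iterable)


raw = b"one\ntwo\r\nthree"  # invented sample data, last line unterminated

# newline="" keeps "\n" and "\r\n" as they appear in the data.
with io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline="") as f:
    lines = list(strip_newlines(f))

print(lines)
```

Because `map` is lazy in Python 3, the result has to be consumed while the file is still open, as the `list()` call inside the `with` block does here.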



#98188 — Re: Irregular last line in a text file, was Re: Regular expressions

From: Tim Chase <python.list@tim.thechases.com>
Date: 2015-11-03 13:45 -0600
Subject: Re: Irregular last line in a text file, was Re: Regular expressions
Message-ID: <mailman.48.1446581076.8789.python-list@python.org>
In reply to: #98163
On 2015-11-03 11:39, Ian Kelly wrote:
> >> because I have countless loops that look something like
> >>
> >>   with open(...) as f:
> >>     for line in f:
> >>       line = line.rstrip('\r\n')
> >>       process(line)  
> >
> > What would happen if you read a file opened like this without
> > iterating over lines?  
> 
> I think I'd go with this:
> 
> >>> def strip_newlines(iterable):  
> ...     for line in iterable:
> ...         yield line.rstrip('\r\n')
> ...

Behind the scenes, this is what I usually end up doing, but the
effective logic is the same.  I just like the notion of being able to
tell open() that I want iteration to happen over the *content* of
the lines, ignoring the new-line delimiters.

I can't think of more than 1-2 times in my last 10+ years of
Pythoning that I've actually had potential use for the newlines,
usually on account of simply feeding the entire line back into some
filelike.write() method where I wanted the newlines in the resulting
file. But even in those cases, I seem to recall stripping off the
arbitrary newlines (LF vs. CR/LF) and then adding my own known line
delimiter.

-tkc
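
The "strip, then add a known delimiter" approach mentioned above might look roughly like this. The `normalize_eol` helper and the sample lines are hypothetical names and data, not code from the thread.

```python
import io


def normalize_eol(lines, eol="\n"):
    """Yield each line with its original terminator replaced by eol."""
    for line in lines:
        # Drop whatever LF / CR/LF ending arrived, then add a known one.
        yield line.rstrip("\r\n") + eol


source = ["one\r\n", "two\n", "three"]  # invented input, mixed endings
out = io.StringIO()
out.writelines(normalize_eol(source))
result = out.getvalue()
print(repr(result))
```

A side effect of this design is that an unterminated last line also receives the delimiter, which may or may not be wanted depending on the consumer of the file.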




#98190 — Re: Irregular last line in a text file, was Re: Regular expressions

From: Grant Edwards <invalid@invalid.invalid>
Date: 2015-11-03 22:15 +0000
Subject: Re: Irregular last line in a text file, was Re: Regular expressions
Message-ID: <n1bbmu$qse$1@reader1.panix.com>
In reply to: #98188
On 2015-11-03, Tim Chase <python.list@tim.thechases.com> wrote:

[re. iterating over lines in a file]

> I can't think of more than 1-2 times in my last 10+ years of
> Pythoning that I've actually had potential use for the newlines,

If you can think of 1-2 times when you've been iterating over the
lines in a file and wanted to see the EOL markers, then that's 1-2
times more than I've ever wanted to see them since I started using
Python 16 years ago...

-- 
Grant Edwards               grant.b.edwards        Yow! !  Up ahead!  It's a
                                  at               DONUT HUT!!
                              gmail.com            
