
Sharing: File Reader Generator with & w/o Policy

Started by	Mark H Harris <harrismh777@gmail.com>
First post	2014-03-15 16:38 -0500
Last post	2014-03-16 10:36 +0000
Articles	15 — 5 participants


  Sharing: File Reader Generator  with & w/o Policy Mark H Harris <harrismh777@gmail.com> - 2014-03-15 16:38 -0500
    Re: Sharing: File Reader Generator  with & w/o Policy MRAB <python@mrabarnett.plus.com> - 2014-03-15 21:56 +0000
      Re: Sharing: File Reader Generator  with & w/o Policy Mark H Harris <harrismh777@gmail.com> - 2014-03-15 19:36 -0500
      Re: Sharing: File Reader Generator  with & w/o Policy Mark H Harris <harrismh777@gmail.com> - 2014-03-15 19:45 -0500
      Re: Sharing: File Reader Generator  with & w/o Policy Mark H Harris <harrismh777@gmail.com> - 2014-03-15 20:06 -0500
        Re: Sharing: File Reader Generator  with & w/o Policy Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-16 01:32 +0000
          Re: Sharing: File Reader Generator  with & w/o Policy Mark H Harris <harrismh777@gmail.com> - 2014-03-15 21:52 -0500
    Re: Sharing: File Reader Generator  with & w/o Policy Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-16 02:01 +0000
      Re: Sharing: File Reader Generator  with & w/o Policy Mark H Harris <harrismh777@gmail.com> - 2014-03-15 22:34 -0500
        Re: Sharing: File Reader Generator with & w/o Policy Chris Angelico <rosuav@gmail.com> - 2014-03-16 14:48 +1100
          Re: Sharing: File Reader Generator with & w/o Policy Mark H Harris <harrismh777@gmail.com> - 2014-03-15 23:47 -0500
            Re: Sharing: File Reader Generator with & w/o Policy Chris Angelico <rosuav@gmail.com> - 2014-03-16 16:41 +1100
              Re: Sharing: File Reader Generator with & w/o Policy Mark H Harris <harrismh777@gmail.com> - 2014-03-16 01:19 -0500
                Re: Sharing: File Reader Generator with & w/o Policy Chris Angelico <rosuav@gmail.com> - 2014-03-16 17:37 +1100
            Re: Sharing: File Reader Generator with & w/o Policy Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-16 10:36 +0000

#68370 — Sharing: File Reader Generator with & w/o Policy

From	Mark H Harris <harrismh777@gmail.com>
Date	2014-03-15 16:38 -0500
Subject	Sharing: File Reader Generator with & w/o Policy
Message-ID	<lg2h87$6fj$1@speranza.aioe.org>

hi folks, I am posting to share a File Reader Generator which I have
been playing with, that simplifies reading of text files on-demand:
like log files, config files, small record flat data-bases, &c.

I have two generators to share, one with & one without "policy".
The idea is to have the generator open and close the file (with error
checking:  try-finally block) and then maintain its state for on-demand
reading either into memory (as list or dict) or for in-line processing.

I will demonstrate the generators here, and then post the code 
following. The generator will be reading a path+filename of a local disk 
file and printing it as in this simple case without policy:
 >>> from my_utils import *

 >>> for record in fName(path+"my_fox"):
	      print(record)

The quick brown fox jumped
over the lazy dog's tail.

Now is the time for all
good women to come to the
aid of computer science!
 >>>

The second generator adds "policy" to the generator processing and
yields tuples, rather than strings. Each tuple contains the record 
number (from zero), and record length (minus the line end), and the 
record itself (stripped of the line end):
 >>>
 >>> for record in fnName(path+"my_fox"):
	      print(record)

(0, 26, 'The quick brown fox jumped')
(1, 25, "over the lazy dog's tail.")
(2, 0, '')
(3, 23, 'Now is the time for all')
(4, 25, 'good women to come to the')
(5, 24, 'aid of computer science!')
 >>>
 >>>

I will now share the source by allowing the fName(filename) utility
to expose itself.  Enjoy:
 >>>
 >>> for record in fName(path+"my_utils.py"):
	      print(record)

#---------------------------------------------------------
# fName(filename)   generator: file reader iterable
#---------------------------------------------------------
def fName(filename):
     try:
         fh = open(filename, 'r')
     except FileNotFoundError as err_code:
         print (err_code)
     else:
         while True:
             linein = fh.readline()
             if (linein!=''):
                 yield(linein.strip('\n'))
             else:
                 break
         fh.close()
     finally:
         None

#---------------------------------------------------------
# fnName(filename)   generator: file reader iterable
#---------------------------------------------------------
def fnName(filename):
     try:
         fh = open(filename, 'r')
     except FileNotFoundError as err_code:
         print (err_code)
     else:
         line_count = 0
         while True:
             linein = fh.readline()
             if (linein!=''):
                 lineout = linein.strip('\n')
                 length = len(lineout)
                 yield((line_count, length, lineout))
                 line_count+=1
             else:
                 break
         fh.close()
     finally:
         None

#---------------------------------------------------------
# {next util}
#---------------------------------------------------------
 >>>

mark h harris


#68371

From	MRAB <python@mrabarnett.plus.com>
Date	2014-03-15 21:56 +0000
Message-ID	<mailman.8156.1394920773.18130.python-list@python.org>
In reply to	#68370

On 2014-03-15 21:38, Mark H Harris wrote:
> hi folks, I am posting to share a File Reader Generator which I have
> been playing with, that simplifies reading of text files on-demand:
> like log files, config files, small record flat data-bases, &c.
>
> I have two generators to share, one with & one without "policy".
> The idea is to have the generator open and close the file (with error
> checking:  try-finish block) and then maintain its state for on-demand
> reading either into memory (as list or dict) or for in-line processing.
>
> I will demonstrate the generators here, and then post the code
> following. The generator will be reading a path+filename of a local disk
> file and printing it as in this simple case without policy:
>   >>> from my_utils import *
>
>   >>> for record in fName(path+"my_fox"):
> 	      print(record)
>
> The quick brown fox jumped
> over the lazy dog's tail.
>
> Now is the time for all
> good women to come to the
> aid of computer science!
>   >>>
>
> The second generator adds "policy" to the generator processing and
> yields tuples, rather than strings. Each tuple contains the record
> number (from zero), and record length (minus the line end), and the
> record itself (stripped of the line end):
>   >>>
>   >>> for record in fnName(path+"my_fox"):
> 	      print(record)
>
> (0, 26, 'The quick brown fox jumped')
> (1, 25, "over the lazy dog's tail.")
> (2, 0, '')
> (3, 23, 'Now is the time for all')
> (4, 25, 'good women to come to the')
> (5, 24, 'aid of computer science!')
>   >>>
>   >>>
>
> I will now share the source by allowing the fName(filename) utility
> to expose itself.  Enjoy:
>   >>>
>   >>> for record in fName(path+"my_utils.py"):
> 	      print(record)
>
> #---------------------------------------------------------
> # fName(filename)   generator: file reader iterable
> #---------------------------------------------------------
> def fName(filename):
>       try:
>           fh = open(filename, 'r')
>       except FileNotFoundError as err_code:
>           print (err_code)
>       else:
>           while True:
>               linein = fh.readline()
>               if (linein!=''):
>                   yield(linein.strip('\n'))
>               else:
>                   break
>           fh.close()
>       finally:
>           None
>
I don't like how it always swallows the exception, so you can't tell
whether the file doesn't exist or exists but is empty, and no way to
specify the file's encoding.

Why do you have the 'finally' clause with 'None' in it? Instead of None
you should have 'pass', or, better yet, omit the clause entirely.

You can also shorten it somewhat:

def fName(filename):
     try:
         with open(filename, 'r') as fh:
             for linein in fh:
                 yield linein.strip('\n')
     except FileNotFoundError as err_code:
         print(err_code)

[snip]
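
A minimal sketch of the two points raised above (letting the exception reach the caller, and making the encoding selectable). The name read_lines and the utf-8 default are assumptions for illustration, not something posted in this thread:

```python
import os
import tempfile

def read_lines(filename, encoding='utf-8'):
    """Yield each line of a text file without its trailing newline.

    Nothing is caught here: a missing file raises FileNotFoundError
    in the caller, on the first next() call.
    """
    with open(filename, 'r', encoding=encoding) as fh:
        for line in fh:
            yield line.rstrip('\n')

with tempfile.TemporaryDirectory() as tmp_dir:
    # An existing but empty file yields no lines at all.
    empty_path = os.path.join(tmp_dir, "empty.txt")
    open(empty_path, 'w').close()
    empty_result = list(read_lines(empty_path))

    # A missing file raises, so the two cases can be told apart.
    try:
        list(read_lines(os.path.join(tmp_dir, "no_such_file")))
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```

With this shape the caller decides whether to print, log, or retry.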


#68372

From	Mark H Harris <harrismh777@gmail.com>
Date	2014-03-15 19:36 -0500
Message-ID	<lg2rm6$t3p$1@speranza.aioe.org>
In reply to	#68371

On 3/15/14 4:56 PM, MRAB wrote:

> I don't like how it always swallows the exception, so you can't tell
> whether the file doesn't exist or exists but is empty, and no way to
> specify the file's encoding.

Yes, the error handling needs to be more robust, and instead of printing
the error, the actual version on my system will log it.
>
> Why do you have the 'finally' clause with 'None' in it? Instead of None
> you should have 'pass', or, better yet, omit the clause entirely.

It's a stub-in really, and that's all at this point.  The 'finally' 
happens regardless of whether the exception occurs, and I don't need
anything there yet, just don't want to forget it.

I've been playing around with wrapping generators within generators for 
readability and simplicity. Like this, where I'm going to wrap the 
fnName(filename) generator within a getnumline(filename) wrapper:

 >>> from my_utils import *

 >>> def getnumline(filename):
	      for record in fnName(filename):
		      yield(record)

 >>> line = getnumline("my_foxy")
 >>>
 >>> next(line)
(0, 26, 'The quick brown fox jumped')
 >>>
 >>> next(line)
(1, 25, "over the lazy dog's tail.")
 >>>

Or this, where I put it all in memory as a dict:

 >>> d1={}
 >>> for line in getnumline("my_foxy"):
	d1[line[0]]=(line[1], line[2])

 >>> for key in d1:
	print (d1[key])

(26, 'The quick brown fox jumped')
(25, "over the lazy dog's tail.")
(0, '')
(23, 'Now is the time for all')
(25, 'good women to come to the')
(24, 'aid of computer science!')
 >>>
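
For comparison, the same dict can probably be built in one expression with a dict comprehension. This is only a sketch; the records list below stands in for the tuples that getnumline() yields in the session above:

```python
# Stand-in for the (number, length, text) tuples shown in the session above.
records = [
    (0, 26, 'The quick brown fox jumped'),
    (1, 25, "over the lazy dog's tail."),
    (2, 0, ''),
]

# Unpack each tuple by name instead of indexing line[0], line[1], line[2].
d1 = {number: (length, text) for number, length, text in records}
```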

marcus


#68373

From	Mark H Harris <harrismh777@gmail.com>
Date	2014-03-15 19:45 -0500
Message-ID	<lg2s85$u7q$1@speranza.aioe.org>
In reply to	#68371

On 3/15/14 4:56 PM, MRAB wrote:
>
> def fName(filename):
>     try:
>         with open(filename, 'r') as fh:
>             for linein in fh:
>                 yield linein.strip('\n')
>     except FileNotFoundError as err_code:
>         print(err_code)
>
> [snip]
>

The "with" confuses me because I am not sure specifically what happens 
in the context manager. I'm taking it for granted in this case that 
__exit__() closes the file?

I am finding many examples of file handling using the context manager, 
but none so far that wrap it in a generator; more often they use a file object. 
Is there a preference for file object over generator?
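
A small experiment, sketched under assumptions, that checks what happens to the file when a "with" block sits inside a generator. The handles list is only there so the demo can inspect the file object; read_lines is a hypothetical name:

```python
import os
import tempfile

def read_lines(filename, handles):
    # `handles` exists only so the caller can look at the file object later.
    with open(filename, 'r') as fh:
        handles.append(fh)
        for line in fh:
            yield line.rstrip('\n')

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, 'w') as out:
        out.write("first\nsecond\nthird\n")

    handles = []
    gen = read_lines(path, handles)
    first = next(gen)                    # generator is suspended inside the with block
    open_while_suspended = not handles[0].closed
    gen.close()                          # GeneratorExit is raised at the yield
    closed_after_close = handles[0].closed
```

The file stays open while the generator is suspended and is closed once the generator is closed or runs to the end, because leaving the "with" block calls the file's __exit__().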

marcus


#68374

From	Mark H Harris <harrismh777@gmail.com>
Date	2014-03-15 20:06 -0500
Message-ID	<lg2te7$oq$1@speranza.aioe.org>
In reply to	#68371

On 3/15/14 4:56 PM, MRAB wrote:

>
> You can also shorten it somewhat:

Thanks, I like it... I shortened the fnName() also:

#---------------------------------------------------------
# fn2Name(filename)   generator: file reader iterable
#---------------------------------------------------------
def fn2Name(filename):
     try:
         with open(filename, 'r') as fh:    # <=========== can you tell me
             line_count = 0
             for linein in fh:
                 lineout = linein.strip('\n')
                 length = len(lineout)
                 yield((line_count, length, lineout))
                 line_count+=1
     except FileNotFoundError as err_code:
         print(err_code)

#---------------------------------------------------------


...  where I can go to find out (for specific contexts) what the 
__enter__() and __exit__() are actually doing, like for instance in this 
case does the file get closed in __exit__(), and also if errors 
occur does the file close automatically?  thanks

marcus


#68375

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2014-03-16 01:32 +0000
Message-ID	<mailman.8157.1394933542.18130.python-list@python.org>
In reply to	#68374

On 16/03/2014 01:06, Mark H Harris wrote:
> On 3/15/14 4:56 PM, MRAB wrote:
>
>>
>> You can also shorten it somewhat:
>
> Thanks, I like it... I shortened the fnName() also:
>
> #---------------------------------------------------------
> # fn2Name(filename)   generator: file reader iterable
> #---------------------------------------------------------
> def fn2Name(filename):
>      try:
>          with open(filename, 'r') as fh:    <=========== can you tell me
>              line_count = 0
>              for linein in fh:
>                  lineout = linein.strip('\n')
>                  length = len(lineout)
>                  yield((line_count, length, lineout))
>                  line_count+=1
>      except FileNotFoundError as err_code:
>          print(err_code)
>
> #---------------------------------------------------------
>
>
> ...  where I can go to find out (for specific contexts) what the
> __init__() and __exit__() are actually doing, like for instance in this
> case does the filename get closed in __exit__(), and also if errors
> occur does the file close automatically?  thanks
>
> marcus

Start here 
http://docs.python.org/3/library/stdtypes.html#context-manager-types
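
As a rough illustration of the protocol that page describes, here is a sketch of a hand-written context manager. The class name Traced is hypothetical; it only records the order in which "with" calls its methods:

```python
class Traced:
    """Hypothetical context manager that logs the calls made by `with`."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append('enter')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called on normal exit (exc_type is None) and on error alike.
        name = exc_type.__name__ if exc_type else 'no error'
        self.log.append('exit: ' + name)
        return False   # False means: do not suppress the exception

log = []
with Traced(log):
    log.append('body')

try:
    with Traced(log):
        raise ValueError('boom')
except ValueError:
    log.append('caught outside')
```

A file object follows the same pattern: its __enter__() returns the file itself and its __exit__() closes it, whether or not the body raised.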

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence



#68378

From	Mark H Harris <harrismh777@gmail.com>
Date	2014-03-15 21:52 -0500
Message-ID	<lg33lv$cp7$1@speranza.aioe.org>
In reply to	#68375

On 3/15/14 8:32 PM, Mark Lawrence wrote:

> Start here
> http://docs.python.org/3/library/stdtypes.html#context-manager-types
>

Thanks Mark. I have three books open, and that doc, and wading through. 
You might like to know (as an aside) that I'm done with gg. Got back up 
here with a real news reader and server. All is good that way. gg has 
not been stable over the past three weeks, and this weekend it 
completely quit working. It looks like this reader|client handles the 
line wrapping correctly. whoohoo.

marcus


#68377

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2014-03-16 02:01 +0000
Message-ID	<532505f0$0$29994$c3e8da3$5496439d@news.astraweb.com>
In reply to	#68370

On Sat, 15 Mar 2014 16:38:18 -0500, Mark H Harris wrote:

> hi folks, I am posting to share a File Reader Generator which I have
> been playing with, that simplifies reading of text files on-demand: like
> log files, config files, small record flat data-bases, &c.

Reading from files is already pretty simple. I would expect that it will 
be harder to learn the specific details of custom, specialised, file 
readers that *almost*, but not quite, do what you want, than to just 
write a couple of lines of code to do what you need when you need it. 
Particularly for interactive use, where robustness is less important than 
ease of use.

> I have two generators to share, one with & one without "policy". 

What's "policy"?

> The idea is to have the generator open and close the file (with error
> checking:  try-finish block) and then maintain its state for on-demand
> reading either into memory (as list or dict) or for in-line processing.
> 
> I will demonstrate the generators here, and then post the code
> following. The generator will be reading a path+filename of a local disk
> file and printing it as in this simple case without policy:
>
> >>> from my_utils import *
> >>> for record in fName(path+"my_fox"):
> 	      print(record)
> 
> The quick brown fox jumped
> over the lazy dog's tail.

What's "fName" mean? "File name"? That's a horribly misleading name, 
since it *takes* a file name as argument, it doesn't return one. That 
would be like renaming the len() function to "list", since it takes a 
list as argument. Function and class names should be descriptive, giving 
at least a hint as to what they do.

It looks to me that this fName just iterates over the lines in a file, 
which makes it pretty close to just:

for line in open(path + "my_fox"):
    print(line)

> The second generator adds "policy" to the generator processing and
> yields tuples, rather than strings. Each tuple contains the record
> number (from zero), and record length (minus the line end), and the
> record itself (stripped of the line end):

I presume that "record" here means "line", rather than an actual record 
from a flat file with fixed-width fields, or some delimiter other than 
newlines.

for i, line in enumerate(open(pathname + "my_fox")):
    print((i, len(line), line))

>  >>> for record in fnName(path+"my_fox"):
> 	      print(record)

What's "fnName" mean? Perhaps "filename name"? "function name"? Again, 
the name gives no hint as to what the function does.

> def fName(filename):
>      try:
>          fh = open(filename, 'r')
>      except FileNotFoundError as err_code:
>          print (err_code)

For interactive use, this is *just barely* acceptable as a (supposedly) 
user-friendly alternative to a stack trace. 

[Aside: I don't believe that insulating programmers from tracebacks does 
them any favours. Like the Dark Side of the Force, hiding errors is 
seductively attractive, but ultimately harmful, since error tracebacks 
are intimidating to beginners but an essential weapon in the battle 
against buggy code. But reading tracebacks is a skill programmers have to 
learn. Hiding tracebacks does them no favours, it just makes it harder 
for them to learn good debugging skills, and encourages them to treat 
errors as *something to hide* rather than *something to fix*.]

But as a reusable tool for use in non-interactive code, this function 
fails badly. By capturing the exception, it makes it painfully difficult 
for the caller to have control over error-handling. You cannot let the 
exception propagate to some other part of the application for handling; 
you cannot log the exception, or ignore it, or silently swallow the 
exception and try another file. The fName function makes the decision for 
you: it will print the error to standard output (not even standard 
error!) no matter what you want. That's the very essence of *user-
hostile* for library code.

Worse, it's inconsistent! Some errors are handled normally, with an 
exception. It's only FileNotFoundError that is captured and printed. So 
if the user wants to re-use this function and do something with any 
exceptions, she has to use *two* forms of error handling:

(1) wrap it in try...except handler to capture any exception other 
    than FileNotFoundError; and

(2) intercept writes to standard out, capture the error message, and 
    reverse-engineer what went wrong.

instead of just one.
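
To make the contrast concrete, here is a sketch of what the caller can do when the reader simply lets exceptions propagate. Both read_lines and first_readable are hypothetical names, not code from the original post:

```python
import os
import tempfile

def read_lines(filename):
    # No try/except here: any OSError from open() reaches the caller.
    with open(filename, 'r') as fh:
        for line in fh:
            yield line.rstrip('\n')

def first_readable(candidates):
    """Return the lines of the first file that opens, else raise the last error."""
    last_error = None
    for name in candidates:
        try:
            return list(read_lines(name))
        except OSError as err:      # FileNotFoundError is a subclass of OSError
            last_error = err
    if last_error is None:
        raise ValueError("no candidate file names given")
    raise last_error

with tempfile.TemporaryDirectory() as tmp_dir:
    present = os.path.join(tmp_dir, "present.txt")
    with open(present, 'w') as out:
        out.write("alpha\nbeta\n")
    missing = os.path.join(tmp_dir, "missing.txt")
    lines = first_readable([missing, present])
```

Here the decision to skip a missing file and try the next one lives in the caller, with a single form of error handling.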

>      else:
>          while True:
>              linein = fh.readline()
>              if (linein!=''):
>                  yield(linein.strip('\n'))
>              else:
>                  break
>          fh.close()

Apart from stripping newlines, which is surely better left to the user 
(what if they need to see the newline? by stripping them automatically, 
the user cannot distinguish between a file which ends with a newline 
character and one which does not), this part is just a re-invention of 
the existing wheel. File objects are already iterable, and yield the 
lines of the file.
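
A quick sketch of the information that is lost, using io.StringIO as a stand-in for two files that differ only in their final newline:

```python
import io

with_final_newline = io.StringIO("one\ntwo\n")
without_final_newline = io.StringIO("one\ntwo")

# Iterating a file-like object keeps the line endings.
raw_a = list(with_final_newline)
raw_b = list(without_final_newline)

# After stripping, the two inputs can no longer be told apart.
stripped_a = [line.rstrip('\n') for line in raw_a]
stripped_b = [line.rstrip('\n') for line in raw_b]
```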

>      finally:
>          None

The finally clause is pointless, and not even written idiomatically as a 
do-nothing statement ("pass").

> def fnName(filename):
>      try:
>          fh = open(filename, 'r')
>      except FileNotFoundError as err_code:
>          print (err_code)
>      else:
>          line_count = 0
>          while True:
>              linein = fh.readline()
>              if (linein!=''):
>                  lineout = linein.strip('\n')
>                  length = len(lineout)
>                  yield((line_count, length, lineout))
>                  line_count+=1
>              else:
>                  break
>          fh.close()
>      finally:
>          None

This function re-implements the fName function, except for a simple 
addition. It could be written as:

def fnName(filename):
    for count, line in enumerate(fName(filename)):
        yield (count, len(line), line)
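
A self-contained check of that composition against the sample file from the first post. The fName below is an assumed simplified variant (it strips the newline and lets open() errors propagate), not the original code:

```python
import os
import tempfile

def fName(filename):
    # Assumed variant: yields stripped lines, does not catch exceptions.
    with open(filename, 'r') as fh:
        for linein in fh:
            yield linein.rstrip('\n')

def fnName(filename):
    for count, line in enumerate(fName(filename)):
        yield (count, len(line), line)

FOX = (
    "The quick brown fox jumped\n"
    "over the lazy dog's tail.\n"
    "\n"
    "Now is the time for all\n"
    "good women to come to the\n"
    "aid of computer science!\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_fox")
    with open(path, 'w') as out:
        out.write(FOX)
    records = list(fnName(path))
```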

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/


#68380

From	Mark H Harris <harrismh777@gmail.com>
Date	2014-03-15 22:34 -0500
Message-ID	<lg364r$guq$1@speranza.aioe.org>
In reply to	#68377

On 3/15/14 9:01 PM, Steven D'Aprano wrote:
> Reading from files is already pretty simple. I would expect that it will
> be harder to learn the specific details of custom, specialised, file
> readers that *almost*, but not quite, do what you want, than to just
> write a couple of lines of code to do what you need when you need it.
> Particularly for interactive use, where robustness is less important than
> ease of use.

    Yes. What I'm finding is that I'm coding the same 4-6 lines of code 
with every file open (I do want error handling, at least for 
FileNotFoundError) and I only want it to be two lines, read the file 
into a list with error handling.
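
One possible shape for that helper, as a sketch only. The name load_lines and the default argument are assumptions; returning a default for a missing file is one policy among several:

```python
import os
import tempfile

def load_lines(filename, default=None):
    """Return the file's lines as a list, or `default` if the file is missing."""
    try:
        with open(filename, 'r') as fh:
            return [line.rstrip('\n') for line in fh]
    except FileNotFoundError:
        return default

with tempfile.TemporaryDirectory() as tmp_dir:
    fox = os.path.join(tmp_dir, "my_fox")
    with open(fox, 'w') as out:
        out.write("The quick brown fox jumped\nover the lazy dog's tail.\n")
    found = load_lines(fox)
    missing = load_lines(os.path.join(tmp_dir, "no_such_file"))
```

Because a missing file gives None and an empty file gives [], the caller can still tell the two apart.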

> What's "policy"?

    What I personally struggle with (frequently) is this: do I 
place the policy in the generator, or do I handle it on the outside? For 
instance, I normally strip the line-end and I want to know the record 
lengths. I also may want to know the record number from arrival 
sequence. This policy can be handled in the generator; although, I could 
have handled it outside too.

> for i, line in enumerate(open(pathname + "my_fox")):
>      print((i, len(line), line))

I like it...  and this is where I've always been, when I finally said to 
myself, yuk.  yes, it technically works very well. But, it's ugly. And I 
don't mean it's technically ugly, I mean it's aesthetically ugly and not 
user-easy-to-read.  (I know that's all subjective)

for line in getnumline(path+"my_foxy"):
       print(line)

In this case getnumline() is a generator wrapper around fnName(). It of 
course doesn't do anything different than the two lines you listed, but 
it is immediately easier to tell what is happening; even if you're not 
an experienced python programmer.

> [Aside: I don't believe that insulating programmers from tracebacks does
> them any favours.

Yes. I think you're right about that.  But what if they're not 
programmers; what if they're just application users that don't have a 
clue what a trace-back is, and just want to know that the file does not 
exist?  And right away they realize that, oops, I spelled the filename 
wrong.  Yeaah, I struggle with this as I'm trying to simplify, because 
personally I want to see the trace back info.

> Worse, it's inconsistent! Some errors are handled normally, with an
> exception. It's only FileNotFoundError that is captured and printed. So
> if the user wants to re-use this function and do something with any
> exceptions, she has to use *two* forms of error handling:

Yes. The exception handling needs to handle all normal errors.
>
> (1) wrap it in try...except handler to capture any exception other
>      than FileNotFoundError; and
>
> (2) intercept writes to standard out, capture the error message, and
>      reverse-engineer what went wrong.

Ok.

> Apart from stripping newlines, which is surely better left to the user
> (what if they need to see the newline? by stripping them automatically,
> the user cannot distinguish between a file which ends with a newline
> character and one which does not), this part is just a re-invention of
> the existing wheel. File objects are already iterable, and yield the
> lines of the file.

Yes, this is based on my use case, which never needs the line-ends, in 
fact they are a pain. These files are variable record length and the 
only thing the newline is used for is delimiting the records.

>
> def fnName(filename):
>      for count, line in enumerate(fName(filename)):
>          yield (count, len(line), line)
>
I like this, thanks!   enumerate and I are becoming friends.

I like this case philosophically because it is a both | and.  The policy 
is contained in the wrapper generator using enumerate() and len() 
leaving the fName() generator to produce the line.

And you are right about another thing,  I just want to use this thing 
over and over.

for line in getnumline(filename):
     {whatever}

    There does seem to be just one way of doing this (file reads) but 
there are actually many ways of doing this. Is a file object really 
better than a generator, are there good reasons for using the generator, 
are there absolute cases for using a file object?

marcus


#68382 — Re: Sharing: File Reader Generator with & w/o Policy

From	Chris Angelico <rosuav@gmail.com>
Date	2014-03-16 14:48 +1100
Subject	Re: Sharing: File Reader Generator with & w/o Policy
Message-ID	<mailman.8160.1394941726.18130.python-list@python.org>
In reply to	#68380

On Sun, Mar 16, 2014 at 2:34 PM, Mark H Harris <harrismh777@gmail.com> wrote:
> And you are right about another thing,  I just want to use this thing over
> and over.
>
> for line in getnumline(filename):
>     {whatever}
>
>    There does seem to be just one way of doing this (file reads) but there
> are actually many ways of doing this. Is a file object really better than a
> generator, are there good reasons for using the generator, are there
> absolute cases for using a file object?

I recommend you read up on the Rule of Three. Not the comedic
principle - although that's worth knowing about too - but the
refactoring rule. [1]

As a general rule, code should be put into a function when it's been
done three times the same way. It depends a bit on how similar the
versions are, of course; having two places where the exact same thing
is done might well be enough to refactor, and sometimes you need to
see four or five places doing something only broadly similar before
you can figure out what the common part is, but most of the time,
three usages is the point to give it a name.

There's a cost to refactoring. Suddenly there's a new primitive on the
board - a new piece of language. If you can't give it a good name,
that's potentially a high cost. Splitting out all sorts of things into
generators when you could use well-known primitives like enumerate
gets expensive fast - what's the difference between fName and fnName?
I certainly wouldn't be able to call that, without actually looking
them up.

Let your use-cases justify your refactoring.

ChrisA

[1] https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)


#68383 — Re: Sharing: File Reader Generator with & w/o Policy

From	Mark H Harris <harrismh777@gmail.com>
Date	2014-03-15 23:47 -0500
Subject	Re: Sharing: File Reader Generator with & w/o Policy
Message-ID	<lg3ad2$obv$1@speranza.aioe.org>
In reply to	#68382

On 3/15/14 10:48 PM, Chris Angelico wrote:
> There's a cost to refactoring. Suddenly there's a new primitive on the
> board - a new piece of language . . . Splitting out all sorts of things into
> generators when you could use well-known primitives like enumerate
> gets expensive fast {snip}
>
>
> [1] https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)

Very good to remember. I am finding the temptation to make all kinds of 
generators (as you noted above). It's just that the python generator 
makes it so easy to define a function that maintains state between calls 
(of next() in this case) and so it's also so easy to want to use them... 
almost forgetting about primitives!

And the rule of three is one of those things that sneaks up on oneself. 
I have actually coded about seven (7) such cases when I discovered that 
they were all identical. I am noticing that folks code the same file 
reader cases "with open() as fh: yadda yadda"  and I've noticed that 
they are all pretty close to the same. Wouldn't it be nice to have one 
simpler getline() or getnumline() name that does this one simple thing 
once and for all. But as simple as it is, it isn't. Well, as you say, 
use cases need to determine code refactoring.

The other thing I'm tempted to do is to find names (even new names) that 
read like English closely (whatever I mean by that) so that there is no 
question about what is going on to a non expert.

for line in getnumline(file):
       {whatever}

Well, what if there were a project called SimplyPy, or some such, that 
boiled the python language down to a (Rexx like) or (BASIC like) syntax 
and usage so that ordinary folks could code out problems (like they did 
in 1964) and expert users could use it too including everything else 
they know about python? Would it be good?

A SimplyPy coder would use constructs similar to other procedural 
languages (like Rexx, Pascal, even C) and without knowing the plethora 
of Python intrinsics could solve problems, yet not be an "expert".

SimplyPy would be a structured subset of the normal language for 
learning and use (very small book/tutorial/ think the Rexx handbook, or 
the K&R).

It's a long way off, and I'm just now experimenting. I'm trying to get my 
hands around context managers (and other things). This is an idea I got 
from Anthony Briggs' Hello Python! (foreword by Steve Holden) from Manning 
books.  It's very small, lightweight, handles real work, but--- it's still 
too big. I am wanting to condense it even further, providing the minimal 
basic core language as an end application product rather than the 
"expert" computer science language that will run under it.

or, over it, as you like.

(you think this is a nutty idea?)

marcus


#68384 — Re: Sharing: File Reader Generator with & w/o Policy

From	Chris Angelico <rosuav@gmail.com>
Date	2014-03-16 16:41 +1100
Subject	Re: Sharing: File Reader Generator with & w/o Policy
Message-ID	<mailman.8161.1394948516.18130.python-list@python.org>
In reply to	#68383

On Sun, Mar 16, 2014 at 3:47 PM, Mark H Harris <harrismh777@gmail.com> wrote:
> On 3/15/14 10:48 PM, Chris Angelico wrote:
>>
>> There's a cost to refactoring. Suddenly there's a new primitive on the
>> board - a new piece of language . . . Splitting out all sorts of things
>> into
>>
>> generators when you could use well-known primitives like enumerate
>> gets expensive fast {snip}
>>
>>
>> [1] https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)
>
>
> Very good to remember. I am finding the temptation to make all kinds of
> generators (as you noted above). Its just that the python generator makes it
> so easy to define a function that maintains state between calls (of next()
> in this case) and so its also so easy to want to use them... almost
> forgetting about primitives!

General rule of thumb: Every object in the same namespace should be
readily distinguishable by name alone. And if doing that makes your
names so long that the function signature is longer than the function
body, it might be better to not have that as a function :)

Also, I'd consider something code smell if a name is used in only one
place. Maybe not so much with local variables, as there are other
reasons to separate things out, but a function that gets called from
only one place probably doesn't need to exist at top-level. (Of
course, a published API will often seem to have unused or little-used
functions, because they're being provided to the caller. They don't
count.)

> And the rule of three is one of those things that sneaks up on oneself. I
> have actually coded about seven (7) such cases when I discovered that they
> were all identical. I am noticing that folks code the same file reader cases
> "with open() as fh: yadda yadda"  and I've noticed that they are all pretty
> close to the same. Wouldn't it be nice to have one simpler getline() or
> getnumline() name that does this one simple thing once and for all. But as
> simple as it is, it isn't. Well, as you say, use cases need to determine
> code refactoring.

If getline() is doing nothing that the primitive doesn't, and
getnumline is just enumerate, then they're not achieving anything
beyond shielding you from the primitives.

> The other thing I'm tempted to do is to find names (even new names) that
> read like English closely (whatever I mean by that) so that there is no
> question about what is going on to a non expert.
>
> for line in getnumline(file):
>       {whatever}

The trouble is that your idea of getnumline(file) might well differ
from someone else's idea of getnumline(file). Using Python's
primitives removes that confusion - if you see enumerate(file), you
know exactly what it's doing, even in someone else's code.
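
A minimal sketch of that point, written with the primitives only. The
sample text is illustrative, and io.StringIO is an assumed stand-in for
an open file so the snippet runs on its own; real code would use
"with open(filename) as fh:" instead.

```python
import io

# Stand-in for an open text file (assumption for illustration);
# in real code this would be: with open(filename) as fh:
fh = io.StringIO("The quick brown fox jumped\nover the lazy dog's tail.\n")

numbered = []
for count, line in enumerate(fh):
    line = line.rstrip("\n")  # drop the trailing newline only
    numbered.append((count, len(line), line))

for item in numbered:
    print(item)
```

Anyone who knows enumerate can read the loop without looking up a helper.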

> Well, what if there were a project called SimplyPy, or some such, that
> boiled the python language down to a (Rexx like) or (BASIC like) syntax and
> usage so that ordinary folks could code out problems (like they did in 1964)
> and expert users could use it too including everything else they know about
> python? Would it be good?
>
> A SimplyPy coder would use constructs similar to other procedural languages
> (like Rexx, Pascal, even C) and without knowing the plethora of Python
> intrinsics could solve problems, yet not be an "expert".
>
> SimplyPy would be a structured subset of the normal language for learning
> and use (very small book/tutorial/ think the Rexx handbook, or the K&R).
>
> It's a long way off, and I'm just now experimenting. I'm trying to get my
> hands around context managers (and other things). This is an idea I got from
> Anthony Briggs' Hello Python! (foreword by Steve Holden) from Manning books.
> It's very small, lightweight, handles real work, but--- it's still too big. I
> am wanting to condense it even further, providing the minimal basic core
> language as an end application product rather than the "expert" computer
> science language that will run under it.
>
> or, over it, as you like.
>
> (you think this is a nutty idea?)

To be quite frank, yes I do think it's a nutty idea. Like most nutty
things, there's a kernel of something good in it, but that's not
enough to build a system on :)

Python is already pretty simple. The trouble with adding a layer of
indirection is that you'll generally be limiting what the code can do,
which is usually a bad idea for a general purpose programming
language, and also forcing you to predict everything the programmer
might want to do. Or you might have an "escape clause" that lets the
programmer drop to "real Python"... but as soon as you allow that, you
suddenly force the subsequent reader to comprehend all of Python,
defeating the purpose.

We had a discussion along these lines a little while ago, about
designing a DSL [1] for window creation. On one side of the debate was
"hey look how much cleaner the code is if I use this DSL", and on the
other side was "hey look how much work you don't have to do if you
just write code directly". The more cushioning between the programmer
and the language, the more the cushion has to be built to handle
everything the programmer might want to do. Python is a buffer between
me and C. C is itself a buffer between me and assembly language. Each
of them provides something that I want, but each of them has to be so
extensive as to be able to handle _anything_ I might want to write.
(Or, pretty much anything. Sometimes I find that a high level language
lacks some little thing - recently I was yearning for a beep feature -
and find that I can shell out to some external utility to do it for
me.) Creating SimplyPy would put the onus on you to make it possible
to write general code in it, and I think you'll find it's just not
worth trying - more and more you'll want to add features from Python
itself, until you achieve the inner-platform effect. [2]

Note that there are times when this sort of cushioning and limiting
are absolutely appropriate. Sometimes you want to limit end users to
certain pieces of functionality only, and the easiest way to do it is
to create a special-purpose language that is interpreted by a (say)
Python script. Or maybe you want to write a dice roller that takes a
specific notation like "2d8 + d6 (fire) + 2d6 (sneak attack) + 4
(STR)" [3] and interprets that appropriately. But the idea isn't to
simplify general programming, then, and if you're doing that sort of
thing, you still might want to consider a general-purpose language
(Lua's good for that, as is JavaScript/ECMAScript). That's not what
you're suggesting.
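
As a rough illustration of how small such a special-purpose notation can
be, here is a hypothetical sketch of a parser for the dice expression
above. It is not the Minstrel Hall roller; the names TERM, parse and roll
are invented for this example, and it assumes terms are joined only by
"+" and that tags never contain "+" or ")".

```python
import random
import re

# One term: "2d8", "d6", or a flat bonus like "4", with an optional "(tag)".
TERM = re.compile(r"\s*(?:(\d*)d(\d+)|(\d+))\s*(?:\(([^)]*)\))?\s*")


def parse(notation):
    """Split a '+'-separated dice expression into (count, sides, tag) terms.

    A flat bonus is stored with sides == 0 and count == the bonus value.
    """
    terms = []
    for chunk in notation.split("+"):
        match = TERM.fullmatch(chunk)
        if match is None:
            raise ValueError("cannot parse term: %r" % chunk)
        count, sides, flat, tag = match.groups()
        if flat is not None:
            terms.append((int(flat), 0, tag))
        else:
            # A bare "d6" means one die.
            terms.append((int(count or 1), int(sides), tag))
    return terms


def roll(notation, rng=random):
    """Roll every term of the expression and return the total."""
    total = 0
    for count, sides, _tag in parse(notation):
        if sides == 0:
            total += count
        else:
            total += sum(rng.randint(1, sides) for _ in range(count))
    return total


print(parse("2d8 + d6 (fire) + 2d6 (sneak attack) + 4 (STR)"))
```

The language accepts dice terms and nothing else, which is the kind of
deliberate limiting described above.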

ChrisA

[1] Domain-specific language. I'm never sure whether to footnote these
kinds of acronyms, but I want to clarify that I am not talking about a
copper-based internet connection here.
[2] http://thedailywtf.com/Articles/The_Inner-Platform_Effect.aspx and
https://en.wikipedia.org/wiki/Inner-platform_effect
[3] I actually have a dice roller that does exactly that as part of
Minstrel Hall - http://minstrelhall.com/


#68385 — Re: Sharing: File Reader Generator with & w/o Policy

From	Mark H Harris <harrismh777@gmail.com>
Date	2014-03-16 01:19 -0500
Subject	Re: Sharing: File Reader Generator with & w/o Policy
Message-ID	<lg3fq7$25s$1@speranza.aioe.org>
In reply to	#68384

On 3/16/14 12:41 AM, Chris Angelico wrote:
>

    Good stuff Chris, and thanks for the footnotes, I appreciate it.

> If getline() is doing nothing that the primitive doesn't, and
> getnumline is just enumerate, then they're not achieving anything
> beyond shielding you from the primitives.
>

    Yes.  getline(fn) is returning the raw line minus the newline \n. 
And getnumline(fn) is 1) creating a name that is easily recognizable, 
and 2) shielding the 'user' from the primitives; yup.

> The trouble is that your idea of getnumline(file) might well differ
> from someone else's idea of getnumline(file). Using Python's
> primitives removes that confusion

    I am seeing that; esp for folks used to seeing the primitives; don't 
want confusion.

> To be quite frank, yes I do think it's a nutty idea. Like most nutty
> things, there's a kernel of something good in it, but that's not
> enough to build a system on :)

    Thanks for your candor. I appreciate that too. Well, like I said, 
I'm just experimenting with the idea right now, just playing around 
really. In the process I'm coming more up-to-speed with python3.3 all 
the time.   :)

>
> Python is already pretty simple.

    statement == True

>
> We had a discussion along these lines a little while ago, about
> designing a DSL [1] for window creation. On one side of the debate was
> "hey look how much cleaner the code is if I use this DSL", and on the
> other side was "hey look how much work you don't have to do if you
> just write code directly".

    Was that on python-dev, or python-ideas, or here?   I'd like to read 
through it sometime.

Well, just for grins, here is the updated my_utils.py for comparison with 
where I started tonight, ending like before with the code:

 >>>
 >>> for line in getline(path+"my_foxy"):
	print (line)

The quick brown fox jumped
over the lazy dog's tail.

Now is the time for all
good women to come to the
aid of computer science!

 >>> for line in getnumline(path+"my_foxy"):
	print (line)
	
(0, 26, 'The quick brown fox jumped')
(1, 25, "over the lazy dog's tail.")
(2, 0, '')
(3, 23, 'Now is the time for all')
(4, 25, 'good women to come to the')
(5, 24, 'aid of computer science!')

 >>> for line in getline(path+"my_utils.py"):
	print (line)

#---------------------------------------------------------
# __fOpen__(filename)   generator: file open internal
#---------------------------------------------------------
def __fOpen__(filename):
     try:
         with open(filename, 'r') as fh:
             for linein in fh:
                 yield linein.strip('\n')
     except FileNotFoundError as err_code:
         print(err_code)
         # think about error handling, logging
     finally:
         pass

#---------------------------------------------------------
# getnumline(filename)  generator: enumerated file reader
#---------------------------------------------------------
def getnumline(filename):
     for count, line in enumerate(__fOpen__(filename)):
         yield((count, len(line), line))

#---------------------------------------------------------
# getline(filename)   generator: raw file reader iterable
#---------------------------------------------------------
def getline(filename):
     for line in __fOpen__(filename):
         yield(line)

#---------------------------------------------------------
# {next util}
#---------------------------------------------------------
 >>>
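
One thing worth keeping in mind about the "think about error handling"
comment: a generator body does not run until the first next(), so
open() is only attempted once iteration starts. The sketch below is a
hypothetical variant (read_lines is an invented name, not part of
my_utils.py) that lets FileNotFoundError reach the caller instead of
printing it; the file name is assumed not to exist.

```python
def read_lines(filename):
    """Yield each line of filename without its trailing newline.

    Hypothetical variant of __fOpen__: there is no except clause, so a
    missing file raises FileNotFoundError in the caller.
    """
    with open(filename, "r") as fh:
        for line in fh:
            yield line.rstrip("\n")


gen = read_lines("no_such_file_for_this_sketch.txt")  # nothing opened yet
try:
    next(gen)  # the generator body runs only now, so open() fails here
    outcome = "opened"
except FileNotFoundError:
    outcome = "missing"
print(outcome)
```

With the printing version, a caller looping over a missing file simply
sees zero lines; whether that is the desired policy is a design choice.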


#68386 — Re: Sharing: File Reader Generator with & w/o Policy

From	Chris Angelico <rosuav@gmail.com>
Date	2014-03-16 17:37 +1100
Subject	Re: Sharing: File Reader Generator with & w/o Policy
Message-ID	<mailman.8162.1394951846.18130.python-list@python.org>
In reply to	#68385

On Sun, Mar 16, 2014 at 5:19 PM, Mark H Harris <harrismh777@gmail.com> wrote:
> On 3/16/14 12:41 AM, Chris Angelico wrote:
>> To be quite frank, yes I do think it's a nutty idea. Like most nutty
>> things, there's a kernel of something good in it, but that's not
>> enough to build a system on :)
>
>
>    Thanks for your candor. I appreciate that too. Well, like I said, I'm
> just experimenting with the idea right now, just playing around really. In
> the process I'm coming more up-to-speed with python3.3 all the time.   :)

Good, glad you can take it the right way :) Learning is not by doing
whatever you like and being told "Oh yes, very good job" like in
kindergarten. Learning is by doing something (or proposing doing
something) and getting solid feedback. Of course, that feedback may be
wrong - your idea might be brilliant even though I believe it's a bad
one - and you need to know when to stick to your guns and drive your
idea forward through the hail of oncoming ... okay, this metaphor's
getting a bit tangled in its own limbs... okay, the meta-metaphor is
getting... alright I'm stopping now.

>> We had a discussion along these lines a little while ago, about
>> designing a DSL [1] for window creation. On one side of the debate was
>> "hey look how much cleaner the code is if I use this DSL", and on the
>> other side was "hey look how much work you don't have to do if you
>> just write code directly".
>
>    Was that on python-dev, or python-ideas, or here?   I'd like to read
> through it sometime.

Was here on python-list:

https://mail.python.org/pipermail/python-list/2014-January/664617.html
https://mail.python.org/pipermail/python-list/2014-January/thread.html#664617

The thread rambled a bit, but if you like reading, there's some good
content in there. You'll see some example code from my Pike MUD
client, Gypsum, and Steven D'Aprano and I discuss it. If you'd rather
skip most of the thread and just go to the bits I'm talking about,
here's my explanation of the Pike code:

https://mail.python.org/pipermail/python-list/2014-January/665286.html

And here's Steven's take on it:

https://mail.python.org/pipermail/python-list/2014-January/665356.html

And keep reading from there. TL;DR: It's not perfect as a DSL, but
it's jolly good as something that is already there and takes no
effort.

ChrisA


#68392 — Re: Sharing: File Reader Generator with & w/o Policy

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2014-03-16 10:36 +0000
Subject	Re: Sharing: File Reader Generator with & w/o Policy
Message-ID	<53257eab$0$29994$c3e8da3$5496439d@news.astraweb.com>
In reply to	#68383

On Sat, 15 Mar 2014 23:47:30 -0500, Mark H Harris wrote:

> The other thing I'm tempted to do is to find names (even new names) that
> read like English closely (whatever I mean by that) so that there is no
> question about what is going on to a non expert.
> 
> for line in getnumline(file):
>        {whatever}

I'm not an expert on your code, and I have very little idea what that is 
supposed to do. Judging by the name "getnumline", my first guess is that 
the function takes a line number n, and it will return the nth line of 
some source:

getnumline(source, 5)
=> returns the 5th line from source

But that's not how you use it. You pass it a "file". Is that a file 
object, or a file name? My guess is that it would be a file object, since 
if you wanted a file name you would have written getnumline(filename). Is 
that a file object that is open for reading or writing? I'd have to guess 
that it's open for reading, since you're (probably?) "getting" from the 
file rather than "putting".

So... something like this:

file = open("some thing")
for line in getnumline(file):
    ...

Presumably it iterates over the lines of the file, but what it does with 
the lines is hard to say. If I had to guess, I'd say... maybe it's 
extracting the lines that start with a line number? Something like this 
perhaps?

def getnumline(file_object):
    count = 0  # Or start at 1?
    while True:
        line = file_object.readline()
        if line == '':
            break
        if line.startswith(str(count)):
            yield line
        count += 1

But this is only a guess, based on the assumption that while the function 
name is misleading, it's not *entirely* misleading. I'm puzzled why the 
function claims to do something with "line" singular, when you're 
obviously using it to iterate over lines plural.

Contrast that with an example from the Python built-ins: enumerate. What 
you get is exactly what it says on the tin: the function is called 
enumerate, and enumerate is what it does:

enumerate
    v 1: specify individually; "She enumerated the many obstacles
         she had encountered"; "The doctor recited the list of
         possible side effects of the drug" [syn: enumerate,
         recite, itemize, itemise]
    2: determine the number or amount of; "Can you count the books
       on your shelf?"; "Count your change" [syn: count, number,
       enumerate, numerate]
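
A short sketch of both senses in practice; the list contents are made up
for illustration. The optional start argument also covers the "Or start
at 1?" question.

```python
lines = ["alpha", "beta", "gamma"]

# enumerate pairs each item with a running count, starting at 0 by default.
pairs = list(enumerate(lines))

# The optional start argument shifts the count, e.g. for 1-based line numbers.
numbered = list(enumerate(lines, start=1))

print(pairs)
print(numbered)
```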

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
