


The "right" way to use config files

Started by: Fabien <fabien.maussion@gmail.com>
First post: 2014-08-09 13:48 +0200
Last post: 2014-08-10 10:33 +0200
Articles: 10 (5 participants)



Contents

  The "right" way to use config files Fabien <fabien.maussion@gmail.com> - 2014-08-09 13:48 +0200
    Re: The "right" way to use config files Ben Finney <ben+python@benfinney.id.au> - 2014-08-09 22:17 +1000
      Re: The "right" way to use config files Fabien <fabien.maussion@gmail.com> - 2014-08-09 14:33 +0200
        Re: The "right" way to use config files Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-08-09 12:16 -0400
          Re: The "right" way to use config files Fabien <fabien.maussion@gmail.com> - 2014-08-09 19:17 +0200
    Re: The "right" way to use config files Tim Chase <python.list@tim.thechases.com> - 2014-08-09 12:08 -0500
    Re: The "right" way to use config files Terry Reedy <tjreedy@udel.edu> - 2014-08-09 13:29 -0400
      Re: The "right" way to use config files Fabien <fabien.maussion@gmail.com> - 2014-08-09 20:14 +0200
        Re: The "right" way to use config files Terry Reedy <tjreedy@udel.edu> - 2014-08-09 18:30 -0400
          Re: The "right" way to use config files Fabien <fabien.maussion@gmail.com> - 2014-08-10 10:33 +0200

#75939 — The "right" way to use config files

From: Fabien <fabien.maussion@gmail.com>
Date: 2014-08-09 13:48 +0200
Subject: The "right" way to use config files
Message-ID: <ls51q0$4ea$1@speranza.aioe.org>

Folks,

I am not a computer scientist (just a scientist) and I'd like to ask 
your opinion about a design problem I have. It's not that I can't get my 
program to work, but rather that I have trouble finding an "elegant" 
solution to the problem of passing information to my program's elements. 
I find it hard to state my request clearly, so my apologies for the long 
text...

The tool I am developing is a classical data-analysis workflow. Ideally, 
all the program's configuration is located in a single .cfg file which 
I parse with ConfigObj. The file contains I/O information 
(path_to_input, path_to_output) as well as internal options 
(use_this_function, dont_use_this_one, function1_paramx = y), etc.

Currently, my program is a "super-object" which is initialized once and 
works schematically as follows:

def main():
    obj = init_superobj(config_file)
    obj.preprocess()
    obj.process()
    obj.write()

The superobj init routine parses the config file and reads the input data.

and a processing step can be, for example:

def process(self):

    if self.configfile.as_bool('do_function1'):
        params = config.parse_function1_params()
        call_function1(self.data, params)

    if self.configfile.as_bool('do_function2'):
        params = config.parse_function2_params()
        call_function2(self.data2, params)

The functions themselves do not know about the superobject or about the 
configfile. They are "standalone" functions which take data and 
parameters as input and produce output from them. I thought that 
standalone functions would be clearer and easier to maintain, since they 
do not rely on an external data structure such as the configobj.

BUT, my "problem" is that several options really are "universal" options 
to the program, such as the output directory. This information (where to 
write their results) is given to most of the functions as a parameter.

So I had the idea to define a super-object which parses the config file 
and input data and is given as a single parameter to the processing 
functions, and the functions take the information they need from it. 
This is tempting because there is no need for refactoring when I decide 
to change something in the config, but I am afraid that the program may 
become unmaintainable by anyone other than myself. Another possibility 
would be at least to give all the functions access to the configfile.

To get to the point: is it good practice to give all elements of a 
program access to the configfile and if yes, how is it done "properly"?

I hope at least someone will understand what I mean ;-)

Cheers and thanks,

Fabien










#75944

From: Ben Finney <ben+python@benfinney.id.au>
Date: 2014-08-09 22:17 +1000
Message-ID: <mailman.12792.1407586666.18130.python-list@python.org>
In reply to: #75939

Fabien <fabien.maussion@gmail.com> writes:

> So I had the idea to define a super-object which parses the config
> file and input data and is given as a single parameter to the
> processing functions, and the functions take the information they need
> from it.

That's not a bad idea, you could do that without embarrassment.

A better technique, though, is to make use of modules as namespaces.
Have one module of your application be responsible for the configuration
of the application::

    # app/config.py

    import configparser

    parser = configparser.ConfigParser()
    parser.read("app.conf")

and import that module everywhere else that needs it::

    # app/wibble.py

    from . import config

    def frobnicate():
        do_something_with(config.foo)

By using an imported module, the functions don't need to be
parameterised by application-wide configuration; they can simply access
the module from the global scope and thereby get access to that module's
attributes.

> To get to the point: is it good practice to give all elements of a
> program access to the configfile and if yes, how is it done
> "properly"?

There should be an encapsulation of the responsibility for parsing and
organising the configuration options, and the rest of the application
should access it only via that encapsulation.

Putting that encapsulation in a module is an appropriately Pythonic
technique.
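
Editor's sketch of the module-as-namespace idea described above. The
[paths] section and the output_dir option are assumptions made up for
illustration, and the inline string stands in for app.conf so the example
runs on its own; a real app/config.py would keep the parser.read("app.conf")
call from the post.

```python
# Sketch of what app/config.py might contain. The section and option
# names ([paths], output_dir) are assumptions for illustration only.
import configparser

_EXAMPLE_CONF = """
[paths]
output_dir = /tmp/results
"""

parser = configparser.ConfigParser()
parser.read_string(_EXAMPLE_CONF)  # stand-in for parser.read("app.conf")

# Expose plain module-level names so callers can write config.output_dir.
output_dir = parser.get("paths", "output_dir")


def frobnicate():
    # Other modules would do "from . import config" and read
    # config.output_dir; here we read the module-level name directly.
    return "writing to " + output_dir
```

Exposing converted values as module attributes, rather than the raw
parser, keeps the option names and type conversions in one place.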

-- 
 \          “Now Maggie, I’ll be watching you too, in case God is busy |
  `\       creating tornadoes or not existing.” —Homer, _The Simpsons_ |
_o__)                                                                  |
Ben Finney



#75945

From: Fabien <fabien.maussion@gmail.com>
Date: 2014-08-09 14:33 +0200
Message-ID: <ls54fh$aoe$1@speranza.aioe.org>
In reply to: #75944

Hi Ben,

On 09.08.2014 14:17, Ben Finney wrote:
> Have one module of your application be responsible for the configuration
> of the application::
>
>      # app/config.py
>
>      import configparser
>
>      parser = configparser.ConfigParser()
>      parser.read("app.conf")

Thanks for the suggestion. This approach is new to me, and I would not 
have come up with it myself. It seems like a good way to do this. But 
how do I pass an argument to this config namespace? I.e. I want 
"app.conf" to be given as an argument.

Currently my program starts like this:

import os
import sys

def main():

     # See if the user gave a configfile
     if len(sys.argv) == 2:
         # file was given as argument
         cfg = str(sys.argv[1])
     else:
         # default file taken in the resource directory
         cfg = os.path.abspath(os.path.join(os.path.dirname(__file__),
                               os.pardir,'res','default.cfg'))

     obj = superobj(cfg)
     obj.preprocess()
     obj.process()
     obj.write()



#75948

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2014-08-09 12:16 -0400
Message-ID: <mailman.12794.1407601013.18130.python-list@python.org>
In reply to: #75945

On Sat, 09 Aug 2014 14:33:54 +0200, Fabien <fabien.maussion@gmail.com>
declaimed the following:

>Hi Ben,
>
>On 09.08.2014 14:17, Ben Finney wrote:
>> Have one module of your application be responsible for the configuration
>> of the application::
>>
>>      # app/config.py
>>
>>      import configparser
>>
>>      parser = configparser.ConfigParser()
>>      parser.read("app.conf")
>
>Thanks for the suggestion. This way to do is new to me, and I didn't 
>come to the idea myself. It seems like a good way to do this. But how to 
>give an argument to this config namespace? i.e I want "app.conf" to be 
>given as argument.
>
	Well, you could let the module access the command line arguments
directly (though I'd recommend against that). In effect, the bottom of the
imported module would have all the 

	sys.argv...

stuff, followed by parsing the file whose name was provided (or falling
back to a default set of settings).

	Better, in my view, is to have the imported module set up default
values for everything, AND have a function at the bottom of the form

def initialize(fid=None):
	if fid:
		# parse file "fid" replacing the module level items
		# this may require making them all globals since
		# assignments inside this function would be locals
		pass  # placeholder so the skeleton is valid Python

And then your main program

import myconfig
...
myconfig.initialize(sys.argv[1])

-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/



#75951

From: Fabien <fabien.maussion@gmail.com>
Date: 2014-08-09 19:17 +0200
Message-ID: <ls5l38$ita$1@speranza.aioe.org>
In reply to: #75948

Hi,

On 09.08.2014 18:16, Dennis Lee Bieber wrote:
> Better, in my view, is to have the import module set up default values
> for everything, AND have a function at the bottom of the form
>
> def initialize(fid=None):
> 	if fid:
> 		# parse file "fid" replacing the module level items
> 		# this may require making a them all globals since
> 		# assignments inside this function would be locals
>
> And then your main program
>
> import myconfig
> ...
> myconfig.initialize(sys.argv[1])

Yes, OK, I think I got it. Thanks! I like the idea and will implement it; 
this will avoid the useless superobject and make the configfile 
available to every part of the program.

Fabien



#75950

From: Tim Chase <python.list@tim.thechases.com>
Date: 2014-08-09 12:08 -0500
Message-ID: <mailman.12796.1407604196.18130.python-list@python.org>
In reply to: #75939

On 2014-08-09 13:48, Fabien wrote:
> So I had the idea to define a super-object which parses the config
> file and input data and is given as a single parameter to the
> processing functions, and the functions take the information they
> need from it. This is tempting because there is no need for
> refactoring when I decide to change something in the config, but I
> am afraid that the program may become unmaintainable by someone
> else than myself. Another possibility would be at least to give all
> the functions access to the configfile.
> 
> To get to the point: is it good practice to give all elements of a 
> program access to the configfile and if yes, how is it done
> "properly"?

Though I don't like how it looks/feels to pass around the config in
just about any function-call that needs it, I've found that doing so
allows me to test more readily.  The alternative (putting it in a
global or some module) usually means that it's harder for me to test
in isolation.
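
Editor's sketch of the testing benefit described above. The function
summarize() and the "threshold" key are hypothetical; the point is only
that a test can hand in exactly the config it needs, with no file and no
global state.

```python
# Sketch: a function that receives its configuration explicitly.
# summarize() and the key name "threshold" are hypothetical.

def summarize(data, config):
    """Return the values in `data` that exceed config['threshold']."""
    return [x for x in data if x > config["threshold"]]


def test_summarize():
    # The test builds the config inline; nothing has to be set up or
    # restored afterwards.
    assert summarize([0.05, 0.2, 0.5], {"threshold": 0.1}) == [0.2, 0.5]
    assert summarize([0.05, 0.2, 0.5], {"threshold": 1.0}) == []


test_summarize()
```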

-tkc




#75952

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-08-09 13:29 -0400
Message-ID: <mailman.12797.1407605431.18130.python-list@python.org>
In reply to: #75939

On 8/9/2014 7:48 AM, Fabien wrote:

> BUT, my "problem" is that several options really are "universal" options
> to the program, such as the output directory for example. This
> information (where to write their results) is given to most of the
> functions as parameter.

If possible, functions should *return* their results, or yield their 
results in chunks (as generators). Let the driver function decide where 
to put results.  Aside from separating concerns, this makes testing much 
easier.
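
Editor's sketch of this separation, with hypothetical function names: the
processing step yields its results and only the driver knows where they
are written.

```python
# Sketch: the processing step yields results; only the driver knows
# where they go. Function names are hypothetical.
import io


def process(values, threshold):
    """Yield one formatted result per value above the threshold."""
    for value in values:
        if value > threshold:
            yield "{:.2f}".format(value)


def write_results(lines, stream):
    """Driver-side helper: send each result line to any file-like object."""
    for line in lines:
        stream.write(line + "\n")


# In production the stream could be open(path, "w"); in a test an
# in-memory buffer is enough.
buffer = io.StringIO()
write_results(process([0.05, 0.2, 0.5], threshold=0.1), buffer)
```

With this split, the output directory is needed in one place (the driver)
instead of being passed to every processing function.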

-- 
Terry Jan Reedy



#75955

From: Fabien <fabien.maussion@gmail.com>
Date: 2014-08-09 20:14 +0200
Message-ID: <ls5odq$qn9$1@speranza.aioe.org>
In reply to: #75952

On 09.08.2014 19:29, Terry Reedy wrote:
> If possible, functions should *return* their results, or yield their
> results in chunks (as generators). Let the driver function decide where
> to put results.  Aside from separating concerns, this makes testing much
> easier.

I see. But then this is also true for parameters, right? And so we 
return to my original question ;-)


Let's say my configfile looks like this:

-----------------
### app/config.cfg
# General params
output_dir = '..'
input_file = '..'

# Func 1 params
[func1]
     enable = True
     threshold = 0.1
     maxite = 1
-----------------

And I have a myconfig module which looks like:

-----------------
### app/myconfig.py

from configobj import ConfigObj

parser = obj() # parser will be instantiated by initialize

def initialize(cfgfile=None):
    global parser
    parser = ConfigObj(cfgfile, file_error=True)
-----------------

My main program could look like this:

-----------------
### app/mainprogram_1.py

import sys

import myconfig

def func1():
     # the params are in the cfg
     threshold = myconfig.parser['func1'].as_float('threshold')
     maxite = myconfig.parser['func1'].as_long('maxite')

     # dummy operations
     score = 100.
     ite = 1
     while (score > threshold) and (ite < maxite):
         score /= 10
         ite += 1

     # dummy return
     return score

def main():
     myconfig.initialize(sys.argv[1])

     if myconfig.parser['func1'].as_bool('enable'):
         results = func1()

if __name__ == '__main__':
     main()
-----------------

Or like this:

-----------------
### app/mainprogram_2.py

import sys

import myconfig

def func1(threshold=None, maxite=None):
     # dummy operations
     score = 100.
     ite = 1
     while (score > threshold) and (ite < maxite):
         score /= 10
         ite += 1

     # dummy return
     return score

def main():
     myconfig.initialize(sys.argv[1])

     if myconfig.parser['func1'].as_bool('enable'):
         # the params are in the cfg
         threshold = myconfig.parser['func1'].as_float('threshold')
         maxite = myconfig.parser['func1'].as_long('maxite')
         results = func1(threshold=threshold, maxite=maxite)

if __name__ == '__main__':
     main()
-----------------

In this case, program2 is easier to test/understand, but if the 
parameters become numerous it could be a pain...

As always, I guess I'll have to decide on a case by case basis what is best.
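
Editor's sketch of one way to keep explicit parameters manageable when
they become numerous: convert a config section to typed keyword arguments
once, then unpack it. FUNC1_TYPES and load_params() are hypothetical
helpers, and a plain dict stands in for the parsed [func1] section.

```python
# Sketch: convert one config section to typed keyword arguments once,
# then unpack it. FUNC1_TYPES and load_params() are hypothetical helpers.

FUNC1_TYPES = {"threshold": float, "maxite": int}


def load_params(section, types):
    """Convert the raw string values of `section` using `types`."""
    return {name: convert(section[name]) for name, convert in types.items()}


def func1(threshold, maxite):
    score = 100.0
    ite = 1
    while score > threshold and ite < maxite:
        score /= 10
        ite += 1
    return score


# Stand-in for the parsed [func1] section; values arrive as strings.
raw_section = {"enable": "True", "threshold": "0.1", "maxite": "3"}
params = load_params(raw_section, FUNC1_TYPES)
result = func1(**params)
```

func1 keeps an explicit, testable signature, while the call site stays
one line however many options the section grows.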








#75958

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-08-09 18:30 -0400
Message-ID: <mailman.12802.1407623446.18130.python-list@python.org>
In reply to: #75955

On 8/9/2014 2:14 PM, Fabien wrote:
> On 09.08.2014 19:29, Terry Reedy wrote:
>> If possible, functions should *return* their results, or yield their
>> results in chunks (as generators). Let the driver function decide where
>> to put results.  Aside from separating concerns, this makes testing much
>> easier.
>
> I see. But then this is also true for parameters, right? And yet we
> return to my original question ;-)
>
>
> Let's say my configfile looks like this:
>
> -----------------
> ### app/config.cfg
> # General params
> output_dir = '..'
> input_file = '..'
>
> # Func 1 params
> [func1]
>      enable = True
>      threshold = 0.1
>      maxite = 1
> -----------------
>
> And I have a myconfig module which looks like:
>
> -----------------
> ### app/myconfig.py
>
> import ConfigObj
>
> parser = obj() # parser will be instanciated by initialize

Try parser = object() to actually run, but the line is not needed. 
Instead put "parser: instantiated by initialize" in the docstring.
>
> def initialize(cfgfile=None):
>     global parser
>     parser = ConfigObj(cfgfile, file_error=True)
> -----------------
>
> My main program could look like this:
>
> -----------------
> ### app/mainprogram_1.py
>
> import myconfig
>
> def func1():
>      # the params are in the cfg
>      threshold = myconfig.parser['func1'].as_float('threshold')
>      maxite = myconfig.parser['func1'].as_long('maxite')
>
>      # dummy operations
>      score = 100.
>      ite = 1
>      while (score > threshold) and (ite < maxite):
>          score /= 10
>          ite += 1
>
>      # dummy return
>      return score
>
> def main():
>      myconfig.initialize(sys.argv[1])
>
>      if myconfig.parser['func1'].as_bool('enable'):
>          results = func1()
>
> if __name__ == '__main__':
>      main()
> -----------------

The advantage of TDD is that it forces one to make code testable as you 
do. Old code may not be designed to be so easily testable, as I have 
learned trying to add tests to idlelib. For the above, I would consider

def func1_algo(threshold, maxite):  # possibly in a separate file
     score = 100.
     ite = 1
     while (score > threshold) and (ite < maxite):
         score /= 10
         ite += 1
     return score

def func1():  # interface wrapper
     threshold = myconfig.parser['func1'].as_float('threshold')
     maxite = myconfig.parser['func1'].as_long('maxite')
     return func1_algo(threshold, maxite)

This is a slight bit of extra work, but now you can separately test (and 
modify) the algorithm and the interfacing.  Testing the algorithm is 
easy, which encourages testing multiple i/o pairs.

for inputs, expected in iopairs:
   assert func1_algo(*inputs) == expected  # or self.assertEqual, or ...

(or close enough for float outputs)
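
Editor's sketch of such a table-driven check, using math.isclose for the
"close enough" comparison. The algorithm is repeated here so the example
runs on its own; the input/output pairs are illustrative values worked out
by hand.

```python
import math


def func1_algo(threshold, maxite):
    score = 100.0
    ite = 1
    while score > threshold and ite < maxite:
        score /= 10
        ite += 1
    return score


# Each entry pairs the positional arguments with the expected score.
iopairs = [
    ((0.1, 1), 100.0),   # maxite reached immediately, no division
    ((0.1, 3), 1.0),     # two divisions, then maxite stops the loop
    ((50.0, 10), 10.0),  # one division brings the score under 50
]

for args, expected in iopairs:
    assert math.isclose(func1_algo(*args), expected, rel_tol=1e-9)
```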

As for the interfacing: you can write and read multiple versions of 
config.cfg (relatively slow), use something like unittest.mock to mock 
the myconfig module, or write something fairly simple (py3 code).

class Entry(dict):
     def as_bool(self, name):
         s = self[name]
         return True if s == 'True' else False if s == 'False' else None
     def as_int(self, name):
         return int(self[name])
     as_long = as_int
     def as_float(self, name):
         return float(self[name])

class Config(object):
     def initialize(self, argv):
         pass
myconfig = Config()  # a module is like a singleton class
myconfig.initialize('a')  # test that does not raise

# In use for testing, uncomment the following two lines
# import mainprogram_1 as mp1
# mp1.myconfig = myconfig

f1_cfg = Entry({
     'enable': 'True',
     'threshold': '0.1',
     'maxite': '1',
     })
myconfig.parser = {'func1': f1_cfg}

print(myconfig.parser['func1'].as_float('threshold') == 0.1)
print(myconfig.parser['func1'].as_long('maxite') == 1)
print(myconfig.parser['func1'].as_bool('enable') == True)

f1_cfg['maxite'] = 5
print(myconfig.parser['func1'].as_int('maxite') == 5)
# prints True 4 times

Notice that you inject the mock myconfig into the tested module just 
once. After that, you can change anything within parser or replace parser 
with a new dict.
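
Editor's sketch of the unittest.mock alternative mentioned above. To stay
self-contained it builds a stand-in myconfig module with types.ModuleType;
that stand-in and the StubSection class are assumptions for illustration,
not part of the original program.

```python
# Sketch: patch the parser attribute of a stand-in myconfig module.
# The stand-in module and StubSection are assumptions for illustration.
import types
from unittest import mock

myconfig = types.ModuleType("myconfig")  # stand-in for the real module
myconfig.parser = None


def func1():
    """Interface wrapper that reads its parameters from myconfig."""
    section = myconfig.parser["func1"]
    return section.as_float("threshold"), section.as_int("maxite")


class StubSection(dict):
    """Tiny replacement for a config section, with typed accessors."""

    def as_float(self, name):
        return float(self[name])

    def as_int(self, name):
        return int(self[name])


fake_parser = {"func1": StubSection(threshold="0.1", maxite="5")}

# patch.object restores the original attribute when the block exits.
with mock.patch.object(myconfig, "parser", fake_parser):
    patched_result = func1()
```

Compared with assigning mp1.myconfig by hand, the context manager undoes
the replacement automatically, so one test cannot leak a fake config into
the next.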

> Or like this:
>
> -----------------
> ### app/mainprogram_2.py
>
> import myconfig
>
> def func1(threshold=None, maxite=None):

These should not have defaults; avoid extra work!

>      # dummy operations
>      score = 100.
>      ite = 1
>      while (score > threshold) and (ite < maxite):
>          score /= 10
>          ite += 1
>
>      # dummy return
>      return score
>
> def main():
>      myconfig.initialize(sys.argv[1])
>
>      if myconfig.parser['func1'].as_bool('enable'):
>          # the params are in the cfg
>          threshold = myconfig.parser['func1'].as_float('threshold')
>          maxite = myconfig.parser['func1'].as_long('maxite')
>          results = func1(threshold=threshold, maxite=maxite)
>
> if __name__ == '__main__':
>      main()
> -----------------
>
> In this case, program2 is easier to test/understand, but if the
> parameters become numerous it could be a pain...

This is equivalent to what I wrote except for putting the wrapper inline 
in main().  Testing is the same for either.

-- 
Terry Jan Reedy



#75970

From: Fabien <fabien.maussion@gmail.com>
Date: 2014-08-10 10:33 +0200
Message-ID: <ls7aob$hq$1@speranza.aioe.org>
In reply to: #75958

On 10.08.2014 00:30, Terry Reedy wrote:
> The advantage of TDD is that it forces one to make code testable as you do.

Thanks a lot, Terry, for your comprehensive example!




