


__main__ vs official module name: distinct module instances

Started by: Cameron Simpson <cs@zip.com.au>
First post: 2015-08-02 13:53 +1000
Last post: 2015-08-03 10:57 +1000
Articles 5 — 3 participants



Contents

  __main__ vs official module name: distinct module instances Cameron Simpson <cs@zip.com.au> - 2015-08-02 13:53 +1000
    Re: __main__ vs official module name: distinct module instances Steven D'Aprano <steve@pearwood.info> - 2015-08-02 17:41 +1000
      Re: __main__ vs official module name: distinct module instances Chris Angelico <rosuav@gmail.com> - 2015-08-02 18:16 +1000
      Re: __main__ vs official module name: distinct module instances Chris Angelico <rosuav@gmail.com> - 2015-08-02 18:18 +1000
      Re: __main__ vs official module name: distinct module instances Cameron Simpson <cs@zip.com.au> - 2015-08-03 10:57 +1000

#94885 — __main__ vs official module name: distinct module instances

From: Cameron Simpson <cs@zip.com.au>
Date: 2015-08-02 13:53 +1000
Subject: __main__ vs official module name: distinct module instances
Message-ID: <mailman.1151.1438488073.3674.python-list@python.org>

Hi All,

Maybe this should be over in python-ideas, since there is a proposal down the 
bottom of this message. But first the background...

I've just wasted a silly amount of time debugging an issue that really I know 
about, but had forgotten.

I have a number of modules which include a main() function, and down the bottom 
this code:

  if __name__ == '__main__':
    sys.exit(main(sys.argv))

so that I have a convenient command line tool if I invoke the module directly.  
I typically have tiny shell wrappers like this:

  #!/bin/sh
  exec python -m cs.app.maildb -- ${1+"$@"}

In short, invoke this module as a main program, passing in the command line 
arguments. Very useful.

My problem?

When invoked this way, the module cs.app.maildb that is being executed is 
actually the module named "__main__". If some other piece of code imports 
"cs.app.maildb" they get a _different_ instance of the module. In the same 
program! And how did it cause me trouble? I am monkey patching my module for 
debug purposes, and that monkey patcher imports the module by name. So I was 
monkey patching cs.app.maildb, and _not_ patching __main__. And thus not seeing 
any effect from the patch.

I realise that having __name__ == '__main__' at all almost implies this effect.  
I am not sure it needs to.
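
The effect described above can be reproduced with a small sketch. The module name demo_mod and the helper run_demo are made up for illustration, and the sketch assumes a Python 3.7+ interpreter that may write to a temporary directory:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical module, used only for this demonstration.
DEMO_SOURCE = textwrap.dedent('''
    import sys

    def main(argv):
        import demo_mod  # imports this same file again, under its real name
        print(demo_mod is sys.modules['__main__'])
        return 0

    if __name__ == '__main__':
        sys.exit(main(sys.argv))
''')

def run_demo():
    """Run demo_mod with -m; report whether both names share one module object."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, 'demo_mod.py'), 'w') as f:
            f.write(DEMO_SOURCE)
        result = subprocess.run(
            [sys.executable, '-m', 'demo_mod'],
            cwd=tmp, env=dict(os.environ, PYTHONPATH=tmp),
            capture_output=True, text=True, check=True,
        )
    return result.stdout.strip()

print(run_demo())  # prints "False": two distinct module instances
```

The file is executed twice: once as "__main__" and once as "demo_mod", so anything that patches one of them never touches the other.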

The Proposal:

What are the implications of modifying the python invocation:

  python -m cs.app.maildb

to effectively do this (Python pseudo code):

  M = importlib.import_module("cs.app.maildb")
  M.__name__ = '__main__'
  sys.modules['__main__'] = M

i.e. import the module by name, but bind it to _both_ "cs.app.maildb" and 
"__main__" in sys.modules. And of course hack .__name__ to support the standard 
boilerplate.

This would save some confusion when the module is invoked from the python 
command line and also imported by the code; it is not intuitive that those two 
things give you distinct module instances.

Aside from the module's .__name__ being '__main__' even when accessed by the 
code as cs.app.maildb, are there other implications to such a change that would 
break real world code?

Cheers,
Cameron Simpson <cs@zip.com.au>

The reasonable man adapts himself to the world; the unreasonable one persists
in trying to adapt the world to himself.  Therefore all progress depends
on the unreasonable man.        - George Bernard Shaw



#94889

From: Steven D'Aprano <steve@pearwood.info>
Date: 2015-08-02 17:41 +1000
Message-ID: <55bdc996$0$1663$c3e8da3$5496439d@news.astraweb.com>
In reply to: #94885

On Sun, 2 Aug 2015 01:53 pm, Cameron Simpson wrote:

> Hi All,
> 
> Maybe this should be over in python-ideas, since there is a proposal down
> the bottom of this message. But first the background...
> 
> I've just wasted a silly amount of time debugging an issue that really I
> know about, but had forgotten.

:-)


> I have a number of modules which include a main() function, and down the
> bottom this code:
> 
>   if __name__ == '__main__':
>     sys.exit(main(sys.argv))
> 
> so that I have a convenient command line tool if I invoke the module
> directly. I typically have tiny shell wrappers like this:
> 
>   #!/bin/sh
>   exec python -m cs.app.maildb -- ${1+"$@"}

I know this isn't really relevant to your problem, but why use "exec python"
instead of just "python"?

And can you explain the -- ${1+"$@"} bit for somebody who knows just enough
sh to know that it looks useful but not enough to know exactly what it
does?



> In short, invoke this module as a main program, passing in the command
> line arguments. Very useful.
> 
> My problem?
> 
> When invoked this way, the module cs.app.maildb that is being executed is
> actually the module named "__main__". 

Yep. Now, what you could do in cs.app.maildb is this:

# untested, but should work
if __name__ == '__main__':
    import sys
    sys.modules['cs.app.maildb'] = sys.modules[__name__]
    sys.exit(main(sys.argv))


*** but that's the wrong solution ***


The problem here is that by the time cs.app.maildb runs, some other part of
cs or cs.app may have already imported it. The trick of setting the module
object under both names can only work if you can guarantee to run this
before importing anything that does a circular import of cs.app.maildb.
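
A minimal sketch of that ordering constraint, assuming a top-level module with the made-up name demo_alias (a dotted name such as cs.app.maildb may need extra care that this sketch does not cover). The alias is installed at the very top of the file, before any other import could pull the module in by name:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical module, used only for this demonstration.
ALIASED_SOURCE = textwrap.dedent('''
    import sys

    if __name__ == '__main__':
        # Alias first, before anything else can import us by name.
        sys.modules['demo_alias'] = sys.modules['__main__']

    def main(argv):
        import demo_alias  # now resolves to the running __main__ module
        print(demo_alias is sys.modules['__main__'])
        return 0

    if __name__ == '__main__':
        sys.exit(main(sys.argv))
''')

def run_aliased():
    """Run demo_alias with -m; report whether both names share one module object."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, 'demo_alias.py'), 'w') as f:
            f.write(ALIASED_SOURCE)
        result = subprocess.run(
            [sys.executable, '-m', 'demo_alias'],
            cwd=tmp, env=dict(os.environ, PYTHONPATH=tmp),
            capture_output=True, text=True, check=True,
        )
    return result.stdout.strip()

print(run_aliased())  # prints "True": one module object under both names
```

It only works here because nothing imports demo_alias before the aliasing line runs, which is exactly the guarantee that is hard to give in a larger package.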

The right existing solution is to avoid having the same module do
double-duty as both runnable script and importable module. In a package,
that's easy. Here's your package structure:


cs
+-- __init__.py
+-- app
    +-- __init__.py
    +-- maildb.py


and possibly others. Every module that you want to be a runnable script
becomes a submodule with a __main__.py file:


cs
+-- __init__.py
+-- __main__.py
+-- app
    +-- __init__.py
    +-- __main__.py
    +-- maildb
        +-- __init__.py
        +-- __main__.py


and now you can call:

python -m cs
python -m cs.app
python -m cs.app.maildb

as needed. The __main__.py files look like this:

if __name__ == '__main__':
    import sys
    import cs.app.maildb
    sys.exit(cs.app.maildb.main(sys.argv))


or as appropriate.
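
As a rough, self-contained illustration of that layout, the sketch below builds a throwaway package (the name demo_pkg and the helper run_package are invented for the example) and runs it with -m. It assumes Python 3.7+ and a writable temporary directory:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical package contents, used only for this demonstration.
INIT_SOURCE = textwrap.dedent('''
    def main(argv):
        import demo_pkg  # importing by name yields this very module
        print(demo_pkg.main is main)
        return 0
''')

MAIN_SOURCE = textwrap.dedent('''
    import sys
    import demo_pkg

    # __main__.py holds only the script part; the code lives in demo_pkg.
    sys.exit(demo_pkg.main(sys.argv))
''')

def run_package():
    """Build demo_pkg/ with a __main__.py and run it via `python -m demo_pkg`."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = os.path.join(tmp, 'demo_pkg')
        os.mkdir(pkg_dir)
        for filename, source in (('__init__.py', INIT_SOURCE),
                                 ('__main__.py', MAIN_SOURCE)):
            with open(os.path.join(pkg_dir, filename), 'w') as f:
                f.write(source)
        result = subprocess.run(
            [sys.executable, '-m', 'demo_pkg'],
            cwd=tmp, env=dict(os.environ, PYTHONPATH=tmp),
            capture_output=True, text=True, check=True,
        )
    return result.stdout.strip()

print(run_package())  # prints "True": the library code exists only once
```

Only the tiny __main__.py runs under the name "__main__", so there is no second copy of the real code to get out of sync.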

Yes, it's a bit more work. If your package has 30 modules, and every one is
runnable, that's a lot more work. But if your package is that, um,
intricate, then perhaps it needs a redesign?

The major use-case for this feature is where you have a package, and you
want it to have a single entry point when running it as a script. (That
would be "python -m cs" in the example above.) But it can be used when you
have multiple entry points too.

For a single .py file, you can usually assume that when you are running it
as a stand alone script, there are no circular imports of itself:

# spam.py
import eggs
if __name__ == '__main__':
    main()

# eggs.py
import spam  # circular import


If that expectation is violated, then you can run into the trouble you
already did.

So... 


* you can safely combine importable module and runnable script in 
  the one file, provided the runnable script functionality doesn't 
  depend on importing itself under the original name (either 
  directly or indirectly);

* if you must violate that expectation, the safest solution is to
  make the module a package with a __main__.py file that contains
  the runnable script portion;

* if you don't wish to do that, you're screwed, and I think that the
  best you can do is program defensively by detecting the problem 
  after the event and bailing out:

  # untested
  import __main__
  import myactualfilename
  if os.path.samefile(__main__.__path__, myactualfilename.__path__):
      raise RuntimeError


-- 
Steven



#94891

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-08-02 18:16 +1000
Message-ID: <mailman.1153.1438503398.3674.python-list@python.org>
In reply to: #94889

On Sun, Aug 2, 2015 at 5:41 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> * if you don't wish to do that, you're screwed, and I think that the
>   best you can do is program defensively by detecting the problem
>   after the event and bailing out:
>
>   # untested
>   import __main__
>   import myactualfilename
>   if os.path.samefile(__main__.__path__, myactualfilename.__path__):
>       raise RuntimeError

Not sure what __path__ is here, as most of the things in my
sys.modules don't have it; do you mean __file__? In theory, it should
be possible to skim across sys.modules, looking for a match against
__main__, and raising RuntimeError if any is found.



#94893

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-08-02 18:18 +1000
Message-ID: <mailman.1154.1438503524.3674.python-list@python.org>
In reply to: #94889

On Sun, Aug 2, 2015 at 6:16 PM, Chris Angelico <rosuav@gmail.com> wrote:
> On Sun, Aug 2, 2015 at 5:41 PM, Steven D'Aprano <steve@pearwood.info> wrote:
>> * if you don't wish to do that, you're screwed, and I think that the
>>   best you can do is program defensively by detecting the problem
>>   after the event and bailing out:
>>
>>   # untested
>>   import __main__
>>   import myactualfilename
>>   if os.path.samefile(__main__.__path__, myactualfilename.__path__):
>>       raise RuntimeError
>
> Not sure what __path__ is here, as most of the things in my
> sys.modules don't have it; do you mean __file__? In theory, it should
> be possible to skim across sys.modules, looking for a match against
> __main__, and raising RuntimeError if any is found.

Oops, premature send.

*In theory* it should be possible to do the above, but whichever
attribute you look for, some modules may not have it. How does this
play with, for instance, zipimport, where there's no actual file name
for the module?
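
A rough sketch of that scan, for what it is worth. The function name find_main_duplicates is invented here, it compares __file__ (an assumption, per the question above), and it simply skips any module with no usable file on disk, so it would stay silent for zipimport and similar cases rather than solve them:

```python
import os
import sys
import types

def find_main_duplicates(modules=None):
    """Return names of modules loaded from the same file as __main__."""
    modules = sys.modules if modules is None else modules
    main = modules.get('__main__')
    main_file = getattr(main, '__file__', None)
    if not main_file or not os.path.exists(main_file):
        return []  # interactive session, zipimport, etc.: nothing to compare
    duplicates = []
    for name, mod in list(modules.items()):
        if mod is None or mod is main:
            continue  # the same object under another name is an alias, not a copy
        mod_file = getattr(mod, '__file__', None)
        if not mod_file or not os.path.exists(mod_file):
            continue  # builtins, namespace packages, zipped modules...
        if os.path.samefile(main_file, mod_file):
            duplicates.append(name)
    return sorted(duplicates)

# Usage: bail out if the main script has also been imported under another name.
clashes = find_main_duplicates()
if clashes:
    raise RuntimeError('__main__ was also imported as: ' + ', '.join(clashes))
```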

ChrisA



#94908

From: Cameron Simpson <cs@zip.com.au>
Date: 2015-08-03 10:57 +1000
Message-ID: <mailman.1163.1438563479.3674.python-list@python.org>
In reply to: #94889

On 02Aug2015 17:41, Steven D'Aprano <steve@pearwood.info> wrote:
>On Sun, 2 Aug 2015 01:53 pm, Cameron Simpson wrote:
>> Maybe this should be over in python-ideas, since there is a proposal down
>> the bottom of this message. But first the background...
>>
>> I've just wasted a silly amount of time debugging an issue that really I
>> know about, but had forgotten.
>
>:-)
>
>
>> I have a number of modules which include a main() function, and down the
>> bottom this code:
>>
>>   if __name__ == '__main__':
>>     sys.exit(main(sys.argv))
>>
>> so that I have a convenient command line tool if I invoke the module
>> directly. I typically have tiny shell wrappers like this:
>>
>>   #!/bin/sh
>>   exec python -m cs.app.maildb -- ${1+"$@"}

TL;DR: pertinent discussion around my proposal is lower down. First I digress 
into Steven's shell query.

>I know this isn't really relevant to your problem, but why use "exec python"
>instead of just "python"?

Saves a process. Who needs a shell process just hanging around waiting? Think 
of this as tail recursion optimisation.

>And can you explain the -- ${1+"$@"} bit for somebody who knows just enough
>sh to know that it looks useful but not enough to know exactly what it
>does?

Ah.

In a modern shell one can just write $@. I prefer portable code.

The more complicated version, which I use everywhere because it is portable, 
has to do with the behaviour of the $@ special variable. As you know, $* is the 
command line arguments as a single string, which is useless if you need to 
preserve them intact. "$@" is the command line arguments correctly quoted.

Unlike every other "$foo" variable, which produces a single string, "$@" 
produces all the command line arguments as separate strings. Critical for 
passing them correctly to other commands. HOWEVER, if there are no arguments 
then "$@" produces a single empty string. Not desired. It is either a very old 
bug or a deliberate decision that no "$foo" shall utterly vanish.

Thus this:

  ${1+"$@"}

Consulting your nearest "man sh" in the PARAMETER SUBSTITUTION section you will 
see that this only inserts "$@" if there is at least one argument, avoiding the 
"$@" => "" with no arguments. It does this by only inserting "$@" if $1 is 
defined. Sneaky and reliable.

I believe in a modern shell a _bare_ $@ acts like a correctly behaving "$@" 
should have, but I always use the incantation above for portability.
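
For anyone who wants to poke at this, here is a small Python harness that asks sh how many arguments each expansion produces. It assumes a POSIX "sh" on the PATH; count_args and the inner "count" shell function are names made up for the sketch:

```python
import subprocess

def count_args(expansion, args):
    """Return how many arguments `expansion` yields in sh for the given args."""
    # `count` is a throwaway shell function that echoes its argument count.
    script = 'count() { echo $#; }; count ' + expansion
    result = subprocess.run(
        ['sh', '-c', script, 'sh', *args],  # the second 'sh' becomes $0
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout)

# Compare the expansions with two arguments (one containing a space) and with none.
for expansion in ('$*', '"$*"', '${1+"$@"}'):
    with_args = count_args(expansion, ['a b', 'c'])
    without_args = count_args(expansion, [])
    print(expansion, with_args, without_args)
```

With the incantation, two arguments stay two arguments even when one contains a space, and no arguments stay no arguments.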

>> In short, invoke this module as a main program, passing in the command
>> line arguments. Very useful.
>>
>> My problem?
>>
>> When invoked this way, the module cs.app.maildb that is being executed is
>> actually the module named "__main__".
>
>Yep. Now, what you could do in cs.app.maildb is this:
>
># untested, but should work
>if __name__ == '__main__':
>    import sys
>    sys.modules['cs.app.maildb'] = sys.modules[__name__]
>    sys.exit(main(sys.argv))

Yes, but that is ghastly and complicated. And also relies on the boiler plate 
at the bottom knowing the module name.

>*** but that's the wrong solution ***

It is suboptimal. "Wrong" seems a stretch.

>The problem here is that by the time cs.app.maildb runs, some other part of
>cs or cs.app may have already imported it. The trick of setting the module
>object under both names can only work if you can guarantee to run this
>before importing anything that does a circular import of cs.app.maildb.

That can be done if it takes place in the python interpreter. But there are 
side effects which need to be considered.

My initial objective is that:

  python -m cs.app.maildb

should import cs.app.maildb under the supplied name instead of "__main__" so 
that a recursive import did not instantiate a second module instance. That is, 
I think, a natural thing for users to expect from the above command line: 
"import cs.app.maildb, run its main program".

On further thought last night I devised the logic below to implement python's 
"-m" option:

  # pseudocode, with values hardwired for clarity
  import sys
  M = new_empty_module(name='__main__', qualname='cs.app.maildb')
  sys.modules['cs.app.maildb'] = M
  M.execfile('/path/to/cs/app/maildb.py')   # you know what I mean...

The "qualname" above is an idea I thought of last night to allow introspection 
to cope with '__main__' and 'cs.app.maildb' at the same time, somewhat like the 
.__qualname__ attribute of a function as recently added to the language; under 
this scheme a module would get a __name__ and a __qualname__, normally the 
same, but __name__ set to '__main__' for the "main program" module situation.

This should sidestep any issues with recursive imports by having the module in 
place in sys.modules ahead of the running of its code.
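
That pseudocode can be approximated in today's Python as a sketch. Everything here is illustrative: run_as_main is an invented helper, demo_target is a throwaway module, and a module-level __qualname__ is only the attribute proposed above, not something the interpreter sets:

```python
import os
import sys
import tempfile
import textwrap
import types

def run_as_main(qualname, path):
    """Run the file at `path` as '__main__', pre-registered under `qualname` too."""
    module = types.ModuleType('__main__')
    module.__file__ = path
    module.__qualname__ = qualname  # hypothetical attribute from the proposal
    saved_main = sys.modules.get('__main__')
    # Register under both names BEFORE running any of the module's code.
    sys.modules['__main__'] = module
    sys.modules[qualname] = module
    try:
        with open(path) as f:
            code = compile(f.read(), path, 'exec')
        exec(code, module.__dict__)
    finally:
        if saved_main is None:
            sys.modules.pop('__main__', None)
        else:
            sys.modules['__main__'] = saved_main
    return module

# Hypothetical target module that imports itself by its real name.
TARGET_SOURCE = textwrap.dedent('''
    import sys
    import demo_target  # recursive import under the real name

    SAME_OBJECT = demo_target is sys.modules['__main__']
    RAN_AS_MAIN = __name__ == '__main__'
''')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_target.py')
    with open(path, 'w') as f:
        f.write(TARGET_SOURCE)
    demo = run_as_main('demo_target', path)

print(demo.SAME_OBJECT, demo.RAN_AS_MAIN)  # prints "True True"
```

The recursive import is satisfied straight from sys.modules, so no second module instance is ever created, while the standard boilerplate still sees __name__ == '__main__'.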

>The right existing solution is to avoid having the same module do
>double-duty as both runnable script and importable module.

I disagree. Supporting this double duty is, to me, a highly desirable feature.  
This is, in fact, a primary purpose of the present standard boilerplate.

I _like_ that: a single file, short and succinct.

>In a package,
>that's easy. Here's your package structure:
>
>cs
>+-- __init__.py
>+-- app
>    +-- __init__.py
>    +-- maildb.py
>
>and possibly others. Every module that you want to be a runnable script
>becomes a submodule with a __main__.py file:
>
>cs
>+-- __init__.py
>+-- __main__.py
>+-- app
>    +-- __init__.py
>    +-- __main__.py
[...]

Yes, nicely separated, but massive structural overkill for simple things like 
single file modules.

>and now you can call:
>
>python -m cs
>python -m cs.app
>python -m cs.app.maildb
>
>as needed. The __main__.py files look like this:
>
>if __name__ == '__main__':
>    import cs.app.maildb
>    sys.exit(cs.app.maildb.main(sys.argv))
>
>or as appropriate.
>
>Yes, it's a bit more work. If your package has 30 modules, and every one is
>runnable, that's a lot more work. But if your package is that, um,
>intricate, then perhaps it needs a redesign?

  [hg/css]fleet*> grep '__name__ == .__main__' cs/**/*.py|wc -l
        96

No, it is simply my personal kit. The design is ok for what it is. Pieces of it 
are slowly being published on PyPI as they become publishable (beta or better 
quality, proper distinfo metadata applied, checked to not import unpublished 
modules, not import gratuitous tissue paper modules, free of most debugging or 
off topic cruft, etc).

To be honest, the majority of those __main__ calls actually run the unit tests 
for that module, not a proper "main program". A better grep:

  [hg/css-nodedb]fleet*> grep 'main(sys.argv)' cs/**/*.py|wc -l
        14

says just 14. Far saner; those are modules/packages for which there really is 
an associated command line tool.

>The major use-case for this feature is where you have a package, and you
>want it to have a single entry point when running it as a script. (That
>would be "python -m cs" in the example above.) But it can be used when you
>have multiple entry points too.
>
>For a single .py file, you can usually assume that when you are running it
>as a stand alone script, there are no circular imports of itself:
>
># spam.py
>import eggs
>if __name__ == '__main__':
>    main()
>
># eggs.py
>import spam  # circular import
>
>If that expectation is violated, then you can run into the trouble you
>already did.

As described, that expectation was violated. In the normal course of affairs 
one rarely trips over it.

>So...
>* you can safely combine importable module and runnable script in
>  the one file, provided the runnable script functionality doesn't
>  depend on importing itself under the original name (either
>  directly or indirectly);
>
>* if you must violate that expectation, the safest solution is to
>  make the module a package with a __main__.py file that contains
>  the runnable script portion;

My proposal above is to solve this issue without requiring the breaking of a 
module into a multifile package just to address a counterintuitive edge case, 
and to avoid cognitive dissonance for Python users when they do traverse that 
edge case.

I want "python -m foo" to accomplish more closely what the naive user expects.

>* if you don't wish to do that, you're screwed, and I think that the
>  best you can do is program defensively by detecting the problem
>  after the event and bailing out:
>
>  # untested
>  import __main__
>  import myactualfilename
>  if os.path.samefile(__main__.__path__, myactualfilename.__path__):
>      raise RuntimeError

Nasty and defeatist! I rail against this mode of thought! :-)

Anyway, I'm about to raise my proposed implementation change higher up over on 
python-ideas with a plan to write a PEP if I don't get fundamental objections 
(i.e. "this breaks everything" versus your "you can work around it in these 
[cumbersome] ways").

Cheers,
Cameron Simpson <cs@zip.com.au>

"My manner of thinking, so you say, cannot be approved. Do you suppose I
care? A poor fool indeed is he who adopts a manner of thinking for others!
My manner of thinking stems straight from my considered  reflections; it
holds with my existence, with the way I am made. It is not in my power to
alter it; and were it, I'd not do so." Donatien Alphonse Francois de Sade
