

Newsgroup: comp.lang.python (thread #7719)

os.path and Path

Started by: Ethan Furman <ethan@stoneleaf.us>
First post: 2011-06-15 19:00 -0700
Last post: 2011-06-16 21:24 -0700
Articles: 16, participants: 9



Contents

  os.path and Path Ethan Furman <ethan@stoneleaf.us> - 2011-06-15 19:00 -0700
    Re: os.path and Path Laurent Claessens <moky.math@gmail.com> - 2011-06-16 09:03 +0200
      Re: os.path and Path Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-16 07:58 +0000
        Re: os.path and Path Ethan Furman <ethan@stoneleaf.us> - 2011-06-16 09:16 -0700
          Re: os.path and Path Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-16 16:41 +0000
            Re: os.path and Path Ethan Furman <ethan@stoneleaf.us> - 2011-06-16 10:18 -0700
            Re: os.path and Path Eric Snow <ericsnowcurrently@gmail.com> - 2011-06-16 11:21 -0600
        Re: os.path and Path Christian Heimes <lists@cheimes.de> - 2011-06-16 18:32 +0200
        Re: os.path and Path Ethan Furman <ethan@stoneleaf.us> - 2011-06-16 10:07 -0700
        Re: os.path and Path Chris Angelico <rosuav@gmail.com> - 2011-06-17 11:00 +1000
    Re: os.path and Path Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-16 07:14 +0000
      Re: os.path and Path Ethan Furman <ethan@stoneleaf.us> - 2011-06-16 09:05 -0700
        Re: os.path and Path Chris Torek <nospam@torek.net> - 2011-06-17 00:48 +0000
          Re: os.path and Path Ethan Furman <ethan@stoneleaf.us> - 2011-06-16 18:19 -0700
          Re: os.path and Path Ned Deily <nad@acm.org> - 2011-06-16 19:55 -0700
            Re: os.path and Path rusi <rustompmody@gmail.com> - 2011-06-16 21:24 -0700

#7719 — os.path and Path

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2011-06-15 19:00 -0700
Subject: os.path and Path
Message-ID: <mailman.8.1308188800.1164.python-list@python.org>

In my continuing quest for Python Mastery (and because I felt like it ;) 
I decided to code a Path object so I could dispense with all the 
os.path.join and os.path.split and os.path.splitext, etc., etc., and so 
forth.

While so endeavoring, a couple of threads came back and had a friendly 
little chat in my head:

Thread 1: "objects of different types compare unequal"
self:     "nonsense!  we have the power to say what happens in __eq__!"

Thread 2: "objects that __hash__ the same *must* compare __eq__!"
self:     "um, what? ... wait, only immutable objects hash..."

Thread 2: "your Path object is immutable..."
self:     "argh!"

Here's the rub:  I'm on Windows (yes, pity me...) but I prefer the 
unices, so I'd like to have / separate my paths.  But I'm on Windows...

So I thought, "Hey!  I'll just do some conversions in __eq__ and life 
will be great!"

--> some_path = Path('/source/python/some_project')
--> some_path == '/source/python/some_project'
True
--> some_path == r'\source\python\some_project'
True
--> # if on a Mac
--> some_path == ':source:python:some_project'
True
--> # oh, and because I'm on Windows with case-insensitive file names...
--> some_path == '/source/Python/some_PROJECT'
True

And then, of course, the ghosts of threads past came and visited.  For 
those that don't know, objects that compare __eq__ must have the same 
__hash__, because containers such as set() and dict() use the hash as a 
shortcut: they only call __eq__ on objects whose hashes match.
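A minimal sketch of the trap, assuming a hypothetical `LoosePath` str subclass (not Ethan's actual class) whose `__eq__` ignores separator style and case. Its `__hash__` is consistent among `LoosePath` objects, but a plain str keeps hashing its raw text, so a path and a string can be "equal" yet land in different hash buckets, and set lookups silently miss:

```python
def _normalize(text):
    # Hypothetical rule: treat '\' like '/' and ignore case, Windows-style.
    return str(text).replace('\\', '/').lower()


class LoosePath(str):
    """Hypothetical str subclass whose equality ignores separators and case."""

    def __eq__(self, other):
        if isinstance(other, str):
            return _normalize(self) == _normalize(other)
        return NotImplemented

    def __ne__(self, other):
        # str supplies its own __ne__, so it must be overridden to stay consistent.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Consistent among LoosePaths, but a plain str hashes its raw text.
        return hash(_normalize(self))


p = LoosePath('/source/python/some_project')
s = '\\source\\Python\\some_PROJECT'

print(p == s)              # True: __eq__ says they match
print(hash(p) == hash(s))  # False (barring a freak collision)
print(s in {p})            # False: the set never gets as far as calling __eq__
```

Two `LoosePath` objects behave correctly in sets and dicts; only mixing them with plain strings breaks the invariant.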

So, I suppose I shall have to let go of my dreams of

--> Path('/some/path/and/file') == '\\some\\path\\and\\file'

and settle for

--> Path('...') == Path('...')

but I don't have to like it.  :(

</whine>

~Ethan~

What, you didn't see the opening 'whine' tag?  Oh, well, my xml isn't 
very good... ;)



#7725

From: Laurent Claessens <moky.math@gmail.com>
Date: 2011-06-16 09:03 +0200
Message-ID: <4DF9AADE.6090609@gmail.com>
In reply to: #7719

> So, I suppose I shall have to let go of my dreams of
>
> -->  Path('/some/path/and/file') == '\\some\\path\\and\\file'
>
> and settle for
>
> -->  Path('...') == Path('...')
>
> but I don't have to like it.  :(


Why not define the hash method to first convert to '/some/path/and/file' 
and then hash?

By the way, there remain some problems with

/some/another/../path/and/file

which should also be the same.
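The `..` case can be handled lexically with the standard `posixpath` and `ntpath` modules, both importable on any platform. A sketch, where `canonical_key` is a hypothetical helper; note that `normpath` never consults the disk, so if `another` were a symbolic link, collapsing `..` this way could point somewhere else:

```python
import ntpath
import posixpath

raw = '/some/another/../path/and/file'

# Lexical normalization: 'another/..' is collapsed without touching the disk.
print(posixpath.normpath(raw))  # /some/path/and/file
print(ntpath.normpath(raw))     # \some\path\and\file


def canonical_key(path):
    # Hypothetical hash key using Windows rules: backslashes, case folded.
    return ntpath.normcase(ntpath.normpath(path))


print(canonical_key(raw) == canonical_key('\\SOME\\Path\\and\\file'))  # True
```

The key commits to one platform's rules (here Windows), so it cannot also honor POSIX case sensitivity.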

Laurent



#7729

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2011-06-16 07:58 +0000
Message-ID: <4df9b7be$0$29973$c3e8da3$5496439d@news.astraweb.com>
In reply to: #7725

On Thu, 16 Jun 2011 09:03:58 +0200, Laurent Claessens wrote:

>> So, I suppose I shall have to let go of my dreams of
>>
>> -->  Path('/some/path/and/file') == '\\some\\path\\and\\file'
>>
>> and settle for
>>
>> -->  Path('...') == Path('...')
>>
>> but I don't have to like it.  :(
> 
> 
> Why not define the hash method to first convert to '/some/path/and/file'
> and then hash?

It's not so simple. If Path is intended to be platform independent, then 
these two paths could represent the same location:

'a/b/c:d/e'  # on Linux or OS X
'a:b:c/d:e'  # on classic Mac pre OS X

and be impossible on Windows. So what's the canonical path it should be 
converted to?



-- 
Steven



#7749

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2011-06-16 09:16 -0700
Message-ID: <mailman.20.1308240105.1164.python-list@python.org>
In reply to: #7729

Steven D'Aprano wrote:
> If Path is intended to be platform independent, then 
> these two paths could represent the same location:
> 
> 'a/b/c:d/e'  # on Linux or OS X
> 'a:b:c/d:e'  # on classic Mac pre OS X
> 
> and be impossible on Windows. So what's the canonical path it should be 
> converted to?

Are these actual valid paths?  I thought Linux used '/' and Mac used ':'.

~Ethan~



#7751

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2011-06-16 16:41 +0000
Message-ID: <4dfa324e$0$30002$c3e8da3$5496439d@news.astraweb.com>
In reply to: #7749

On Thu, 16 Jun 2011 09:16:22 -0700, Ethan Furman wrote:

> Steven D'Aprano wrote:
>> If Path is intended to be platform independent, then these two paths
>> could represent the same location:
>> 
>> 'a/b/c:d/e'  # on Linux or OS X
>> 'a:b:c/d:e'  # on classic Mac pre OS X
>> 
>> and be impossible on Windows. So what's the canonical path it should be
>> converted to?
> 
> Are these actual valid paths?  I thought Linux used '/' and Mac used
> ':'.

Er, perhaps I wasn't as clear as I intended... sorry about that.

On a Linux or OS X box, you could have a file e inside a directory c:d 
inside b inside a. It can't be treated as platform independent, because 
c:d is not a legal path component under classic Mac or Windows.

On a classic Mac (does anyone still use them?), you could have a file e 
inside a directory c/d inside b inside a. Likewise c/d isn't legal under 
POSIX or Windows.

So there are paths that are legal under one file system, but not others, 
and hence there is no single normalization that can represent all legal 
paths under arbitrary file systems.


-- 
Steven



#7754

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2011-06-16 10:18 -0700
Message-ID: <mailman.24.1308243852.1164.python-list@python.org>
In reply to: #7751

Steven D'Aprano wrote:
> On Thu, 16 Jun 2011 09:16:22 -0700, Ethan Furman wrote:
> 
>> Steven D'Aprano wrote:
>>> If Path is intended to be platform independent, then these two paths
>>> could represent the same location:
>>>
>>> 'a/b/c:d/e'  # on Linux or OS X
>>> 'a:b:c/d:e'  # on classic Mac pre OS X
>>>
>>> and be impossible on Windows. So what's the canonical path it should be
>>> converted to?
>> Are these actual valid paths?  I thought Linux used '/' and Mac used
>> ':'.
> 
> Er, perhaps I wasn't as clear as I intended... sorry about that.
> 
> On a Linux or OS X box, you could have a file e inside a directory c:d 
> inside b inside a. It can't be treated as platform independent, because 
> c:d is not a legal path component under classic Mac or Windows.
> 
> On a classic Mac (does anyone still use them?), you could have a file e 
> inside a directory c/d inside b inside a. Likewise c/d isn't legal under 
> POSIX or Windows.
> 
> So there are paths that are legal under one file system, but not others, 
> and hence there is no single normalization that can represent all legal 
> paths under arbitrary file systems.

Yeah, I was just realizing that about two minutes before I read this 
reply.  Drat.  This also makes your comment about sensible path objects 
more sensible.  ;)

~Ethan~



#7756

From: Eric Snow <ericsnowcurrently@gmail.com>
Date: 2011-06-16 11:21 -0600
Message-ID: <mailman.26.1308244867.1164.python-list@python.org>
In reply to: #7751

On Thu, Jun 16, 2011 at 10:41 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
>
> On a Linux or OS X box, you could have a file e inside a directory c:d
> inside b inside a. It can't be treated as platform independent, because
> c:d is not a legal path component under classic Mac or Windows.
>
> On a classic Mac (does anyone still use them?), you could have a file e
> inside a directory c/d inside b inside a. Likewise c/d isn't legal under
> POSIX or Windows.
>
> So there are paths that are legal under one file system, but not others,
> and hence there is no single normalization that can represent all legal
> paths under arbitrary file systems.
>

Perhaps one solution is to have the Path class accept registrations of
valid path formats:

    from abc import ABC, abstractmethod

    class PathFormat(ABC):
        @abstractmethod
        def map_path(self, pathstring):
            """Map the pathstring to the canonical path.

            This could take the form of some regex or an even more
            explicit conversion.

            If there is no match, return None.

            """

        @abstractmethod
        def unmap_path(self, pathstring):
            """Map the pathstring from a canonical path to this format.

            If there is no match, return None.

            """

    class Path:
        ...
        _formats = []  # registered PathFormat instances, shared by all Paths

        @classmethod
        def register_format(cls, fmt):
            cls._formats.append(fmt)

        def map_path(self, pathstring):
            # Try each registered format in order; the first match wins.
            for fmt in self._formats:
                result = fmt.map_path(pathstring)
                if result is None:
                    continue
                # remember which format matched?
                return result
            raise TypeError("No formatters could map the pathstring.")

        def unmap_path(self, pathstring):
            ...

With something like that, you have a PathFormat class for each
platform that matters.  Anyone would be able to add more, as they
like, through register_format.  This module could also include a few
lines to register a particular PathFormat depending on the platform
determined through sys.platform or whatever.

This way your Path class doesn't have to worry about the
conversion to and from the canonical path format.
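A self-contained sketch of how two concrete formats might plug into such a registry. `WindowsFormat`, `PosixFormat` and `PathRegistry` are hypothetical names, the canonical form is assumed to be a '/'-separated string, `map_path` is made a classmethod for brevity, and a failed match raises ValueError here rather than TypeError:

```python
from abc import ABC, abstractmethod


class PathFormat(ABC):
    """One platform's spelling of paths, relative to a '/'-separated canonical form."""

    @abstractmethod
    def map_path(self, pathstring):
        """Return the canonical path, or None if pathstring isn't in this format."""

    @abstractmethod
    def unmap_path(self, canonical):
        """Return this format's spelling of a canonical path."""


class WindowsFormat(PathFormat):
    # Hypothetical rule: backslashes present and no forward slashes.
    def map_path(self, pathstring):
        if '\\' in pathstring and '/' not in pathstring:
            return pathstring.replace('\\', '/')
        return None

    def unmap_path(self, canonical):
        return canonical.replace('/', '\\')


class PosixFormat(PathFormat):
    # Hypothetical rule: no backslashes; the string is already canonical.
    def map_path(self, pathstring):
        return None if '\\' in pathstring else pathstring

    def unmap_path(self, canonical):
        return canonical


class PathRegistry:
    _formats = []

    @classmethod
    def register_format(cls, fmt):
        cls._formats.append(fmt)

    @classmethod
    def map_path(cls, pathstring):
        # The first registered format that recognizes the string wins.
        for fmt in cls._formats:
            result = fmt.map_path(pathstring)
            if result is not None:
                return result
        raise ValueError('no registered format could map %r' % (pathstring,))


PathRegistry.register_format(WindowsFormat())
PathRegistry.register_format(PosixFormat())

print(PathRegistry.map_path(r'\source\python\some_project'))  # /source/python/some_project
print(PathRegistry.map_path('/source/python/some_project'))   # /source/python/some_project
```

Registration order matters when formats overlap, and a '/'-separated canonical form still cannot hold a classic-Mac component such as c/d, so the registry relocates Steven's objection rather than removing it.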

-eric




#7750

From: Christian Heimes <lists@cheimes.de>
Date: 2011-06-16 18:32 +0200
Message-ID: <mailman.21.1308241939.1164.python-list@python.org>
In reply to: #7729

On 16.06.2011 18:16, Ethan Furman wrote:
> Steven D'Aprano wrote:
>> If Path is intended to be platform independent, then 
>> these two paths could represent the same location:
>>
>> 'a/b/c:d/e'  # on Linux or OS X
>> 'a:b:c/d:e'  # on classic Mac pre OS X
>>
>> and be impossible on Windows. So what's the canonical path it should be 
>> converted to?
> 
> Are these actual valid paths?  I thought Linux used '/' and Mac used ':'.

"c:d" is a valid directory name on Linux. :]

Christian



#7753

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2011-06-16 10:07 -0700
Message-ID: <mailman.23.1308243778.1164.python-list@python.org>
In reply to: #7729

Christian Heimes wrote:
> On 16.06.2011 18:16, Ethan Furman wrote:
>> Steven D'Aprano wrote:
>>> If Path is intended to be platform independent, then 
>>> these two paths could represent the same location:
>>>
>>> 'a/b/c:d/e'  # on Linux or OS X
>>> 'a:b:c/d:e'  # on classic Mac pre OS X
>>>
>>> and be impossible on Windows. So what's the canonical path it should be 
>>> converted to?
>> Are these actual valid paths?  I thought Linux used '/' and Mac used ':'.
> 
> "c:d" is a valid directory name on Linux. :]

Right.  I didn't phrase that at all well.  In Steven's examples, which 
are the path pieces?  I'm guessing

'a', 'b', 'c:d', 'e'; and
'a', 'b', 'c/d', 'e'.
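That reading matches Steven's: the same characters split into different components depending on which separator is in force. A sketch with hypothetical, simplified tables (classic Mac OS used ':'; the Windows entry is the set of printable characters Windows reserves in file names):

```python
# Hypothetical tables: characters each convention forbids inside one component.
FORBIDDEN = {
    'posix': set('/'),
    'classic_mac': set(':'),
    'windows': set('<>:"/\\|?*'),
}
SEPARATOR = {'posix': '/', 'classic_mac': ':'}


def components(path, style):
    return path.split(SEPARATOR[style])


def representable(parts, style):
    # A component is unwritable if it contains a character the style reserves.
    return not any(FORBIDDEN[style] & set(part) for part in parts)


posix_parts = components('a/b/c:d/e', 'posix')      # ['a', 'b', 'c:d', 'e']
mac_parts = components('a:b:c/d:e', 'classic_mac')  # ['a', 'b', 'c/d', 'e']

for style in ('posix', 'classic_mac', 'windows'):
    print(style, representable(posix_parts, style), representable(mac_parts, style))
# posix True False
# classic_mac False True
# windows False False
```

Each path is only writable under its home convention, which is why no single normalization covers both.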

~Ethan~



#7785

From: Chris Angelico <rosuav@gmail.com>
Date: 2011-06-17 11:00 +1000
Message-ID: <mailman.52.1308272461.1164.python-list@python.org>
In reply to: #7729

On Fri, Jun 17, 2011 at 2:32 AM, Christian Heimes <lists@cheimes.de> wrote:
> "c:d" is a valid directory name on Linux. :]
>

The different naming rules come in handy now and then. Wine creates
directories (symlinks, I think, but same diff) called "c:" and "d:"
and so on, which then become the drives that Windows programs see.
It's quite a clean solution.
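The point that c: is an ordinary directory name on Linux can be seen with `pathlib`'s pure path classes (Python 3.4+, so much newer than this thread), which parse strings without touching the disk. The Wine-style path below is illustrative only:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Illustrative Wine-style layout: on Linux, 'c:' is just a directory name.
posix = PurePosixPath('/home/user/.wine/dosdevices/c:/windows')
windows = PureWindowsPath('c:/windows')

print(posix.parts)    # ('/', 'home', 'user', '.wine', 'dosdevices', 'c:', 'windows')
print(windows.drive)  # c:
print(windows.parts)  # ('c:\\', 'windows')
```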

ChrisA



#7728

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2011-06-16 07:14 +0000
Message-ID: <4df9ad53$0$29973$c3e8da3$5496439d@news.astraweb.com>
In reply to: #7719

On Wed, 15 Jun 2011 19:00:07 -0700, Ethan Furman wrote:

> Thread 1: "objects of different types compare unequal"
> self:     "nonsense!  we have the power to say what happens in __eq__!"
> 
> Thread 2: "objects that __hash__ the same *must* compare __eq__!"
> self:     "um, what? ... wait, only immutable objects hash..."

Incorrect. And impossible. There are only a fixed number of hash values 
(2**31 I believe...) and a potentially infinite number of unique, unequal 
objects that can be hashed. So by the pigeon-hole principle, there must 
be at least one pigeon-hole (the hash value) containing two or more 
pigeons (unequal objects).

For example:

>>> hash(2**0 + 3)
4
>>> hash(2**64 + 3)
4


What you mean to say is that if objects compare equal, they must hash the 
same. Not the other way around.
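The pigeon-hole argument is easy to check on CPython 3, where an int's hash is its value reduced modulo the prime in `sys.hash_info.modulus` (2**61 - 1 on 64-bit builds). The particular collision in the 2011 session above depends on the interpreter of the time and may not reproduce on Python 3; this sketch constructs one that does:

```python
import sys

m = sys.hash_info.modulus  # 2**61 - 1 on 64-bit CPython builds
a, b = 4, m + 4

print(a == b)              # False: two unequal objects...
print(hash(a) == hash(b))  # True: ...in the same pigeon-hole

# The required rule runs one way only: equal objects must hash equal.
print(1 == 1.0, hash(1) == hash(1.0))  # True True
```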


> Thread 2: "your Path object is immutable..."
> self:     "argh!"
> 
> Here's the rub:  I'm on Windows (yes, pity me...) but I prefer the
> unices, so I'd like to have / separate my paths.  But I'm on Windows...

Any sensible Path object should accept path components in a form 
independent of the path separator, and only care about the separator when 
converting to and from strings.
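For comparison, the `pathlib` module that eventually landed in the standard library (Python 3.4, PEP 428) takes this component-based approach. A short sketch; note that, unlike a str subclass, a `pathlib` path never compares equal to a plain string:

```python
from pathlib import PureWindowsPath

p = PureWindowsPath('/source/python/some_project')

print(p.parts)       # ('\\', 'source', 'python', 'some_project')
print(str(p))        # \source\python\some_project
print(p.as_posix())  # /source/python/some_project

# Windows-flavored paths compare (and hash) case-insensitively...
q = PureWindowsPath(r'\source\Python\some_PROJECT')
print(p == q, hash(p) == hash(q))  # True True

# ...but never equal a str, which sidesteps the __hash__ problem entirely.
print(p == '/source/python/some_project')  # False
```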


[...] 
> So, I suppose I shall have to let go of my dreams of
> 
> --> Path('/some/path/and/file') == '\\some\\path\\and\\file'

To say nothing of:

Path('a/b/c/../d') == './a/b/d'


Why do you think there's no Path object in the standard library? *wink*



-- 
Steven



#7744

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2011-06-16 09:05 -0700
Message-ID: <mailman.16.1308239495.1164.python-list@python.org>
In reply to: #7728

Steven D'Aprano wrote:
> On Wed, 15 Jun 2011 19:00:07 -0700, Ethan Furman wrote:
> 
>> Thread 1: "objects of different types compare unequal"
>> self:     "nonsense!  we have the power to say what happens in __eq__!"
>>
>> Thread 2: "objects that __hash__ the same *must* compare __eq__!"
>> self:     "um, what? ... wait, only immutable objects hash..."
> 
> Incorrect. 
> What you mean to say is that if objects compare equal, they must hash the 
> same. Not the other way around.

Ack.  I keep saying that backwards.  Thanks for the correction.


>> Thread 2: "your Path object is immutable..."
>> self:     "argh!"
>>
>> Here's the rub:  I'm on Windows (yes, pity me...) but I prefer the
>> unices, so I'd like to have / separate my paths.  But I'm on Windows...
> 
> Any sensible Path object should accept path components in a form 
> independent of the path separator, and only care about the separator when 
> converting to and from strings.

Our ideas of 'sensible' apparently differ.

One of my goals with my Path objects was for them to be a drop-in 
replacement for the strings currently used as paths; consequently, they 
are a subclass of str, and can still be passed to, for example, 
os.path.splitext().  Another was to be able to use '/' across all 
platforms, but still have the appropriate separator used when the Path 
object was passed to, for example, open().
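A minimal sketch of the drop-in idea, using a hypothetical `SlashPath` (not Ethan's code) that always stores '/' separators. Because it is a str subclass, `os.path` functions accept it directly; the flip side, in Python 3.6+, is that `os.fspath()` and `open()` treat any str subclass as an ordinary string without consulting `__fspath__`, so conversion to the native separator has to be requested explicitly (the `native()` method here is an assumption):

```python
import os
import os.path


class SlashPath(str):
    """Hypothetical drop-in path: a str that always stores '/' separators."""

    def __new__(cls, value):
        return super().__new__(cls, value.replace('\\', '/'))

    def native(self):
        # Spell the path with this platform's separator ('\\' on Windows).
        return str(self).replace('/', os.sep)


p = SlashPath(r'source\python\setup.py')

print(p)                    # source/python/setup.py
print(os.path.splitext(p))  # ('source/python/setup', '.py')
print(os.fspath(p) is p)    # True: str subclasses pass through untouched
```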

To me, a path is an ambiguous item.  Given /temp/here/xyz.abc, where 
does the directory structure stop and the filename begin?  xyz.abc 
could be either the last subdirectory or the filename, and the only way 
to know for sure is to look at the disk.  However, the Path may not be 
complete yet, or the final item may not exist yet -- so what then?  I'm 
refusing the temptation to guess. ;)  The programmer can explicitly look, 
or create, as appropriate.
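The ambiguity is easy to demonstrate: lexical functions such as `os.path.splitext` happily report an extension, while only a query against the file system settles what xyz.abc actually is. A sketch using a throwaway temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    candidate = os.path.join(root, 'temp', 'here', 'xyz.abc')

    # Purely lexical: the string "looks like" a file with extension .abc.
    print(os.path.splitext(candidate)[1])  # .abc
    print(os.path.exists(candidate))       # False: nothing on disk yet

    # The program decides: here xyz.abc becomes a directory.
    os.makedirs(candidate)
    print(os.path.isdir(candidate))        # True
    print(os.path.isfile(candidate))       # False
```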


> [...] 
>> So, I suppose I shall have to let go of my dreams of
>>
>> --> Path('/some/path/and/file') == '\\some\\path\\and\\file'
> 
> To say nothing of:
> 
> Path('a/b/c/../d') == './a/b/d'

I think I'll make my case-insensitive Paths compare, and hash, as 
all-lowercase, so direct string comparison can still work.  I'll add an 
.eq() method to handle the other fun stuff.
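One way to read that plan, sketched with a hypothetical `CIPath` class: lower-case (and '/'-normalize) the text once, at construction, so the inherited `str.__eq__` and `str.__hash__` agree automatically, and keep the fuzzier matching in a separate `.eq()` method. The cost is that direct comparison only works against strings already in that normalized form:

```python
import ntpath


class CIPath(str):
    """Hypothetical case-insensitive path, stored lower-cased with '/' separators."""

    def __new__(cls, value):
        return super().__new__(cls, value.replace('\\', '/').lower())

    def eq(self, other):
        # Looser, lexical comparison: also folds '.', '..' and separator style.
        def key(path):
            return ntpath.normcase(ntpath.normpath(path))
        return key(self) == key(other)


p = CIPath(r'\Source\Python\Some_Project')

print(p == '/source/python/some_project')              # True
print(hash(p) == hash('/source/python/some_project'))  # True
print(p == r'\Source\Python\Some_Project')             # False: not normalized
print(CIPath('a/b/c/../d').eq('./a/b/d'))              # True
```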


> Why do you think there's no Path object in the standard library? *wink*

Because I can't find one in either 2.7 or 3.2, and every reference I've 
found has indicated that the other Path contenders were too 
all-encompassing.

~Ethan~



#7782

From: Chris Torek <nospam@torek.net>
Date: 2011-06-17 00:48 +0000
Message-ID: <ite8950jj3@news6.newsguy.com>
In reply to: #7744

>Steven D'Aprano wrote:
>> Why do you think there's no Path object in the standard library? *wink*

In article <mailman.16.1308239495.1164.python-list@python.org>
Ethan Furman  <ethan@stoneleaf.us> wrote:
>Because I can't find one in either 2.7 or 3.2, and every reference I've 
>found has indicated that the other Path contenders were too 
>all-encompassing.

What I think Steven D'Aprano is suggesting here is that the general
problem is too hard, and specific solutions too incomplete, to
bother with.

Your own specific solution might work fine for your case(s), but it
is unlikely to work in general.

I am not aware of any Python implementations for VMS, CMS, VM,
EXEC-8, or other dinosaurs, but it would be ... interesting.
Consider a typical VMS "full pathname":

    DRA0:[SYS0.SYSCOMMON]FILE.TXT;3

The first part is the (literal) disk drive (a la MS-DOS A: or C:
but slightly more general).  The part in [square brackets] is the
directory path.  The extension (.txt) is limited to three characters,
and the part after the semicolon is the file version number, so
you can refer to a backup version.  (Typically one would use a
"logical name" like SYS$SYSROOT in place of the disk and/or
directory-sequence, so as to paper over the overly-rigid syntax.)
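To make the contrast concrete, here is a sketch that splits that example with a hypothetical, much-simplified regular expression (real VMS file specifications allow more forms, such as node names and logical names). A '/'-separated Path model has no natural slot for the device or the version number:

```python
import re

# Hypothetical, simplified grammar: device:[dir.subdir]name.type;version
VMS_SPEC = re.compile(
    r'^(?:(?P<device>[^:\[\]]+):)?'
    r'(?:\[(?P<directory>[^\]]*)\])?'
    r'(?P<name>[^.;]*)'
    r'(?:\.(?P<type>[^;]*))?'
    r'(?:;(?P<version>-?\d+))?$'
)

match = VMS_SPEC.match('DRA0:[SYS0.SYSCOMMON]FILE.TXT;3')
parts = match.groupdict()

print(parts['device'], parts['directory'].split('.'),
      parts['name'], parts['type'], parts['version'])
# DRA0 ['SYS0', 'SYSCOMMON'] FILE TXT 3
```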

Compare with an EXEC-8 (now, apparently, OS 2200 -- I guess it IS
still out there somewhere) "file" name:

    QUAL*FILE(cyclenumber)

where cycle-numbers are relative, i.e., +0 means "use the current
file" while "+1" means "create a new one" and "-1" means "use the
first backup".  (However, one normally tied external file names to
"internal names" before running a program, via the "@USE" statement.)
The vile details are still available here:

   http://www.bitsavers.org/pdf/univac/1100/UE-637_1108execUG_1970.pdf

(Those of you who have never had to deal with these machines, as I
did in the early 1980s, should consider yourselves lucky. :-) )
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html



#7786

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2011-06-16 18:19 -0700
Message-ID: <mailman.53.1308272730.1164.python-list@python.org>
In reply to: #7782

Chris Torek wrote:
>> Steven D'Aprano wrote:
>>> Why do you think there's no Path object in the standard library? *wink*
> 
> In article <mailman.16.1308239495.1164.python-list@python.org>
> Ethan Furman  <ethan@stoneleaf.us> wrote:
>> Because I can't find one in either 2.7 or 3.2, and every reference I've 
>> found has indicated that the other Path contenders were too 
>> all-encompassing.
> 
> What I think Steven D'Aprano is suggesting here is that the general
> problem is too hard, and specific solutions too incomplete, to
> bother with.

Ah.  In that case I completely misunderstood.  Thanks for the insight!

~Ethan~



#7794

From: Ned Deily <nad@acm.org>
Date: 2011-06-16 19:55 -0700
Message-ID: <mailman.58.1308279344.1164.python-list@python.org>
In reply to: #7782

In article <ite8950jj3@news6.newsguy.com>,
 Chris Torek <nospam@torek.net> wrote:

> >Steven D'Aprano wrote:
> >> Why do you think there's no Path object in the standard library? *wink*
> 
> In article <mailman.16.1308239495.1164.python-list@python.org>
> Ethan Furman  <ethan@stoneleaf.us> wrote:
> >Because I can't find one in either 2.7 or 3.2, and every reference I've 
> >found has indicated that the other Path contenders were too 
> >all-encompassing.
> 
> What I think Steven D'Aprano is suggesting here is that the general
> problem is too hard, and specific solutions too incomplete, to
> bother with.
> 
> Your own specific solution might work fine for your case(s), but it
> is unlikely to work in general.

Note there was quite a bit of discussion some years back about adding 
Jason Orendorff's Path module to the standard library, a module which 
had and still has its fans.  Ultimately, though, it was vetoed by Guido.

http://bugs.python.org/issue1226256
http://wiki.python.org/moin/PathModule

-- 
 Ned Deily,
 nad@acm.org



#7797

From: rusi <rustompmody@gmail.com>
Date: 2011-06-16 21:24 -0700
Message-ID: <cec3a248-ed44-453d-8443-0621a4d41439@p9g2000prh.googlegroups.com>
In reply to: #7794

On Jun 17, 7:55 am, Ned Deily <n...@acm.org> wrote:
> In article <ite8950...@news6.newsguy.com>,
>  Chris Torek <nos...@torek.net> wrote:
>
> > >Steven D'Aprano wrote:
> > >> Why do you think there's no Path object in the standard library? *wink*
>
> > In article <mailman.16.1308239495.1164.python-l...@python.org>
> > Ethan Furman  <et...@stoneleaf.us> wrote:
> > >Because I can't find one in either 2.7 or 3.2, and every reference I've
> > >found has indicated that the other Path contenders were too
> > >all-encompassing.
>
> > What I think Steven D'Aprano is suggesting here is that the general
> > problem is too hard, and specific solutions too incomplete, to
> > bother with.
>
> > Your own specific solution might work fine for your case(s), but it
> > is unlikely to work in general.
>
> Note there was quite a bit of discussion some years back about adding
> Jason Orendorff's Path module to the standard library, a module which
> had and still has its fans.  Ultimately, though, it was vetoed by Guido.
>
> http://bugs.python.org/issue1226256
> http://wiki.python.org/moin/PathModule
>
> --
>  Ned Deily,
>  n...@acm.org

A glance at these links (only cursory I admit) suggests that this was
vetoed because of cross OS compatibility issues.

This is unfortunate.

As an analogy, I note that Emacs tries to run compatibly on all major
OSes and as a result is running increasingly badly on all of them (mere
printing, which works easily in apps one hundredth the size of Emacs,
won't run on Windows without wrestling).

More OT, but... When a question is asked on this list about which
environment/IDE people use, many people seem to say "Emacs".

But when a question is raised about python-emacs issues, e.g.
http://groups.google.com/group/comp.lang.python/browse_thread/thread/acb0f2a01fe50151#
there are usually no answers...




