| Started by | rantingrick <rantingrick@gmail.com> |
|---|---|
| First post | 2011-07-21 20:46 -0700 |
| Last post | 2011-07-22 15:49 -0400 |
| Articles | 20 on this page of 21 — 9 participants |
[PyWart 1001] Inconsistencies between zipfile and tarfile APIs rantingrick <rantingrick@gmail.com> - 2011-07-21 20:46 -0700
Re: [PyWart 1001] Inconsistencies between zipfile and tarfile APIs Corey Richardson <kb1pkl@aim.com> - 2011-07-22 00:13 -0400
Re: Inconsistencies between zipfile and tarfile APIs rantingrick <rantingrick@gmail.com> - 2011-07-21 21:48 -0700
Re: Inconsistencies between zipfile and tarfile APIs Corey Richardson <kb1pkl@aim.com> - 2011-07-22 01:05 -0400
Re: Inconsistencies between zipfile and tarfile APIs rantingrick <rantingrick@gmail.com> - 2011-07-21 22:58 -0700
Re: Inconsistencies between zipfile and tarfile APIs Lars Gustäbel <lars@gustaebel.de> - 2011-07-22 10:49 +0200
Re: Inconsistencies between zipfile and tarfile APIs rantingrick <rantingrick@gmail.com> - 2011-07-22 10:38 -0700
Re: Inconsistencies between zipfile and tarfile APIs Terry Reedy <tjreedy@udel.edu> - 2011-07-22 01:45 -0400
Re: Inconsistencies between zipfile and tarfile APIs rantingrick <rantingrick@gmail.com> - 2011-07-21 23:40 -0700
Re: Inconsistencies between zipfile and tarfile APIs Corey Richardson <kb1pkl@aim.com> - 2011-07-22 03:19 -0400
Re: Inconsistencies between zipfile and tarfile APIs Terry Reedy <tjreedy@udel.edu> - 2011-07-22 15:33 -0400
Re: Inconsistencies between zipfile and tarfile APIs Ned Deily <nad@acm.org> - 2011-07-22 14:17 -0700
Re: Inconsistencies between zipfile and tarfile APIs Terry Reedy <tjreedy@udel.edu> - 2011-07-22 20:31 -0400
Re: Inconsistencies between zipfile and tarfile APIs Ryan Kelly <ryan@rfk.id.au> - 2011-07-22 15:56 +1000
Re: [PyWart 1001] Inconsistencies between zipfile and tarfile APIs Lars Gustäbel <lars@gustaebel.de> - 2011-07-22 10:26 +0200
Re: Inconsistencies between zipfile and tarfile APIs rantingrick <rantingrick@gmail.com> - 2011-07-22 10:11 -0700
Re: Inconsistencies between zipfile and tarfile APIs Chris Angelico <rosuav@gmail.com> - 2011-07-23 03:23 +1000
Re: Inconsistencies between zipfile and tarfile APIs Chris Angelico <rosuav@gmail.com> - 2011-07-23 03:25 +1000
Re: [PyWart 1001] Inconsistencies between zipfile and tarfile APIs Thomas Jollans <t@jollybox.de> - 2011-07-22 12:31 +0200
Re: [PyWart 1001] Inconsistencies between zipfile and tarfile APIs Tim Chase <python.list@tim.thechases.com> - 2011-07-22 06:25 -0500
Re: [PyWart 1001] Inconsistencies between zipfile and tarfile APIs Terry Reedy <tjreedy@udel.edu> - 2011-07-22 15:49 -0400
| From | rantingrick <rantingrick@gmail.com> |
|---|---|
| Date | 2011-07-21 20:46 -0700 |
| Subject | [PyWart 1001] Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <5fd8e664-c855-41a2-9d8b-36d4c486f0b9@n35g2000yqf.googlegroups.com> |
I may have found the mother of all inconsistency warts when comparing
the zipfile and tarfile modules. Not only are the API's different, but
the entry and exits are different AND zipfile/tarfile do not behave
like proper file objects should.
>>> import zipfile, tarfile
>>> import os
>>> os.path.exists('C:\\zip.zip')
True
>>> os.path.exists('C:\\tar.tar')
True
>>> tarfile.is_tarfile('C:\\tar.tar')
True
>>> zipfile.is_zipfile('C:\\zip.zip')
True
>>> ZIP_PATH = 'C:\\zip.zip'
>>> TAR_PATH = 'C:\\tar.tar'
--------------------------------------------------
1. Zipfile and tarfile entry exit.
--------------------------------------------------
>>> zf = zipfile.open(ZIP_PATH)
Traceback (most recent call last):
File "<pyshell#12>", line 1, in <module>
zf = zipfile.open(ZIP_PATH)
AttributeError: 'module' object has no attribute 'open'
>>> tf = tarfile.open(TAR_PATH)
>>> tf
<tarfile.TarFile object at 0x02B3B850>
>>> tf.close()
>>> tf
<tarfile.TarFile object at 0x02B3B850>
*COMMENT*
As you can see, the tarfile module exports an open function and
zipfile does not. Actually i would prefer that neither export an open
function and instead only expose a class for instantiation.
*COMMENT*
Since a tarfile object is a file object, asking for the tf object
after the file is closed should show a proper message!
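The difference in entry points and in how the closed state is reported can be reproduced without files on disk. The following is a minimal sketch, assuming Python 3 and in-memory archives; the member name `hello.txt` and its payload are made up for illustration.

```python
import io
import tarfile
import zipfile

# Build a tiny zip archive in memory (hypothetical member name and payload).
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("hello.txt", "hi")

# Build a matching tar archive in memory.
tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    payload = b"hi"
    info = tarfile.TarInfo("hello.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

# Reading: zipfile is entered through its class, tarfile through open().
zip_buf.seek(0)
tar_buf.seek(0)
zf = zipfile.ZipFile(zip_buf)
tf = tarfile.open(fileobj=tar_buf)
zip_names = zf.namelist()
tar_names = tf.getnames()
zf.close()
tf.close()

# After close(), the two objects report their state differently:
# ZipFile resets its fp attribute to None, TarFile sets a closed flag.
zip_closed = zf.fp is None
tar_closed = tf.closed
```

Both classes also work as context managers in Python 3, which hides the difference in how they expose their closed state.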
>>> tf = tarfile.TarFile(TAR_PATH)
Traceback (most recent call last):
File "<pyshell#72>", line 1, in <module>
tf = tarfile.TarFile(TAR_PATH)
File "C:\Python27\lib\tarfile.py", line 1572, in __init__
self.firstmember = self.next()
File "C:\Python27\lib\tarfile.py", line 2335, in next
raise ReadError(str(e))
ReadError: invalid header
>>> tf = tarfile.TarFile.open(TAR_PATH)
>>> tf
<tarfile.TarFile object at 0x02C251D0>
>>> tf.fp
Traceback (most recent call last):
File "<pyshell#75>", line 1, in <module>
tf.fp
AttributeError: 'TarFile' object has no attribute 'fp'
>>> tf
<tarfile.TarFile object at 0x02C251D0>
>>> tf.close()
>>> tf
<tarfile.TarFile object at 0x02C251D0>
>>> tf.fileobj
<bz2.BZ2File object at 0x02C24458>
>>> tf.closed
True
*COMMENT*
Tarfile is missing the attribute "fp" and instead exposes a boolean
"closed". This mismatching API is asinine! Both tarfile and zipfile
should behave EXACTLY like file objects
>>> f = open('C:\\text.txt', 'r')
>>> f.read()
''
>>> f
<open file 'C:\text.txt', mode 'r' at 0x02B26F98>
>>> f.close()
>>> f
<closed file 'C:\text.txt', mode 'r' at 0x02B26F98>
--------------------------------------------------
2. Zipfile SPECIFIC entry exit
--------------------------------------------------
>>> zf
<zipfile.ZipFile instance at 0x02B2C6E8>
>>> zf.fp
>>> zf = zipfile.ZipFile(ZIP_PATH)
>>> zf
<zipfile.ZipFile instance at 0x02B720A8>
>>> zf.fp
<open file 'C:\zip.zip', mode 'rb' at 0x02B26F98>
>>> zf.close()
>>> zf
<zipfile.ZipFile instance at 0x02B720A8>
>>> zf.fp
>>> print repr(zf.fp)
None
*COMMENT*
As you can see, unlike tarfile zipfile cannot handle a passed path.
--------------------------------------------------
3. Zipfile and Tarfile obj API differences.
--------------------------------------------------
zf.namelist() -> tf.getnames()
zf.getinfo(name) -> tf.getmember(name)
zf.infolist() -> tf.getmembers()
zf.printdir() -> tf.list()
*COMMENT*
Would it have been too difficult to make these names match? Really?
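One way to paper over the naming differences listed above is a thin wrapper. This is only a sketch, assuming Python 3; `ArchiveView`, `make_zip` and `make_tar` are hypothetical names invented here and are not part of the stdlib. It arbitrarily adopts the zipfile spellings.

```python
import io
import tarfile
import zipfile


class ArchiveView:
    """Hypothetical read-only wrapper exposing zipfile-style names for both types."""

    def __init__(self, archive):
        self._archive = archive
        self._is_zip = isinstance(archive, zipfile.ZipFile)

    def namelist(self):
        # zf.namelist() -> tf.getnames()
        if self._is_zip:
            return self._archive.namelist()
        return self._archive.getnames()

    def getinfo(self, name):
        # zf.getinfo(name) -> tf.getmember(name)
        if self._is_zip:
            return self._archive.getinfo(name)
        return self._archive.getmember(name)

    def infolist(self):
        # zf.infolist() -> tf.getmembers()
        if self._is_zip:
            return self._archive.infolist()
        return self._archive.getmembers()


def make_zip(name, payload):
    """Return an open, readable in-memory ZipFile with one member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, payload)
    buf.seek(0)
    return zipfile.ZipFile(buf)


def make_tar(name, payload):
    """Return an open, readable in-memory TarFile with one member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return tarfile.open(fileobj=buf)


views = [ArchiveView(make_zip("a.txt", b"abc")), ArchiveView(make_tar("a.txt", b"abc"))]
listings = [view.namelist() for view in views]
counts = [len(view.infolist()) for view in views]
```

The info objects returned by `getinfo` are still `ZipInfo` and `TarInfo` instances, so the attribute differences between them remain.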
--------------------------------------------------
4. Zipfile and Tarfile infoobj API differences.
--------------------------------------------------
zInfo.filename -> tInfo.name
zInfo.file_size -> tInfo.size
zInfo.date_time -> tInfo.mtime
*COMMENT*
Note the inconsistencies in naming conventions of the zipinfo methods.
*COMMENT*
Not only is modified time named differently between zipinfo and tarinfo,
they even return completely different values of time.
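To make the difference in time values concrete: `ZipInfo.date_time` is a six-field tuple with no time zone attached, while `TarInfo.mtime` is a count of seconds since the Unix epoch. A small sketch, assuming Python 3 and made-up timestamps, that turns both into `datetime` objects:

```python
import calendar
import datetime
import tarfile
import zipfile

# ZipInfo.date_time: (year, month, day, hour, minute, second), no time zone.
zinfo = zipfile.ZipInfo("a.txt", date_time=(2011, 7, 21, 20, 46, 0))

# TarInfo.mtime: seconds since the Unix epoch.
tinfo = tarfile.TarInfo("a.txt")
tinfo.mtime = calendar.timegm((2011, 7, 22, 3, 46, 0))

# A naive datetime from the zip tuple, an aware UTC datetime from the tar value.
zip_dt = datetime.datetime(*zinfo.date_time)
tar_dt = datetime.datetime.fromtimestamp(tinfo.mtime, tz=datetime.timezone.utc)
```

Comparing the two still requires deciding which time zone the zip tuple was recorded in, since the tuple itself does not say.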
--------------------------------------------------
Conclusion:
--------------------------------------------------
It is very obvious that these modules need some consistency between
not only themselves but also collectively. People, when emulating a
file type always be sure to emulate the built-in python file type as
closely as possible.
PS: I will be posting more warts very soon. This stdlib is a gawd
awful mess!
| From | Corey Richardson <kb1pkl@aim.com> |
|---|---|
| Date | 2011-07-22 00:13 -0400 |
| Message-ID | <mailman.1344.1311308080.1164.python-list@python.org> |
| In reply to | #10064 |
Excerpts from rantingrick's message of Thu Jul 21 23:46:05 -0400 2011:
>
> I may have found the mother of all inconsistency warts when comparing
> the zipfile and tarfile modules. Not only are the API's different, but
> the entry and exits are different AND zipfile/tarfile do not behave
> like proper file objects should.
>
I agree, actually.
--
Corey Richardson
"Those who deny freedom to others, deserve it not for themselves"
-- Abraham Lincoln
| From | rantingrick <rantingrick@gmail.com> |
|---|---|
| Date | 2011-07-21 21:48 -0700 |
| Subject | Re: Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <3241cbe4-9829-438b-ac0e-a0b87aff62d9@q15g2000yqk.googlegroups.com> |
| In reply to | #10068 |
On Jul 21, 11:13 pm, Corey Richardson <kb1...@aim.com> wrote:
> Excerpts from rantingrick's message of Thu Jul 21 23:46:05 -0400 2011:
>
> > I may have found the mother of all inconsistency warts when comparing
> > the zipfile and tarfile modules. Not only are the API's different, but
> > the entry and exits are different AND zipfile/tarfile do not behave
> > like proper file objects should.
>
> I agree, actually.

Unfortunately i know what the "powers that be" are going to say about
fixing this wart.

PTB: "Sorry we cannot break backwards compatibility"
Rick: But what about Python 3000?
PTB: "Oh, well, umm, lets see. Well that was then and this is now!"

Maybe i can offer a solution. A NEW module called "archive.py" (could
even be a package!) which exports both the zip and tar file classes.
However, unlike the current situation this archive module will be
consistent with its API.

>>> from archive import ZipFile, TarFile
>>> zf = ZipFile(path, *args)
>>> tf = TarFile(path, *args)
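No such `archive` module exists in the stdlib. As a rough illustration of what a single entry point could look like, here is a minimal sketch, assuming Python 3; `open_archive` is a hypothetical helper built only on the existing `zipfile.is_zipfile` and `tarfile.is_tarfile` checks, and it unifies only the opening step, not the method names.

```python
import os
import tarfile
import tempfile
import zipfile


def open_archive(path):
    """Hypothetical helper: open a zip or tar archive for reading by sniffing its type."""
    if zipfile.is_zipfile(path):
        return zipfile.ZipFile(path)
    if tarfile.is_tarfile(path):
        return tarfile.open(path)
    raise ValueError("not a zip or tar archive: %r" % (path,))


opened_types = []
with tempfile.TemporaryDirectory() as tmp:
    # Create one small member file and pack it into both archive formats.
    member = os.path.join(tmp, "hello.txt")
    with open(member, "w") as fh:
        fh.write("hi")
    zip_path = os.path.join(tmp, "demo.zip")
    tar_path = os.path.join(tmp, "demo.tar")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.write(member, arcname="hello.txt")
    with tarfile.open(tar_path, "w") as tf:
        tf.add(member, arcname="hello.txt")

    # The same call opens either archive; the returned types still differ.
    for path in (zip_path, tar_path):
        archive = open_archive(path)
        opened_types.append(type(archive).__name__)
        archive.close()
```

Sniffing the content rather than trusting the file extension is a design choice made for this sketch; a real module would also have to settle how writing and compression options are passed.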
| From | Corey Richardson <kb1pkl@aim.com> |
|---|---|
| Date | 2011-07-22 01:05 -0400 |
| Subject | Re: Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <mailman.1346.1311311483.1164.python-list@python.org> |
| In reply to | #10070 |
Excerpts from rantingrick's message of Fri Jul 22 00:48:37 -0400 2011:
> On Jul 21, 11:13pm, Corey Richardson <kb1...@aim.com> wrote:
> > I agree, actually.
>
>
> Maybe i can offer a solution. A NEW module called "archive.py" (could
> even be a package!) which exports both the zip and tar file classes.
> However, unlike the current situation this archive module will be
> consistent with its API.
>
> >>> from archive import ZipFile, TarFile
> >>> zf = ZipFile(path, *args)
> >>> tf = TarFile(path, *args)
I have nothing to do this weekend, I might as well either write my own or
twist around the existing implementations in the hg repo.
--
Corey Richardson
"Those who deny freedom to others, deserve it not for themselves"
-- Abraham Lincoln
| From | rantingrick <rantingrick@gmail.com> |
|---|---|
| Date | 2011-07-21 22:58 -0700 |
| Subject | Re: Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <96257847-f17d-4681-9b71-b353c76c86fe@s17g2000yqs.googlegroups.com> |
| In reply to | #10071 |
On Jul 22, 12:05 am, Corey Richardson <kb1...@aim.com> wrote:
> > >>> from archive import ZipFile, TarFile
> > >>> zf = ZipFile(path, *args)
> > >>> tf = TarFile(path, *args)
>
> I have nothing to do this weekend, I might as well either write my own or
> twist around the existing implementations in the hg repo.

My hat is off to you Mr. Richardson. I've even considered creating my
own clean versions of these two modules, because heck, it is not that
difficult to do! However we must stop fixing these warts on a local
level Corey. We MUST clean up this damn python stdlib once and for
all.

I am willing and you are willing; that's two people. However, can we
convince the powers that be to upgrade these modules? Sure, if we get
enough people shouting for it to happen they will notice. So come on
people make your voices heard. Chime in and let the devs know we are
ready to unite and tackle these problems in our stdlib.

What this community needs (first and foremost) is some positive
attitudes. If you don't want to write the code fine. But at least
chime in and say... "Hey guys, that's a good idea! I would like to see
some of these APIs cleaned up too. good luck! +1"

Now, even if we get one hundred people chanting... "Yes, Yes, Fix This
Mess!"... i know Guido and company are going to frown because of
backwards incompatibility. But let me tell you something people, the
longer we put off these changes the more painful they are going to
be.

Python 3000 would have been the perfect time to introduce a more
intuitive and unified zip/tar archive module however that did not
happen. So now we need to think about adding a duplicate module
"archive.py" and deprecating zipfile.py and tarfile.py. We can remove
the old modules when Python 4000 rolls out.

That's just step one people, we have a long way to go!
| From | Lars Gustäbel <lars@gustaebel.de> |
|---|---|
| Date | 2011-07-22 10:49 +0200 |
| Subject | Re: Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <mailman.1355.1311324559.1164.python-list@python.org> |
| In reply to | #10075 |
On Thu, Jul 21, 2011 at 10:58:37PM -0700, rantingrick wrote:
> My hat is off to you Mr. Richardson. I've even considered creating my
> own clean versions of these two modules, because heck, it is not that
> difficult to do! However we must stop fixing these warts on a local
> level Corey. We MUST clean up this damn python stdlib once and for
> all.

One could get the impression that you are leading a grass-roots movement
fighting a big faceless corporation. Instead, what you're dealing with is this
warm and friendly Python community you could as well be a part of if you are a
reasonable guy and write good code.

> I am willing and you are willing; that's two people. However, can we
> convince the powers that be to upgrade these modules? Sure, if we get
> enough people shouting for it to happen they will notice. So come on
> people make your voices heard. Chime in and let the devs know we are
> ready to unite and tackle these problems in our stdlib.

Yeah, great. Please write code. Or a PEP.

> What this community needs (first and foremost) is some positive
> attitudes. If you don't want to write the code fine. But at least
> chime in and say... "Hey guys, that's a good idea! I would like to see
> some of these APIs cleaned up too. good luck! +1"

+1

> Now, even if we get one hundred people chanting... "Yes, Yes, Fix This
> Mess!"... i know Guido and company are going to frown because of
> backwards incompatibility. But let me tell you something people, the
> longer we put off these changes the more painful they are going to
> be.

And backwards compatibility is bad why? Tell me, what exactly is your view
towards this? Should there be none?

> Python 3000 would have been the perfect time to introduce a more
> intuitive and unified zip/tar archive module however that did not
> happen. So now we need to think about adding a duplicate module
> "archive.py" and deprecating zipfile.py and tarfile.py. We can remove
> the old modules when Python 4000 rolls out.
>
> That's just step one people, we have a long way to go!

archive.py is no new idea. Unfortunately, to this day, nobody had the time to
come up with an implementation.

Let me say it again: less false pathos, more code. Please.
--
Lars Gustäbel
lars@gustaebel.de
To a man with a hammer, everything looks like a nail.
(Mark Twain)
| From | rantingrick <rantingrick@gmail.com> |
|---|---|
| Date | 2011-07-22 10:38 -0700 |
| Subject | Re: Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <870de1bf-9445-455d-8fb7-4ef1f6c4efd3@ft10g2000vbb.googlegroups.com> |
| In reply to | #10088 |
On Jul 22, 3:49 am, Lars Gustäbel <l...@gustaebel.de> wrote:
> One could get the impression that you are leading a grass-roots movement
> fighting a big faceless corporation. Instead, what you're dealing with is this
> warm and friendly Python community you could as well be a part of if you are a
> reasonable guy and write good code.

Sometimes i do feel as if i am fighting against an evil empire. I am a
reasonable guy and i do write -good-, no excellent code.

> Yeah, great. Please write code. Or a PEP.

I am not about to just hop through all the hoops of PEP and PEP8 code
just to have someone say "Sorry, we are not going to include your
code". What i want at this point is to get feedback from everyone
about this proposed archive.py module. Because unlike other people, i
don't want to ram MY preferred API down others throats.

Step one is getting feedback on the idea of including a new archive
module. Step two is hammering out an acceptable API spec. Step three
is actually writing the code and finally getting it accepted into the
stdlib.

Not only do i need feedback from everyday Python scripters, i need
feedback from Python-dev. I even need feedback from the great GvR
himself! (maybe not right away but eventually).

> > What this community needs (first and foremost) is some positive
> > attitudes. If you don't want to write the code fine. But at least
> > chime in and say... "Hey guys, that's a good idea! I would like to see
> > some of these APIs cleaned up too. good luck! +1"
>
> +1

Thank you! Now, can you convince your comrades at pydev to offer their
opinions here also? Even if all they do is say "+1".

> > Now, even if we get one hundred people chanting... "Yes, Yes, Fix This
> > Mess!"... i know Guido and company are going to frown because of
> > backwards incompatibility. But let me tell you something people, the
> > longer we put off these changes the more painful they are going to
> > be.
>
> And backwards compatibility is bad why? Tell me, what exactly is your view
> towards this? Should there be none?

First let me be clear that "backwards-compatibility" (BC) is very
important to any community. We should always strive for BC. However
there is no doubt we are going to make mistakes along the way and at
some point SOME APIs will need to be broken in the name of consistency
or some other important reason.

As i've said before Py3000 would have been the PERFECT opportunity to
fix this broken API within the current zipfile and tarfile modules.
Since that did not happen, we must now introduce a new module
"archive.py" and deprecate the zip and tar modules immediately. We
shall remove them forever in Python4000.

If you guys think we are done breaking BC, you are in for big
surprises! Py3000 was just the beginning of clean-ups. Py4000 is going
to be a game changer! And when we finally get to Py4000 and remove all
these ugly warts python is going to be a better language for it. Mark
my words people!

> archive.py is no new idea. Unfortunately, to this day, nobody had the time to
> come up with an implementation.

It's time to change;
Can't stay the same;
Rev-o-lu-tion is MY name!

We can never become complacent and believe we have reached perfection
because we never will.
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2011-07-22 01:45 -0400 |
| Subject | Re: Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <mailman.1347.1311313569.1164.python-list@python.org> |
| In reply to | #10070 |
On 7/22/2011 12:48 AM, rantingrick wrote:
> On Jul 21, 11:13 pm, Corey Richardson<kb1...@aim.com> wrote:
>> Excerpts from rantingrick's message of Thu Jul 21 23:46:05 -0400 2011:
>>
>>> I may have found the mother of all inconsistency warts when comparing
>>> the zipfile and tarfile modules. Not only are the API's different, but
>>> the entry and exits are different AND zipfile/tarfile do not behave
>>> like proper file objects should.
>>
>> I agree, actually.

Hmm. Archives are more like directories than files. Windows, at least,
seems to partly treat zipfiles as more or less as such. Certainly, 7zip
presents a directory interface. So opening a zipfile/tarfile would be
like opening a directory, which we normally do not do. On the other
hand, I am not sure I like python's interface to directories that much.

It would be more sensible to open files within the archives. Certainly,
it would be nice to have the result act like file objects as much as
possible.

Searching open issues for 'tarfile' or 'zipfile' returns about 40 issues
each. So I think some people would care more about fixing bugs than
adjusting the interfaces. Of course, some of the issues may be about the
interface and increasing consistency where it can be done without
compatibility issues. However, I do not think there are any active
developers focused on those two modules.

> Unfortunately i know what the "powers that be" are going to say about
> fixing this wart.
>
> PTB: "Sorry we cannot break backwards compatibility"

Do you propose we break compatibility more than we do? You are not the
only Python ranter. People at Google march into Guido's office to
complain instead of posting here.

> Rick: But what about Python 3000?
> PTB: " Oh, well, umm, lets see. Well that was then and this is now!

The changes made for 3.0 were more than enough for some people to
discourage migration to Py3. And we *have* made additional changes
since. So the resistance to incompatible feature changes has increased.

> Maybe i can offer a solution. A NEW module called "archive.py" (could
> even be a package!) which exports both the zip and tar file classes.
> However, unlike the current situation this archive module will be
> consistent with its API.
>
>>>> from archive import ZipFile, TarFile
>>>> zf = ZipFile(path, *args)
>>>> tf = TarFile(path, *args)

Not a bad idea. Put it on PyPI and see how much support you can get.
--
Terry Jan Reedy
| From | rantingrick <rantingrick@gmail.com> |
|---|---|
| Date | 2011-07-21 23:40 -0700 |
| Subject | Re: Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <dc3861c9-cd36-4a42-b020-b3b008b85225@12g2000yqr.googlegroups.com> |
| In reply to | #10072 |
On Jul 22, 12:45 am, Terry Reedy <tjre...@udel.edu> wrote:
> On 7/22/2011 12:48 AM, rantingrick wrote:
> > On Jul 21, 11:13 pm, Corey Richardson<kb1...@aim.com> wrote:

> Hmm. Archives are more like directories than files. Windows, at least,
> seems to partly treat zipfiles as more or less as such.

Yes but a zipfile is just a file not a directory. This is not the
first time Microsoft has "mislead" people you know. ;-)

> Certainly, 7zip
> presents a directory interface. So opening a zipfile/tarfile would be
> like opening a directory, which we normally do not do. On the other
> hand, I am not sure I like python's interface to directories that much.

I don't think we should make comparisons between applications and
API's.

> It would be more sensible to open files within the archives. Certainly,
> it would be nice to have the result act like file objects as much as
> possible.

Well you still need to start at the treetop (which is the zip/tar
file) because lots of important information is exposed at that level:

 * compressed file listing
 * created, modified times
 * adding / deleting
 * etc.

I'll admit you could think of it as a directory but i would not want
to do that. People need to realize that tar and zip files are FILES
and NOT folders.

> Searching open issues for 'tarfile' or 'zipfile' returns about 40 issues
> each. So I think some people would care more about fixing bugs than
> adjusting the interfaces. Of course, some of the issues may be about the
> interface and increasing consistency where it can be done without
> compatibility issues.

Yes i agree! If we can at least do something as meager as this it
would be a step forward. However i still believe the current API is
broken beyond repair so we must introduce a new "archive" module.
That's my opinion anyway.

> However, I do not think there are any active
> developers focused on those two modules.

We need some fresh blood infused into Python-dev. I have been trying
to get involved for a long time. We as a community need to realize
that this community is NOT a homogeneous block. We need to be a little
more accepting of new folks and new ideas. I know this language would
evolve much quicker if we did.

> > Unfortunately i know what the "powers that be" are going to say about
> > fixing this wart.
>
> > PTB: "Sorry we cannot break backwards compatibility"
>
> Do you propose we break compatibility more than we do? You are not the
> only Python ranter. People at Google march into Guido's office to
> complain instead of posting here.

Well, i do feel for Guido because i know he's taking holy hell over
this whole Python 3000 thing. If you guys don't remember i was a
strong opponent of almost all the changes a few years ago (search the
archives). However soon after taking a "serious" look at the changes
and considering the benefits i was convinced. I believe we are moving
in the correct direction with the language HOWEVER the library is
growing stale by the second.

I want to breathe new life into this library and i believe many more
people like myself exist but they don't know how to get involved. I
can tell everyone who is listening the easiest first step is simply to
speak up and make a voice for yourself. Don't be afraid to state your
opinions. You can start right now by chiming in on this thread.
Anybody is welcome to offer opinions no matter what experience level.

> > Rick: But what about Python 3000?
> > PTB: " Oh, well, umm, lets see. Well that was then and this is now!
>
> The changes made for 3.0 were more than enough for some people to
> discourage migration to Py3. And we *have* made additional changes
> since. So the resistance to incompatible feature changes has increased.

Yes i do understand these changes have been very painful for some
folks (me included). However there is only but one constant in this
universe and that constant is change. I believe we can improve many of
these API's starting with zip/tar modules. By the time Python 4000
gets here (and it will be much sooner than you guys realize!) we need
to have this stdlib in pristine condition. That means:

 * Removing style guide violations.
 * Removing inconsistencies in existing API's.
 * Making sure doc strings and comments are everywhere.
 * Cleaning up the IDLE library (needs a complete re-write!)
 * Cleaning up Tkinter.
 * And more

Baby steps are the key to winning this battle. We hit all the easy
stuff first (doc-strings and style guide) and save the painful stuff
for Python 4000. Meanwhile we introduce new modules and deprecate the
old stuff. However we need to start the python 4000 migration now. We
cannot keep putting off what should have already been done in Python
3000.

> > Maybe i can offer a solution. A NEW module called "archive.py" (could
> > even be a package!) which exports both the zip and tar file classes.
> > However, unlike the current situation this archive module will be
> > consistent with its API.
>
> Not a bad idea. Put it on PyPI and see how much support you can get.

Thanks, I might just do that!
| From | Corey Richardson <kb1pkl@aim.com> |
|---|---|
| Date | 2011-07-22 03:19 -0400 |
| Subject | Re: Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <mailman.1351.1311319223.1164.python-list@python.org> |
| In reply to | #10078 |
Excerpts from rantingrick's message of Fri Jul 22 02:40:51 -0400 2011:
> On Jul 22, 12:45am, Terry Reedy <tjre...@udel.edu> wrote:
> > On 7/22/2011 12:48 AM, rantingrick wrote:
> > > On Jul 21, 11:13 pm, Corey Richardson<kb1...@aim.com> wrote:
>
> > Hmm. Archives are more like directories than files. Windows, at least,
> > seems to partly treat zipfiles as more or less as such.
>
> Yes but a zipfile is just a file not a directory. This is not the
> first time Microsoft has "mislead" people you know. ;-)
>
Ehh...yes and no. Physically, it is a file and nothing more. But its actual
use and contents could reflect that of a directory. Are files and directories
that different, after all? I don't believe so. They are both an expression
of the same thing. Both contain data, one just contains others of itself.
Of course, treating a zipfile as a directory will certainly have a performance
cost. But here in Linux-land (and elsewhere I'm sure) I can mount, for example,
a disk image to a mountpoint anywhere. It's a useful thing to do!
> > Certainly, 7zip
> > presents a directory interface. So opening a zipfile/tarfile would be
> > like opening a directory, which we normally do not do. On the other
> > hand, I am not sure I like python's interface to directories that much.
>
> I don't think we should make comparisons between applications and
> API's.
>
Ehh...yes and no again. Maybe the applications are on to something? Whether
the filesystem is physically on disk or is just a representation of a
filesystem on a file in a filesystem on disk, treating them both as a
filesystem is a useful abstraction (NOT the only one available?)
> > It would be more sensible to open files within the archives. Certainly,
> > it would be nice to have the result act like file objects as much as
> > possible.
>
> Well you still need to start at the treetop (which is the zip/tar
> file) because lots of important information is exposed at that level:
>
> * compressed file listing
> * created, modified times
> * adding / deleting
> * etc.
>
> I'll admit you could think of it as a directory but i would not want
> to do that. People need to realize that tar and zip files are FILES
> and NOT folders.
>
I think it's a useful abstraction to think of an archive as a directory.
They ARE files, yes. But must their physical representation impact their
semantics? I think not! It doesn't matter if Python's list object is a
linked-list down under or if it isn't. Or any sequence, for that matter!
It's a useful abstraction to treat them all as sequences, uniform interface
etc, even though one sequence might be a linked list in a C module, or
a row from a database, or whatever!
> > Searching open issues for 'tarfile' or 'zipfile' returns about 40 issues
> > each. So I think some people would care more about fixing bugs than
> > adjusting the interfaces. Of course, some of the issues may be about the
> > interface and increasing consistency where it can be done without
> > compatibility issues.
>
> Yes i agree! If we can at least do something as meager as this it
> would be a step forward. However i still believe the current API is
> broken beyond repair so we must introduce a new "archive" module.
> That's my opinion anyway.
>
Checking if such a thing exists already may be more useful. I saw someone
mention a project similar?
> > However, I do not think there are any active
> > developers focused on those two modules.
>
> We need some fresh blood infused into Python-dev. I have been trying
> to get involved for a long time. We as a community need to realize
> that this community is NOT a homogeneous block. We need to be a little
> more accepting of new folks and new ideas. I know this language would
> evolve much quicker if we did.
>
> > > Rick: But what about Python 3000?
> > > PTB: " Oh, well, umm, lets see. Well that was then and this is now!
> >
> > The changes made for 3.0 were more than enough for some people to
> > discourage migration to Py3. And we *have* made additional changes
> > since. So the resistance to incompatible feature changes has increased.
>
> Yes i do understand these changes have been very painful for some
> folks (me included). However there is only but one constant in this
> universe and that constant is change. I believe we can improve many of
> these API's starting with zip/tar modules. By the time Python 4000
> gets here (and it will be much sooner than you guys realize!) we need
> to have this stdlib in pristine condition. That means:
>
> * Removing style guide violations.
> * Removing inconsistencies in existing API's.
> * Making sure doc strings and comments are everywhere.
> * Cleaning up the IDLE library (needs a complete re-write!)
> * Cleaning up Tkinter.
> * And more
>
All noble goals. I think the fact that everyone* knows that the stdlib is
a mess and not the epitome of Good Python is kinda sad...
* for some definition of "everyone"
--
Corey Richardson
"Those who deny freedom to others, deserve it not for themselves"
-- Abraham Lincoln
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2011-07-22 15:33 -0400 |
| Subject | Re: Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <mailman.1384.1311363231.1164.python-list@python.org> |
| In reply to | #10078 |
On 7/22/2011 2:40 AM, rantingrick wrote:
> On Jul 22, 12:45 am, Terry Reedy<tjre...@udel.edu> wrote:

Let me give some overall comments rather than respond point by point.

Python-dev is a volunteer *human* community, not a faceless corporation,
with an ever-changing composition (a very mutable set;-). It is too
small, really, for the current size of the project.

Python 3 was mostly about syntax cleanup. Python-dev was not large
enough to also do much stdlib cleanup. With the syntax moratorium,
attention *was* focused on the stdlib and problems were found. Some
function names were actively incorrect (due to the shift from
str-unicode to bytes-strings). Some functions were undocumented and
ambiguous as to their public/private status. Some deprecations were made
that will take effect in 3.3 or 3.4.

This introduced the problem that upgrading to Python 3 is no longer a
single thing. We really need 2to3.1 (the current 2to3), 2to3.2, 2to3.3,
etc, but someone would have to make the new versions, and no one,
currently, has the energy and interest to do that. So people who did not
port their 2.x code early now use the problem of multiple Python 3
targets as another excuse not to do so now. (Actually, most 2.x code
should not be ported, but there are more libraries that we do need in
3.x.)

The way to revamp a module is to introduce a new module. And anything
new must be released first on PyPI. This has precedent. In 2.x days,
urllib2 was an upgrade to urllib, though I do not know if it was on
PyPI. For 3.x, Steven Bethard's argparse supersedes optparse, but the
latter remains with the notice in red: "Deprecated since version 2.7:
The optparse module is deprecated and will not be developed further;
development will continue with the argparse module.". Argparse was first
released on PyPI, and versions compatible with releases earlier than 2.7
and 3.2 remain there. The new 3.3 module 'distribute' is a renamed
distutils2. It is now on PyPI, where it has been tested with current and
earlier versions, and it will remain there even after 3.3 is released.

An archive module should be released or at least listed on PyPI. It will
thus be available whether or not it is incorporated into the stdlib.
(Many useful modules never are, partly because the authors recognize
that there are disadvantages as well as advantages to being in the
stdlib.) It should be compatible with at least 3.1+ so that people can
use it and be compatible with multiple 3.x versions. Starting with a
version < 1.0 implies that the api is subject to change with user
experience.

This does not preclude *also* making compatible changes in stdlib
modules. And as I mentioned before, there are already a lot of bug and
feature requests on the tracker. Merely putting a new face (api) on a
sick pig is not enough.

--
Terry Jan Reedy
[toc] | [prev] | [next] | [standalone]
| From | Ned Deily <nad@acm.org> |
|---|---|
| Date | 2011-07-22 14:17 -0700 |
| Subject | Re: Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <mailman.1392.1311369463.1164.python-list@python.org> |
| In reply to | #10078 |
In article <j0cjaf$mum$1@dough.gmane.org>,
Terry Reedy <tjreedy@udel.edu> wrote:
> This introduced the problem that upgrading to Python 3 is no longer a
> single thing. We really need 2to3.1 (the current 2to3), 2to3.2, 2to3.3,
> etc, but someone would have to make the new versions, but no one,
> currently, has the energy and interest to do that. So people who did not
> port their 2.x code early now use the problem of multiple Python 3
> targets as another excuse not to do so now. (Actually, most 2.x code
> should not be ported, but there are more libraries that we do need in 3.x.)

I don't quite understand this. Since 2to3 is included with Python 3,
there are, in fact, separate releases of 2to3 for each release of Python
3 so far. And, unlike with Python 2 with a large installed base across
a number of versions, Python 3 version support can be and is much more
focused now in its early releases. Support for 3.0 was terminated
immediately upon release of 3.1. And 3.1 is now in security-fix mode
only. So, except for a brief overlap after the initial release of 3.2,
there has only been one Python 3 release that needs to be targeted. Of
course, that will change over time as adoption continues and mainstream
OS's include specific Python 3 releases. But, for now, it's easy: just
target the most recent Python 3 release, currently 3.2.1. Don't worry
about earlier releases.

--
Ned Deily,
nad@acm.org
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2011-07-22 20:31 -0400 |
| Subject | Re: Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <mailman.1397.1311381090.1164.python-list@python.org> |
| In reply to | #10078 |
On 7/22/2011 5:17 PM, Ned Deily wrote:
> In article<j0cjaf$mum$1@dough.gmane.org>,
> Terry Reedy<tjreedy@udel.edu> wrote:
>> This introduced the problem that upgrading to Python 3 is no longer a
>> single thing. We really need 2to3.1 (the current 2to3), 2to3.2, 2to3.3,
>> etc, but someone would have to make the new versions, but no one,
>> currently, has the energy and interest to do that. So people who did not
>> port their 2.x code early now use the problem of multiple Python 3
>> targets as another excuse not to do so now. (Actually, most 2.x code
>> should not be ported, but there are more libraries that we do need in 3.x.)

The above should be taken as reporting, accurate or not, rather than
advocacy.

> I don't quite understand this. Since 2to3 is included with Python 3,
> there are, in fact, separate releases of 2to3 for each release of Python
> 3 so far.

To the best of my knowledge, 2to3 is not being adjusted on a per-release
basis. I am for doing this, but as I remember, there was some opposition
when the question was discussed on py-dev. If I am wrong, I would be
glad to be corrected.

> And, unlike with Python 2 with a large installed base across
> a number of versions, Python 3 version support can be and is much more
> focused now in its early releases. Support for 3.0 was terminated
> immediately upon release of 3.1. And 3.1 is now in security-fix mode
> only. So, except for a brief overlap after the initial release of 3.2,
> there has only been one Python 3 release that needs to be targeted. Of
> course, that will change over time as adoption continues and mainstream
> OS's include specific Python 3 releases. But, for now, it's easy: just
> target the most recent Python 3 release, currently 3.2.1. Don't worry
> about earlier releases.

That would be my attitude too. I would hope that most of the major
libraries are available for 3.2 before 3.3 is out. Then there would only
be the normal minor adjustments for code that happens to hit the new
deprecations.
-- Terry Jan Reedy
[toc] | [prev] | [next] | [standalone]
| From | Ryan Kelly <ryan@rfk.id.au> |
|---|---|
| Date | 2011-07-22 15:56 +1000 |
| Subject | Re: Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <mailman.1348.1311314191.1164.python-list@python.org> |
| In reply to | #10070 |
On Fri, 2011-07-22 at 01:45 -0400, Terry Reedy wrote:
> On 7/22/2011 12:48 AM, rantingrick wrote:
> > On Jul 21, 11:13 pm, Corey Richardson<kb1...@aim.com> wrote:
> >> Excerpts from rantingrick's message of Thu Jul 21 23:46:05 -0400 2011:
> >>
> >>> I may have found the mother of all inconsistency warts when comparing
> >>> the zipfile and tarfile modules. Not only are the API's different, but
> >>> the entry and exits are different AND zipfile/tarfile do not behave
> >>> like proper file objects should.
> >>
> >> I agree, actually.
>
> Hmm. Archives are more like directories than files. Windows, at least,
> seems to treat zipfiles more or less as such. Certainly, 7zip
> presents a directory interface. So opening a zipfile/tarfile would be
> like opening a directory, which we normally do not do. On the other
> hand, I am not sure I like python's interface to directories that much.
Indeed. Actually, I'd say that archives are more like *entire
filesystems* than either files or directories.
We have a pretty nice ZipFS implementation as part of the PyFilesystem
project:
http://packages.python.org/fs/
If anyone cares enough to whip up a TarFS implementation it would be
gratefully merged into trunk. (There may even be the start of one in
the bugtracker somewhere, I don't recall...)
Cheers,
Ryan
--
Ryan Kelly
http://www.rfk.id.au | This message is digitally signed. Please visit
ryan@rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details
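
Editor's sketch of the "archive as a filesystem" idea from the message above, using only the stdlib `zipfile` module rather than PyFilesystem's ZipFS. The helper name `list_dir` and the sample member names are hypothetical; this is an illustration under those assumptions, not how ZipFS is implemented.

```python
import io
import zipfile


def list_dir(zf, directory=""):
    """List the immediate children of `directory` inside an open ZipFile.

    Hypothetical helper: subdirectories are returned with a trailing "/".
    """
    prefix = directory.strip("/")
    prefix = prefix + "/" if prefix else ""
    children = set()
    for name in zf.namelist():
        if name.startswith(prefix) and name != prefix:
            rest = name[len(prefix):]
            head, sep, _ = rest.partition("/")
            children.add(head + sep)  # sep is "/" only for nested entries
    return sorted(children)


# Build a small archive in memory so the example needs no files on disk.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("README.txt", "top level")
    zf.writestr("src/main.py", "print('hi')")
    zf.writestr("src/pkg/util.py", "")

with zipfile.ZipFile(buf) as zf:
    root = list_dir(zf)        # entries at the top of the archive
    src = list_dir(zf, "src")  # entries inside "src/"
```

Treating member names as paths is what makes the directory or filesystem analogy workable; a file-object analogy has no natural place for this kind of listing.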
[toc] | [prev] | [next] | [standalone]
| From | Lars Gustäbel <lars@gustaebel.de> |
|---|---|
| Date | 2011-07-22 10:26 +0200 |
| Message-ID | <mailman.1354.1311323850.1164.python-list@python.org> |
| In reply to | #10064 |
On Thu, Jul 21, 2011 at 08:46:05PM -0700, rantingrick wrote:
> I may have found the mother of all inconsistency warts when comparing
> the zipfile and tarfile modules. Not only are the API's different, but
> the entry and exits are different AND zipfile/tarfile do not behave
> like proper file objects should.

There is a reason why these two APIs are different. When I wrote tarfile,
zipfile had already existed for maybe 8 years and I didn't like its
interface very much. So, I came up with a different one for tarfile that
in my opinion was more general and better suited the format and the kind
of things I wanted to do with it. In the meantime the zipfile API got a
lot of attention and some portions of tarfile's API were ported to
zipfile.

> *COMMENT*
> As you can see, the tarfile modules exports an open function and
> zipfile does not. Actually i would prefer that neither export an open
> function and instead only expose a class for instantiation.

So that is your preference.

> *COMMENT*
> Since a zipfile object is a file object then asking for the tf object
> after the file is closed should show a proper
> message!

It is no file object.

> *COMMENT*
> Tarfile is missing the attribute "fp" and instead exposes a boolean
> "closed". This mismatching API is asinine! Both tarfile and zipfile
> should behave EXACTLY like file objects

No, they don't. Because they have not much in common with file objects.
I am not sure what you are trying to prove here. And although I must
admit that you have a point overall, you seem to get the details wrong.
If tarfile and zipfile objects behave "EXACTLY" like file objects, what
does the read() method return? What does seek() do? And readline()?

What do you prove when you say that tarfile has no "fp" attribute?
You're not supposed to use the tarfile's internal file object, there is
nothing productive you could do with it.

> *COMMENT*
> As you can see, unlike tarfile zipfile cannot handle a passed path.

Hm, I don't know what you mean.

> zf.namelist() -> tf.getnames()
> zf.getinfo(name) -> tf.getmember(name)
> zf.infolist() -> tf.getmembers()
> zf.printdir() -> tf.list()
>
> *COMMENT*
> Would it have been too difficult to make these names match? Really?

As I already stated above, I didn't want to adopt the zipfile API
because I found it unsuitable. So I came up with an entirely new one. I
thought that being incompatible was better than using an API that did
not fit exactly.

> *COMMENT*
> Note the inconsistencies in naming conventions of the zipinfo methods.
>
> *COMMENT*
> Not only is modified time named different between zipinfo and tarinfo,
> they even return completely different values of time.

See above.

> It is very obvious that these modules need some consistency between
> not only themselves but also collectively. People, when emulating a
> file type always be sure to emulate the built-in python file type as
> closely as possible.

See above.

> PS: I will be posting more warts very soon. This stdlib is a gawd
> awful mess!

I do not agree. Although I come across one or two odd things myself from
time to time, I think the stdlib as a whole is great, usable and
powerful.

The stdlib surely needs our attention. Instead of answering your post, I
should have been writing code and fixing bugs ...

--
Lars Gustäbel
lars@gustaebel.de

Seek simplicity, and distrust it.
(Alfred North Whitehead)
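
Editor's sketch of the method-name mapping quoted in the message above (`namelist()`/`getnames()`, `getinfo()`/`getmember()`). The helper `member_names` and the sample member `a.txt` are hypothetical; the archives are built in memory, and the code is written for Python 3 rather than the 2.7 used in the thread.

```python
import io
import tarfile
import zipfile


def member_names(archive):
    """Return member names for either a ZipFile or a TarFile (hypothetical helper)."""
    if isinstance(archive, zipfile.ZipFile):
        return archive.namelist()
    if isinstance(archive, tarfile.TarFile):
        return archive.getnames()
    raise TypeError("expected a ZipFile or TarFile, got %r" % type(archive))


payload = b"hello"

# Zip archive holding one member.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("a.txt", payload)

# Tar archive holding the same member.
tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    info = tarfile.TarInfo(name="a.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))
tar_buf.seek(0)  # rewind so the archive can be read back

with zipfile.ZipFile(zip_buf) as zf, tarfile.open(fileobj=tar_buf) as tf:
    zip_names = member_names(zf)
    tar_names = member_names(tf)
    zip_size = zf.getinfo("a.txt").file_size  # ZipInfo attribute
    tar_size = tf.getmember("a.txt").size     # TarInfo attribute
```

A thin wrapper like this is one way to get a uniform interface without changing either stdlib module, which is close to what a separate "archive" module on PyPI would provide.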
[toc] | [prev] | [next] | [standalone]
| From | rantingrick <rantingrick@gmail.com> |
|---|---|
| Date | 2011-07-22 10:11 -0700 |
| Subject | Re: Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <65cc5bad-bf16-47c8-a6ba-8c9a588d7209@z14g2000yqh.googlegroups.com> |
| In reply to | #10087 |
On Jul 22, 3:26 am, Lars Gustäbel <l...@gustaebel.de> wrote:
> There is a reason why these two APIs are different. When I wrote tarfile
> zipfile had already been existing for maybe 8 years and I didn't like its
> interface very much. So, I came up with a different one for tarfile that in my
> opinion was more general and better suited the format and the kind of things I
> wanted to do with it. In the meantime the zipfile API got a lot of attention
> and some portions of tarfile's API were ported to zipfile.
Well i'll admit that i do like tarfile's API much better; so
kudos to you, kind sir.
> > *COMMENT*
> > As you can see, the tarfile modules exports an open function and
> > zipfile does not. Actually i would prefer that neither export an open
> > function and instead only expose a class for instantiation.
>
> So that is your preference.
Wrong! It is more than just a MERE preference. Tarfile and zipfile
are BOTH archive modules and as such should present a consistent API.
I really don't care so much about the actual details AS LONG AS THE
APIs ARE CONSISTENT!
> > *COMMENT*
> > Since a zipfile object is a file object then asking for the tf object
> > after the file is closed should show a proper
> > message!
>
> It is no file object.
Then why bother to open and close it like a file object? If we are not
going to treat it as a file object then we should not have API methods
open and close.
> > *COMMENT*
> > Tarfile is missing the attribute "fp" and instead exposes a boolean
> > "closed". This mismatching API is asinine! Both tarfile and zipfile
> > should behave EXACTLY like file objects
>
> If tarfile and zipfile
> objects behave "EXACTLY" like file objects, what does the read() method return?
> What does seek() do? And readline()?
I am not suggesting that these methods become available. What i was
referring to is the fact that the instance does not return its current
state like a true file object would. But just for academic sake we
could apply these three methods in the following manner:
* read() -> extract the entire archive.
* readline() -> extract the N'th archive member.
* seek() -> move to the N'th archive member.
Not that i think we should however.
> What do you prove when you say that tarfile has no "fp" attribute?
My point is that the APIs of tarfile and zipfile should be
consistent. "fp" is another example of inconsistency. If we are going
to have an "fp" attribute in one, we should have it in the other.
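
Editor's sketch of the state-reporting difference discussed above, written for Python 3. It assumes that `ZipFile.fp` is reset to `None` on close and that `TarFile.closed` becomes `True`; `fp` in particular is better treated as an implementation detail than as a public API.

```python
import io
import tarfile
import zipfile

# Create and close a zip archive in memory.
zip_buf = io.BytesIO()
zf = zipfile.ZipFile(zip_buf, "w")
zf.writestr("a.txt", "data")
zf.close()
zip_is_closed = zf.fp is None  # zipfile signals "closed" through fp

# Create and close an (empty) tar archive in memory.
tar_buf = io.BytesIO()
tf = tarfile.open(fileobj=tar_buf, mode="w")
tf.close()
tar_is_closed = tf.closed      # tarfile exposes a boolean instead
```

Code that must handle both archive types therefore needs two different checks, which is the inconsistency being complained about here.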
> > *COMMENT*
> > As you can see, unlike tarfile zipfile cannot handle a passed path.
>
> Hm, I don't know what you mean.
Sorry, that comment was placed in the wrong position. I also apologize
for sending the message three times; it seems my finger was a little
shaky that night. What i was referring to is that tarfile does not
allow a path to be passed to the constructor whereas zipfile does:
>>> import tarfile, zipfile
>>> tf = tarfile.TarFile('c:\\tar.tar')
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    tf = tarfile.TarFile('c:\\tar.tar')
  File "C:\Python27\lib\tarfile.py", line 1572, in __init__
    self.firstmember = self.next()
  File "C:\Python27\lib\tarfile.py", line 2335, in next
    raise ReadError(str(e))
ReadError: invalid header
>>> zf = zipfile.ZipFile('C:\\zip.zip')
>>> zf
<zipfile.ZipFile instance at 0x02C6CE18>
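
Editor's sketch related to the session above, written for Python 3. The post does not say what `c:\tar.tar` contained, so the following only shows one situation, assumed here, in which the bare `TarFile` constructor raises `ReadError` while `tarfile.open()` succeeds: a compressed archive. The file name `demo.tar.gz` is hypothetical.

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.tar.gz")

    # Write a small gzip-compressed tar archive to disk.
    with tarfile.open(path, "w:gz") as tf:
        data = b"hello"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

    # tarfile.open() detects the compression transparently.
    with tarfile.open(path) as tf:
        names_via_open = tf.getnames()

    # The TarFile constructor expects an uncompressed archive, so the
    # gzip bytes do not parse as a tar header.
    try:
        tarfile.TarFile(path)
        constructor_failed = False
    except tarfile.ReadError:
        constructor_failed = True
```

The `TarFile` constructor does accept a path as its first argument; `tarfile.open()` is simply the documented entry point because it also handles compressed archives.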
> > zf.namelist() -> tf.getnames()
> > zf.getinfo(name) -> tf.getmember(name)
> > zf.infolist() -> tf.getmembers()
> > zf.printdir() -> tf.list()
>
> > *COMMENT*
> > Would it have been too difficult to make these names match? Really?
>
> As I already stated above, I didn't want to adopt the zipfile API because I
> found it unsuitable. So I came up with an entirely new one. I thought that
> being incompatible was better than using an API that did not fit exactly.
I agree with you. Now if we can ONLY change the zipfile API to match
then we would be golden!
> > PS: I will be posting more warts very soon. This stdlib is a gawd
> > awful mess!
>
> I do not agree. Although I come across one or two odd things myself from time
> to time, I think the stdlib as a whole is great, usable and powerful.
And that's why we find ourselves in this current dilemma. This stdlib
IS a mess, and your and everyone else's denials about it are not
helping the situation.
> The stdlib surely needs our attention. Instead of answering your post, I should
> have been writing code and fixing bugs ...
Will you be starting with the zipfile API migration?
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2011-07-23 03:23 +1000 |
| Subject | Re: Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <mailman.1373.1311355427.1164.python-list@python.org> |
| In reply to | #10119 |
On Sat, Jul 23, 2011 at 3:11 AM, rantingrick <rantingrick@gmail.com> wrote:
> Wrong! It is more than just a MERE preference. Tarfile and zipfile
> are BOTH archive modules and as such should present a consistent API.
> I really don't care so much about the actual details AS LONG AS THE
> APIs ARE CONSISTENT!

Python and C++ are BOTH programming languages and as such should present
a consistent API. I really don't care so much about the actual details
<caps>as long as the APIs (standard libraries) are consistent!</caps>

Chris Angelico
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2011-07-23 03:25 +1000 |
| Subject | Re: Inconsistencies between zipfile and tarfile APIs |
| Message-ID | <mailman.1374.1311355531.1164.python-list@python.org> |
| In reply to | #10119 |
Oh, and:

On Sat, Jul 23, 2011 at 3:11 AM, rantingrick <rantingrick@gmail.com> wrote:
> Will you be starting with the zipfile API migration?
>

Will you?

Rick, quit ranting and start coding. If you want things to happen, the
best way is to do them. If you make a post on the dev list WITH A PATCH,
or submit your patch on the bug tracker, then people might start taking
you seriously.

In other words, put up or shut up.

ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Thomas Jollans <t@jollybox.de> |
|---|---|
| Date | 2011-07-22 12:31 +0200 |
| Message-ID | <mailman.1358.1311330697.1164.python-list@python.org> |
| In reply to | #10064 |
On 22/07/11 05:46, rantingrick wrote:
> PS: I will be posting more warts very soon. This stdlib is a gawd
> awful mess!

Please don't. Not here.

There's a wonderful bug tracker at python.org. Use that. That's where
this kind of thing belongs. And, please, be concise.

What's the point of shouting it out here anyway? Just fix what you think
needs fixing! Sure, you can come here to ask for comments on your new
and improved API. Sure, when you've got something presentable, come here
and show us.

But nobody needs this kind of rant, rantingrick.
[toc] | [prev] | [next] | [standalone]
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2011-07-22 06:25 -0500 |
| Message-ID | <mailman.1360.1311333956.1164.python-list@python.org> |
| In reply to | #10064 |
On 07/22/2011 03:26 AM, Lars Gustäbel wrote:
> On Thu, Jul 21, 2011 at 08:46:05PM -0700, rantingrick wrote:
>> Tarfile is missing the attribute "fp" and instead exposes a
>> boolean "closed". This mismatching API is asinine! Both
>> tarfile and zipfile should behave EXACTLY like file objects
>
> What do you prove when you say that tarfile has no "fp"
> attribute? You're not supposed to use the tarfile's internal
> file object, there is nothing productive you could do with
> it.

While I've needed access to such a fp object, it's been limited
to cases where I passed a file-like object to the constructor
instead of a path-name:

  tf = tarfile.open(fileobj=foo, ...)

so I had access to "foo" without reaching into the
tarfile/zipfile object for the internal fp. Usually this
involves using a StringIO object or a temp-file that then gets
cleaned up when complete.

-tkc
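
Editor's sketch of the pattern described above, written for Python 3, where `io.BytesIO` plays the role that StringIO did in 2.x. The member name `note.txt` and its contents are hypothetical. It assumes, as the message implies, that tarfile leaves a caller-supplied `fileobj` open when the archive is closed.

```python
import io
import tarfile

# The caller owns the underlying file-like object.
foo = io.BytesIO()

with tarfile.open(fileobj=foo, mode="w") as tf:
    payload = b"scratch data"
    info = tarfile.TarInfo(name="note.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

# The archive is closed, but "foo" is still open and usable directly,
# with no need to reach into the TarFile for an internal fp.
raw_bytes = foo.getvalue()

# Rewind and read the archive back from the same object.
foo.seek(0)
with tarfile.open(fileobj=foo) as tf:
    names = tf.getnames()
    content = tf.extractfile("note.txt").read()
```

Keeping a reference to the object passed as `fileobj` gives the same access an `fp` attribute would, without depending on either module's internals.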
[toc] | [prev] | [next] | [standalone]