| Started by | cutems93 <ms2597@cornell.edu> |
|---|---|
| First post | 2013-06-12 16:27 -0700 |
| Last post | 2013-06-13 08:52 -0400 |
| Articles | 12 on this page of 52 — 27 participants |
Version Control Software cutems93 <ms2597@cornell.edu> - 2013-06-12 16:27 -0700
Re: Version Control Software Mark Janssen <dreamingforward@gmail.com> - 2013-06-12 16:36 -0700
Re: Version Control Software Joel Goldstick <joel.goldstick@gmail.com> - 2013-06-12 19:52 -0400
Re: Version Control Software Chris Angelico <rosuav@gmail.com> - 2013-06-13 10:04 +1000
Re: Version Control Software Tim Chase <python.list@tim.thechases.com> - 2013-06-12 21:41 -0500
Re: Version Control Software Ben Finney <ben+python@benfinney.id.au> - 2013-06-13 12:30 +1000
Re: Version Control Software rusi <rustompmody@gmail.com> - 2013-06-13 04:54 -0700
Re: Version Control Software Grant Edwards <invalid@invalid.invalid> - 2013-06-13 17:06 +0000
Re: Version Control Software Chris Angelico <rosuav@gmail.com> - 2013-06-14 07:26 +1000
Re: Version Control Software Grant Edwards <invalid@invalid.invalid> - 2013-06-13 21:53 +0000
Re: Version Control Software Chris Angelico <rosuav@gmail.com> - 2013-06-14 07:59 +1000
Re: Version Control Software Zero Piraeus <schesis@gmail.com> - 2013-06-13 18:20 -0400
Re: Version Control Software Terry Reedy <tjreedy@udel.edu> - 2013-06-13 20:09 -0400
Re: Version Control Software Fábio Santos <fabiosantosart@gmail.com> - 2013-06-13 23:15 +0100
Re: Version Control Software Chris Angelico <rosuav@gmail.com> - 2013-06-14 08:17 +1000
Re: Version Control Software Benjamin Kaplan <benjamin.kaplan@case.edu> - 2013-06-13 15:24 -0700
Re: Version Control Software Neil Hodgson <nhodgson@iinet.net.au> - 2013-06-14 08:53 +1000
Re: Version Control Software Tim Chase <python.list@tim.thechases.com> - 2013-06-12 21:48 -0500
Re: Version Control Software Roy Smith <roy@panix.com> - 2013-06-12 22:51 -0400
Re: Version Control Software Rui Maciel <rui.maciel@gmail.com> - 2013-06-13 13:43 +0100
Re: Version Control Software cutems93 <ms2597@cornell.edu> - 2013-06-12 23:00 -0700
Re: Version Control Software rusi <rustompmody@gmail.com> - 2013-06-12 23:43 -0700
Re: Version Control Software Roy Smith <roy@panix.com> - 2013-06-13 07:08 -0400
Re: Version Control Software MRAB <python@mrabarnett.plus.com> - 2013-06-13 12:26 +0100
Re: Version Control Software rusi <rustompmody@gmail.com> - 2013-06-13 04:46 -0700
Re: Version Control Software Anssi Saari <as@sci.fi> - 2013-06-14 15:06 +0300
Re: Version Control Software Roy Smith <roy@panix.com> - 2013-06-14 08:32 -0400
Re: Version Control Software Grant Edwards <invalid@invalid.invalid> - 2013-06-14 14:24 +0000
Re: Version Control Software Dave Angel <davea@davea.name> - 2013-06-14 16:55 -0400
Re: Version Control Software Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-06-14 20:26 -0400
Re: Version Control Software Tim Delaney <timothy.c.delaney@gmail.com> - 2013-06-15 15:39 +1000
Re: Version Control Software Chris Angelico <rosuav@gmail.com> - 2013-06-15 15:53 +1000
Re: Version Control Software Roy Smith <roy@panix.com> - 2013-06-15 10:16 -0400
Re: Version Control Software Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> - 2013-06-15 15:29 +0000
Re: Version Control Software Dan Sommers <dan@tombstonezero.net> - 2013-06-15 18:29 +0000
Re: Version Control Software Chris Angelico <rosuav@gmail.com> - 2013-06-16 09:01 +1000
Re: Version Control Software Tim Delaney <timothy.c.delaney@gmail.com> - 2013-06-16 07:49 +1000
Re: Version Control Software Chris Angelico <rosuav@gmail.com> - 2013-06-16 09:14 +1000
Re: Version Control Software rusi <rustompmody@gmail.com> - 2013-06-15 20:55 -0700
Re: Version Control Software Chris Angelico <rosuav@gmail.com> - 2013-06-16 14:13 +1000
Re: Version Control Software Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-16 05:20 +0000
Re: Version Control Software Chris Angelico <rosuav@gmail.com> - 2013-06-16 15:29 +1000
Re: Version Control Software Terry Reedy <tjreedy@udel.edu> - 2013-06-16 05:15 -0400
Re: Version Control Software Chris Angelico <rosuav@gmail.com> - 2013-06-16 19:51 +1000
Re: Version Control Software Chris “Kwpolska” Warrick <kwpolska@gmail.com> - 2013-06-16 15:30 +0200
Re: Version Control Software Roy Smith <roy@panix.com> - 2013-06-16 09:50 -0400
Re: Version Control Software Lele Gaifax <lele@metapensiero.it> - 2013-06-16 17:48 +0200
Re: Version Control Software Terry Reedy <tjreedy@udel.edu> - 2013-06-16 13:02 -0400
Re: Version Control Software Jason Swails <jason.swails@gmail.com> - 2013-06-16 12:39 -0400
Re: Version Control Software Serhiy Storchaka <storchaka@gmail.com> - 2013-06-13 10:20 +0300
Re: Version Control Software Tim Chase <python.list@tim.thechases.com> - 2013-06-13 07:34 -0500
Re: Version Control Software Roy Smith <roy@panix.com> - 2013-06-13 08:52 -0400
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-06-16 05:20 +0000 |
| Message-ID | <51bd4b1c$0$29966$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #48409 |
On Sun, 16 Jun 2013 14:13:13 +1000, Chris Angelico wrote:
> I didn't think there would be that much difference, tbh. Mainly, I'm
> just seeing cpython as not being 200MB of history, or so I'd thought.
> Pike has ~30K commits (based on 'git log --oneline|wc -l'); CPython has
> roughly 80K (based on 'hg log|grep changeset|wc -l' - there's likely an
> easier way but I don't know Mercurial). So yeah, okay, it's been doing
> more. But I still don't see 200MB in that. Seems a lot of content.

If you're bringing in the *entire* CPython code base, as shown here:

http://hg.python.org/

keep in mind that it includes the equivalent of four independent
implementations:

- CPython 2.x
- CPython 3.x
- Stackless
- Jython

plus various other bits and pieces.

Plus, no offence intended at Pike which I'm sure is an awesome language,
but it may not be quite as much active development as Python... as you
point out yourself, there are nearly three times as many commits to
CPython as to Pike, which coincidentally (or not) corresponds to the
CPython repo being nearly three times as large as the Pike repo.

--
Steven
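As a rough sanity check on the "nearly three times" remark, the two ratios can be computed directly. This is only a sketch: the commit counts are the approximate figures quoted above, the repository sizes (200 MB for ~/cpython/.hg, 86 MB for ~/pike/.git) are the ones Chris Angelico reported elsewhere in this thread, and the variable names are made up for illustration.

```python
# Approximate figures quoted in this thread; these are not fresh measurements.
pike_commits = 30_000      # from 'git log --oneline | wc -l'
cpython_commits = 80_000   # from 'hg log | grep changeset | wc -l'
pike_repo_mb = 86          # reported size of ~/pike/.git
cpython_repo_mb = 200      # reported size of ~/cpython/.hg ("200MB+")

# How many times more commits, and how many times more bytes, CPython has.
commit_ratio = cpython_commits / pike_commits
size_ratio = cpython_repo_mb / pike_repo_mb

print(f"commits: {commit_ratio:.2f}x, size: {size_ratio:.2f}x")
```

With these inputs both ratios land between two and three, which is consistent with the rough correspondence described above, though the inputs are too approximate to say more than that.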
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-06-16 15:29 +1000 |
| Message-ID | <mailman.3427.1371360584.3114.python-list@python.org> |
| In reply to | #48412 |
On Sun, Jun 16, 2013 at 3:20 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Sun, 16 Jun 2013 14:13:13 +1000, Chris Angelico wrote:
>
>> I didn't think there would be that much difference, tbh. Mainly, I'm
>> just seeing cpython as not being 200MB of history, or so I'd thought.
>> Pike has ~30K commits (based on 'git log --oneline|wc -l'); CPython has
>> roughly 80K (based on 'hg log|grep changeset|wc -l' - there's likely an
>> easier way but I don't know Mercurial). So yeah, okay, it's been doing
>> more. But I still don't see 200MB in that. Seems a lot of content.
>
> If you're bringing in the *entire* CPython code base, as shown here:
>
> http://hg.python.org/
>
> keep in mind that it includes the equivalent of four independent
> implementations:
>
> - CPython 2.x
> - CPython 3.x
> - Stackless
> - Jython

Hrm. Why are there other Pythons in the cpython repository? Yes,
CPython 2.x and 3.x, but why the other two?

> Plus, no offence intended at Pike which I'm sure is an awesome language,
> but it may not be quite as much active development as Python... as you
> point out yourself, there are nearly three times as many commits to
> CPython as to Pike, which coincidentally (or not) corresponds to the
> CPython repo being nearly three times as large as the Pike repo.

Yeah. Actually, I suspect that what's going on here, and what led to my
confusion, is that Pike wasn't always done using git, so quite a few of
the earlier versions simply aren't here. So it's an error in my
perceptions rather than any real difference.

However, comparisons aside, 200MB is still a fair bit to fetch before
doing anything with Python. Does Mercurial have any equivalent of git's
shallow clone feature?

ChrisA
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-06-16 05:15 -0400 |
| Message-ID | <mailman.3430.1371374156.3114.python-list@python.org> |
| In reply to | #48412 |
On 6/16/2013 1:29 AM, Chris Angelico wrote:
> On Sun, Jun 16, 2013 at 3:20 PM, Steven D'Aprano
>> If you're bringing in the *entire* CPython code base, as shown here:
>>
>> http://hg.python.org/

This is the python.org collection of repositories, not just cpython.

>> keep in mind that it includes the equivalent of four independent
>> implementations:
>>
>> - CPython 2.x
>> - CPython 3.x
>> - Stackless
>> - Jython
>
> Hrm. Why are there other Pythons in the cpython repository?

There are not. The cpython repository
http://hg.python.org/cpython/
only contains cpython. As I write, the last revision is 84110. Windows
says that my cpython clone has about 1400 folders, 15000 files, and 500
million bytes.

--
Terry Jan Reedy
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-06-16 19:51 +1000 |
| Message-ID | <mailman.3431.1371376315.3114.python-list@python.org> |
| In reply to | #48412 |
On Sun, Jun 16, 2013 at 7:15 PM, Terry Reedy <tjreedy@udel.edu> wrote:
> On 6/16/2013 1:29 AM, Chris Angelico wrote:
>>
>> On Sun, Jun 16, 2013 at 3:20 PM, Steven D'Aprano
>>> keep in mind that it includes the equivalent of four independent
>>> implementations:
>>>
>>> - CPython 2.x
>>> - CPython 3.x
>>> - Stackless
>>> - Jython
>>
>> Hrm. Why are there other Pythons in the cpython repository?
>
> There are not. The cpython repository
> http://hg.python.org/cpython/
> only contains cpython. As I write, the last revision is 84110. Windows says
> that my cpython clone has about 1400 folders, 15000 files, and 500 million
> bytes

Ah, well it's this one that I have. So it should have only CPython in it.

ChrisA
| From | Chris “Kwpolska” Warrick <kwpolska@gmail.com> |
|---|---|
| Date | 2013-06-16 15:30 +0200 |
| Message-ID | <mailman.3442.1371389433.3114.python-list@python.org> |
| In reply to | #48298 |
On Sun, Jun 16, 2013 at 1:14 AM, Chris Angelico <rosuav@gmail.com> wrote:
> Hmm. ~/cpython/.hg is 200MB+, but ~/pike/.git is only 86MB. Does
> Mercurial compress its content? A tar.gz of each comes down, but only
> to ~170MB and ~75MB respectively, so I'm guessing the bulk of it is
> already compressed. But 200MB for cpython seems like a lot.

Next time, do a more fair comparison.

I created an empty git and hg repository, and created a file promptly
named “file” with DIGIT ONE (0x31; UTF-8/ASCII–encoded) and committed
it with “c1” as the message, then I turned it into “12” and committed
as “c2” and did this one more time, making the file “123” at commit
named “c3”.

[kwpolska@kwpolska-lin .hg@default]% cat * */* */*/* 2>/dev/null | wc -c
1481
[kwpolska@kwpolska-lin .git@master]% cat * */* */*/* */*/*/* 2>/dev/null | wc -c
16860 ← WRONG!

There is just one problem with this: an empty git repository starts at
15216 bytes, due to some sample hooks. Let’s remove them and try
again:

[kwpolska@kwpolska-lin .git@master]% rm hooks/*
[kwpolska@kwpolska-lin .git@master]% cat * */* */*/* */*/*/* */*/*/* 2>/dev/null | wc -c
2499

which is a much more sane number. This includes a config file (in the
ini/configparser format) and such. According to my maths skills (or
rather zsh’s skills), new commits are responsible for 1644 bytes in
the git repo and 1391 bytes in the hg repo.

(I’m using wc -c to count the bytes in all files there are. du is
inaccurate with files smaller than 4096 bytes.)

--
Kwpolska <http://kwpolska.tk> | GPG KEY: 5EAAEA16
stop html mail | always bottom-post
http://asciiribbon.org | http://caliburn.nl/topposting.html
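For anyone who wants to repeat the measurement above without spelling out one glob pattern per directory level, here is a minimal Python sketch of the same idea: add up the byte length of every regular file under a directory, as `cat ... | wc -c` does. The function name `total_bytes` is made up for this example, and unlike the shell globs it descends to any depth and also counts dot-files, so its totals will not necessarily match the numbers above.

```python
import os


def total_bytes(root):
    """Sum the byte length of every regular file under root, at any depth."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Calling `total_bytes(".git")` or `total_bytes(".hg")` from the top of a working copy would give a byte count comparable in spirit to the `wc -c` figures.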
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2013-06-16 09:50 -0400 |
| Message-ID | <roy-47D4D9.09501916062013@news.panix.com> |
| In reply to | #48449 |
In article <mailman.3442.1371389433.3114.python-list@python.org>,
Chris “Kwpolska” Warrick <kwpolska@gmail.com> wrote:
> (I’m using wc -c to count the bytes in all files there are. du is
> inaccurate with files smaller than 4096 bytes.)
It's not that du is not accurate, it's that it's measuring something
different. It's measuring how much disk space the file is using. For
most files, that's the number of characters in the file rounded up to a
full block. For large files, I believe it also includes the overhead of
indirect blocks or extent trees. And, finally, for sparse files, it
takes into account that some logical blocks in the file may not be
mapped to any physical storage.
So, whether you want to use "du" or "wc -c" depends on what you're
trying to measure. If you want to know how much disk space you're
using, du is the right tool. If you want to know how much data will be
transmitted if the file is serialized (i.e. packed in a tarball or sent
via a "{hg,git} clone" operation), then "wc -c" is what you want.
All that being said, for the vast majority of cases (and I would be
astonished if this was not true for any real-life vcs repo), the
difference between what wc and du tell you is not worth worrying about.
And du is going to be a heck of a lot faster.
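The distinction drawn above can be seen from Python as well. The sketch below is illustrative only: the function name `sizes` is invented, and it assumes a POSIX-like platform where `os.stat` exposes `st_blocks` counted in 512-byte units (the usual convention on Linux, but not guaranteed everywhere, and the attribute is absent on Windows).

```python
import os


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for one file.

    apparent_bytes is what 'wc -c' reports; allocated_bytes approximates
    what 'du' reports. allocated_bytes is None where st_blocks is missing.
    """
    st = os.stat(path)
    apparent = st.st_size
    blocks = getattr(st, "st_blocks", None)  # not available on Windows
    # Assumption: st_blocks is in 512-byte units, as on Linux.
    allocated = blocks * 512 if blocks is not None else None
    return apparent, allocated
```

For a small file the second number is typically a whole block even though the first is only a few bytes; for a sparse file the second can be smaller than the first.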
| From | Lele Gaifax <lele@metapensiero.it> |
|---|---|
| Date | 2013-06-16 17:48 +0200 |
| Message-ID | <mailman.3446.1371397725.3114.python-list@python.org> |
| In reply to | #48453 |
Roy Smith <roy@panix.com> writes:
> In article <mailman.3442.1371389433.3114.python-list@python.org>,
> Chris Kwpolska Warrick <kwpolska@gmail.com> wrote:
>
>> (I’m using wc -c to count the bytes in all files there are. du is
>> inaccurate with files smaller than 4096 bytes.)
>
> It's not that du is not accurate, it's that it's measuring something
> different. It's measuring how much disk space the file is using. For
> most files, that's the number of characters in the file rounded up to a
> full block.

I think “du -c” emits a number very close to “wc -c”.
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-06-16 13:02 -0400 |
| Message-ID | <mailman.3448.1371402178.3114.python-list@python.org> |
| In reply to | #48453 |
On 6/16/2013 11:48 AM, Lele Gaifax wrote:
> Roy Smith <roy@panix.com> writes:
>
>> In article <mailman.3442.1371389433.3114.python-list@python.org>,
>> Chris Kwpolska Warrick <kwpolska@gmail.com> wrote:
>>
>>> (I’m using wc -c to count the bytes in all files there are. du is
>>> inaccurate with files smaller than 4096 bytes.)
>>
>> It's not that du is not accurate, it's that it's measuring something
>> different. It's measuring how much disk space the file is using. For
>> most files, that's the number of characters in the file rounded up to a
>> full block.
>
> I think “du -c” emits a number very close to “wc -c”.

In Windows Explorer, the Properties box displays both the Size and
'Size on disk', in both (KB or MB) and bytes. The block size for the
disk I am looking at is 4KB, so the Size on disk in KB is a multiple of
that.

--
Terry Jan Reedy
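The relationship between the two numbers for an ordinary file is just rounding up to a whole number of blocks. A minimal sketch, assuming the 4 KB block size mentioned above; the function name `size_on_disk` is made up, and the model deliberately ignores special cases such as sparse or compressed files, or very small files that some filesystems store inline with their metadata.

```python
def size_on_disk(size_bytes, block_size=4096):
    """Round a file size up to a whole number of blocks."""
    if size_bytes < 0 or block_size <= 0:
        raise ValueError("size must be >= 0 and block_size must be > 0")
    blocks = -(-size_bytes // block_size)  # ceiling division on integers
    return blocks * block_size


# A 1-byte file and a 4096-byte file both occupy one block in this model;
# one extra byte spills into a second block.
for size in (1, 4096, 4097):
    print(size, size_on_disk(size))
```

Under this model the per-file gap between "Size" and "Size on disk" is always less than one block, which is why it only adds up to something noticeable for directories holding a great many small files.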
| From | Jason Swails <jason.swails@gmail.com> |
|---|---|
| Date | 2013-06-16 12:39 -0400 |
| Message-ID | <mailman.3447.1371400774.3114.python-list@python.org> |
| In reply to | #48298 |
On Sun, Jun 16, 2013 at 9:30 AM, Chris “Kwpolska” Warrick <
kwpolska@gmail.com> wrote:
> On Sun, Jun 16, 2013 at 1:14 AM, Chris Angelico <rosuav@gmail.com> wrote:
> > Hmm. ~/cpython/.hg is 200MB+, but ~/pike/.git is only 86MB. Does
> > Mercurial compress its content? A tar.gz of each comes down, but only
> > to ~170MB and ~75MB respectively, so I'm guessing the bulk of it is
> > already compressed. But 200MB for cpython seems like a lot.
>
> Next time, do a more fair comparison.
>
> I created an empty git and hg repository, and created a file promptly
> named “file” with DIGIT ONE (0x31; UTF-8/ASCII–encoded) and committed
> it with “c1” as the message, then I turned it into “12” and committed
> as “c2” and did this one more time, making the file “123” at commit
> named “c3”.
>
> [kwpolska@kwpolska-lin .hg@default]% cat * */* */*/* 2>/dev/null | wc -c
> 1481
> [kwpolska@kwpolska-lin .git@master]% cat * */* */*/* */*/*/* 2>/dev/null
> | wc -c
> 16860 ← WRONG!
>
> There is just one problem with this: an empty git repository starts at
> 15216 bytes, due to some sample hooks. Let’s remove them and try
> again:
>
> [kwpolska@kwpolska-lin .git@master]% rm hooks/*
> [kwpolska@kwpolska-lin .git@master]% cat * */* */*/* */*/*/* */*/*/*
> 2>/dev/null | wc -c
> 2499
>
> which is a much more sane number. This includes a config file (in the
> ini/configparser format) and such. According to my maths skills (or
> rather zsh’s skills), new commits are responsible for 1644 bytes in
> the git repo and 1391 bytes in the hg repo.
>
This is not a fair comparison, either. If we want to do a fair comparison
pertinent to this discussion, let's convert the cpython mercurial
repository into a git repository and allow the git repo to repack the diffs
the way it deems fit.
I'm using the git-remote-hg.py script [
https://github.com/felipec/git/blob/fc/master/contrib/remote-helpers/git-remote-hg.py]
to clone a mercurial repo into a native git repo. Then, in one of the rare
cases, using git gc --aggressive. [1]
The result:
Git:
cpython_git/.git $ du -h --max-depth=1
40K ./hooks
145M ./objects
20K ./logs
24K ./refs
24K ./info
146M .
Mercurial:
cpython/.hg $ du -h --max-depth=1
209M ./store
20K ./cache
209M .
And to help illustrate the equivalence of the two repositories:
Git:
cpython_git $ git log | head; git log | tail
commit 78f82bde04f8b3832f3cb6725c4bd9c8d705d13b
Author: Brett Cannon <brett@python.org>
Date: Sat Jun 15 23:24:11 2013 -0400
Make test_builtin work when executed directly
commit a7b16f8188a16905bbc1d49fe6fd940078dd1f3d
Merge: 346494a af14b7c
Author: Gregory P. Smith <greg@krypto.org>
Date: Sat Jun 15 18:14:56 2013 -0700
Author: Guido van Rossum <guido@python.org>
Date: Mon Sep 10 11:15:23 1990 +0000
Warning about incompleteness.
commit b5e5004ae8f54d7d5ddfa0688fc8385cafde0e63
Author: Guido van Rossum <guido@python.org>
Date: Thu Aug 9 14:25:15 1990 +0000
Initial revision
Mercurial:
cpython $ hg log | head; hg log | tail
changeset: 84163:5b90da280515
bookmark: master
tag: tip
user: Brett Cannon <brett@python.org>
date: Sat Jun 15 23:24:11 2013 -0400
summary: Make test_builtin work when executed directly
changeset: 84162:7dee56b6ff34
parent: 84159:5e8b377942f7
parent: 84161:7e06a99bb821
user: Guido van Rossum <guido@python.org>
date: Mon Sep 10 11:15:23 1990 +0000
summary: Warning about incompleteness.
changeset: 0:3cd033e6b530
branch: legacy-trunk
user: Guido van Rossum <guido@python.org>
date: Thu Aug 09 14:25:15 1990 +0000
summary: Initial revision
They both appear to have the same history. In this particular case, it
seems that git does a better job in terms of space management, probably due
to the fact that it doesn't store duplicate copies of identical source code
that appears in different files (it tracks content, not files).
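To illustrate what "tracks content, not files" means, here is a small sketch of how git derives the object id of a file's contents: SHA-1 over a short header followed by the bytes. The helper name `git_blob_id` is made up for this example. It shows only why identical content maps to a single stored object; it says nothing about pack files or delta compression, and whether this actually explains the size difference measured above is a guess.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two files with identical bytes get the same id regardless of their names,
# so the object store only needs to hold that content once.
same_a = git_blob_id(b"print('hello')\n")
same_b = git_blob_id(b"print('hello')\n")
different = git_blob_id(b"print('goodbye')\n")

print(same_a == same_b)      # identical content, identical id
print(same_a == different)   # different content, different id
```

The id depends only on the bytes, never on the path, which is the property being described.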
That being said, from what I've read both git and mercurial have their
advantages, both in the performance arena and the features/usability arena
(I only know how to really use git). I'd certainly take a DVCS over a
centralized model any day.
All the best,
Jason
[1] I know I just posted in this thread about --aggressive being bad, but
the packing from the translation was horrible --> the translated git repo
was ~2 GB in size. An `aggressive' repacking was necessary to allow git to
decide how to pack the diffs.
| From | Serhiy Storchaka <storchaka@gmail.com> |
|---|---|
| Date | 2013-06-13 10:20 +0300 |
| Message-ID | <mailman.3168.1371108061.3114.python-list@python.org> |
| In reply to | #47857 |
13.06.13 05:41, Tim Chase wrote:
> -hg: last I checked, can't do octopus merges (merges with more than
> two parents)
>
> +git: can do octopus merges

Actually it is possible in Mercurial. I have just made a merge of two
files in the CPython test suite (http://bugs.python.org/issue18048).
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2013-06-13 07:34 -0500 |
| Message-ID | <mailman.3185.1371126784.3114.python-list@python.org> |
| In reply to | #47857 |
On 2013-06-13 10:20, Serhiy Storchaka wrote:
> 13.06.13 05:41, Tim Chase wrote:
> > -hg: last I checked, can't do octopus merges (merges with more
> > than two parents)
> >
> > +git: can do octopus merges
>
> Actually it is possible in Mercurial.

Okay, then that moots this pro/con pair. I seem to recall that at one
point in history, Mercurial required you to do pairwise merges rather
than letting you merge multiple branches in one pass.

-tkc
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2013-06-13 08:52 -0400 |
| Message-ID | <roy-EFB9F5.08521813062013@news.panix.com> |
| In reply to | #47947 |
In article <mailman.3185.1371126784.3114.python-list@python.org>,
Tim Chase <python.list@tim.thechases.com> wrote:
> On 2013-06-13 10:20, Serhiy Storchaka wrote:
> > 13.06.13 05:41, Tim Chase wrote:
> > > -hg: last I checked, can't do octopus merges (merges with more
> > > than two parents)
> > >
> > > +git: can do octopus merges
> >
> > Actually it is possible in Mercurial.
>
> Okay, then that moots this pro/con pair. I seem to recall that at
> one point in history, Mercurial required you to do pairwise merges
> rather than letting you merge multiple branches in one pass.
>
> -tkc

So, I guess the next question is, why would you *want* to merge
multiple branches in one pass? What's the use case? I've been using
VCSs for a long time (I've used RCS, CVS, ClearCase, SVN (briefly),
Perforce, Git, and hg). I can't ever remember a time when I've wanted
to do such a thing. Maybe it's the kind of thing that makes sense on a
huge distributed project with hundreds of people committing patches
willy-nilly?

How would hg even represent such a multi-way merge? Doesn't every
revision have exactly one or two parents?
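To make the last question concrete, here is a toy model of a history in which no revision has more than two parents, with a multi-way merge expressed as a chain of pairwise merges. This is purely an illustration: `Changeset` and `merge_pairwise` are invented names, not Mercurial's internal data structures, and the sketch makes no claim about how the merge mentioned earlier in the thread was actually performed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Changeset:
    """A revision with at most two parents (toy model only)."""
    name: str
    p1: Optional["Changeset"] = None
    p2: Optional["Changeset"] = None


def merge_pairwise(heads):
    """Combine several heads into one using only two-parent merges."""
    if not heads:
        raise ValueError("need at least one head")
    result = heads[0]
    for i, other in enumerate(heads[1:], start=1):
        # Each step merges the running result with exactly one more head.
        result = Changeset(f"merge{i}", p1=result, p2=other)
    return result


root = Changeset("root")
heads = [Changeset(n, p1=root) for n in ("a", "b", "c")]
tip = merge_pairwise(heads)
print(tip.name, tip.p1.name, tip.p2.name)
```

In this model, joining three heads takes two merge revisions, where an octopus merge would record a single revision with three parents.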