From: Jason Swails
To: Chris “Kwpolska” Warrick
Cc: python list
Newsgroups: comp.lang.python
Subject: Re: Version Control Software
Date: Sun, 16 Jun 2013 12:39:30 -0400



On Sun, Jun 16, 2013 at 9:30 AM, Chris “Kwpolska” Warrick <kwpolska@gmail.com> wrote:
> On Sun, Jun 16, 2013 at 1:14 AM, Chris Angelico <rosuav@gmail.com> wrote:
> > Hmm. ~/cpython/.hg is 200MB+, but ~/pike/.git is only 86MB. Does
> > Mercurial compress its content? A tar.gz of each comes down, but only
> > to ~170MB and ~75MB respectively, so I'm guessing the bulk of it is
> > already compressed. But 200MB for cpython seems like a lot.
>
> Next time, do a more fair comparison.
>
> I created an empty git and hg repository, and created a file promptly
> named “file” with DIGIT ONE (0x31; UTF-8/ASCII–encoded) and committed
> it with “c1” as the message, then I turned it into “12” and committed
> as “c2” and did this one more time, making the file “123” at commit
> named “c3”.
>
> [kwpolska@kwpolska-lin .hg@default]% cat * */* */*/* 2>/dev/null | wc -c
> 1481
> [kwpolska@kwpolska-lin .git@master]% cat * */* */*/* */*/*/* 2>/dev/null | wc -c
> 16860 ← WRONG!
>
> There is just one problem with this: an empty git repository starts at
> 15216 bytes, due to some sample hooks. Let’s remove them and try
> again:
>
> [kwpolska@kwpolska-lin .git@master]% rm hooks/*
> [kwpolska@kwpolska-lin .git@master]% cat * */* */*/* */*/*/* */*/*/* 2>/dev/null | wc -c
> 2499
>
> which is a much more sane number. This includes a config file (in the
> ini/configparser format) and such. According to my maths skills (or
> rather zsh’s skills), new commits are responsible for 1644 bytes in
> the git repo and 1391 bytes in the hg repo.

This is not a fair comparison, either. If we want to do a fair comparison pertinent to this discussion, let's convert the cpython mercurial repository into a git repository and allow the git repo to repack the diffs the way it deems fit.

I used the git-remote-hg.py script [https://github.com/felipec/git/blob/fc/master/contrib/remote-helpers/git-remote-hg.py] to clone the mercurial repo into a native git repo. Then, in one of the rare cases where it is warranted, I ran git gc --aggressive. [1]
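
For anyone who wants to repeat the conversion, here is a minimal sketch of those two steps as a small Python helper. It assumes the git-remote-hg script is installed on PATH under the name `git-remote-hg`, which is what lets git accept `hg::` URLs, and that git is recent enough to support `-C`. The function names and the paths in the usage note are hypothetical, not the exact commands run for this post.

```python
import subprocess


def conversion_commands(hg_repo, git_dir):
    """Build the two command lines: clone through the hg:: helper, then repack.

    Assumption: the git-remote-hg script is on PATH as `git-remote-hg`,
    so git knows how to handle the `hg::` URL prefix.
    """
    return [
        ["git", "clone", "hg::" + hg_repo, git_dir],
        ["git", "-C", git_dir, "gc", "--aggressive"],
    ]


def convert(hg_repo, git_dir):
    """Run the commands in order, stopping at the first failure."""
    for cmd in conversion_commands(hg_repo, git_dir):
        subprocess.run(cmd, check=True)
```

Calling `convert("cpython", "cpython_git")` from the directory that holds the Mercurial clone would then produce the git copy; expect the aggressive repack to take a while on a history this long.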

The result:

Git:
cpython_git/.git $ du -h --max-depth=1
40K     ./hooks
145M    ./objects
20K     ./logs
24K     ./refs
24K     ./info
146M    .

Mercurial:
cpython/.hg $ du -h --max-depth=1
209M    ./store
20K     ./cache
209M    .
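
If du is not available, roughly the same breakdown can be had from Python. This is only a sketch: the function names are made up for illustration, and it sums apparent file sizes rather than allocated disk blocks, so the figures will not match du exactly.

```python
import os


def tree_size(path):
    """Total apparent size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so a link's target is not counted.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def sizes_by_child(path):
    """Map each immediate subdirectory of path to its total file size."""
    result = {}
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            result[entry.name] = tree_size(entry.path)
    return result
```

Unlike the `cat * */* */*/*` approach quoted earlier, `os.walk` does not depend on guessing the right glob depth and does not skip dotfiles.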


And to help illustrate the equivalence of the two repositories:

Git:

cpython_git $ git log | head; git log | tail

commit 78f82bde04f8b3832f3cb6725c4bd9c8d705d13b
Author: Brett Cannon <brett@python.org>
Date:   Sat Jun 15 23:24:11 2013 -0400

    Make test_builtin work when executed directly

commit a7b16f8188a16905bbc1d49fe6fd940078dd1f3d
Merge: 346494a af14b7c
Author: Gregory P. Smith <greg@krypto.org>
Date:   Sat Jun 15 18:14:56 2013 -0700
Author: Guido van Rossum <guido@python.org>
Date:   Mon Sep 10 11:15:23 1990 +0000

    Warning about incompleteness.

commit b5e5004ae8f54d7d5ddfa0688fc8385cafde0e63
Author: Guido van Rossum <guido@python.org>
Date:   Thu Aug 9 14:25:15 1990 +0000

    Initial revision

Mercurial:

cpython $ hg log | head; hg log | tail

changeset:   84163:5b90da280515
bookmark:    master
tag:         tip
user:        Brett Cannon <brett@python.org>
date:        Sat Jun 15 23:24:11 2013 -0400
summary:     Make test_builtin work when executed directly

changeset:   84162:7dee56b6ff34
parent:      84159:5e8b377942f7
parent:      84161:7e06a99bb821
user:        Guido van Rossum <guido@python.org>
date:        Mon Sep 10 11:15:23 1990 +0000
summary:     Warning about incompleteness.

changeset:   0:3cd033e6b530
branch:      legacy-trunk
user:        Guido van Rossum <guido@python.org>
date:        Thu Aug 09 14:25:15 1990 +0000
summary:     Initial revision

They both appear to have the same history. In this particular case, it seems that git does a better job in terms of space management (146M versus 209M, roughly 30% smaller), probably due to the fact that it doesn't store duplicate copies of identical source code that appears in different files (it tracks content, not files).
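
To make "tracks content, not files" concrete, here is a small sketch of how git names a file's contents: the object ID is a SHA-1 over a short header plus the bytes, so the path plays no part in it. The `tree` dictionary and its file names are invented for illustration. It shows only the deduplication idea, and it is not evidence that this alone explains the size difference above.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to a blob: SHA-1 over 'blob <size>\\0' plus the bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


source = b"print('hello')\n"

# Two different (hypothetical) paths holding identical bytes get the same
# object ID, so a content-addressed store keeps a single copy for both.
tree = {
    "a/util.py": git_blob_id(source),
    "b/util.py": git_blob_id(source),
}
unique_objects = set(tree.values())
```

Here `unique_objects` ends up with a single entry even though the tree lists two files.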

That being said, from what I've read, both git and mercurial have their advantages, both in the performance arena and in the features/usability arena (I only really know how to use git). I'd certainly take a DVCS over a centralized model any day.

All the best,
Jason

[1] I know I just posted in this thread about --aggressive being bad, but the packing from the translation was horrible --> the translated git repo was ~2 GB in size. An `aggressive' repacking was necessary to allow git to decide how to pack the diffs.