Date: Thu, 14 Apr 2011 19:15:05 +1000
Subject: Re: [OT] Free software versus software idea patents
From: Chris Angelico
To: python-list@python.org
Newsgroups: comp.lang.python

On Thu, Apr 14, 2011 at 4:04 PM, harrismh777 wrote:
>
>    How many web crawlers have you built? Are there any web programmers out
> there who need a web bot to hit multiple sites zillions of times a month
> from different places on earth to 'up' the number of hits for economic
> reasons?

I've seen my share of this. A well-behaved spider will (a) send a UA
that identifies it as a bot, preferably by name (e.g. "GoogleBot"; some
even go so far as to include a URL for more info), and (b) start by
fetching /robots.txt before going any further. Servers can recognize
properly-built crawlers.

And improperly-built crawlers, deliberately trying to hammer a server
to lie about browser stats? Seriously, do you think people actually
care THAT much?

>    How many times have you altered the identity of your web browser so that
> the web site would 'work'? You know, stupid messages from the server that
> say, "We only support IE 6+, upgrade your browser...", so you tell it
> you're using IE 6 and, well no problem.

Yep. Which means that the figures will always be skewed toward IE a bit.
But it's a lot less than you might think; most people don't leave UA
switchers active all the time, and the number of web sites that require
them is dropping. It's true that UA switching will tip toward IE (I've
never seen a site where you have to pretend to be Google Chrome), but
its prevalence is, I believe, not all that high.

>    Web site data is bogus. It assumes even distributions... it assumes even
> usage of the site from all surfers, it assumes no web crawlers and no bots,
> it assumes no browser identity tampering, and it assumes that there aren't
> those who for economic reasons are not inflating the numbers deliberately
> (no, really??) from world-owned bot farms.

Even distributions of what?

1) It assumes nothing; it merely gives data. About one site. That's why
overall "browser marketshare" stats have to be done by averaging
multiple sites.

2) Web crawlers - see above. If you've ever looked at AWStats or
Webalizer or *insert stats engine here*, you'll have seen that it will
identify them. AWStats goes a bit further and will identify "viewed
traffic" and "not viewed traffic" even if it's unable to identify the
specific bot.

3) Yes, it assumes no UA switchers, obviously. It's just based on
headers. But I reckon you could easily identify someone who's using a
switcher, based on other headers - for instance, I doubt very much that
IE6 will send "Accept-Encoding: gzip,deflate,sdch" (which my Chrome
does).

4) Assumes people aren't deliberately fiddling the figures. Yeah, that
would be correct. We're in the realm of conspiracy theories here...
does anyone seriously think that browser stats are THAT important that
they'd go to multiple web servers with deceitful hits? Not forgetting
that they'd have to mix up the IPs, make plausible "browsing sessions"
(with referers and image retrieval and so on), vary the dates and
times, and so on... and generate enough hits to make a reasonable dent
in the figures.
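
To make point 3 a bit more concrete, here is a rough sketch of that
kind of header cross-check. It is only a heuristic under stated
assumptions: the function names are made up for illustration, the only
signal it uses is the "sdch" token mentioned above, and it assumes that
a genuine IE 6 never advertises that encoding. A real stats engine
would want more signals than this.

```python
def claims_ie6(user_agent: str) -> bool:
    """True if the User-Agent string presents itself as IE 6."""
    return "MSIE 6." in user_agent


def looks_like_switched_ua(headers: dict) -> bool:
    """Heuristic: the request claims IE 6 but also advertises 'sdch'.

    Assumption (not a verified fact about every IE 6 build): a real
    IE 6 would not list 'sdch' in Accept-Encoding, so the combination
    suggests another browser with a UA switcher turned on.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    user_agent = lowered.get("user-agent", "")
    # Split "gzip,deflate,sdch" into tokens, dropping any ";q=..." part.
    encodings = {
        token.split(";")[0].strip().lower()
        for token in lowered.get("accept-encoding", "").split(",")
    }
    return claims_ie6(user_agent) and "sdch" in encodings


if __name__ == "__main__":
    suspicious = {
        "User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
        "Accept-Encoding": "gzip,deflate,sdch",
    }
    print(looks_like_switched_ua(suspicious))
```

Requests flagged this way could be counted separately rather than
thrown away, which would give a rough idea of how far UA switching
actually skews the IE figure.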
>    There is no reliable way to measure free software usage. But, there sure
> is a lot of posturing going on in the market place ...  wonder why?

Sure, and there's no reliable way to measure non-free software usage
either. What's the difference? You could count sales of Microsoft
Office, and you could count downloads of Open Office. Neither is any
more accurate than the other; although I think the 24-hour figures for
Firefox 4 / IE 9 downloads are fairly indicative, since people can't
get them off their respective OS install CDs.

And this isn't restricted to electronica either. Which is more popular,
Coca-Cola or Pepsi? Do more people vote Liberal or Labour, Republican
or Democrat, Whig or Tory? Statisticking is a huge science. Most of it
involves figuring out what's important - anyone can get data, but
getting useful information out of the data takes some work.

Chris Angelico