From: Mark Lawrence
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: Data cleaning workouts
Date: Fri, 24 Aug 2012 09:16:30 +0100

Elevated Python types don't get their hands dirty top posting, but I'm certain that they would when talking data or there wouldn't be so many debates on which data type to use :)

On 24/08/2012 07:48, Fg Nu wrote:
>
> Thanks. I will try the SciPy list. It was a bit of a Hail Mary anyway. Pretty sure elevated Python types don't actually get their hands dirty with data. ;)
>
> ----- Original Message -----
> From: rusi
> To: python-list@python.org
> Cc:
> Sent: Thursday, August 23, 2012 11:01 PM
> Subject: Re: Data cleaning workouts
>
> On Aug 23, 12:52 pm, Fg Nu wrote:
>> List folk,
>>
>> I am a newbie trying to get used to Python. I was wondering if anyone knows of web resources that teach good practices in data cleaning and management for statistics/analytics/machine learning, particularly using Python.
>>
>> Ideally, these would be exercises of the form: here is some horrible raw data --> here is what it should look like after it has been cleaned.
>> Guidelines about steps that should always be taken, practices that should be avoided; basically, workflow of data analysis in Python with special emphasis on the cleaning part.
>
> Since no one has answered, I suggest you narrow your searching from
> 'python' to 'scipy' (or 'numpy').
> Also perhaps ipython.
> And then perhaps try those specific mailing lists/fora.
>
> Since I don't know this area much, not saying more.
>

--
Cheers.

Mark Lawrence.