


Re: fixing an horrific formatted csv file.

Return-Path <rosuav@gmail.com>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
MIME-Version 1.0
X-Received by 10.221.55.70 with SMTP id vx6mr29851976vcb.23.1404264053513; Tue, 01 Jul 2014 18:20:53 -0700 (PDT)
In-Reply-To <0d3871c6-81d4-4168-9408-ad85299b0955@googlegroups.com>
References <47e2e29d-b5c3-4aa6-abf9-3b1e46eb0dec@googlegroups.com> <mailman.11385.1404247829.18130.python-list@python.org> <0d3871c6-81d4-4168-9408-ad85299b0955@googlegroups.com>
Date Wed, 2 Jul 2014 11:20:53 +1000
Subject Re: fixing an horrific formatted csv file.
From Chris Angelico <rosuav@gmail.com>
Cc "python-list@python.org" <python-list@python.org>
Content-Type text/plain; charset=UTF-8
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.15
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.11392.1404264061.18130.python-list@python.org>
Lines 19
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1404264061 news.xs4all.nl 2829 [2001:888:2000:d::a6]:48972
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:73819



On Wed, Jul 2, 2014 at 7:41 AM, flebber <flebber.crue@gmail.com> wrote:
> I understand why providing full solutions is frowned upon, because it doesn't assist in learning. Which is true,  it's incredibly helpful in this case.

In this case, my main reason for not providing a full solution is that
the work tends to be iterative. When I have a huge and messy file,
what I usually do is grab the first half-dozen lines and work out how
I'd go about fixing them manually, then write a script that does that.
Then run the script on the whole file, and see where it either chokes
or produces wrong data. Pick up the first few lines of wrong data,
figure out how to tweak the program to handle those. Rinse and repeat.
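The fix, run, inspect loop described above could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not ChrisA's actual script: the fixer functions, the three-field acceptance test and the sample lines are all hypothetical. Each pass applies every fixer to every line and then reports only the first few lines that still fail, which is what you look at before adding the next fixer.

```python
import csv
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical fixers: each takes one raw line and returns a cleaned line.
# In practice you would add one each time a new kind of mess shows up.
def strip_whitespace(line: str) -> str:
    return line.strip()

def semicolons_to_commas(line: str) -> str:
    # Assumption for illustration: some lines use ';' as the separator.
    return line.replace(";", ",")

FIXERS: List[Callable[[str], str]] = [strip_whitespace, semicolons_to_commas]

def is_valid(row: List[str], expected_fields: int) -> bool:
    # Hypothetical acceptance test: right field count and no empty fields.
    return len(row) == expected_fields and all(field for field in row)

def run_pass(
    lines: Iterable[str],
    fixers: Sequence[Callable[[str], str]],
    expected_fields: int,
) -> Tuple[List[List[str]], List[Tuple[int, str]]]:
    """Apply every fixer to every line; split into good rows and bad lines."""
    good: List[List[str]] = []
    bad: List[Tuple[int, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        cleaned = raw
        for fix in fixers:
            cleaned = fix(cleaned)
        # Parse the single cleaned line as CSV; a blank line yields [].
        row = next(csv.reader([cleaned], skipinitialspace=True), [])
        if is_valid(row, expected_fields):
            good.append(row)
        else:
            bad.append((lineno, raw))
    return good, bad

if __name__ == "__main__":
    sample = [
        "Name, Age, City",
        "alice;30;Sydney",
        "bob,,Perth",
        "carol,41",
    ]
    good, bad = run_pass(sample, FIXERS, expected_fields=3)
    print(f"{len(good)} rows OK, {len(bad)} still wrong")
    # Inspect only the first few failures, write a fixer for them, rerun.
    for lineno, raw in bad[:5]:
        print(lineno, repr(raw))
```

Keeping the raw text of failing lines (rather than the half-cleaned version) makes it easier to see what the next fixer actually has to deal with.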

Often, what that results in is a file that gets progressively tidier.
When the scope of the mess is infinite (like with human-entered data;
believe you me, you haven't seen messy until you've seen what a
committee can do to a simple job), this means you stop working on the
script at exactly the point where it stops being worth the effort,
which is something that only you can decide.
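One way to see whether the file really is getting progressively tidier, and so to judge when another round stops paying off, is to count the lines that still fail after each cumulative round of fixes. This is a hypothetical sketch, not part of the original post: the `looks_clean` check and the two stages are illustrative assumptions.

```python
from typing import Callable, List, Sequence

def looks_clean(line: str) -> bool:
    # Hypothetical check: exactly three comma-separated fields,
    # none empty and none padded with whitespace.
    parts = line.split(",")
    return len(parts) == 3 and all(p and p == p.strip() for p in parts)

def remaining_bad(
    lines: Sequence[str],
    stages: Sequence[Callable[[str], str]],
    is_ok: Callable[[str], bool] = looks_clean,
) -> List[int]:
    """Count lines failing is_ok before any fixes and after each stage."""
    current = list(lines)
    counts = [sum(not is_ok(line) for line in current)]
    for stage in stages:
        # Stages are cumulative: each runs on the output of the previous ones.
        current = [stage(line) for line in current]
        counts.append(sum(not is_ok(line) for line in current))
    return counts

if __name__ == "__main__":
    sample = ["a,b,c", " a,b,c ", "x;y;z", "p|q", "??"]
    stages = [str.strip, lambda line: line.replace(";", ",")]
    print(remaining_bad(sample, stages))  # prints [4, 3, 2]
```

When the counts flatten out and each remaining bad line needs a rule of its own (here `"p|q"` and `"??"`), that is the point at which to weigh whether another round is worth the effort or whether fixing the leftovers by hand is cheaper.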

ChrisA



Thread

fixing an horrific formatted csv file. flebber <flebber.crue@gmail.com> - 2014-07-01 07:04 -0700
  Re: fixing an horrific formatted csv file. MRAB <python@mrabarnett.plus.com> - 2014-07-01 15:32 +0100
  Re: fixing an horrific formatted csv file. "F.R." <anthra.norell@bluewin.ch> - 2014-07-01 22:49 +0200
    Re: fixing an horrific formatted csv file. flebber <flebber.crue@gmail.com> - 2014-07-01 14:41 -0700
      Re: fixing an horrific formatted csv file. Chris Angelico <rosuav@gmail.com> - 2014-07-02 11:20 +1000
        Re: fixing an horrific formatted csv file. flebber <flebber.crue@gmail.com> - 2014-07-02 02:13 -0700
          Re: fixing an horrific formatted csv file. "F.R." <anthra.norell@bluewin.ch> - 2014-07-02 17:51 +0200
            Re: fixing an horrific formatted csv file. flebber <flebber.crue@gmail.com> - 2014-07-03 21:12 -0700
              Re: fixing an horrific formatted csv file. Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-07-04 18:19 +1200
                Re: fixing an horrific formatted csv file. flebber <flebber.crue@gmail.com> - 2014-07-04 03:48 -0700
              Re: fixing an horrific formatted csv file. flebber <flebber.crue@gmail.com> - 2014-07-04 03:28 -0700
                Re: fixing an horrific formatted csv file. "F.R." <anthra.norell@bluewin.ch> - 2014-07-04 15:24 +0200
