From: MRAB
Date: Tue, 01 Jul 2014 15:32:23 +0100
Subject: Re: fixing an horrific formatted csv file.
Newsgroups: comp.lang.python
To: python-list@python.org

On 2014-07-01 15:04, flebber wrote:
> What I am trying to do is to reformat a csv file into something more usable.
> Currently the file has no headers, multiple lines with varying columns that are not related.
>
> This is a sample
>
> Meeting,05/07/14,RHIL,Rosehill Gardens,Weights,TAB,+3m Entire Circuit, ,
> Race,1,CIVIC STAKES,CIVIC,CIVIC,1350,~ ,3U ,~ ,QLT ,54,0,0,5/07/2014,, , , , ,No class restriction, Quality, For Three-Years-Old and Upwards, No sex restriction, (Listed),Of $100000. First $60000, second $20000, third $10000, fourth $5000, fifth $2000, sixth $1000, seventh $1000, eighth $1000
> Horse,1,Bennetta,0,"Grahame Begg",Randwick,,0,0,16-3-1-3 $390450.00,,0,0,0,,98.00,M,
> Horse,2,Breakfast in Bed,0,"David Vandyke",Warwick Farm,,0,0,20-6-1-5 $201250.00,,0,0,0,,81.00,M,
> Horse,3,Capital Commander,0,"Gerald Ryan",Rosehill,,0,0,43-9-9-3 $438625.00,,0,0,0,,85.00,M,
> Horse,4,Coup Ay Tee (NZ),0,"Chris Waller",Rosehill,,0,0,35-9-6-5 $519811.00,,0,0,0,,101.00,G,
> Horse,5,Generalife,0,"John O'Shea",Warwick Farm,,0,0,19-6-1-3 $235045.00,,0,0,0,,87.00,G,
> Horse,6,He's Your Man (FR),0,"Chris Waller",Rosehill,,0,0,13-2-3-1 $108110.00,,0,0,0,,93.00,G,
> Horse,7,Hidden Kisses,0,"Chris Waller",Rosehill,,0,0,40-8-8-5 $565750.00,,0,0,0,,96.00,M,
> Horse,8,Oakfield Commands,0,"Gerald Ryan",Rosehill,,0,0,22-7-4-6 $269530.00,,0,0,0,,94.00,G,
> Horse,9,Taxmeifyoucan,0,"Gregory Hickman",Warwick Farm,,0,0,18-2-4-4 $539730.00,,0,0,0,,91.00,G,
> Horse,10,The Peak,0,"Bart & James Cummings",Randwick,,0,0,15-6-1-0 $426732.00,,0,0,0,,95.00,G,
> Horse,11,Tougher Than Ever (NZ),0,"Chris Waller",Rosehill,,0,0,17-3-2-3 $321613.00,,0,0,0,,97.00,H,
> Horse,12,TROMSO,0,"Chris Waller",Rosehill,,0,0,47-8-11-2 $622300.00,,0,0,0,,103.00,G,
> Race,2,FLYING WELTER - BENCHMARK 95 HCP,BM95,BM95,1100,BM95 ,3U ,~ ,HCP ,54,0,0,5/07/2014,, , , , ,BenchMark 95, Handicap, For Three-Years-Old and Upwards, No sex restriction,Of $85000. First $48750, second $16750, third $8350, fourth $4150, fifth $2000, sixth $1000, seventh $1000, eighth $1000, ninth $1000, tenth $1000
> Horse,1,Big Bonanza,0,"Don Robb",Wyong,,0,57.5,31-9-4-3 $366860.00,,0,0,0,,92.00,G,
> Horse,2,Casual Choice,0,"Joseph Pride",Warwick Farm,,0,54,8-2-3-0 $105930.00,,0,0,0,
>
> So what I am trying to do is end up with an output like this.
>
> Meeting, Date, Race, Number, Name, Trainer, Location
> Rosehill, 05/07/14, 1, 1,Bennetta,"Grahame Begg",Randwick,
> Rosehill, 05/07/14, 1, 2,Breakfast in Bed,"David Vandyke",Warwick Farm,
>
> So as a start I thought I would try inserting the Meeting and Race number, however I am just not getting it right.
>
> import csv
>
> outfile = open("/home/sayth/Scripts/cleancsv.csv", "w")
> with open('/home/sayth/Scripts/test.csv') as f:
>     f_csv = csv.reader(f)
>     headers = next(f_csv)
>     for row in f_csv:
>         meeting = row[3] in row[0] == 'Meeting'
>         new = row.insert(0, meeting)
>         while row[1] in row[0] == 'Race' < 9: # pref less than next found row[0]
>
>             # grab row[1] as id number
>             id = row[1]
>             # from row[0] and insert it in first position
>             new_lines = new.insert(1, id)
>             outfile.write(new_lines)
>         outfile.close()
>
> How should I go about this?
>
There's no point in reading the first row as the headers because it
clearly doesn't contain just the headings.

First write a row for the header.

Then, for each row:

    If the first field is 'Meeting', then remember the meeting, etc.

    If the first field is 'Race', then remember the race, etc.

    If the first field is 'Horse', then write the row with the
    additional fields for race, etc.

    And so on.

BTW, the indentation for the 'outfile.close()' line is wrong. It would,
of course, be better to use the 'with' statement for that file too.
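
The steps above could be sketched roughly as follows. This is only a
minimal sketch under some assumptions, not something run against the
real file: the names flatten_meetings and convert are made up for
illustration, and the column positions (date and venue in fields 1 and 3
of a 'Meeting' row, race number in field 1 of a 'Race' row, and number,
name, trainer and location in fields 1, 2, 4 and 5 of a 'Horse' row) are
read off the sample rows quoted above.

```python
import csv

HEADER = ["Meeting", "Date", "Race", "Number", "Name", "Trainer", "Location"]


def flatten_meetings(infile, outfile):
    """Write one output row per 'Horse' row, prefixed with its meeting and race.

    infile and outfile are open text-mode file objects.
    """
    reader = csv.reader(infile)
    writer = csv.writer(outfile)

    # Write the header ourselves; the input file does not have one.
    writer.writerow(HEADER)

    meeting = date = race = None
    for row in reader:
        if not row:
            continue  # skip blank lines
        kind = row[0]
        if kind == "Meeting":
            # Assumed layout: date in field 1, venue name in field 3.
            date, meeting = row[1], row[3]
        elif kind == "Race":
            # Assumed layout: race number in field 1.
            race = row[1]
        elif kind == "Horse":
            # Assumed layout: number, name, trainer, location in fields 1, 2, 4, 5.
            writer.writerow([meeting, date, race, row[1], row[2], row[4], row[5]])
        # Any other kind of row is ignored in this sketch.


def convert(src_path, dst_path):
    """Hypothetical wrapper: both files are managed by 'with'."""
    with open(src_path, newline="") as f, open(dst_path, "w", newline="") as out:
        flatten_meetings(f, out)
```

Calling convert() with the input and output paths would then replace the
whole of the original loop. Opening both files with newline="" is what
the csv module documentation recommends, and because both are opened in
the 'with' statement there is no separate close() call to get wrong.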