From: "F.R."
Date: Tue, 01 Jul 2014 22:49:14 +0200
Newsgroups: comp.lang.python
Subject: Re: fixing an horrific formatted csv file.
References: <47e2e29d-b5c3-4aa6-abf9-3b1e46eb0dec@googlegroups.com>

On 07/01/2014 04:04 PM, flebber wrote:
> What I am trying to do is to reformat a csv file into something more usable.
> currently the file has no headers, multiple lines with varying columns that are not related.
>
> This is a sample
>
> Meeting,05/07/14,RHIL,Rosehill Gardens,Weights,TAB,+3m Entire Circuit, ,
> Race,1,CIVIC STAKES,CIVIC,CIVIC,1350,~ ,3U ,~ ,QLT ,54,0,0,5/07/2014,, , , , ,No class restriction, Quality, For Three-Years-Old and Upwards, No sex restriction, (Listed),Of $100000. First $60000, second $20000, third $10000, fourth $5000, fifth $2000, sixth $1000, seventh $1000, eighth $1000
> Horse,1,Bennetta,0,"Grahame Begg",Randwick,,0,0,16-3-1-3 $390450.00,,0,0,0,,98.00,M,
> Horse,2,Breakfast in Bed,0,"David Vandyke",Warwick Farm,,0,0,20-6-1-5 $201250.00,,0,0,0,,81.00,M,
> Horse,3,Capital Commander,0,"Gerald Ryan",Rosehill,,0,0,43-9-9-3 $438625.00,,0,0,0,,85.00,M,
> Horse,4,Coup Ay Tee (NZ),0,"Chris Waller",Rosehill,,0,0,35-9-6-5 $519811.00,,0,0,0,,101.00,G,
> Horse,5,Generalife,0,"John O'Shea",Warwick Farm,,0,0,19-6-1-3 $235045.00,,0,0,0,,87.00,G,
> Horse,6,He's Your Man (FR),0,"Chris Waller",Rosehill,,0,0,13-2-3-1 $108110.00,,0,0,0,,93.00,G,
> Horse,7,Hidden Kisses,0,"Chris Waller",Rosehill,,0,0,40-8-8-5 $565750.00,,0,0,0,,96.00,M,
> Horse,8,Oakfield Commands,0,"Gerald Ryan",Rosehill,,0,0,22-7-4-6 $269530.00,,0,0,0,,94.00,G,
> Horse,9,Taxmeifyoucan,0,"Gregory Hickman",Warwick Farm,,0,0,18-2-4-4 $539730.00,,0,0,0,,91.00,G,
> Horse,10,The Peak,0,"Bart & James Cummings",Randwick,,0,0,15-6-1-0 $426732.00,,0,0,0,,95.00,G,
> Horse,11,Tougher Than Ever (NZ),0,"Chris Waller",Rosehill,,0,0,17-3-2-3 $321613.00,,0,0,0,,97.00,H,
> Horse,12,TROMSO,0,"Chris Waller",Rosehill,,0,0,47-8-11-2 $622300.00,,0,0,0,,103.00,G,
> Race,2,FLYING WELTER - BENCHMARK 95 HCP,BM95,BM95,1100,BM95 ,3U ,~ ,HCP ,54,0,0,5/07/2014,, , , , ,BenchMark 95, Handicap, For Three-Years-Old and Upwards, No sex restriction,Of $85000. First $48750, second $16750, third $8350, fourth $4150, fifth $2000, sixth $1000, seventh $1000, eighth $1000, ninth $1000, tenth $1000
> Horse,1,Big Bonanza,0,"Don Robb",Wyong,,0,57.5,31-9-4-3 $366860.00,,0,0,0,,92.00,G,
> Horse,2,Casual Choice,0,"Joseph Pride",Warwick Farm,,0,54,8-2-3-0 $105930.00,,0,0,0,
>
> So what I am trying to so is end up with an output like this.
>
> Meeting, Date, Race, Number, Name, Trainer, Location
> Rosehill, 05/07/14, 1, 1,Bennetta,"Grahame Begg",Randwick,
> Rosehill, 05/07/14, 1, 2,Breakfast in Bed,"David Vandyke",Warwick Farm,
>
> So as a start i thought i would try inserting the Meeting and Race number however I am just not getting it right.
>
> import csv
>
> outfile = open("/home/sayth/Scripts/cleancsv.csv", "w")
> with open('/home/sayth/Scripts/test.csv') as f:
>     f_csv = csv.reader(f)
>     headers = next(f_csv)
>     for row in f_csv:
>         meeting = row[3] in row[0] == 'Meeting'
>         new = row.insert(0, meeting)
>         while row[1] in row[0] == 'Race' < 9: # pref less than next found row[0]
>
>             # grab row[1] as id number
>             id = row[1]
>             # from row[0] and insert it in first position
>             new_lines = new.insert(1, id)
>             outfile.write(new_lines)
> outfile.close()
>
> How should I go about this?
>
> Thanks
>
> Sayth

Reformatting is what I do most, and over time I have acquired some practice. Complete solutions are not often proposed here and are possibly sneered at for their officiousness. If so, I apologize. I couldn't resist: it is such a nice example. Having solved it, I figured, why not share it . . .

Frederic

------------------------------------------------------------------------------------------------------------

def race_table (csv_text):
    input_table = [[item.strip(' "') for item in record.split (',')] for record in csv_text.splitlines ()]
    # At this point look at input_table to find the record indices
    output_table = []
    for record in input_table:
        if record [0] == 'Meeting':
            meeting = record [3]
        elif record [0] == 'Race':
            date = record [13]
            race = record [1]
        elif record [0] == 'Horse':
            number = record [1]
            name = record [2]
            trainer = record [4]
            location = record [5]
            output_table.append ((meeting, date, race, number, name, trainer, location))
    return output_table

>>> for record in race_table (your_csv_text): print record
('Rosehill Gardens', '5/07/2014', '1', '1', 'Bennetta', 'Grahame Begg', 'Randwick')
('Rosehill Gardens', '5/07/2014', '1', '2', 'Breakfast in Bed', 'David Vandyke', 'Warwick Farm')
('Rosehill Gardens', '5/07/2014', '1', '3', 'Capital Commander', 'Gerald Ryan', 'Rosehill')
('Rosehill Gardens', '5/07/2014', '1', '4', 'Coup Ay Tee (NZ)', 'Chris Waller', 'Rosehill')
('Rosehill Gardens', '5/07/2014', '1', '5', 'Generalife', "John O'Shea", 'Warwick Farm')
('Rosehill Gardens', '5/07/2014', '1', '6', "He's Your Man (FR)", 'Chris Waller', 'Rosehill')
('Rosehill Gardens', '5/07/2014', '1', '7', 'Hidden Kisses', 'Chris Waller', 'Rosehill')
('Rosehill Gardens', '5/07/2014', '1', '8', 'Oakfield Commands', 'Gerald Ryan', 'Rosehill')
('Rosehill Gardens', '5/07/2014', '1', '9', 'Taxmeifyoucan', 'Gregory Hickman', 'Warwick Farm')
('Rosehill Gardens', '5/07/2014', '1', '10', 'The Peak', 'Bart & James Cummings', 'Randwick')
('Rosehill Gardens', '5/07/2014', '1', '11', 'Tougher Than Ever (NZ)', 'Chris Waller', 'Rosehill')
('Rosehill Gardens', '5/07/2014', '1', '12', 'TROMSO', 'Chris Waller', 'Rosehill')
('Rosehill Gardens', '5/07/2014', '2', '1', 'Big Bonanza', 'Don Robb', 'Wyong')
('Rosehill Gardens', '5/07/2014', '2', '2', 'Casual Choice', 'Joseph Pride', 'Warwick Farm')

>>> TM = TX.Table_Maker (headings = ('Meeting','Date','Race','Number','Name','Trainer','Location'))
>>> TM (race_table (your_csv_text)).write ()
Meeting          | Date      | Race | Number | Name                   | Trainer               | Location     |
Rosehill Gardens | 5/07/2014 | 1    | 1      | Bennetta               | Grahame Begg          | Randwick     |
Rosehill Gardens | 5/07/2014 | 1    | 2      | Breakfast in Bed       | David Vandyke         | Warwick Farm |
Rosehill Gardens | 5/07/2014 | 1    | 3      | Capital Commander      | Gerald Ryan           | Rosehill     |
Rosehill Gardens | 5/07/2014 | 1    | 4      | Coup Ay Tee (NZ)       | Chris Waller          | Rosehill     |
Rosehill Gardens | 5/07/2014 | 1    | 5      | Generalife             | John O'Shea           | Warwick Farm |
Rosehill Gardens | 5/07/2014 | 1    | 6      | He's Your Man (FR)     | Chris Waller          | Rosehill     |
Rosehill Gardens | 5/07/2014 | 1    | 7      | Hidden Kisses          | Chris Waller          | Rosehill     |
Rosehill Gardens | 5/07/2014 | 1    | 8      | Oakfield Commands      | Gerald Ryan           | Rosehill     |
Rosehill Gardens | 5/07/2014 | 1    | 9      | Taxmeifyoucan          | Gregory Hickman       | Warwick Farm |
Rosehill Gardens | 5/07/2014 | 1    | 10     | The Peak               | Bart & James Cummings | Randwick     |
Rosehill Gardens | 5/07/2014 | 1    | 11     | Tougher Than Ever (NZ) | Chris Waller          | Rosehill     |
Rosehill Gardens | 5/07/2014 | 1    | 12     | TROMSO                 | Chris Waller          | Rosehill     |
Rosehill Gardens | 5/07/2014 | 2    | 1      | Big Bonanza            | Don Robb              | Wyong        |
Rosehill Gardens | 5/07/2014 | 2    | 2      | Casual Choice          | Joseph Pride          | Warwick Farm |
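
For anyone who wants the result written back out as a csv file, which is what the original question asked for, here is one possible Python 3 variant of race_table. It is only a sketch under a few assumptions: it swaps the plain split (',') for the standard csv module, so that a quoted field containing a comma stays in one piece, and it reuses the field positions found above (3, 13, 1, 2, 4, 5). The names race_rows, write_race_csv and HEADINGS are made up for this illustration, and it assumes every Race and Horse record is long enough to hold those positions.

```python
import csv

# Column headings for the flattened output (taken from the requested layout).
HEADINGS = ("Meeting", "Date", "Race", "Number", "Name", "Trainer", "Location")


def race_rows(lines):
    """Yield one flattened tuple per 'Horse' record.

    `lines` is any iterable of text lines, e.g. an open file or a list.
    Meeting and Race records only update the current context; each Horse
    record is emitted together with the most recent meeting, date and race.
    """
    meeting = date = race = None
    for record in csv.reader(lines):
        if not record:          # skip blank lines
            continue
        kind = record[0].strip()
        if kind == "Meeting":
            meeting = record[3].strip()
        elif kind == "Race":
            race = record[1].strip()
            date = record[13].strip()
        elif kind == "Horse":
            number = record[1].strip()
            name = record[2].strip()
            trainer = record[4].strip()
            location = record[5].strip()
            yield (meeting, date, race, number, name, trainer, location)


def write_race_csv(lines, out):
    """Write a heading row plus all flattened rows to the file-like `out`."""
    writer = csv.writer(out)
    writer.writerow(HEADINGS)
    writer.writerows(race_rows(lines))
```

To use it on real files, open both with newline="" as the csv documentation recommends, for example write_race_csv(open("test.csv", newline=""), open("cleancsv.csv", "w", newline="")), with the file names being placeholders.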