From: "F.R."
Date: Wed, 02 Jul 2014 17:51:05 +0200
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: fixing an horrific formatted csv file.

On 07/02/2014 11:13 AM, flebber wrote:
>>>>> TM = TX.Table_Maker (headings =
>> ('Meeting','Date','Race','Number','Name','Trainer','Location'))
>>>>> TM (race_table (your_csv_text)).write ()
> Where do I find TX? Found this mention in the list, was it available in pip by any name?
> https://mail.python.org/pipermail/python-list/2014-February/667464.html
>
> Sayth

I'd have to make it available. I proposed it some time ago and received a couple of suggestions in return. It is a modular transformation framework written entirely in Python (2.7). It consists essentially of a base class "Transformer" that handles input and output in such a way that Transformer objects can be chained.
It saved me from drowning in a horrible and growing tangle of hacks. Finding something usable I had previously done took time. Understanding how it worked took more time, and adapting it took still more, so that writing yet another hack from scratch was faster. A number of hacks I could quickly wrap into a Transformer object, and so I could start building a library of standard Transformers. The Table_Maker is one of them. The table-making code is quite bad. It suffers from feature overload. I would clean it up for distribution.

I'd be happy to distribute the base class and a few standard Transformers, such as I use every day (File Reader, File Writer, DB Run Command, DB Write, Table Maker, PDF To Text, Text To Lines, Lines To Text, Sort, Sort And Unique, etc.).

Writing one's own Transformers is a breeze. Testing too, because a Transformer keeps its input and output and, in line with the system's design philosophy, does only its own single thing.

A Chain is a list of Transformers that run in sequence. It is itself derived from Transformer and is a functional equivalent. So Chains nest. Fixing a Chain that nothing comes out of is a straightforward matter too. It will still have run up to the failing element. Chain.show () reveals the culprit as the first one to have no output.

I am not up to date on distributing and would depend on qualified help on that.

Frederic

--------------------------------------------------------------------------------

A brief overview

The TX solution to your race table would be (TX is the name of the module):

    class Race_Table (TX.Transformer):
        '''
        In: CSV text
        Out: Tabular data (2-dimensional list)
        '''
        name = 'Race_Table'
        @TX.setup   # Checks timestamps to prevent needless reruns in the absence of new input
        def transform (self):
            for line in self.Input.data:
                # See my post
            self.Output.take (output_table)

Example file to file:

    >>> Race_Schedule_F2F = TX.Chain (TX.File_Reader (), Race_Table (), TX.List_To_CSV (delimiter = ';'), TX.File_Writer (terminal = out_file_name))
    >>> Race_Schedule_F2F (input_file_name)   # Does it all!

Example web to database:

    >>> Race_Schedule_WWW2DB = TX.Chain (TX.WWW_Reader (), Race_Schedule_HTML_Reader (), Race_Table (), TX.DB_Writer (table_name = 'horses'))
    >>> Race_Schedule_WWW2DB (url)   # Does it all!

You'd have to write the Race_Schedule_HTML_Reader.

Verify your table:

    >>> Table_Viewer = TX.Chain (TX.Table_Maker (), TX.Table_Writer ())
    >>> Race_Schedule_WWW2DB.show_tree ()   # See which one should display
    Chain
        Chain[0] - WWW Reader
        Chain[1] - Race_Schedule_HTML_Reader
        Chain[2] - Race_Table
        Chain[3] - DB Writer
    >>> print Table_Viewer (Race_Schedule_WWW2DB[2]())   # All Transformers keep their data
    (Display of table)

Verify database:

    >>> print Table_Viewer (TX.DB_Reader (table_name = 'horses')())
    (Display of database table)
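
Since TX itself is not published, the following is only a rough sketch of the idea described in the post: steps that keep their input and output, a Chain that is itself a step (so Chains nest), and a show () that points at the first step with no output. It is not the TX code. Every class, attribute and method body here is an assumption made for illustration, and the real module may differ in all of them (for instance, the post uses self.Input.data and self.Output.take, which this sketch does not reproduce).

```python
class Transformer(object):
    """Hypothetical stand-in for a TX-style step; not the real TX.Transformer."""
    name = 'Transformer'

    def __init__(self):
        self.input = None
        self.output = None

    def transform(self, data):
        # Subclasses override this with their one single job.
        return data

    def __call__(self, data=None):
        # Called with data: run the step and keep both input and output.
        # Called without data: hand back the output kept from the last run.
        if data is not None:
            self.input = data
            self.output = None
            self.output = self.transform(data)
        return self.output


class Chain(Transformer):
    """A sequence of steps that is itself a step, so chains can nest."""
    name = 'Chain'

    def __init__(self, *steps):
        Transformer.__init__(self)
        self.steps = list(steps)

    def __getitem__(self, index):
        return self.steps[index]

    def transform(self, data):
        for step in self.steps:
            step.input = step.output = None   # forget results of earlier runs
        for step in self.steps:
            data = step(data)
            if not data:                      # nothing came out: stop here
                return None
        return data

    def show(self):
        # One line per step; the first 'NO OUTPUT' line marks the culprit.
        return ['%s[%d] - %s: %s' % (self.name, i, step.name,
                                     'ok' if step.output else 'NO OUTPUT')
                for i, step in enumerate(self.steps)]


class TextToLines(Transformer):
    name = 'Text To Lines'
    def transform(self, data):
        return data.splitlines()


class Sort(Transformer):
    name = 'Sort'
    def transform(self, data):
        return sorted(data)


class LinesToText(Transformer):
    name = 'Lines To Text'
    def transform(self, data):
        return '\n'.join(data)


class DropAll(Transformer):
    """Deliberately broken step, used to demonstrate show()."""
    name = 'Drop All'
    def transform(self, data):
        return []


sort_lines = Chain(TextToLines(), Sort(), LinesToText())
result = sort_lines('pear\napple\nfig')       # 'apple\nfig\npear'
kept = sort_lines[1]()                        # output the Sort step kept

nested = Chain(Chain(TextToLines(), Sort()), LinesToText())
nested_result = nested('b\na')                # 'a\nb'

broken = Chain(TextToLines(), DropAll(), Sort())
broken_result = broken('b\na')                # None: nothing comes out
report = broken.show()
```

In this sketch, report lists 'Text To Lines' as ok and 'Drop All' as the first step with NO OUTPUT, which mirrors the debugging approach the post attributes to Chain.show (). Treating any empty result as "no output" is a simplification chosen here, not something the post specifies.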