From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: Considering migrating to Python from Visual Basic 6 for engineering applications
Date: Fri, 19 Feb 2016 15:58:13 +0100

Tim Chase wrote:

> On 2016-02-19 02:47, wrong.address.1@gmail.com wrote:
>> 2 12.657823 0.1823467E-04 114 0
>> 3 4 5 9 11
>> "Lower"
>> 278.15
>>
>> Is it straightforward to read this, or does one have to read one
>> character at a time and then figure out what the numbers are? --
>
> It's easy to read.  What you do with that mess of data is the complex
> part.
> They come in as byte-strings, but you'd have to convert them
> to the corresponding formats:
>
>   from shlex import shlex
>   USE_LEX = True # False
>   with open('data.txt') as f:
>     for i, line in enumerate(f, 1):
>       if USE_LEX:
>         bits = shlex(line)
>       else:
>         bits = line.split()
>       for j, bit in enumerate(bits, 1):
>         if bit.isdigit():
>           result = int(bit)
>           t = "an int"
>         elif '"' in bit:
>           result = bit
>           t = "a string"
>         else:
>           result = float(bit)
>           t = "a float"
>         print("On line %i I think that item %i %r is %s: %r" % (
>           i,
>           j,
>           bit,
>           t,
>           result,
>           ))
>
> The USE_LEX controls whether the example code uses string-splitting
> on white-space, or uses the built-in "shlex" module to parse for
> quoted strings that might contain a space.  The naive way of
> string-splitting will be faster, but choke on string-data containing
> spaces.
>
> You'd have to make up your own heuristics for determining what type
> each data "bit" is, parsing it out (with int(), float() or whatever),
> but the above gives you some rough ideas with at least one known
> bug/edge-case.

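One possible shape for such a heuristic, as a rough sketch rather than anything from the quoted post: try each converter in turn and fall back to the plain string. The names guess_value() and guess_line() are made up for illustration, and the sketch assumes that anything int() accepts should be an int and anything float() accepts should be a float. That means a quoted "42" still becomes an int, because shlex.split() removes the quotes before the guess is made, and tokens like "nan" or "inf" become floats.

```python
import shlex


def guess_value(token):
    """Return token as an int if possible, else a float, else unchanged."""
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass  # not that type, try the next converter
    return token


def guess_line(line):
    # shlex.split() keeps a double-quoted string with spaces as one token
    # and strips the surrounding quotes.
    return [guess_value(token) for token in shlex.split(line)]


print(guess_line('2 12.657823 0.1823467E-04 114 0'))
print(guess_line('"Lower"'))
print(guess_line('-3 "two words" 278.15'))
```

Unlike a str.isdigit() check, int() also accepts a leading sign, so "-3" comes back as an int here.
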
Or just tell the parser what to expect:

$ cat read_data_shlex2.py
import shlex

CONVERTERS = {
    "i": int,
    "f": float,
    "s": str
}


def parse_line(types, line=None, file=None):
    if line is None:
        line = file.readline()
    values = shlex.split(line)
    if len(values) != len(types):
        raise ValueError("Too few/many values %r <-- %r" % (types, values))
    return tuple(CONVERTERS[t](v) for t, v in zip(types, values))


with open("data.txt") as f:
    print(parse_line("iffii", file=f))
    print(parse_line("iiiii", file=f))
    print(parse_line("s", file=f))
    print(parse_line("fsi", file=f))
    print(parse_line("ff", file=f))
$ cat data.txt
2 12.657823 0.1823467E-04 114 0
3 4 5 9 11
"Lower"
1.2 "foo \" bar \\ baz" 42
278.15
$ python3 read_data_shlex2.py
(2, 12.657823, 1.823467e-05, 114, 0)
(3, 4, 5, 9, 11)
('Lower',)
(1.2, 'foo " bar \\ baz', 42)
Traceback (most recent call last):
  File "read_data_shlex2.py", line 24, in <module>
    print(parse_line("ff", file=f))
  File "read_data_shlex2.py", line 15, in parse_line
    raise ValueError("Too few/many values %r <-- %r" % (types, values))
ValueError: Too few/many values 'ff' <-- ['278.15']
$

But we can't do *all* the work for you ;-)

If this thread goes long enough eventually we will ;)
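
For anyone wondering why the tokenizer matters here, a small comparison may help. This is only a sketch for experimenting: the variable names naive and lexed are made up, and the sample line is the fourth line of data.txt above. str.split() knows nothing about quotes, shlex.split() applies shell-like quoting and escape rules, and iterating over a bare shlex.shlex instance uses its default word rules, under which "." is not a word character.

```python
import shlex

# Raw string, so the backslashes reach the tokenizers unchanged.
line = r'1.2 "foo \" bar \\ baz" 42'

naive = line.split()        # whitespace only, quotes are ordinary characters
lexed = shlex.split(line)   # shell-like quoting and backslash escapes

print(len(naive), naive)
print(len(lexed), lexed)

# A shlex.shlex instance with default settings breaks a float literal apart,
# so its tokens are not directly usable with float().
print(list(shlex.shlex("12.657823")))
```

The quoted string survives as a single value only with shlex.split(), which is why the length check in parse_line() sees three values for the "fsi" line.
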