From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: Considering migrating to Python from Visual Basic 6 for engineering applications
Date: Fri, 19 Feb 2016 15:58:13 +0100
Organization: None
Lines: 103
Message-ID: <mailman.50.1455893909.2289.python-list@python.org>
References: <90cc50d2-1ce5-4588-9bfd-a49d439f00dd@googlegroups.com> <mailman.226.1455748107.22075.python-list@python.org> <14c75a68-0d2e-45cc-8d73-0d71b6a6aea6@googlegroups.com> <mailman.243.1455790578.22075.python-list@python.org> <c977991d-a635-4a8f-a0e3-28f436b378f6@googlegroups.com> <CAPTjJmqF2TciSgr=qdyy+9WcWrhYVvo7VCk6s5aOo0t38PpgTw@mail.gmail.com> <mailman.5.1455808876.2289.python-list@python.org> <f136cf1a-b332-485f-889a-ef701ca5993c@googlegroups.com> <mailman.20.1455823868.2289.python-list@python.org> <9e57761f-26e1-41c5-8e71-23800de1fdd3@googlegroups.com> <20160219074057.16b9eddb@bigbox.christie.dr>

Tim Chase wrote:

> On 2016-02-19 02:47, wrong.address.1@gmail.com wrote:
>> 2 12.657823 0.1823467E-04 114 0
>> 3 4 5 9 11
>> "Lower"
>> 278.15
>> 
>> Is it straightforward to read this, or does one have to read one
>> character at a time and then figure out what the numbers are?
> 
> It's easy to read.  What you do with that mess of data is the complex
> part.  They come in as byte-strings, but you'd have to convert them
> to the corresponding formats:
> 
>   from shlex import shlex
>   USE_LEX = True # False
>   with open('data.txt') as f:
>     for i, line in enumerate(f, 1):
>       if USE_LEX:
>         bits = shlex(line)
>       else:
>         bits = line.split()
>       for j, bit in enumerate(bits, 1):
>         if bit.isdigit():
>           result = int(bit)
>           t = "an int"
>         elif '"' in bit:
>           result = bit
>           t = "a string"
>         else:
>           result = float(bit)
>           t = "a float"
>         print("On line %i I think that item %i %r is %s: %r" % (
>           i,
>           j,
>           bit,
>           t,
>           result,
>           ))
> 
> The USE_LEX controls whether the example code uses string-splitting
> on white-space, or uses the built-in "shlex" module to parse for
> quoted strings that might contain a space.  The naive way of
> string-splitting will be faster, but choke on string-data containing
> spaces.
> 
> You'd have to make up your own heuristics for determining what type
> each data "bit" is, parsing it out (with int(), float() or whatever),
> but the above gives you some rough ideas with at least one known
> bug/edge-case.  

Or just tell the parser what to expect, using one type code per value (i for int, f for float, s for str):

$ cat read_data_shlex2.py
import shlex

CONVERTERS = {
    "i": int,
    "f": float,
    "s": str
}


def parse_line(types, line=None, file=None):
    if line is None:
        line = file.readline()
    values = shlex.split(line)
    if len(values) != len(types):
        raise ValueError("Too few/many values %r <-- %r" % (types, values))
    return tuple(CONVERTERS[t](v) for t, v in zip(types, values))


with open("data.txt") as f:
    print(parse_line("iffii", file=f))
    print(parse_line("iiiii", file=f))
    print(parse_line("s", file=f))
    print(parse_line("fsi", file=f))
    print(parse_line("ff", file=f))
$ cat data.txt
2 12.657823 0.1823467E-04 114 0
3 4 5 9 11
"Lower"
1.2 "foo \" bar \\ baz" 42
278.15
$ python3 read_data_shlex2.py 
(2, 12.657823, 1.823467e-05, 114, 0)
(3, 4, 5, 9, 11)
('Lower',)
(1.2, 'foo " bar \\ baz', 42)
Traceback (most recent call last):
  File "read_data_shlex2.py", line 24, in <module>
    print(parse_line("ff", file=f))
  File "read_data_shlex2.py", line 15, in parse_line
    raise ValueError("Too few/many values %r <-- %r" % (types, values))
ValueError: Too few/many values 'ff' <-- ['278.15']
$ 
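The same one-spec-per-line idea extends to a whole file. The following is a sketch, assuming Python 3. It pairs each type spec with the next line, reports the 1-based line number when the count of values does not match, and also catches a file that is shorter than the list of specs. parse_records is a hypothetical name, and the data is inlined with io.StringIO so the snippet runs on its own.

```python
import io
import shlex

# Same type codes as in read_data_shlex2.py.
CONVERTERS = {"i": int, "f": float, "s": str}


def parse_records(specs, lines):
    """Hypothetical helper: parse one line per type spec, e.g. "iffii"."""
    lines = iter(lines)
    records = []
    for lineno, types in enumerate(specs, 1):
        line = next(lines, None)
        if line is None:
            raise ValueError("line %d: missing, expected %r" % (lineno, types))
        values = shlex.split(line)
        if len(values) != len(types):
            raise ValueError("line %d: expected %r, got %r"
                             % (lineno, types, values))
        records.append(tuple(CONVERTERS[t](v) for t, v in zip(types, values)))
    return records


# Inline copy of the first lines of data.txt so the sketch runs on its own.
data = io.StringIO('2 12.657823 0.1823467E-04 114 0\n'
                   '3 4 5 9 11\n'
                   '"Lower"\n')
for record in parse_records(["iffii", "iiiii", "s"], data):
    print(record)
# -> (2, 12.657823, 1.823467e-05, 114, 0)
# -> (3, 4, 5, 9, 11)
# -> ('Lower',)
```

With a real file you would pass the open file object as lines. The loop reads only as many lines as there are specs, so any trailing lines are left unread.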


> But we can't do *all* the work for you ;-)

If this thread goes long enough eventually we will ;)