Date: Tue, 20 Dec 2011 21:03:21 +0100
From: Jérôme <jerome@jolimont.fr>
To: python-list@python.org
Subject: Re: Text Processing
In-Reply-To: <209c2abf-dd56-4a7f-839b-fad92920d457@m7g2000vbc.googlegroups.com>
References: <209c2abf-dd56-4a7f-839b-fad92920d457@m7g2000vbc.googlegroups.com>
Newsgroups: comp.lang.python
Message-ID: <mailman.3879.1324411263.27778.python-list@python.org>

On Tue, 20 Dec 2011 11:17:15 -0800 (PST), Yigit Turgut wrote:

> Hi all,
>
> I have a text file containing such data ;
>
>         A                B                C
> -------------------------------------------------------
> -2.0100e-01    8.000e-02    8.000e-05
> -2.0000e-01    0.000e+00   4.800e-04
> -1.9900e-01    4.000e-02    1.600e-04
>
> But I only need Section B, and I need to change the notation to ;
>
> 8.000e-02 = 0.08
> 0.000e+00 = 0.00
> 4.000e-02 = 0.04
>
> Text file is approximately 10MB in size. I looked around to see if
> there is a quick and dirty workaround but there are lots of modules,
> lots of options.. I am confused.
>
> Which module is most suitable for this task ?

You could do it yourself, without any extra module.

You need to know what separates the data. A tab character? Spaces?

Example:

Input file
----------

        A                B                C
-------------------------------------------------------
-2.0100e-01    8.000e-02    8.000e-05
-2.0000e-01    0.000e+00    4.800e-04
-1.9900e-01    4.000e-02    1.600e-04


Python code
-----------

# Open file
with open('test1.plt', 'r') as f:

    b_values = []

    # Skip the two header lines (column names and dashes),
    # then read the first data line
    f.readline()
    f.readline()
    line = f.readline()

    while line:
        if line.strip():                          # ignore blank lines
            #start = line.find("\t", 0) + 1       # seek Tab
            start = line.find("   ", 0) + 4       # find the separator, skip its 4 spaces
            #end = line.find("\t", start)
            end = line.find("   ", start)         # separator after column B
            b_values.append(float(line[start:end].strip()))
        line = f.readline()

    print(b_values)
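
The question also asked for fixed-point notation (8.000e-02 = 0.08). A
minimal sketch of that step, assuming two decimals are enough, as in the
examples quoted above. The b_values list here is hard-coded sample data
standing in for what the loop above collects:

```python
# Sample values, standing in for the list collected by the loop above
b_values = [8.000e-02, 0.000e+00, 4.000e-02]

# Two decimals, as in the question's examples; change the precision if needed
formatted = ["%.2f" % value for value in b_values]

# One value per line
print("\n".join(formatted))
```

The same strings could be written to an output file instead of printed.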

It gets trickier if the number of spaces is not constant. In that case I
would try regular expressions. A regexp might even be more efficient in any
case.
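
A rough sketch of the regexp idea, assuming every data row holds exactly
three whitespace-separated numbers. The names ROW and parse_b_column are
made up for this example, and the sample list stands in for the lines of
the file:

```python
import re

# One data row: three tokens separated by any run of spaces or tabs
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\S+)\s*$")

def parse_b_column(lines):
    """Return column B as floats, skipping headers and blank lines."""
    values = []
    for line in lines:
        match = ROW.match(line)
        if not match:
            continue                      # blank line or the dashes line
        try:
            values.append(float(match.group(2)))
        except ValueError:
            continue                      # three tokens, but not numbers (header)
    return values

sample = [
    "        A                B                C\n",
    "-------------------------------------------------------\n",
    "-2.0100e-01    8.000e-02    8.000e-05\n",
    "-2.0000e-01    0.000e+00   4.800e-04\n",
    "-1.9900e-01\t4.000e-02\t1.600e-04\n",
    "\n",
]

print(parse_b_column(sample))
```

An open file object can be passed in place of sample, since it yields
lines one at a time. If the separator is only ever whitespace,
line.split()[1] would pick out the same column without a regexp.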

-- 
Jérôme