
Text Processing

Started by	Yigit Turgut <y.turgut@gmail.com>
First post	2011-12-20 11:17 -0800
Last post	2011-12-22 03:11 -0800
Articles	6 — 5 participants


  Text Processing Yigit Turgut <y.turgut@gmail.com> - 2011-12-20 11:17 -0800
    Re: Text Processing Dave Angel <d@davea.name> - 2011-12-20 14:57 -0500
    Re: Text Processing Jérôme <jerome@jolimont.fr> - 2011-12-20 21:03 +0100
    Re: Text Processing Nick Dokos <nicholas.dokos@hp.com> - 2011-12-20 16:04 -0500
    Re: Text Processing Alexander Kapps <alex.kapps@web.de> - 2011-12-21 01:01 +0100
      Re: Text Processing Yigit Turgut <y.turgut@gmail.com> - 2011-12-22 03:11 -0800

#17602 — Text Processing

From	Yigit Turgut <y.turgut@gmail.com>
Date	2011-12-20 11:17 -0800
Subject	Text Processing
Message-ID	<209c2abf-dd56-4a7f-839b-fad92920d457@m7g2000vbc.googlegroups.com>

Hi all,

I have a text file containing data like this:

        A                B                C
-------------------------------------------------------
-2.0100e-01    8.000e-02    8.000e-05
-2.0000e-01    0.000e+00   4.800e-04
-1.9900e-01    4.000e-02    1.600e-04

But I only need column B, and I need to change the notation to:

8.000e-02 = 0.08
0.000e+00 = 0.00
4.000e-02 = 0.04

The text file is approximately 10 MB in size. I looked around to see if
there is a quick and dirty workaround, but there are lots of modules and
lots of options, and I am confused.

Which module is most suitable for this task?


#17607

From	Dave Angel <d@davea.name>
Date	2011-12-20 14:57 -0500
Message-ID	<mailman.3878.1324411044.27778.python-list@python.org>
In reply to	#17602

On 12/20/2011 02:17 PM, Yigit Turgut wrote:
> Hi all,
>
> I have a text file containing such data ;
>
>          A                B                C
> -------------------------------------------------------
> -2.0100e-01    8.000e-02    8.000e-05
> -2.0000e-01    0.000e+00   4.800e-04
> -1.9900e-01    4.000e-02    1.600e-04
>
> But I only need Section B, and I need to change the notation to ;
>
> 8.000e-02 = 0.08
> 0.000e+00 = 0.00
> 4.000e-02 = 0.04
>
> Text file is approximately 10MB in size. I looked around to see if
> there is a quick and dirty workaround but there are lots of modules,
> lots of options.. I am confused.
>
> Which module is most suitable for this task ?
You probably don't need anything but sys (to parse the command-line options)
and os (maybe).

open the file
     for each line
         if it is one of the header lines, continue
         separate out the part you want
         print it, formatted as you like

Then just run the script with its stdout redirected, and you've got your
new file.

The details depend on what your experience with Python is, and what 
version of Python you're running.

-- 

DaveA


#17608

From	Jérôme <jerome@jolimont.fr>
Date	2011-12-20 21:03 +0100
Message-ID	<mailman.3879.1324411263.27778.python-list@python.org>
In reply to	#17602

On Tue, 20 Dec 2011 11:17:15 -0800 (PST),
Yigit Turgut wrote:

> Hi all,
> 
> I have a text file containing such data ;
> 
>         A                B                C
> -------------------------------------------------------
> -2.0100e-01    8.000e-02    8.000e-05
> -2.0000e-01    0.000e+00   4.800e-04
> -1.9900e-01    4.000e-02    1.600e-04
> 
> But I only need Section B, and I need to change the notation to ;
> 
> 8.000e-02 = 0.08
> 0.000e+00 = 0.00
> 4.000e-02 = 0.04
> 
> Text file is approximately 10MB in size. I looked around to see if
> there is a quick and dirty workaround but there are lots of modules,
> lots of options.. I am confused.
> 
> Which module is most suitable for this task ?

You could try to do it yourself.

You'd need to know what separates the data: tab characters or spaces?

Example:

Input file
----------

        A                B                C
-------------------------------------------------------
-2.0100e-01    8.000e-02    8.000e-05
-2.0000e-01    0.000e+00    4.800e-04
-1.9900e-01    4.000e-02    1.600e-04


Python code
-----------

# Open file
with open('test1.plt', 'r') as f:

    b_values = []

    # skip the two header lines (column names and dashes)
    f.readline()
    f.readline()

    for line in f:
        if not line.strip():
            continue                           # blank line: float('') would raise ValueError
        #start = line.find("\t", 0) + 1        # variant for tab-separated columns
        start = line.find("   ", 0) + 4        # skip the 4-space gap after column A
        #end = line.find("\t", start)
        end = line.find("   ", start)          # column B ends at the next run of spaces
        b_values.append(float(line[start:end].strip()))

    print(b_values)

It gets trickier if the number of spaces is not constant. I would then try
regular expressions. Perhaps a regexp would be more efficient in any case.
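
A regular-expression variant might look like the sketch below. It assumes every data line consists of exactly three numbers in scientific notation separated by any amount of whitespace; the names NUMBER, ROW and parse_b_values are invented for this example.

```python
import re

# Assumption: each data line is three floats in scientific notation separated
# by any run of spaces or tabs; the pattern captures the second one (column B).
NUMBER = r"[-+]?\d+\.\d+[eE][-+]?\d+"
ROW = re.compile(r"^\s*" + NUMBER + r"\s+(" + NUMBER + r")\s+" + NUMBER + r"\s*$")


def parse_b_values(lines):
    """Return column B as floats, ignoring lines that are not data rows."""
    values = []
    for line in lines:
        match = ROW.match(line)
        if match:                     # header, dashes and blank lines do not match
            values.append(float(match.group(1)))
    return values


sample = [
    "        A                B                C\n",
    "-------------------------------------------------------\n",
    "-2.0100e-01    8.000e-02    8.000e-05\n",
    "-2.0000e-01    0.000e+00   4.800e-04\n",
    "-1.9900e-01    4.000e-02    1.600e-04\n",
]
print(parse_b_values(sample))
```

Because the pattern only matches complete data rows, no explicit header skipping is needed, and the mixed 3-space and 4-space gaps in the original data are handled the same way.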

-- 
Jérôme


#17623

From	Nick Dokos <nicholas.dokos@hp.com>
Date	2011-12-20 16:04 -0500
Message-ID	<mailman.3885.1324418997.27778.python-list@python.org>
In reply to	#17602

Jérôme <jerome@jolimont.fr> wrote:

> On Tue, 20 Dec 2011 11:17:15 -0800 (PST),
> Yigit Turgut wrote:
> 
> > Hi all,
> > 
> > I have a text file containing such data ;
> > 
> >         A                B                C
> > -------------------------------------------------------
> > -2.0100e-01    8.000e-02    8.000e-05
> > -2.0000e-01    0.000e+00   4.800e-04
> > -1.9900e-01    4.000e-02    1.600e-04
> > 
> > But I only need Section B, and I need to change the notation to ;
> > 
> > 8.000e-02 = 0.08
> > 0.000e+00 = 0.00
> > 4.000e-02 = 0.04
> > 
> > Text file is approximately 10MB in size. I looked around to see if
> > there is a quick and dirty workaround but there are lots of modules,
> > lots of options.. I am confused.
> > 
> > Which module is most suitable for this task ?
> 
> You could try to do it yourself.
> 

Does it have to be Python? If not, I'd go with something similar to

   sed 1,2d foo.data | awk '{printf("%.2f\n", $2);}'

Nick


#17626

From	Alexander Kapps <alex.kapps@web.de>
Date	2011-12-21 01:01 +0100
Message-ID	<mailman.3887.1324425704.27778.python-list@python.org>
In reply to	#17602

On 20.12.2011 22:04, Nick Dokos wrote:

>>> I have a text file containing such data ;
>>>
>>>          A                B                C
>>> -------------------------------------------------------
>>> -2.0100e-01    8.000e-02    8.000e-05
>>> -2.0000e-01    0.000e+00   4.800e-04
>>> -1.9900e-01    4.000e-02    1.600e-04
>>>
>>> But I only need Section B, and I need to change the notation to ;
>>>
>>> 8.000e-02 = 0.08
>>> 0.000e+00 = 0.00
>>> 4.000e-02 = 0.04

> Does it have to be python? If not, I'd go with something similar to
>
>     sed 1,2d foo.data | awk '{printf("%.2f\n", $2);}'
>

Why both sed and awk? awk alone can skip the header lines:

awk 'NR>2 {printf("%.2f\n", $2);}' data.txt

And in Python:

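with open("data.txt") as f:
    f.readline()    # skip header (column names)
    f.readline()    # skip header (dashes)
    for line in f:
        # "%.2f" gives two decimals (0.00, not 0.0), matching the awk version
        print("%.2f" % float(line.split()[1]))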


#17724

From	Yigit Turgut <y.turgut@gmail.com>
Date	2011-12-22 03:11 -0800
Message-ID	<6ff3a578-5a77-4e31-8e54-e29c46299e3c@f1g2000yqi.googlegroups.com>
In reply to	#17626

On Dec 21, 2:01 am, Alexander Kapps <alex.ka...@web.de> wrote:
> On 20.12.2011 22:04, Nick Dokos wrote:
>
> >>> I have a text file containing such data ;
>
> >>>          A                B                C
> >>> -------------------------------------------------------
> >>> -2.0100e-01    8.000e-02    8.000e-05
> >>> -2.0000e-01    0.000e+00   4.800e-04
> >>> -1.9900e-01    4.000e-02    1.600e-04
>
> >>> But I only need Section B, and I need to change the notation to ;
>
> >>> 8.000e-02 = 0.08
> >>> 0.000e+00 = 0.00
> >>> 4.000e-02 = 0.04
> > Does it have to be python? If not, I'd go with something similar to
>
> >     sed 1,2d foo.data | awk '{printf("%.2f\n", $2);}'
>
> Why sed and awk:
>
> awk 'NR>2 {printf("%.2f\n", $2);}' data.txt
>
> And in Python:
>
> f = open("data.txt")
> f.readline()    # skip header
> f.readline()    # skip header
> for line in f:
>      print "%02s" % float(line.split()[1])

@Jerome: Your suggestion raised a floating-point error; it might need
some slight modification.

@Nick: Sorry mate, it needs to be in Python. But I've noted your solution
in case I need it another time.

@Alexander: Works as expected.

Thank you all for the replies.

