


Text Processing

Started by: Yigit Turgut <y.turgut@gmail.com>
First post: 2011-12-20 11:17 -0800
Last post: 2011-12-22 03:11 -0800
Articles: 6 (5 participants)



Contents

  Text Processing Yigit Turgut <y.turgut@gmail.com> - 2011-12-20 11:17 -0800
    Re: Text Processing Dave Angel <d@davea.name> - 2011-12-20 14:57 -0500
    Re: Text Processing Jérôme <jerome@jolimont.fr> - 2011-12-20 21:03 +0100
    Re: Text Processing Nick Dokos <nicholas.dokos@hp.com> - 2011-12-20 16:04 -0500
    Re: Text Processing Alexander Kapps <alex.kapps@web.de> - 2011-12-21 01:01 +0100
      Re: Text Processing Yigit Turgut <y.turgut@gmail.com> - 2011-12-22 03:11 -0800

#17602 — Text Processing

From: Yigit Turgut <y.turgut@gmail.com>
Date: 2011-12-20 11:17 -0800
Subject: Text Processing
Message-ID: <209c2abf-dd56-4a7f-839b-fad92920d457@m7g2000vbc.googlegroups.com>
Hi all,

I have a text file containing data like this:

        A                B                C
-------------------------------------------------------
-2.0100e-01    8.000e-02    8.000e-05
-2.0000e-01    0.000e+00   4.800e-04
-1.9900e-01    4.000e-02    1.600e-04

But I only need column B, and I need to change its notation like this:

8.000e-02 = 0.08
0.000e+00 = 0.00
4.000e-02 = 0.04

The text file is approximately 10MB in size. I looked around to see if
there is a quick and dirty solution, but there are lots of modules and
lots of options... I am confused.

Which module is most suitable for this task?



#17607

From: Dave Angel <d@davea.name>
Date: 2011-12-20 14:57 -0500
Message-ID: <mailman.3878.1324411044.27778.python-list@python.org>
In reply to: #17602
On 12/20/2011 02:17 PM, Yigit Turgut wrote:
> Hi all,
>
> I have a text file containing such data ;
>
>          A                B                C
> -------------------------------------------------------
> -2.0100e-01    8.000e-02    8.000e-05
> -2.0000e-01    0.000e+00   4.800e-04
> -1.9900e-01    4.000e-02    1.600e-04
>
> But I only need Section B, and I need to change the notation to ;
>
> 8.000e-02 = 0.08
> 0.000e+00 = 0.00
> 4.000e-02 = 0.04
>
> Text file is approximately 10MB in size. I looked around to see if
> there is a quick and dirty workaround but there are lots of modules,
> lots of options.. I am confused.
>
> Which module is most suitable for this task ?
You probably don't need anything beyond sys (to parse the command-line
options) and maybe os.

open the file
     for each line
         if one of the header lines, continue
         separate out the part you want
         print it, formatted as you like

Then just run the script with its stdout redirected, and you've got your
new file.

The details depend on what your experience with Python is, and what 
version of Python you're running.
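A minimal Python 3 sketch of that outline follows (the other snippets in this thread use Python 2's print statement). It assumes, as in the sample, that column B is the second whitespace-separated field and that a line counts as a header when its first field does not parse as a number; the names `is_header`, `column_b` and `extract_b.py` are illustrative only. Because the file is iterated one line at a time, a 10MB input should not be a memory problem.

```python
import sys


def is_header(fields):
    """Assumption: a line is a header if its first field is not a number."""
    try:
        float(fields[0])
    except (IndexError, ValueError):
        return True        # blank line, column titles or the dashed rule
    return False


def column_b(lines):
    """Yield column B of every data line, formatted with two decimals."""
    for line in lines:
        fields = line.split()              # any run of spaces/tabs separates fields
        if is_header(fields):
            continue
        yield "%.2f" % float(fields[1])    # "8.000e-02" -> "0.08"


def main(path):
    with open(path) as f:                  # iterating the file reads line by line
        for value in column_b(f):
            print(value)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run as `python extract_b.py data.txt > column_b.txt`, it would print 0.08, 0.00 and 0.04 for the three sample rows.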

-- 

DaveA



#17608

From: Jérôme <jerome@jolimont.fr>
Date: 2011-12-20 21:03 +0100
Message-ID: <mailman.3879.1324411263.27778.python-list@python.org>
In reply to: #17602
Tue, 20 Dec 2011 11:17:15 -0800 (PST)
Yigit Turgut wrote:

> Hi all,
> 
> I have a text file containing such data ;
> 
>         A                B                C
> -------------------------------------------------------
> -2.0100e-01    8.000e-02    8.000e-05
> -2.0000e-01    0.000e+00   4.800e-04
> -1.9900e-01    4.000e-02    1.600e-04
> 
> But I only need Section B, and I need to change the notation to ;
> 
> 8.000e-02 = 0.08
> 0.000e+00 = 0.00
> 4.000e-02 = 0.04
> 
> Text file is approximately 10MB in size. I looked around to see if
> there is a quick and dirty workaround but there are lots of modules,
> lots of options.. I am confused.
> 
> Which module is most suitable for this task ?

You could try to do it yourself.

You'd need to know what separates the data. Tab characters? Spaces?
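Whether the gaps are tabs or spaces may matter less if each line is split with `str.split()` and no argument, which treats any run of whitespace as a single separator. That also copes with the uneven gap in the second sample row (three spaces before column C instead of four). A short illustration:

```python
# str.split() without an argument splits on any run of whitespace
# (spaces, tabs or a mix) and drops leading/trailing whitespace.
with_spaces = "-2.0000e-01    0.000e+00   4.800e-04\n"
with_tabs = "-2.0000e-01\t0.000e+00\t4.800e-04\n"

print(with_spaces.split())     # ['-2.0000e-01', '0.000e+00', '4.800e-04']
print(with_tabs.split())       # ['-2.0000e-01', '0.000e+00', '4.800e-04']
print(with_spaces.split()[1])  # 0.000e+00  (column B)
```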

Example:

Input file
----------

        A                B                C
-------------------------------------------------------
-2.0100e-01    8.000e-02    8.000e-05
-2.0000e-01    0.000e+00    4.800e-04
-1.9900e-01    4.000e-02    1.600e-04


Python code
-----------

# Open file
with open('test1.plt', 'r') as f:

    b_values = []

    # skip the two header lines
    f.readline()
    f.readline()

    for line in f:
        if not line.strip():
            continue                       # ignore blank lines
        #start = line.find("\t", 0) + 1    # seek Tab
        start = line.find("   ", 0) + 4    # skip the 4 spaces after column A
        #end = line.find("\t", start)
        end = line.find("   ", start)      # gap before column C
        b_values.append(float(line[start:end].strip()))

    print(b_values)

It gets trickier if the number of spaces is not constant. I would then try
regular expressions. Perhaps a regexp would be more efficient in any case.
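A sketch of the regular-expression variant, assuming every value is written like the samples (optional sign, digits, a decimal point, digits, then an exponent); the pattern and the `column_b_regex` name are illustrative, not from the thread. Because it matches the numbers themselves, the width of the gaps no longer matters. Whether it is actually faster than splitting on whitespace would have to be measured.

```python
import re

# Matches values such as -2.0100e-01, 0.000e+00 or 8.000e-02.
SCI_NUMBER = re.compile(r"[-+]?\d+\.\d+[eE][-+]?\d+")


def column_b_regex(line):
    """Return column B of a data line as a float, or None for any other line."""
    numbers = SCI_NUMBER.findall(line)
    if len(numbers) < 2:
        return None            # header, dashed rule or blank line
    return float(numbers[1])


b_values = [column_b_regex(line) for line in [
    "        A                B                C",
    "-------------------------------------------------------",
    "-2.0100e-01    8.000e-02    8.000e-05",
    "-2.0000e-01    0.000e+00   4.800e-04",
]]
print(b_values)   # [None, None, 0.08, 0.0]
```

In a real script the list would come from iterating the open file, keeping only the values that are not None.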

-- 
Jérôme



#17623

From: Nick Dokos <nicholas.dokos@hp.com>
Date: 2011-12-20 16:04 -0500
Message-ID: <mailman.3885.1324418997.27778.python-list@python.org>
In reply to: #17602
Jérôme <jerome@jolimont.fr> wrote:

> Tue, 20 Dec 2011 11:17:15 -0800 (PST)
> Yigit Turgut wrote:
> 
> > Hi all,
> > 
> > I have a text file containing such data ;
> > 
> >         A                B                C
> > -------------------------------------------------------
> > -2.0100e-01    8.000e-02    8.000e-05
> > -2.0000e-01    0.000e+00   4.800e-04
> > -1.9900e-01    4.000e-02    1.600e-04
> > 
> > But I only need Section B, and I need to change the notation to ;
> > 
> > 8.000e-02 = 0.08
> > 0.000e+00 = 0.00
> > 4.000e-02 = 0.04
> > 
> > Text file is approximately 10MB in size. I looked around to see if
> > there is a quick and dirty workaround but there are lots of modules,
> > lots of options.. I am confused.
> > 
> > Which module is most suitable for this task ?
> 
> You could try to do it yourself.
> 

Does it have to be python? If not, I'd go with something similar to

   sed 1,2d foo.data | awk '{printf("%.2f\n", $2);}'
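For comparison, the same pipeline can be mirrored as a Python filter on standard input: `itertools.islice` drops the first two lines as `sed 1,2d` does, and `str.split()` splits fields on whitespace much as awk does by default. This is a sketch; `column_b_filter`, `run_filter` and `filter_b.py` are hypothetical names, and unlike the awk command (which would print 0.00 for a blank line) it skips lines with fewer than two fields.

```python
import sys
from itertools import islice


def column_b_filter(lines):
    """Yield field 2 of every line after the first two, formatted as %.2f."""
    for line in islice(lines, 2, None):    # drop lines 1-2, like "sed 1,2d"
        fields = line.split()              # whitespace splitting, like awk
        if len(fields) < 2:
            continue                       # skip blank or short lines
        yield "%.2f" % float(fields[1])    # like printf("%.2f\n", $2)


def run_filter():
    # Hypothetical entry point: python filter_b.py < foo.data > column_b.txt
    for value in column_b_filter(sys.stdin):
        sys.stdout.write(value + "\n")
```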

Nick



#17626

From: Alexander Kapps <alex.kapps@web.de>
Date: 2011-12-21 01:01 +0100
Message-ID: <mailman.3887.1324425704.27778.python-list@python.org>
In reply to: #17602
On 20.12.2011 22:04, Nick Dokos wrote:

>>> I have a text file containing such data ;
>>>
>>>          A                B                C
>>> -------------------------------------------------------
>>> -2.0100e-01    8.000e-02    8.000e-05
>>> -2.0000e-01    0.000e+00   4.800e-04
>>> -1.9900e-01    4.000e-02    1.600e-04
>>>
>>> But I only need Section B, and I need to change the notation to ;
>>>
>>> 8.000e-02 = 0.08
>>> 0.000e+00 = 0.00
>>> 4.000e-02 = 0.04

> Does it have to be python? If not, I'd go with something similar to
>
>     sed 1,2d foo.data | awk '{printf("%.2f\n", $2);}'
>

Why use both sed and awk? awk alone will do:

awk 'NR>2 {printf("%.2f\n", $2);}' data.txt

And in Python:

with open("data.txt") as f:
    f.readline()    # skip header
    f.readline()    # skip header
    for line in f:
        print("%.2f" % float(line.split()[1]))
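The format specifier decides what the output looks like. With `%s`-style conversion (including `%02s`, where `02` is only a minimum field width) Python simply uses `str()` of the float, which gives `0.0` rather than `0.00` and falls back to scientific notation for values below 1e-4. `%.2f` always produces fixed-point output with two decimals, matching the `%.2f` in the awk commands. A quick check, using 8.000e-05 from column C as an example of a small value:

```python
b = float("0.000e+00")
c = float("8.000e-05")

print("%02s" % b)   # 0.0    (just str(b); width 2 is already exceeded)
print("%.2f" % b)   # 0.00
print("%s" % c)     # 8e-05  (str() switches to scientific notation)
print("%.2f" % c)   # 0.00
```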



#17724

From: Yigit Turgut <y.turgut@gmail.com>
Date: 2011-12-22 03:11 -0800
Message-ID: <6ff3a578-5a77-4e31-8e54-e29c46299e3c@f1g2000yqi.googlegroups.com>
In reply to: #17626
On Dec 21, 2:01 am, Alexander Kapps <alex.ka...@web.de> wrote:
> On 20.12.2011 22:04, Nick Dokos wrote:
>
> >>> I have a text file containing such data ;
>
> >>>          A                B                C
> >>> -------------------------------------------------------
> >>> -2.0100e-01    8.000e-02    8.000e-05
> >>> -2.0000e-01    0.000e+00   4.800e-04
> >>> -1.9900e-01    4.000e-02    1.600e-04
>
> >>> But I only need Section B, and I need to change the notation to ;
>
> >>> 8.000e-02 = 0.08
> >>> 0.000e+00 = 0.00
> >>> 4.000e-02 = 0.04
> > Does it have to be python? If not, I'd go with something similar to
>
> >     sed 1,2d foo.data | awk '{printf("%.2f\n", $2);}'
>
> Why sed and awk:
>
> awk 'NR>2 {printf("%.2f\n", $2);}' data.txt
>
> And in Python:
>
> f = open("data.txt")
> f.readline()    # skip header
> f.readline()    # skip header
> for line in f:
>      print "%02s" % float(line.split()[1])

@Jerome: Your suggestion gave a floating-point error; it might need
some slight modification.

@Nick: Sorry mate, it needs to be in Python. But I noted the solution
in case I need it for something else.

@Alexander: Works as expected.

Thank you all for the replies.




