Groups > comp.lang.python > #17602 > unrolled thread
| Started by | Yigit Turgut <y.turgut@gmail.com> |
|---|---|
| First post | 2011-12-20 11:17 -0800 |
| Last post | 2011-12-22 03:11 -0800 |
| Articles | 6 — 5 participants |
Text Processing Yigit Turgut <y.turgut@gmail.com> - 2011-12-20 11:17 -0800
Re: Text Processing Dave Angel <d@davea.name> - 2011-12-20 14:57 -0500
Re: Text Processing Jérôme <jerome@jolimont.fr> - 2011-12-20 21:03 +0100
Re: Text Processing Nick Dokos <nicholas.dokos@hp.com> - 2011-12-20 16:04 -0500
Re: Text Processing Alexander Kapps <alex.kapps@web.de> - 2011-12-21 01:01 +0100
Re: Text Processing Yigit Turgut <y.turgut@gmail.com> - 2011-12-22 03:11 -0800
| From | Yigit Turgut <y.turgut@gmail.com> |
|---|---|
| Date | 2011-12-20 11:17 -0800 |
| Subject | Text Processing |
| Message-ID | <209c2abf-dd56-4a7f-839b-fad92920d457@m7g2000vbc.googlegroups.com> |
Hi all,
I have a text file containing data like this:
A B C
-------------------------------------------------------
-2.0100e-01 8.000e-02 8.000e-05
-2.0000e-01 0.000e+00 4.800e-04
-1.9900e-01 4.000e-02 1.600e-04
But I only need column B, and I need to change the notation as follows:
8.000e-02 = 0.08
0.000e+00 = 0.00
4.000e-02 = 0.04
The text file is approximately 10 MB in size. I looked around for a
quick and dirty workaround, but there are lots of modules and lots of
options, and I am confused.
Which module is most suitable for this task?
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2011-12-20 14:57 -0500 |
| Message-ID | <mailman.3878.1324411044.27778.python-list@python.org> |
| In reply to | #17602 |
You probably don't need anything but sys (to parse the command options)
and os (maybe).
open the file
for each line
    if it is one of the header lines, continue
    separate out the part you want
    print it, formatted as you like
Then just run the script with its stdout redirected, and you've got your
new file
The details depend on what your experience with Python is, and what
version of Python you're running.
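
The outline above could be sketched roughly as follows. This is only one possible reading of it: the function name `extract_column_b`, the assumption of exactly two header lines, and the choice of two decimal places are illustrative and not part of the original post.

```python
import sys


def extract_column_b(lines, header_lines=2):
    """Yield column B of each data line, formatted with two decimals."""
    for index, line in enumerate(lines):
        if index < header_lines:
            continue  # one of the header lines
        fields = line.split()  # separate the columns on any whitespace
        if len(fields) < 2:
            continue  # blank or incomplete line
        yield "%.2f" % float(fields[1])


def main(path):
    with open(path) as source:  # open the file
        for value in extract_column_b(source):
            print(value)  # stdout can be redirected to a new file


# Only run when called as a script with exactly one file name argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved under a hypothetical name such as extract_b.py, it would be run as `python extract_b.py data.txt > b.txt`.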
--
DaveA
| From | Jérôme <jerome@jolimont.fr> |
|---|---|
| Date | 2011-12-20 21:03 +0100 |
| Message-ID | <mailman.3879.1324411263.27778.python-list@python.org> |
| In reply to | #17602 |
You could try to do it yourself.
You'd need to know what separates the data: tab characters or spaces?
Example:
Input file
----------
A B C
-------------------------------------------------------
-2.0100e-01 8.000e-02 8.000e-05
-2.0000e-01 0.000e+00 4.800e-04
-1.9900e-01 4.000e-02 1.600e-04
Python code
-----------
# Open file
with open('test1.plt', 'r') as f:
    b_values = []
    sep = " " * 4  # columns are assumed to be separated by exactly 4 spaces
    # skip as many lines as needed (here: the two header lines)
    line = f.readline()
    line = f.readline()
    line = f.readline()
    while line:
        #start = line.find(u"\u0009", 0) + 1  # seek Tab
        start = line.find(sep, 0) + len(sep)  # seek 4 spaces
        #end = line.find(u"\u0009", start)
        end = line.find(sep, start)
        b_values.append(float(line[start:end].strip()))
        line = f.readline()
print(b_values)
It gets trickier if the number of spaces is not constant. In that case I
would try regular expressions. Perhaps a regexp would be more efficient in any case.
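
A minimal sketch of the regular-expression variant, assuming each data line starts with two numbers separated by any run of spaces or tabs. The names `NUMBER`, `ROW` and `column_b` are made up for this illustration.

```python
import re

# Hypothetical pattern for a decimal number with an optional exponent.
NUMBER = r"[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?"
# First number, then a run of whitespace, then capture the second number.
ROW = re.compile(r"^\s*" + NUMBER + r"\s+(" + NUMBER + r")")


def column_b(lines):
    """Return column B as floats; lines that do not match are skipped."""
    values = []
    for line in lines:
        match = ROW.match(line)
        if match:  # header and dashed lines do not match the pattern
            values.append(float(match.group(1)))
    return values
```

Because the header lines never match the pattern, no explicit line skipping is needed here.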
--
Jérôme
| From | Nick Dokos <nicholas.dokos@hp.com> |
|---|---|
| Date | 2011-12-20 16:04 -0500 |
| Message-ID | <mailman.3885.1324418997.27778.python-list@python.org> |
| In reply to | #17602 |
Jérôme <jerome@jolimont.fr> wrote:
> You could try to do it yourself.
>
Does it have to be Python? If not, I'd go with something similar to:
sed 1,2d foo.data | awk '{printf("%.2f\n", $2);}'
Nick
| From | Alexander Kapps <alex.kapps@web.de> |
|---|---|
| Date | 2011-12-21 01:01 +0100 |
| Message-ID | <mailman.3887.1324425704.27778.python-list@python.org> |
| In reply to | #17602 |
On 20.12.2011 22:04, Nick Dokos wrote:
> Does it have to be python? If not, I'd go with something similar to
>
> sed 1,2d foo.data | awk '{printf("%.2f\n", $2);}'
>
Why both sed and awk? awk alone can do it:
awk 'NR>2 {printf("%.2f\n", $2);}' data.txt
And in Python:
f = open("data.txt")
f.readline() # skip header
f.readline() # skip header
for line in f:
    # "%.2f" gives fixed-point output with two decimals, e.g. 0.00
    print("%.2f" % float(line.split()[1]))
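
On the notation change itself: converting the text with float() and formatting it as fixed-point is enough. The helper below is a small sketch; `to_fixed` is a hypothetical name, and two decimals is just the precision asked for in the original question.

```python
def to_fixed(text, decimals=2):
    """Convert a number written in scientific notation to fixed-point text."""
    return "{0:.{1}f}".format(float(text), decimals)


for raw in ("8.000e-02", "0.000e+00", "4.000e-02"):
    print("%s = %s" % (raw, to_fixed(raw)))
```

Note that the value is rounded to the requested precision, so a small number such as 8.000e-05 would come out as 0.00 with two decimals.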
| From | Yigit Turgut <y.turgut@gmail.com> |
|---|---|
| Date | 2011-12-22 03:11 -0800 |
| Message-ID | <6ff3a578-5a77-4e31-8e54-e29c46299e3c@f1g2000yqi.googlegroups.com> |
| In reply to | #17626 |
@Jérôme: Your suggestion produced a floating-point error; it might need
a slight modification.
@Nick: Sorry mate, it needs to be in Python, but I have noted your
solution in case I need it another time.
@Alexander: Works as expected.
Thank you all for the replies.
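
The thread does not say what the floating-point error was. If it was a ValueError raised by float() on a badly sliced field (an assumption), a sketch like the following could help locate the offending lines. `parse_column` and its parameters are hypothetical names for this illustration.

```python
def parse_column(lines, column=1, header_lines=2):
    """Return (values, problems) for one whitespace-separated column.

    problems holds (line_number, text) for lines that could not be parsed.
    """
    values, problems = [], []
    for number, line in enumerate(lines, 1):
        if number <= header_lines:
            continue  # skip the header lines
        fields = line.split()
        try:
            values.append(float(fields[column]))
        except (IndexError, ValueError):
            # Missing column or text that is not a number
            problems.append((number, line.rstrip("\n")))
    return values, problems
```

Printing the `problems` list shows which input lines differ from the expected layout.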