
Concatenating columns via Python

Started by: hannahgracemcdonald16@gmail.com
First post: 2015-07-28 11:50 -0700
Last post: 2015-07-28 20:17 +0100
Articles: 2 (2 participants)



Contents

  Concatenating columns via Python hannahgracemcdonald16@gmail.com - 2015-07-28 11:50 -0700
    Re: Concatenating columns via Python MRAB <python@mrabarnett.plus.com> - 2015-07-28 20:17 +0100

#94704 — Concatenating columns via Python

From: hannahgracemcdonald16@gmail.com
Date: 2015-07-28 11:50 -0700
Subject: Concatenating columns via Python
Message-ID: <dc18354e-36c5-4f71-88ae-5cf7b70b689c@googlegroups.com>

I extracted a table from a PDF, so the data is quite messy, and the data that should be in 1 row is in 3 columns, like so:
   year    color         location
1  1997    blue,         MD
2          green,
3          and yellow

So far my code is below, but I know I am missing data; I am just not sure what to put in it:

# Simply read and split an example Table 4 
import sys

# Assigning count number and getting rid of right space
def main():
count = 0
pieces = []
for line in open(infile, 'U'):
if count < 130: 
data = line.replace('"', '').rstrip().split("\t")
data = clean_data(data) 
if data[1] == "year" and data[1] != "":
write_pieces(pieces)
pieces = data
str.join(pieces)
else: 
for i in range(len(data)):
pieces[i] = pieces[i] + data[i]
str.join(pieces)

# Executing command to remove right space
def clean_data(s): 
return [x.rstrip() for x in s]

def write_pieces(pieces):
print 

if __name__ == '__main__':
infile = "file.txt"
main()



#94705

From: MRAB <python@mrabarnett.plus.com>
Date: 2015-07-28 20:17 +0100
Message-ID: <mailman.1051.1438111039.3674.python-list@python.org>
In reply to: #94704

On 2015-07-28 19:50, hannahgracemcdonald16@gmail.com wrote:
> I extracted a table from a PDF, so the data is quite messy, and the data that should be in 1 row is in 3 columns, like so:
>    year    color         location
> 1  1997    blue,         MD
> 2          green,
> 3          and yellow
>
> So far my code is below, but I know I am missing data; I am just not sure what to put in it:
>
> # Simply read and split an example Table 4
> import sys
>
The indentation is messed up, which makes it hard to follow.

> # Assigning count number and getting rid of right space
> def main():
> count = 0
> pieces = []
> for line in open(infile, 'U'):
> if count < 130:
> data = line.replace('"', '').rstrip().split("\t")
> data = clean_data(data)
> if data[1] == "year" and data[1] != "":

If the first test is true, then the second test is necessarily true as
well, so the second test is unnecessary.
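
To illustrate, here is a minimal sketch comparing the two forms of the test. The function names and the sample values are made up for illustration; they are not from the original program.

```python
def is_header_long(value):
    # The test as written in the quoted code.
    return value == "year" and value != ""


def is_header(value):
    # The shortened test: a string equal to "year" cannot be empty.
    return value == "year"


# Hypothetical sample values for data[1].
samples = ["year", "", "1997", "blue,"]

# Both tests agree on every sample, so the second clause adds nothing.
results = [is_header_long(v) == is_header(v) for v in samples]
print(all(results))
```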

> write_pieces(pieces)
> pieces = data
> str.join(pieces)

str.join _returns_ its result; it does not change the list, so the
returned string has to be assigned or used.
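
A short sketch of what that means in practice, assuming a single space is the wanted separator (the separator and the sample list are assumptions, not something stated in the original post):

```python
pieces = ["blue,", "green,", "and yellow"]

# join is called on the separator string and returns a new string;
# the list itself is left unchanged.
joined = " ".join(pieces)
print(joined)  # blue, green, and yellow

# Calling it as str.join(pieces) passes the list where the separator
# string is expected, which raises TypeError.
try:
    str.join(pieces)
except TypeError:
    print("str.join(pieces) raised TypeError")
```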

> else:
> for i in range(len(data)):
> pieces[i] = pieces[i] + data[i]
> str.join(pieces)
>
> # Executing command to remove right space
> def clean_data(s):
> return [x.rstrip() for x in s]
>
> def write_pieces(pieces):
> print
>
> if __name__ == '__main__':
> infile = "file.txt"
> main()
>
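
Putting the points above together, one possible shape for the merging step is sketched below. This is only a sketch under several assumptions that the original post does not confirm: the file is tab-separated, the columns are row number, year, color and location, and a non-empty year field marks the start of a new record. The name merge_rows and the sample lines are hypothetical.

```python
def merge_rows(lines):
    """Merge continuation rows into the record that precedes them."""
    records = []
    for line in lines:
        raw = line.replace('"', '').rstrip("\n").split("\t")
        fields = [f.strip() for f in raw]
        # Assumption: a non-empty year field (index 1) starts a new record.
        if len(fields) > 1 and fields[1]:
            records.append(fields)
        elif records:
            current = records[-1]
            # Skip index 0, assumed to be a row number we do not merge.
            for i, field in enumerate(fields[1:], start=1):
                if not field:
                    continue
                if i < len(current):
                    current[i] = (current[i] + " " + field).strip()
                else:
                    # Pad short records so the field lands in column i.
                    current.extend([""] * (i - len(current)))
                    current.append(field)
    return records


# Hypothetical sample mirroring the table in the question.
sample = [
    "\tyear\tcolor\tlocation\n",
    "1\t1997\tblue,\tMD\n",
    "2\t\tgreen,\n",
    "3\t\tand yellow\n",
]

for record in merge_rows(sample):
    print("\t".join(record))
```

With this sample, the header line is kept as its own record and the three data rows collapse into a single record whose color field is "blue, green, and yellow". If the real file uses a different delimiter or column order, the index checks would need adjusting.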


