Re:Remove unwanted characters from column

From Dave Angel <davea@davea.name>
Subject Re:Remove unwanted characters from column
Date 2014-01-27 00:24 -0500
Organization news.gmane.org
References <8d703876-ba90-492d-a558-a5a9bb8023c7@googlegroups.com>
Newsgroups comp.lang.python
Message-ID <mailman.6019.1390800112.18130.python-list@python.org>


matt.s.marotta@gmail.com wrote in message:
> School assignment is to create a tab separated output with the original given addresses in one column and then the addresses split into other columns (ex, columns for city, postal code, street suffix).
> 
> Here is my code:
> 
> inHandler = open(inFile, 'r')
> outHandler = open(outFile, 'w')
> outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir\tCity\tProvince\tPostalCode")
> for line in inHandler:
>     str = line.replace("FarmID\tAddress", " ")
>     outHandler.write(str[0:-1])
> 
>     str = str.replace(" ","\t", 1)
>     str = str.replace(" Rd, ","\tRd\t\t")
>     str = str.replace("Rd ","\tRd\t\t")
>     str = str.replace("Ave, ","\tAve\t\t")
>     str = str.replace("Ave ","\tAve\t\t")
>     str = str.replace("St ","\tSt\t\t")
>     str = str.replace("St, ","\tSt\t\t")    
>     str = str.replace("Dr, ","\tDr\t\t")
>     str = str.replace("Lane, ","\tLane\t\t")
>     str = str.replace("Pky, ","\tPky\t\t")
>     str = str.replace(" Sq, ","\tSq\t\t")
>     str = str.replace(" Pl, ","\tPl\t\t")
> 
>     str = str.replace("\tE, ","E\t")
>     str = str.replace("\tN, ","N\t")
>     str = str.replace("\tS, ","S\t")
>     str = str.replace("\tW, ","W\t")
>     str = str.replace(",\t","\t\t")
>     str = str.replace(", ON ","\tON\t")
> 
>     outHandler.write(str)
> 
> inHandler.close()
> outHandler.close()
> 
> 
> Here is some sample addresses, there are over 100:
> 
> FarmID	Address
> 1	1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
> 2	4260 Mountainview Rd, Lincoln, ON L0R 1B2
> 3	25 Hunter Rd, Grimsby, ON L3M 4A3
> 4	1091 Hutchinson Rd, Haldimand, ON N0A 1K0
> 5	5172 Green Lane Rd, Lincoln, ON L0R 1B3
> 6	500 Glenridge Ave, St. Catharines, ON L2S 3A1
> 7	471 Foss Rd, Pelham, ON L0S 1C0
> 8	758 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
> 9	3836 Main St, Lincoln, ON L0R 1S0
> 
> 
> 
> I have everything worked out, except that the final output places the farmID at the end of postal code as seen in the example below (notice the brackets showing where the farmID is placed):
> 
> FarmID	Address	StreetNum	StreetName	SufType	Dir	City	Province	PostalCode 	
> 1	1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0(1)	1067	Niagara Stone	Rd		Niagara-On-The-Lake	ON	L0S 1J0
> 
> Any ideas on how to fix this? Keep in mind as well that the farmID will have 2 characters at a certain point.
> 

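Your specific concern is triggered by having two writes in the loop. The first writes the original line minus its newline, and the second writes the transformed line, which starts with the farmID again, so the farmID lands right after the postal code.

Get rid of the first and you're marginally closer.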
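
To see the effect in isolation, here is a minimal sketch. It assumes an in-memory buffer standing in for outHandler and keeps only one of the replace calls; both simplifications are mine, not part of the original script.

```python
import io

# One line from the sample data, as it would come out of the input file.
line = "3\t25 Hunter Rd, Grimsby, ON L3M 4A3\n"

out = io.StringIO()                         # stand-in for outHandler (assumption)
out.write(line[0:-1])                       # first write: original line, newline dropped
fixed = line.replace(" Rd, ", "\tRd\t\t")   # just one of the many replace calls
out.write(fixed)                            # second write: begins with the farmID again

print(repr(out.getvalue()))
```

The repr shows the postal code "L3M 4A3" immediately followed by the "3" that starts the second write.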

But really, you've got much bigger troubles. All those unrestricted replace calls are not at all robust: each one fires wherever its pattern appears in the line, not just in the field you meant. But maybe you'll get away with it for a school assignment if the test data is very limited.
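
A small sketch of the kind of breakage I mean. The address below is hypothetical (it is not in the posted sample); it just puts "St " somewhere other than the suffix position.

```python
# Hypothetical address, invented for illustration: "St" is part of the
# street name here, and the real suffix is "Ave".
address = "25 St Paul Ave, St. Catharines, ON L2R 3M2"

# The same call the script uses for the "St" suffix.
mangled = address.replace("St ", "\tSt\t\t")

print(repr(mangled))
```

The replace has no idea which field it is in, so it splits the street name in two and leaves the actual suffix alone.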

Better would be to treat it like a parsing problem, figuring out what delimiter rule applies to each field, and building a list. Then use str.join to build the line for the outHandler.
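
Roughly what I have in mind, as a sketch only. The function name parse_line and the delimiter rules are my assumptions from the nine sample lines: a tab after the farmID, exactly three comma-separated address parts, the suffix as the last word of the street part, and an optional one-letter direction after it. Real data will need more care.

```python
DIRECTIONS = {"N", "S", "E", "W"}   # assumed set of direction codes

def parse_line(line):
    """Split one 'FarmID<TAB>address' line into the nine output fields."""
    farm_id, address = line.rstrip("\n").split("\t", 1)
    # Assumes exactly three parts: street, city, "province postal-code".
    street, city, region = [part.strip() for part in address.split(",")]
    words = street.split()
    # Optional trailing direction, e.g. "... Rd E" (assumed layout).
    direction = words.pop() if words[-1] in DIRECTIONS else ""
    street_num = words[0]
    suffix = words[-1]
    street_name = " ".join(words[1:-1])
    province, postal_code = region.split(" ", 1)
    return [farm_id, address, street_num, street_name, suffix,
            direction, city, province, postal_code]

sample = "6\t500 Glenridge Ave, St. Catharines, ON L2S 3A1\n"
fields = parse_line(sample)
print("\t".join(fields))
```

Each field is found by its own rule, so "St." in the city never gets confused with a street suffix, and an address that does not have three comma-separated parts raises a ValueError instead of silently producing a bad row.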
 

-- 
DaveA
