Date: Sun, 26 Jan 2014 23:28:37 +0000
From: MRAB <python@mrabarnett.plus.com>
To: python-list@python.org
Subject: Re: Unwanted Spaces and Iterative Loop
Newsgroups: comp.lang.python

On 2014-01-26 21:46, matt.s.marotta@gmail.com wrote:
> I have been working on a python script that separates mailing addresses into different components.
>
> Here is my code:
>
> inFile = "directory"
> outFile = "directory"
> inHandler = open(inFile, 'r')
> outHandler = open(outFile, 'w')

Shouldn't the header line written below end with a '\n'?

> outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir\tCity\tProvince\tPostalCode")
> for line in inHandler:

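The next two statements run for every single line of the file, so each
unprocessed line is written out (minus its last character) right before
the processed copy of the same line. With nothing written in between,
the FarmID at the start of the processed copy lands directly after the
postal code of the unprocessed one: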

>      str = line.replace("FarmID\tAddress", " ")
>      outHandler.write(str[0:-1])
>
>      str = str.replace(" ","\t", 1)
>      str = str.replace(" Rd,","\tRd\t\t")
>      str = str.replace(" Rd","\tRd\t")
>      str = str.replace("Ave,","\tAve\t\t")
>      str = str.replace("Ave ","\tAve\t\t")
>      str = str.replace("St ","\tSt\t\t")
>      str = str.replace("St,","\tSt\t\t")
>      str = str.replace("Dr,","\tDr\t\t")
>      str = str.replace("Lane,","\tLane\t\t")
>      str = str.replace("Pky,","\tPky\t\t")
>      str = str.replace(" Sq,","\tSq\t\t")
>      str = str.replace(" Pl,","\tPl\t\t")
>
>      str = str.replace("\tE,","E\t")
>      str = str.replace("\tN,","N\t")
>      str = str.replace("\tS,","S\t")
>      str = str.replace("\tW,","W\t")
>      str = str.replace(",","\t")
>      str = str.replace(" ON","ON\t")
>
>
>      outHandler.write(str)
> inHandler.close()
>
> The text file that this manipulates has 91 addresses, so I'll just paste 5 of them in here to get the idea:
>
> FarmID	Address
> 1	1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
> 2	4260 Mountainview Rd, Lincoln, ON L0R 1B2
> 3	25 Hunter Rd, Grimsby, ON L3M 4A3
> 4	1091 Hutchinson Rd, Haldimand, ON N0A 1K0
>
> My issue is that in the output file, there is a space before each city and each postal code that I do not want there.
>
You could try splitting on '\t', stripping the leading and trailing
whitespace on each part, and then joining them together again with
'\t'. (Make sure that you also write the '\n' at the end of line.)
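
A minimal sketch of that split/strip/join step, assuming Python 3. The
name clean_fields and the sample line are only illustrative; the sample
is the sort of line your chain of replace() calls produces, with a
leading space before the city and the postal code:

```python
def clean_fields(line):
    """Strip stray whitespace from every tab-separated field of a line."""
    # Drop the trailing newline first so it is not treated as part of a field.
    parts = line.rstrip("\n").split("\t")
    # Strip each field, rejoin with tabs, and put the newline back.
    return "\t".join(part.strip() for part in parts) + "\n"


# Hypothetical sample line with unwanted spaces before two of the fields.
messy = "1\t1067\tNiagara Stone\tRd\t\t Niagara-On-The-Lake\tON\t L0S 1J0\n"
cleaned = clean_fields(messy)
```

In your loop that would amount to outHandler.write(clean_fields(str))
in place of the final outHandler.write(str). Empty fields (such as a
missing direction) stay empty, so the columns still line up.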

> Furthermore, the FarmID is being added on to the end of the postal code under the original address column for each address.  This also is not supposed to be happening, and I am having trouble designing an iterative loop to remove/prevent that from happening.
>
> Any help is greatly appreciated!
>
As Mark said, you could also use the csv module.
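
A rough sketch of what that might look like, assuming Python 3 and
assuming every address has exactly two commas (street, city, then
province plus postal code), as in your four samples. The in-memory
files and the reduced set of output columns are stand-ins for
illustration only; splitting the street into number, name, suffix and
direction is left out:

```python
import csv
import io

# Hypothetical in-memory stand-ins for the real input and output files.
sample = (
    "FarmID\tAddress\n"
    "1\t1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0\n"
    "3\t25 Hunter Rd, Grimsby, ON L3M 4A3\n"
)
in_file = io.StringIO(sample, newline="")
out_file = io.StringIO(newline="")

reader = csv.reader(in_file, delimiter="\t")
writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")

next(reader)  # skip the input header row
writer.writerow(["FarmID", "Street", "City", "Province", "PostalCode"])

for farm_id, address in reader:
    # Assumes exactly two commas per address; strip() removes the
    # space that follows each comma.
    street, city, region = (part.strip() for part in address.split(","))
    # The last part looks like "ON L0S 1J0": province, then postal code.
    province, postal_code = region.split(" ", 1)
    writer.writerow([farm_id, street, city, province, postal_code])

result = out_file.getvalue()
```

The writer adds the tab separators and the line ending for each row,
so there is no need to write '\t' or '\n' by hand.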