Remove unwanted characters from column

Started by	matt.s.marotta@gmail.com
First post	2014-01-26 19:49 -0800
Last post	2014-01-27 15:22 +0000
Articles	10 — 6 participants

  Remove unwanted characters from column matt.s.marotta@gmail.com - 2014-01-26 19:49 -0800
    Re:Remove unwanted characters from column Dave Angel <davea@davea.name> - 2014-01-27 00:24 -0500
      Re: Remove unwanted characters from column matt.s.marotta@gmail.com - 2014-01-27 05:32 -0800
        Re: Remove unwanted characters from column Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-27 13:54 +0000
          Re: Remove unwanted characters from column matt.s.marotta@gmail.com - 2014-01-27 06:23 -0800
            Re: Remove unwanted characters from column Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-27 14:34 +0000
            Re: Remove unwanted characters from column Chris Angelico <rosuav@gmail.com> - 2014-01-28 01:57 +1100
              Re: Remove unwanted characters from column matt.s.marotta@gmail.com - 2014-01-27 07:03 -0800
                Re: Remove unwanted characters from column Chris Angelico <rosuav@gmail.com> - 2014-01-28 02:19 +1100
    Re: Remove unwanted characters from column Denis McMahon <denismfmcmahon@gmail.com> - 2014-01-27 15:22 +0000

#64824 — Remove unwanted characters from column

From	matt.s.marotta@gmail.com
Date	2014-01-26 19:49 -0800
Subject	Remove unwanted characters from column
Message-ID	<8d703876-ba90-492d-a558-a5a9bb8023c7@googlegroups.com>

School assignment is to create a tab separated output with the original given addresses in one column and then the addresses split into other columns (ex, columns for city, postal code, street suffix).

Here is my code:

inHandler = open(inFile, 'r')
outHandler = open(outFile, 'w')
outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir\tCity\tProvince\tPostalCode")
for line in inHandler:
    str = line.replace("FarmID\tAddress", " ")
    outHandler.write(str[0:-1])

    str = str.replace(" ","\t", 1)
    str = str.replace(" Rd, ","\tRd\t\t")
    str = str.replace("Rd ","\tRd\t\t")
    str = str.replace("Ave, ","\tAve\t\t")
    str = str.replace("Ave ","\tAve\t\t")
    str = str.replace("St ","\tSt\t\t")
    str = str.replace("St, ","\tSt\t\t")    
    str = str.replace("Dr, ","\tDr\t\t")
    str = str.replace("Lane, ","\tLane\t\t")
    str = str.replace("Pky, ","\tPky\t\t")
    str = str.replace(" Sq, ","\tSq\t\t")
    str = str.replace(" Pl, ","\tPl\t\t")

    str = str.replace("\tE, ","E\t")
    str = str.replace("\tN, ","N\t")
    str = str.replace("\tS, ","S\t")
    str = str.replace("\tW, ","W\t")
    str = str.replace(",\t","\t\t")
    str = str.replace(", ON ","\tON\t")

    outHandler.write(str)

inHandler.close()
outHandler.close()


Here are some sample addresses; there are over 100:

FarmID	Address
1	1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
2	4260 Mountainview Rd, Lincoln, ON L0R 1B2
3	25 Hunter Rd, Grimsby, ON L3M 4A3
4	1091 Hutchinson Rd, Haldimand, ON N0A 1K0
5	5172 Green Lane Rd, Lincoln, ON L0R 1B3
6	500 Glenridge Ave, St. Catharines, ON L2S 3A1
7	471 Foss Rd, Pelham, ON L0S 1C0
8	758 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
9	3836 Main St, Lincoln, ON L0R 1S0



I have everything worked out, except that the final output places the farmID at the end of postal code as seen in the example below (notice the brackets showing where the farmID is placed):

FarmID	Address	StreetNum	StreetName	SufType	Dir	City	Province	PostalCode 	
1	1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0(1)	1067	Niagara Stone	Rd		Niagara-On-The-Lake	ON	L0S 1J0

Any ideas on how to fix this? Keep in mind as well that the farmID will have 2 characters at a certain point.

#64826

From	Dave Angel <davea@davea.name>
Date	2014-01-27 00:24 -0500
Message-ID	<mailman.6019.1390800112.18130.python-list@python.org>
In reply to	#64824

 matt.s.marotta@gmail.com Wrote in message:
> I have everything worked out, except that the final output places the farmID at the end of postal code as seen in the example below (notice the brackets showing where the farmID is placed):
> 
> FarmID	Address	StreetNum	StreetName	SufType	Dir	City	Province	PostalCode 	
> 1	1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0(1)	1067	Niagara Stone	Rd		Niagara-On-The-Lake	ON	L0S 1J0
> 
> Any ideas on how to fix this? Keep in mind as well that the farmID will have 2 characters at a certain point.
> 

Your specific concern is triggered by having two writes in the loop.

Get rid of the first and you're marginally closer. 
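
To see why the ID lands after the postal code, it can help to look at what the two writes emit for a single line. The sketch below uses a plain string instead of a file; the variable names and the suggested repair are illustrative assumptions, not the poster's final code.

```python
# One sample data line, as read from the input file.
line = "3\t25 Hunter Rd, Grimsby, ON L3M 4A3\n"

# What the first write emits: the original line minus its newline.
first_write = line[0:-1]
# What the second write starts with: a transformed copy, ID included.
second_write = line.replace(" ", "\t", 1)

# Written back to back, the second copy's ID directly follows the postal code.
combined = first_write + second_write
print(repr(combined))

# One possible repair (an assumption, not the poster's final code):
# emit a tab, then the transformed copy with its leading ID dropped.
farm_id, rest = second_write.split("\t", 1)
repaired = first_write + "\t" + rest
print(repr(repaired))
```

Splitting on the first tab removes the ID regardless of how many digits it has, so no length check is needed with this approach.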

But really, you've got much bigger troubles. All those
unrestricted replace calls are not at all robust. But maybe
you'll get away with it for a school assignment if the test data
is very limited.

Better would be to treat it like a parsing problem, figuring out what
delimiter rule applies to each field, and building a list. Then
use str.join to build the line for the outHandler.
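
A minimal sketch of the parsing approach described above, assuming every address has the shape "number street suffix [direction], city, province postal-code" seen in the sample data, with exactly two commas. The function name parse_line and the SUFFIXES and DIRECTIONS sets are hypothetical and not part of the assignment.

```python
# Hypothetical lookup sets, taken from the suffixes used in the original post.
SUFFIXES = {"Rd", "Ave", "St", "Dr", "Lane", "Pky", "Sq", "Pl"}
DIRECTIONS = {"N", "S", "E", "W"}

def parse_line(line):
    """Split one input line into the nine output columns."""
    # FarmID and the address are separated by the first tab.
    farm_id, address = line.rstrip("\n").split("\t", 1)
    # Commas separate the street, the city, and "province postal code".
    street, city, region = [part.strip() for part in address.split(",")]
    province, postal_code = region.split(" ", 1)

    words = street.split()
    street_num = words.pop(0)
    # An optional trailing direction (N, S, E, W) follows the suffix.
    direction = words.pop() if words and words[-1] in DIRECTIONS else ""
    # The suffix is the last remaining word, if it is a known type.
    suffix = words.pop() if words and words[-1] in SUFFIXES else ""
    street_name = " ".join(words)

    return [farm_id, address, street_num, street_name, suffix,
            direction, city, province, postal_code]

sample = "5\t5172 Green Lane Rd, Lincoln, ON L0R 1B3\n"
print("\t".join(parse_line(sample)))
```

Because only the last word of the street is treated as the suffix, a name such as "Green Lane Rd" keeps "Green Lane" as the street name, and a city such as "St. Catharines" is left alone.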
 

-- 
DaveA

#64857

From	matt.s.marotta@gmail.com
Date	2014-01-27 05:32 -0800
Message-ID	<04d509f9-54fa-4ba5-bb6e-24ce60e12523@googlegroups.com>
In reply to	#64826

On Monday, 27 January 2014 00:24:11 UTC-5, Dave Angel  wrote:
> Your specific concern is triggered by having two writes in the loop.
>
> Get rid of the first and you're marginally closer.
>
> But really, you've got much bigger troubles. All those
> unrestricted replace calls are not at all robust. But maybe
> you'll get away with it for a school assignment if the test data
> is very limited.
>
> Better would be to treat it like a parsing problem, figuring what
> delimiter rule applies to each field, and building a list. Then
> use str.join to build the line for the outHandler.
>
> --
> DaveA

The code that I used is the proper way that we were supposed to complete the assignment.  All I need now is an 'if...then' statement to get rid of the unwanted FarmID at the end of the addresses.  I just don't know what will come after the 'if' part.

#64861

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2014-01-27 13:54 +0000
Message-ID	<52e6650c$0$29999$c3e8da3$5496439d@news.astraweb.com>
In reply to	#64857

On Mon, 27 Jan 2014 05:32:08 -0800, matt.s.marotta wrote:

> The code that I used is the proper way that we were supposed to complete
> the assignment.  All I need now is an 'if...then' statement to get rid
> of the unwanted FarmID at the end of the addresses.  I just don't know
> what will come after the 'if' part.

Show us what you do know. If you don't know the "if", what about the 
"then"?

if .... :
    do what?

What do you intend to do inside the if? Under what circumstances would 
you do it?

If you can answer those questions in English, then we can help you write 
code to do it.

-- 
Steven

#64864

From	matt.s.marotta@gmail.com
Date	2014-01-27 06:23 -0800
Message-ID	<9b121fe6-9ed8-4adf-af24-80166f348c7d@googlegroups.com>
In reply to	#64861

On Monday, 27 January 2014 08:54:20 UTC-5, Steven D'Aprano  wrote:
> On Mon, 27 Jan 2014 05:32:08 -0800, matt.s.marotta wrote:
>
> > The code that I used is the proper way that we were supposed to complete
> > the assignment.  All I need now is an 'if...then' statement to get rid
> > of the unwanted FarmID at the end of the addresses.  I just don't know
> > what will come after the 'if' part.
>
> Show us what you do know. If you don't know the "if", what about the
> "then"?
>
> if .... :
>     do what?
>
> What do you intend to do inside the if? Under what circumstances would
> you do it?
>
> If you can answer those questions in English, then we can help you write
> code to do it.
>
> --
> Steven

If the farmID < 10:
remove one character from the address column
Elif farmID > 10:
remove two characters from the address column

#64865

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2014-01-27 14:34 +0000
Message-ID	<mailman.6042.1390833253.18130.python-list@python.org>
In reply to	#64864

On 27/01/2014 14:23, matt.s.marotta@gmail.com wrote:
> On Monday, 27 January 2014 08:54:20 UTC-5, Steven D'Aprano  wrote:
>> On Mon, 27 Jan 2014 05:32:08 -0800, matt.s.marotta wrote:
>>
>>
>>
>>> The code that I used is the proper way that we were supposed to complete
>>
>>> the assignment.  All I need now is an 'if...then' statement to get rid
>>
>>> of the unwanted FarmID at the end of the addresses.  I just don't know
>>
>>> what will come after the 'if' part.
>>
>>
>>
>> Show us what you do know. If you don't know the "if", what about the
>>
>> "then"?
>>
>>
>>
>>
>>
>> if .... :
>>
>>      do what?
>>
>>
>>
>>
>>
>> What do you intend to do inside the if? Under what circumstances would
>>
>> you do it?
>>
>>
>>
>> If you can answer those questions in English, then we can help you write
>>
>> code to do it.
>>
>>
>>
>>
>>
>> --
>>
>> Steven
>
> If the farmID < 10:
> remove one character from the address column
> Elif farmID > 10:
> remove two characters from the address column
>

Would you please read and action this 
https://wiki.python.org/moin/GoogleGroupsPython to prevent us seeing the 
double line spacing above, thanks.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence

#64867

From	Chris Angelico <rosuav@gmail.com>
Date	2014-01-28 01:57 +1100
Message-ID	<mailman.6044.1390834662.18130.python-list@python.org>
In reply to	#64864

On Tue, Jan 28, 2014 at 1:23 AM,  <matt.s.marotta@gmail.com> wrote:
> If the farmID < 10:
> remove one character from the address column
> Elif farmID > 10:
> remove two characters from the address column

What if farmID == 10?

ChrisA

#64868

From	matt.s.marotta@gmail.com
Date	2014-01-27 07:03 -0800
Message-ID	<948612d7-8e69-46d0-9fab-707b762a48a7@googlegroups.com>
In reply to	#64867

On Monday, 27 January 2014 09:57:32 UTC-5, Chris Angelico  wrote:
> On Tue, Jan 28, 2014 at 1:23 AM,  <matt.s.marotta@gmail.com> wrote:
> 
> > If the farmID < 10:
> > remove one character from the address column
> > Elif farmID > 10:
> > remove two characters from the address column
> 
> What if farmID == 10?
>
> ChrisA

Ok, sorry this is how it should be.

If the FarmID < 10:
remove one character from the address column

If the FarmID > 9:
remove two characters from the address column

My issue is I can't figure out what statement to use to define FarmID.

#64869

From	Chris Angelico <rosuav@gmail.com>
Date	2014-01-28 02:19 +1100
Message-ID	<mailman.6045.1390836009.18130.python-list@python.org>
In reply to	#64868

On Tue, Jan 28, 2014 at 2:03 AM,  <matt.s.marotta@gmail.com> wrote:
> On Monday, 27 January 2014 09:57:32 UTC-5, Chris Angelico  wrote:
>> On Tue, Jan 28, 2014 at 1:23 AM,  <matt.s.marotta@gmail.com> wrote:
>>
>> > If the farmID < 10:
>> > remove one character from the address column
>> > Elif farmID > 10:
>> > remove two characters from the address column
>>
>> What if farmID == 10?
>>
>> ChrisA
>
> Ok, sorry this is how it should be.
>
> If the FarmID < 10:
> remove one character from the address column
>
> If the FarmID > 9:
> remove two characters from the address column
>
> My issue is I can't figure out what statement to use to define FarmID.

More commonly, that would be written as

if farmID < 10:
    # remove one character
else:
    # remove two characters

Though this still suffers from the limitation of not handling 100 or
1000, so you might want to look at len(str(farmID)) instead.
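
A minimal sketch of the len(str(farmID)) idea, assuming the ID can be read from the start of each data line and that the unwanted digits sit at the very end of the text being cleaned. The names farm_id_of and strip_trailing_id, and the ID 12 on the sample line, are illustrative only.

```python
def farm_id_of(line):
    # On a data line, the ID is the text before the first tab.
    return int(line.split("\t", 1)[0])

def strip_trailing_id(text, farm_id):
    digits = str(farm_id)
    # len(digits) is 1 for 7, 2 for 42, 3 for 100, so no if/else chain is needed.
    if text.endswith(digits):
        return text[:-len(digits)]
    return text

# Hypothetical data line with a two-digit ID.
line = "12\t4260 Mountainview Rd, Lincoln, ON L0R 1B2\n"
farm_id = farm_id_of(line)
print(strip_trailing_id("ON L0R 1B212", farm_id))
```

Note that the endswith check cannot tell an appended ID apart from a postal code that happens to end in the same digits, so this treats the symptom rather than the cause.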

ChrisA

#64870

From	Denis McMahon <denismfmcmahon@gmail.com>
Date	2014-01-27 15:22 +0000
Message-ID	<lc5tkh$f90$2@dont-email.me>
In reply to	#64824

On Sun, 26 Jan 2014 19:49:01 -0800, matt.s.marotta wrote:

> School assignment is to create a tab separated output with the original
> given addresses in one column and then the addresses split into other
> columns (ex, columns for city, postal code, street suffix).

If you're trying to create fixed width output from variable width fields, 
format specifiers may be better to use than tabs.

The problem with tabs is that columns end up misaligned when the data 
fields in a column contain a mixture of items of less length than the tab 
spacing and items of greater length than the tab spacing, unless you can 
work out the tab spacing and adjust accordingly.
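
As a rough illustration of the format-specifier idea, the sketch below pads each field to a fixed width with str.format. The column widths and the two rows are assumptions chosen for this sample, not values from the assignment.

```python
rows = [
    ("3", "25", "Hunter", "Rd", "Grimsby"),
    ("1", "1067", "Niagara Stone", "Rd", "Niagara-On-The-Lake"),
]

# Widths picked by hand for this sample; real data may need wider fields.
template = "{:<6}{:<10}{:<16}{:<8}{:<20}"

# "<" left-aligns each value and pads it with spaces to the given width.
formatted = [template.format(*row) for row in rows]
for text in formatted:
    print(text.rstrip())
```

Every column then starts at the same character offset on each line, whatever the tab settings of the viewer.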

For example, my code, which uses the re module to separate the various
record components and a format specifier to print the text and html
versions (and csv.writer for the csv), creates the outputs seen here:

http://www.sined.co.uk/tmp/farms.txt
http://www.sined.co.uk/tmp/farms.csv
http://www.sined.co.uk/tmp/farms.htm
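
The code behind those files is not shown in the thread. Purely as a sketch of how the re and csv modules might be combined for this task, the example below uses an assumed pattern (ADDRESS_RE) and a hypothetical helper (split_address), with an in-memory buffer standing in for the output file.

```python
import csv
import io
import re

# Assumed pattern for the sample addresses; not the pattern from the post.
ADDRESS_RE = re.compile(
    r"^(?P<num>\d+)\s+(?P<street>.+?),\s*(?P<city>[^,]+),\s*"
    r"(?P<prov>[A-Z]{2})\s+(?P<postal>[A-Z]\d[A-Z] \d[A-Z]\d)$"
)

def split_address(address):
    match = ADDRESS_RE.match(address)
    if match is None:
        raise ValueError("unrecognised address: " + address)
    return match.groupdict()

buffer = io.StringIO()  # stands in for an open output file
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
fields = split_address("3836 Main St, Lincoln, ON L0R 1S0")
writer.writerow([fields["num"], fields["street"], fields["city"],
                 fields["prov"], fields["postal"]])
print(buffer.getvalue(), end="")
```

Splitting the street into name, suffix, and direction would need a further step; this sketch stops at the comma-delimited parts.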

-- 
Denis McMahon, denismfmcmahon@gmail.com
