Groups > comp.lang.python > #64824 > unrolled thread
| Started by | matt.s.marotta@gmail.com |
|---|---|
| First post | 2014-01-26 19:49 -0800 |
| Last post | 2014-01-27 15:22 +0000 |
| Articles | 10 — 6 participants |
Remove unwanted characters from column matt.s.marotta@gmail.com - 2014-01-26 19:49 -0800
Re:Remove unwanted characters from column Dave Angel <davea@davea.name> - 2014-01-27 00:24 -0500
Re: Remove unwanted characters from column matt.s.marotta@gmail.com - 2014-01-27 05:32 -0800
Re: Remove unwanted characters from column Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-27 13:54 +0000
Re: Remove unwanted characters from column matt.s.marotta@gmail.com - 2014-01-27 06:23 -0800
Re: Remove unwanted characters from column Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-27 14:34 +0000
Re: Remove unwanted characters from column Chris Angelico <rosuav@gmail.com> - 2014-01-28 01:57 +1100
Re: Remove unwanted characters from column matt.s.marotta@gmail.com - 2014-01-27 07:03 -0800
Re: Remove unwanted characters from column Chris Angelico <rosuav@gmail.com> - 2014-01-28 02:19 +1100
Re: Remove unwanted characters from column Denis McMahon <denismfmcmahon@gmail.com> - 2014-01-27 15:22 +0000
| From | matt.s.marotta@gmail.com |
|---|---|
| Date | 2014-01-26 19:49 -0800 |
| Subject | Remove unwanted characters from column |
| Message-ID | <8d703876-ba90-492d-a558-a5a9bb8023c7@googlegroups.com> |
School assignment is to create a tab separated output with the original given addresses in one column and then the addresses split into other columns (ex, columns for city, postal code, street suffix).
Here is my code:
inHandler = open(inFile, 'r')
outHandler = open(outFile, 'w')
outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir\tCity\tProvince\tPostalCode")
for line in inHandler:
    str = line.replace("FarmID\tAddress", " ")
    outHandler.write(str[0:-1])
    str = str.replace(" ","\t", 1)
    str = str.replace(" Rd, ","\tRd\t\t")
    str = str.replace("Rd ","\tRd\t\t")
    str = str.replace("Ave, ","\tAve\t\t")
    str = str.replace("Ave ","\tAve\t\t")
    str = str.replace("St ","\tSt\t\t")
    str = str.replace("St, ","\tSt\t\t")
    str = str.replace("Dr, ","\tDr\t\t")
    str = str.replace("Lane, ","\tLane\t\t")
    str = str.replace("Pky, ","\tPky\t\t")
    str = str.replace(" Sq, ","\tSq\t\t")
    str = str.replace(" Pl, ","\tPl\t\t")
    str = str.replace("\tE, ","E\t")
    str = str.replace("\tN, ","N\t")
    str = str.replace("\tS, ","S\t")
    str = str.replace("\tW, ","W\t")
    str = str.replace(",\t","\t\t")
    str = str.replace(", ON ","\tON\t")
    outHandler.write(str)
inHandler.close()
outHandler.close()
Here are some sample addresses; there are over 100:
FarmID Address
1 1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
2 4260 Mountainview Rd, Lincoln, ON L0R 1B2
3 25 Hunter Rd, Grimsby, ON L3M 4A3
4 1091 Hutchinson Rd, Haldimand, ON N0A 1K0
5 5172 Green Lane Rd, Lincoln, ON L0R 1B3
6 500 Glenridge Ave, St. Catharines, ON L2S 3A1
7 471 Foss Rd, Pelham, ON L0S 1C0
8 758 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
9 3836 Main St, Lincoln, ON L0R 1S0
I have everything worked out, except that the final output places the farmID at the end of postal code as seen in the example below (notice the brackets showing where the farmID is placed):
FarmID Address StreetNum StreetName SufType Dir City Province PostalCode
1 1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0(1) 1067 Niagara Stone Rd Niagara-On-The-Lake ON L0S 1J0
Any ideas on how to fix this? Keep in mind as well that the farmID will have 2 characters at a certain point.
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2014-01-27 00:24 -0500 |
| Message-ID | <mailman.6019.1390800112.18130.python-list@python.org> |
| In reply to | #64824 |
matt.s.marotta@gmail.com Wrote in message:
> I have everything worked out, except that the final output places the farmID at the end of postal code as seen in the example below (notice the brackets showing where the farmID is placed):
>
> FarmID Address StreetNum StreetName SufType Dir City Province PostalCode
> 1 1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0(1) 1067 Niagara Stone Rd Niagara-On-The-Lake ON L0S 1J0
>
> Any ideas on how to fix this? Keep in mind as well that the farmID will have 2 characters at a certain point.
>
Your specific concern is triggered by having two writes in the loop.
Get rid of the first and you're marginally closer.
But really, you've got much bigger troubles. All those
unrestricted replace calls are not at all robust. But maybe
you'll get away with it for a school assignment if the test data
is very limited.
Better would be to treat it like a parsing problem, figuring what
delimiter rule applies to each field, and building a list. Then
use str.join to build the line for the outHandler.
--
DaveA
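A minimal sketch of the parsing approach suggested above, not code from the thread. It assumes each input line is `FarmID<TAB>Address` and that every address has exactly two commas (street, city, then province and postal code). The `split_address` name and the `SUFFIXES` and `DIRECTIONS` sets are hypothetical; the sets are only collected from the replace calls in the original post.

```python
# Hypothetical suffix and direction sets, taken from the replace calls in the
# original post; extend them to match the real data.
SUFFIXES = {"Rd", "Ave", "St", "Dr", "Lane", "Pky", "Sq", "Pl"}
DIRECTIONS = {"N", "S", "E", "W"}


def split_address(line):
    """Return the nine output columns for one 'FarmID<TAB>Address' line."""
    farm_id, address = line.rstrip("\n").split("\t", 1)
    # Assumes exactly two commas: street, city, then province + postal code.
    street, city, region = [part.strip() for part in address.split(",")]
    province, postal_code = region.split(" ", 1)

    words = street.split()
    street_num = words.pop(0)
    # An optional direction comes last, with the suffix just before it.
    direction = words.pop() if words and words[-1] in DIRECTIONS else ""
    suffix = words.pop() if words and words[-1] in SUFFIXES else ""
    street_name = " ".join(words)

    return [farm_id, address, street_num, street_name, suffix,
            direction, city, province, postal_code]


row = split_address("5\t5172 Green Lane Rd, Lincoln, ON L0R 1B3\n")
print("\t".join(row))
```

Because each field is cut out by its own delimiter rule, "Green Lane Rd" keeps "Green Lane" as the street name, and the FarmID appears only once per output line since the list is joined and written a single time.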
| From | matt.s.marotta@gmail.com |
|---|---|
| Date | 2014-01-27 05:32 -0800 |
| Message-ID | <04d509f9-54fa-4ba5-bb6e-24ce60e12523@googlegroups.com> |
| In reply to | #64826 |
On Monday, 27 January 2014 00:24:11 UTC-5, Dave Angel wrote:
> Your specific concern is triggered by having two writes in the loop.
>
> Get rid of the first and you're marginally closer.
>
> But really, you've got much bigger troubles. All those
> unrestricted replace calls are not at all robust. But maybe
> you'll get away with it for a school assignment if the test data
> is very limited.
>
> Better would be to treat it like a parsing problem, figuring what
> delimiter rule applies to each field, and building a list. Then
> use str.join to build the line for the outHandler.
>
> --
> DaveA
The code that I used is the proper way that we were supposed to complete the assignment. All I need now is an 'if...then' statement to get rid of the unwanted FarmID at the end of the addresses. I just don't know what will come after the 'if' part.
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-01-27 13:54 +0000 |
| Message-ID | <52e6650c$0$29999$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #64857 |
On Mon, 27 Jan 2014 05:32:08 -0800, matt.s.marotta wrote:
> The code that I used is the proper way that we were supposed to complete
> the assignment. All I need now is an 'if...then' statement to get rid
> of the unwanted FarmID at the end of the addresses. I just don't know
> what will come after the 'if' part.
Show us what you do know. If you don't know the "if", what about the
"then"?
if .... :
    do what?
What do you intend to do inside the if? Under what circumstances would
you do it?
If you can answer those questions in English, then we can help you write
code to do it.
--
Steven
| From | matt.s.marotta@gmail.com |
|---|---|
| Date | 2014-01-27 06:23 -0800 |
| Message-ID | <9b121fe6-9ed8-4adf-af24-80166f348c7d@googlegroups.com> |
| In reply to | #64861 |
On Monday, 27 January 2014 08:54:20 UTC-5, Steven D'Aprano wrote:
> On Mon, 27 Jan 2014 05:32:08 -0800, matt.s.marotta wrote:
>
> > The code that I used is the proper way that we were supposed to complete
> > the assignment. All I need now is an 'if...then' statement to get rid
> > of the unwanted FarmID at the end of the addresses. I just don't know
> > what will come after the 'if' part.
>
> Show us what you do know. If you don't know the "if", what about the
> "then"?
>
> if .... :
>     do what?
>
> What do you intend to do inside the if? Under what circumstances would
> you do it?
>
> If you can answer those questions in English, then we can help you write
> code to do it.
>
> --
> Steven
If the farmID < 10:
remove one character from the address column
Elif farmID > 10:
remove two characters from the address column
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-01-27 14:34 +0000 |
| Message-ID | <mailman.6042.1390833253.18130.python-list@python.org> |
| In reply to | #64864 |
On 27/01/2014 14:23, matt.s.marotta@gmail.com wrote:
> On Monday, 27 January 2014 08:54:20 UTC-5, Steven D'Aprano wrote:
>> On Mon, 27 Jan 2014 05:32:08 -0800, matt.s.marotta wrote:
>>
>>
>>
>>> The code that I used is the proper way that we were supposed to complete
>>
>>> the assignment. All I need now is an 'if...then' statement to get rid
>>
>>> of the unwanted FarmID at the end of the addresses. I just don't know
>>
>>> what will come after the 'if' part.
>>
>>
>>
>> Show us what you do know. If you don't know the "if", what about the
>>
>> "then"?
>>
>>
>>
>>
>>
>> if .... :
>>
>> do what?
>>
>>
>>
>>
>>
>> What do you intend to do inside the if? Under what circumstances would
>>
>> you do it?
>>
>>
>>
>> If you can answer those questions in English, then we can help you write
>>
>> code to do it.
>>
>>
>>
>>
>>
>> --
>>
>> Steven
>
> If the farmID < 10:
> remove one character from the address column
> Elif farmID > 10:
> remove two characters from the address column
>
Would you please read and action this
https://wiki.python.org/moin/GoogleGroupsPython to prevent us seeing the double line spacing above, thanks.
--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.
Mark Lawrence
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-01-28 01:57 +1100 |
| Message-ID | <mailman.6044.1390834662.18130.python-list@python.org> |
| In reply to | #64864 |
On Tue, Jan 28, 2014 at 1:23 AM, <matt.s.marotta@gmail.com> wrote:
> If the farmID < 10:
> remove one character from the address column
> Elif farmID > 10:
> remove two characters from the address column
What if farmID == 10?
ChrisA
| From | matt.s.marotta@gmail.com |
|---|---|
| Date | 2014-01-27 07:03 -0800 |
| Message-ID | <948612d7-8e69-46d0-9fab-707b762a48a7@googlegroups.com> |
| In reply to | #64867 |
On Monday, 27 January 2014 09:57:32 UTC-5, Chris Angelico wrote:
> On Tue, Jan 28, 2014 at 1:23 AM, <matt.s.marotta@gmail.com> wrote:
>
> > If the farmID < 10:
> > remove one character from the address column
> > Elif farmID > 10:
> > remove two characters from the address column
>
> What if farmID == 10?
>
> ChrisA
Ok, sorry this is how it should be.
If the FarmID < 10:
remove one character from the address column
If the FarmID > 9:
remove two characters from the address column
My issue is I can't figure out what statement to use to define FarmID.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-01-28 02:19 +1100 |
| Message-ID | <mailman.6045.1390836009.18130.python-list@python.org> |
| In reply to | #64868 |
On Tue, Jan 28, 2014 at 2:03 AM, <matt.s.marotta@gmail.com> wrote:
> On Monday, 27 January 2014 09:57:32 UTC-5, Chris Angelico wrote:
>> On Tue, Jan 28, 2014 at 1:23 AM, <matt.s.marotta@gmail.com> wrote:
>>
>> > If the farmID < 10:
>> > remove one character from the address column
>> > Elif farmID > 10:
>> > remove two characters from the address column
>>
>> What if farmID == 10?
>>
>> ChrisA
>
> Ok, sorry this is how it should be.
>
> If the FarmID < 10:
> remove one character from the address column
>
> If the FarmID > 9:
> remove two characters from the address column
>
> My issue is I can't figure out what statement to use to define FarmID.
More commonly, that would be written as
if farmID < 10:
    # remove one character
else:
    # remove two characters
Though this still suffers from the limitation of not handling 100 or
1000, so you might want to look at len(str(farmID)) instead.
ChrisA
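A minimal sketch of the `len(str(farmID))` idea, not code from the thread. It assumes the input line is `FarmID<TAB>Address`, which also answers how to define the FarmID: it is whatever precedes the first tab. The helper names `farm_id_of` and `strip_trailing_id` are hypothetical. Note the caveat that a postal code can itself end in a digit, so this is only safe when the ID is known to have been appended.

```python
def farm_id_of(line):
    """Read the FarmID from the start of a 'FarmID<TAB>Address' line."""
    return int(line.split("\t", 1)[0])


def strip_trailing_id(text, farm_id):
    """Drop the FarmID digits from the end of text, whatever their count."""
    digits = str(farm_id)
    # len(digits) is 1 for IDs below 10, 2 below 100, and so on.
    if text.endswith(digits):
        return text[:-len(digits)]
    return text


line = "12\t25 Hunter Rd, Grimsby, ON L3M 4A3"
farm_id = farm_id_of(line)
glued = "ON L3M 4A3" + str(farm_id)  # the unwanted ID stuck on the end
print(strip_trailing_id(glued, farm_id))
```

Slicing by the length of the ID's string form avoids a separate branch for each digit count, so 100 or 1000 need no extra cases.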
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2014-01-27 15:22 +0000 |
| Message-ID | <lc5tkh$f90$2@dont-email.me> |
| In reply to | #64824 |
On Sun, 26 Jan 2014 19:49:01 -0800, matt.s.marotta wrote:
> School assignment is to create a tab separated output with the original
> given addresses in one column and then the addresses split into other
> columns (ex, columns for city, postal code, street suffix).
If you're trying to create fixed width output from variable width fields, format specifiers may be better to use than tabs.
The problem with tabs is that columns end up misaligned when the data fields in a column contain a mixture of items of less length than the tab spacing and items of greater length than the tab spacing, unless you can work out the tab spacing and adjust accordingly.
For example, my code, which uses the re module to separate the various record components and a format specifier to print the text and html versions (and csvwriter for the csv), creates the outputs seen here:
http://www.sined.co.uk/tmp/farms.txt
http://www.sined.co.uk/tmp/farms.csv
http://www.sined.co.uk/tmp/farms.htm
--
Denis McMahon, denismfmcmahon@gmail.com
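The code behind those links was not posted, so the following is only a rough sketch of the general technique (a regular expression plus format specifiers), not a reconstruction of it. The pattern, the column widths, and the names `ADDRESS_RE` and `format_row` are all assumptions; the pattern expects `FarmID<TAB>street, city, province postal-code` with a Canadian-style postal code.

```python
import re

# Assumed layout: FarmID<TAB>street, city, two-letter province, postal code.
ADDRESS_RE = re.compile(
    r"^(?P<farm_id>\d+)\t"
    r"(?P<street>[^,]+),\s*"
    r"(?P<city>[^,]+),\s*"
    r"(?P<province>[A-Z]{2})\s+"
    r"(?P<postal>[A-Z]\d[A-Z] \d[A-Z]\d)$"
)


def format_row(line):
    """Return one fixed-width text row, or None if the line does not match."""
    match = ADDRESS_RE.match(line.strip())
    if match is None:
        return None
    # Widths are arbitrary; pick them to fit the longest value per column.
    template = "{farm_id:>6}  {street:<28}{city:<22}{province:<4}{postal}"
    return template.format(**match.groupdict())


print(format_row("3\t25 Hunter Rd, Grimsby, ON L3M 4A3\n"))
print(format_row("8\t758 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0\n"))
```

Padding every field to a fixed width keeps the columns aligned regardless of how the viewer renders tabs, at the cost of having to choose widths large enough for the longest value.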