
Re: Separate Address number and name

From Denis McMahon <denismfmcmahon@gmail.com>
Newsgroups comp.lang.python
Subject Re: Separate Address number and name
Date 2014-01-22 17:35 +0000
Organization A noiseless patient Spider
Message-ID <lbovgq$668$4@dont-email.me>
References <9fe1b47b-65ce-4063-9188-07b81cdba49f@googlegroups.com> <a218f6ef-37c1-4eaf-a127-5ea4846cb332@googlegroups.com>



On Tue, 21 Jan 2014 16:06:56 -0800, Shane Konings wrote:


> The following is a sample of the data. There are hundreds of lines that
> need to have an automated process of splitting the strings into headings
> to be imported into excel with these headings
> 
> ID  Address  StreetNum  StreetName  SufType  Dir   City  Province 
> PostalCode

Ok, the following general method seems to work:

First, use a regex to capture two numeric groups and the rest of the line, 
separated by whitespace. If you can't find all three fields, the data is 
in an unexpected format.

re.search( r"(\d+)\s+(\d+)\s+(.*)", data )
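
As a rough sketch of that first step: the sample line below is made up 
(the real data isn't shown in this post), and the names LEAD_RE and 
split_leading are only illustrative.

```python
import re

# Two runs of digits, then everything else, separated by whitespace.
LEAD_RE = re.compile(r"(\d+)\s+(\d+)\s+(.*)")

def split_leading(line):
    """Return (id, street_num, rest); raise ValueError on unexpected data."""
    m = LEAD_RE.search(line)
    if m is None:
        raise ValueError("unexpected data format: %r" % line)
    return m.group(1), m.group(2), m.group(3)

# Made-up sample line, assumed to resemble the real data.
sample = "1 1067 Example Stone Rd, W, Sampletown, ON L0S 1J0"
record_id, street_num, rest = split_leading(sample)
print(record_id, street_num, rest)
```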

Second, split the rest of the line on a regex of comma + 0 or more 
whitespace.

re.split( r",\s*", data )
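
A small sketch of the split, using \s* so that a comma with no space 
after it still splits, as "0 or more whitespace" implies. Both input 
strings are made-up examples, not the OP's data.

```python
import re

# Made-up "rest of the line" values, with and without a direction field.
rest_with_dir = "Example Stone Rd, W, Sampletown, ON L0S 1J0"
rest_no_dir = "Sample View Rd,Otherville, ON L0R 1B2"

# Split on a comma followed by zero or more whitespace characters.
bits_with_dir = re.split(r",\s*", rest_with_dir)
bits_no_dir = re.split(r",\s*", rest_no_dir)

print(bits_with_dir)
print(bits_no_dir)
```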

Check that the rest of the line has 3 or 4 bits, otherwise you have an 
unexpected lack or excess of data fields.

Split the first bit of the rest of the line into street name and suffix/
type. If you can't split it, use it as the street name and set the suffix/
type to blank.

re.search( r"(.*)\s+(\w+)", data )
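
Something like the following would do for the street name and suffix. 
The function name split_street and the street names are invented for 
illustration. Because (.*) is greedy, the suffix ends up being the last 
word only.

```python
import re

def split_street(first_bit):
    """Return (street_name, suffix); suffix is blank if there is no split."""
    m = re.search(r"(.*)\s+(\w+)", first_bit)
    if m is None:
        # Only one word: use it all as the street name.
        return first_bit, ""
    return m.group(1), m.group(2)

print(split_street("Example Stone Rd"))
print(split_street("Broadway"))
```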

If there are 3 bits in rest of line, set direction to blank, otherwise 
set direction to the second bit.

Set the city to the last but one bit of the rest of the line.
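
The bit-count check, the direction and the city could be sketched 
together like this. The function name direction_and_city and the sample 
lists are assumptions for illustration only.

```python
def direction_and_city(bits):
    """Return (direction, city) from the comma-separated bits."""
    if len(bits) not in (3, 4):
        raise ValueError("expected 3 or 4 fields, got %d" % len(bits))
    # With 4 bits the second one is the direction; with 3 there is none.
    direction = bits[1] if len(bits) == 4 else ""
    # The city is always the last but one bit.
    city = bits[-2]
    return direction, city

print(direction_and_city(["Example Stone Rd", "W", "Sampletown", "ON L0S 1J0"]))
print(direction_and_city(["Sample View Rd", "Otherville", "ON L0R 1B2"]))
```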

Capture one word followed by two words in the last bit of the rest of the 
line, and use these as the province and postcode.

re.search( r"(\w+)\s+(\w+\s+\w+)", data )
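
A sketch of that last capture, assuming the last bit looks like a 
province code followed by a two-part postcode. The function name and 
the sample value are made up.

```python
import re

def split_province_postcode(last_bit):
    """Return (province, postcode); raise ValueError if the bit doesn't fit."""
    m = re.search(r"(\w+)\s+(\w+\s+\w+)", last_bit)
    if m is None:
        raise ValueError("unexpected province/postcode: %r" % last_bit)
    return m.group(1), m.group(2)

print(split_province_postcode("ON L0S 1J0"))
```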

Provided none of the searches or the split failed, you should now have 
the data fields you need to write. The easiest way to write them might be 
to assemble them as a list and use the csv module.
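
One possible shape for the writing step is sketched here. The headings 
come from the quoted post. The row values are invented, and filling the 
Address column with the number plus the unsplit street is my guess at 
what is wanted there. It writes to an in-memory buffer so it can be run 
as is.

```python
import csv
import io

HEADINGS = ["ID", "Address", "StreetNum", "StreetName", "SufType",
            "Dir", "City", "Province", "PostalCode"]

def write_rows(rows, out):
    """Write the heading row followed by one CSV row per record."""
    writer = csv.writer(out)
    writer.writerow(HEADINGS)
    writer.writerows(rows)

# One made-up record, assembled as a list in heading order.
rows = [
    ["1", "1067 Example Stone Rd", "1067", "Example Stone", "Rd",
     "W", "Sampletown", "ON", "L0S 1J0"],
]

buf = io.StringIO()
write_rows(rows, buf)
print(buf.getvalue())
```

For a real file, open it with open(path, "w", newline="") in Python 3 
and pass the file object in place of buf.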

I'm assuming you're capable of working out from the help on the Python re 
module what to pass as data in each call, and how to access the captured 
results of a search and the results of a split. I'm also assuming you're 
capable of working out how to use the csv module from the documentation. 
If you're not, then either go back and ask your lecturer for help, or tell 
your boss to hire a real programmer for his quick and easy coding jobs.

-- 
Denis McMahon, denismfmcmahon@gmail.com
