


Re: Separate Address number and name

Newsgroups comp.lang.python
Subject Re: Separate Address number and name
References <9fe1b47b-65ce-4063-9188-07b81cdba49f@googlegroups.com> <a218f6ef-37c1-4eaf-a127-5ea4846cb332@googlegroups.com>
From Anders Wegge Keller <wegge@wegge.dk>
Date 2014-01-22 02:04 +0100
Message-ID <87mwio3jym.fsf@huddi.jernurt.dk>
Organization SunSITE.dk - Supporting Open source



Shane Konings <shane.konings@gmail.com> writes:

...

> The following is a sample of the data. There are hundreds of lines
> that need to have an automated process of splitting the strings into
> headings to be imported into Excel with these headings

> ID  Address  StreetNum  StreetName  SufType  Dir   City  Province  PostalCode
> 
> 
> 1	1067 Niagara Stone Rd, W, Niagara-On-The-Lake, ON L0S 1J0
> 2	4260 Mountainview Rd, Lincoln, ON L0R 1B2
> 3	25 Hunter Rd, Grimsby, E, ON L3M 4A3
> 4	1091 Hutchinson Rd, Haldimand, ON N0A 1K0
> 5	5172 Green Lane Rd, Lincoln, ON L0R 1B3
> 6	500 Glenridge Ave, East, St. Catharines, ON L2S 3A1
> 7	471 Foss Rd, Pelham, ON L0S 1C0
> 8	758 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
> 9	3836 Main St, North, Lincoln, ON L0R 1S0
> 10	1025 York Rd, W, Niagara-On-The-Lake, ON L0S 1P0

 The input doesn't look consistent to me: half of the sample rows have
no direction, and in row 3 it comes after the city instead of before
it. Is Dir supposed to be an optional value? If that is the only
optional field, it can be worked around. But if the missing direction
(I'm guessing that's what it is) is due to malformed input data, you
have a hell of a job in front of you.
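A quick way to answer the "is Dir optional?" question is to survey the data first: count the comma-separated fields per row and note where, if anywhere, a direction-like field appears. This is a sketch only; the `DIRECTIONS` set is an assumption built from the tokens seen in the sample, and `survey` is an illustrative name.

```python
# Assumed set of direction tokens, built from what appears in the
# sample ("W", "E", "East", "North") plus the obvious remaining ones.
DIRECTIONS = {"N", "S", "E", "W", "NORTH", "SOUTH", "EAST", "WEST"}

SAMPLE = """\
1\t1067 Niagara Stone Rd, W, Niagara-On-The-Lake, ON L0S 1J0
2\t4260 Mountainview Rd, Lincoln, ON L0R 1B2
3\t25 Hunter Rd, Grimsby, E, ON L3M 4A3"""


def survey(lines):
    """For each line, return (ID, number of comma-separated fields,
    index of the first direction-like field or None)."""
    report = []
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        row_id = line.split()[0]  # split() with no argument also splits on the tab
        dir_pos = next(
            (i for i, f in enumerate(fields) if f.upper() in DIRECTIONS), None
        )
        report.append((row_id, len(fields), dir_pos))
    return report


for row in survey(SAMPLE.splitlines()):
    print(row)
# ('1', 4, 1)
# ('2', 3, None)
# ('3', 4, 2)
```

Rows where the direction sits at index 2 rather than 1 (row 3 here) are exactly the ones a fixed-position parser will get wrong.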

 What do you want to do with incomplete or malformed data? Try to
parse it on a "best effort" basis, or simply spew out an error message
for an operator to look at?
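If the answer is "error message for an operator", the run does not have to stop at the first bad line. A minimal sketch, assuming a hypothetical `MalformedLine` exception and a deliberately crude `check_fields` (it only counts comma-separated fields) standing in for the real parser; the second input line is made up for illustration.

```python
class MalformedLine(ValueError):
    """Hypothetical exception for lines that cannot be parsed."""


def check_fields(line):
    # Crude stand-in for a real parser: accept 3 fields (no Dir)
    # or 4 fields (with Dir), reject anything else.
    fields = [f.strip() for f in line.split(",")]
    if len(fields) not in (3, 4):
        raise MalformedLine(f"expected 3 or 4 fields, got {len(fields)}")
    return fields


def process(lines):
    """Parse what can be parsed; collect the rest for an operator."""
    parsed, rejects = [], []
    for lineno, line in enumerate(lines, start=1):
        try:
            parsed.append(check_fields(line))
        except MalformedLine as exc:
            rejects.append((lineno, line, str(exc)))
    return parsed, rejects


good, bad = process([
    "2\t4260 Mountainview Rd, Lincoln, ON L0R 1B2",
    "11\t12 Some Rd",  # made-up bad line
])
for lineno, line, reason in bad:
    print(f"line {lineno}: {reason}: {line!r}")
```

A "best effort" variant would try to repair the line inside the `except` branch instead of just recording it.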

 In the latter case, I suggest a stepwise approach:

* Split the input line by ',' and strip each piece -> res0

* Split the first piece on whitespace (the ID is followed by a tab,
  not a space) -> res

-> Id = res[0]
-> Address = res[1:]
-> StreetNum = res[1]
-> StreetName = res[2:-1]
-> SufType = res[-1]

* Check if res0[1] looks like a cardinal direction (the sample uses
  both "W" and "East").
 If so, Dir = res0[1].
 Otherwise, croak, or use the default direction and insert it into
 the list, so the remainder is shifted to match the following steps.

-> City = res0[2]

* Split res0[3] on whitespace -> respp

respp[0] -> Province
respp[1:] -> Postcode
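The steps above can be sketched in Python roughly like this. It assumes the layout "ID<tab>number name... suffix, [Dir,] city, province postcode"; `parse_line`, `DIRECTIONS` and `default_dir` are illustrative names. Note that the street part is split on any whitespace (so the tab after the ID separates too) and that StreetName stops before the suffix.

```python
# Assumed direction tokens; the sample mixes "W"/"E" with "East"/"North".
DIRECTIONS = {"N", "S", "E", "W", "NORTH", "SOUTH", "EAST", "WEST"}


def parse_line(line, default_dir=None):
    """Split one line into the spreadsheet columns.

    If no direction is found, use default_dir; if that is None, croak
    with ValueError. Lines that still don't fit the assumed layout also
    raise ValueError.
    """
    res0 = [f.strip() for f in line.split(",")]   # split on ',' and trim
    res = res0[0].split()                          # any whitespace, incl. the tab
    if len(res) < 4:
        raise ValueError(f"street part too short: {line!r}")

    record = {
        "ID": res[0],
        "Address": " ".join(res[1:]),
        "StreetNum": res[1],
        "StreetName": " ".join(res[2:-1]),         # everything but the suffix
        "SufType": res[-1],
    }

    if len(res0) > 1 and res0[1].upper() in DIRECTIONS:
        record["Dir"] = res0[1]
    elif default_dir is not None:
        record["Dir"] = default_dir
        res0.insert(1, default_dir)                # shift the rest into place
    else:
        raise ValueError(f"no direction: {line!r}")

    if len(res0) != 4:
        raise ValueError(f"unexpected field count {len(res0)}: {line!r}")

    record["City"] = res0[2]
    respp = res0[3].split()
    if len(respp) < 2:
        raise ValueError(f"no province/postcode: {line!r}")
    record["Province"] = respp[0]
    record["PostalCode"] = " ".join(respp[1:])
    return record


print(parse_line("1\t1067 Niagara Stone Rd, W, Niagara-On-The-Lake, ON L0S 1J0"))
print(parse_line("2\t4260 Mountainview Rd, Lincoln, ON L0R 1B2", default_dir=""))
```

Row 3 of the sample ("25 Hunter Rd, Grimsby, E, ...") ends up with five fields once the default is inserted and is rejected, which is arguably the right outcome when bad lines go to an operator.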


 And put in some basic sanity checks on the resulting values before
committing them as a parsed result. Provinces and post codes should
be easy enough to validate against a fixed list.
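For the sanity checks, provinces really are a short fixed list. For post codes a complete list is possible but large; a cheaper first pass, swapped in here as an assumption, is to check only the Canadian "A9A 9A9" shape with a regular expression. `validate` is an illustrative name.

```python
import re

# Canadian province/territory two-letter abbreviations.
PROVINCES = {"AB", "BC", "MB", "NB", "NL", "NS", "NT", "NU",
             "ON", "PE", "QC", "SK", "YT"}

# Shape check only ("A9A 9A9"); it does not prove the code exists.
POSTCODE_RE = re.compile(r"[A-Z]\d[A-Z] \d[A-Z]\d")


def validate(province, postcode):
    """Return a list of problems; an empty list means the values look sane."""
    problems = []
    if province.upper() not in PROVINCES:
        problems.append(f"unknown province {province!r}")
    if not POSTCODE_RE.fullmatch(postcode.upper()):
        problems.append(f"malformed postal code {postcode!r}")
    return problems


print(validate("ON", "L0S 1J0"))  # → []
print(validate("E", ""))          # → ["unknown province 'E'", "malformed postal code ''"]
```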

-- 
/Wegge

Looking for redundant peering of dk.*, linux.debian.*



