Date: Thu, 7 Nov 2013 13:45:36 -0500
Subject: Re: splitting file/content into lines based on regex termination
From: bruce <badouglas@gmail.com>
To: python-list@python.org
Newsgroups: comp.lang.python

Hi,

Thanks for the reply.

I tried what you suggested. Now the split lines print, but the data
matched by the regex is dropped entirely. My first attempt gave me a
line, then the captured items, then the next line, and so on.

What I tried next was to capture the regex data with findall and
combine the two outputs in separate loops, which is ugly but should
work:

  import re
  import sys

  ff = "byu2.dat"
  #fff = "sdsu2.dat"
  with open(ff, "r") as myfile:
    s = myfile.read()

  s = s.replace("&nbsp", "")

  # each record ends with a terminator such as:  <br>#45 / 58#0#
  pattern = r"<br>#(\d+) / (\d+)#(\d+)#"

  # dat1: one tuple of three captured numbers per terminator
  dat1 = re.findall(pattern, s)
  # dat: records interleaved with the three captured groups
  dat = re.compile(pattern).split(s)
  # dat2: records only (no capture groups, as suggested)
  dat2 = re.compile(r"<br>#\d+ / \d+#\d+#").split(s)

  for m in dat:
    if m:
      print("m = " + m)

  print("dat1")
  print(dat1)
  print(len(dat1))
  print("dat2a")

  # dat1 holds tuples, so it cannot be printed with "m = " + m
  for m in dat2:
    if m:
      print("m = " + m)

  sys.exit()


The test data is pasted at http://bpaste.net/show/kYzBUIfhc5023phOVmcu/
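
Here is a minimal sketch of the combining step I have in mind. It
assumes the terminators returned by findall line up one-to-one with the
chunks returned by split. The shortened sample string and the
join_records name are made up for illustration; they are not from my
real script.

```python
import re

# Two records shaped like the test data (shortened for illustration).
s = ("10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett, William#3#MWF"
     "<br>#08:00am<br>#08:50am<br>#3718 HBLL <br>#45 / 58#0#"
     "10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett, William#3#MWF"
     "<br>#09:00am<br>#09:50am<br>#3718 HBLL <br>#9 / 58#0#")


def join_records(text):
    """Pair each split chunk with the numbers from the terminator that ended it."""
    # Capturing pattern: findall returns one (a, b, c) tuple per terminator.
    counts = re.findall(r"<br>#(\d+) / (\d+)#(\d+)#", text)
    # Same pattern without groups: split returns only the chunks between terminators.
    chunks = re.split(r"<br>#\d+ / \d+#\d+#", text)
    lines = []
    # zip stops at the shorter list, so the leftover chunk after the
    # last terminator is ignored.
    for chunk, (a, b, c) in zip(chunks, counts):
        lines.append("%s %s / %s,%s" % (chunk.strip(), a, b, c))
    return lines


for line in join_records(s):
    print("m = " + line)
```

Each printed line should then end with the numbers from its own
terminator, for example "3718 HBLL 45 / 58,0".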

Thanks!!


On Thu, Nov 7, 2013 at 1:13 PM, MRAB <python@mrabarnett.plus.com> wrote:
> On 07/11/2013 17:45, bruce wrote:
>>
>> update...
>>
>>    dat=re.compile("<br>#(\d+) / (\d+)#(\d+)#").split(s)
>>
>> almost works..
>>
>> except i get
>> m = 10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett,
>> William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL
>> m = 45
>> m = 58
>> m = 0
>> m = 10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett,
>> William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL
>> m = 9
>> m = 58
>> m = 0
>>
>> and what i want is:
>> m = 10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett,
>> William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL 45 / 58,0
>> m = 10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett,
>> William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL 9 / 58,0
>>
>>
>> so i'd have the results of the "compile/regex process" to be added to
>> the split lines
>>
>> thoughts/comments??
>>
>> thanks
>>
> The split method also returns what's matched in any capture groups,
> i.e. "(\d+)". Try omitting the parentheses:
>
>     dat = re.compile(r"<br>#\d+ / \d+#\d+#").split(s)
>
> You should also be using raw string literals as above (r"..."). It
> doesn't matter in this instance, but it might in others.
>
>>
>>
>> On Thu, Nov 7, 2013 at 12:15 PM, bruce <badouglas@gmail.com> wrote:
>>>
>>> hi.
>>>
>>> got a test file with the sample content listed below:
>>>
>>> the content is one long string, and needs to be split into separate lines
>>>
>>> I'm thinking the pattern to split on should be a kind of regex like::
>>> <br>#45 / 58#0#
>>> or
>>> <br>#9 / 58#0
>>> but i have no idea how to make this happen!!
>>>
>>> if i read the content into a buf -> s
>>>
>>> import re
>>> dat = re.compile("what goes here??").split(s)
>>>
>>> --i'm not sure what goes in the compile() to get the process to work..
>>>
>>> thoughts/comments would be helpful.
>>>
>>> thanks
>>>
>>>
>>> test dat::
>>> 10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett,
>>> William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL <br>#45 /
>>> 58#0#10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett,
>>> William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL <br>#9 /
>>> 58#0#10178#000#C S#S#124##001##DAY#Computer Systems#Roper,
>>> Paul#3#MWF<br>#11:00am<br>#11:50am<br>#1170 TMCB <br>#41 /
>>> 145#0#10178#000#C S#S#124##002##DAY#Computer Systems#Roper,
>>> Paul#3#MWF<br>#2:00pm<br>#2:50pm<br>#1170 TMCB <br>#40 /
>>> 120#0#01489#002#C S#S#142##001##DAY#Intro to Computer
>>> Programming#Burton, Robert <div class='instructors'>Seppi, Kevin<br
>>> /></div><span
>>
>>
>
> --
> https://mail.python.org/mailman/listinfo/python-list
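
For reference, a small sketch of the split behaviour MRAB describes
above, using a made-up two-record string (rec1 and rec2 are
placeholders, not real data). It also suggests, as an assumption on my
part, that the capturing split may already return everything needed,
four items per record:

```python
import re

s = "rec1 <br>#45 / 58#0#rec2 <br>#9 / 58#0#"

# With capture groups, split() interleaves the captured text with the chunks.
with_groups = re.split(r"<br>#(\d+) / (\d+)#(\d+)#", s)

# Without groups, only the text between terminators comes back.
without_groups = re.split(r"<br>#\d+ / \d+#\d+#", s)

# Three groups means the capturing result repeats in strides of four:
# chunk, group 1, group 2, group 3. The last element is the leftover
# text after the final terminator, so it is skipped here.
records = [tuple(with_groups[i:i + 4])
           for i in range(0, len(with_groups) - 1, 4)]

print(with_groups)
print(without_groups)
print(records)
```

If that holds for the real file, the separate findall pass would not be
needed at all.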