Groups > comp.lang.python > #33936 > unrolled thread

Regular expression for different date formats in Python

Started by	undesputed.hackerz@gmail.com
First post	2012-11-26 05:15 -0800
Last post	2012-11-26 13:57 -0800
Articles	5 — 4 participants


  Regular expression for different date formats in Python undesputed.hackerz@gmail.com - 2012-11-26 05:15 -0800
    Re: Regular expression for different date formats in Python Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-11-26 17:30 +0100
    Re: Regular expression for different date formats in Python Michael Torrie <torriem@gmail.com> - 2012-11-26 09:05 -0700
      Re: Regular expression for different date formats in Python Miki Tebeka <miki.tebeka@gmail.com> - 2012-11-26 13:57 -0800
      Re: Regular expression for different date formats in Python Miki Tebeka <miki.tebeka@gmail.com> - 2012-11-26 13:57 -0800

#33936 — Regular expression for different date formats in Python

From	undesputed.hackerz@gmail.com
Date	2012-11-26 05:15 -0800
Subject	Regular expression for different date formats in Python
Message-ID	<804a4170-a317-482c-8eca-e76027a0ee85@googlegroups.com>

Hello Developers,

I am a beginner in Python and need help writing a regular expression to fetch dates and times from some HTML documents. In the following code I walk through the HTML files in a folder called event and print the h1 headings using BeautifulSoup. These HTML pages also contain dates and times in different formats, which I want to fetch and display as well. The date formats in these HTML documents are:

21 - 27 Nov 2012
1 Dec 2012
30 Nov - 2 Dec 2012
26 Nov 2012

Can someone help me fetch these formats from these HTML documents?
Here is my code for walking through the files and fetching the h1 from each HTML file:


Code:

 
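    import os
    import re  # needed for the date detection still to be written
    from bs4 import BeautifulSoup

    for subdir, dirs, files in os.walk("/home/himanshu/event/"):
        for fle in files:
            path = os.path.join(subdir, fle)
            # Close each file promptly and name the parser explicitly.
            with open(path) as fh:
                soup = BeautifulSoup(fh, "html.parser")

            # soup.h1 is None when a page has no <h1> tag.
            if soup.h1 is not None:
                print(soup.h1.string)

            # Date and Time detection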


#33939

From	Vlastimil Brom <vlastimil.brom@gmail.com>
Date	2012-11-26 17:30 +0100
Message-ID	<mailman.291.1353947420.29569.python-list@python.org>
In reply to	#33936

2012/11/26  <undesputed.hackerz@gmail.com>:
> Hello Developers,
>
> I am a beginner in Python and need help writing a regular expression to fetch dates and times from some HTML documents. In the following code I walk through the HTML files in a folder called event and print the h1 headings using BeautifulSoup. These HTML pages also contain dates and times in different formats, which I want to fetch and display as well. The date formats in these HTML documents are:
>
> 21 - 27 Nov 2012
> 1 Dec 2012
> 30 Nov - 2 Dec 2012
> 26 Nov 2012
>
> Can someone help me fetch these formats from these HTML documents?
> Here is my code for walking through the files and fetching the h1 from each HTML file:
>
>
> Code:
>
>
>     import re
>     import os
>     from bs4 import BeautifulSoup
>
>     for subdir, dirs, files in os.walk("/home/himanshu/event/"):
>         for fle in files:
>             path = os.path.join(subdir, fle)
>             soup = BeautifulSoup(open(path))
>
>             print (soup.h1.string)
>
>             #Date and Time detection

Hi,
the following pattern seems to match all of your examples,

(\d{1,2} )?(Nov|Dec)?( ?- )?(\d{1,2}) (Nov|Dec) (\d{4})

however, it doesn't look very robust - of course, you have to add
the remaining months' abbreviations and check it against the (parts of the)
HTML documents you are interested in.

hth,
   vbr
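
For anyone who wants to try this out, here is a minimal sketch of a variation on the pattern above, not the pattern exactly as posted. It assumes the dates always use three-letter English month abbreviations, adds the remaining months, and uses named groups so the pieces of a range are easier to pick apart. The names MONTHS, DATE_RE and find_dates are made up for illustration.

```python
import re

# Month abbreviations; the pattern as posted listed only Nov and Dec.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Optional "start day [start month] - " prefix, followed by the
# mandatory "day month year" part.
DATE_RE = re.compile(
    r"\b(?:(?P<start_day>\d{1,2})(?: (?P<start_month>%s))? - )?"
    r"(?P<day>\d{1,2}) (?P<month>%s) (?P<year>\d{4})\b" % (MONTHS, MONTHS)
)


def find_dates(text):
    """Return every date or date range found in text as matched strings."""
    return [m.group(0) for m in DATE_RE.finditer(text)]


samples = [
    "21 - 27 Nov 2012",
    "1 Dec 2012",
    "30 Nov - 2 Dec 2012",
    "26 Nov 2012",
]
for sample in samples:
    # Surround each sample with other text, as it would appear in a page.
    match = DATE_RE.search("Event on " + sample + " in town")
    print(match.group(0), match.groupdict())
```

In the original script, find_dates(soup.get_text()) would be one way to run this over a whole page, though restricting the search to the relevant part of the HTML is likely to give fewer false matches.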


#33940

From	Michael Torrie <torriem@gmail.com>
Date	2012-11-26 09:05 -0700
Message-ID	<mailman.292.1353947661.29569.python-list@python.org>
In reply to	#33936

On 11/26/2012 06:15 AM, undesputed.hackerz@gmail.com wrote:
> I am a beginner in python and need help with writing a regular
> expression for date and time to be fetched from some html documents.

Would the "parser" module from the third-party dateutil package work for you?

http://pypi.python.org/pypi/python-dateutil
http://labix.org/python-dateutil#head-c0e81a473b647dfa787dc11e8c69557ec2c3ecd2

I don't believe the library is updated for Python 3 yet, sadly.  But I
bet it could be ported fairly easily.  I think it's pure python.
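
A minimal sketch of what using dateutil's parser might look like for the single-date formats in the question, assuming python-dateutil is installed. The helper name parse_day is made up for illustration. Note that parser.parse returns a single datetime, so a range such as "21 - 27 Nov 2012" would have to be split into its two ends before parsing.

```python
# Third-party package: pip install python-dateutil
from dateutil import parser


def parse_day(text):
    """Parse a single date such as '26 Nov 2012' into a datetime.date."""
    return parser.parse(text).date()


for text in ["1 Dec 2012", "26 Nov 2012"]:
    print(text, "->", parse_day(text).isoformat())
```

Compared with a hand-written regular expression, this gives real date objects that can be compared and sorted, at the cost of an extra dependency.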


#33946

From	Miki Tebeka <miki.tebeka@gmail.com>
Date	2012-11-26 13:57 -0800
Message-ID	<258e1a51-a2dc-40ff-946f-6cf37f3c36eb@googlegroups.com>
In reply to	#33940

On Monday, November 26, 2012 8:34:22 AM UTC-8, Michael Torrie wrote:
> http://pypi.python.org/pypi/python-dateutil
> ...
> I don't believe the library is updated for Python 3 yet, sadly.
dateutil has supported Python 3.x since version 2.0.


#33947

From	Miki Tebeka <miki.tebeka@gmail.com>
Date	2012-11-26 13:57 -0800
Message-ID	<mailman.299.1353967034.29569.python-list@python.org>
In reply to	#33940

On Monday, November 26, 2012 8:34:22 AM UTC-8, Michael Torrie wrote:
> http://pypi.python.org/pypi/python-dateutil
> ...
> I don't believe the library is updated for Python 3 yet, sadly.
dateutil has supported Python 3.x since version 2.0.
