RE: Review Request of Python Code

From	Joaquin Alzola <Joaquin.Alzola@lebara.com>
Newsgroups	comp.lang.python
Subject	RE: Review Request of Python Code
Date	2016-03-10 19:12 +0000
Message-ID	<mailman.148.1457638025.15725.python-list@python.org> (permalink)
References	<f0973a0d-62ba-402b-ab23-cb68bdd15323@googlegroups.com> <af65a7a6-3179-4bca-9022-ae0d2ec61a11@googlegroups.com>

Show all headers | View raw

SQL doesn't allow decimal numbers for LIMIT.
Use decimal numbers it still work but is the proper way.

Then clean up a bit your code and remove the commented lines #

-----Original Message-----
From: Python-list [mailto:python-list-bounces+joaquin.alzola=lebara.com@python.org] On Behalf Of subhabangalore@gmail.com
Sent: 10 March 2016 18:12
To: python-list@python.org
Subject: Re: Review Request of Python Code

On Wednesday, March 9, 2016 at 9:49:17 AM UTC+5:30, subhaba...@gmail.com wrote:
> Dear Group,
>
> I am trying to write a code for pulling data from MySQL at the backend and annotating words and trying to put the results as separated sentences with each line. The code is generally running fine but I am feeling it may be better in the end of giving out sentences, and for small data sets it is okay but with 50,000 news articles it is performing dead slow. I am using Python2.7.11 on Windows 7 with 8GB RAM.
>
> I am trying to copy the code here, for your kind review.
>
> import MySQLdb
> import nltk
> def sql_connect_NewTest1():
>     db = MySQLdb.connect(host="localhost",
>                      user="*****",
>                      passwd="*****",
>                      db="abcd_efgh")
>     cur = db.cursor()
>     #cur.execute("SELECT * FROM newsinput limit 0,50000;") #REPORTING RUNTIME ERROR
>     cur.execute("SELECT * FROM newsinput limit 0,50;")
>     dict_open=open("/python27/NewTotalTag.txt","r") #OPENING THE DICTIONARY FILE
>     dict_read=dict_open.read()
>     dict_word=dict_read.split()
>     a4=dict_word #Assignment for code.
>     list1=[]
>     flist1=[]
>     nlist=[]
>     for row in cur.fetchall():
>         #print row[2]
>         var1=row[3]
>         #print var1 #Printing lines
>         #var2=len(var1) # Length of file
>         var3=var1.split(".") #SPLITTING INTO LINES
>         #print var3 #Printing The Lines
>         #list1.append(var1)
>         var4=len(var3) #Number of all lines
>         #print "No",var4
>         for line in var3:
>             #print line
>             #flist1.append(line)
>             linew=line.split()
>             for word in linew:
>                 if word in a4:
>                     windex=a4.index(word)
>                     windex1=windex+1
>                     word1=a4[windex1]
>                     word2=word+"/"+word1
>                     nlist.append(word2)
>                     #print list1
>                     #print nlist
>                 elif word not in a4:
>                     word3=word+"/"+"NA"
>                     nlist.append(word3)
>                     #print list1
>                     #print nlist
>                 else:
>                     print "None"
>
>     #print "###",flist1
>     #print len(flist1)
>     #db.close()
>     #print nlist
>     lol = lambda lst, sz: [lst[i:i+sz] for i in range(0, len(lst), sz)] #TRYING TO SPLIT THE RESULTS AS SENTENCES
>     nlist1=lol(nlist,7)
>     #print nlist1
>     for i in nlist1:
>         string1=" ".join(i)
>         print i
>         #print string1
>
>
> Thanks in Advance.

****************************************************************************
Dear Group,

Thank you all, for your kind time and all suggestions in helping me.

Thank you Steve for writing the whole code. It is working full and fine. But speed is still an issue. We need to speed up.

Inada I tried to change to
cur = db.cursor(MySQLdb.cursors.SSCursor) but my System Admin said that may not be an issue.

Freidrich, my problem is I have a big text repository of .txt files in MySQL in the backend. I have another list of words with their possible tags. The tags are not conventional Parts of Speech(PoS) tags,  and bit defined by others.
The code is expected to read each file and its each line.
On reading each line it will scan the list for appropriate tag, if it is found it would assign, else would assign NA.
The assignment should be in the format of /tag, so that if there is a string of n words, it should look like, w1/tag w2/tag w3/tag w4/tag ....wn/tag,

where tag may be tag in the list or NA as per the situation.

This format is taken because the files are expected to be tagged in Brown Corpus format. There is a Python Library named NLTK.
If I want to save my data for use with their models, I need some specifications. I want to use it as Tagged Corpus format.

Now the tagged data coming out in this format, should be one tagged sentences in each new line or a lattice.

They expect the data to be saved in .pos format but presently I am not doing in this code, I may do that later.

Please let me know if I need to give any more information.

Matt, thank you for if...else suggestion, the data of NewTotalTag.txt is like a simple list of words with unconventional tags, like,

w1 tag1
w2 tag2
w3 tag3
...
...
w3  tag3

like that.

Regards,
Subhabrata

--
https://mail.python.org/mailman/listinfo/python-list
This email is confidential and may be subject to privilege. If you are not the intended recipient, please do not copy or disclose its content but contact the sender immediately upon receipt.

Thread

Review Request of Python Code subhabangalore@gmail.com - 2016-03-08 20:18 -0800
  Re: Review Request of Python Code Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-03-09 16:10 +1100
  Re: Review Request of Python Code INADA Naoki <songofacandy@gmail.com> - 2016-03-09 16:52 +0900
  Re: Review Request of Python Code Friedrich Rentsch <anthra.norell@bluewin.ch> - 2016-03-09 10:06 +0100
  Re: Review Request of Python Code Matt Wheeler <m@funkyhat.org> - 2016-03-09 12:06 +0000
  Re: Review Request of Python Code Matt Wheeler <m@funkyhat.org> - 2016-03-09 12:33 +0000
  Re: Review Request of Python Code subhabangalore@gmail.com - 2016-03-10 10:12 -0800
    Re: Review Request of Python Code BartC <bc@freeuk.com> - 2016-03-10 18:36 +0000
    Re: Review Request of Python Code Matt Wheeler <m@funkyhat.org> - 2016-03-10 18:51 +0000
      Re: Review Request of Python Code subhabangalore@gmail.com - 2016-03-10 12:14 -0800
    RE: Review Request of Python Code Joaquin Alzola <Joaquin.Alzola@lebara.com> - 2016-03-10 19:12 +0000
  Re: Review Request of Python Code Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-10 19:56 +0000

csiph-web