| Path | csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | Joaquin Alzola <Joaquin.Alzola@lebara.com> |
| Newsgroups | comp.lang.python |
| Subject | RE: Review Request of Python Code |
| Date | Thu, 10 Mar 2016 19:12:37 +0000 |
| Lines | 150 |
| Message-ID | <mailman.148.1457638025.15725.python-list@python.org> (permalink) |
| References | <f0973a0d-62ba-402b-ab23-cb68bdd15323@googlegroups.com> <af65a7a6-3179-4bca-9022-ae0d2ec61a11@googlegroups.com> |
SQL doesn't allow decimal numbers for LIMIT.
Using decimal numbers may still work, but whole numbers are the proper way.
Then clean up your code a bit and remove the commented-out lines (#).
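For illustration, a minimal sketch of how the two-number form of LIMIT behaves. It uses Python's built-in sqlite3 module instead of MySQL so that it runs without a server; SQLite accepts the same "LIMIT offset, count" form. The table name newsinput comes from the quoted code, while the column names, the sample rows and the helper name fetch_page are made up. In this form the comma separates two whole numbers (rows to skip, rows to return); it is not a decimal point.

```python
import sqlite3

# In-memory stand-in for the real MySQL table; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE newsinput (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO newsinput (id, body) VALUES (?, ?)",
    [(i, "article %d" % i) for i in range(1, 101)],
)

def fetch_page(connection, offset, count):
    """Return up to `count` rows after skipping `offset` rows."""
    # Coerce to int so that only whole numbers reach the SQL statement.
    offset, count = int(offset), int(count)
    cur = connection.execute(
        "SELECT id, body FROM newsinput ORDER BY id LIMIT ?, ?",
        (offset, count),
    )
    return cur.fetchall()

first_page = fetch_page(conn, 0, 50)    # same meaning as "limit 0,50"
second_page = fetch_page(conn, 50, 50)  # the next 50 rows
```

Fetching page by page like this is also one possible way to work through 50,000 rows without asking for them all in one statement.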
-----Original Message-----
From: Python-list [mailto:python-list-bounces+joaquin.alzola=lebara.com@python.org] On Behalf Of subhabangalore@gmail.com
Sent: 10 March 2016 18:12
To: python-list@python.org
Subject: Re: Review Request of Python Code
On Wednesday, March 9, 2016 at 9:49:17 AM UTC+5:30, subhaba...@gmail.com wrote:
> Dear Group,
>
> I am trying to write code that pulls data from a MySQL backend, annotates words, and outputs the results as separate sentences, one per line. The code generally runs fine, but I feel the final step of giving out sentences could be better. It is okay for small data sets, but with 50,000 news articles it is dead slow. I am using Python 2.7.11 on Windows 7 with 8 GB RAM.
>
> I am copying the code here for your kind review.
>
> import MySQLdb
> import nltk
> def sql_connect_NewTest1():
>     db = MySQLdb.connect(host="localhost",
>                          user="*****",
>                          passwd="*****",
>                          db="abcd_efgh")
>     cur = db.cursor()
>     #cur.execute("SELECT * FROM newsinput limit 0,50000;") #REPORTING RUNTIME ERROR
>     cur.execute("SELECT * FROM newsinput limit 0,50;")
>     dict_open=open("/python27/NewTotalTag.txt","r") #OPENING THE DICTIONARY FILE
>     dict_read=dict_open.read()
>     dict_word=dict_read.split()
>     a4=dict_word #Assignment for code.
>     list1=[]
>     flist1=[]
>     nlist=[]
>     for row in cur.fetchall():
>         #print row[2]
>         var1=row[3]
>         #print var1 #Printing lines
>         #var2=len(var1) # Length of file
>         var3=var1.split(".") #SPLITTING INTO LINES
>         #print var3 #Printing The Lines
>         #list1.append(var1)
>         var4=len(var3) #Number of all lines
>         #print "No",var4
>         for line in var3:
>             #print line
>             #flist1.append(line)
>             linew=line.split()
>             for word in linew:
>                 if word in a4:
>                     windex=a4.index(word)
>                     windex1=windex+1
>                     word1=a4[windex1]
>                     word2=word+"/"+word1
>                     nlist.append(word2)
>                     #print list1
>                     #print nlist
>                 elif word not in a4:
>                     word3=word+"/"+"NA"
>                     nlist.append(word3)
>                     #print list1
>                     #print nlist
>                 else:
>                     print "None"
>
>     #print "###",flist1
>     #print len(flist1)
>     #db.close()
>     #print nlist
>     lol = lambda lst, sz: [lst[i:i+sz] for i in range(0, len(lst), sz)] #TRYING TO SPLIT THE RESULTS AS SENTENCES
>     nlist1=lol(nlist,7)
>     #print nlist1
>     for i in nlist1:
>         string1=" ".join(i)
>         print i
>         #print string1
>
>
> Thanks in Advance.
****************************************************************************
Dear Group,
Thank you all for your kind time and for all the suggestions to help me.
Thank you, Steve, for writing the whole code. It works fully and fine, but speed is still an issue. We need to speed it up.
Inada, I tried changing to
cur = db.cursor(MySQLdb.cursors.SSCursor)
but my system admin said that may not be the issue.
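A minimal sketch of why the cursor class alone may not change much: the quoted code calls cur.fetchall(), which builds the complete list of rows in memory before the loop starts. Reading in batches with the standard DB-API fetchmany() call avoids that. The helper name iter_rows and the batch size are hypothetical, the sample rows are invented, and sqlite3 stands in for MySQLdb so that the example runs without a server.

```python
import sqlite3

def iter_rows(cursor, batch_size=1000):
    """Yield rows one at a time while fetching them from the cursor in batches."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield row

# Invented stand-in data: four columns so that row[3] is the article text,
# matching the index used in the quoted code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE newsinput (id INTEGER, title TEXT, src TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO newsinput VALUES (?, ?, ?, ?)",
    [(i, "t", "s", "Text %d." % i) for i in range(2500)],
)

cur = conn.cursor()
cur.execute("SELECT * FROM newsinput")
row_count = 0
total_chars = 0
for row in iter_rows(cur, batch_size=1000):
    row_count += 1
    total_chars += len(row[3])
```

With MySQLdb the same helper should work on any cursor, since fetchmany() is part of the DB-API; only a few thousand rows are held in Python at any time.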
Friedrich, my problem is that I have a big repository of .txt files stored in MySQL at the backend. I have another list of words with their possible tags. The tags are not conventional part-of-speech (PoS) tags, but ones defined by others.
The code is expected to read each file, line by line.
On reading each line it scans the list for the appropriate tag; if one is found it is assigned, otherwise NA is assigned.
The assignment should be in the format word/tag, so that a string of n words looks like w1/tag w2/tag w3/tag w4/tag ... wn/tag,
where tag may be a tag from the list or NA, as the situation requires.
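The tagging rule described here can be sketched as follows. This is only an illustration under assumptions: the word list is already loaded into a dict (called tags, a hypothetical name), and the sample words and tags are invented. A dict lookup takes roughly constant time, whereas "word in a4" and a4.index(word) in the earlier code scan the whole list for every word, which is a likely reason for the slowness on 50,000 articles.

```python
def tag_sentence(sentence, tags, default="NA"):
    """Return 'w1/tag w2/tag ...' for one sentence; unknown words get NA."""
    return " ".join(
        "%s/%s" % (word, tags.get(word, default)) for word in sentence.split()
    )

def tag_text(text, tags):
    """Split text on '.' as the earlier code does and tag each non-empty sentence."""
    return [tag_sentence(s, tags) for s in text.split(".") if s.strip()]

tags = {"markets": "TOPIC", "rose": "MOVE"}   # invented sample data
tagged = tag_text("The markets rose today. Traders cheered.", tags)
# tagged[0] is "The/NA markets/TOPIC rose/MOVE today/NA"
# tagged[1] is "Traders/NA cheered/NA"
```

Because each sentence is tagged separately, the result already has one tagged sentence per list item, so no regrouping into fixed blocks of 7 words is needed.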
This format was chosen because the files are expected to be tagged in Brown Corpus format. There is a Python library named NLTK.
If I want to save my data for use with its models, I need to follow some specifications. I want to use the Tagged Corpus format.
The tagged data coming out in this format should have one tagged sentence on each new line, or be a lattice.
They expect the data to be saved in .pos format; at present this code does not do that, but I may add it later.
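A minimal sketch of saving the output with one tagged sentence per line and reading it back. The file name tagged_news.pos, the helper names and the sample sentences are hypothetical; whether a particular NLTK corpus reader accepts the file as written should be checked against the NLTK documentation.

```python
import io
import os
import tempfile

def save_tagged(sentences, path):
    """Write each tagged sentence ('w1/tag w2/tag ...') on its own line."""
    with io.open(path, "w", encoding="utf-8") as handle:
        for sentence in sentences:
            handle.write(u"%s\n" % sentence)

def load_tagged(path):
    """Read the file back as lists of (word, tag) pairs, one list per line."""
    result = []
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            # rsplit on the last '/' so a word containing '/' keeps its tag.
            pairs = [tuple(token.rsplit("/", 1)) for token in line.split()]
            if pairs:
                result.append(pairs)
    return result

path = os.path.join(tempfile.mkdtemp(), "tagged_news.pos")
save_tagged(["w1/tag1 w2/NA", "w3/tag3"], path)
round_trip = load_tagged(path)
```

Reading the file back is a cheap way to confirm that every token really has the word/tag shape before handing the file to another tool.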
Please let me know if I need to give any more information.
Matt, thank you for the if...else suggestion. The data in NewTotalTag.txt is a simple list of words with unconventional tags, like:
w1 tag1
w2 tag2
w3 tag3
...
...
w3 tag3
like that.
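For a file laid out like this, a minimal sketch of loading it into a dict. It assumes every word and every tag is a single token without spaces, so the text splits cleanly into word, tag pairs; the function name load_tags is hypothetical and the sample text stands in for the real file.

```python
import io

def load_tags(stream):
    """Build {word: tag} from text that alternates word and tag tokens."""
    tokens = stream.read().split()
    # Pair tokens 0-1, 2-3, ...; zip drops a trailing token that has no partner.
    return dict(zip(tokens[0::2], tokens[1::2]))

sample = io.StringIO(u"w1 tag1\nw2 tag2\nw3 tag3\n")
tags = load_tags(sample)
```

With the real file this would be called as load_tags(open("/python27/NewTotalTag.txt")). Keeping words and tags apart also avoids a side effect of searching the flat list from read().split(): there, a word in the text that happens to equal a tag would be found as well and paired with whatever token follows it.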
Regards,
Subhabrata
Review Request of Python Code subhabangalore@gmail.com - 2016-03-08 20:18 -0800
Re: Review Request of Python Code Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-03-09 16:10 +1100
Re: Review Request of Python Code INADA Naoki <songofacandy@gmail.com> - 2016-03-09 16:52 +0900
Re: Review Request of Python Code Friedrich Rentsch <anthra.norell@bluewin.ch> - 2016-03-09 10:06 +0100
Re: Review Request of Python Code Matt Wheeler <m@funkyhat.org> - 2016-03-09 12:06 +0000
Re: Review Request of Python Code Matt Wheeler <m@funkyhat.org> - 2016-03-09 12:33 +0000
Re: Review Request of Python Code subhabangalore@gmail.com - 2016-03-10 10:12 -0800
Re: Review Request of Python Code BartC <bc@freeuk.com> - 2016-03-10 18:36 +0000
Re: Review Request of Python Code Matt Wheeler <m@funkyhat.org> - 2016-03-10 18:51 +0000
Re: Review Request of Python Code subhabangalore@gmail.com - 2016-03-10 12:14 -0800
RE: Review Request of Python Code Joaquin Alzola <Joaquin.Alzola@lebara.com> - 2016-03-10 19:12 +0000
Re: Review Request of Python Code Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-03-10 19:56 +0000