


Re: Need Help with Programming Science Project

Path csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!newsfeed.xs4all.nl!newsfeed4.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
Return-Path <python-python-list@m.gmane.org>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Injected-Via-Gmane http://gmane.org/
To python-list@python.org
From Peter Otten <__peter__@web.de>
Subject Re: Need Help with Programming Science Project
Date Fri, 24 Jan 2014 12:07:35 +0100
Organization None
References <b1831c17-d9f9-4576-9488-7463e76ccf3b@googlegroups.com>
Mime-Version 1.0
Content-Type text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding 7Bit
X-Gmane-NNTP-Posting-Host p5084ae83.dip0.t-ipconnect.de
User-Agent KNode/4.7.3
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.15
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.5934.1390561622.18130.python-list@python.org>
Lines 59
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1390561622 news.xs4all.nl 2924 [2001:888:2000:d::a6]:50353
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:64674



theguy wrote:

> I have a science project that involves designing a program which can
> examine a bit of text with the author's name given, then figure out who
> the author is if another piece of example text without the name is given.
> I so far have three different authors in the program and have already put
> in the example text but for some reason, the program always leans toward
> one specific author, Suzanne Collins, no matter what insane number I try
> to put in or how much I tinker with the coding. I would post the code, but
> I don't know if it's fine to put it here, as it contains pieces from
> books. I do believe that would go against copyright laws. If I can figure
> out a way to put it in without the bits from the stories, then I'll do so,
> but as of now, any help is appreciated. I understand I'm not exactly making
> it easy since I'm not putting up any code, but I'm kind of desperate
> for help here, as I can't seem to find anybody or anything else helpful
> in any way. Thank you.

If I were to speculate, your program might look something like this:

import random

text_samples = {
    "Suzanne Collins": "... some text by collins ...",
    "J. K. Rowling": "... some text by rowling ...",
    #...
}

unknown = "... sample text by unknown author ..."

def calc_match(text1, text2):
    # Deliberately useless: ignores both texts and returns a random score.
    return random.random()

guessed_author = None
guessed_match = None

for author, text in text_samples.items():
    match = calc_match(unknown, text)
    print(author, match)
    # Keep the author with the highest score seen so far.
    if guessed_author is None or match > guessed_match:
        guessed_author = author
        guessed_match = match

print("The author is", guessed_author)

The important part of this script is not the text samples or the loop that
determines the best match; it's the algorithm used to determine how well
two texts match.
In the example above, that algorithm is encapsulated in the calc_match()
function, and it's a really bad one: it ignores both texts and just returns
a random number between 0 and 1.
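For contrast, here is one way a calc_match() could actually look at the texts. This is a minimal sketch, not the method used in the original program (which wasn't posted): it assumes a simple bag-of-words model and scores two texts by the cosine similarity of their word-frequency vectors, so 1.0 means identical word distributions and 0.0 means no words in common. The tokenizer (lowercased runs of ASCII letters and apostrophes) and the guess_author() helper are assumptions for illustration.

```python
import math
import re
from collections import Counter

def word_counts(text):
    # Assumed tokenizer: lowercase, keep runs of ASCII letters and apostrophes.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def calc_match(text1, text2):
    """Cosine similarity of the two texts' word-frequency vectors (0.0 to 1.0)."""
    c1, c2 = word_counts(text1), word_counts(text2)
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty text matches nothing
    return dot / (norm1 * norm2)

def guess_author(unknown, text_samples):
    # Hypothetical helper: the best-match loop, written with max() and a key.
    return max(text_samples,
               key=lambda author: calc_match(unknown, text_samples[author]))
```

For example, guess_author("a cat on a mat", {"A": "the cat sat on the mat", "B": "stars burn in the night sky"}) returns "A", because only A shares words ("cat", "on", "mat") with the unknown text. Cosine similarity is used here because it compares word proportions rather than raw totals, so a longer sample is not automatically favoured.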

For us to help you, it should be sufficient to post the equivalent of this
function from your code, together with a plain-English description of how it
is meant to calculate the similarity between two texts.
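To illustrate the kind of function-plus-description that would help, here is a hypothetical pair. It is pure speculation, since the original code wasn't posted, but one common way such a function ends up favouring a single author no matter the input is by counting raw shared words: the longest sample then tends to win. Plain-English description of the first one: "count how many words of the author's sample also occur in the unknown text". The second one divides that count by the sample's length. Both function names are made up for this example.

```python
import re

def words(text):
    # Assumed tokenizer: lowercase, keep runs of ASCII letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def calc_match_raw(unknown, sample):
    # Hypothetical biased version: counts every word in `sample` that also
    # appears in `unknown`, so a longer sample tends to score higher.
    vocab = set(words(unknown))
    return sum(1 for w in words(sample) if w in vocab)

def calc_match_normalized(unknown, sample):
    # Same count divided by the sample's length: the share of the sample's
    # words that also occur in the unknown text, between 0.0 and 1.0.
    sample_words = words(sample)
    if not sample_words:
        return 0.0
    return calc_match_raw(unknown, sample) / len(sample_words)
```

With unknown = "the wizard raised his wand", sample A = "his wand glowed as the wizard spoke" and a longer sample B = "the cat and the dog and the bird and the fish and the horse and the cow", the raw version scores A 4 and B 6 and picks B purely because B repeats "the", while the normalized version scores A 4/7 (about 0.57) and B 6/17 (about 0.35) and picks A.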

Alternatively, instead of the copyrighted texts, grab text samples whose
copyright has expired from Project Gutenberg.
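Project Gutenberg plain-text files wrap the book in a licence header and footer, which would skew any word statistics computed from them. A minimal sketch for stripping them, assuming the book body sits between marker lines beginning with "*** START OF" and "*** END OF" (the wording after those words varies between files, so check yours); the function name is hypothetical.

```python
def strip_gutenberg_boilerplate(raw_text):
    """Return only the book body between the START and END marker lines.

    Assumes marker lines beginning with '*** START OF' and '*** END OF';
    if a marker is missing, the text on that side is kept unchanged.
    """
    lines = raw_text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if line.startswith("*** START OF"):
            start = i + 1  # body begins after the START marker
        elif line.startswith("*** END OF"):
            end = i  # body ends before the END marker
            break
    return "\n".join(lines[start:end]).strip()
```

Usage would be something like sample = strip_gutenberg_boilerplate(open("book.txt", encoding="utf-8").read()), where book.txt stands for a locally saved Gutenberg file.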

Make sure that the resulting post is as short as possible; long text
samples don't make the post any clearer than short ones.



Thread

Need Help with Programming Science Project theguy <kvxdelta@gmail.com> - 2014-01-24 02:05 -0800
  Re: Need Help with Programming Science Project Peter Otten <__peter__@web.de> - 2014-01-24 12:07 +0100
  Re: Need Help with Programming Science Project bob gailer <bgailer@gmail.com> - 2014-01-24 18:38 -0500
  Re: Need Help with Programming Science Project Chris Angelico <rosuav@gmail.com> - 2014-01-25 11:34 +1100
  Re: Need Help with Programming Science Project Ben Finney <ben+python@benfinney.id.au> - 2014-01-25 11:59 +1100
    Re: Need Help with Programming Science Project Roy Smith <roy@panix.com> - 2014-01-24 20:38 -0500
  Re: Need Help with Programming Science Project Terry Reedy <tjreedy@udel.edu> - 2014-01-24 20:10 -0500
  Re: Need Help with Programming Science Project kvxdelta@gmail.com - 2014-01-24 18:42 -0800
    Re: Need Help with Programming Science Project Rustom Mody <rustompmody@gmail.com> - 2014-01-24 19:06 -0800
      Re: Need Help with Programming Science Project theguy <kvxdelta@gmail.com> - 2014-01-24 20:58 -0800
        Re: Need Help with Programming Science Project Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-01-25 20:30 +1300
        Re: Need Help with Programming Science Project Denis McMahon <denismfmcmahon@gmail.com> - 2014-01-25 11:31 +0000
      Re: Need Help with Programming Science Project Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-01-25 09:42 -0500
        Re: Need Help with Programming Science Project Rustom Mody <rustompmody@gmail.com> - 2014-01-25 08:15 -0800
    Re: Need Help with Programming Science Project Dave Angel <davea@davea.name> - 2014-01-25 01:38 -0500
  Re: Need Help with Programming Science Project Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-01-25 20:25 +1300
  Re: Need Help with Programming Science Project alex23 <wuwei23@gmail.com> - 2014-01-28 17:31 +1000
