Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #100658

Re: Catogorising strings into random versus non-random

From Christian Gollwitzer <auriocus@gmx.de>
Newsgroups comp.lang.python
Subject Re: Catogorising strings into random versus non-random
Date 2015-12-21 11:56 +0100
Organization A noiseless patient Spider
Message-ID <n58lnc$nok$2@dont-email.me> (permalink)
References <56776b9d$0$1615$c3e8da3$5496439d@news.astraweb.com> <mailman.16.1450686280.2237.python-list@python.org> <n58i7f$cbd$1@dont-email.me> <5677d62e$0$1605$c3e8da3$5496439d@news.astraweb.com> <n58lhv$nok$1@dont-email.me>

Show all headers | View raw


Am 21.12.15 um 11:53 schrieb Christian Gollwitzer:
> So for the spaces, either use a proper trainig material (some long
> corpus from Wikipedia or such), with punctuation removed. Then it will
> catch the correct probabilities at word boundaries. Or preprocess by
> removing the spaces.
>
>      Christian

PS: The real log-likelihood would become -infinity, when some pair does 
not appear at all in the training set (esp. the numbers, e.g.). I used 
the 1/total in the defaultdict to mitigate that. You could tweak that 
value a bit. The larger the corpus, the sharper it will divide by 
itself, too.

	Christian

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Catogorising strings into random versus non-random Steven D'Aprano <steve@pearwood.info> - 2015-12-21 14:01 +1100
  Categorising strings on meaningful–meaningless spectrum (was: Catogorising strings into random versus non-random) Ben Finney <ben+python@benfinney.id.au> - 2015-12-21 14:45 +1100
    Re: Categorising strings on meaningful–meaningless spectrum (was: Catogorising strings into random versus non-random) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-12-21 19:47 +1100
  Re: Catogorising strings into random versus non-random Chris Angelico <rosuav@gmail.com> - 2015-12-21 15:22 +1100
    Re: Catogorising strings into random versus non-random Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-12-21 19:57 +1100
    Re: Catogorising strings into random versus non-random Rick Johnson <rantingrickjohnson@gmail.com> - 2015-12-21 17:45 -0800
  Re: Catogorising strings into random versus non-random Peter Otten <__peter__@web.de> - 2015-12-21 09:24 +0100
    Re: Catogorising strings into random versus non-random Christian Gollwitzer <auriocus@gmx.de> - 2015-12-21 10:56 +0100
      Re: Catogorising strings into random versus non-random Steven D'Aprano <steve@pearwood.info> - 2015-12-21 21:36 +1100
        Re: Catogorising strings into random versus non-random Christian Gollwitzer <auriocus@gmx.de> - 2015-12-21 11:53 +0100
          Re: Catogorising strings into random versus non-random Christian Gollwitzer <auriocus@gmx.de> - 2015-12-21 11:56 +0100
  Re: Catogorising strings into random versus non-random Vlastimil Brom <vlastimil.brom@gmail.com> - 2015-12-21 14:25 +0100
  Re: Catogorising strings into random versus non-random Vincent Davis <vincent@vincentdavis.net> - 2015-12-21 07:51 -0600
  Re: Catogorising strings into random versus non-random duncan smith <duncan@invalid.invalid> - 2015-12-21 16:40 +0000
    Re: Catogorising strings into random versus non-random Ian Kelly <ian.g.kelly@gmail.com> - 2015-12-21 09:49 -0700
      Re: Catogorising strings into random versus non-random duncan smith <duncan@invalid.invalid> - 2015-12-21 17:41 +0000
    Re: Catogorising strings into random versus non-random Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-12-21 17:09 +0000
  Re: Catogorising strings into random versus non-random Paul Rubin <no.email@nospam.invalid> - 2015-12-21 09:20 -0800

csiph-web