Groups > comp.lang.java.programmer > #14955 > unrolled thread

Slightly off-topic: Determining the strength of "Hangman" word.

Started by	Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
First post	2012-05-31 09:26 -0700
Last post	2012-06-01 13:46 -0700
Articles	10 — 4 participants

Back to article view | Back to comp.lang.java.programmer

  Slightly off-topic: Determining the strength of "Hangman" word. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-31 09:26 -0700
    Re: Slightly off-topic: Determining the strength of "Hangman" word. Lew <lewbloch@gmail.com> - 2012-05-31 09:44 -0700
    Re: Slightly off-topic: Determining the strength of "Hangman" word. Roedy Green <see_website@mindprod.com.invalid> - 2012-05-31 10:43 -0700
      Re: Slightly off-topic: Determining the strength of "Hangman" word. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-31 11:34 -0700
        Re: Slightly off-topic: Determining the strength of "Hangman" word. Roedy Green <see_website@mindprod.com.invalid> - 2012-06-01 02:55 -0700
          Re: Slightly off-topic: Determining the strength of "Hangman" word. Gene Wirchenko <genew@ocis.net> - 2012-06-01 09:34 -0700
            Re: Slightly off-topic: Determining the strength of "Hangman" word. Roedy Green <see_website@mindprod.com.invalid> - 2012-06-01 11:58 -0700
              Re: Slightly off-topic: Determining the strength of "Hangman" word. Lew <lewbloch@gmail.com> - 2012-06-01 12:46 -0700
            Re: Slightly off-topic: Determining the strength of "Hangman" word. Lew <lewbloch@gmail.com> - 2012-06-01 12:45 -0700
              Re: Slightly off-topic: Determining the strength of "Hangman" word. Gene Wirchenko <genew@ocis.net> - 2012-06-01 13:46 -0700

#14955 — Slightly off-topic: Determining the strength of "Hangman" word.

From	Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date	2012-05-31 09:26 -0700
Subject	Slightly off-topic: Determining the strength of "Hangman" word.
Message-ID	<GYMxr.5280$Bn.3533@newsfe12.iad>

I've been playing a bit of Zynga's "Hanging with friends". I was 
thinking about how to go about creating an "aid" for this. I don't 
cheat, but I like solving these kinds of problems, just to prove I can.

There are two phases in Hanging with Friends. One phase is to guess the 
word that your opponent has constructed, and the other phase is to 
construct a word yourself.

In the construction phase, you are given a "bag" of 12 letters. I'm not 
sure if its a completely random distribution. I suspect its weighted in 
some way. Anyway, that's not relevant for this question.

So, It is relatively easy to write a program that uses a word list (such 
the "official scrabble dictionary" word lists in the Moby collection), 
to find all words in that list that can be constructed from the bag.

The problem is determining the strength of the word, how hard it is to 
guess.

There is probably a psychological component to this, since the "average" 
player isn't likely to use logic and will more likely just "guess" 
letters that seem likely.  A program (or expert) has the advantage 
(somewhat) in that it can figure out statistically which letters are 
most likely based on the remaining possible words, and it would then 
"guess" that letter. Although I'm not certain that is actually the most 
effective strategy either.

The algorithm for the "guess-ability" of a word is made more complicated 
by the fact that the word itself effects how many failed guesses 
opponent can have before losing the round.  I'm not sure what the 
algorithm is for that, though I suspect it has to do with the number of 
distinct letters and word-length.

Any thoughts on algorithms or data structures you might use to solve 
this kind of problem?

I've solved parts of this already. I've created "LetterBag", "Word", 
"LetterSet", and "WordIndex" classes.

The WordIndex makes it easy to find "All words that match a pattern, but 
don't contain letters in a specific LetterSet" and "All words that can 
be made from a specific LetterBag".

Oh, and to tie this into a previous thread, the whole thing fits in 
memory with room to spare ;-)

Thanks,
Daniel.

[toc] | [next] | [standalone]

#14956

From	Lew <lewbloch@gmail.com>
Date	2012-05-31 09:44 -0700
Message-ID	<00546a61-0951-4417-8d8d-79801837d8da@googlegroups.com>
In reply to	#14955

Daniel Pitts wrote:
> I've been playing a bit of Zynga's "Hanging with friends". I was 
> thinking about how to go about creating an "aid" for this. I don't 
> cheat, but I like solving these kinds of problems, just to prove I can.
> 
> There are two phases in Hanging with Friends. One phase is to guess the 
> word that your opponent has constructed, and the other phase is to 
> construct a word yourself.
> 
> In the construction phase, you are given a "bag" of 12 letters. I'm not 
> sure if its a completely random distribution. I suspect its weighted in 
> some way. Anyway, that's not relevant for this question.
> 
> So, It is relatively easy to write a program that uses a word list (such 
> the "official scrabble dictionary" word lists in the Moby collection), 
> to find all words in that list that can be constructed from the bag.
> 
> The problem is determining the strength of the word, how hard it is to 
> guess.
> 
> There is probably a psychological component to this, since the "average" 
> player isn't likely to use logic and will more likely just "guess" 
> letters that seem likely.  A program (or expert) has the advantage 
> (somewhat) in that it can figure out statistically which letters are 
> most likely based on the remaining possible words, and it would then 
> "guess" that letter. Although I'm not certain that is actually the most 
> effective strategy either.
> 
> The algorithm for the "guess-ability" of a word is made more complicated 
> by the fact that the word itself effects how many failed guesses 
> opponent can have before losing the round.  I'm not sure what the 
> algorithm is for that, though I suspect it has to do with the number of 
> distinct letters and word-length.
> 
> Any thoughts on algorithms or data structures you might use to solve 
> this kind of problem?
> 
> I've solved parts of this already. I've created "LetterBag", "Word", 
> "LetterSet", and "WordIndex" classes.
> 
> The WordIndex makes it easy to find "All words that match a pattern, but 
> don't contain letters in a specific LetterSet" and "All words that can 
> be made from a specific LetterBag".
> 
> Oh, and to tie this into a previous thread, the whole thing fits in 
> memory with room to spare ;-)

You could run a neural net over words opponents have constructed 
in historical games and run it as a predictor for new games.

www.syncleus.com
has an open-source NN (and more) implementation.

-- 
Lew

[toc] | [prev] | [next] | [standalone]

#14959

From	Roedy Green <see_website@mindprod.com.invalid>
Date	2012-05-31 10:43 -0700
Message-ID	<stafs7hb4s35rqo8ug3o8rg9dp5n4mleu2@4ax.com>
In reply to	#14955

On Thu, 31 May 2012 09:26:11 -0700, Daniel Pitts
<newsgroup.nospam@virtualinfinity.net> wrote, quoted or indirectly
quoted someone who said :

>Oh, and to tie this into a previous thread, the whole thing fits in 
>memory with room to spare ;-)

Here are three ideas:

What you need to do is collect a giant hunk of random text from
various websites and compute a word frequency for each word in your
bag.  The lower the frequency, the harder it is to guess.

Another measure would to compare your word with every other word in
the bag.  The more letters it has in common in the corresponding slot
the harder it would be guess.

Look for rare letter pairs. These are EASY to guess.

-- 
Roedy Green Canadian Mind Products
http://mindprod.com
Controlling complexity is the essence of computer programming.
~ Brian W. Kernighan 1942-01-01
.

[toc] | [prev] | [next] | [standalone]

#14963

From	Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date	2012-05-31 11:34 -0700
Message-ID	<RQOxr.27960$x11.23008@newsfe21.iad>
In reply to	#14959

On 5/31/12 10:43 AM, Roedy Green wrote:
> On Thu, 31 May 2012 09:26:11 -0700, Daniel Pitts
> <newsgroup.nospam@virtualinfinity.net>  wrote, quoted or indirectly
> quoted someone who said :
>
>> Oh, and to tie this into a previous thread, the whole thing fits in
>> memory with room to spare ;-)
>
> Here are three ideas:
>
> What you need to do is collect a giant hunk of random text from
> various websites and compute a word frequency for each word in your
> bag.  The lower the frequency, the harder it is to guess.
Perhaps, though low usage-frequency doesn't mean it isn't easy to guess. 
For example "exterminate" isn't a very common word, but a person would 
probably get to "e?ter?in?te" with very will problems, and from there, 
exterminate is the only possibility. "mixt" is probably a more difficult 
word to guess. The progression is likely to be "?i??", (guess s,e,l,n,t) 
"?i?t", (guess r,f). At this point, the game would be over. There would 
be two possibilities left, "mixt" and "dipt".  People don't seem to 
think of "xt", so they might be more likely to guess "dipt".

This is of course, assuming they guess in the "highest letter frequency" 
order.

>
> Another measure would to compare your word with every other word in
> the bag.  The more letters it has in common in the corresponding slot
> the harder it would be guess.
>
> Look for rare letter pairs. These are EASY to guess.
The problem is that some words are false positives.  They seem ambiguous 
for the most part, but one letter absolutely gives it away.

[toc] | [prev] | [next] | [standalone]

#14972

From	Roedy Green <see_website@mindprod.com.invalid>
Date	2012-06-01 02:55 -0700
Message-ID	<q34hs7l2mer6nrmuhvh1e8b0729b8fgfje@4ax.com>
In reply to	#14963

On Thu, 31 May 2012 11:34:23 -0700, Daniel Pitts
<newsgroup.nospam@virtualinfinity.net> wrote, quoted or indirectly
quoted someone who said :

>The problem is that some words are false positives.  They seem ambiguous 
>for the most part, but one letter absolutely gives it away.

This is a messy problem. For every guessing strategy you need a
different algorithm to determine difficulty. But if you have enough
algorithms that each say produce a number 0..1 you can take the max as
the guessability or the sum, or something in between that gives extra
weight to high components (sum of cubes?)
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
Controlling complexity is the essence of computer programming.
~ Brian W. Kernighan 1942-01-01
.

[toc] | [prev] | [next] | [standalone]

#14984

From	Gene Wirchenko <genew@ocis.net>
Date	2012-06-01 09:34 -0700
Message-ID	<llrhs7ltk8lg5fv8kecnpv4md6v464qfbf@4ax.com>
In reply to	#14972

On Fri, 01 Jun 2012 02:55:52 -0700, Roedy Green
<see_website@mindprod.com.invalid> wrote:

>On Thu, 31 May 2012 11:34:23 -0700, Daniel Pitts
><newsgroup.nospam@virtualinfinity.net> wrote, quoted or indirectly
>quoted someone who said :
>
>>The problem is that some words are false positives.  They seem ambiguous 
>>for the most part, but one letter absolutely gives it away.
>
>This is a messy problem. For every guessing strategy you need a
>different algorithm to determine difficulty. But if you have enough
>algorithms that each say produce a number 0..1 you can take the max as
>the guessability or the sum, or something in between that gives extra
>weight to high components (sum of cubes?)

     Suppose someone games your choice of algorithms by choosing words
to give bad results by those algorithms.

Sincerely,

Gene Wirchenko

[toc] | [prev] | [next] | [standalone]

#14991

From	Roedy Green <see_website@mindprod.com.invalid>
Date	2012-06-01 11:58 -0700
Message-ID	<f64is7h4o6n4n4nn3t8h3devo0p5ho1jb9@4ax.com>
In reply to	#14984

On Fri, 01 Jun 2012 09:34:15 -0700, Gene Wirchenko <genew@ocis.net>
wrote, quoted or indirectly quoted someone who said :

>     Suppose someone games your choice of algorithms by choosing words
>to give bad results by those algorithms.


I understood the game to allow me to know the list of words in the bag
before I start play. Is that correct?
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
Controlling complexity is the essence of computer programming.
~ Brian W. Kernighan 1942-01-01
.

[toc] | [prev] | [next] | [standalone]

#14993

From	Lew <lewbloch@gmail.com>
Date	2012-06-01 12:46 -0700
Message-ID	<b8a0f059-a981-4701-b66f-9b5dfaaa30ac@googlegroups.com>
In reply to	#14991

Roedy Green wrote:
> I understood the game to allow me to know the list of words in the bag
> before I start play. Is that correct?

Isn't that true of all word games?

Or do I misunderstand what you mean by "in the bag"?

-- 
Lew

[toc] | [prev] | [next] | [standalone]

#14992

From	Lew <lewbloch@gmail.com>
Date	2012-06-01 12:45 -0700
Message-ID	<3db48bc5-4add-4cb7-b321-aeaef86e3177@googlegroups.com>
In reply to	#14984

Gene Wirchenko wrote:
> Roedy Green wrote:
>> Daniel Pitts wrote, quoted or indirectly quoted someone who said :
>>> The problem is that some words are false positives.  They seem ambiguous 
>>> for the most part, but one letter absolutely gives it away.
> >
>> This is a messy problem. For every guessing strategy you need a
>> different algorithm to determine difficulty. But if you have enough
>> algorithms that each say produce a number 0..1 you can take the max as
>> the guessability or the sum, or something in between that gives extra
>> weight to high components (sum of cubes?)

Not really. A statistical approach, that is, analysis of a large number of games
for frequency of words chosen, number of guesses by the opponent and if 
the opponent guessed the word, should account for strategies that include 
picking "easy" words and how well that works.

You don't even need to know the guessing strategies involved.

>      Suppose someone games your choice of algorithms by choosing words
> to give bad results by those algorithms.

They'd have to know what the algorithms are, and there has to be such a 
thing as a "bad result". A statistical approach uses facts - what actually 
happened. This is hard to game. If a player picks a word that is easy to 
guess, then their opponent is likely to guess it quickly. This is especially true 
if the results from games are fed back into the learning system so that it 
can adapt to sneaky opponents.

-- 
Lew

[toc] | [prev] | [next] | [standalone]

#14994

From	Gene Wirchenko <genew@ocis.net>
Date	2012-06-01 13:46 -0700
Message-ID	<ddais753ghoh1s3dv4ddpmv5elq8sqjjpt@4ax.com>
In reply to	#14992

On Fri, 1 Jun 2012 12:45:21 -0700 (PDT), Lew <lewbloch@gmail.com>
wrote:

>Gene Wirchenko wrote:
>> Roedy Green wrote:
>>> Daniel Pitts wrote, quoted or indirectly quoted someone who said :
>>>> The problem is that some words are false positives.  They seem ambiguous 
>>>> for the most part, but one letter absolutely gives it away.
>> >
>>> This is a messy problem. For every guessing strategy you need a
>>> different algorithm to determine difficulty. But if you have enough
>>> algorithms that each say produce a number 0..1 you can take the max as
>>> the guessability or the sum, or something in between that gives extra
>>> weight to high components (sum of cubes?)
>
>Not really. A statistical approach, that is, analysis of a large number of games
>for frequency of words chosen, number of guesses by the opponent and if 
>the opponent guessed the word, should account for strategies that include 
>picking "easy" words and how well that works.
>
>You don't even need to know the guessing strategies involved.
>
>>      Suppose someone games your choice of algorithms by choosing words
>> to give bad results by those algorithms.
>
>They'd have to know what the algorithms are, and there has to be such a 
>thing as a "bad result". A statistical approach uses facts - what actually 

     You mean like getting hanged?

>happened. This is hard to game. If a player picks a word that is easy to 
>guess, then their opponent is likely to guess it quickly. This is especially true 
>if the results from games are fed back into the learning system so that it 
>can adapt to sneaky opponents.

     I tend to play weighting my guesses by frequency ("etoainshrdlu"
and all that).  Someone could pick words that that does not work well
for.  If someone knows your strategy, they can game it.

Sincerely,

Gene Wirchenko

[toc] | [prev] | [standalone]

csiph-web

Slightly off-topic: Determining the strength of "Hangman" word.

Contents

#14955 — Slightly off-topic: Determining the strength of "Hangman" word.

#14956

#14959

#14963

#14972

#14984

#14991

#14993

#14992

#14994