最近我一直在iPhone上玩一款名为《Scramble》的游戏。有些人可能知道这个游戏叫拼字游戏。从本质上讲,当游戏开始时,你会得到一个字母矩阵:

F X I E
A M L O
E W B X
A S T U

The goal of the game is to find as many words as you can that can be formed by chaining letters together. You can start with any letter, and all the letters that surround it are fair game, and then once you move on to the next letter, all the letters that surround that letter are fair game, except for any previously used letters. So in the grid above, for example, I could come up with the words LOB, TUX, SEA, FAME, etc. Words must be at least 3 characters, and no more than NxN characters, which would be 16 in this game but can vary in some implementations. While this game is fun and addictive, I am apparently not very good at it and I wanted to cheat a little bit by making a program that would give me the best possible words (the longer the word the more points you get).

(来源:boggled.org)

不幸的是,我不太擅长算法或它们的效率等等。我的第一次尝试使用一个像这样的字典(约2.3MB),并进行线性搜索,试图匹配字典条目的组合。这需要花费很长时间来找到可能的单词,因为你每轮只有2分钟的时间,这是不够的。

我很有兴趣看看是否有任何Stackoverflowers可以提出更有效的解决方案。我主要是在寻找使用三大p的解决方案:Python、PHP和Perl,尽管任何使用Java或c++的东西也很酷,因为速度是至关重要的。

目前的解决方案:

Adam Rosenfield, Python, ~20岁 John Fouhy, Python, ~3秒 Kent Fredric, Perl, ~1s Darius Bacon, Python, ~1s rvarcher, VB。净,~ 1 s Paolo Bergantino, PHP(实时链接),~5s(本地~2s)


当前回答

只是为了好玩,我在bash中实现了一个。 它不是超级快,但很合理。

http://dev.xkyle.com/bashboggle/

其他回答

你可以把这个问题分成两部分:

某种搜索算法可以在网格中列举出可能的字符串。 一种测试字符串是否是有效单词的方法。

理想情况下,(2)还应该包括一种测试字符串是否是有效单词前缀的方法——这将允许您精简搜索并节省大量时间。

亚当·罗森菲尔德(Adam Rosenfield)的Trie是(2)的一个解决方案。它很优雅,可能是算法专家的首选,但有了现代语言和现代计算机,我们可能会更懒一点。此外,正如Kent所建议的,我们可以通过丢弃网格中没有字母的单词来减少字典的大小。这是一些蟒蛇:

def make_lookups(grid, fn='dict.txt'):
    # Make set of valid characters.
    chars = set()
    for word in grid:
        chars.update(word)

    words = set(x.strip() for x in open(fn) if set(x.strip()) <= chars)
    prefixes = set()
    for w in words:
        for i in range(len(w)+1):
            prefixes.add(w[:i])

    return words, prefixes

哇;常数时间前缀测试。加载你链接的字典需要几秒钟,但只有几秒钟:-)(注意words <= prefixes)

现在,对于第(1)部分,我倾向于用图表来思考。所以我将创建一个像这样的字典:

graph = { (x, y):set([(x0,y0), (x1,y1), (x2,y2)]), }

例如,graph[(x, y)]是你从位置(x, y)可以到达的坐标集。我还将添加一个虚拟节点None,它将连接到所有东西。

构建它有点笨拙,因为有8个可能的位置,你必须做边界检查。下面是一些相应笨拙的python代码:

def make_graph(grid):
    root = None
    graph = { root:set() }
    chardict = { root:'' }

    for i, row in enumerate(grid):
        for j, char in enumerate(row):
            chardict[(i, j)] = char
            node = (i, j)
            children = set()
            graph[node] = children
            graph[root].add(node)
            add_children(node, children, grid)

    return graph, chardict

def add_children(node, children, grid):
    x0, y0 = node
    for i in [-1,0,1]:
        x = x0 + i
        if not (0 <= x < len(grid)):
            continue
        for j in [-1,0,1]:
            y = y0 + j
            if not (0 <= y < len(grid[0])) or (i == j == 0):
                continue

            children.add((x,y))

这段代码还建立了一个字典映射(x,y)到相应的字符。这让我把一个位置列表转换成一个单词:

def to_word(chardict, pos_list):
    return ''.join(chardict[x] for x in pos_list)

最后,我们进行深度优先搜索。基本程序是:

搜索到达一个特定的节点。 检查到目前为止的路径是否可能是单词的一部分。如果不是,就不要进一步探索这个分支。 检查到目前为止的路径是否是一个单词。如果是,则添加到结果列表中。 探索迄今为止所有孩子未走的路。

Python:

def find_words(graph, chardict, position, prefix, results, words, prefixes):
    """ Arguments:
      graph :: mapping (x,y) to set of reachable positions
      chardict :: mapping (x,y) to character
      position :: current position (x,y) -- equals prefix[-1]
      prefix :: list of positions in current string
      results :: set of words found
      words :: set of valid words in the dictionary
      prefixes :: set of valid words or prefixes thereof
    """
    word = to_word(chardict, prefix)

    if word not in prefixes:
        return

    if word in words:
        results.add(word)

    for child in graph[position]:
        if child not in prefix:
            find_words(graph, chardict, child, prefix+[child], results, words, prefixes)

运行代码如下:

grid = ['fxie', 'amlo', 'ewbx', 'astu']
g, c = make_graph(grid)
w, p = make_lookups(grid)
res = set()
find_words(g, c, None, [], res, w, p)

检查保留区,看看答案。下面是为你的例子找到的单词列表,按大小排序:

 ['a', 'b', 'e', 'f', 'i', 'l', 'm', 'o', 's', 't',
 'u', 'w', 'x', 'ae', 'am', 'as', 'aw', 'ax', 'bo',
 'bu', 'ea', 'el', 'em', 'es', 'fa', 'ie', 'io', 'li',
 'lo', 'ma', 'me', 'mi', 'oe', 'ox', 'sa', 'se', 'st',
 'tu', 'ut', 'wa', 'we', 'xi', 'aes', 'ame', 'ami',
 'ase', 'ast', 'awa', 'awe', 'awl', 'blo', 'but', 'elb',
 'elm', 'fae', 'fam', 'lei', 'lie', 'lim', 'lob', 'lox',
 'mae', 'maw', 'mew', 'mil', 'mix', 'oil', 'olm', 'saw',
 'sea', 'sew', 'swa', 'tub', 'tux', 'twa', 'wae', 'was',
 'wax', 'wem', 'ambo', 'amil', 'amli', 'asem', 'axil',
 'axle', 'bleo', 'boil', 'bole', 'east', 'fame', 'limb',
 'lime', 'mesa', 'mewl', 'mile', 'milo', 'oime', 'sawt',
 'seam', 'seax', 'semi', 'stub', 'swam', 'twae', 'twas',
 'wame', 'wase', 'wast', 'weam', 'west', 'amble', 'awest',
 'axile', 'embox', 'limbo', 'limes', 'swami', 'embole',
 'famble', 'semble', 'wamble']

代码需要(字面上的)几秒钟来加载字典,但其余的在我的机器上是立即完成的。

令人惊讶的是,没有人尝试使用PHP版本。

这是John Fouhy的Python解决方案的PHP版本。

虽然我从其他人的答案中得到了一些建议,但这主要是抄袭约翰的。

$boggle = "fxie
           amlo
           ewbx
           astu";

$alphabet = str_split(str_replace(array("\n", " ", "\r"), "", strtolower($boggle)));
$rows = array_map('trim', explode("\n", $boggle));
$dictionary = file("C:/dict.txt");
$prefixes = array(''=>'');
$words = array();
$regex = '/[' . implode('', $alphabet) . ']{3,}$/S';
foreach($dictionary as $k=>$value) {
    $value = trim(strtolower($value));
    $length = strlen($value);
    if(preg_match($regex, $value)) {
        for($x = 0; $x < $length; $x++) {
            $letter = substr($value, 0, $x+1);
            if($letter == $value) {
                $words[$value] = 1;
            } else {
                $prefixes[$letter] = 1;
            }
        }
    }
}

$graph = array();
$chardict = array();
$positions = array();
$c = count($rows);
for($i = 0; $i < $c; $i++) {
    $l = strlen($rows[$i]);
    for($j = 0; $j < $l; $j++) {
        $chardict[$i.','.$j] = $rows[$i][$j];
        $children = array();
        $pos = array(-1,0,1);
        foreach($pos as $z) {
            $xCoord = $z + $i;
            if($xCoord < 0 || $xCoord >= count($rows)) {
                continue;
            }
            $len = strlen($rows[0]);
            foreach($pos as $w) {
                $yCoord = $j + $w;
                if(($yCoord < 0 || $yCoord >= $len) || ($z == 0 && $w == 0)) {
                    continue;
                }
                $children[] = array($xCoord, $yCoord);
            }
        }
        $graph['None'][] = array($i, $j);
        $graph[$i.','.$j] = $children;
    }
}

function to_word($chardict, $prefix) {
    $word = array();
    foreach($prefix as $v) {
        $word[] = $chardict[$v[0].','.$v[1]];
    }
    return implode("", $word);
}

function find_words($graph, $chardict, $position, $prefix, $prefixes, &$results, $words) {
    $word = to_word($chardict, $prefix);
    if(!isset($prefixes[$word])) return false;

    if(isset($words[$word])) {
        $results[] = $word;
    }

    foreach($graph[$position] as $child) {
        if(!in_array($child, $prefix)) {
            $newprefix = $prefix;
            $newprefix[] = $child;
            find_words($graph, $chardict, $child[0].','.$child[1], $newprefix, $prefixes, $results, $words);
        }
    }
}

$solution = array();
find_words($graph, $chardict, 'None', array(), $prefixes, $solution);
print_r($solution);

如果你想尝试的话,这里有一个实时链接。虽然在我的本地机器上需要大约2秒,但在我的web服务器上需要大约5秒。无论哪种情况,它都不是很快。尽管如此,它还是很可怕,所以我可以想象时间可以大大缩短。任何关于如何实现这一目标的建议都将不胜感激。PHP缺少元组,这使得坐标处理起来很奇怪,而且我无法理解到底发生了什么,这对我一点帮助都没有。

编辑:一些修复使它在本地少于1秒。

当我看到问题陈述时,我想到了“Trie”。但看到其他一些海报使用了这种方法,我寻找另一种不同的方法。可惜的是,Trie方法表现更好。我在我的机器上运行了Kent的Perl解决方案,在调整它以使用我的字典文件后,它花了0.31秒运行。我自己的perl实现需要0.54秒才能运行。

这就是我的方法:

Create a transition hash to model the legal transitions. Iterate through all 16^3 possible three letter combinations. In the loop, exclude illegal transitions and repeat visits to the same square. Form all the legal 3-letter sequences and store them in a hash. Then loop through all words in the dictionary. Exclude words that are too long or short Slide a 3-letter window across each word and see if it is among the 3-letter combos from step 2. Exclude words that fail. This eliminates most non-matches. If still not eliminated, use a recursive algorithm to see if the word can be formed by making paths through the puzzle. (This part is slow, but called infrequently.) Print out the words I found. I tried 3-letter and 4-letter sequences, but 4-letter sequences slowed the program down.

在我的代码中,我使用/usr/share/dict/words作为我的字典。它是MAC OS X和许多Unix系统的标准配置。如果你愿意,你可以使用另一个文件。要破解不同的谜题,只需更改变量@puzzle。这将很容易适应更大的矩阵。你只需要改变%transitions哈希值和%legalTransitions哈希值。

这种解决方案的优点是代码短,数据结构简单。

下面是Perl代码(我知道它使用了太多的全局变量):

#!/usr/bin/perl
use Time::HiRes  qw{ time };

sub readFile($);
sub findAllPrefixes($);
sub isWordTraceable($);
sub findWordsInPuzzle(@);

my $startTime = time;

# Puzzle to solve

my @puzzle = ( 
    F, X, I, E,
    A, M, L, O,
    E, W, B, X,
    A, S, T, U
);

my $minimumWordLength = 3;
my $maximumPrefixLength = 3; # I tried four and it slowed down.

# Slurp the word list.
my $wordlistFile = "/usr/share/dict/words";

my @words = split(/\n/, uc(readFile($wordlistFile)));
print "Words loaded from word list: " . scalar @words . "\n";

print "Word file load time: " . (time - $startTime) . "\n";
my $postLoad = time;

# Define the legal transitions from one letter position to another. 
# Positions are numbered 0-15.
#     0  1  2  3
#     4  5  6  7
#     8  9 10 11
#    12 13 14 15
my %transitions = ( 
   -1 => [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
    0 => [1,4,5], 
    1 => [0,2,4,5,6],
    2 => [1,3,5,6,7],
    3 => [2,6,7],
    4 => [0,1,5,8,9],
    5 => [0,1,2,4,6,8,9,10],
    6 => [1,2,3,5,7,9,10,11],
    7 => [2,3,6,10,11],
    8 => [4,5,9,12,13],
    9 => [4,5,6,8,10,12,13,14],
    10 => [5,6,7,9,11,13,14,15],
    11 => [6,7,10,14,15],
    12 => [8,9,13],
    13 => [8,9,10,12,14],
    14 => [9,10,11,13,15],
    15 => [10,11,14]
);

# Convert the transition matrix into a hash for easy access.
my %legalTransitions = ();
foreach my $start (keys %transitions) {
    my $legalRef = $transitions{$start};
    foreach my $stop (@$legalRef) {
        my $index = ($start + 1) * (scalar @puzzle) + ($stop + 1);
        $legalTransitions{$index} = 1;
    }
}

my %prefixesInPuzzle = findAllPrefixes($maximumPrefixLength);

print "Find prefixes time: " . (time - $postLoad) . "\n";
my $postPrefix = time;

my @wordsFoundInPuzzle = findWordsInPuzzle(@words);

print "Find words in puzzle time: " . (time - $postPrefix) . "\n";

print "Unique prefixes found: " . (scalar keys %prefixesInPuzzle) . "\n";
print "Words found (" . (scalar @wordsFoundInPuzzle) . ") :\n    " . join("\n    ", @wordsFoundInPuzzle) . "\n";

print "Total Elapsed time: " . (time - $startTime) . "\n";

###########################################

sub readFile($) {
    my ($filename) = @_;
    my $contents;
    if (-e $filename) {
        # This is magic: it opens and reads a file into a scalar in one line of code. 
        # See http://www.perl.com/pub/a/2003/11/21/slurp.html
        $contents = do { local( @ARGV, $/ ) = $filename ; <> } ; 
    }
    else {
        $contents = '';
    }
    return $contents;
}

# Is it legal to move from the first position to the second? They must be adjacent.
sub isLegalTransition($$) {
    my ($pos1,$pos2) = @_;
    my $index = ($pos1 + 1) * (scalar @puzzle) + ($pos2 + 1);
    return $legalTransitions{$index};
}

# Find all prefixes where $minimumWordLength <= length <= $maxPrefixLength
#
#   $maxPrefixLength ... Maximum length of prefix we will store. Three gives best performance. 
sub findAllPrefixes($) {
    my ($maxPrefixLength) = @_;
    my %prefixes = ();
    my $puzzleSize = scalar @puzzle;

    # Every possible N-letter combination of the letters in the puzzle 
    # can be represented as an integer, though many of those combinations
    # involve illegal transitions, duplicated letters, etc.
    # Iterate through all those possibilities and eliminate the illegal ones.
    my $maxIndex = $puzzleSize ** $maxPrefixLength;

    for (my $i = 0; $i < $maxIndex; $i++) {
        my @path;
        my $remainder = $i;
        my $prevPosition = -1;
        my $prefix = '';
        my %usedPositions = ();
        for (my $prefixLength = 1; $prefixLength <= $maxPrefixLength; $prefixLength++) {
            my $position = $remainder % $puzzleSize;

            # Is this a valid step?
            #  a. Is the transition legal (to an adjacent square)?
            if (! isLegalTransition($prevPosition, $position)) {
                last;
            }

            #  b. Have we repeated a square?
            if ($usedPositions{$position}) {
                last;
            }
            else {
                $usedPositions{$position} = 1;
            }

            # Record this prefix if length >= $minimumWordLength.
            $prefix .= $puzzle[$position];
            if ($prefixLength >= $minimumWordLength) {
                $prefixes{$prefix} = 1;
            }

            push @path, $position;
            $remainder -= $position;
            $remainder /= $puzzleSize;
            $prevPosition = $position;
        } # end inner for
    } # end outer for
    return %prefixes;
}

# Loop through all words in dictionary, looking for ones that are in the puzzle.
sub findWordsInPuzzle(@) {
    my @allWords = @_;
    my @wordsFound = ();
    my $puzzleSize = scalar @puzzle;
WORD: foreach my $word (@allWords) {
        my $wordLength = length($word);
        if ($wordLength > $puzzleSize || $wordLength < $minimumWordLength) {
            # Reject word as too short or too long.
        }
        elsif ($wordLength <= $maximumPrefixLength ) {
            # Word should be in the prefix hash.
            if ($prefixesInPuzzle{$word}) {
                push @wordsFound, $word;
            }
        }
        else {
            # Scan through the word using a window of length $maximumPrefixLength, looking for any strings not in our prefix list.
            # If any are found that are not in the list, this word is not possible.
            # If no non-matches are found, we have more work to do.
            my $limit = $wordLength - $maximumPrefixLength + 1;
            for (my $startIndex = 0; $startIndex < $limit; $startIndex ++) {
                if (! $prefixesInPuzzle{substr($word, $startIndex, $maximumPrefixLength)}) {
                    next WORD;
                }
            }
            if (isWordTraceable($word)) {
                # Additional test necessary: see if we can form this word by following legal transitions
                push @wordsFound, $word;
            }
        }

    }
    return @wordsFound;
}

# Is it possible to trace out the word using only legal transitions?
sub isWordTraceable($) {
    my $word = shift;
    return traverse([split(//, $word)], [-1]); # Start at special square -1, which may transition to any square in the puzzle.
}

# Recursively look for a path through the puzzle that matches the word.
sub traverse($$) {
    my ($lettersRef, $pathRef) = @_;
    my $index = scalar @$pathRef - 1;
    my $position = $pathRef->[$index];
    my $letter = $lettersRef->[$index];
    my $branchesRef =  $transitions{$position};
BRANCH: foreach my $branch (@$branchesRef) {
            if ($puzzle[$branch] eq $letter) {
                # Have we used this position yet?
                foreach my $usedBranch (@$pathRef) {
                    if ($usedBranch == $branch) {
                        next BRANCH;
                    }
                }
                if (scalar @$lettersRef == $index + 1) {
                    return 1; # End of word and success.
                }
                push @$pathRef, $branch;
                if (traverse($lettersRef, $pathRef)) {
                    return 1; # Recursive success.
                }
                else {
                    pop @$pathRef;
                }
            }
        }
    return 0; # No path found. Failed.
}

这是我想出的解决填字游戏的办法。我想这是最“python”的做事方式:

from itertools import combinations
from itertools import izip
from math import fabs

def isAllowedStep(current,step,length,doubleLength):
            # for step == length -1 not to be 0 => trivial solutions are not allowed
    return length > 1 and \
           current + step < doubleLength and current - step > 0 and \
           ( step == 1 or step == -1 or step <= length+1 or step >= length - 1)

def getPairwiseList(someList):
    iterableList = iter(someList)
    return izip(iterableList, iterableList)

def isCombinationAllowed(combination,length,doubleLength):

    for (first,second) in  getPairwiseList(combination):
        _, firstCoordinate = first
        _, secondCoordinate = second
        if not isAllowedStep(firstCoordinate, fabs(secondCoordinate-firstCoordinate),length,doubleLength):
            return False
    return True

def extractSolution(combinations):
    return ["".join([x[0] for x in combinationTuple]) for combinationTuple in combinations]


length = 4
text = tuple("".join("fxie amlo ewbx astu".split()))
textIndices = tuple(range(len(text)))
coordinates = zip(text,textIndices)

validCombinations = [combination for combination in combinations(coordinates,length) if isCombinationAllowed(combination,length,length*length)]
solution = extractSolution(validCombinations)

我善意地建议你不要将这部分用于所有可能的匹配,但它实际上提供了一种检查你生成的单词是否真的构成有效单词的可能性:

import mechanize
def checkWord(word):
    url = "https://en.oxforddictionaries.com/search?filter=dictionary&query="+word
    br = mechanize.Browser()
    br.set_handle_robots(False)
    response = br.open(url)
    text = response.read()
    return "no exact matches"  not in text.lower()

print [valid for valid in solution[:10] if checkWord(valid)]

你的搜索算法是否会随着搜索的继续而不断减少单词列表?

例如,在上面的搜索中,你的单词只能以13个字母开头(有效地减少了一半的开头字母)。

当你添加更多的字母排列时,它会进一步减少可用的单词集,减少必要的搜索。

我会从这里开始。