I've been playing a game on my iPhone called Scramble. Some of you may know this game as Boggle. Essentially, when the game starts you get a matrix of letters, like so:
F X I E
A M L O
E W B X
A S T U
The goal of the game is to find as many words as you can that can be formed by chaining letters together. You can start with any letter, and all the letters that surround it are fair game, and then once you move on to the next letter, all the letters that surround that letter are fair game, except for any previously used letters. So in the grid above, for example, I could come up with the words LOB, TUX, SEA, FAME, etc. Words must be at least 3 characters, and no more than NxN characters, which would be 16 in this game but can vary in some implementations. While this game is fun and addictive, I am apparently not very good at it and I wanted to cheat a little bit by making a program that would give me the best possible words (the longer the word the more points you get).
(source: boggled.org)
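To make the adjacency rule concrete, here is a small Python sketch (not taken from any of the answers; the helper name `neighbours` is hypothetical) that lists the cells touching a given cell on an N×N grid, diagonals included:

```python
def neighbours(row, col, n):
    """Yield the (row, col) cells adjacent to (row, col) on an n x n grid,
    including diagonals; the cell itself is excluded."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            if 0 <= r < n and 0 <= c < n:
                yield (r, c)


grid = ["FXIE", "AMLO", "EWBX", "ASTU"]
# Letters around the "L" at row 1, column 2
print(sorted(grid[r][c] for r, c in neighbours(1, 2, len(grid))))
# ['B', 'E', 'I', 'M', 'O', 'W', 'X', 'X']
```

Corner cells have only 3 neighbours and inner cells have 8, which is why every solver below needs some form of bounds check.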
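Unfortunately, I am not very good with algorithms or their efficiency and so forth. My first attempt uses a dictionary like this one (~2.3MB) and does a linear search, trying to match combinations against dictionary entries. This takes a very long time to find the possible words, and since you only get 2 minutes per round, it is simply not adequate.
I am interested to see if any Stackoverflowers can come up with more efficient solutions. I am mostly looking for solutions using the Big 3 Ps: Python, PHP, and Perl, although anything in Java or C++ is cool too, since speed is essential.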
Current solutions:
Adam Rosenfield, Python, ~20s
John Fouhy, Python, ~3s
Kent Fredric, Perl, ~1s
Darius Bacon, Python, ~1s
rvarcher, VB.NET, ~1s
Paolo Bergantino, PHP (live link), ~5s (~2s locally)
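I realize this question's time has come and gone, but since I was working on a solver myself and stumbled onto this while googling around, I thought I should post a reference to mine, as it seems a bit different from some of the others.
I chose to use a flat array for the game board and to do a recursive hunt from each letter on the board, traversing from valid neighbor to valid neighbor and extending the hunt as long as the current list of letters is a valid prefix in an index. While traversing, the current word is represented as a list of indexes into the board, not as the letters that make up the word. When checking against the index, the indexes are translated to letters and the check is done on those.
The index is a brute-force dictionary that's a bit like a trie, but allows Pythonic queries of the index. If the words 'cat' and 'cater' are in the list, you'll get this in the dictionary: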
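As an illustration of the flat-array idea, here is a minimal Python sketch (not the code from the gist; `flat_neighbours` and `path_to_word` are hypothetical names) assuming the n×n board is stored row by row in a flat list. Neighbours are computed from the index with `divmod`, and a path of indexes is only turned into letters when it needs checking:

```python
def flat_neighbours(index, n):
    """Indexes adjacent to `index` on an n x n board stored as a flat list."""
    row, col = divmod(index, n)
    result = []
    for r in range(max(0, row - 1), min(n, row + 2)):
        for c in range(max(0, col - 1), min(n, col + 2)):
            if (r, c) != (row, col):
                result.append(r * n + c)
    return result


def path_to_word(board, path):
    """Translate a path of board indexes into the letters it spells."""
    return "".join(board[i] for i in path)


board = list("fxieamloewbxastu")  # the sample grid, row by row
n = 4
path = [6, 7, 10]  # l -> o -> b
# every step of the path must move to an adjacent cell
assert all(b in flat_neighbours(a, n) for a, b in zip(path, path[1:]))
print(path_to_word(board, path))  # lob
```

Keeping the path as indexes makes the "no reused cell" rule a simple membership test on the path itself.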
d = { 'c': ['cat','cater'],
'ca': ['cat','cater'],
'cat': ['cat','cater'],
'cate': ['cater'],
'cater': ['cater'],
}
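So if current_word is 'ca', you know it is a valid prefix because 'ca' in d returns True (so continue the board traversal). And if current_word is 'cat', then you know it is a valid word because it is a valid prefix and 'cat' in d['cat'] returns True as well.
It felt like this allowed for some readable code that doesn't seem too slow. Like everyone else's, the expense in this system is reading/building the index; solving the board itself is pretty much noise.
The code is at http://gist.github.com/268079. It is intentionally vertical and naive, with lots of explicit validity checking, because I wanted to understand the problem without crufting it up with a bunch of magic or obscurity.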
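A short Python sketch of how such an index could be built and queried (an assumption about the construction; the gist may do it differently, and `build_index` is a hypothetical name). Every prefix of every word maps to the list of words it starts:

```python
from collections import defaultdict


def build_index(words):
    """Map every prefix of every word to the list of words it starts."""
    index = defaultdict(list)
    for word in words:
        for i in range(1, len(word) + 1):
            index[word[:i]].append(word)
    return dict(index)


d = build_index(["cat", "cater"])
current_word = "cat"
is_prefix = current_word in d                            # keep traversing?
is_word = is_prefix and current_word in d[current_word]  # a complete word?
print(is_prefix, is_word)  # True True
```

The cost is memory: a word of length k appears in k lists, which is why building this index dominates the run time.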
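So I wanted to add another PHP way of solving this, since everyone loves PHP.
There's a little refactoring I'd like to do, like using a regexp match against the dictionary file, but right now I'm just loading the entire dictionary file into a wordList.
I used a linked-list idea. Each Node has a character value, a location value, and a next pointer.
The location value is how I find out whether two nodes are connected.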
1 2 3 4
11 12 13 14
21 22 23 24
31 32 33 34
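So, using this grid, I know two nodes are connected if the first node's location equals the second node's location +/- 1 (same row), or +/- 9, 10 or 11 (the row above or below).
I use recursion for the main search. It takes a word from the wordList, finds all the possible starting points, and then recursively finds the next possible connection, keeping in mind that it can't go to a location it has already used (which is why I added $notInLoc).
Anyway, I know it needs some refactoring, and I'd love to hear ideas on how to make it cleaner, but it produces the correct results with the dictionary file I'm using. It takes around 3 to 6 seconds, depending on the number of vowels and combinations on the board. I know that once I preg_match the dictionary results, that will drop significantly.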
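The location trick can be checked exhaustively. This Python sketch (a translation of the idea, not the PHP answer's code) encodes cell (row, col) as row*10 + col + 1, so the gap between rows (4 to 11 is a difference of 7) stops the last cell of one row from "wrapping" onto the first cell of the next:

```python
LOCATIONS = [1, 2, 3, 4, 11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34]


def is_connected(loc1, loc2):
    """True when two location codes are adjacent cells: a difference of 1
    is the same row, and 9, 10 or 11 is the row above or below."""
    return abs(loc1 - loc2) in (1, 9, 10, 11)


# 4 and 11 (end of row 1, start of row 2) are not neighbours;
# 4 and 13 are diagonal neighbours, as are 12 and 23.
print(is_connected(4, 11), is_connected(4, 13), is_connected(12, 23))
# False True True
```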
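The same word-driven search can be sketched in Python (a hypothetical `can_spell` helper, not the answer's code). One detail worth noting: the search must try every candidate neighbour before giving up, and each branch needs its own set of used cells, otherwise a dead end on the first branch hides a valid path on a later one:

```python
def can_spell(grid, word):
    """Return True if `word` can be traced on `grid` through adjacent,
    not-yet-used cells (the rules from the question)."""
    n = len(grid)

    def extend(r, c, i, used):
        # (r, c) matches word[i]; try to place word[i + 1] next to it
        if i == len(word) - 1:
            return True
        used = used | {(r, c)}  # a new set per branch, so siblings don't interfere
        for nr in range(max(0, r - 1), min(n, r + 2)):
            for nc in range(max(0, c - 1), min(n, c + 2)):
                if (nr, nc) not in used and grid[nr][nc] == word[i + 1]:
                    if extend(nr, nc, i + 1, used):
                        return True
        return False  # every neighbour was tried before giving up

    return any(
        grid[r][c] == word[0] and extend(r, c, 0, frozenset())
        for r in range(n)
        for c in range(n)
    )


grid = ["fxie", "amlo", "ewbx", "astu"]
print([w for w in ["lob", "tux", "sea", "fame", "mile", "fix", "lol"] if can_spell(grid, w)])
# ['lob', 'tux', 'sea', 'fame', 'mile']
```

"lol" fails because the board has a single L and cells may not be reused.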
<?php
ini_set('xdebug.var_display_max_depth', 20);
ini_set('xdebug.var_display_max_children', 1024);
ini_set('xdebug.var_display_max_data', 1024);
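class Node {
// Each node stores a board letter, its location code and a pointer to the next node
public $value;
public $loc;
public $next = null;
function __construct($value) {
$this->value = $value;
}
}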
class Boggle {
var $root;
var $locList = array (1, 2, 3, 4, 11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34);
var $wordList = [];
var $foundWords = [];
function __construct($board) {
// Takes in a board string and creates all the nodes
$node = new Node($board[0]);
$node->loc = $this->locList[0];
$this->root = $node;
for ($i = 1; $i < strlen($board); $i++) {
$node->next = new Node($board[$i]);
$node->next->loc = $this->locList[$i];
$node = $node->next;
}
// Load in a dictionary file
// TODO: use a regexp to eliminate all the words that could never appear on
// this board; for now every word of 3+ letters is loaded into wordList
$handle = fopen("dict.txt", "r");
if ($handle) {
while (($line = fgets($handle)) !== false) {
// process the line read.
$line = trim($line);
if (strlen($line) > 2) {
$this->wordList[] = $line;
}
}
fclose($handle);
} else {
// error opening the file.
echo "Problem with the file.";
}
}
function isConnected($node1, $node2) {
// Determines if 2 nodes are connected on the boggle board: a location
// difference of 1 is the same row; 9, 10 or 11 is the row above or below
return in_array(abs($node1->loc - $node2->loc), [1, 9, 10, 11]);
}
function find($value, $notInLoc = []) {
// Returns a node with the value that isn't in a location
$current = $this->root;
while($current) {
if ($current->value == $value && !in_array($current->loc, $notInLoc)) {
return $current;
}
if (isset($current->next)) {
$current = $current->next;
} else {
break;
}
}
return false;
}
function findAll($value) {
// Returns an array of nodes with a specific value
$current = $this->root;
$foundNodes = [];
while ($current) {
if ($current->value == $value) {
$foundNodes[] = $current;
}
if (isset($current->next)) {
$current = $current->next;
} else {
break;
}
}
return (empty($foundNodes)) ? false : $foundNodes;
}
function findAllConnectedTo($node, $value, $notInLoc = []) {
// Returns an array of nodes that are connected to a specific node and
// contain a specific value and are not in a certain location
$nodeList = $this->findAll($value);
$newList = [];
if ($nodeList) {
foreach ($nodeList as $node2) {
if (!in_array($node2->loc, $notInLoc) && $this->isConnected($node, $node2)) {
$newList[] = $node2;
}
}
}
return (empty($newList)) ? false : $newList;
}
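function inner($word, $list, $i = 0, $notInLoc = []) {
// Tries to place $word[$i + 1] next to one of the nodes in $list, then
// recurses; every candidate is tried before giving up (backtracking)
$i++;
foreach ($list as $node) {
// Each candidate gets its own copy of the used locations
$used = $notInLoc;
$used[] = $node->loc;
if ($list2 = $this->findAllConnectedTo($node, $word[$i], $used)) {
if ($i == (strlen($word) - 1)) {
return true;
}
if ($this->inner($word, $list2, $i, $used)) {
return true;
}
}
}
return false;
}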
function findWord($word) {
if ($list = $this->findAll($word[0])) {
return $this->inner($word, $list);
}
return false;
}
function findAllWords() {
foreach($this->wordList as $word) {
if ($this->findWord($word)) {
$this->foundWords[] = $word;
}
}
}
function displayBoard() {
$current = $this->root;
for ($i=0; $i < 4; $i++) {
echo $current->value . " " . $current->next->value . " " . $current->next->next->value . " " . $current->next->next->next->value . "<br />";
if ($i < 3) {
$current = $current->next->next->next->next;
}
}
}
}
function randomBoardString() {
return substr(str_shuffle(str_repeat("abcdefghijklmnopqrstuvwxyz", 16)), 0, 16);
}
$myBoggle = new Boggle(randomBoardString());
$myBoggle->displayBoard();
$x = microtime(true);
$myBoggle->findAllWords();
$y = microtime(true);
echo ($y-$x);
var_dump($myBoggle->foundWords);
?>
How about simply sorting the dictionary and using binary search on it?
It returns the whole list in 0.35 seconds and could be optimized further (for example, by removing words containing letters that aren't on the board).
from bisect import bisect_left

# Load the dictionary and sort it so it can be binary-searched
with open("dict.txt") as f:
    D = sorted(line.strip() for line in f)

def neibs(M, x, y):
    # Yield the coordinates of every in-bounds cell around (x, y)
    n = len(M)
    for i in range(-1, 2):
        for j in range(-1, 2):
            if (i == 0 and j == 0) or not (0 <= x + i < n and 0 <= y + j < n):
                continue
            yield (x + i, y + j)

def findWords(M, D, x, y, prefix, used=frozenset()):
    prefix = prefix + M[x][y]
    used = used | {(x, y)}  # a cell may not be reused within one word
    # find the word in the dictionary by binary search
    found = bisect_left(D, prefix)
    # nothing in the dictionary sorts at or after prefix: stop
    if found == len(D):
        return
    # if found (and at least 3 letters long), yield it
    if D[found] == prefix and len(prefix) >= 3:
        yield prefix
    # if what we found is not even a prefix, then return
    # (there is no point in going further)
    if not D[found].startswith(prefix):
        return
    # recurse into the unused neighbours
    for nx, ny in neibs(M, x, y):
        if (nx, ny) not in used:
            yield from findWords(M, D, nx, ny, prefix, used)

def solve(M, D):
    # check each starting point
    for x in range(len(M)):
        for y in range(len(M)):
            yield from findWords(M, D, x, y, "")

grid = "fxie amlo ewbx astu".split()
print(list(solve(grid, D)))
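My answer works like the others here, but I'm posting it because it looks a bit faster than the other Python solutions, thanks to setting up the dictionary faster. (I checked this against John Fouhy's solution.) After setup, the time to solve is down in the noise.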
import re

grid = "fxie amlo ewbx astu".split()
nrows, ncols = len(grid), len(grid[0])

# A dictionary word that could be a solution must use only the grid's
# letters and have length >= 3. (With a case-insensitive match.)
alphabet = ''.join(set(''.join(grid)))
bogglable = re.compile('[' + alphabet + ']{3,}$', re.I).match

words = set(word.rstrip('\n') for word in open('words') if bogglable(word))
prefixes = set(word[:i] for word in words
               for i in range(2, len(word)+1))

def solve():
    for y, row in enumerate(grid):
        for x, letter in enumerate(row):
            for result in extending(letter, ((x, y),)):
                yield result

def extending(prefix, path):
    if prefix in words:
        yield (prefix, path)
    for (nx, ny) in neighbors(path[-1]):
        if (nx, ny) not in path:
            prefix1 = prefix + grid[ny][nx]
            if prefix1 in prefixes:
                for result in extending(prefix1, path + ((nx, ny),)):
                    yield result

def neighbors(cell):
    x, y = cell
    for nx in range(max(0, x-1), min(x+2, ncols)):
        for ny in range(max(0, y-1), min(y+2, nrows)):
            yield (nx, ny)

Sample usage:

# Print a maximal-length word and its path:
print(max(solve(), key=lambda wp: len(wp[0])))
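Edit: Filter out words shorter than 3 letters.
Edit 2: I was curious why Kent Fredric's Perl solution was faster; it turns out to use regular-expression matching instead of a set of characters. Doing the same in Python about doubles the speed.
I would suggest making a tree of letters based on the words. The tree would be composed of letter structs, like this: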
letter: char
isWord: boolean
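Then you build up the tree, with each depth adding a new letter. In other words, the first level would be the alphabet; then from each of those nodes there would be another 26 entries, and so on, until you've spelled out all the words. Hang onto this parsed tree, and it will make looking up all possible answers faster.
With this parsed tree, you can find solutions very quickly. Here's the pseudocode: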
BEGIN:
For each letter:
if the struct representing it on the current depth has isWord == true, enter it as an answer.
Cycle through all its neighbors; if there is a child of the current node corresponding to the letter, recursively call BEGIN on it.
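The pseudocode above could look roughly like this in Python. This is a sketch under assumptions: nested dicts stand in for the letter structs, and a `"$"` key plays the role of the `isWord` flag; `build_trie` and `solve` are hypothetical names:

```python
def build_trie(words):
    """Nested dicts, one level per letter; the key '$' marks the end of a
    word (the isWord flag)."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node["$"] = True
    return root


def solve(grid, trie, min_len=3):
    n = len(grid)
    found = set()

    def visit(r, c, node, prefix, used):
        node = node.get(grid[r][c])
        if node is None:  # no dictionary word continues this way: prune
            return
        prefix += grid[r][c]
        if "$" in node and len(prefix) >= min_len:
            found.add(prefix)
        used = used | {(r, c)}
        for nr in range(max(0, r - 1), min(n, r + 2)):
            for nc in range(max(0, c - 1), min(n, c + 2)):
                if (nr, nc) not in used:
                    visit(nr, nc, node, prefix, used)

    for r in range(n):
        for c in range(n):
            visit(r, c, trie, "", frozenset())
    return found


grid = ["fxie", "amlo", "ewbx", "astu"]
trie = build_trie(["lob", "tux", "sea", "fame", "fix", "lol"])
print(sorted(solve(grid, trie)))  # ['fame', 'lob', 'sea', 'tux']
```

The pruning step is what makes the trie pay off: a path is abandoned as soon as no dictionary word starts with the letters collected so far.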
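This could be sped up with a bit of dynamic programming. For example, in your sample, the two 'A's are both next to an 'E' and a 'W', which (from the point they hit them) would be identical. I don't have enough time to really spell out the code for this, but I think you can gather the idea.
Also, I'm sure you'll find other solutions if you Google for "Boggle solver".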
import java.util.HashSet;
import java.util.Set;
/**
* @author Sujeet Kumar (mrsujeet@gmail.com) Prints all strings that can
* be formed by moving left, right, up, down, or diagonally and that exist in
* a given dictionary, without repeating any cell. Assumes words are
* composed of lowercase letters. Currently prints a word as many times
* as it is found, not just once.
*/
public class BoggleGame
{
/* A sample 4X4 board/2D matrix */
private static char[][] board = { { 's', 'a', 's', 'g' },
{ 'a', 'u', 't', 'h' },
{ 'r', 't', 'j', 'e' },
{ 'k', 'a', 'h', 'e' }
};
/* A sample dictionary containing a unique collection of words */
private static Set<String> dictionary = new HashSet<String>();
private static boolean[][] visited = new boolean[board.length][board[0].length];
public static void main(String[] arg) {
dictionary.add("sujeet");
dictionary.add("sarthak");
findWords();
}
// show all words, starting from each possible starting place
private static void findWords() {
for (int i = 0; i < board.length; i++) {
for (int j = 0; j < board[i].length; j++) {
StringBuffer buffer = new StringBuffer();
dfs(i, j, buffer);
}
}
}
// run depth first search starting at cell (i, j)
private static void dfs(int i, int j, StringBuffer buffer) {
/*
* base case: return when the indexes fall outside the
* dimensions of the matrix
*/
if (i < 0 || j < 0 || i > board.length - 1 || j > board[i].length - 1) {
return;
}
/*
* base case: return when the cell has already been used
* in the current word
*/
if (visited[i][j] == true) { // can't visit a cell more than once
return;
}
// mark the cell so it cannot be reused in this word
visited[i][j] = true;
// combine this cell's character with the characters of the cells visited
// so far to form a potential word that may exist in the dictionary
buffer.append(board[i][j]);
// found a word in dictionary. Print it.
if (dictionary.contains(buffer.toString())) {
System.out.println(buffer);
}
/*
* consider all neighbors: for a given cell, visit all adjacent
* cells in the horizontal, vertical and diagonal directions
*/
for (int k = i - 1; k <= i + 1; k++) {
for (int l = j - 1; l <= j + 1; l++) {
dfs(k, l, buffer);
}
}
buffer.deleteCharAt(buffer.length() - 1);
visited[i][j] = false;
}
}