最近我一直在iPhone上玩一款名为《Scramble》的游戏。有些人可能知道这个游戏叫拼字游戏。从本质上讲,当游戏开始时,你会得到一个字母矩阵:

F X I E
A M L O
E W B X
A S T U

The goal of the game is to find as many words as you can that can be formed by chaining letters together. You can start with any letter, and all the letters that surround it are fair game, and then once you move on to the next letter, all the letters that surround that letter are fair game, except for any previously used letters. So in the grid above, for example, I could come up with the words LOB, TUX, SEA, FAME, etc. Words must be at least 3 characters, and no more than NxN characters, which would be 16 in this game but can vary in some implementations. While this game is fun and addictive, I am apparently not very good at it and I wanted to cheat a little bit by making a program that would give me the best possible words (the longer the word the more points you get).

(来源:boggled.org)

不幸的是,我不太擅长算法或它们的效率等等。我的第一次尝试使用一个像这样的字典(约2.3MB),并进行线性搜索,试图匹配字典条目的组合。这需要花费很长时间来找到可能的单词,因为你每轮只有2分钟的时间,这是不够的。

我很有兴趣看看是否有任何Stackoverflowers可以提出更有效的解决方案。我主要是在寻找使用三大p的解决方案:Python、PHP和Perl,尽管任何使用Java或c++的东西也很酷,因为速度是至关重要的。

目前的解决方案:

Adam Rosenfield, Python, ~20岁 John Fouhy, Python, ~3秒 Kent Fredric, Perl, ~1s Darius Bacon, Python, ~1s rvarcher, VB。净,~ 1 s Paolo Bergantino, PHP(实时链接),~5s(本地~2s)


当前回答

首先,阅读c#语言设计师如何解决一个相关问题: http://blogs.msdn.com/ericlippert/archive/2009/02/04/a-nasality-talisman-for-the-sultana-analyst.aspx。

像他一样,您可以从字典开始,并通过从字母排序的字母数组到可以根据这些字母拼写的单词列表创建字典来规范化单词。

接下来,开始从黑板上创建可能的单词并查找它们。我怀疑这将让你走得很远,但肯定有更多的技巧可以加快速度。

其他回答

import java.util.HashSet;
import java.util.Set;

/**
 * @author Sujeet Kumar (mrsujeet@gmail.com) It prints out all strings that can
 *         be formed by moving left, right, up, down, or diagonally and exist in
 *         a given dictionary , without repeating any cell. Assumes words are
 *         comprised of lower case letters. Currently prints words as many times
 *         as they appear, not just once. *
 */

public class BoggleGame 
{
  /* A sample 4X4 board/2D matrix */
  private static char[][] board = { { 's', 'a', 's', 'g' },
                                  { 'a', 'u', 't', 'h' }, 
                                  { 'r', 't', 'j', 'e' },
                                  { 'k', 'a', 'h', 'e' }
};

/* A sample dictionary which contains unique collection of words */
private static Set<String> dictionary = new HashSet<String>();

private static boolean[][] visited = new boolean[board.length][board[0].length];

public static void main(String[] arg) {
    dictionary.add("sujeet");
    dictionary.add("sarthak");
    findWords();

}

// show all words, starting from each possible starting place
private static void findWords() {
    for (int i = 0; i < board.length; i++) {
        for (int j = 0; j < board[i].length; j++) {
            StringBuffer buffer = new StringBuffer();
            dfs(i, j, buffer);
        }

    }

}

// run depth first search starting at cell (i, j)
private static void dfs(int i, int j, StringBuffer buffer) {
    /*
     * base case: just return in recursive call when index goes out of the
     * size of matrix dimension
     */
    if (i < 0 || j < 0 || i > board.length - 1 || j > board[i].length - 1) {
        return;
    }

    /*
     * base case: to return in recursive call when given cell is already
     * visited in a given string of word
     */
    if (visited[i][j] == true) { // can't visit a cell more than once
        return;
    }

    // not to allow a cell to reuse
    visited[i][j] = true;

    // combining cell character with other visited cells characters to form
    // word a potential word which may exist in dictionary
    buffer.append(board[i][j]);

    // found a word in dictionary. Print it.
    if (dictionary.contains(buffer.toString())) {
        System.out.println(buffer);
    }

    /*
     * consider all neighbors.For a given cell considering all adjacent
     * cells in horizontal, vertical and diagonal direction
     */
    for (int k = i - 1; k <= i + 1; k++) {
        for (int l = j - 1; l <= j + 1; l++) {
            dfs(k, l, buffer);

        }

    }
    buffer.deleteCharAt(buffer.length() - 1);
    visited[i][j] = false;
  }
}

你可以把这个问题分成两部分:

某种搜索算法可以在网格中列举出可能的字符串。 一种测试字符串是否是有效单词的方法。

理想情况下,(2)还应该包括一种测试字符串是否是有效单词前缀的方法——这将允许您精简搜索并节省大量时间。

亚当·罗森菲尔德(Adam Rosenfield)的Trie是(2)的一个解决方案。它很优雅,可能是算法专家的首选,但有了现代语言和现代计算机,我们可能会更懒一点。此外,正如Kent所建议的,我们可以通过丢弃网格中没有字母的单词来减少字典的大小。这是一些蟒蛇:

def make_lookups(grid, fn='dict.txt'):
    # Make set of valid characters.
    chars = set()
    for word in grid:
        chars.update(word)

    words = set(x.strip() for x in open(fn) if set(x.strip()) <= chars)
    prefixes = set()
    for w in words:
        for i in range(len(w)+1):
            prefixes.add(w[:i])

    return words, prefixes

哇;常数时间前缀测试。加载你链接的字典需要几秒钟,但只有几秒钟:-)(注意words <= prefixes)

现在,对于第(1)部分,我倾向于用图表来思考。所以我将创建一个像这样的字典:

graph = { (x, y):set([(x0,y0), (x1,y1), (x2,y2)]), }

例如,graph[(x, y)]是你从位置(x, y)可以到达的坐标集。我还将添加一个虚拟节点None,它将连接到所有东西。

构建它有点笨拙,因为有8个可能的位置,你必须做边界检查。下面是一些相应笨拙的python代码:

def make_graph(grid):
    root = None
    graph = { root:set() }
    chardict = { root:'' }

    for i, row in enumerate(grid):
        for j, char in enumerate(row):
            chardict[(i, j)] = char
            node = (i, j)
            children = set()
            graph[node] = children
            graph[root].add(node)
            add_children(node, children, grid)

    return graph, chardict

def add_children(node, children, grid):
    x0, y0 = node
    for i in [-1,0,1]:
        x = x0 + i
        if not (0 <= x < len(grid)):
            continue
        for j in [-1,0,1]:
            y = y0 + j
            if not (0 <= y < len(grid[0])) or (i == j == 0):
                continue

            children.add((x,y))

这段代码还建立了一个字典映射(x,y)到相应的字符。这让我把一个位置列表转换成一个单词:

def to_word(chardict, pos_list):
    return ''.join(chardict[x] for x in pos_list)

最后,我们进行深度优先搜索。基本程序是:

搜索到达一个特定的节点。 检查到目前为止的路径是否可能是单词的一部分。如果不是,就不要进一步探索这个分支。 检查到目前为止的路径是否是一个单词。如果是,则添加到结果列表中。 探索迄今为止所有孩子未走的路。

Python:

def find_words(graph, chardict, position, prefix, results, words, prefixes):
    """ Arguments:
      graph :: mapping (x,y) to set of reachable positions
      chardict :: mapping (x,y) to character
      position :: current position (x,y) -- equals prefix[-1]
      prefix :: list of positions in current string
      results :: set of words found
      words :: set of valid words in the dictionary
      prefixes :: set of valid words or prefixes thereof
    """
    word = to_word(chardict, prefix)

    if word not in prefixes:
        return

    if word in words:
        results.add(word)

    for child in graph[position]:
        if child not in prefix:
            find_words(graph, chardict, child, prefix+[child], results, words, prefixes)

运行代码如下:

grid = ['fxie', 'amlo', 'ewbx', 'astu']
g, c = make_graph(grid)
w, p = make_lookups(grid)
res = set()
find_words(g, c, None, [], res, w, p)

检查保留区,看看答案。下面是为你的例子找到的单词列表,按大小排序:

 ['a', 'b', 'e', 'f', 'i', 'l', 'm', 'o', 's', 't',
 'u', 'w', 'x', 'ae', 'am', 'as', 'aw', 'ax', 'bo',
 'bu', 'ea', 'el', 'em', 'es', 'fa', 'ie', 'io', 'li',
 'lo', 'ma', 'me', 'mi', 'oe', 'ox', 'sa', 'se', 'st',
 'tu', 'ut', 'wa', 'we', 'xi', 'aes', 'ame', 'ami',
 'ase', 'ast', 'awa', 'awe', 'awl', 'blo', 'but', 'elb',
 'elm', 'fae', 'fam', 'lei', 'lie', 'lim', 'lob', 'lox',
 'mae', 'maw', 'mew', 'mil', 'mix', 'oil', 'olm', 'saw',
 'sea', 'sew', 'swa', 'tub', 'tux', 'twa', 'wae', 'was',
 'wax', 'wem', 'ambo', 'amil', 'amli', 'asem', 'axil',
 'axle', 'bleo', 'boil', 'bole', 'east', 'fame', 'limb',
 'lime', 'mesa', 'mewl', 'mile', 'milo', 'oime', 'sawt',
 'seam', 'seax', 'semi', 'stub', 'swam', 'twae', 'twas',
 'wame', 'wase', 'wast', 'weam', 'west', 'amble', 'awest',
 'axile', 'embox', 'limbo', 'limes', 'swami', 'embole',
 'famble', 'semble', 'wamble']

代码需要(字面上的)几秒钟来加载字典,但其余的在我的机器上是立即完成的。

我花了3个月的时间致力于解决10个最佳点密集的5x5 Boggle板问题。

这个问题现在已经解决了,并在5个网页上进行了全面披露。有问题请联系我。

该棋盘分析算法使用显式堆栈,通过具有直接子信息的有向无环词图伪递归遍历棋盘方格,并使用时间戳跟踪机制。这很可能是世界上最先进的词汇数据结构。

该方案在四核上每秒评估大约10,000块非常好的电路板。(9500 +分)

父网页:

DeepSearch.c - http://www.pathcom.com/~vadco/deep.html

组件网页:

最佳记分牌- http://www.pathcom.com/~vadco/binary.html

高级词汇结构- http://www.pathcom.com/~vadco/adtdawg.html

板分析算法- http://www.pathcom.com/~vadco/guns.html

并行批处理- http://www.pathcom.com/~vadco/parallel.html

- 只有追求最好的人才会对这本全面的著作感兴趣。

最快的解决方案可能是将字典存储在一个trie中。然后,创建一个三元组队列(x, y, s),其中队列中的每个元素对应于一个可以在网格中拼写的单词的前缀s,结束于位置(x, y)。初始化队列中有N x N个元素(其中N是网格的大小),网格中的每个正方形都有一个元素。然后,算法进行如下:

While the queue is not empty:
  Dequeue a triple (x, y, s)
  For each square (x', y') with letter c adjacent to (x, y):
    If s+c is a word, output s+c
    If s+c is a prefix of a word, insert (x', y', s+c) into the queue

如果将字典存储在trie中,则可以在常数时间内测试s+c是否是单词或单词的前缀(前提是还在每个队列数据中保留一些额外的元数据,例如指向trie中当前节点的指针),因此此算法的运行时间为O(可拼写的单词数量)。

[编辑]下面是我刚刚编写的Python实现:

#!/usr/bin/python

class TrieNode:
    def __init__(self, parent, value):
        self.parent = parent
        self.children = [None] * 26
        self.isWord = False
        if parent is not None:
            parent.children[ord(value) - 97] = self

def MakeTrie(dictfile):
    dict = open(dictfile)
    root = TrieNode(None, '')
    for word in dict:
        curNode = root
        for letter in word.lower():
            if 97 <= ord(letter) < 123:
                nextNode = curNode.children[ord(letter) - 97]
                if nextNode is None:
                    nextNode = TrieNode(curNode, letter)
                curNode = nextNode
        curNode.isWord = True
    return root

def BoggleWords(grid, dict):
    rows = len(grid)
    cols = len(grid[0])
    queue = []
    words = []
    for y in range(cols):
        for x in range(rows):
            c = grid[y][x]
            node = dict.children[ord(c) - 97]
            if node is not None:
                queue.append((x, y, c, node))
    while queue:
        x, y, s, node = queue[0]
        del queue[0]
        for dx, dy in ((1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < cols and 0 <= y2 < rows:
                s2 = s + grid[y2][x2]
                node2 = node.children[ord(grid[y2][x2]) - 97]
                if node2 is not None:
                    if node2.isWord:
                        words.append(s2)
                    queue.append((x2, y2, s2, node2))

    return words

使用示例:

d = MakeTrie('/usr/share/dict/words')
print(BoggleWords(['fxie','amlo','ewbx','astu'], d))

输出:

['fa', 'xi', 'ie', 'io', 'el', 'am', 'ax', 'ae', 'aw', 'mi', 'ma', 'me', 'lo', 'li', 'oe', 'ox', 'em', 'ea', 'ea', 'es', 'wa', 'we', 'wa', 'bo', 'bu', 'as', 'aw', 'ae', 'st', 'se', 'sa', 'tu', 'ut', 'fam', 'fae', 'imi', 'eli', 'elm', 'elb', 'ami', 'ama', 'ame', 'aes', 'awl', 'awa', 'awe', 'awa', 'mix', 'mim', 'mil', 'mam', 'max', 'mae', 'maw', 'mew', 'mem', 'mes', 'lob', 'lox', 'lei', 'leo', 'lie', 'lim', 'oil', 'olm', 'ewe', 'eme', 'wax', 'waf', 'wae', 'waw', 'wem', 'wea', 'wea', 'was', 'waw', 'wae', 'bob', 'blo', 'bub', 'but', 'ast', 'ase', 'asa', 'awl', 'awa', 'awe', 'awa', 'aes', 'swa', 'swa', 'sew', 'sea', 'sea', 'saw', 'tux', 'tub', 'tut', 'twa', 'twa', 'tst', 'utu', 'fama', 'fame', 'ixil', 'imam', 'amli', 'amil', 'ambo', 'axil', 'axle', 'mimi', 'mima', 'mime', 'milo', 'mile', 'mewl', 'mese', 'mesa', 'lolo', 'lobo', 'lima', 'lime', 'limb', 'lile', 'oime', 'oleo', 'olio', 'oboe', 'obol', 'emim', 'emil', 'east', 'ease', 'wame', 'wawa', 'wawa', 'weam', 'west', 'wese', 'wast', 'wase', 'wawa', 'wawa', 'boil', 'bolo', 'bole', 'bobo', 'blob', 'bleo', 'bubo', 'asem', 'stub', 'stut', 'swam', 'semi', 'seme', 'seam', 'seax', 'sasa', 'sawt', 'tutu', 'tuts', 'twae', 'twas', 'twae', 'ilima', 'amble', 'axile', 'awest', 'mamie', 'mambo', 'maxim', 'mease', 'mesem', 'limax', 'limes', 'limbo', 'limbu', 'obole', 'emesa', 'embox', 'awest', 'swami', 'famble', 'mimble', 'maxima', 'embolo', 'embole', 'wamble', 'semese', 'semble', 'sawbwa', 'sawbwa']

Notes: This program doesn't output 1-letter words, or filter by word length at all. That's easy to add but not really relevant to the problem. It also outputs some words multiple times if they can be spelled in multiple ways. If a given word can be spelled in many different ways (worst case: every letter in the grid is the same (e.g. 'A') and a word like 'aaaaaaaaaa' is in your dictionary), then the running time will get horribly exponential. Filtering out duplicates and sorting is trivial to due after the algorithm has finished.

一个Node.JS JavaScript解决方案。在不到一秒钟的时间内计算所有100个独特的单词,其中包括阅读字典文件(MBA 2012)。

Output: ["FAM","TUX","TUB","FAE","ELI","ELM","ELB","TWA","TWA","SAW","AMI","SWA","SWA","AME","SEA","SEW","AES","AWL","AWE","SEA","AWA","MIX","MIL","AST","ASE","MAX","MAE","MAW","MEW","AWE","MES","AWL","LIE","LIM","AWA","AES","BUT","BLO","WAS","WAE","WEA","LEI","LEO","LOB","LOX","WEM","OIL","OLM","WEA","WAE","WAX","WAF","MILO","EAST","WAME","TWAS","TWAE","EMIL","WEAM","OIME","AXIL","WEST","TWAE","LIMB","WASE","WAST","BLEO","STUB","BOIL","BOLE","LIME","SAWT","LIMA","MESA","MEWL","AXLE","FAME","ASEM","MILE","AMIL","SEAX","SEAM","SEMI","SWAM","AMBO","AMLI","AXILE","AMBLE","SWAMI","AWEST","AWEST","LIMAX","LIMES","LIMBU","LIMBO","EMBOX","SEMBLE","EMBOLE","WAMBLE","FAMBLE"]

代码:

var fs = require('fs')

var Node = function(value, row, col) {
    this.value = value
    this.row = row
    this.col = col
}

var Path = function() {
    this.nodes = []
}

Path.prototype.push = function(node) {
    this.nodes.push(node)
    return this
}

Path.prototype.contains = function(node) {
    for (var i = 0, ii = this.nodes.length; i < ii; i++) {
        if (this.nodes[i] === node) {
            return true
        }
    }

    return false
}

Path.prototype.clone = function() {
    var path = new Path()
    path.nodes = this.nodes.slice(0)
    return path
}

Path.prototype.to_word = function() {
    var word = ''

    for (var i = 0, ii = this.nodes.length; i < ii; ++i) {
        word += this.nodes[i].value
    }

    return word
}

var Board = function(nodes, dict) {
    // Expects n x m array.
    this.nodes = nodes
    this.words = []
    this.row_count = nodes.length
    this.col_count = nodes[0].length
    this.dict = dict
}

Board.from_raw = function(board, dict) {
    var ROW_COUNT = board.length
      , COL_COUNT = board[0].length

    var nodes = []

    // Replace board with Nodes
    for (var i = 0, ii = ROW_COUNT; i < ii; ++i) {
        nodes.push([])
        for (var j = 0, jj = COL_COUNT; j < jj; ++j) {
            nodes[i].push(new Node(board[i][j], i, j))
        }
    }

    return new Board(nodes, dict)
}

Board.prototype.toString = function() {
    return JSON.stringify(this.nodes)
}

Board.prototype.update_potential_words = function(dict) {
    for (var i = 0, ii = this.row_count; i < ii; ++i) {
        for (var j = 0, jj = this.col_count; j < jj; ++j) {
            var node = this.nodes[i][j]
              , path = new Path()

            path.push(node)

            this.dfs_search(path)
        }
    }
}

Board.prototype.on_board = function(row, col) {
    return 0 <= row && row < this.row_count && 0 <= col && col < this.col_count
}

Board.prototype.get_unsearched_neighbours = function(path) {
    var last_node = path.nodes[path.nodes.length - 1]

    var offsets = [
        [-1, -1], [-1,  0], [-1, +1]
      , [ 0, -1],           [ 0, +1]
      , [+1, -1], [+1,  0], [+1, +1]
    ]

    var neighbours = []

    for (var i = 0, ii = offsets.length; i < ii; ++i) {
        var offset = offsets[i]
        if (this.on_board(last_node.row + offset[0], last_node.col + offset[1])) {

            var potential_node = this.nodes[last_node.row + offset[0]][last_node.col + offset[1]]
            if (!path.contains(potential_node)) {
                // Create a new path if on board and we haven't visited this node yet.
                neighbours.push(potential_node)
            }
        }
    }

    return neighbours
}

Board.prototype.dfs_search = function(path) {
    var path_word = path.to_word()

    if (this.dict.contains_exact(path_word) && path_word.length >= 3) {
        this.words.push(path_word)
    }

    var neighbours = this.get_unsearched_neighbours(path)

    for (var i = 0, ii = neighbours.length; i < ii; ++i) {
        var neighbour = neighbours[i]
        var new_path = path.clone()
        new_path.push(neighbour)

        if (this.dict.contains_prefix(new_path.to_word())) {
            this.dfs_search(new_path)
        }
    }
}

var Dict = function() {
    this.dict_array = []

    var dict_data = fs.readFileSync('./web2', 'utf8')
    var dict_array = dict_data.split('\n')

    for (var i = 0, ii = dict_array.length; i < ii; ++i) {
        dict_array[i] = dict_array[i].toUpperCase()
    }

    this.dict_array = dict_array.sort()
}

Dict.prototype.contains_prefix = function(prefix) {
    // Binary search
    return this.search_prefix(prefix, 0, this.dict_array.length)
}

Dict.prototype.contains_exact = function(exact) {
    // Binary search
    return this.search_exact(exact, 0, this.dict_array.length)
}

Dict.prototype.search_prefix = function(prefix, start, end) {
    if (start >= end) {
        // If no more place to search, return no matter what.
        return this.dict_array[start].indexOf(prefix) > -1
    }

    var middle = Math.floor((start + end)/2)

    if (this.dict_array[middle].indexOf(prefix) > -1) {
        // If we prefix exists, return true.
        return true
    } else {
        // Recurse
        if (prefix <= this.dict_array[middle]) {
            return this.search_prefix(prefix, start, middle - 1)
        } else {
            return this.search_prefix(prefix, middle + 1, end)
        }
    }
}

Dict.prototype.search_exact = function(exact, start, end) {
    if (start >= end) {
        // If no more place to search, return no matter what.
        return this.dict_array[start] === exact
    }

    var middle = Math.floor((start + end)/2)

    if (this.dict_array[middle] === exact) {
        // If we prefix exists, return true.
        return true
    } else {
        // Recurse
        if (exact <= this.dict_array[middle]) {
            return this.search_exact(exact, start, middle - 1)
        } else {
            return this.search_exact(exact, middle + 1, end)
        }
    }
}

var board = [
    ['F', 'X', 'I', 'E']
  , ['A', 'M', 'L', 'O']
  , ['E', 'W', 'B', 'X']
  , ['A', 'S', 'T', 'U']
]

var dict = new Dict()

var b = Board.from_raw(board, dict)
b.update_potential_words()
console.log(JSON.stringify(b.words.sort(function(a, b) {
    return a.length - b.length
})))