最近我一直在iPhone上玩一款名为《Scramble》的游戏。有些人可能知道这个游戏叫拼字游戏。从本质上讲,当游戏开始时,你会得到一个字母矩阵:

F X I E
A M L O
E W B X
A S T U

The goal of the game is to find as many words as you can that can be formed by chaining letters together. You can start with any letter, and all the letters that surround it are fair game, and then once you move on to the next letter, all the letters that surround that letter are fair game, except for any previously used letters. So in the grid above, for example, I could come up with the words LOB, TUX, SEA, FAME, etc. Words must be at least 3 characters, and no more than NxN characters, which would be 16 in this game but can vary in some implementations. While this game is fun and addictive, I am apparently not very good at it and I wanted to cheat a little bit by making a program that would give me the best possible words (the longer the word the more points you get).

(来源:boggled.org)

不幸的是,我不太擅长算法或它们的效率等等。我的第一次尝试使用一个像这样的字典(约2.3MB),并进行线性搜索,试图匹配字典条目的组合。这需要花费很长时间来找到可能的单词,因为你每轮只有2分钟的时间,这是不够的。

我很有兴趣看看是否有任何Stackoverflowers可以提出更有效的解决方案。我主要是在寻找使用三大p的解决方案:Python、PHP和Perl,尽管任何使用Java或c++的东西也很酷,因为速度是至关重要的。

目前的解决方案:

Adam Rosenfield, Python, ~20岁 John Fouhy, Python, ~3秒 Kent Fredric, Perl, ~1s Darius Bacon, Python, ~1s rvarcher, VB。净,~ 1 s Paolo Bergantino, PHP(实时链接),~5s(本地~2s)


当前回答

    package ProblemSolving;

import java.util.HashSet;
import java.util.Set;

/**
 * Given a 2-dimensional array of characters and a
 * dictionary in which a word can be searched in O(1) time.
 * Need to print all the words from array which are present
 * in dictionary. Word can be formed in any direction but
 * has to end at any edge of array.
 * (Need not worry much about the dictionary)
 */
public class DictionaryWord {
    private static char[][] matrix = new char[][]{
            {'a', 'f', 'h', 'u', 'n'},
            {'e', 't', 'a', 'i', 'r'},
            {'a', 'e', 'g', 'g', 'o'},
            {'t', 'r', 'm', 'l', 'p'}
    };
    private static int dim_x = matrix.length;
    private static int dim_y = matrix[matrix.length -1].length;
    private static Set<String> wordSet = new HashSet<String>();

    public static void main(String[] args) {
        //dictionary
        wordSet.add("after");
        wordSet.add("hate");
        wordSet.add("hair");
        wordSet.add("air");
        wordSet.add("eat");
        wordSet.add("tea");

        for (int x = 0; x < dim_x; x++) {
            for (int y = 0; y < dim_y; y++) {
                checkAndPrint(matrix[x][y] + "");
                int[][] visitedMap = new int[dim_x][dim_y];
                visitedMap[x][y] = 1;
                recursion(matrix[x][y] + "", visitedMap, x, y);
            }
        }
    }

    private static void checkAndPrint(String word) {
        if (wordSet.contains(word)) {
            System.out.println(word);
        }
    }

    private static void recursion(String word, int[][] visitedMap, int x, int y) {
        for (int i = Math.max(x - 1, 0); i < Math.min(x + 2, dim_x); i++) {
            for (int j = Math.max(y - 1, 0); j < Math.min(y + 2, dim_y); j++) {
                if (visitedMap[i][j] == 1) {
                    continue;
                } else {
                    int[][] newVisitedMap = new int[dim_x][dim_y];
                    for (int p = 0; p < dim_x; p++) {
                        for (int q = 0; q < dim_y; q++) {
                           newVisitedMap[p][q] = visitedMap[p][q];
                        }
                    }
                    newVisitedMap[i][j] = 1;
                    checkAndPrint(word + matrix[i][j]);
                    recursion(word + matrix[i][j], newVisitedMap, i, j);
                }
            }
        }
    }

}

其他回答

首先,阅读c#语言设计师如何解决一个相关问题: http://blogs.msdn.com/ericlippert/archive/2009/02/04/a-nasality-talisman-for-the-sultana-analyst.aspx。

像他一样,您可以从字典开始,并通过从字母排序的字母数组到可以根据这些字母拼写的单词列表创建字典来规范化单词。

接下来,开始从黑板上创建可能的单词并查找它们。我怀疑这将让你走得很远,但肯定有更多的技巧可以加快速度。

对于字典加速,有一个通用的转换/过程可以大大减少提前的字典比较。

鉴于上面的网格只包含16个字符,其中一些字符是重复的,您可以通过简单地过滤掉具有不可获取字符的条目来大大减少字典中的总键数。

我认为这是明显的优化,但看到没有人这么做,我就提出来了。

在输入过程中,它将我的字典从20万个键减少到只有2000个键。这至少减少了内存开销,并且这肯定会映射到某个地方的速度增加,因为内存不是无限快的。

Perl实现

我的实现有点头重脚轻,因为我重视能够知道每个提取的字符串的确切路径,而不仅仅是其中的有效性。

我也有一些适应在那里,理论上允许一个网格中有洞的功能,网格有不同大小的线(假设你得到了正确的输入,它以某种方式对齐)。

早期筛选器是我的应用程序中最重要的瓶颈,正如之前怀疑的那样,注释掉了一行从1.5s膨胀到7.5s的代码。

在执行时,它似乎认为所有的个位数都在他们自己的有效单词上,但我很确定这是由于字典文件的工作方式。

它有点臃肿,但至少我重用了cpan中的Tree::Trie

其中有些部分是受到现有实现的启发,有些是我已经想到的。

建设性的批评和改进的方法欢迎(/我注意到他从来没有在CPAN上搜索过一个拼字游戏解决器,但这更有趣)

新标准更新

#!/usr/bin/perl 

use strict;
use warnings;

{

  # this package manages a given path through the grid.
  # Its an array of matrix-nodes in-order with
  # Convenience functions for pretty-printing the paths
  # and for extending paths as new paths.

  # Usage:
  # my $p = Prefix->new(path=>[ $startnode ]);
  # my $c = $p->child( $extensionNode );
  # print $c->current_word ;

  package Prefix;
  use Moose;

  has path => (
      isa     => 'ArrayRef[MatrixNode]',
      is      => 'rw',
      default => sub { [] },
  );
  has current_word => (
      isa        => 'Str',
      is         => 'rw',
      lazy_build => 1,
  );

  # Create a clone of this object
  # with a longer path

  # $o->child( $successive-node-on-graph );

  sub child {
      my $self    = shift;
      my $newNode = shift;
      my $f       = Prefix->new();

      # Have to do this manually or other recorded paths get modified
      push @{ $f->{path} }, @{ $self->{path} }, $newNode;
      return $f;
  }

  # Traverses $o->path left-to-right to get the string it represents.

  sub _build_current_word {
      my $self = shift;
      return join q{}, map { $_->{value} } @{ $self->{path} };
  }

  # Returns  the rightmost node on this path

  sub tail {
      my $self = shift;
      return $self->{path}->[-1];
  }

  # pretty-format $o->path

  sub pp_path {
      my $self = shift;
      my @path =
        map { '[' . $_->{x_position} . ',' . $_->{y_position} . ']' }
        @{ $self->{path} };
      return "[" . join( ",", @path ) . "]";
  }

  # pretty-format $o
  sub pp {
      my $self = shift;
      return $self->current_word . ' => ' . $self->pp_path;
  }

  __PACKAGE__->meta->make_immutable;
}

{

  # Basic package for tracking node data
  # without having to look on the grid.
  # I could have just used an array or a hash, but that got ugly.

# Once the matrix is up and running it doesn't really care so much about rows/columns,
# Its just a sea of points and each point has adjacent points.
# Relative positioning is only really useful to map it back to userspace

  package MatrixNode;
  use Moose;

  has x_position => ( isa => 'Int', is => 'rw', required => 1 );
  has y_position => ( isa => 'Int', is => 'rw', required => 1 );
  has value      => ( isa => 'Str', is => 'rw', required => 1 );
  has siblings   => (
      isa     => 'ArrayRef[MatrixNode]',
      is      => 'rw',
      default => sub { [] }
  );

# Its not implicitly uni-directional joins. It would be more effient in therory
# to make the link go both ways at the same time, but thats too hard to program around.
# and besides, this isn't slow enough to bother caring about.

  sub add_sibling {
      my $self    = shift;
      my $sibling = shift;
      push @{ $self->siblings }, $sibling;
  }

  # Convenience method to derive a path starting at this node

  sub to_path {
      my $self = shift;
      return Prefix->new( path => [$self] );
  }
  __PACKAGE__->meta->make_immutable;

}

{

  package Matrix;
  use Moose;

  has rows => (
      isa     => 'ArrayRef',
      is      => 'rw',
      default => sub { [] },
  );

  has regex => (
      isa        => 'Regexp',
      is         => 'rw',
      lazy_build => 1,
  );

  has cells => (
      isa        => 'ArrayRef',
      is         => 'rw',
      lazy_build => 1,
  );

  sub add_row {
      my $self = shift;
      push @{ $self->rows }, [@_];
  }

  # Most of these functions from here down are just builder functions,
  # or utilities to help build things.
  # Some just broken out to make it easier for me to process.
  # All thats really useful is add_row
  # The rest will generally be computed, stored, and ready to go
  # from ->cells by the time either ->cells or ->regex are called.

  # traverse all cells and make a regex that covers them.
  sub _build_regex {
      my $self  = shift;
      my $chars = q{};
      for my $cell ( @{ $self->cells } ) {
          $chars .= $cell->value();
      }
      $chars = "[^$chars]";
      return qr/$chars/i;
  }

  # convert a plain cell ( ie: [x][y] = 0 )
  # to an intelligent cell ie: [x][y] = object( x, y )
  # we only really keep them in this format temporarily
  # so we can go through and tie in neighbouring information.
  # after the neigbouring is done, the grid should be considered inoperative.

  sub _convert {
      my $self = shift;
      my $x    = shift;
      my $y    = shift;
      my $v    = $self->_read( $x, $y );
      my $n    = MatrixNode->new(
          x_position => $x,
          y_position => $y,
          value      => $v,
      );
      $self->_write( $x, $y, $n );
      return $n;
  }

# go through the rows/collums presently available and freeze them into objects.

  sub _build_cells {
      my $self = shift;
      my @out  = ();
      my @rows = @{ $self->{rows} };
      for my $x ( 0 .. $#rows ) {
          next unless defined $self->{rows}->[$x];
          my @col = @{ $self->{rows}->[$x] };
          for my $y ( 0 .. $#col ) {
              next unless defined $self->{rows}->[$x]->[$y];
              push @out, $self->_convert( $x, $y );
          }
      }
      for my $c (@out) {
          for my $n ( $self->_neighbours( $c->x_position, $c->y_position ) ) {
              $c->add_sibling( $self->{rows}->[ $n->[0] ]->[ $n->[1] ] );
          }
      }
      return \@out;
  }

  # given x,y , return array of points that refer to valid neighbours.
  sub _neighbours {
      my $self = shift;
      my $x    = shift;
      my $y    = shift;
      my @out  = ();
      for my $sx ( -1, 0, 1 ) {
          next if $sx + $x < 0;
          next if not defined $self->{rows}->[ $sx + $x ];
          for my $sy ( -1, 0, 1 ) {
              next if $sx == 0 && $sy == 0;
              next if $sy + $y < 0;
              next if not defined $self->{rows}->[ $sx + $x ]->[ $sy + $y ];
              push @out, [ $sx + $x, $sy + $y ];
          }
      }
      return @out;
  }

  sub _has_row {
      my $self = shift;
      my $x    = shift;
      return defined $self->{rows}->[$x];
  }

  sub _has_cell {
      my $self = shift;
      my $x    = shift;
      my $y    = shift;
      return defined $self->{rows}->[$x]->[$y];
  }

  sub _read {
      my $self = shift;
      my $x    = shift;
      my $y    = shift;
      return $self->{rows}->[$x]->[$y];
  }

  sub _write {
      my $self = shift;
      my $x    = shift;
      my $y    = shift;
      my $v    = shift;
      $self->{rows}->[$x]->[$y] = $v;
      return $v;
  }

  __PACKAGE__->meta->make_immutable;
}

use Tree::Trie;

sub readDict {
  my $fn = shift;
  my $re = shift;
  my $d  = Tree::Trie->new();

  # Dictionary Loading
  open my $fh, '<', $fn;
  while ( my $line = <$fh> ) {
      chomp($line);

 # Commenting the next line makes it go from 1.5 seconds to 7.5 seconds. EPIC.
      next if $line =~ $re;    # Early Filter
      $d->add( uc($line) );
  }
  return $d;
}

sub traverseGraph {
  my $d     = shift;
  my $m     = shift;
  my $min   = shift;
  my $max   = shift;
  my @words = ();

  # Inject all grid nodes into the processing queue.

  my @queue =
    grep { $d->lookup( $_->current_word ) }
    map  { $_->to_path } @{ $m->cells };

  while (@queue) {
      my $item = shift @queue;

      # put the dictionary into "exact match" mode.

      $d->deepsearch('exact');

      my $cword = $item->current_word;
      my $l     = length($cword);

      if ( $l >= $min && $d->lookup($cword) ) {
          push @words,
            $item;    # push current path into "words" if it exactly matches.
      }
      next if $l > $max;

      # put the dictionary into "is-a-prefix" mode.
      $d->deepsearch('boolean');

    siblingloop: foreach my $sibling ( @{ $item->tail->siblings } ) {
          foreach my $visited ( @{ $item->{path} } ) {
              next siblingloop if $sibling == $visited;
          }

          # given path y , iterate for all its end points
          my $subpath = $item->child($sibling);

          # create a new path for each end-point
          if ( $d->lookup( $subpath->current_word ) ) {

             # if the new path is a prefix, add it to the bottom of the queue.
              push @queue, $subpath;
          }
      }
  }
  return \@words;
}

sub setup_predetermined { 
  my $m = shift; 
  my $gameNo = shift;
  if( $gameNo == 0 ){
      $m->add_row(qw( F X I E ));
      $m->add_row(qw( A M L O ));
      $m->add_row(qw( E W B X ));
      $m->add_row(qw( A S T U ));
      return $m;
  }
  if( $gameNo == 1 ){
      $m->add_row(qw( D G H I ));
      $m->add_row(qw( K L P S ));
      $m->add_row(qw( Y E U T ));
      $m->add_row(qw( E O R N ));
      return $m;
  }
}
sub setup_random { 
  my $m = shift; 
  my $seed = shift;
  srand $seed;
  my @letters = 'A' .. 'Z' ; 
  for( 1 .. 4 ){ 
      my @r = ();
      for( 1 .. 4 ){
          push @r , $letters[int(rand(25))];
      }
      $m->add_row( @r );
  }
}

# Here is where the real work starts.

my $m = Matrix->new();
setup_predetermined( $m, 0 );
#setup_random( $m, 5 );

my $d = readDict( 'dict.txt', $m->regex );
my $c = scalar @{ $m->cells };    # get the max, as per spec

print join ",\n", map { $_->pp } @{
  traverseGraph( $d, $m, 3, $c ) ;
};

Arch/执行信息进行比较:

model name      : Intel(R) Core(TM)2 Duo CPU     T9300  @ 2.50GHz
cache size      : 6144 KB
Memory usage summary: heap total: 77057577, heap peak: 11446200, stack peak: 26448
       total calls   total memory   failed calls
 malloc|     947212       68763684              0
realloc|      11191        1045641              0  (nomove:9063, dec:4731, free:0)
 calloc|     121001        7248252              0
   free|     973159       65854762

Histogram for block sizes:
  0-15         392633  36% ==================================================
 16-31          43530   4% =====
 32-47          50048   4% ======
 48-63          70701   6% =========
 64-79          18831   1% ==
 80-95          19271   1% ==
 96-111        238398  22% ==============================
112-127          3007  <1% 
128-143        236727  21% ==============================

关于正则表达式优化的更多嘟囔

我使用的正则表达式优化对于多解字典是无用的,而对于多解字典,您将需要一个完整的字典,而不是一个预先修整过的字典。

然而,也就是说,对于一次性解决,它真的很快。(Perl正则表达式是在C!:))

以下是一些不同的代码添加:

sub readDict_nofilter {
  my $fn = shift;
  my $re = shift;
  my $d  = Tree::Trie->new();

  # Dictionary Loading
  open my $fh, '<', $fn;
  while ( my $line = <$fh> ) {
      chomp($line);
      $d->add( uc($line) );
  }
  return $d;
}

sub benchmark_io { 
  use Benchmark qw( cmpthese :hireswallclock );
   # generate a random 16 character string 
   # to simulate there being an input grid. 
  my $regexen = sub { 
      my @letters = 'A' .. 'Z' ; 
      my @lo = ();
      for( 1..16 ){ 
          push @lo , $_ ; 
      }
      my $c  = join '', @lo;
      $c = "[^$c]";
      return qr/$c/i;
  };
  cmpthese( 200 , { 
      filtered => sub { 
          readDict('dict.txt', $regexen->() );
      }, 
      unfiltered => sub {
          readDict_nofilter('dict.txt');
      }
  });
}
           s/iter unfiltered   filtered
unfiltered   8.16         --       -94%
filtered    0.464      1658%         --

Ps: 8.16 * 200 = 27分钟。

搞笑。几天前我差点因为这款该死的游戏而发布了同样的问题!然而我没有,因为我只是在谷歌上搜索boggle solver python,得到了我想要的所有答案。

一个Node.JS JavaScript解决方案。在不到一秒钟的时间内计算所有100个独特的单词,其中包括阅读字典文件(MBA 2012)。

Output: ["FAM","TUX","TUB","FAE","ELI","ELM","ELB","TWA","TWA","SAW","AMI","SWA","SWA","AME","SEA","SEW","AES","AWL","AWE","SEA","AWA","MIX","MIL","AST","ASE","MAX","MAE","MAW","MEW","AWE","MES","AWL","LIE","LIM","AWA","AES","BUT","BLO","WAS","WAE","WEA","LEI","LEO","LOB","LOX","WEM","OIL","OLM","WEA","WAE","WAX","WAF","MILO","EAST","WAME","TWAS","TWAE","EMIL","WEAM","OIME","AXIL","WEST","TWAE","LIMB","WASE","WAST","BLEO","STUB","BOIL","BOLE","LIME","SAWT","LIMA","MESA","MEWL","AXLE","FAME","ASEM","MILE","AMIL","SEAX","SEAM","SEMI","SWAM","AMBO","AMLI","AXILE","AMBLE","SWAMI","AWEST","AWEST","LIMAX","LIMES","LIMBU","LIMBO","EMBOX","SEMBLE","EMBOLE","WAMBLE","FAMBLE"]

代码:

var fs = require('fs')

var Node = function(value, row, col) {
    this.value = value
    this.row = row
    this.col = col
}

var Path = function() {
    this.nodes = []
}

Path.prototype.push = function(node) {
    this.nodes.push(node)
    return this
}

Path.prototype.contains = function(node) {
    for (var i = 0, ii = this.nodes.length; i < ii; i++) {
        if (this.nodes[i] === node) {
            return true
        }
    }

    return false
}

Path.prototype.clone = function() {
    var path = new Path()
    path.nodes = this.nodes.slice(0)
    return path
}

Path.prototype.to_word = function() {
    var word = ''

    for (var i = 0, ii = this.nodes.length; i < ii; ++i) {
        word += this.nodes[i].value
    }

    return word
}

var Board = function(nodes, dict) {
    // Expects n x m array.
    this.nodes = nodes
    this.words = []
    this.row_count = nodes.length
    this.col_count = nodes[0].length
    this.dict = dict
}

Board.from_raw = function(board, dict) {
    var ROW_COUNT = board.length
      , COL_COUNT = board[0].length

    var nodes = []

    // Replace board with Nodes
    for (var i = 0, ii = ROW_COUNT; i < ii; ++i) {
        nodes.push([])
        for (var j = 0, jj = COL_COUNT; j < jj; ++j) {
            nodes[i].push(new Node(board[i][j], i, j))
        }
    }

    return new Board(nodes, dict)
}

Board.prototype.toString = function() {
    return JSON.stringify(this.nodes)
}

Board.prototype.update_potential_words = function(dict) {
    for (var i = 0, ii = this.row_count; i < ii; ++i) {
        for (var j = 0, jj = this.col_count; j < jj; ++j) {
            var node = this.nodes[i][j]
              , path = new Path()

            path.push(node)

            this.dfs_search(path)
        }
    }
}

Board.prototype.on_board = function(row, col) {
    return 0 <= row && row < this.row_count && 0 <= col && col < this.col_count
}

Board.prototype.get_unsearched_neighbours = function(path) {
    var last_node = path.nodes[path.nodes.length - 1]

    var offsets = [
        [-1, -1], [-1,  0], [-1, +1]
      , [ 0, -1],           [ 0, +1]
      , [+1, -1], [+1,  0], [+1, +1]
    ]

    var neighbours = []

    for (var i = 0, ii = offsets.length; i < ii; ++i) {
        var offset = offsets[i]
        if (this.on_board(last_node.row + offset[0], last_node.col + offset[1])) {

            var potential_node = this.nodes[last_node.row + offset[0]][last_node.col + offset[1]]
            if (!path.contains(potential_node)) {
                // Create a new path if on board and we haven't visited this node yet.
                neighbours.push(potential_node)
            }
        }
    }

    return neighbours
}

Board.prototype.dfs_search = function(path) {
    var path_word = path.to_word()

    if (this.dict.contains_exact(path_word) && path_word.length >= 3) {
        this.words.push(path_word)
    }

    var neighbours = this.get_unsearched_neighbours(path)

    for (var i = 0, ii = neighbours.length; i < ii; ++i) {
        var neighbour = neighbours[i]
        var new_path = path.clone()
        new_path.push(neighbour)

        if (this.dict.contains_prefix(new_path.to_word())) {
            this.dfs_search(new_path)
        }
    }
}

var Dict = function() {
    this.dict_array = []

    var dict_data = fs.readFileSync('./web2', 'utf8')
    var dict_array = dict_data.split('\n')

    for (var i = 0, ii = dict_array.length; i < ii; ++i) {
        dict_array[i] = dict_array[i].toUpperCase()
    }

    this.dict_array = dict_array.sort()
}

Dict.prototype.contains_prefix = function(prefix) {
    // Binary search
    return this.search_prefix(prefix, 0, this.dict_array.length)
}

Dict.prototype.contains_exact = function(exact) {
    // Binary search
    return this.search_exact(exact, 0, this.dict_array.length)
}

Dict.prototype.search_prefix = function(prefix, start, end) {
    if (start >= end) {
        // If no more place to search, return no matter what.
        return this.dict_array[start].indexOf(prefix) > -1
    }

    var middle = Math.floor((start + end)/2)

    if (this.dict_array[middle].indexOf(prefix) > -1) {
        // If we prefix exists, return true.
        return true
    } else {
        // Recurse
        if (prefix <= this.dict_array[middle]) {
            return this.search_prefix(prefix, start, middle - 1)
        } else {
            return this.search_prefix(prefix, middle + 1, end)
        }
    }
}

Dict.prototype.search_exact = function(exact, start, end) {
    if (start >= end) {
        // If no more place to search, return no matter what.
        return this.dict_array[start] === exact
    }

    var middle = Math.floor((start + end)/2)

    if (this.dict_array[middle] === exact) {
        // If we prefix exists, return true.
        return true
    } else {
        // Recurse
        if (exact <= this.dict_array[middle]) {
            return this.search_exact(exact, start, middle - 1)
        } else {
            return this.search_exact(exact, middle + 1, end)
        }
    }
}

var board = [
    ['F', 'X', 'I', 'E']
  , ['A', 'M', 'L', 'O']
  , ['E', 'W', 'B', 'X']
  , ['A', 'S', 'T', 'U']
]

var dict = new Dict()

var b = Board.from_raw(board, dict)
b.update_potential_words()
console.log(JSON.stringify(b.words.sort(function(a, b) {
    return a.length - b.length
})))

你的搜索算法是否会随着搜索的继续而不断减少单词列表?

例如,在上面的搜索中,你的单词只能以13个字母开头(有效地减少了一半的开头字母)。

当你添加更多的字母排列时,它会进一步减少可用的单词集,减少必要的搜索。

我会从这里开始。