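How to define hash tables in Bash?

What is the equivalent of Python dictionaries, but in Bash (it should work across OS X and Linux)?

Current answer

Two things. On any 2.6 kernel you can use memory instead of /tmp by using /dev/shm (Redhat); other distros may vary. Also, hget can be reimplemented using read, as follows: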

function hget {
  # Read "key value" pairs line by line; no grep or awk is forked.
  local key idx
  while read -r key idx
  do
    if [ "$key" = "$2" ]
    then
      echo "$idx"
      return
    fi
  done < "/dev/shm/hashmap.$1"
}

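In addition, assuming all keys are unique, the return short-circuits the read loop, so not every entry has to be read. If your implementation can have duplicate keys, simply leave out the return. This saves the cost of forking grep and awk and of reading the whole file. With both implementations using /dev/shm, running time hget on a 3-entry hash and searching for the last entry gave the following:

Grep/Awk: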

hget() {
    grep "^$2 " /dev/shm/hashmap.$1 | awk '{ print $2 };'
}

$ time echo $(hget FD oracle)
3

real    0m0.011s
user    0m0.002s
sys     0m0.013s

Read / echo:

$ time echo $(hget FD oracle)
3

real    0m0.004s
user    0m0.000s
sys     0m0.004s

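Across multiple invocations I never saw less than a 50% improvement. Because both implementations read from /dev/shm, the difference comes down to the fork overhead that the read loop avoids.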
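Since the question asks for something that works on OS X as well, note that macOS has no /dev/shm. A minimal sketch of a fallback, assuming it is acceptable to use $TMPDIR (or /tmp) when /dev/shm is missing; hash_dir and HASH_DIR are hypothetical names that do not appear in the answer above:

```shell
# Hypothetical helper: pick a directory for the hashmap files.
# Prefers the memory-backed /dev/shm and falls back to $TMPDIR or /tmp.
hash_dir() {
    if [ -d /dev/shm ] && [ -w /dev/shm ]; then
        printf '%s\n' /dev/shm
    else
        printf '%s\n' "${TMPDIR:-/tmp}"
    fi
}

HASH_DIR=$(hash_dir)

# Append a "key value" line to the named map.
hput() {
    printf '%s %s\n' "$2" "$3" >> "$HASH_DIR/hashmap.$1"
}

# Same read loop as above, but against $HASH_DIR; returns 1 if the key is absent.
hget() {
    local key value
    while read -r key value; do
        if [ "$key" = "$2" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "$HASH_DIR/hashmap.$1"
    return 1
}
```

With this in place, hput FD oracle 3 followed by hget FD oracle behaves the same way on both systems; only the backing directory differs.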

2010-08-14 23:45:30

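Other answers

Prior to bash 4 there is no good way to use associative arrays in bash. Your best bet is to use an interpreted language that actually supports such things, like awk. Bash 4, on the other hand, does support them.

As for the less good ways in bash 3, here is a reference: http://mywiki.wooledge.org/BashFAQ/006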
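For reference, a minimal sketch of what the bash 4 support looks like. It assumes bash 4.0 or newer; the stock /bin/bash on macOS is 3.2, so there you would need a newer bash installed separately. The capitals data is only an illustration:

```shell
#!/usr/bin/env bash
# Requires bash 4.0+ (declare -A is not available in bash 3).
declare -A capitals=(
    [France]=Paris
    [Netherlands]=Amsterdam
)

# Add or overwrite a single entry.
capitals[Spain]=Madrid

# Look up one key.
echo "${capitals[France]}"

# Test whether a key exists, even if its value is empty.
if [[ -n ${capitals[Spain]+set} ]]; then
    echo "Spain is present"
fi

# Iterate over all keys; the iteration order is unspecified.
for country in "${!capitals[@]}"; do
    printf '%s -> %s\n' "$country" "${capitals[$country]}"
done

# Remove an entry and print how many are left.
unset 'capitals[Spain]'
echo "${#capitals[@]}"
```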

2010-08-12 12:53:37

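I really liked Al P's answer but wanted uniqueness enforced cheaply, so I took it one step further: use a directory. There are some obvious limitations (directory file limits, invalid file names), but it should work for most cases.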

hinit() {
    rm -rf /tmp/hashmap.$1
    mkdir -p /tmp/hashmap.$1
}

hput() {
    # '%s' keeps a value containing % or backslashes from being read as a format string
    printf '%s' "$3" > "/tmp/hashmap.$1/$2"
}

hget() {
    cat "/tmp/hashmap.$1/$2"
}

hkeys() {
    ls -1 /tmp/hashmap.$1
}

hdestroy() {
    rm -rf /tmp/hashmap.$1
}

hinit ids

for (( i = 0; i < 10000; i++ )); do
    hput ids "key$i" "value$i"
done

for (( i = 0; i < 10000; i++ )); do
    printf '%s\n' "$(hget ids "key$i")" > /dev/null
done

hdestroy ids

It also performs a tad better in my tests.

$ time bash hash.sh 
real    0m46.500s
user    0m16.767s
sys     0m51.473s

$ time bash dirhash.sh 
real    0m35.875s
user    0m8.002s
sys     0m24.666s

Just thought I'd pitch in. Cheers!

Edit: added hdestroy()

2010-10-28 18:36:34

I agree with @lhunath and others that associative arrays are the way to go with Bash 4. If you are stuck on Bash 3 (OS X, old distros that you cannot update), you can also use expr, which should be available everywhere, together with a string and regular expressions. I like it especially when the dictionary is not too big.

1. Choose 2 separators that you will not use in keys and values (e.g. ',' and ':').

2. Write your map as a string (note the separator ',' also at the beginning and end):

animals=",moo:cow,woof:dog,"

3. Use a regex to extract the values:

get_animal() {
    echo "$(expr "$animals" : ".*,$1:\([^,]*\),.*")"
}

4. Split the string to list the items:

get_animal_items() {
    arr=$(echo "${animals:1:${#animals}-2}" | tr "," "\n")
    for i in $arr
    do
        value="${i##*:}"
        key="${i%%:*}"
        echo "${value} likes to $key"
    done
}

Now you can use it:

$ animal=$(get_animal "moo")
$ echo "$animal"
cow
$ get_animal_items
cow likes to moo
dog likes to woof
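If forking expr on every lookup is a concern, the same string map can be queried with parameter expansion alone. This is a sketch under the same assumptions as above (',' and ':' never occur in keys or values); get_animal_pe is a hypothetical name, not part of the original answer:

```shell
animals=",moo:cow,woof:dog,"

# Look up a key using only parameter expansion (no external process).
get_animal_pe() {
    # Strip the shortest prefix ending in ",KEY:"; the quoted "$1" is matched literally.
    local rest=${animals#*,"$1":}
    # If nothing was stripped, the key is not in the map.
    [ "$rest" = "$animals" ] && return 1
    # Keep everything before the next ',' separator.
    printf '%s\n' "${rest%%,*}"
}
```

Calling get_animal_pe woof prints dog, and the function returns a non-zero status for a key that is not in the map.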

2014-04-17 23:05:58

Bash 3 solution:

While reading some of the answers I put together a quick little function that I would like to contribute back, in case it helps others.

# Define a hash like this
MYHASH=("firstName:Milan"
        "lastName:Adamovsky")

# Function to get value by key
getHashKey()
 {
  declare -a hash=("${!1}")
  # Keep the loop variables local so they do not leak into the caller
  local key KEY VALUE
  local lookup=$2

  for key in "${hash[@]}" ; do
   KEY=${key%%:*}
   VALUE=${key#*:}
   if [[ $KEY == "$lookup" ]]
   then
    echo "$VALUE"
   fi
  done
 }

# Function to get a list of all keys
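getHashKeys()
 {
  declare -a hash=("${!1}")
  local KEY
  local key
  # Local accumulator, so repeated calls do not append to an earlier result
  local keys=""

  for key in "${hash[@]}" ; do
   KEY=${key%%:*}
   keys+="${KEY} "
  done

  # Unquoted on purpose: word splitting drops the trailing space
  echo $keys
 }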

# Here we want to get the value of 'lastName'
echo $(getHashKey MYHASH[@] "lastName")


# Here we want to get all keys
echo $(getHashKeys MYHASH[@])

2013-08-29 15:26:17

You can further modify the hput()/hget() interface so that you have named hashes, as follows:

hput() {
    eval "$1""$2"='$3'
}

hget() {
    eval echo '"${'"$1$2"'}"'
}

Then:

hput capitals France Paris
hput capitals Netherlands Amsterdam
hput capitals Spain Madrid
echo `hget capitals France` and `hget capitals Netherlands` and `hget capitals Spain`

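This lets you define other maps that don't conflict (e.g. 'rcapitals', which does country lookup by capital city). But either way, I think you'll find that this is all pretty terrible, performance-wise.

Edit: a modified version of the above that supports keys with non-alphanumeric characters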

hashKey() {
  # replace non-alphanumeric characters with underscore to make keys valid BASH identifiers
  echo "$1_$2" | sed -E "s/[^a-zA-Z0-9]+/_/g" | sed -E "s/^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+\$//g"
}

hashPut() {
  local KEY=`hashKey "$1" "$2"`
  # '$3' is passed to eval unexpanded, so values with spaces or quotes are stored intact
  eval "$KEY"='$3'
}

hashGet() {
  local KEY=`hashKey "$1" "$2"`
  echo "${!KEY}"
}
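If you would rather avoid eval altogether, one possible variant uses printf -v (available since bash 3.1) for the write and indirect expansion for the read. This is only a sketch: hput_v and hget_v are hypothetical names, and it assumes the map name and key contain only characters that are valid in a Bash identifier:

```shell
# Store a value in a global variable named MAP_KEY without using eval.
hput_v() {
    printf -v "${1}_${2}" '%s' "$3"
}

# Read it back through indirect expansion.
hget_v() {
    local name="${1}_${2}"
    printf '%s\n' "${!name}"
}

hput_v capitals France Paris
hput_v capitals Netherlands "The Hague (seat of government)"
hget_v capitals France
```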

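Final edit

If you really want fast hash lookup, there's a terrible, terrible hack that actually works really well. It is this: write your key/values out to a temporary file, one per line, then use 'grep "^$key"' to get them out, using pipes with cut or awk or sed or whatever to retrieve the values.

Like I said, it sounds terrible, and it sounds like it ought to be slow and do all sorts of unnecessary IO, but in practice it is very fast (disk cache is awesome, ain't it?), even for very large hash tables. You have to enforce key uniqueness yourself, etc. Even if you only have a few hundred entries, the output file/grep combo is going to be quite a bit faster; in my experience, several times faster. It also eats less memory.

Here's one way to do it: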

hinit() {
    rm -f /tmp/hashmap.$1
}

hput() {
    echo "$2 $3" >> /tmp/hashmap.$1
}

hget() {
    grep "^$2 " /tmp/hashmap.$1 | awk '{ print $2 };'
}

hinit capitals
hput capitals France Paris
hput capitals Netherlands Amsterdam
hput capitals Spain Madrid

echo `hget capitals France` and `hget capitals Netherlands` and `hget capitals Spain`
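The file-based version leaves key uniqueness to the caller. One way to enforce it on write is to drop any existing line for the key before appending the new pair. A sketch only: hput_unique is a hypothetical name, and it assumes the same "key value" line format with keys that contain no whitespace:

```shell
# Replace-or-insert: keeps at most one line per key in /tmp/hashmap.NAME.
hput_unique() {
    local file="/tmp/hashmap.$1" tmp
    # Build the new contents in a temporary file next to the map.
    tmp=$(mktemp "$file.XXXXXX") || return 1
    if [ -f "$file" ]; then
        # Copy every line whose first field differs from the key.
        # Appending "" forces a string comparison rather than a numeric one.
        awk -v k="$2" '($1 "") != (k "")' "$file" > "$tmp"
    fi
    printf '%s %s\n' "$2" "$3" >> "$tmp"
    # Swap the rewritten file into place.
    mv "$tmp" "$file"
}
```

This rewrites the whole file on every put, so it trades write speed for a map in which the grep-based hget always finds exactly one match.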

2010-02-08 23:38:29
