我希望提供一个结构化的配置文件,它对于非技术用户来说尽可能容易编辑(不幸的是它必须是一个文件),所以我想使用YAML。然而,我找不到任何方法从Unix shell脚本解析这个。
当前回答
以下是Stefan Farestam回答的扩展版本:
function parse_yaml {
local prefix=$2
local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
sed -ne "s|,$s\]$s\$|]|" \
-e ":1;s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s,$s\(.*\)$s\]|\1\2: [\3]\n\1 - \4|;t1" \
-e "s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s\]|\1\2:\n\1 - \3|;p" $1 | \
sed -ne "s|,$s}$s\$|}|" \
-e ":1;s|^\($s\)-$s{$s\(.*\)$s,$s\($w\)$s:$s\(.*\)$s}|\1- {\2}\n\1 \3: \4|;t1" \
-e "s|^\($s\)-$s{$s\(.*\)$s}|\1-\n\1 \2|;p" | \
sed -ne "s|^\($s\):|\1|" \
-e "s|^\($s\)-$s[\"']\(.*\)[\"']$s\$|\1$fs$fs\2|p" \
-e "s|^\($s\)-$s\(.*\)$s\$|\1$fs$fs\2|p" \
-e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
-e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" | \
awk -F$fs '{
indent = length($1)/2;
vname[indent] = $2;
for (i in vname) {if (i > indent) {delete vname[i]; idx[i]=0}}
if(length($2)== 0){ vname[indent]= ++idx[indent] };
if (length($3) > 0) {
vn=""; for (i=0; i<indent; i++) { vn=(vn)(vname[i])("_")}
printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, vname[indent], $3);
}
}'
}
该版本支持字典和列表的-符号和短符号。以下输入:
global:
input:
- "main.c"
- "main.h"
flags: [ "-O3", "-fpic" ]
sample_input:
- { property1: value, property2: "value2" }
- { property1: "value3", property2: 'value 4' }
产生如下输出:
global_input_1="main.c"
global_input_2="main.h"
global_flags_1="-O3"
global_flags_2="-fpic"
global_sample_input_1_property1="value"
global_sample_input_1_property2="value2"
global_sample_input_2_property1="value3"
global_sample_input_2_property2="value 4"
as you can see the - items automatically get numbered in order to obtain different variable names for each item. In bash there are no multidimensional arrays, so this is one way to work around. Multiple levels are supported. To work around the problem with trailing white spaces mentioned by @briceburg one should enclose the values in single or double quotes. However, there are still some limitations: Expansion of the dictionaries and lists can produce wrong results when values contain commas. Also, more complex structures like values spanning multiple lines (like ssh-keys) are not (yet) supported.
A few words about the code: The first sed command expands the short form of dictionaries { key: value, ...} to regular and converts them to more simple yaml style. The second sed call does the same for the short notation of lists and converts [ entry, ... ] to an itemized list with the - notation. The third sed call is the original one that handled normal dictionaries, now with the addition to handle lists with - and indentations. The awk part introduces an index for each indentation level and increases it when the variable name is empty (i.e. when processing a list). The current value of the counters are used instead of the empty vname. When going up one level, the counters are zeroed.
编辑:我已经为此创建了一个github存储库。
其他回答
我的用例可能与这篇原始文章所要求的完全相同,也可能不完全相同,但它肯定是相似的。
我需要拉一些YAML作为bash变量。YAML的深度永远不会超过一层。
YAML看起来是这样的:
KEY: value
ANOTHER_KEY: another_value
OH_MY_SO_MANY_KEYS: yet_another_value
LAST_KEY: last_value
输出如下:
KEY="value"
ANOTHER_KEY="another_value"
OH_MY_SO_MANY_KEYS="yet_another_value"
LAST_KEY="last_value"
我用这一行实现了输出:
sed -e 's/:[^:\/\/]/="/g;s/$/"/g;s/ *=/=/g' file.yaml > file.sh
s/:[^:\/\/]/="/g查找:并将其替换为=",同时忽略://(对于url) S /$/"/g将"附加到每一行的末尾 S / *=/=/g删除=前面的所有空格
考虑到Python3和PyYAML是非常容易满足的依赖关系,下面的代码可能会有所帮助:
yaml() {
python3 -c "import yaml;print(yaml.safe_load(open('$1'))$2)"
}
VALUE=$(yaml ~/my_yaml_file.yaml "['a_key']")
可以将一个小脚本传递给一些解释器,比如Python。使用Ruby和它的YAML库的简单方法如下:
$ RUBY_SCRIPT="data = YAML::load(STDIN.read); puts data['a']; puts data['b']"
$ echo -e '---\na: 1234\nb: 4321' | ruby -ryaml -e "$RUBY_SCRIPT"
1234
4321
,其中data是来自yaml的值的散列(或数组)。
作为奖励,它可以很好地解析杰基尔的正面问题。
ruby -ryaml -e "puts YAML::load(open(ARGV.first).read)['tags']" example.md
使用Python的PyYAML或YAML::Perl等库最容易进行复杂的解析。
如果您希望将所有YAML值解析为bash值,请尝试此脚本。这也可以处理注释。参见下面的示例用法:
# pparse.py
import yaml
import sys
def parse_yaml(yml, name=''):
if isinstance(yml, list):
for data in yml:
parse_yaml(data, name)
elif isinstance(yml, dict):
if (len(yml) == 1) and not isinstance(yml[list(yml.keys())[0]], list):
print(str(name+'_'+list(yml.keys())[0]+'='+str(yml[list(yml.keys())[0]]))[1:])
else:
for key in yml:
parse_yaml(yml[key], name+'_'+key)
if __name__=="__main__":
yml = yaml.safe_load(open(sys.argv[1]))
parse_yaml(yml)
test.yml
- folders:
- temp_folder: datasets/outputs/tmp
- keep_temp_folder: false
- MFA:
- MFA: false
- speaker_count: 1
- G2P:
- G2P: true
- G2P_model: models/MFA/G2P/english_g2p.zip
- input_folder: datasets/outputs/Youtube/ljspeech/wavs
- output_dictionary: datasets/outputs/Youtube/ljspeech/dictionary.dict
- dictionary: datasets/outputs/Youtube/ljspeech/dictionary.dict
- acoustic_model: models/MFA/acoustic/english.zip
- temp_folder: datasets/outputs/tmp
- jobs: 4
- align:
- config: configs/MFA/align.yaml
- dataset: datasets/outputs/Youtube/ljspeech/wavs
- output_folder: datasets/outputs/Youtube/ljspeech-aligned
- TTS:
- output_folder: datasets/outputs/Youtube
- preprocess:
- preprocess: true
- config: configs/TTS_preprocess.yaml # Default Config
- textgrid_folder: datasets/outputs/Youtube/ljspeech-aligned
- output_duration_folder: datasets/outputs/Youtube/durations
- sampling_rate: 44000 # Make sure sampling rate is same here as in preprocess config
需要YAML值的脚本:
yaml() {
eval $(python pparse.py "$1")
}
yaml "test.yml"
# What python printed to bash:
folders_temp_folder=datasets/outputs/tmp
folders_keep_temp_folder=False
MFA_MFA=False
MFA_speaker_count=1
MFA_G2P_G2P=True
MFA_G2P_G2P_model=models/MFA/G2P/english_g2p.zip
MFA_G2P_input_folder=datasets/outputs/Youtube/ljspeech/wavs
MFA_G2P_output_dictionary=datasets/outputs/Youtube/ljspeech/dictionary.dict
MFA_dictionary=datasets/outputs/Youtube/ljspeech/dictionary.dict
MFA_acoustic_model=models/MFA/acoustic/english.zip
MFA_temp_folder=datasets/outputs/tmp
MFA_jobs=4
MFA_align_config=configs/MFA/align.yaml
MFA_align_dataset=datasets/outputs/Youtube/ljspeech/wavs
MFA_align_output_folder=datasets/outputs/Youtube/ljspeech-aligned
TTS_output_folder=datasets/outputs/Youtube
TTS_preprocess_preprocess=True
TTS_preprocess_config=configs/TTS_preprocess.yaml
TTS_preprocess_textgrid_folder=datasets/outputs/Youtube/ljspeech-aligned
TTS_preprocess_output_duration_folder=datasets/outputs/Youtube/durations
TTS_preprocess_sampling_rate=44000
使用bash访问变量:
echo "$TTS_preprocess_sampling_rate";
>>> 44000
现在做这件事的一个快速方法(以前的方法对我没用):
sudo wget https://github.com/mikefarah/yq/releases/download/v4.4.1/yq_linux_amd64 -O /usr/bin/yq &&\
sudo chmod +x /usr/bin/yq
示例asd.yaml:
a_list:
- key1: value1
key2: value2
key3: value3
解析:根
user@vm:~$ yq e '.' asd.yaml
a_list:
- key1: value1
key2: value2
key3: value3
解析key3:
user@vm:~$ yq e '.a_list[0].key3' asd.yaml
value3
推荐文章
- 我如何在Ruby中解析YAML文件?
- 如何从查找“类型d”中排除此/ current / dot文件夹
- 只使用md5sum获取哈希值(没有文件名)
- 使用sh shell比较字符串
- 在Bash中测试非零长度字符串:[-n "$var"]或["$var"]
- 如何创建Bash别名?
- 如何设置ssh超时时间?
- 将所有变量从一个shell脚本传递到另一个?
- 只列出UNIX中的目录
- 如何删除shell脚本中文件名的扩展名?
- 'find -exec'是Linux中的shell函数
- 临时更改bash中的当前工作目录以运行命令
- 重定向复制的标准输出到日志文件从bash脚本本身
- Shell脚本for循环语法
- Docker入口点运行bash脚本被“拒绝权限”