我希望提供一个结构化的配置文件,它对于非技术用户来说尽可能容易编辑(不幸的是它必须是一个文件),所以我想使用YAML。然而,我找不到任何方法从Unix shell脚本解析这个。


当前回答

下面是一个bash-only解析器,利用sed和awk来解析简单的yaml文件:

function parse_yaml {
   local prefix=$2
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p"  $1 |
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
      }
   }'
}

它可以理解以下文件:

## global definitions
global:
  debug: yes
  verbose: no
  debugging:
    detailed: no
    header: "debugging started"

## output
output:
   file: "yes"

在解析时使用:

parse_yaml sample.yml

将输出:

global_debug="yes"
global_verbose="no"
global_debugging_detailed="no"
global_debugging_header="debugging started"
output_file="yes"

它也理解由ruby生成的yaml文件,其中可能包含ruby符号,例如:

---
:global:
  :debug: 'yes'
  :verbose: 'no'
  :debugging:
    :detailed: 'no'
    :header: debugging started
  :output: 'yes'

并将输出与前一个示例相同的结果。

脚本中的典型用法是:

eval $(parse_yaml sample.yml)

Parse_yaml接受一个前缀参数,这样导入的所有设置都有一个公共前缀(这将减少名称空间冲突的风险)。

parse_yaml sample.yml "CONF_"

收益率:

CONF_global_debug="yes"
CONF_global_verbose="no"
CONF_global_debugging_detailed="no"
CONF_global_debugging_header="debugging started"
CONF_output_file="yes"

注意,之前文件中的设置可以被后面的设置引用:

## global definitions
global:
  debug: yes
  verbose: no
  debugging:
    detailed: no
    header: "debugging started"

## output
output:
   debug: $global_debug

另一个很好的用法是先解析默认文件,然后解析用户设置,这是可行的,因为后一个设置会覆盖第一个设置:

eval $(parse_yaml defaults.yml)
eval $(parse_yaml project.yml)

其他回答

使用Python的PyYAML或YAML::Perl等库最容易进行复杂的解析。

如果您希望将所有YAML值解析为bash值,请尝试此脚本。这也可以处理注释。参见下面的示例用法:

# pparse.py

import yaml
import sys
            
def parse_yaml(yml, name=''):
    if isinstance(yml, list):
        for data in yml:
            parse_yaml(data, name)
    elif isinstance(yml, dict):
        if (len(yml) == 1) and not isinstance(yml[list(yml.keys())[0]], list):
            print(str(name+'_'+list(yml.keys())[0]+'='+str(yml[list(yml.keys())[0]]))[1:])
        else:
            for key in yml:
                parse_yaml(yml[key], name+'_'+key)

            
if __name__=="__main__":
    yml = yaml.safe_load(open(sys.argv[1]))
    parse_yaml(yml)

test.yml

- folders:
  - temp_folder: datasets/outputs/tmp
  - keep_temp_folder: false

- MFA:
  - MFA: false
  - speaker_count: 1
  - G2P: 
    - G2P: true
    - G2P_model: models/MFA/G2P/english_g2p.zip
    - input_folder: datasets/outputs/Youtube/ljspeech/wavs
    - output_dictionary: datasets/outputs/Youtube/ljspeech/dictionary.dict
  - dictionary: datasets/outputs/Youtube/ljspeech/dictionary.dict
  - acoustic_model: models/MFA/acoustic/english.zip
  - temp_folder: datasets/outputs/tmp
  - jobs: 4
  - align:
    - config: configs/MFA/align.yaml
    - dataset: datasets/outputs/Youtube/ljspeech/wavs
    - output_folder: datasets/outputs/Youtube/ljspeech-aligned

- TTS:
  - output_folder: datasets/outputs/Youtube
  - preprocess:
    - preprocess: true
    - config: configs/TTS_preprocess.yaml # Default Config 
    - textgrid_folder: datasets/outputs/Youtube/ljspeech-aligned
    - output_duration_folder: datasets/outputs/Youtube/durations
    - sampling_rate: 44000 # Make sure sampling rate is same here as in preprocess config

需要YAML值的脚本:

yaml() {
    eval $(python pparse.py "$1")
}

yaml "test.yml"

# What python printed to bash:

folders_temp_folder=datasets/outputs/tmp
folders_keep_temp_folder=False
MFA_MFA=False
MFA_speaker_count=1
MFA_G2P_G2P=True
MFA_G2P_G2P_model=models/MFA/G2P/english_g2p.zip
MFA_G2P_input_folder=datasets/outputs/Youtube/ljspeech/wavs
MFA_G2P_output_dictionary=datasets/outputs/Youtube/ljspeech/dictionary.dict
MFA_dictionary=datasets/outputs/Youtube/ljspeech/dictionary.dict
MFA_acoustic_model=models/MFA/acoustic/english.zip
MFA_temp_folder=datasets/outputs/tmp
MFA_jobs=4
MFA_align_config=configs/MFA/align.yaml
MFA_align_dataset=datasets/outputs/Youtube/ljspeech/wavs
MFA_align_output_folder=datasets/outputs/Youtube/ljspeech-aligned
TTS_output_folder=datasets/outputs/Youtube
TTS_preprocess_preprocess=True
TTS_preprocess_config=configs/TTS_preprocess.yaml
TTS_preprocess_textgrid_folder=datasets/outputs/Youtube/ljspeech-aligned
TTS_preprocess_output_duration_folder=datasets/outputs/Youtube/durations
TTS_preprocess_sampling_rate=44000

使用bash访问变量:

echo "$TTS_preprocess_sampling_rate";
>>> 44000

我已经用python编写了shyaml,用于从shell命令行查询YAML。

概述:

$ pip install shyaml      ## installation

示例的YAML文件(具有复杂的功能):

$ cat <<EOF > test.yaml
name: "MyName !!"
subvalue:
    how-much: 1.1
    things:
        - first
        - second
        - third
    other-things: [a, b, c]
    maintainer: "Valentin Lab"
    description: |
        Multiline description:
        Line 1
        Line 2
EOF

基本的查询:

$ cat test.yaml | shyaml get-value subvalue.maintainer
Valentin Lab

更复杂的循环查询复杂的值:

$ cat test.yaml | shyaml values-0 | \
  while read -r -d $'\0' value; do
      echo "RECEIVED: '$value'"
  done
RECEIVED: '1.1'
RECEIVED: '- first
- second
- third'
RECEIVED: '2'
RECEIVED: 'Valentin Lab'
RECEIVED: 'Multiline description:
Line 1
Line 2'

以下几个要点:

all YAML types and syntax oddities are correctly handled, as multiline, quoted strings, inline sequences... \0 padded output is available for solid multiline entry manipulation. simple dotted notation to select sub-values (ie: subvalue.maintainer is a valid key). access by index is provided to sequences (ie: subvalue.things.-1 is the last element of the subvalue.things sequence.) access to all sequence/structs elements in one go for use in bash loops. you can output whole subpart of a YAML file as ... YAML, which blend well for further manipulations with shyaml.

更多的示例和文档可以在shyaml github页面或shyaml PyPI页面上找到。

我曾经使用python将yaml转换为json,并在jq中进行处理。

python -c "import yaml; import json; from pathlib import Path; print(json.dumps(yaml.safe_load(Path('file.yml').read_text())))" | jq '.'

很难说,因为这取决于您希望解析器从YAML文档中提取什么。对于简单的情况,你可以使用grep、cut、awk等。对于更复杂的解析,您需要使用成熟的解析库,如Python的PyYAML或YAML::Perl。

可以将一个小脚本传递给一些解释器,比如Python。使用Ruby和它的YAML库的简单方法如下:

$ RUBY_SCRIPT="data = YAML::load(STDIN.read); puts data['a']; puts data['b']"
$ echo -e '---\na: 1234\nb: 4321' | ruby -ryaml -e "$RUBY_SCRIPT"
1234
4321

,其中data是来自yaml的值的散列(或数组)。

作为奖励,它可以很好地解析杰基尔的正面问题。

ruby -ryaml -e "puts YAML::load(open(ARGV.first).read)['tags']" example.md