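How can I parse a YAML file from a Linux shell script?

I want to provide a structured configuration file that is as easy as possible for a non-technical user to edit (unfortunately it has to be a file), so I want to use YAML. However, I can't find any way of parsing it from a Unix shell script.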

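Current answer

Complex parsing is easiest with a library such as Python's PyYAML or YAML::Perl.

If you want to parse all the YAML values into bash variables, try this script. It also handles comments. See the example usage below: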

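# pparse.py

import sys

import yaml


def parse_yaml(yml, name=''):
    """Print NAME=value lines for the nested lists and mappings in yml."""
    if isinstance(yml, list):
        for data in yml:
            parse_yaml(data, name)
    elif isinstance(yml, dict):
        first_key = next(iter(yml), None)
        if len(yml) == 1 and not isinstance(yml[first_key], list):
            # Single-key mapping whose value is not a list: emit it,
            # dropping the leading '_' that the name prefix carries.
            print((name + '_' + str(first_key) + '=' + str(yml[first_key]))[1:])
        else:
            for key in yml:
                parse_yaml(yml[key], name + '_' + str(key))


if __name__ == "__main__":
    # Close the file deterministically instead of leaking the handle.
    with open(sys.argv[1]) as f:
        yml = yaml.safe_load(f)
    parse_yaml(yml)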

test.yml

- folders:
  - temp_folder: datasets/outputs/tmp
  - keep_temp_folder: false

- MFA:
  - MFA: false
  - speaker_count: 1
  - G2P: 
    - G2P: true
    - G2P_model: models/MFA/G2P/english_g2p.zip
    - input_folder: datasets/outputs/Youtube/ljspeech/wavs
    - output_dictionary: datasets/outputs/Youtube/ljspeech/dictionary.dict
  - dictionary: datasets/outputs/Youtube/ljspeech/dictionary.dict
  - acoustic_model: models/MFA/acoustic/english.zip
  - temp_folder: datasets/outputs/tmp
  - jobs: 4
  - align:
    - config: configs/MFA/align.yaml
    - dataset: datasets/outputs/Youtube/ljspeech/wavs
    - output_folder: datasets/outputs/Youtube/ljspeech-aligned

- TTS:
  - output_folder: datasets/outputs/Youtube
  - preprocess:
    - preprocess: true
    - config: configs/TTS_preprocess.yaml # Default Config 
    - textgrid_folder: datasets/outputs/Youtube/ljspeech-aligned
    - output_duration_folder: datasets/outputs/Youtube/durations
    - sampling_rate: 44000 # Make sure sampling rate is same here as in preprocess config

A script that needs the YAML values:

yaml() {
    eval "$(python pparse.py "$1")"
}

yaml "test.yml"

# What python printed to bash:

folders_temp_folder=datasets/outputs/tmp
folders_keep_temp_folder=False
MFA_MFA=False
MFA_speaker_count=1
MFA_G2P_G2P=True
MFA_G2P_G2P_model=models/MFA/G2P/english_g2p.zip
MFA_G2P_input_folder=datasets/outputs/Youtube/ljspeech/wavs
MFA_G2P_output_dictionary=datasets/outputs/Youtube/ljspeech/dictionary.dict
MFA_dictionary=datasets/outputs/Youtube/ljspeech/dictionary.dict
MFA_acoustic_model=models/MFA/acoustic/english.zip
MFA_temp_folder=datasets/outputs/tmp
MFA_jobs=4
MFA_align_config=configs/MFA/align.yaml
MFA_align_dataset=datasets/outputs/Youtube/ljspeech/wavs
MFA_align_output_folder=datasets/outputs/Youtube/ljspeech-aligned
TTS_output_folder=datasets/outputs/Youtube
TTS_preprocess_preprocess=True
TTS_preprocess_config=configs/TTS_preprocess.yaml
TTS_preprocess_textgrid_folder=datasets/outputs/Youtube/ljspeech-aligned
TTS_preprocess_output_duration_folder=datasets/outputs/Youtube/durations
TTS_preprocess_sampling_rate=44000

Accessing the variables in bash:

echo "$TTS_preprocess_sampling_rate";
>>> 44000
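
One caveat with this approach: a value that contains spaces or shell metacharacters would be mangled, or even executed, once it passes through eval. The following is a minimal sketch of a safer emitter, not part of the original script. It assumes the data has already been loaded with yaml.safe_load and has the same list-of-mappings shape as test.yml; the function name to_shell_assignments and the "note" entry are made up for illustration, while shlex.quote comes from the Python standard library.

```python
import shlex


def to_shell_assignments(node, prefix=""):
    """Yield NAME=value lines for every scalar leaf in nested dicts/lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            name = f"{prefix}_{key}" if prefix else str(key)
            yield from to_shell_assignments(value, name)
    elif isinstance(node, list):
        # List items share the parent's prefix, as in pparse.py.
        for item in node:
            yield from to_shell_assignments(item, prefix)
    else:
        # shlex.quote leaves simple strings bare and single-quotes risky ones.
        yield f"{prefix}={shlex.quote(str(node))}"


# Stand-in for the result of yaml.safe_load on a file shaped like test.yml;
# the "note" entry is hypothetical and only there to show the quoting.
config = [
    {"folders": [{"temp_folder": "datasets/outputs/tmp"},
                 {"note": "two words; rm -rf /"}]},
    {"MFA": [{"jobs": 4}]},
]

for line in to_shell_assignments(config):
    print(line)
```

With this input the hypothetical note value is emitted inside single quotes, so eval assigns it as one string instead of running the text after the semicolon.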

2021-06-20 21:10:08

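Other answers

If you have Python 2 and PyYAML, you can use this parser I wrote, parse_yaml.py. Some of the neater things it does are letting you choose a prefix (in case you have more than one file with similar variables) and picking a single value out of a YAML file.

For example, if you have these YAML files: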

staging.yaml:

db:
    type: sqllite
    host: 127.0.0.1
    user: dev
    password: password123

prod.yaml:

db:
    type: postgres
    host: 10.0.50.100
    user: postgres
    password: password123

You can load both without conflict.

$ eval $(python parse_yaml.py prod.yaml --prefix prod --cap)
$ eval $(python parse_yaml.py staging.yaml --prefix stg --cap)
$ echo $PROD_DB_HOST
10.0.50.100
$ echo $STG_DB_HOST
127.0.0.1

You can even cherry-pick the values you want.

$ prod_user=$(python parse_yaml.py prod.yaml --get db_user)
$ prod_port=$(python parse_yaml.py prod.yaml --get db_port --default 5432)
$ echo $prod_user
postgres
$ echo $prod_port
5432
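
The source of parse_yaml.py is not shown here, so the following is only a sketch of how prefixing, capitalisation and single-value lookup with a default could work, not the author's implementation. The function names flatten and get are hypothetical, and the dictionary stands in for what yaml.safe_load would return for prod.yaml.

```python
def flatten(node, prefix="", cap=False):
    """Return {NAME: value} for each scalar leaf of a nested mapping."""
    flat = {}
    for key, value in node.items():
        name = f"{prefix}_{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name, cap))
        else:
            flat[name.upper() if cap else name] = value
    return flat


def get(node, name, default=None):
    """Look up one flattened name, falling back to a default."""
    return flatten(node).get(name, default)


# Stand-in for yaml.safe_load applied to prod.yaml.
prod = {"db": {"type": "postgres", "host": "10.0.50.100",
               "user": "postgres", "password": "password123"}}

print(flatten(prod, prefix="prod", cap=True)["PROD_DB_HOST"])
print(get(prod, "db_user"))
print(get(prod, "db_port", default=5432))
```

Because prod.yaml has no port entry, the last lookup falls back to the supplied default, mirroring the --default option in the shell session.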

2018-06-23 00:10:55

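Moving my answer over from "How to convert a json response into yaml in bash", since this seems to be the authoritative post on parsing YAML text from the command line.

I would like to add some details about the yq YAML implementations. There are two implementations of this YAML parser, both named yq, so it is hard to tell which one is in use without looking at the DSL each implements. The two available implementations are: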

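- kislyuk/yq: the more often talked about version, a wrapper over jq written in Python, using the PyYAML library for YAML parsing
- mikefarah/yq: a Go implementation with its own dynamic DSL, using the go-yaml v3 parser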

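Both versions can be installed through the standard package managers on almost all major distributions:

- kislyuk/yq - Installation instructions
- mikefarah/yq - Installation instructions

Each version has pros and cons relative to the other, but a few points are worth highlighting (adapted from their repository instructions):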

kislyuk/yq

- Since the DSL is adopted completely from jq, parsing and manipulation are quite straightforward for users familiar with the latter.
- Supports a mode that preserves YAML tags and styles, but comments are lost in the round-trip conversion because jq does not preserve them.
- XML support is built into the package: an executable, xq, transcodes XML to JSON using xmltodict and pipes it to jq, so you can apply the same DSL to perform CRUD operations on the objects and round-trip the output back to XML.
- Supports in-place edit mode with the -i flag (similar to sed -i).

mikefarah/yq

- Prone to frequent changes in the DSL, such as the migration from 2.x to 3.x.
- Rich support for anchors, styles and tags, but look out for bugs once in a while.
- A relatively simple path expression syntax to navigate and match YAML nodes.
- Supports YAML->JSON and JSON->YAML formatting, and pretty-printing YAML (with comments).
- Supports in-place edit mode with the -i flag (similar to sed -i).
- Supports coloring the output YAML with the -C flag (not applicable to JSON output) and indentation of sub-elements (2 spaces by default).
- Supports shell completion for most shells, such as Bash and zsh (thanks to the powerful support in spf13/cobra, which is used to generate the CLI flags).

My take on the following YAML (referenced in other answers as well) with both versions:

root_key1: this is value one
root_key2: "this is value two"

drink:
  state: liquid
  coffee:
    best_served: hot
    colour: brown
  orange_juice:
    best_served: cold
    colour: orange

food:
  state: solid
  apple_pie:
    best_served: warm

root_key_3: this is value three

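Various actions to perform with both implementations (some frequently used operations):

1. Modifying a node value at the root level - change the value of root_key2
2. Modifying array contents, adding a value - add a property to coffee
3. Modifying array contents, deleting a value - delete a property from orange_juice
4. Printing key/value pairs with paths - for all items under food

Using kislyuk/yq

yq -y '.root_key2 |= "this is a new value"' yaml
yq -y '.drink.coffee += { time: "always" }' yaml
yq -y 'del(.drink.orange_juice.colour)' yaml
yq -r '.food|paths(scalars) as $p | [($p|join(".")), (getpath($p)|tojson)] | @tsv' yaml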

This is pretty straightforward. All you need to do is transcode the jq JSON output back into YAML with the -y flag.
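
The last of the four operations, printing key/value pairs with their paths, is the least obvious one, so here is a rough Python sketch of the same idea: walk the tree and emit a dotted path plus the JSON-encoded value for each scalar leaf. This is an illustration of the technique, not how either yq works internally; the function name scalar_paths is hypothetical, and the dictionary stands in for the food subtree of the sample YAML.

```python
import json


def scalar_paths(node, path=()):
    """Yield (path_tuple, value) for every scalar leaf of nested dicts/lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from scalar_paths(value, path + (str(key),))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from scalar_paths(value, path + (str(index),))
    else:
        yield path, node


# Stand-in for the "food" subtree of the sample YAML after parsing.
food = {"state": "solid", "apple_pie": {"best_served": "warm"}}

for path, value in scalar_paths(food):
    # Tab-separated: dotted path, then the value encoded as JSON.
    print(".".join(path) + "\t" + json.dumps(value))
```

Unlike this sketch, jq's paths(scalars) skips leaves whose value is null or false, so the two can differ on such inputs.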

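Using mikefarah/yq

yq w yaml root_key2 "this is a new value"
yq w yaml drink.coffee.time "always"
yq d yaml drink.orange_juice.colour
yq r yaml --printMode pv "food.**"

As of Dec 21st 2020, yq v4 is in beta; it supports many powerful path expressions and a DSL similar to jq's. Read the transition notes - Upgrading from V3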

2020-11-08 19:58:47

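Another option is to convert the YAML to JSON, then use jq to interact with the JSON representation, either to extract information from it or to edit it.

I wrote a simple bash script that contains this glue - see the Y2J project on GitHub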
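
The same convert-then-query idea can be sketched in Python. This is not the Y2J script; the function names yaml_to_json and query are hypothetical, the conversion assumes the third-party PyYAML package is installed, and query is only a tiny stand-in for what jq would do with a path such as .drink.coffee.colour.

```python
import json


def yaml_to_json(yaml_text):
    """Convert YAML text to JSON text (assumes PyYAML is installed)."""
    import yaml  # imported lazily so the rest of the sketch runs without PyYAML
    return json.dumps(yaml.safe_load(yaml_text))


def query(json_text, dotted_path):
    """Follow a dotted path such as 'drink.coffee.colour' through JSON text."""
    node = json.loads(json_text)
    for key in dotted_path.split("."):
        node = node[key]
    return node


# JSON as yaml_to_json would produce it for a fragment of a YAML document.
doc = '{"drink": {"state": "liquid", "coffee": {"colour": "brown"}}}'
print(query(doc, "drink.coffee.colour"))
```

In a real pipeline the JSON text would come from yaml_to_json applied to the file contents rather than from a literal string.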

2015-08-02 12:44:26

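You may also consider using Grunt (the JavaScript task runner). It integrates easily with the shell and supports reading YAML (grunt.file.readYAML) and JSON (grunt.file.readJSON) files.

This can be achieved by creating a task in Gruntfile.js (or Gruntfile.coffee), for example: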

module.exports = function (grunt) {

    grunt.registerTask('foo', ['load_yml']);

    grunt.registerTask('load_yml', function () {
        var data = grunt.file.readYAML('foo.yml');
        Object.keys(data).forEach(function (g) {
          // ... switch (g) { case 'my_key':
        });
    });

};

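Then simply run grunt foo from the shell (check grunt --help for the available tasks).

Furthermore, you can implement exec:foo tasks (grunt-exec) with input variables passed from your task (foo: { cmd: 'echo bar <%= foo %>' }) in order to print the output in whatever format you want, then pipe it into another command.

There is also a tool similar to Grunt called gulp, with the additional plugin gulp-yaml.

Install via: npm install --save-dev gulp-yaml

Sample usage: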

var gulp = require('gulp');
var yaml = require('gulp-yaml');

gulp.src('./src/*.yml')
  .pipe(yaml())
  .pipe(gulp.dest('./dist/'))

gulp.src('./src/*.yml')
  .pipe(yaml({ space: 2 }))
  .pipe(gulp.dest('./dist/'))

gulp.src('./src/*.yml')
  .pipe(yaml({ safe: true }))
  .pipe(gulp.dest('./dist/'))

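For more options for dealing with the YAML format, check the YAML site for available projects, libraries and other resources that can help you parse that format.

Other tools:

Jshon: parses, reads and creates JSON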

2015-09-29 10:31:31