如何在Python中解析YAML文件?
不依赖C头文件的最简单和最纯粹的方法是PyYaml(文档),可以通过pip install PyYaml安装:
#!/usr/bin/env python
import yaml
with open("example.yaml", "r") as stream:
try:
print(yaml.safe_load(stream))
except yaml.YAMLError as exc:
print(exc)
就是这样。普通的yaml.load()函数也存在,但是应该始终优先使用yaml.safe_load(),以避免引入任意代码执行的可能性。因此,除非显式地需要任意对象序列化/反序列化,否则请使用safe_load。
注意PyYaml项目支持YAML 1.1规范的更高版本。如果需要YAML 1.2规范支持,请参阅ruamel。Yaml在这个答案中提到。
此外,您还可以使用一个替换pyyaml的drop,它可以使您的yaml文件保持原样,称为oyaml。在这里查看oyaml的synk
如果你的YAML符合YAML 1.2规范(2009年发布),那么你应该使用ruamel。yaml(免责声明:我是该包的作者)。 它本质上是PyYAML的超集,它支持大部分YAML 1.1(从2005年开始)。
如果您希望在往返时能够保留注释,那么当然应该使用ruame .yaml。
升级@Jon的例子很简单:
import ruamel.yaml as yaml
with open("example.yaml") as stream:
try:
print(yaml.safe_load(stream))
except yaml.YAMLError as exc:
print(exc)
使用safe_load(),除非你真的完全控制输入,需要它(很少情况下)并且知道你在做什么。
如果你正在使用pathlib路径来操作文件,你最好使用新的API ruamel。yaml提供:
from ruamel.yaml import YAML
from pathlib import Path
path = Path('example.yaml')
yaml = YAML(typ='safe')
data = yaml.load(path)
使用Python 2+3(和unicode)读写YAML文件
# -*- coding: utf-8 -*-
import yaml
import io
# Define data
data = {
'a list': [
1,
42,
3.141,
1337,
'help',
u'€'
],
'a string': 'bla',
'another dict': {
'foo': 'bar',
'key': 'value',
'the answer': 42
}
}
# Write YAML file
with io.open('data.yaml', 'w', encoding='utf8') as outfile:
yaml.dump(data, outfile, default_flow_style=False, allow_unicode=True)
# Read YAML file
with open("data.yaml", 'r') as stream:
data_loaded = yaml.safe_load(stream)
print(data == data_loaded)
创建YAML文件
a list:
- 1
- 42
- 3.141
- 1337
- help
- €
a string: bla
another dict:
foo: bar
key: value
the answer: 42
常见的文件结尾
.yml 和 .yaml
选择
CSV: Super simple format (read & write) JSON: Nice for writing human-readable data; VERY commonly used (read & write) YAML: YAML is a superset of JSON, but easier to read (read & write, comparison of JSON and YAML) pickle: A Python serialization format (read & write) ⚠️ Using pickle with files from 3rd parties poses an uncontrollable arbitrary code execution risk. MessagePack (Python package): More compact representation (read & write) HDF5 (Python package): Nice for matrices (read & write) XML: exists too *sigh* (read & write)
对于您的应用程序,以下内容可能很重要:
其他编程语言的支持 读写能力 紧凑性(文件大小)
请参见:数据序列化格式的比较
如果您正在寻找一种创建配置文件的方法,您可能想要阅读我的简短文章Python中的配置文件
#!/usr/bin/env python
import sys
import yaml
def main(argv):
with open(argv[0]) as stream:
try:
#print(yaml.load(stream))
return 0
except yaml.YAMLError as exc:
print(exc)
return 1
if __name__ == "__main__":
sys.exit(main(sys.argv[1:]))
首先使用pip3安装pyyaml。
然后导入yaml模块并将文件加载到名为'my_dict'的字典中:
import yaml
with open('filename.yaml') as f:
my_dict = yaml.safe_load(f)
这就是你所需要的。现在整个yaml文件都在'my_dict'字典中。
我用ruame .yaml。详情和辩论在这里。
from ruamel import yaml
with open(filename, 'r') as fp:
read_data = yaml.load(fp)
ruamel的用法yaml兼容(一些简单的可解决的问题)PyYAML的旧用法,正如我提供的链接中所述,使用
from ruamel import yaml
而不是
import yaml
它会解决你的大部分问题。
编辑:PyYAML并没有死,只是在另一个地方维护了它。
例子:
defaults.yaml
url: https://www.google.com
environment.py
from ruamel import yaml
data = yaml.safe_load(open('defaults.yaml'))
data['url']
像这样访问YAML文件中列表的任何元素:
global:
registry:
url: dtr-:5000/
repoPath:
dbConnectionString: jdbc:oracle:thin:@x.x.x.x:1521:abcd
您可以使用以下python脚本:
import yaml
with open("/some/path/to/yaml.file", 'r') as f:
valuesYaml = yaml.load(f, Loader=yaml.FullLoader)
print(valuesYaml['global']['dbConnectionString'])
Read_yaml_file函数返回所有数据到字典中。
def read_yaml_file(full_path=None, relative_path=None):
if relative_path is not None:
resource_file_location_local = ProjectPaths.get_project_root_path() + relative_path
else:
resource_file_location_local = full_path
with open(resource_file_location_local, 'r') as stream:
try:
file_artifacts = yaml.safe_load(stream)
except yaml.YAMLError as exc:
print(exc)
return dict(file_artifacts.items())
我自己写了剧本。请随意使用它,只要你保留属性。该脚本可以从文件(函数加载)解析yaml,从字符串(函数加载)解析yaml,并将字典转换为yaml(函数转储)。它尊重所有的变量类型。
# © didlly AGPL-3.0 License - github.com/didlly
def is_float(string: str) -> bool:
try:
float(string)
return True
except ValueError:
return False
def is_integer(string: str) -> bool:
try:
int(string)
return True
except ValueError:
return False
def load(path: str) -> dict:
with open(path, "r") as yaml:
levels = []
data = {}
indentation_str = ""
for line in yaml.readlines():
if line.replace(line.lstrip(), "") != "" and indentation_str == "":
indentation_str = line.replace(line.lstrip(), "").rstrip("\n")
if line.strip() == "":
continue
elif line.rstrip()[-1] == ":":
key = line.strip()[:-1]
quoteless = (
is_float(key)
or is_integer(key)
or key == "True"
or key == "False"
or ("[" in key and "]" in key)
)
if len(line.replace(line.strip(), "")) // 2 < len(levels):
if quoteless:
levels[len(line.replace(line.strip(), "")) // 2] = f"[{key}]"
else:
levels[len(line.replace(line.strip(), "")) // 2] = f"['{key}']"
else:
if quoteless:
levels.append(f"[{line.strip()[:-1]}]")
else:
levels.append(f"['{line.strip()[:-1]}']")
if quoteless:
exec(
f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}]"
+ " = {}"
)
else:
exec(
f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}']"
+ " = {}"
)
continue
key = line.split(":")[0].strip()
value = ":".join(line.split(":")[1:]).strip()
if (
is_float(value)
or is_integer(value)
or value == "True"
or value == "False"
or ("[" in value and "]" in value)
):
if (
is_float(key)
or is_integer(key)
or key == "True"
or key == "False"
or ("[" in key and "]" in key)
):
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = {value}"
)
else:
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = {value}"
)
else:
if (
is_float(key)
or is_integer(key)
or key == "True"
or key == "False"
or ("[" in key and "]" in key)
):
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = '{value}'"
)
else:
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = '{value}'"
)
return data
def loads(yaml: str) -> dict:
levels = []
data = {}
indentation_str = ""
for line in yaml.split("\n"):
if line.replace(line.lstrip(), "") != "" and indentation_str == "":
indentation_str = line.replace(line.lstrip(), "")
if line.strip() == "":
continue
elif line.rstrip()[-1] == ":":
key = line.strip()[:-1]
quoteless = (
is_float(key)
or is_integer(key)
or key == "True"
or key == "False"
or ("[" in key and "]" in key)
)
if len(line.replace(line.strip(), "")) // 2 < len(levels):
if quoteless:
levels[len(line.replace(line.strip(), "")) // 2] = f"[{key}]"
else:
levels[len(line.replace(line.strip(), "")) // 2] = f"['{key}']"
else:
if quoteless:
levels.append(f"[{line.strip()[:-1]}]")
else:
levels.append(f"['{line.strip()[:-1]}']")
if quoteless:
exec(
f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}]"
+ " = {}"
)
else:
exec(
f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}']"
+ " = {}"
)
continue
key = line.split(":")[0].strip()
value = ":".join(line.split(":")[1:]).strip()
if (
is_float(value)
or is_integer(value)
or value == "True"
or value == "False"
or ("[" in value and "]" in value)
):
if (
is_float(key)
or is_integer(key)
or key == "True"
or key == "False"
or ("[" in key and "]" in key)
):
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = {value}"
)
else:
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = {value}"
)
else:
if (
is_float(key)
or is_integer(key)
or key == "True"
or key == "False"
or ("[" in key and "]" in key)
):
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = '{value}'"
)
else:
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = '{value}'"
)
return data
def dumps(yaml: dict, indent="") -> str:
"""A procedure which converts the dictionary passed to the procedure into it's yaml equivalent.
Args:
yaml (dict): The dictionary to be converted.
Returns:
data (str): The dictionary in yaml form.
"""
data = ""
for key in yaml.keys():
if type(yaml[key]) == dict:
data += f"\n{indent}{key}:\n"
data += dumps(yaml[key], f"{indent} ")
else:
data += f"{indent}{key}: {yaml[key]}\n"
return data
print(load("config.yml"))
例子
config.yml
level 0 value: 0
level 1:
level 1 value: 1
level 2:
level 2 value: 2
level 1 2:
level 1 2 value: 1 2
level 2 2:
level 2 2 value: 2 2
输出
{'level 0 value': 0, 'level 1': {'level 1 value': 1, 'level 2': {'level 2 value': 2}}, 'level 1 2': {'level 1 2 value': '1 2', 'level 2 2': {'level 2 2 value': 2 2}}}
推荐文章
- 将Pandas或Numpy Nan替换为None以用于MysqlDB
- 使用pandas对同一列进行多个聚合
- 使用Python解析HTML
- django MultiValueDictKeyError错误,我如何处理它
- 如何在for循环期间修改列表条目?
- 我如何在Django中创建一个鼻涕虫?
- 没有名为'django.core.urlresolvers'的模块
- 蟒蛇导出环境文件
- Django - makemigrations -未检测到任何更改
- SQLAlchemy:引擎、连接和会话差异
- 在Python Pandas中删除多个列中的所有重复行
- 更改pandas DataFrame中的特定列名
- 将Pandas多索引转换为列
- 熊猫在每组中获得最高的n个记录
- 熊猫数据帧得到每组的第一行