I'm trying to parse JSON returned from a curl request, like so:
curl 'http://twitter.com/users/username.json' |
sed -e 's/[{}]/''/g' |
awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}'
The above splits the JSON into fields, for example:
% ...
"geo_enabled":false
"friends_count":245
"profile_text_color":"000000"
"status":"in_reply_to_screen_name":null
"source":"web"
"truncated":false
"text":"My status"
"favorited":false
% ...
How do I print a specific field (denoted by -v k=text)?
Using Bash with Python
Create a Bash function in your .bashrc file:
function getJsonVal () {
python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1))";
}
Then
curl 'http://twitter.com/users/username.json' | getJsonVal "['text']"
Output (json.dumps emits a JSON-encoded value, so strings keep their double quotes):
"My status"
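Because json.dumps serializes a string together with its surrounding double quotes, you may prefer the bare text when the selected value is a string. The following is a minimal sketch of that idea, not part of the original function; the helper name render is hypothetical.

```python
import json


def render(value):
    """Return strings unchanged and everything else as JSON text."""
    # Hypothetical helper: only plain strings skip JSON encoding.
    if isinstance(value, str):
        return value
    return json.dumps(value)


doc = json.loads('{"text": "My status", "friends_count": 245}')
print(json.dumps(doc["text"]))       # JSON-encoded, keeps the double quotes
print(render(doc["text"]))           # bare string, no quotes
print(render(doc["friends_count"]))  # non-strings are still JSON-encoded
```

The same check could be folded into the python -c one-liner, at the cost of making it harder to read.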
Here is the same function, but with error checking.
function getJsonVal() {
if [ \( $# -ne 1 \) -o \( -t 0 \) ]; then
cat <<EOF
Usage: getJsonVal 'key' < /tmp/
-- or --
cat /tmp/input | getJsonVal 'key'
EOF
return;
fi;
python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1))";
}
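Here $# -ne 1 is true unless exactly one argument is given, and -t 0 is true when stdin is a terminal, i.e. nothing was piped or redirected in. In either case the function prints the usage text and returns.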
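The same guard can be written in Python, which may make the logic easier to follow. This is only an illustrative sketch: the function name should_show_usage is hypothetical, and sys.stdin.isatty() plays the role of the shell's -t 0 test.

```python
import sys


def should_show_usage(args, stdin_is_tty):
    """Mirror the shell test: [ \\( $# -ne 1 \\) -o \\( -t 0 \\) ]."""
    # Show usage when the argument count is not exactly one,
    # or when stdin is a terminal (nothing piped or redirected in).
    return len(args) != 1 or stdin_is_tty


# One argument with piped input: proceed.
print(should_show_usage(["['text']"], False))
# No argument, or input typed at a terminal: show usage.
print(should_show_usage([], False))
print(should_show_usage(["['text']"], True))
# In a real script you would pass sys.argv[1:] and sys.stdin.isatty().
```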
The nice thing about this implementation is that you can access nested JSON values and get JSON content back! =)
Example:
echo '{"foo": {"bar": "baz", "a": [1,2,3]}}' | getJsonVal "['foo']['a'][1]"
Output:
2
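One caveat: getJsonVal pastes its argument straight into Python source, so whatever expression appears in $1 is executed. If the key path comes from an untrusted place, a variant that treats the path as data may be safer. The sketch below assumes the path is supplied as a list of keys and indices; get_path is a hypothetical name, not part of the function above.

```python
import json


def get_path(data, path):
    """Walk `data` following a sequence of dict keys and list indices."""
    current = data
    for step in path:
        # Raises KeyError, IndexError or TypeError if the step does not exist.
        current = current[step]
    return current


doc = json.loads('{"foo": {"bar": "baz", "a": [1, 2, 3]}}')
print(json.dumps(get_path(doc, ["foo", "a", 1])))
```

This trades the compact "['foo']['a'][1]" syntax for a path that can never run arbitrary code.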
If you want to be fancier, you can pretty-print the data:
function getJsonVal () {
python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1, sort_keys=True, indent=4))";
}
echo '{"foo": {"bar": "baz", "a": [1,2,3]}}' | getJsonVal "['foo']"
{
    "a": [
        1,
        2,
        3
    ],
    "bar": "baz"
}
You have several options.
You can use trdsql [1] to parse and transform JSON/CSV input. Following your example:
trdsql "select attr1,attr2 from sample.json"
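You can also use a where clause, just as in SQL. Output can be CSV, JSON, and so on. It is a very handy tool.
In my experience, trdsql is a bit problematic when dealing with nested attribute values, so for those cases I settled on qp [2].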
cat sample.json | qp 'select attr1, attr2.detail.name where attr3=10'
Note that there is no FROM here.
To view the results, you can use jless, a very fast command-line JSON viewer [3].
ClickHouse is a newer entrant. You can see what it can do in [4].
[1] https://github.com/noborus/trdsql
[2] https://github.com/f5io/qp
[3] https://jless.io
[4] https://clickhouse.com/blog/extracting-converting-querying-local-files-with-sql-clickhouse-local
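Update (2020)
My biggest problem with external tools (e.g. Python) was that you have to deal with package managers and dependencies to install them.
However, now that we have jq as a standalone, static tool that is easy to install cross-platform via GitHub Releases and Webi (webinstall.dev/jq), I'd recommend that:
Mac, Linux: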
curl -sS https://webi.sh/jq | bash
Windows 10:
curl.exe -A MS https://webi.ms/jq | powershell
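Cheat sheet: https://webinstall.dev/jq
Original (2011)
TickTick is a JSON parser written in bash (fewer than 250 lines of code).
Here is the author's snippet from his article, Imagine a world where Bash supports JSON: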
#!/bin/bash
. ticktick.sh
``
people = {
"Writers": [
"Rod Serling",
"Charles Beaumont",
"Richard Matheson"
],
"Cast": {
"Rod Serling": { "Episodes": 156 },
"Martin Landau": { "Episodes": 2 },
"William Shatner": { "Episodes": 2 }
}
}
``
function printDirectors() {
echo " The ``people.Directors.length()`` Directors are:"
for director in ``people.Directors.items()``; do
printf " - %s\n" ${!director}
done
}
`` people.Directors = [ "John Brahm", "Douglas Heyes" ] ``
printDirectors
newDirector="Lamont Johnson"
`` people.Directors.push($newDirector) ``
printDirectors
echo "Shifted: "``people.Directors.shift()``
printDirectors
echo "Popped: "``people.Directors.pop()``
printDirectors
For more complex JSON parsing, I suggest using the Python jsonpath module (by Stefan Goessner) -
Install it -
sudo easy_install -U jsonpath
Use it -
Example file.json (from http://goessner.net/articles/JsonPath) -
{ "store": {
"book": [
{ "category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{ "category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99
},
{ "category": "fiction",
"author": "Herman Melville",
"title": "Moby Dick",
"isbn": "0-553-21311-3",
"price": 8.99
},
{ "category": "fiction",
"author": "J. R. R. Tolkien",
"title": "The Lord of the Rings",
"isbn": "0-395-19395-8",
"price": 22.99
}
],
"bicycle": {
"color": "red",
"price": 19.95
}
}
}
Parse it (extract all book titles with price < 10) -
cat file.json | python -c "import sys, json, jsonpath; print('\n'.join(jsonpath.jsonpath(json.load(sys.stdin), 'store.book[?(@.price < 10)].title')))"
Will output -
Sayings of the Century
Moby Dick
Note: The above command line does not include error checking. For a full solution with error checking, you should create a small Python script, and wrap the code with try-except.
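As a rough illustration of the try-except wrapping suggested above, here is a minimal sketch. To keep it runnable without third-party packages it swaps the jsonpath query for a plain list comprehension over the standard json module, so it is an equivalent filter for this one example rather than a JSONPath implementation. The function name cheap_titles and the shortened sample data are hypothetical.

```python
import json
import sys


def cheap_titles(raw_text, max_price=10):
    """Return titles of books cheaper than max_price, or None on bad input."""
    try:
        data = json.loads(raw_text)
        books = data["store"]["book"]
        return [b["title"] for b in books if b["price"] < max_price]
    except json.JSONDecodeError as exc:
        print("invalid JSON: %s" % exc, file=sys.stderr)
    except (KeyError, TypeError) as exc:
        print("unexpected structure: %r" % exc, file=sys.stderr)
    return None


# Shortened stand-in for file.json (assumption: same shape as the example).
sample = (
    '{"store": {"book": ['
    '{"title": "Sayings of the Century", "price": 8.95},'
    '{"title": "Sword of Honour", "price": 12.99}]}}'
)
print(cheap_titles(sample))
print(cheap_titles("not json"))
```

In a real script you would read sys.stdin instead of the inline sample and exit with a non-zero status when the function returns None.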