I have a JSON file that I want to convert to a CSV file. How can I do this with Python?
I tried:
import json
import csv
f = open('data.json')
data = json.load(f)
f.close()
f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    f.writerow(item)
f.close()
However, this did not work. I am using Django, and the error I received is:
`'file' object has no attribute 'writerow'`
I then tried the following:
import json
import csv
f = open('data.json')
data = json.load(f)
f.close()
f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    csv_file.writerow(item)  # ← changed
f.close()
I then get the error:
`sequence expected`
Sample JSON file:
[{
        "pk": 22,
        "model": "auth.permission",
        "fields": {
            "codename": "add_logentry",
            "name": "Can add log entry",
            "content_type": 8
        }
    }, {
        "pk": 23,
        "model": "auth.permission",
        "fields": {
            "codename": "change_logentry",
            "name": "Can change log entry",
            "content_type": 8
        }
    }, {
        "pk": 24,
        "model": "auth.permission",
        "fields": {
            "codename": "delete_logentry",
            "name": "Can delete log entry",
            "content_type": 8
        }
    }, {
        "pk": 4,
        "model": "auth.permission",
        "fields": {
            "codename": "add_group",
            "name": "Can add group",
            "content_type": 2
        }
    }, {
        "pk": 10,
        "model": "auth.permission",
        "fields": {
            "codename": "add_message",
            "name": "Can add message",
            "content_type": 4
        }
    }
]
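Both errors most likely come from how writerow is called: it is a method of the csv.writer object rather than of the file, and it expects a sequence of cell values, while each item here is a dict. The output file also has to be opened for writing. Here is a minimal sketch, assuming every record has the pk / model / fields shape of the sample above; the helper name records_to_csv and the in-memory buffer are illustrative choices, not part of the question.

```python
import csv
import io
import json

# Two records in the shape of the sample file above.
SAMPLE = '''[
  {"pk": 22, "model": "auth.permission",
   "fields": {"codename": "add_logentry", "name": "Can add log entry", "content_type": 8}},
  {"pk": 23, "model": "auth.permission",
   "fields": {"codename": "change_logentry", "name": "Can change log entry", "content_type": 8}}
]'''


def records_to_csv(records, out):
    """Write one CSV row per record; `out` is any writable text file object."""
    writer = csv.writer(out)
    # Header row: the two top-level keys, then the keys nested under "fields".
    writer.writerow(["pk", "model", "codename", "name", "content_type"])
    for item in records:
        fields = item["fields"]
        # writerow() needs a sequence of cell values, so build a list
        # instead of passing the dict itself.
        writer.writerow([item["pk"], item["model"], fields["codename"],
                         fields["name"], fields["content_type"]])


# io.StringIO stands in for open('data.csv', 'w', newline='') in this sketch.
buffer = io.StringIO()
records_to_csv(json.loads(SAMPLE), buffer)
print(buffer.getvalue())
```

With a real file, the same function can be called inside `with open('data.csv', 'w', newline='') as f:`. The column list is hard-coded here, so it only fits data with exactly this one level of nesting.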
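Alec's answer is great, but it doesn't work when there are multiple levels of nesting. Below is a modified version that supports multiple levels of nesting. It also makes the header names a bit nicer if the nested object already specifies its own key (e.g. Firebase Analytics / BigTable / BigQuery data):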
"""Converts JSON with nested fields into a flattened CSV file.
"""
import sys
import json
import csv
import jsonlines
from orderedset import OrderedSet
# from https://stackoverflow.com/a/28246154/473201
def flattenjson(b, prefix='', delim='/', val=None):
    # val accumulates the flattened {path: value} pairs across recursive calls.
    if val is None:
        val = {}
    if isinstance(b, dict):
        for j in b.keys():
            flattenjson(b[j], prefix + delim + j, delim, val)
    elif isinstance(b, list):
        get = b
        for j in range(len(get)):
            key = str(j)
            # If the nested data contains its own key, use that as the header instead.
            if isinstance(get[j], dict):
                if 'key' in get[j]:
                    key = get[j]['key']
            flattenjson(get[j], prefix + delim + key, delim, val)
    else:
        # Leaf value: store it under the path built so far.
        val[prefix] = b
    return val
def main(argv):
    if len(argv) < 2:
        raise ValueError('Please specify a JSON file to parse')
    print("Loading and Flattening...")
    filename = argv[1]
    allRows = []
    fieldnames = OrderedSet()
    # jsonlines expects one JSON object per line (JSON Lines), not a single JSON array.
    with jsonlines.open(filename) as reader:
        for obj in reader:
            # print('orig:\n')
            # print(obj)
            flattened = flattenjson(obj)
            # print('keys: %s' % flattened.keys())
            # print('flattened:\n')
            # print(flattened)
            fieldnames.update(flattened.keys())
            allRows.append(flattened)
    print("Exporting to CSV...")
    outfilename = filename + '.csv'
    count = 0
    # newline='' lets the csv module control line endings (Python 3).
    with open(outfilename, 'w', newline='') as file:
        csvwriter = csv.DictWriter(file, fieldnames=list(fieldnames))
        csvwriter.writeheader()
        for obj in allRows:
            # print('allRows:\n')
            # print(obj)
            csvwriter.writerow(obj)
            count += 1
    print("Wrote %d rows" % count)
if __name__ == '__main__':
    main(sys.argv)
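To see what the flattened headers look like, here is a small sketch that runs flattenjson on two made-up records and writes them to an in-memory CSV. The function body is repeated so the snippet runs on its own. The sample records and the "user_properties" name are hypothetical, and a plain dict is used as a standard-library stand-in for OrderedSet (assuming Python 3.7+, where dicts keep insertion order).

```python
import csv
import io


def flattenjson(b, prefix='', delim='/', val=None):
    # Same logic as the function above, repeated so this sketch is self-contained.
    if val is None:
        val = {}
    if isinstance(b, dict):
        for j in b.keys():
            flattenjson(b[j], prefix + delim + j, delim, val)
    elif isinstance(b, list):
        for j in range(len(b)):
            key = str(j)
            # A dict inside a list may name itself through a 'key' entry.
            if isinstance(b[j], dict) and 'key' in b[j]:
                key = b[j]['key']
            flattenjson(b[j], prefix + delim + key, delim, val)
    else:
        val[prefix] = b
    return val


# Hypothetical records: one with a nested dict, one with a keyed list entry.
rows = [
    flattenjson({"pk": 22, "fields": {"codename": "add_logentry"}}),
    flattenjson({"pk": 7, "user_properties": [{"key": "plan", "value": "free"}]}),
]

# Union of all headers in first-seen order (dict keys stand in for OrderedSet).
fieldnames = list(dict.fromkeys(name for row in rows for name in row))

out = io.StringIO()  # stands in for a file opened with newline=''
writer = csv.DictWriter(out, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)  # columns missing from a row are left empty
print(out.getvalue())
```

Note that every header starts with the delimiter (for example /pk), because the prefix starts out empty, and that rows with different keys simply get empty cells in the columns they lack.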
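Using json_normalize from pandas:
The sample data from the OP is used here, in a file named test.json.
encoding='utf-8' is used here, but it may not be necessary in other cases.
The code below uses the pathlib library.
.open is a method of pathlib.Path.
It works with non-Windows paths too.
Use pandas.DataFrame.to_csv(...) to save the data to a CSV file.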
import pandas as pd
# As of pandas 1.0.0, pandas.io.json.json_normalize is deprecated; json_normalize is now exposed in the top-level namespace.
# from pandas.io.json import json_normalize
from pathlib import Path
import json
# set path to file
p = Path(r'c:\some_path_to_file\test.json')
# read json
with p.open('r', encoding='utf-8') as f:
    data = json.load(f)
# create dataframe
df = pd.json_normalize(data)
# dataframe view
# pk  model            fields.codename  fields.name           fields.content_type
# 22  auth.permission  add_logentry     Can add log entry     8
# 23  auth.permission  change_logentry  Can change log entry  8
# 24  auth.permission  delete_logentry  Can delete log entry  8
# 4   auth.permission  add_group        Can add group         2
# 10  auth.permission  add_message      Can add message       4
# save to csv
df.to_csv('test.csv', index=False, encoding='utf-8')
CSV output:
pk,model,fields.codename,fields.name,fields.content_type
22,auth.permission,add_logentry,Can add log entry,8
23,auth.permission,change_logentry,Can change log entry,8
24,auth.permission,delete_logentry,Can delete log entry,8
4,auth.permission,add_group,Can add group,2
10,auth.permission,add_message,Can add message,4
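Resources for more heavily nested JSON objects:
SO answers:
Flatten a JSON array with Python
How to flatten nested JSON recursively, with flatten_json
How to json_normalize a column with NaNs
Split a column of dictionaries into separate columns with pandas
See the json_normalize tag for other related questions.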