I have a JSON file that I want to convert to a CSV file. How can I do this with Python?

I tried:

import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    f.writerow(item)

f.close()

However, it did not work. I am using Django, and the error I received is:

`'file' object has no attribute 'writerow'`

I then tried the following:

import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    csv_file.writerow(item)  # ← changed

f.close()

I then get the error:

`sequence expected`

Sample JSON file:

[{
        "pk": 22,
        "model": "auth.permission",
        "fields": {
            "codename": "add_logentry",
            "name": "Can add log entry",
            "content_type": 8
        }
    }, {
        "pk": 23,
        "model": "auth.permission",
        "fields": {
            "codename": "change_logentry",
            "name": "Can change log entry",
            "content_type": 8
        }
    }, {
        "pk": 24,
        "model": "auth.permission",
        "fields": {
            "codename": "delete_logentry",
            "name": "Can delete log entry",
            "content_type": 8
        }
    }, {
        "pk": 4,
        "model": "auth.permission",
        "fields": {
            "codename": "add_group",
            "name": "Can add group",
            "content_type": 2
        }
    }, {
        "pk": 10,
        "model": "auth.permission",
        "fields": {
            "codename": "add_message",
            "name": "Can add message",
            "content_type": 4
        }
    }
]

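Current answer

I have tried many of the suggested solutions (pandas also did not normalize my JSON correctly), but the one that really works well is Max Berman's, which parses the JSON data correctly.

I wrote an improvement that avoids creating a new column for every row: during parsing, values are placed into the existing column instead. If a column has only one value, it is stored as a string; if the column has more values, they are stored as a list.

It takes an input.json file as input and writes an output.csv.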

import json
from io import StringIO

import pandas as pd

def flatten_json(data):
    # Assumes the top-level JSON value is an object (dict), not a list.
    def process_value(keys, value, flattened):
        if isinstance(value, dict):
            for key in value.keys():
                process_value(keys + [key], value[key], flattened)
        elif isinstance(value, list):
            # List items reuse the parent key, so they land in the same column.
            for v in value:
                process_value(keys, v, flattened)
        else:
            key1 = '__'.join(keys)
            if flattened.get(key1) is not None:
                # The column already has a value: collect the values in a list.
                if isinstance(flattened[key1], list):
                    flattened[key1] = flattened[key1] + [value]
                else:
                    flattened[key1] = [flattened[key1]] + [value]
            else:
                flattened[key1] = value

    flattened = {}
    for key in data.keys():
        process_value([key], data[key], flattened)
    return flattened

# Let a missing or unreadable input file raise instead of silently continuing.
with open("input.json", "r") as f:
    y = json.load(f)
flat = flatten_json(y)
text = json.dumps(flat)
# Wrap the string in StringIO; recent pandas versions deprecate passing raw JSON text.
df = pd.read_json(StringIO(text))
df.to_csv('output.csv', index=False, encoding='utf-8')
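The merging rule described above (a single value stays a scalar, repeated values are collected into a list) can be seen in isolation with a small sketch. The helper name `merge_flatten` and the sample record are made up for illustration and are not part of the answer; the sketch also skips pandas entirely and only shows the flattened dictionary.

```python
def merge_flatten(record, sep="__"):
    """Flatten a nested dict; values that share a column name are merged into a list."""
    flat = {}

    def walk(keys, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(keys + [key], child)
        elif isinstance(value, list):
            # List items share the parent's column name (no index suffix).
            for child in value:
                walk(keys, child)
        else:
            name = sep.join(keys)
            if name not in flat:
                flat[name] = value                # first value: keep it as a scalar
            elif isinstance(flat[name], list):
                flat[name].append(value)          # third and later values
            else:
                flat[name] = [flat[name], value]  # second value: promote to a list

    for key, value in record.items():
        walk([key], value)
    return flat


# Hypothetical record, not taken from the question.
record = {"id": 7, "tags": [{"name": "a"}, {"name": "b"}], "owner": {"name": "x"}}
flat = merge_flatten(record)
print(flat)
```

Here the two `tags` entries end up in a single `tags__name` column as a list, while `owner__name` stays a plain string.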

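Other answers

Alec's answer is great, but it does not work when there are multiple levels of nesting. Here is a modified version that supports multiple levels of nesting. It also makes the header names a bit nicer when the nested object already specifies its own key (e.g. Firebase Analytics / BigTable / BigQuery data):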

"""Converts JSON with nested fields into a flattened CSV file.
"""

import sys
import json
import csv
import os

import jsonlines

from orderedset import OrderedSet

# from https://stackoverflow.com/a/28246154/473201
def flattenjson( b, prefix='', delim='/', val=None ):
  if val is None:
    val = {}

  if isinstance( b, dict ):
    for j in b.keys():
      flattenjson(b[j], prefix + delim + j, delim, val)
  elif isinstance( b, list ):
    get = b
    for j in range(len(get)):
      key = str(j)

      # If the nested data contains its own key, use that as the header instead.
      if isinstance( get[j], dict ):
        if 'key' in get[j]:
          key = get[j]['key']

      flattenjson(get[j], prefix + delim + key, delim, val)
  else:
    val[prefix] = b

  return val

def main(argv):
  if len(argv) < 2:
    raise ValueError('Please specify a JSON file to parse')

  print("Loading and Flattening...")
  filename = argv[1]
  allRows = []
  fieldnames = OrderedSet()
  # The input is read as JSON Lines: one JSON object per line.
  with jsonlines.open(filename) as reader:
    for obj in reader:
      flattened = flattenjson(obj)
      fieldnames.update(flattened.keys())
      allRows.append(flattened)

  print("Exporting to CSV...")
  outfilename = filename + '.csv'
  count = 0
  # newline='' lets the csv module control line endings (Python 3).
  with open(outfilename, 'w', newline='') as file:
    csvwriter = csv.DictWriter(file, fieldnames=list(fieldnames))
    csvwriter.writeheader()
    for obj in allRows:
      csvwriter.writerow(obj)
      count += 1

  print("Wrote %d rows" % count)


if __name__ == '__main__':
  main(sys.argv)

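Surprisingly, I found that none of the answers posted here so far correctly handle all possible scenarios (e.g. nested dicts, nested lists, None values, etc.).

This solution should work in all of those scenarios: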

def flatten_json(json):
    def process_value(keys, value, flattened):
        if isinstance(value, dict):
            for key in value.keys():
                process_value(keys + [key], value[key], flattened)
        elif isinstance(value, list):
            for idx, v in enumerate(value):
                process_value(keys + [str(idx)], v, flattened)
        else:
            flattened['__'.join(keys)] = value

    flattened = {}
    for key in json.keys():
        process_value([key], json[key], flattened)
    return flattened
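As a rough sketch of how the flattened dictionaries might be turned into an actual CSV, the example below feeds them to `csv.DictWriter`. The `flatten_json` copy is only there to keep the snippet self-contained; the two records come from the sample JSON in the question, and the `io.StringIO` buffer is an assumption standing in for a real output file.

```python
import csv
import io


def flatten_json(obj):
    """Join nested keys with '__'; list items get their index as a key part."""
    flattened = {}

    def process_value(keys, value):
        if isinstance(value, dict):
            for key, child in value.items():
                process_value(keys + [key], child)
        elif isinstance(value, list):
            for idx, child in enumerate(value):
                process_value(keys + [str(idx)], child)
        else:
            flattened["__".join(keys)] = value

    for key, value in obj.items():
        process_value([key], value)
    return flattened


# Two records from the sample JSON in the question.
records = [
    {"pk": 22, "model": "auth.permission",
     "fields": {"codename": "add_logentry", "name": "Can add log entry", "content_type": 8}},
    {"pk": 4, "model": "auth.permission",
     "fields": {"codename": "add_group", "name": "Can add group", "content_type": 2}},
]

rows = [flatten_json(r) for r in records]

# Collect column names in first-seen order across all rows,
# so records with different keys still share one header.
fieldnames = []
for row in rows:
    for name in row:
        if name not in fieldnames:
            fieldnames.append(name)

buffer = io.StringIO()  # stands in for open('output.csv', 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The `restval=""` argument leaves a cell empty when a record lacks one of the columns, which matters once the records do not all share the same shape.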

This works relatively well. It flattens the JSON so it can be written to a CSV file. Nested elements are handled :)

This is for Python 3:

import json

o = json.loads('your json string') # Be careful, o must be a list, each of its objects will make a line of the csv.

def flatten(o, k='/'):
    global l, c_line
    if isinstance(o, dict):
        for key, value in o.items():
            flatten(value, k + '/' + key)
    elif isinstance(o, list):
        for ov in o:
            flatten(ov, '')
    elif isinstance(o, str):
        o = o.replace('\r',' ').replace('\n',' ').replace(';', ',')
        if not k in l:
            l[k]={}
        l[k][c_line]=o

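def render_csv(l):
    # Header row: one column per flattened key.
    for k in l:
        print('%s;' % k, end='')
    print()

    # Row count: highest line index stored for any column, plus one.
    n_rows = max((max(v) for v in l.values()), default=-1) + 1
    for i in range(n_rows):
        for k in l:
            # Leave the cell empty when this column has no value on line i.
            print('%s;' % l[k].get(i, ''), end='')
        print()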

def json_to_csv(object_list):
    global l, c_line
    l = {}
    c_line = 0
    for ov in object_list : # Assumes json is a list of objects
        flatten(ov)
        c_line += 1
    render_csv(l)

json_to_csv(o)

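Enjoy.

Since the data appears to be in a dictionary format, it looks like you should use csv.DictWriter() to output the lines with the appropriate header information. This should make the conversion somewhat easier to handle. The fieldnames parameter then sets the column order, and writing the first line as a header allows the file to be read and processed later by csv.DictReader().

For example, Mike Repass used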

output = csv.writer(sys.stdout)

output.writerow(data[0].keys())  # header row

for row in data:
  output.writerow(row.values())

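However, just change the initial setup to output = csv.DictWriter(filesetting, fieldnames=data[0].keys())

Note that since the order of elements in a dictionary is not defined (in Python versions before 3.7), you might have to create the fieldnames entries explicitly. Once you do that, writerow will work, and the writes then work as originally shown.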
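A minimal sketch of the DictWriter approach, assuming the nested "fields" object should be merged into the top-level row and that the column order is chosen by hand. The `fieldnames` list and the `io.StringIO` buffer (standing in for a real file) are illustrative choices, not part of the original answer; the data is taken from the question's sample.

```python
import csv
import io

# Two records from the sample JSON in the question.
data = [
    {"pk": 22, "model": "auth.permission",
     "fields": {"codename": "add_logentry", "name": "Can add log entry", "content_type": 8}},
    {"pk": 23, "model": "auth.permission",
     "fields": {"codename": "change_logentry", "name": "Can change log entry", "content_type": 8}},
]

# Explicit column order, so the header does not depend on dict ordering.
fieldnames = ["pk", "model", "codename", "name", "content_type"]

buffer = io.StringIO()  # stands in for open('data.csv', 'w', newline='')
output = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
output.writeheader()
for item in data:
    # Build a flat row: top-level keys plus everything under "fields".
    row = {"pk": item["pk"], "model": item["model"]}
    row.update(item["fields"])
    output.writerow(row)

# Read the result back with DictReader to confirm the header round-trips.
buffer.seek(0)
rows_back = list(csv.DictReader(buffer))
print(rows_back[0]["codename"])
```

Note that csv.DictReader returns every value as a string, so `pk` comes back as `"22"` rather than `22`.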

This is a modification of @MikeRepass's answer. This version writes the CSV to a file and works for both Python 2 and Python 3.

import csv,json
input_file="data.json"
output_file="data.csv"
with open(input_file) as f:
    content=json.load(f)
try:
    context=open(output_file,'w',newline='') # Python 3
except TypeError:
    context=open(output_file,'wb') # Python 2
with context as file:
    writer=csv.writer(file)
    writer.writerow(content[0].keys()) # header row
    for row in content:
        writer.writerow(row.values())