I have a JSON file that I want to convert to a CSV file. How can I do this with Python?

I tried:

import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    f.writerow(item)

f.close()

However, this did not work. I am using Django, and the error I received is:

`'file' object has no attribute 'writerow'`

I then tried the following:

import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    csv_file.writerow(item)  # ← changed

f.close()

I then got the error:

`sequence expected`

Sample JSON file:

[{
        "pk": 22,
        "model": "auth.permission",
        "fields": {
            "codename": "add_logentry",
            "name": "Can add log entry",
            "content_type": 8
        }
    }, {
        "pk": 23,
        "model": "auth.permission",
        "fields": {
            "codename": "change_logentry",
            "name": "Can change log entry",
            "content_type": 8
        }
    }, {
        "pk": 24,
        "model": "auth.permission",
        "fields": {
            "codename": "delete_logentry",
            "name": "Can delete log entry",
            "content_type": 8
        }
    }, {
        "pk": 4,
        "model": "auth.permission",
        "fields": {
            "codename": "add_group",
            "name": "Can add group",
            "content_type": 2
        }
    }, {
        "pk": 10,
        "model": "auth.permission",
        "fields": {
            "codename": "add_message",
            "name": "Can add message",
            "content_type": 4
        }
    }
]

Current answer

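JSON can represent a wide variety of data structures: a JS "object" is roughly like a Python dict (with string keys), a JS "array" is roughly like a Python list, and you can nest them as long as the final "leaf" elements are numbers or strings.

CSV can essentially represent only a 2-D table, optionally with a first row of "headers", i.e. "column names", which makes the table interpretable as a list of dicts instead of the normal interpretation, a list of lists (again, "leaf" elements can be numbers or strings).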

So, in the general case, you can't translate an arbitrary JSON structure to a CSV. In a few special cases you can (array of arrays with no further nesting; arrays of objects which all have exactly the same keys). Which special case, if any, applies to your problem? The details of the solution depend on which special case you do have. Given the astonishing fact that you don't even mention which one applies, I suspect you may not have considered the constraint, neither usable case in fact applies, and your problem is impossible to solve. But please do clarify!
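
To make the two convertible special cases concrete, here is a minimal sketch using only the standard library. The helper name `to_csv_text` and the sample data are made up for illustration, and the helper does not check whether individual cells contain further nesting.

```python
import csv
import io


def to_csv_text(records):
    """Convert one of the two 'easy' JSON shapes to CSV text.

    Accepts a list of lists, or a list of dicts that all share the same keys.
    Anything else raises ValueError (illustrative helper, not a library API).
    """
    buf = io.StringIO(newline="")
    if all(isinstance(r, list) for r in records):
        # Case 1: array of arrays, each inner array becomes one row.
        csv.writer(buf).writerows(records)
    elif all(isinstance(r, dict) for r in records):
        # Case 2: array of objects, the shared keys become the header row.
        fieldnames = list(records[0]) if records else []
        if any(set(r) != set(fieldnames) for r in records):
            raise ValueError("objects do not all have the same keys")
        writer = csv.DictWriter(buf, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    else:
        raise ValueError("not a flat table")
    return buf.getvalue()


rows_text = to_csv_text([[1, "a"], [2, "b"]])
dicts_text = to_csv_text([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
print(rows_text)
print(dicts_text)
```

The question's sample data fits neither case directly, because each object carries a nested `fields` object that has to be flattened first.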

Other answers

This code should work for you, assuming your JSON data is in a file called data.json.

import json
import csv

with open("data.json") as file:
    data = json.load(file)

with open("data.csv", "w", newline="") as file:  # newline="" avoids blank rows on Windows
    csv_file = csv.writer(file)
    for item in data:
        fields = list(item['fields'].values())
        csv_file.writerow([item['pk'], item['model']] + fields)

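Alec's answer is great, but it doesn't work when there are multiple levels of nesting. Here is a modified version that supports multiple levels of nesting. It also makes the header names a bit nicer if the nested object already specifies its own key (e.g. Firebase Analytics / BigTable / BigQuery data):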

"""Converts JSON with nested fields into a flattened CSV file.
"""

import sys
import json
import csv
import os

import jsonlines

from orderedset import OrderedSet

# from https://stackoverflow.com/a/28246154/473201
def flattenjson( b, prefix='', delim='/', val=None ):
  if val is None:
    val = {}

  if isinstance( b, dict ):
    for j in b.keys():
      flattenjson(b[j], prefix + delim + j, delim, val)
  elif isinstance( b, list ):
    get = b
    for j in range(len(get)):
      key = str(j)

      # If the nested data contains its own key, use that as the header instead.
      if isinstance( get[j], dict ):
        if 'key' in get[j]:
          key = str(get[j]['key'])  # the value may not be a string

      flattenjson(get[j], prefix + delim + key, delim, val)
  else:
    val[prefix] = b

  return val

def main(argv):
  if len(argv) < 2:
    # Exit with a message instead of raising the undefined name `Error`.
    sys.exit('Please specify a JSON file to parse')

  print("Loading and Flattening...")
  filename = argv[1]
  allRows = []
  fieldnames = OrderedSet()
  # The input is read as JSON Lines: one JSON object per line.
  with jsonlines.open(filename) as reader:
    for obj in reader:
      flattened = flattenjson(obj)
      fieldnames.update(flattened.keys())
      allRows.append(flattened)

  print("Exporting to CSV...")
  outfilename = filename + '.csv'
  count = 0
  # newline='' lets the csv module control line endings (Python 3).
  with open(outfilename, 'w', newline='') as file:
    csvwriter = csv.DictWriter(file, fieldnames=list(fieldnames))
    csvwriter.writeheader()
    for obj in allRows:
      csvwriter.writerow(obj)
      count += 1

  print("Wrote %d rows" % count)


if __name__ == '__main__':
  main(sys.argv)

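I tried many of the suggested solutions (pandas also did not normalize my JSON correctly), but the one that really parses JSON data correctly is Max Berman's.

I wrote an improvement that avoids creating new columns for every row: during parsing, each value is placed into an existing column. If a column holds only one value, it is stored as a scalar; if the column receives more values, they are stored as a list.

It takes an input.json file as input and writes an output.csv.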

import io
import json

import pandas as pd


def flatten_json(data):
    def process_value(keys, value, flattened):
        if isinstance(value, dict):
            for key in value.keys():
                process_value(keys + [key], value[key], flattened)
        elif isinstance(value, list):
            # List items reuse the parent's keys, so they share one column.
            for v in value:
                process_value(keys, v, flattened)
        else:
            key1 = '__'.join(keys)
            if flattened.get(key1) is not None:
                # Column already has data: grow (or create) a list of values.
                if isinstance(flattened[key1], list):
                    flattened[key1] = flattened[key1] + [value]
                else:
                    flattened[key1] = [flattened[key1]] + [value]
            else:
                flattened[key1] = value

    flattened = {}
    for key in data.keys():
        process_value([key], data[key], flattened)
    return flattened


# Let a missing input file raise instead of silently continuing.
with open("input.json", "r") as f:
    y = json.load(f)
flat = flatten_json(y)
text = json.dumps(flat)
# Wrap the text in StringIO; newer pandas versions deprecate passing a raw JSON string.
df = pd.read_json(io.StringIO(text))
df.to_csv('output.csv', index=False, encoding='utf-8')
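
The merging rule described above (a scalar for a single value, a list once a column receives more values) can be illustrated with a condensed sketch. This is a simplified re-implementation for illustration only, not the answer's exact code; the `flatten` name and the sample record are made up.

```python
def flatten(value, keys=(), out=None):
    """Flatten nested dicts/lists; list items share their parent's column."""
    if out is None:
        out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, keys + (key,), out)
    elif isinstance(value, list):
        for child in value:
            # No index is appended, so every item lands in the same column.
            flatten(child, keys, out)
    else:
        column = "__".join(keys)
        if column not in out:
            out[column] = value                 # first value: keep the scalar
        elif isinstance(out[column], list):
            out[column].append(value)           # already a list: extend it
        else:
            out[column] = [out[column], value]  # second value: promote to a list
    return out


record = {"id": 7, "tags": [{"name": "a"}, {"name": "b"}, {"name": "c"}]}
flat = flatten(record)
print(flat)  # {'id': 7, 'tags__name': ['a', 'b', 'c']}
```

The three `tags` entries collapse into a single `tags__name` column instead of producing `tags__0__name`, `tags__1__name`, and so on.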

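Since the data appears to be in a dictionary format, it seems you should actually use csv.DictWriter() to output the rows with the appropriate header information. This makes the conversion somewhat easier to handle. The fieldnames parameter then sets the column order properly, while writing the first row as headers allows the file to be read and processed later by csv.DictReader().

For example, Mike Repass used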

output = csv.writer(sys.stdout)

output.writerow(data[0].keys())  # header row

for row in data:
  output.writerow(row.values())

However, just change the initial setup to output = csv.DictWriter(filesetting, fieldnames=data[0].keys())

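Note that since the order of elements in a dictionary is not guaranteed (before Python 3.7), you might have to create the fieldnames entries explicitly. Once you do that, writerow will work when it is given each row dict directly, and the rest of the writing works as originally shown.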
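
A minimal sketch of the DictWriter approach on records shaped like the question's sample, assuming the nested `fields` object is first merged into the top level. The `flatten_record` helper, the `FIELDNAMES` list, and the in-memory `io.StringIO` target are illustrative choices, not part of the original answers.

```python
import csv
import io
import json

SAMPLE = """[
  {"pk": 22, "model": "auth.permission",
   "fields": {"codename": "add_logentry", "name": "Can add log entry", "content_type": 8}},
  {"pk": 23, "model": "auth.permission",
   "fields": {"codename": "change_logentry", "name": "Can change log entry", "content_type": 8}}
]"""

# Explicit column order, so the output does not depend on dict ordering.
FIELDNAMES = ["pk", "model", "codename", "name", "content_type"]


def flatten_record(item):
    """Merge the nested 'fields' dict into the top level (hypothetical helper)."""
    row = {"pk": item["pk"], "model": item["model"]}
    row.update(item["fields"])
    return row


out = io.StringIO(newline="")
writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
writer.writeheader()
for item in json.loads(SAMPLE):
    # DictWriter takes the row dict itself, not row.values().
    writer.writerow(flatten_record(item))

csv_text = out.getvalue()
print(csv_text)
```

To write to disk instead, replace the `StringIO` object with a file opened as `open('data.csv', 'w', newline='')`.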