I have a JSON file that I want to convert to a CSV file. How can I do this with Python?

I tried:

import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    f.writerow(item)

f.close()

However, it did not work. I am using Django and the error I received is:

`'file' object has no attribute 'writerow'`

Then I tried the following:

import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    csv_file.writerow(item)  # ← changed

f.close()

Then I got the error:

`sequence expected`

Sample JSON file:

[{
        "pk": 22,
        "model": "auth.permission",
        "fields": {
            "codename": "add_logentry",
            "name": "Can add log entry",
            "content_type": 8
        }
    }, {
        "pk": 23,
        "model": "auth.permission",
        "fields": {
            "codename": "change_logentry",
            "name": "Can change log entry",
            "content_type": 8
        }
    }, {
        "pk": 24,
        "model": "auth.permission",
        "fields": {
            "codename": "delete_logentry",
            "name": "Can delete log entry",
            "content_type": 8
        }
    }, {
        "pk": 4,
        "model": "auth.permission",
        "fields": {
            "codename": "add_group",
            "name": "Can add group",
            "content_type": 2
        }
    }, {
        "pk": 10,
        "model": "auth.permission",
        "fields": {
            "codename": "add_message",
            "name": "Can add message",
            "content_type": 4
        }
    }
]

Current answer

import csv
import json

data = []
try:
    with open('kk.json', encoding='utf-8') as json_file:
        data = json.load(json_file)
except (OSError, json.JSONDecodeError) as e:
    print(e)

# 'a' appends to bar.csv, as in the original; newline='' avoids blank lines on Windows.
with open('bar.csv', 'a', newline='', encoding='utf-8') as csv_file:
    writer = csv.writer(csv_file)
    write_header = True
    for item in data:
        if write_header:
            # Use the keys of the first item as the header row.
            writer.writerow(item.keys())
            write_header = False
        # Python 3 strings are already text, so no .encode('utf-8') is needed.
        writer.writerow(item.values())
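
Note that the code above writes the nested `fields` dictionary into a single cell as its Python string representation. Here is a minimal sketch of one way around that for the sample data in the question, assuming every item has a `fields` dict and that the first item carries the full set of columns. The helper name `fixture_to_csv` is my own, and `io.StringIO` stands in for a real file, for illustration only:

```python
import csv
import io
import json

SAMPLE = """[
  {"pk": 22, "model": "auth.permission",
   "fields": {"codename": "add_logentry", "name": "Can add log entry", "content_type": 8}},
  {"pk": 4, "model": "auth.permission",
   "fields": {"codename": "add_group", "name": "Can add group", "content_type": 2}}
]"""


def fixture_to_csv(items, out_file):
    """Write fixture-style items to out_file, one column per nested field."""
    rows = []
    for item in items:
        # Copy the top-level keys, then lift the nested "fields" dict up one level.
        row = {key: value for key, value in item.items() if key != "fields"}
        row.update(item.get("fields", {}))
        rows.append(row)
    if not rows:
        return
    # Assumption: the first row has the full set of columns.
    writer = csv.DictWriter(out_file, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
fixture_to_csv(json.loads(SAMPLE), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With the default settings, `csv.DictWriter` raises `ValueError` if a later row contains a key that is not in `fieldnames`, so this only suits data whose items share the same shape.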

Other answers

I am assuming that your JSON file will decode into a list of dictionaries. First we need a function that flattens the JSON objects:

def flattenjson(b, delim):
    val = {}
    for i in b.keys():
        if isinstance(b[i], dict):
            get = flattenjson(b[i], delim)
            for j in get.keys():
                val[i + delim + j] = get[j]
        else:
            val[i] = b[i]
            
    return val

The result of running this snippet on your JSON object:

flattenjson({
    "pk": 22, 
    "model": "auth.permission", 
    "fields": {
      "codename": "add_message", 
      "name": "Can add message", 
      "content_type": 8
    }
  }, "__")

is

{
    "pk": 22, 
    "model": "auth.permission", 
    "fields__codename": "add_message", 
    "fields__name": "Can add message", 
    "fields__content_type": 8
}

After applying this function to each dict in the input array of JSON objects:

# Build a list, not a lazy map object: the rows are iterated twice below.
input = [flattenjson(x, "__") for x in input]

and finding the relevant column names:

columns = [x for row in input for x in row.keys()]
columns = list(set(columns))

it is not hard to run this through the csv module:

# Python 3: open in text mode with newline='' rather than 'wb'.
with open(fname, 'w', newline='') as out_file:
    csv_w = csv.writer(out_file)
    csv_w.writerow(columns)

    for i_r in input:
        csv_w.writerow([i_r.get(x, "") for x in columns])
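
One caveat with `list(set(columns))` is that a set does not keep the order in which the keys were seen, so the column order may differ from the order in the JSON. Here is a small sketch that keeps first-seen order instead, assuming the rows are already flattened. The names `ordered_columns` and `write_flat_rows` are hypothetical, and `io.StringIO` stands in for a real file:

```python
import csv
import io


def ordered_columns(rows):
    """Return every key that appears in rows, in first-seen order."""
    # dict.fromkeys de-duplicates while keeping insertion order (Python 3.7+).
    return list(dict.fromkeys(key for row in rows for key in row))


def write_flat_rows(rows, out_file):
    """Write flattened dicts as CSV, leaving missing values empty."""
    columns = ordered_columns(rows)
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row.get(column, "") for column in columns])


flat_rows = [
    {"pk": 22, "model": "auth.permission", "fields__codename": "add_logentry"},
    {"pk": 4, "model": "auth.permission", "fields__name": "Can add group"},
]
out = io.StringIO()
write_flat_rows(flat_rows, out)
result = out.getvalue()
print(result)
```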

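I have tried a lot of the suggested solutions (Pandas also did not normalize my JSON correctly), but the one that really parses the JSON data correctly is from Max Berman.

I wrote an improvement that avoids creating new columns for each row and instead places values into the existing column during parsing. If only one value exists for a column it is stored as a string; if the column has more values they are stored as a list.

It takes an input.json file as input and writes an output.csv.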

import json
import pandas as pd

def flatten_json(json):
    def process_value(keys, value, flattened):
        if isinstance(value, dict):
            for key in value.keys():
                process_value(keys + [key], value[key], flattened)
        elif isinstance(value, list):
            for idx, v in enumerate(value):
                process_value(keys, v, flattened)
                # process_value(keys + [str(idx)], v, flattened)
        else:
            key1 = '__'.join(keys)
            if not flattened.get(key1) is None:
                if isinstance(flattened[key1], list):
                    flattened[key1] = flattened[key1] + [value]
                else:
                    flattened[key1] = [flattened[key1]] + [value]
            else:
                flattened[key1] = value

    flattened = {}
    for key in json.keys():
        k = key
        # print("Key: " + k)
        process_value([key], json[key], flattened)
    return flattened

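from io import StringIO

# Let a missing input file raise instead of silently swallowing the error.
with open("input.json", "r", encoding="utf-8") as f:
    y = json.load(f)
flat = flatten_json(y)
text = json.dumps(flat)
# Recent pandas versions deprecate passing a literal JSON string, so wrap it in StringIO.
df = pd.read_json(StringIO(text))
df.to_csv('output.csv', index=False, encoding='utf-8')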
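
To make the merging rule concrete, here is a tiny standalone sketch of just that step: a column holds a plain value while it has been seen once and becomes a list as soon as a second value arrives. The helper name `add_value` is made up for this illustration and is not part of the code above. Unlike that code, it tests for key presence rather than for `None`, so a stored `None` is kept instead of being overwritten.

```python
def add_value(flattened, key, value):
    """Store value under key, switching to a list once the key repeats."""
    if key not in flattened:
        flattened[key] = value
    elif isinstance(flattened[key], list):
        flattened[key].append(value)
    else:
        flattened[key] = [flattened[key], value]


flattened = {}
add_value(flattened, "user__name", "alice")
add_value(flattened, "tags", "a")
add_value(flattened, "tags", "b")
add_value(flattened, "tags", "c")
print(flattened)
```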

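I might be late to the party, but I think I have dealt with a similar problem. I had a JSON file that looked like this

I only wanted to extract a few keys/values from these JSON files, so I wrote the following code to do that.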

"""json_to_csv.py
This script reads any number of json files present in a folder, extracts certain data from each file and writes it to a csv file.
The folder contains the python script i.e. json_to_csv.py, output.csv and another folder descriptions containing all the json files.
"""

import os
import json
import csv


def get_list_of_json_files():
    """Returns the list of filenames of all the Json files present in the folder
    Parameter
    ---------
    directory : str
        'descriptions' in this case
    Returns
    -------
    list_of_files: list
        List of the filenames of all the json files
    """

    list_of_files = os.listdir('descriptions')  # creates list of all the files in the folder

    return list_of_files


def create_list_from_json(jsonfile):
    """Returns a list of the extracted items from json file in the same order we need it.
    Parameter
    _________
    jsonfile : json
        The json file containing the data
    Returns
    -------
    one_sample_list : list
        The list of the extracted items needed for the final csv
    """

    with open(jsonfile) as f:
        data = json.load(f)

    data_list = []  # create an empty list

    # append the items to the list in the same order.
    data_list.append(data['_id'])
    data_list.append(data['_modelType'])
    data_list.append(data['creator']['_id'])
    data_list.append(data['creator']['name'])
    data_list.append(data['dataset']['_accessLevel'])
    data_list.append(data['dataset']['_id'])
    data_list.append(data['dataset']['description'])
    data_list.append(data['dataset']['name'])
    data_list.append(data['meta']['acquisition']['image_type'])
    data_list.append(data['meta']['acquisition']['pixelsX'])
    data_list.append(data['meta']['acquisition']['pixelsY'])
    data_list.append(data['meta']['clinical']['age_approx'])
    data_list.append(data['meta']['clinical']['benign_malignant'])
    data_list.append(data['meta']['clinical']['diagnosis'])
    data_list.append(data['meta']['clinical']['diagnosis_confirm_type'])
    data_list.append(data['meta']['clinical']['melanocytic'])
    data_list.append(data['meta']['clinical']['sex'])
    data_list.append(data['meta']['unstructured']['diagnosis'])
    # In few json files, the race was not there so using KeyError exception to add '' at the place
    try:
        data_list.append(data['meta']['unstructured']['race'])
    except KeyError:
        data_list.append("")  # will add an empty string in case race is not there.
    data_list.append(data['name'])

    return data_list


def write_csv():
    """Creates the desired csv file
    Parameters
    __________
    list_of_files : file
        The list created by get_list_of_json_files() method
    result.csv : csv
        The csv file containing the header only
    Returns
    _______
    result.csv : csv
        The desired csv file
    """

    list_of_files = get_list_of_json_files()
    # Open the output once in append mode; newline='' avoids blank lines on Windows.
    with open('output.csv', 'a', newline='') as c:
        writer = csv.writer(c)
        for file in list_of_files:
            # Build one csv row from each json file.
            row = create_list_from_json(f'descriptions/{file}')
            writer.writerow(row)


if __name__ == '__main__':
    write_csv()

I hope this helps. For details on how this code works, you can check here
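
If several keys can be missing, not only `race`, wrapping every lookup in `try`/`except` gets repetitive. One possible sketch is a small helper that walks a path of keys and falls back to a default. `get_nested` and the sample record are hypothetical and only mirror the key names used above:

```python
def get_nested(data, path, default=""):
    """Follow the keys in path through nested dicts; return default if any is missing."""
    current = data
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Made-up record with the same key layout as the script above.
record = {
    "_id": "abc123",
    "meta": {"clinical": {"sex": "female"}, "unstructured": {"diagnosis": "nevus"}},
}
row = [
    get_nested(record, ["_id"]),
    get_nested(record, ["meta", "clinical", "sex"]),
    get_nested(record, ["meta", "unstructured", "race"]),  # key is absent, default is used
]
print(row)
```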

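JSON can represent a wide variety of data structures: a JS "object" is roughly like a Python dict (with string keys), a JS "array" is roughly like a Python list, and you can nest them as long as the final "leaf" elements are numbers or strings.

CSV can essentially represent only a 2-D table, optionally with a first row of "headers", i.e. "column names", which can make the table interpretable as a list of dicts instead of the normal interpretation, a list of lists (again, "leaf" elements can be numbers or strings).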

So, in the general case, you can't translate an arbitrary JSON structure to a CSV. In a few special cases you can (array of arrays with no further nesting; arrays of objects which all have exactly the same keys). Which special case, if any, applies to your problem? The details of the solution depend on which special case you do have. Given the astonishing fact that you don't even mention which one applies, I suspect you may not have considered the constraint, neither usable case in fact applies, and your problem is impossible to solve. But please do clarify!
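
The two special cases mentioned above can be checked for mechanically. Here is a rough sketch, assuming the JSON has already been parsed into Python objects. `to_table` and `dump_csv` are hypothetical helpers: the first returns a header and rows, or raises `ValueError` when neither case applies.

```python
import csv
import io

SCALARS = (str, int, float, bool, type(None))


def to_table(data):
    """Return (header, rows) for the two CSV-friendly JSON shapes."""
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty JSON array")
    # Case 1: array of arrays with no further nesting.
    if all(isinstance(r, list) and all(isinstance(v, SCALARS) for v in r) for r in data):
        return None, data
    # Case 2: array of objects that all share exactly the same keys.
    if all(isinstance(r, dict) for r in data):
        header = list(data[0])
        same_keys = all(set(r) == set(header) for r in data)
        flat = all(isinstance(v, SCALARS) for r in data for v in r.values())
        if same_keys and flat:
            return header, [[r[k] for k in header] for r in data]
    raise ValueError("structure does not map onto a 2-D table")


def dump_csv(data):
    """Render data as CSV text, with a header row only for the object case."""
    header, rows = to_table(data)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


print(dump_csv([[1, "a"], [2, "b"]]))
print(dump_csv([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]))
```

The sample in the question fits neither case as it stands, because each object contains a nested `fields` object, so `to_table` would raise `ValueError` until the data is flattened.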

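Using json_normalize from pandas:

Using the sample data from the OP in a file named test.json. encoding='utf-8' has been used here, but may not be necessary for other cases. The following code takes advantage of the pathlib library. .open is a method of pathlib. It works with non-Windows paths too. Use pandas.to_csv(...) to save the data to a csv file.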

import pandas as pd
# As of pandas 1.0.0, pandas.io.json.json_normalize is deprecated; json_normalize is now exposed in the top-level namespace.
# from pandas.io.json import json_normalize
from pathlib import Path
import json

# set path to file
p = Path(r'c:\some_path_to_file\test.json')

# read json
with p.open('r', encoding='utf-8') as f:
    data = json.loads(f.read())

# create dataframe
df = pd.json_normalize(data)

# dataframe view
#  pk            model  fields.codename           fields.name  fields.content_type
#  22  auth.permission     add_logentry     Can add log entry                    8
#  23  auth.permission  change_logentry  Can change log entry                    8
#  24  auth.permission  delete_logentry  Can delete log entry                    8
#   4  auth.permission        add_group         Can add group                    2
#  10  auth.permission      add_message       Can add message                    4

# save to csv
df.to_csv('test.csv', index=False, encoding='utf-8')

CSV output:

pk,model,fields.codename,fields.name,fields.content_type
22,auth.permission,add_logentry,Can add log entry,8
23,auth.permission,change_logentry,Can change log entry,8
24,auth.permission,delete_logentry,Can delete log entry,8
4,auth.permission,add_group,Can add group,2
10,auth.permission,add_message,Can add message,4

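Resources for more heavily nested JSON objects:

SO answers: Flatten a JSON array with Python; How to flatten a nested JSON recursively, with flatten_json; How to json_normalize a column with NaNs; Split / explode a column of dictionaries into separate columns with pandas. See the json_normalize tag for other related questions.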