I have a CSV file with about 2000 records.
Each record has a string and a category:
This is the first line,Line1
This is the second line,Line2
This is the third line,Line3
I need to read this file into a list that looks like this:
data = [('This is the first line', 'Line1'),
        ('This is the second line', 'Line2'),
        ('This is the third line', 'Line3')]
How can I import this CSV into the list I need using Python?
Updated for Python 3:
import csv
with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    your_list = list(reader)
print(your_list)
Output:
[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]
Update for Python 3:
import csv
from pprint import pprint
with open('text.csv', newline='') as file:
    reader = csv.reader(file)
    res = list(map(tuple, reader))
pprint(res)
Output:
[('This is the first line', ' Line1'),
 ('This is the second line', ' Line2'),
 ('This is the third line', ' Line3')]
If csvfile is a file object, it should be opened with newline=''.
(Source: the csv module documentation.)
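To illustrate why the documentation asks for newline='', here is a small sketch. The file demo.csv and the helper read_rows are made up for this example and are not part of the answers above. It writes one record whose quoted field contains an embedded "\r\n", then reads it back with and without the argument.

```python
import csv
import os
import tempfile


def read_rows(path, **open_kwargs):
    """Read every row of a CSV file, forwarding extra keyword arguments to open()."""
    with open(path, **open_kwargs) as f:
        return list(csv.reader(f))


# Hypothetical sample: one record whose quoted first field contains "\r\n".
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "wb") as f:
    f.write(b'"first\r\nsecond",Line1\r\n')

with_newline_arg = read_rows(demo_path, newline="")
without_newline_arg = read_rows(demo_path)

print(with_newline_arg)     # the embedded "\r\n" is kept as written
print(without_newline_arg)  # the embedded "\r\n" was translated to "\n"
```

For data like the question's, with no quoted multi-line fields, both calls would return the same rows, so the argument mostly matters as a safe default.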
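To extend your requirement a little: assuming you don't care about line order and want to group the lines by category, the following solution may work for you: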
>>> fname = "lines.txt"
>>> from collections import defaultdict
>>> dct = defaultdict(list)
>>> with open(fname) as f:
...     for line in f:
...         text, cat = line.rstrip("\n").split(",", 1)
...         dct[cat].append(text)
...
>>> dct
defaultdict(<type 'list'>, {' CatA': ['This is the first line', 'This is the another line'], ' CatC': ['This is the third line'], ' CatB': ['This is the second line', 'This is the last line']})
This way, all the lines belonging to a category are available in the dictionary under that category's key.
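The same grouping can also be sketched with the csv module, which has the side effect of removing the leading space from the category keys. The function name group_by_category and the sample lines are assumptions made for illustration only.

```python
import csv
from collections import defaultdict


def group_by_category(lines):
    """Group the text of each record under its category.

    `lines` is any iterable of CSV lines, for example an open file object.
    """
    groups = defaultdict(list)
    # skipinitialspace=True drops the space that follows each comma.
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        text, category = row
        groups[category].append(text)
    return dict(groups)


# Hypothetical sample data in the same shape as the session above.
sample = [
    "This is the first line, CatA",
    "This is the second line, CatB",
    "This is another line, CatA",
]
print(group_by_category(sample))
```

One difference from the split(",", 1) version: csv.reader splits on every unquoted comma, so a text field that itself contains commas would need to be quoted in the file.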
Unfortunately, I find none of the existing answers particularly satisfying.
Here is a simple and complete Python 3 solution using the csv module.
import csv
with open('../resources/temp_in.csv', newline='') as f:
    reader = csv.reader(f, skipinitialspace=True)
    rows = list(reader)
print(rows)
Note the skipinitialspace=True argument. It is necessary because, unfortunately, the OP's CSV contains a space after each comma.
Output:
[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]
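Since the question asks for a list of tuples rather than a list of lists, a possible variation is sketched below. The helper read_pairs and the throwaway file records.csv are hypothetical names used only for this demo.

```python
import csv
import os
import tempfile


def read_pairs(path):
    """Return the rows of a CSV file as a list of tuples."""
    with open(path, newline="") as f:
        reader = csv.reader(f, skipinitialspace=True)
        # `if row` drops completely blank lines, which csv.reader yields as [].
        return [tuple(row) for row in reader if row]


# Demo with a throwaway file that mimics the question's data.
demo_path = os.path.join(tempfile.mkdtemp(), "records.csv")
with open(demo_path, "w", newline="") as f:
    f.write("This is the first line, Line1\n")
    f.write("This is the second line, Line2\n")

data = read_pairs(demo_path)
print(data)
```

Tuples are a reasonable fit here because each record is a fixed (text, category) pair that is not meant to be modified after loading.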