How do I read specific columns from a CSV file with the csv module?

I'm trying to parse a CSV file and extract data from specific columns only.

Example CSV:

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

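I only want to capture specific columns, say ID, Name, Zip and Phone.

The code I've looked at led me to believe I can refer to a column by its number, i.e. Name would correspond to 2, and iterating over each row with row[2] would yield every item in column 2. That turns out not to be the case.

Here is what I have so far: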

import sys, argparse, csv
from settings import *

# command arguments
parser = argparse.ArgumentParser(description='csv to postgres',\
 fromfile_prefix_chars="@" )
parser.add_argument('file', help='csv file to import', action='store')
args = parser.parse_args()
csv_file = args.file

# open csv file
with open(csv_file, 'rb') as csvfile:

    # get number of columns
    for line in csvfile.readlines():
        array = line.split(',')
        first_item = array[0]

    num_columns = len(array)
    csvfile.seek(0)

    reader = csv.reader(csvfile, delimiter=' ')
        included_cols = [1, 2, 6, 7]

    for row in reader:
            content = list(row[i] for i in included_cols)
            print content

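I expected this to print only the columns I want for each row, but it doesn't: I only get the last column.

Current answer

If you need to process the columns separately, I like to destructure them with the zip(*iterable) pattern (effectively an "unzip"). For example: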

ids, names, zips, phones = zip(*(
  (row[1], row[2], row[6], row[7])
  for row in reader
))
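
A self-contained sketch of the same pattern, assuming a small in-memory CSV. The sample rows are made up for illustration, and the zero-based indices 0, 1, 5 and 6 are an assumption based on the header order shown in the question (ID, Name, Address, City, State, Zip, Phone):

```python
import csv
import io

# Hypothetical sample data; column order follows the question's header.
SAMPLE = """ID,Name,Address,City,State,Zip,Phone
10,Alpha College,130 W Main St,Montgomery,AL,36101,334-555-0100
11,Beta Institute,42 Oak Ave,Mobile,AL,36602,251-555-0101
"""

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row

# Build one tuple per row, then transpose the rows into columns with zip(*...).
ids, names, zips, phones = zip(*(
    (row[0], row[1], row[5], row[6])
    for row in reader
))

print(ids)
print(phones)
```

Each name ends up bound to a tuple holding one column. Note that the unpacking raises ValueError if the reader yields no data rows, so an empty file needs separate handling.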

2019-01-15 18:59:25

Other answers

import pandas as pd

dataset = pd.read_csv('Train.csv')
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values

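X is a group of columns; use this form if you want to read several columns. y is a single column; use this form to read just one. The indexer [:, 1:-1] means [row_index : to_row_index, column_index : to_column_index].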

2021-11-20 11:21:32

To read from a CSV file, you can import csv and use the following code:

import csv

with open('names.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['first_name'], row['last_name'])
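
Applied to the question, the same DictReader approach can pick out columns by name rather than by position. This is a minimal sketch: the sample data is invented, and the column names in `wanted` are assumed to match the header row exactly.

```python
import csv
import io

# Hypothetical sample data; column order follows the question's header.
SAMPLE = """ID,Name,Address,City,State,Zip,Phone
10,Alpha College,130 W Main St,Montgomery,AL,36101,334-555-0100
11,Beta Institute,42 Oak Ave,Mobile,AL,36602,251-555-0101
"""

# Column names to keep (assumed to match the header exactly).
wanted = ["ID", "Name", "Zip", "Phone"]

rows = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    # Each row is a dict keyed by header name, so column order in the
    # file no longer matters.
    rows.append([row[name] for name in wanted])

for r in rows:
    print(r)
```

Selecting by name avoids the off-by-one mistakes that positional indices invite, at the cost of raising KeyError if a header is misspelled.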

2022-11-13 18:24:39

import csv
from collections import defaultdict

columns = defaultdict(list) # each value in each column is appended to a list

with open('file.txt') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        for (k,v) in row.items(): # go over each column name and value 
            columns[k].append(v) # append the value into the appropriate list
                                 # based on column name k

print(columns['name'])
print(columns['phone'])
print(columns['street'])

With a file like this:

name,phone,street
Bob,0893,32 Silly
James,000,400 McHilly
Smithers,4442,23 Looped St.

it will output:

>>> 
['Bob', 'James', 'Smithers']
['0893', '000', '4442']
['32 Silly', '400 McHilly', '23 Looped St.']

Or, if you want numeric indices for the columns:

with open('file.txt') as f:
    reader = csv.reader(f)
    next(reader)
    for row in reader:
        for (i,v) in enumerate(row):
            columns[i].append(v)
print(columns[0])

>>> 
['Bob', 'James', 'Smithers']

To change the delimiter, add delimiter=" " to the appropriate instantiation, i.e. reader = csv.reader(f, delimiter=" ")
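
As an illustration, here is a sketch for pipe-separated data similar to the sample in the question. The data is invented and the file is assumed to use "|" with padding spaces around it. The csv module only accepts a single-character delimiter, so the surrounding whitespace is stripped from each cell afterwards.

```python
import csv
import io

# Hypothetical pipe-delimited sample with spaces around each separator.
SAMPLE = """ID | Name | Zip | Phone
10 | Alpha College | 36101 | 334-555-0100
11 | Beta Institute | 36602 | 251-555-0101
"""

reader = csv.reader(io.StringIO(SAMPLE), delimiter="|")

# Read the header row and strip the padding from each cell.
header = [cell.strip() for cell in next(reader)]

# Zero-based positions of the columns to keep: ID, Name, Phone.
included_cols = [0, 1, 3]
selected = [[row[i].strip() for i in included_cols] for row in reader]

print(header)
for row in selected:
    print(row)
```

Column positions are zero-based, so the first column is row[0], not row[1].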

2013-05-12 02:34:02

Because pandas DataFrames can be indexed and subset, a very simple way to extract a single column from a CSV file into a variable is:

myVar = pd.read_csv('YourPath', sep = ",")['ColumnName']

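A few things to consider:

The snippet above produces a pandas Series, not a DataFrame. If speed is a concern, ayhan's suggestion of using usecols will also be faster. Testing the two approaches with %timeit on a 2122 KB CSV file gave 22.8 ms for the usecols approach and 53 ms for the one I suggested.

And don't forget import pandas as pd.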
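
For reference, a minimal sketch of the usecols approach mentioned above, assuming pandas is installed. The sample data is invented, and dtype=str is an added assumption so that zip codes and phone numbers are kept as text rather than parsed as numbers.

```python
import io

import pandas as pd

# Hypothetical sample data; column order follows the question's header.
SAMPLE = """ID,Name,Address,City,State,Zip,Phone
10,Alpha College,130 W Main St,Montgomery,AL,36101,334-555-0100
11,Beta Institute,42 Oak Ave,Mobile,AL,36602,251-555-0101
"""

# usecols restricts parsing to the listed columns; the rest are skipped.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    usecols=["ID", "Name", "Zip", "Phone"],
    dtype=str,  # keep every value as a string
)

print(df)
```

The result is a DataFrame containing only the four requested columns, in the order they appear in the file.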

2018-12-10 08:33:55
