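Is there a direct way to import the contents of a CSV file into a record array, the way R's read.table(), read.delim(), and read.csv() import data into R data frames?

Or should I use csv.reader() and then apply numpy.core.records.fromrecords()?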


Current answer

In [329]: %time my_data = genfromtxt('one.csv', delimiter=',')
CPU times: user 19.8 s, sys: 4.58 s, total: 24.4 s
Wall time: 24.4 s

In [330]: %time df = pd.read_csv("one.csv", skiprows=20)
CPU times: user 1.06 s, sys: 312 ms, total: 1.38 s
Wall time: 1.38 s

Other answers

I tried this:

from numpy import genfromtxt
genfromtxt(fname = dest_file, dtype = (<whatever options>))

versus:

import csv
import numpy as np
with open(dest_file,'r') as dest_f:
    data_iter = csv.reader(dest_f,
                           delimiter = delimiter,
                           quotechar = '"')
    data = [data for data in data_iter]
data_array = np.asarray(data, dtype = <whatever options>)

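I profiled both on 4.6 million rows with about 70 columns: the NumPy path took 2 min 16 s, while the csv list-comprehension method took 13 s.

I would recommend the csv list-comprehension method, since it most likely relies on precompiled libraries rather than on the interpreter as much as NumPy does. I suspect the pandas method has similar interpreter overhead.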
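
The comparison above can be reproduced on a small scale. This is only a rough sketch: the file is synthetic, its size (2000 rows by 10 columns) is an arbitrary assumption far below the 4.6 million rows mentioned, and timings will vary by machine and library version.

```python
import csv
import os
import tempfile
import time

import numpy as np

# Write a small synthetic numeric CSV (sizes are arbitrary assumptions).
rows, cols = 2000, 10
rng = np.random.default_rng(0)
expected = rng.random((rows, cols))
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
np.savetxt(path, expected, delimiter=",")

# Approach 1: let NumPy parse the file directly.
start = time.perf_counter()
via_genfromtxt = np.genfromtxt(path, delimiter=",")
t_genfromtxt = time.perf_counter() - start

# Approach 2: read rows with the csv module, then convert the strings to floats.
start = time.perf_counter()
with open(path, newline="") as f:
    via_csv = np.asarray(list(csv.reader(f)), dtype=float)
t_csv = time.perf_counter() - start

print(f"genfromtxt: {t_genfromtxt:.4f} s, csv.reader: {t_csv:.4f} s")
```

Both approaches should yield the same array, so the choice comes down to speed and how much parsing help (missing values, headers, mixed types) is needed.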

This works like a charm...

import csv
with open("data.csv", 'r') as f:
    data = list(csv.reader(f, delimiter=";"))

import numpy as np
# np.float was removed in NumPy 1.24; the built-in float is equivalent here
data = np.array(data, dtype=float)

This is a very simple task, and the best way to do it is as follows:

import pandas as pd
import numpy as np


# Read the file. The r prefix makes the path a raw string, so its backslashes
# are not treated as escape sequences. Remember to end the path with the file
# name, including the ".csv" extension.
df = pd.read_csv(r'C:\Users\Ron\Desktop\Clients.csv')

print(df)

y = np.array(df)
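
Since the question asks for a record array specifically, here is a minimal sketch of getting one from pandas. The inline CSV text and its column names (label, age, score) are made up for illustration and stand in for a real file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = "label,age,score\nalice,30,88.5\nbob,25,92.0\n"
df = pd.read_csv(io.StringIO(csv_text))

# to_records() returns a NumPy record array with one field per column;
# index=False leaves the DataFrame index out of the fields.
records = df.to_records(index=False)
print(records.dtype.names)
print(records.age)

# For purely numeric columns, to_numpy() gives a plain 2-D ndarray instead.
values = df[["age", "score"]].to_numpy()
print(values.shape)
```

A record array keeps a separate dtype per column, whereas np.array(df) on mixed-type data falls back to a single object dtype.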

Use numpy.genfromtxt(), setting the delimiter kwarg to a comma:

from numpy import genfromtxt
my_data = genfromtxt('my_file.csv', delimiter=',')
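
With only delimiter set, genfromtxt returns a plain float array and turns a header row or text columns into nan. A hedged sketch of getting closer to R's read.csv() behaviour follows; the inline CSV text and its column names are hypothetical stand-ins for a real file.

```python
import io

import numpy as np

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = "label,age,score\nalice,30,88.5\nbob,25,92.0\n"

# names=True takes field names from the first row, dtype=None infers a type
# per column, and encoding="utf-8" yields str fields rather than bytes.
data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    names=True,
    dtype=None,
    encoding="utf-8",
)

# View the structured array as a record array to get attribute access.
records = data.view(np.recarray)
print(records.dtype.names)
print(records.age)
```

Fields are also reachable by key, for example data["score"], without converting to a record array.
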