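Is there a direct way to import the contents of a CSV file into a record array, just as R's read.table(), read.delim(), and read.csv() import data into R data frames?

Or should I use csv.reader() and then apply numpy.core.records.fromrecords()?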
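
For reference, the csv.reader() route might look roughly like the sketch below. The column names and sample data are made up for illustration, and np.rec.fromrecords is used here as the public alias of the records module. Because csv.reader() yields every field as a string, each column has to be converted by hand before building the record array.

```python
import csv
import io

import numpy as np

# Hypothetical sample data standing in for a real CSV file
csv_text = "city,population,area\nOslo,700000,454.0\nBergen,285000,465.0\n"

with io.StringIO(csv_text) as f:
    reader = csv.reader(f)
    header = next(reader)  # first row holds the column names
    # csv.reader returns strings only, so convert each column explicitly
    rows = [(city, int(pop), float(area)) for city, pop, area in reader]

# Build a record array; fields are accessible as attributes
rec = np.rec.fromrecords(rows, names=header)
print(rec.population)
```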


Current answer

This is a very simple task, and the best way to do it is as follows:

import pandas as pd
import numpy as np

# Read the file. The r prefix makes this a raw string, so the backslashes in
# the Windows path are not treated as escape sequences. The path must end with
# the file name, including the ".csv" extension.
df = pd.read_csv(r'C:\Users\Ron\Desktop\Clients.csv')

print(df)

# Convert the DataFrame to a NumPy array
y = np.array(df)
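
Note that np.array(df) returns a plain 2-D array, not a record array. If named fields are what you want, as in the question, DataFrame.to_records() is one option. A minimal sketch, using made-up in-memory data instead of the Clients.csv file above:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for a real CSV file
csv_text = "city,population,area\nOslo,700000,454.0\nBergen,285000,465.0\n"
df = pd.read_csv(io.StringIO(csv_text))

# Plain 2-D array: mixed column types collapse to a single object dtype
plain = df.to_numpy()

# Record array: one named field per column, each with its own dtype
records = df.to_records(index=False)
print(records.population)
```

Passing index=False leaves the DataFrame index out of the result; by default it is included as an extra field.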

Other answers

When I tried both NumPy and pandas, pandas had several advantages:

- Faster
- Lower CPU usage
- About 1/3 of the RAM used by NumPy genfromtxt

Here is my test code:

$ for f in test_pandas.py test_numpy_csv.py ; do  /usr/bin/time python $f; done
2.94user 0.41system 0:03.05elapsed 109%CPU (0avgtext+0avgdata 502068maxresident)k
0inputs+24outputs (0major+107147minor)pagefaults 0swaps

23.29user 0.72system 0:23.72elapsed 101%CPU (0avgtext+0avgdata 1680888maxresident)k
0inputs+0outputs (0major+416145minor)pagefaults 0swaps

test_numpy_csv.py

from numpy import genfromtxt
train = genfromtxt('/home/hvn/me/notebook/train.csv', delimiter=',')

test_pandas.py

from pandas import read_csv
df = read_csv('/home/hvn/me/notebook/train.csv')

Data file:

du -h ~/me/notebook/train.csv
 59M    /home/hvn/me/notebook/train.csv

NumPy and pandas versions:

$ pip freeze | egrep -i 'pandas|numpy'
numpy==1.13.3
pandas==0.20.2

Use numpy.genfromtxt(), setting the delimiter kwarg to a comma:

from numpy import genfromtxt
my_data = genfromtxt('my_file.csv', delimiter=',')
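
One thing to watch for with this call: with the default float dtype, anything that cannot be parsed as a number, including a header row, comes back as nan. A small sketch with made-up in-memory data shows the difference that skip_header makes:

```python
import io

import numpy as np

# Hypothetical sample data with a header row
csv_text = "x,y\n1,2.5\n3,4.5\n"

# Default call: the header row is parsed as numbers and becomes nan
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

# Skipping the header row gives a clean 2-D float array
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)
print(data)
```
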
In [329]: %time my_data = genfromtxt('one.csv', delimiter=',')
CPU times: user 19.8 s, sys: 4.58 s, total: 24.4 s
Wall time: 24.4 s

In [330]: %time df = pd.read_csv("one.csv", skiprows=20)
CPU times: user 1.06 s, sys: 312 ms, total: 1.38 s
Wall time: 1.38 s

I suggest using the tables package (PyTables, pip3 install tables). You can use pandas to save the .csv file as .h5:

import pandas as pd
data = pd.read_csv("dataset.csv")
store = pd.HDFStore('dataset.h5')
store['mydata'] = data
store.close()

Then you can load the data into a NumPy array easily and in less time, even for large amounts of data.

import pandas as pd
store = pd.HDFStore('dataset.h5')
data = store['mydata']
store.close()

# Data in NumPy format
data = data.values

You can also try recfromcsv(), which can guess the data types and return a properly formatted record array.
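
As far as I know, recent NumPy releases (2.0 and later) no longer expose recfromcsv() in the top-level namespace, so the sketch below gets a similar result with genfromtxt() and a recarray view. The sample data and column names are made up for illustration.

```python
import io

import numpy as np

# Hypothetical sample data standing in for a real CSV file
csv_text = "city,population,area\nOslo,700000,454.0\nBergen,285000,465.0\n"

rec = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=',',
    names=True,        # take field names from the header row
    dtype=None,        # guess a dtype for each column
    encoding='utf-8',  # decode text columns as str rather than bytes
).view(np.recarray)    # enable attribute access such as rec.population

print(rec.population)
print(rec.city)
```

Unlike recfromcsv(), this does not lowercase the field names, so the header is used as written.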