我尝试着创造一个股票市场模拟器(也许最终会发展成一个预测AI),但是我在寻找数据方面遇到了困难。我正在寻找(希望是免费的)历史股票市场数据的来源。
理想情况下,它将是一个非常细粒度(秒或分钟间隔)的数据集,包含纳斯达克和纽约证券交易所(如果我有冒险精神,可能还包括其他)的每个符号的价格和交易量。有人知道这类信息的来源吗?
我发现这个问题表明雅虎提供CSV格式的历史数据,但我一直无法找到如何在粗略的检查网站链接得到它。
我也不喜欢在CSV文件中逐个下载数据的想法……我想雅虎会很生气,在我收到几千个请求后就把我关了。
我还发现了另一个问题,让我觉得我中了大奖,但不幸的是,OpenTick网站似乎已经关闭了它的大门……太糟糕了,因为我觉得这正是我想要的。
我还可以使用每天每个符号的开盘/收盘价格和成交量的数据,但我更喜欢所有的数据,如果我能得到的话。还有其他建议吗?
这个答案不再准确,因为雅虎feed已经不复存在
使用雅虎的CSV方法,你也可以获得历史数据!
您可以逆向工程如下示例:
http://ichart.finance.yahoo.com/table.csv?s=YHOO&d=0&e=28&f=2010&g=d&a=3&b=12&c=1996&ignore=.csv
从本质上讲:
sn = TICKER
a = fromMonth-1
b = fromDay (two digits)
c = fromYear
d = toMonth-1
e = toDay (two digits)
f = toYear
g = d for day, m for month, y for yearly
参数的完整列表:
a Ask
a2 Average Daily Volume
a5 Ask Size
b Bid
b2 Ask (Real-time)
b3 Bid (Real-time)
b4 Book Value
b6 Bid Size
c Change & Percent Change
c1 Change
c3 Commission
c6 Change (Real-time)
c8 After Hours Change (Real-time)
d Dividend/Share
d1 Last Trade Date
d2 Trade Date
e Earnings/Share
e1 Error Indication (returned for symbol changed / invalid)
e7 EPS Estimate Current Year
e8 EPS Estimate Next Year
e9 EPS Estimate Next Quarter
f6 Float Shares
g Day's Low
h Day's High
j 52-week Low
k 52-week High
g1 Holdings Gain Percent
g3 Annualized Gain
g4 Holdings Gain
g5 Holdings Gain Percent (Real-time)
g6 Holdings Gain (Real-time)
i More Info
i5 Order Book (Real-time)
j1 Market Capitalization
j3 Market Cap (Real-time)
j4 EBITDA
j5 Change From 52-week Low
j6 Percent Change From 52-week Low
k1 Last Trade (Real-time) With Time
k2 Change Percent (Real-time)
k3 Last Trade Size
k4 Change From 52-week High
k5 Percent Change From 52-week High
l Last Trade (With Time)
l1 Last Trade (Price Only)
l2 High Limit
l3 Low Limit
m Day's Range
m2 Day's Range (Real-time)
m3 50-day Moving Average
m4 200-day Moving Average
m5 Change From 200-day Moving Average
m6 Percent Change From 200-day Moving Average
m7 Change From 50-day Moving Average
m8 Percent Change From 50-day Moving Average
n Name
n4 Notes
o Open
p Previous Close
p1 Price Paid
p2 Change in Percent
p5 Price/Sales
p6 Price/Book
q Ex-Dividend Date
r P/E Ratio
r1 Dividend Pay Date
r2 P/E Ratio (Real-time)
r5 PEG Ratio
r6 Price/EPS Estimate Current Year
r7 Price/EPS Estimate Next Year
s Symbol
s1 Shares Owned
s7 Short Ratio
t1 Last Trade Time
t6 Trade Links
t7 Ticker Trend
t8 1 yr Target Price
v Volume
v1 Holdings Value
v7 Holdings Value (Real-time)
w 52-week Range
w1 Day's Value Change
w4 Day's Value Change (Real-time)
x Stock Exchange
y Dividend Yield
雅虎是获得初步免费数据的最简单选择。eckesicle回答中描述的链接可以很容易地在python代码中使用,但首先需要所有的标记。在这个例子中,我将使用纽约证券交易所,但这也可以用于不同的交易所。
我使用这个维基页面下载了以下脚本(我不是一个很有天赋的python主义者,如果这段代码不是很有效的话,很抱歉):
import string
import urllib2
from bs4 import BeautifulSoup
global f
def download_page(url):
aurl = urllib2.urlopen(url)
soup = BeautifulSoup(aurl.read())
print url
for row in soup('table')[1]('tr'):
tds = row('td')
if (len(tds) > 0):
f.write(tds[1].string + '\n')
f = open('stock_names.txt', 'w')
url_part1 = 'http://en.wikipedia.org/wiki/Companies_listed_on_the_New_York_Stock_Exchange_'
url = url_part1 + '(0-9)'
download_page(url)
for letter in string.uppercase[:26]:
url_part2 = letter
url = url_part1 + '(' + letter + ')'
download_page(url)
f.close()
为了下载每个股票,我使用了另一个非常类似的脚本:
import string
import urllib2
from bs4 import BeautifulSoup
global f
url_part1 = 'http://ichart.finance.yahoo.com/table.csv?s='
url_part2 = '&d=0&e=28&f=2010&g=d&a=3&b=12&c=1996&ignore=.csv'
print "Starting"
f = open('stock_names.txt', 'r')
file_content = f.readlines()
count = 1;
print "About %d tickers will be downloaded" % len(file_content)
for ticker in file_content:
ticker = ticker.strip()
url = url_part1 + ticker + url_part2
try:
# This will cause exception on a 404
response = urllib2.urlopen(url)
print "Downloading ticker %s (%d out of %d)" % (ticker, count, len(file_content))
count = count + 1
history_file = open('C:\\Users\\Nitay\\Desktop\\Historical Data\\' + ticker + '.csv', 'w')
history_file.write(response.read())
history_file.close()
except Exception, e:
pass
f.close()
注意,这种方法的主要缺点是不同的公司可以获得不同的数据——在请求日期(新列出的)没有数据的公司将会得到404页面。
还要记住,这种方法只适用于初步数据——如果你真的想测试你的算法,你应该花点钱,并使用CSIData或其他值得信赖的数据供应商