@Channelchan
2017-03-08T03:24:43.000000Z
np.cov(X,Y)
np.corrcoef(X,Y)
np.cov(X,Y)[0,1]/(np.std(X)*np.std(Y))
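The three expressions above can be compared on synthetic data. The sketch below is only an illustration: the arrays `X` and `Y`, the seed, and the sample size are made up, not taken from the article. It shows that the manual formula agrees with `np.corrcoef` only when covariance and standard deviation use the same `ddof`, because `np.cov` defaults to `ddof=1` while `np.std` defaults to `ddof=0`.

```python
import numpy as np

# Hypothetical synthetic data, used only to exercise the formulas
rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 250
X = rng.normal(size=n)
Y = 0.5 * X + rng.normal(size=n)

r = np.corrcoef(X, Y)[0, 1]      # Pearson correlation coefficient
cov_xy = np.cov(X, Y)[0, 1]      # sample covariance, ddof=1 by default

# np.std defaults to ddof=0, so this mixes two normalizations
r_mismatched = cov_xy / (np.std(X) * np.std(Y))

# Using ddof=1 for the standard deviations as well recovers np.corrcoef
r_matched = cov_xy / (np.std(X, ddof=1) * np.std(Y, ddof=1))

print("corrcoef:         ", r)
print("cov/std, ddof=1:  ", r_matched)
print("cov/std, default: ", r_mismatched)
```

The mismatched value is larger than the true coefficient by a factor of n/(n-1), which is small for a year of daily data but still visible in the printed digits.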
Computing the correlation of stocks 000001 and 000005 with the Shenzhen index
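import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tushare as ts

# Daily forward-adjusted (qfq) bars for 2016; keep only the close, indexed by date
asset1 = ts.get_k_data('000001', start='2016-01-01', end='2016-12-31', ktype='D', autype='qfq')
asset1.index = pd.to_datetime(asset1['date'], format='%Y-%m-%d')
asset1 = asset1['close']

asset2 = ts.get_k_data('000005', start='2016-01-01', end='2016-12-31', ktype='D', autype='qfq')
asset2.index = pd.to_datetime(asset2['date'], format='%Y-%m-%d')
asset2 = asset2['close']

# Note: 'sh' is tushare's code for the Shanghai Composite Index ('sz' is the Shenzhen Component Index).
# [::-1] reverses the rows into chronological order.
benchmark = ts.get_hist_data('sh', start='2016-01-01', end='2016-12-31', ktype='D')[::-1]
benchmark.index = pd.to_datetime(benchmark.index)  # match the assets' datetime index
benchmark = benchmark['close']

# Keep only the dates present in all three series
new = pd.concat([asset1, asset2, benchmark], join='inner', axis=1)
new.columns = ['asset1', 'asset2', 'benchmark']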
print("Correlation coefficients")
print("000001 and benchmark: ", np.corrcoef(new['asset1'], new['benchmark'])[0, 1])
print("000005 and benchmark: ", np.corrcoef(new['asset2'], new['benchmark'])[0, 1])
print("000001 and 000005: ", np.corrcoef(new['asset1'], new['asset2'])[0, 1])
# Manual formula: np.cov defaults to ddof=1 and np.std to ddof=0,
# so this value differs slightly from np.corrcoef
print("000001 and 000005: ", np.cov(new['asset1'], new['asset2'])[0, 1] / (np.std(new['asset1']) * np.std(new['asset2'])))
Correlation coefficients
000001 and benchmark: 0.904350480115
000005 and benchmark: 0.329516731028
# The results differ because of the degrees of freedom
000001 and 000005: 0.138377116304
000001 and 000005: 0.138946569458
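The gap between the last two values can be traced to the default normalizations: `np.cov` divides by n-1, while `np.std` divides by n. A short derivation, writing n for the number of observations and r for the Pearson coefficient returned by `np.corrcoef` (the subscripts on cov and sigma are notation introduced here to mark the divisor):

```latex
\frac{\operatorname{cov}_{n-1}(X,Y)}{\sigma_{n}(X)\,\sigma_{n}(Y)}
= \frac{\dfrac{1}{n-1}\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}
       {\sqrt{\dfrac{1}{n}\sum_{i}(x_i-\bar{x})^2}\;\sqrt{\dfrac{1}{n}\sum_{i}(y_i-\bar{y})^2}}
= \frac{n}{n-1}\,r
```

As a check against the printed output, 0.138946569458 / 0.138377116304 is approximately 1.004115, which matches 244/243. That is consistent with about 244 aligned trading days in the sample, a figure inferred from the ratio rather than stated in the original.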
Scatter plot of the highly correlated pair
plt.scatter(new['asset1'], new['benchmark'])
plt.show()
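Correlation changes over time, so the value computed today says little about the future. We therefore compute a rolling correlation coefficient over different window lengths and examine the distribution of those coefficients, so that we can make an interval estimate of future correlation.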
Rolling correlation with a 60-day window
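# 60-day rolling correlation between asset1 and the benchmark
rolling_correlation = new['asset1'].rolling(window=60).corr(new['benchmark'])

# Top panel: the rolling correlation over time
plt.subplot(2, 1, 1)
plt.plot(rolling_correlation)
plt.xlabel('Day')
plt.ylabel('60day Rolling Correlation')

# Bottom panel: distribution of the rolling correlation values
plt.subplot(2, 1, 2)
plt.hist(rolling_correlation.dropna())
plt.show()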
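One way to turn the distribution of rolling correlations into an interval estimate is to read off empirical percentiles. The sketch below is a minimal illustration under stated assumptions: the price series are synthetic random walks rather than tushare data, the names `asset` and `bench` are hypothetical, and the 5%/95% levels are an arbitrary choice. It describes the range the correlation occupied in the past, which is not a guarantee about the future.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic price series sharing a common driver
rng = np.random.default_rng(42)  # fixed seed so the run is repeatable
n_days = 500
common = rng.normal(size=n_days)
asset = pd.Series(100 + np.cumsum(common + 0.5 * rng.normal(size=n_days)))
bench = pd.Series(3000 + np.cumsum(10 * common + 5 * rng.normal(size=n_days)))

# 60-day rolling correlation; the first 59 windows are incomplete and dropped
window = 60
rolling_corr = asset.rolling(window=window).corr(bench).dropna()

# Empirical 90% interval from the historical distribution (levels are an assumption)
lower = rolling_corr.quantile(0.05)
upper = rolling_corr.quantile(0.95)

print("windows evaluated:", len(rolling_corr))
print("5th percentile:   ", lower)
print("95th percentile:  ", upper)
```

Repeating this for several values of `window` shows how sensitive the interval is to the chosen period, which is the comparison the paragraph above describes.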
Determining related Strategies
