Factor Analysis - ChannelCMT/OFO GitHub Wiki
- What is SignalDigger?
- SignalDigger vs alphalens
- Data preparation
- How to test and analyze stock-selection performance with SignalDigger
- Visualizing stock-selection performance
SignalDigger is a third-party Python library dedicated to analyzing the performance of stock-selection alpha (α) factors.
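It integrates and simplifies the functionality of alphalens, and adds detail-level optimizations for the trading rules of the A-share market (such as daily price limits), which makes it easy for beginners to pick up and use quickly.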
Installation: pip install git+https://github.com/xingetouzi/JAQS.git@fxdayu
GitHub: https://github.com/xingetouzi/JAQS/tree/fxdayu
Official website: https://www.quantos.org/ (you can register your own data account there)
Historical data download:
Factor data link: https://pan.baidu.com/s/1QHFTn4ya1Z2ph8VFeokP7Q
Extraction code: zlxr
If you get the error "cannot set WRITEABLE flag to True of this array", downgrade numpy to version 1.15.4:
Windows: download and install the numpy 1.15.4 build from https://www.lfd.uci.edu/~gohlke/pythonlibs/
mac: pip install numpy==1.15.4
The following example uses the CSI 300 constituent stocks to process a stock-selection factor (signal_data).
from jaqs_fxdayu.data import DataView  # a lightweight pandas-based "database" for convenient data access and processing
from jaqs_fxdayu.data import RemoteDataService  # data service used to download data
import os
import warnings
warnings.filterwarnings("ignore")
dataview_folder = '../Factor'  # folder holding the downloaded Factor data
if not os.path.isdir(dataview_folder):
    os.makedirs(dataview_folder)
# Load the data saved in dataview_folder
dv = DataView()
dv.load_dataview(dataview_folder)
Dataview loaded successfully.
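- Using the pb (price-to-book ratio) indicator as an example, test how the level of pb relates to the price moves of CSI 300 constituent stocks
- Steps: 1. define the filter condition (exclude stocks that are not index constituents); 2. determine whether a stock can be bought or sold (taking suspensions and price limits into account); 3. process the factor; 4. analyze the factor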
print(dv.get_ts("pb").head())
symbol 000001.SZ 000002.SZ 000008.SZ 000009.SZ 000012.SZ 000024.SZ \
trade_date
20140102 1.0563 1.2891 4.8981 3.5794 2.3725 1.3202
20140103 1.0304 1.2649 4.8709 3.4842 2.3346 1.2977
20140106 1.0079 1.2068 4.6314 3.4537 2.2036 1.2283
20140107 1.0044 1.1987 4.5661 3.4461 2.1920 1.2013
20140108 1.0157 1.1971 4.4790 3.3852 2.1862 1.1685
symbol 000027.SZ 000039.SZ 000046.SZ 000059.SZ ... 601998.SH \
trade_date ...
20140102 0.9077 2.0483 2.4159 0.8806 ... 0.8216
20140103 0.8861 2.0801 2.3726 0.8488 ... 0.8088
20140106 0.8662 2.0113 2.3348 0.8081 ... 0.7960
20140107 0.8629 2.0721 2.2970 0.7940 ... 0.7939
20140108 0.8728 2.0629 2.3294 0.7904 ... 0.7960
symbol 603000.SH 603160.SH 603288.SH 603699.SH 603799.SH 603833.SH \
trade_date
20140102 10.0487 NaN NaN NaN NaN NaN
20140103 9.8886 NaN NaN NaN NaN NaN
20140106 9.8515 NaN NaN NaN NaN NaN
20140107 10.1024 NaN NaN NaN NaN NaN
20140108 10.3713 NaN NaN NaN NaN NaN
symbol 603858.SH 603885.SH 603993.SH
trade_date
20140102 NaN NaN 2.7133
20140103 NaN NaN 2.6706
20140106 NaN NaN 2.5682
20140107 NaN NaN 2.5682
20140108 NaN NaN 2.5298
[5 rows x 488 columns]
import numpy as np
# Signal filter: True where the stock is NOT an index constituent (these cells are excluded)
def mask_index_member():
    df_index_member = dv.get_ts('index_member')
    mask_non_member = df_index_member == 0
    return mask_non_member
# Tradability conditions: not suspended and not at the daily price limit
def limit_up_down():
    trade_status = dv.get_ts('trade_status')
    mask_sus = trade_status == 0  # suspended
    # Limit-up: one-day return above +9.5% (an approximation; ST stocks have a ±5% limit)
    dv.add_formula('up_limit', '(close - Delay(close, 1)) / Delay(close, 1) > 0.095', is_quarterly=False, add_data=True)
    # Limit-down: one-day return below -9.5%
    dv.add_formula('down_limit', '(close - Delay(close, 1)) / Delay(close, 1) < -0.095', is_quarterly=False, add_data=True)
    can_enter = np.logical_and(dv.get_ts('up_limit') < 1, ~mask_sus)  # not limit-up and not suspended
    can_exit = np.logical_and(dv.get_ts('down_limit') < 1, ~mask_sus)  # not limit-down and not suspended
    return can_enter, can_exit
mask = mask_index_member()
can_enter, can_exit = limit_up_down()
print(mask.head())
print(can_enter.head())
symbol 000001.SZ 000002.SZ 000008.SZ 000009.SZ 000012.SZ 000024.SZ \
trade_date
20140102 False False True False False False
20140103 False False True False False False
20140106 False False True False False False
20140107 False False True False False False
20140108 False False True False False False
symbol 000027.SZ 000039.SZ 000046.SZ 000059.SZ ... 601998.SH \
trade_date ...
20140102 True False False True ... False
20140103 True False False True ... False
20140106 True False False True ... False
20140107 True False False True ... False
20140108 True False False True ... False
symbol 603000.SH 603160.SH 603288.SH 603699.SH 603799.SH 603833.SH \
trade_date
20140102 False True True True True True
20140103 False True True True True True
20140106 False True True True True True
20140107 False True True True True True
20140108 False True True True True True
symbol 603858.SH 603885.SH 603993.SH
trade_date
20140102 True True False
20140103 True True False
20140106 True True False
20140107 True True False
20140108 True True False
[5 rows x 488 columns]
symbol 000001.SZ 000002.SZ 000008.SZ 000009.SZ 000012.SZ 000024.SZ \
trade_date
20140102 True True True True True True
20140103 True True True True True True
20140106 True True True True True True
20140107 True True True True True True
20140108 True True True True True True
symbol 000027.SZ 000039.SZ 000046.SZ 000059.SZ ... 601998.SH \
trade_date ...
20140102 True True True True ... True
20140103 True True True True ... True
20140106 True True True True ... True
20140107 True True True True ... True
20140108 True True True True ... True
symbol 603000.SH 603160.SH 603288.SH 603699.SH 603799.SH 603833.SH \
trade_date
20140102 True False False False False False
20140103 True False False False False False
20140106 True False False False False False
20140107 True False False False False False
20140108 True False False False False False
symbol 603858.SH 603885.SH 603993.SH
trade_date
20140102 False False True
20140103 False False True
20140106 False False True
20140107 False False True
20140108 False False True
[5 rows x 488 columns]
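The up_limit/down_limit formulas above flag a limit move when the one-day return is beyond ±9.5%, and trade_status == 0 is treated as a suspension. A minimal sketch of the same logic in plain pandas, outside DataView, is shown below: `shift(1)` plays the role of `Delay(close, 1)`, while the function `tradable_masks` and the small price table are hypothetical. The fixed 9.5% threshold is an approximation and does not cover stocks with a different limit, such as ST stocks (±5%).

```python
import pandas as pd


def tradable_masks(close, trade_status, threshold=0.095):
    """Return (can_enter, can_exit) as boolean date x symbol DataFrames.

    A day counts as limit-up (limit-down) when the one-day return is above
    +threshold (below -threshold); trade_status == 0 marks a suspended stock.
    """
    prev_close = close.shift(1)                 # same role as Delay(close, 1)
    daily_ret = (close - prev_close) / prev_close
    up_limit = daily_ret > threshold            # NaN on the first row compares as False
    down_limit = daily_ret < -threshold
    suspended = trade_status == 0
    can_enter = ~up_limit & ~suspended          # not limit-up and not suspended
    can_exit = ~down_limit & ~suspended         # not limit-down and not suspended
    return can_enter, can_exit


dates = [20140102, 20140103, 20140106]
# Hypothetical prices: AAA rises 10% on day 2, BBB falls 10.5% on day 2
close = pd.DataFrame({"AAA": [10.0, 11.0, 11.0],
                      "BBB": [20.0, 17.9, 18.0]}, index=dates)
# Hypothetical status: AAA is suspended on day 3
trade_status = pd.DataFrame({"AAA": [1, 1, 0],
                             "BBB": [1, 1, 1]}, index=dates)

can_enter, can_exit = tradable_masks(close, trade_status)
print(can_enter)
print(can_exit)
```

A limit-up day blocks only entries and a limit-down day blocks only exits, while a suspension blocks both, which is why the two masks are built separately.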
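from jaqs_fxdayu.research import SignalDigger
obj = SignalDigger(output_folder='./output',
                   output_format='pdf')
# Process the factor: compute the holding-period return of every stock in the target universe
# and the quantile group of its factor value
obj.process_signal_before_analysis(signal=dv.get_ts("pb"),
                                   price=dv.get_ts("close_adj"),
                                   high=dv.get_ts("high_adj"),  # optional
                                   low=dv.get_ts("low_adj"),  # optional
                                   group=dv.get_ts("sw1"),  # optional: industry classification
                                   n_quantiles=5,  # number of quantile groups
                                   mask=mask,  # filter condition (True = excluded)
                                   can_enter=can_enter,  # whether a position can be opened
                                   can_exit=can_exit,  # whether a position can be closed
                                   period=15,  # holding period
                                   # benchmark price; optional. If omitted, the holding-period return is an absolute return
                                   benchmark_price=dv.data_benchmark,
                                   commission=0.0008,  # commission rate
                                   )
signal_data = obj.signal_data
signal_data.head()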
Nan Data Count (should be zero) : 0; Percentage of effective data: 57%
| trade_date | symbol | signal | return | upside_ret | downside_ret | quantile | group |
|---|---|---|---|---|---|---|---|
| 20140103 | 000001.SZ | 1.0563 | -0.003744 | 0.005068 | -0.057799 | 5 | 银行 (Banking) |
| 20140103 | 000002.SZ | 1.2891 | 0.012511 | 0.010680 | -0.102841 | 2 | 房地产 (Real Estate) |
| 20140103 | 000009.SZ | 3.5794 | 0.029817 | 0.025430 | -0.069652 | 4 | 综合 (Conglomerates) |
| 20140103 | 000012.SZ | 2.3725 | 0.021382 | 0.014163 | -0.116760 | 4 | 建筑材料 (Building Materials) |
| 20140103 | 000024.SZ | 1.3202 | -0.031632 | -0.002781 | -0.161771 | 3 | 房地产 (Real Estate) |
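Each row of signal_data pairs a stock's factor value on a date with its return over the following holding period and the quantile that value falls into on that date. The sketch below illustrates this reshaping in plain pandas under simplifying assumptions: the return runs from the close on the signal date to the close `period` rows later, and quantiles are equal-count bins of the signal per date. It is not SignalDigger's implementation, which also applies mask/can_enter/can_exit, the benchmark and the commission, and may use a different entry convention (in the table above, the 20140103 rows carry the pb values from 20140102). `to_long`, `build_signal_table` and the toy data are hypothetical.

```python
import pandas as pd


def to_long(wide):
    """Wide (date x symbol) DataFrame -> Series indexed by (trade_date, symbol)."""
    idx = pd.MultiIndex.from_product([wide.index, wide.columns],
                                     names=["trade_date", "symbol"])
    return pd.Series(wide.to_numpy().ravel(), index=idx)


def build_signal_table(signal, price, period, n_quantiles):
    """Long table with the signal, the forward return over `period` rows and
    a per-date quantile (1 = smallest signal, n_quantiles = largest)."""
    fwd_ret = price.shift(-period) / price - 1   # close at t -> close at t+period
    table = pd.DataFrame({"signal": to_long(signal),
                          "return": to_long(fwd_ret)}).dropna()
    table["quantile"] = (
        table.groupby(level="trade_date")["signal"]
        .transform(lambda s: pd.qcut(s, n_quantiles, labels=False) + 1)
    )
    return table


dates = [20140102, 20140103, 20140106, 20140107, 20140108]
# Hypothetical factor values: A < B < C < D on every date
signal = pd.DataFrame({"A": [1.0] * 5, "B": [2.0] * 5,
                       "C": [3.0] * 5, "D": [4.0] * 5}, index=dates)
# Hypothetical prices
price = pd.DataFrame({"A": [10.0, 10.5, 11.0, 11.5, 12.0],
                      "B": [20.0, 19.0, 18.0, 18.5, 19.0],
                      "C": [5.0, 5.0, 5.5, 5.5, 6.0],
                      "D": [8.0, 8.2, 8.4, 8.6, 8.8]}, index=dates)

table = build_signal_table(signal, price, period=2, n_quantiles=2)
print(table)
```

The last `period` dates drop out because their forward return is not yet known, which is one reason why not every factor value ends up as a usable sample.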
from jaqs_fxdayu.research.signaldigger.analysis import analysis
result = analysis(signal_data, is_event=False, period=15)
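Columns of the result tables (what is evaluated):
- IC (result["ic"]): return_ic/upside_ret_ic/downside_ret_ic
  - IC of the holding-period return / IC of the maximum upside within the holding period / IC of the maximum downside within the holding period
- Holding-period return (result["ret"]): long_ret/short_ret/long_short_ret/top_quantile_ret/bottom_quantile_ret/tmb_ret/all_sample_ret
  - long portfolio / short portfolio / long-short portfolio / group with the largest factor values / group with the smallest factor values / long the largest-factor-value group plus short the smallest-factor-value group (top minus bottom) / full sample (every stock regardless of signal magnitude or direction) minus the benchmark portfolio
- Return space, i.e. maximum potential profit/loss (result["space"]): long_space/short_space/long_short_space/top_quantile_space/bottom_quantile_space/tmb_space/all_sample_space
  - the same portfolios as above, measured by their upside/downside space instead of their return

Rows of the result tables (statistics):
- IC: "IC Mean", "IC Std.", "t-stat(IC)", "p-value(IC)", "IC Skew", "IC Kurtosis", "Ann. IR"
  - IC mean, IC standard deviation, t-statistic of IC, p-value of the test that the mean IC is 0, IC skewness, IC kurtosis, information ratio of IC computed as mean/std
- Holding-period return: 't-stat', "p-value", "skewness", "kurtosis", "Ann. Ret", "Ann. Vol", "Ann. IR", "occurance"
  - t-statistic of the holding-period return, p-value of the test that the mean holding-period return is 0, skewness, kurtosis, annualized holding-period return, annualized volatility, annualized information ratio (annualized return / annualized volatility), number of samples
- Return space: 'Up_sp Mean','Up_sp Std','Up_sp IR','Up_sp Pct5', 'Up_sp Pct25 ','Up_sp Pct50 ', 'Up_sp Pct75','Up_sp Pct95','Up_sp Occur','Down_sp Mean','Down_sp Std', 'Down_sp IR', 'Down_sp Pct5','Down_sp Pct25 ','Down_sp Pct50 ','Down_sp Pct75', 'Down_sp Pct95','Down_sp Occur'
  - upside space: mean, standard deviation, information ratio (mean/std), 5th percentile, 25th percentile, median, 75th percentile, 95th percentile, number of samples
  - downside space: the same statistics as above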
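The IC statistics listed above can be computed from a daily IC series. The sketch below assumes the IC on each date is the cross-sectional rank (Spearman) correlation between signal and return, the usual convention in alphalens-style tools. Spearman is computed as the Pearson correlation of ranks so that scipy is not needed, and the p-value is omitted for the same reason; skewness and kurtosis use pandas' bias-corrected estimators, which may differ slightly from the library's figures. The helpers `daily_rank_ic`, `ic_summary` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def daily_rank_ic(table, ret_col="return"):
    """Cross-sectional rank IC per trade_date.

    The Pearson correlation of ranks equals the Spearman correlation.
    """
    def rank_corr(df):
        return df["signal"].rank().corr(df[ret_col].rank())
    return table.groupby(level="trade_date")[["signal", ret_col]].apply(rank_corr)


def ic_summary(ic):
    """IC statistics as listed above, without the p-value."""
    ic = ic.dropna()
    n = len(ic)
    mean, std = ic.mean(), ic.std()               # std uses ddof=1
    return pd.Series({
        "IC Mean": mean,
        "IC Std.": std,
        "t-stat(IC)": mean / (std / np.sqrt(n)),  # one-sample t-test against 0
        "IC Skew": ic.skew(),                     # bias-corrected (pandas)
        "IC Kurtosis": ic.kurt(),                 # excess kurtosis; NaN with < 4 values
        "Ann. IR": mean / std,                    # mean/std, as described above
    })


# Hypothetical long table: 3 dates x 4 stocks
idx = pd.MultiIndex.from_product(
    [[20140102, 20140103, 20140106], ["A", "B", "C", "D"]],
    names=["trade_date", "symbol"])
table = pd.DataFrame({
    "signal": [1, 2, 3, 4] * 3,
    "return": [0.01, 0.02, 0.03, 0.04,    # same order as signal -> IC = 1
               0.04, 0.03, 0.02, 0.01,    # reversed order -> IC = -1
               0.02, 0.01, 0.04, 0.03],   # partly shuffled -> IC = 0.6
}, index=idx)

ic = daily_rank_ic(table)
print(ic)
print(ic_summary(ic))
```

As a consistency check against the printed IC table, and assuming the IC series has n = 961 observations (the occurance count reported for the long portfolio): -0.0695 / 0.2595 ≈ -0.268, matching Ann. IR, and -0.268 × √961 ≈ -8.30, matching t-stat(IC).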
print("--- IC analysis ---")
print(result["ic"])
print("--- Stock-selection return analysis ---")
print(result["ret"])
print("--- Maximum potential profit/loss analysis ---")
print(result["space"])
--- IC analysis ---
return_ic upside_ret_ic downside_ret_ic
IC Mean -6.945807e-02 5.937467e-02 -2.219792e-01
IC Std. 2.594710e-01 2.464900e-01 2.104796e-01
t-stat(IC) -8.298423e+00 7.467299e+00 -3.269370e+01
p-value(IC) 3.549725e-16 1.835765e-13 3.604265e-158
IC Skew 5.251623e-02 -4.326744e-01 5.580971e-01
IC Kurtosis -7.602440e-01 -4.722475e-01 1.068205e-01
Ann. IR -2.676911e-01 2.408806e-01 -1.054635e+00
--- Stock-selection return analysis ---
long_ret long_short_ret top_quantile_ret bottom_quantile_ret \
t-stat -4.354131 -6.429742 -17.141209 14.774824
p-value 0.000010 0.000000 0.000000 0.000000
skewness -0.056936 0.043832 0.907698 1.758145
kurtosis 2.476179 1.086328 5.921713 10.078364
Ann. Ret -0.074512 -0.093044 -0.123996 0.072271
Ann. Vol 0.132007 0.111627 0.375244 0.312530
Ann. IR -0.564453 -0.833528 -0.330441 0.231244
occurance 961.000000 961.000000 43414.000000 65862.000000
tmb_ret all_sample_ret
t-stat -11.707032 -4.942966
p-value 0.000000 0.000000
skewness -0.321151 1.263993
kurtosis 1.525487 7.805175
Ann. Ret -0.199588 -0.012879
Ann. Vol 0.131511 0.336874
Ann. IR -1.517656 -0.038232
occurance 961.000000 269680.000000
--- Maximum potential profit/loss analysis ---
long_space top_quantile_space bottom_quantile_space \
Up_sp Mean 0.086507 0.086331 0.082828
Up_sp Std 0.052492 0.094298 0.092627
Up_sp IR 1.647994 0.915512 0.894207
Up_sp Pct5 0.031881 0.001767 0.001918
Up_sp Pct25 0.053368 0.024431 0.022939
Up_sp Pct50 0.070421 0.058863 0.055114
Up_sp Pct75 0.102281 0.115623 0.111016
Up_sp Pct95 0.198886 0.266610 0.254299
Up_sp Occur 961.000000 43414.000000 65862.000000
Down_sp Mean -0.119896 -0.116442 -0.081603
Down_sp Std 0.107698 0.207111 0.160986
Down_sp IR -1.113264 -0.562221 -0.506896
Down_sp Pct5 -0.292471 -0.455604 -0.259862
Down_sp Pct25 -0.128481 -0.108653 -0.078294
Down_sp Pct50 -0.092179 -0.055193 -0.038831
Down_sp Pct75 -0.067415 -0.025264 -0.017222
Down_sp Pct95 -0.047044 -0.004709 -0.003275
Down_sp Occur 961.000000 43414.000000 65862.000000
tmb_space all_sample_space
Up_sp Mean 0.169767 0.084070
Up_sp Std 0.086265 0.092138
Up_sp IR 1.967965 0.912434
Up_sp Pct5 0.086604 0.001867
Up_sp Pct25 0.115088 0.023622
Up_sp Pct50 0.141803 0.057282
Up_sp Pct75 0.205614 0.113234
Up_sp Pct95 0.349901 0.257430
Up_sp Occur 961.000000 269680.000000
Down_sp Mean -0.201158 -0.099619
Down_sp Std 0.108447 0.190084
Down_sp IR -1.854904 -0.524077
Down_sp Pct5 -0.401382 -0.346045
Down_sp Pct25 -0.234492 -0.091120
Down_sp Pct50 -0.168090 -0.045931
Down_sp Pct75 -0.134734 -0.020540
Down_sp Pct95 -0.093423 -0.003896
Down_sp Occur 961.000000 269680.000000
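The upside_ret/downside_ret columns, and the Up_sp/Down_sp statistics built from them, describe how far the price moves in the holding period at best and at worst. A plausible way to compute them is sketched below, assuming max upside = highest high over the next `period` days / current close - 1 and max downside = lowest low over the same window / current close - 1; the exact window and entry price SignalDigger uses are not documented here. `max_space`, `space_summary` and the toy prices are hypothetical, and the summary uses clean labels rather than the library's exact field names.

```python
import pandas as pd


def max_space(close, high, low, period):
    """Max upside / downside over rows t+1 .. t+period relative to close at t.

    Inputs are wide (date x symbol) DataFrames with identical index/columns.
    """
    # Reverse time so that a trailing rolling window covers the following rows,
    # reverse back, then shift by one row to exclude day t itself.
    fut_high = high[::-1].rolling(period).max()[::-1].shift(-1)
    fut_low = low[::-1].rolling(period).min()[::-1].shift(-1)
    return fut_high / close - 1, fut_low / close - 1


def space_summary(space, prefix):
    """Mean, std, IR (mean/std), percentiles and sample count of a space matrix."""
    s = pd.Series(space.to_numpy().ravel()).dropna()  # one sample per (date, symbol)
    out = {f"{prefix} Mean": s.mean(),
           f"{prefix} Std": s.std(),
           f"{prefix} IR": s.mean() / s.std()}
    for q in (5, 25, 50, 75, 95):
        out[f"{prefix} Pct{q}"] = s.quantile(q / 100)
    out[f"{prefix} Occur"] = len(s)
    return pd.Series(out)


dates = [20140102, 20140103, 20140106, 20140107]
# Hypothetical single-stock prices
close = pd.DataFrame({"AAA": [10.0, 10.0, 10.0, 10.0]}, index=dates)
high = pd.DataFrame({"AAA": [10.0, 12.0, 11.0, 13.0]}, index=dates)
low = pd.DataFrame({"AAA": [9.0, 8.0, 9.5, 9.0]}, index=dates)

upside, downside = max_space(close, high, low, period=2)
print(upside)
print(space_summary(upside, "Up_sp"))
print(space_summary(downside, "Down_sp"))
```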
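- Cumulative return: split the capital into as many equal parts as there are holding days, and each day use one part to buy the selected stocks; this method can be used to replicate the portfolio
- Relative return: subtract the benchmark's return over the corresponding holding period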
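The two calculation methods above can be sketched as follows. This is an illustration under simplifying assumptions, not SignalDigger's internal calculation: each entry date's return is taken to be the equal-weighted holding-period return of the stocks selected on that date, costs are ignored, and excess returns are compounded as if they were ordinary returns. The function `staggered_cumulative_return` and the numbers are hypothetical.

```python
import numpy as np
import pandas as pd


def staggered_cumulative_return(period_ret, period):
    """Final cumulative return when capital is split into `period` equal parts.

    period_ret: returns indexed by entry date, each the return of the stocks
    selected on that date over the full holding period.
    Sub-account k invests on entry dates k, k + period, k + 2*period, ...
    and compounds those returns; every sub-account holds 1/period of the capital.
    """
    rets = np.asarray(period_ret, dtype=float)
    sub_values = [np.prod(1.0 + rets[k::period]) for k in range(period)]
    return float(np.mean(sub_values)) - 1.0


# Hypothetical holding-period returns for 4 consecutive entry dates (period = 2)
stock_ret = pd.Series([0.10, 0.00, -0.10, 0.20])
bench_ret = pd.Series([0.05, 0.00, -0.05, 0.10])   # benchmark over the same holding periods
excess_ret = stock_ret - bench_ret                 # relative return: subtract the benchmark

print(staggered_cumulative_return(stock_ret, period=2))
print(staggered_cumulative_return(excess_ret, period=2))
```

Here sub-account 0 compounds 1.10 × 0.90 = 0.99 and sub-account 1 compounds 1.00 × 1.20 = 1.20, so the combined account ends at (0.99 + 1.20) / 2 - 1 = 0.095.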
import matplotlib.pyplot as plt
obj.create_full_report()
plt.show()
Value of signals of Different Quantiles Statistics
min max mean std count count %
quantile
1 0.4286 19.3789 1.743018 0.949067 65862 24.422278
2 0.5307 14.6442 2.389186 1.263866 52990 19.649214
3 0.7358 21.6033 3.113060 1.765945 54424 20.180955
4 0.8013 36.3482 4.236022 2.608604 52990 19.649214
5 0.8848 5750.5164 7.938777 65.757157 43414 16.098339
Figure saved: E:\2019Course\QTC2019\QTC2019\newJaqs\output\returns_report.pdf
Information Analysis
ic
IC Mean -0.069
IC Std. 0.259
t-stat(IC) -8.298
p-value(IC) 0.000
IC Skew 0.053
IC Kurtosis -0.760
Ann. IR -0.268
Figure saved: E:\2019Course\QTC2019\QTC2019\newJaqs\output\information_report.pdf
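The "Value of signals of Different Quantiles Statistics" table can be approximated directly from signal_data with a pandas groupby, which is handy for spotting outliers such as the 5750.5164 maximum in quantile 5. In that table, "count %" is each quantile's share of all samples; for example, 65862 / 269680 ≈ 24.42% for quantile 1. A minimal sketch on a hypothetical toy frame with the same column names; `quantile_signal_stats` is a hypothetical helper:

```python
import pandas as pd


def quantile_signal_stats(signal_data):
    """Per-quantile min/max/mean/std/count of the signal, plus each quantile's
    share of all samples in percent ("count %")."""
    stats = (signal_data.groupby("quantile")["signal"]
             .agg(["min", "max", "mean", "std", "count"]))
    stats["count %"] = stats["count"] / stats["count"].sum() * 100
    return stats


# Hypothetical toy data with the same column names as signal_data
toy = pd.DataFrame({"signal": [0.5, 1.0, 2.0, 3.0, 100.0],
                    "quantile": [1, 1, 2, 2, 2]})
stats = quantile_signal_stats(toy)
print(stats)
```

A large gap between a quantile's mean and its maximum, together with a large std, points to a few extreme factor values that may be worth winsorizing before analysis.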
# Analysis by group (industry)
from jaqs_fxdayu.research.signaldigger import performance as pfm
ic = pfm.calc_signal_ic(signal_data, by_group=True)
mean_ic_by_group = pfm.mean_information_coefficient(ic, by_group=True)
from jaqs_fxdayu.research.signaldigger import plotting
plt.rcParams['font.sans-serif'] = ['SimHei']  # a font that can render the Chinese industry names
plt.rcParams['axes.unicode_minus'] = False  # display minus signs correctly with this font
plotting.plot_ic_by_group(mean_ic_by_group)
plt.show()
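With by_group=True, the IC is computed within each industry group separately. The same idea can be sketched in plain pandas: compute the rank correlation between signal and return inside every (trade_date, group) cell, then average per group. The rank-IC definition, the helper `mean_ic_by_group_sketch` and the toy data (with hypothetical English group labels) are assumptions, not the library's code.

```python
import pandas as pd


def mean_ic_by_group_sketch(signal_data):
    """Average, per industry group, of the daily within-group rank IC."""
    def rank_corr(df):
        return df["signal"].rank().corr(df["return"].rank())
    daily = (signal_data.reset_index()
             .groupby(["trade_date", "group"])[["signal", "return"]]
             .apply(rank_corr))
    return daily.groupby(level="group").mean()


symbols = ["A", "B", "C", "D", "E", "F"]
idx = pd.MultiIndex.from_tuples(
    [(20140103, s) for s in symbols] + [(20140106, s) for s in symbols],
    names=["trade_date", "symbol"])
toy = pd.DataFrame({
    "signal": [1, 2, 3, 1, 2, 3] * 2,
    "return": [0.01, 0.02, 0.03, 0.03, 0.02, 0.01,    # day 1: IC 1 and -1
               0.01, 0.03, 0.02, 0.03, 0.02, 0.01],   # day 2: IC 0.5 and -1
    "group": (["Banking"] * 3 + ["Real Estate"] * 3) * 2,
}, index=idx)

result = mean_ic_by_group_sketch(toy)
print(result)
```

Groups with only a handful of stocks per date produce noisy IC values, so a per-group mean IC is best read alongside the number of stocks in each group.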
# Export a 0/1 matrix (dates x symbols) marking the stocks in quantile 1, i.e. the lowest-pb group
excel_data = signal_data[signal_data['quantile'] == 1]["quantile"].unstack().fillna(0)
print(excel_data.head())
excel_data.to_excel('./pb_quantile_1.xlsx')
symbol 000001.SZ 000002.SZ 000009.SZ 000024.SZ 000027.SZ 000039.SZ \
trade_date
20140103 0.0 0.0 0.0 0.0 0.0 0.0
20140106 0.0 0.0 0.0 0.0 0.0 0.0
20140107 0.0 0.0 0.0 0.0 0.0 0.0
20140108 0.0 0.0 0.0 0.0 0.0 0.0
20140109 0.0 0.0 0.0 0.0 0.0 0.0
symbol 000063.SZ 000069.SZ 000100.SZ 000156.SZ ... 601929.SH \
trade_date ...
20140103 0.0 0.0 1.0 0.0 ... 0.0
20140106 0.0 0.0 1.0 0.0 ... 0.0
20140107 0.0 0.0 0.0 0.0 ... 0.0
20140108 0.0 0.0 0.0 0.0 ... 0.0
20140109 0.0 0.0 1.0 0.0 ... 0.0
symbol 601933.SH 601939.SH 601958.SH 601988.SH 601989.SH 601991.SH \
trade_date
20140103 0.0 0.0 0.0 1.0 1.0 0.0
20140106 0.0 0.0 0.0 1.0 1.0 0.0
20140107 0.0 0.0 0.0 1.0 1.0 0.0
20140108 0.0 0.0 0.0 1.0 1.0 0.0
20140109 0.0 0.0 0.0 1.0 1.0 0.0
symbol 601992.SH 601998.SH 603858.SH
trade_date
20140103 0.0 1.0 0.0
20140106 0.0 1.0 0.0
20140107 0.0 1.0 0.0
20140108 0.0 1.0 0.0
20140109 0.0 1.0 0.0
[5 rows x 253 columns]
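The exported matrix marks with 1.0 the stocks that fall into quantile 1 on each date; only stocks that were in quantile 1 at least once appear as columns. If a per-date list of symbols is more convenient, for example to feed a backtest, the matrix can be converted as sketched below. The helper `holdings_by_date` and the symbols in the toy matrix are hypothetical.

```python
import pandas as pd


def holdings_by_date(matrix):
    """Convert a 0/1 (date x symbol) matrix into {date: [symbols marked 1]}."""
    return {date: row.index[(row == 1).to_numpy()].tolist()
            for date, row in matrix.iterrows()}


# Hypothetical 0/1 matrix in the same layout as excel_data
matrix = pd.DataFrame({"AAA": [1.0, 0.0],
                       "BBB": [1.0, 1.0],
                       "CCC": [0.0, 1.0]}, index=[20140103, 20140106])
print(holdings_by_date(matrix))
```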