原创文章第447篇,专注“AI量化投资、个人成长与财富自由"。
本周星球代码计划——因子分析,因子挖掘:
1、(因子表达式优化)Alpha158以及world quant101部分因子实现。
2、基于lightgbm的因子筛选。?
3、优秀因子的单因子分析。?
4、deepalphagen和gplearn部分代码优化。
或者可以这么说,当前主流私募基金的做法就是因子挖掘。——手工挖,遗传算法,机器学习等。然后合成因子。
因为因子到策略,还有很多人工可以干预的环节。
完全所谓端到端,哪怕就是现在最强的GPT-5也做不到。当下大模型可以真正落地的场景,还是Agent生态,通过多环节的串联+工具来实现预期的目标。
投资也如此。
最多到端对端挖掘因子,然后把因子拿出来做策略。
因子表达式重构
两个包装器,一个按symbol来groupby,另一个按date来groupby
def calc_by_date(func): @wraps(func) def wrapper(*args, **kwargs): other_args = [] se_args = [] se_names = [] for arg in args: if type(arg) is not pd.Series: other_args.append(arg) else: se_args.append(arg) se_names.append(arg.name) if len(se_args) == 1: ret = se_args[0].groupby(level=0, group_keys=False).apply(lambda x: func(x, *other_args, **kwargs)) elif len(se_args) > 1: count = len(se_args) df = pd.concat(se_args, axis=1) df.index = se_args[0].index ret = df.groupby(level=0, group_keys=False).apply( lambda sub_df: func(*[sub_df[name] for name in se_names], *other_args)) ret.index = df.index return ret return wrapper def calc_by_symbol(func): @wraps(func) def wrapper(*args, **kwargs): other_args = [] se_args = [] se_names = [] for arg in args: if type(arg) is not pd.Series: other_args.append(arg) else: se_args.append(arg) se_names.append(arg.name) if len(se_args) == 1: ret = se_args[0].groupby(level=1, group_keys=False).apply(lambda x: func(x, *other_args, **kwargs)) elif len(se_args) > 1: count = len(se_args) df = pd.concat(se_args, axis=1) df.index = se_args[0].index ret = df.groupby(level=1, group_keys=False).apply( lambda sub_df: func(*[sub_df[name] for name in se_names], *other_args)) ret.index = df.index return ret return wrapper
目前,需要使用截面排序rank,需要截面排序。
这样做的好处,就是所有的计算子函数都不需要变更了,加上这个包装器即可:
@calc_by_symbol def ta_atr(high, low, close, period=14): se = talib.ATR(high, low, close, period) se = pd.Series(se) se.index = high.index return se
@calc_by_date def rank(se: pd.Series): ret = se.rank(pct=True) return ret
这样csv_dataloader加载数据,因子计算速度也快了很多,最重要的是,我们解决了rank,ts_rank嵌套的问题。
Alpha158因子库
代码完全移植过来了,qlib的158因子,大家感兴趣可以把alpha360也迁移过来。
代码在如下位置:
from datafeed import AlphaBase class Alpha158(AlphaBase): def get_fields_names(self): # ['CORD30', 'STD30', 'CORR5', 'RESI10', 'CORD60', 'STD5', 'LOW0', # 'WVMA30', 'RESI5', 'ROC5', 'KSFT', 'STD20', 'RSV5', 'STD60', 'KLEN'] fields = [] names = [] # kbar fields += [ "(close-open)/open", "(high-low)/open", "(close-open)/(high-low+1e-12)", "(high-greater(open, close))/open", "(high-greater(open, close))/(high-low+1e-12)", "(less(open, close)-low)/open", "(less(open, close)-low)/(high-low+1e-12)", "(2*close-high-low)/open", "(2*close-high-low)/(high-low+1e-12)", ] names += [ "KMID", "KLEN", "KMID2", "KUP", "KUP2", "KLOW", "KLOW2", "KSFT", "KSFT2", ] # =========== price ========== feature = ["OPEN", "HIGH", "LOW", "CLOSE"] windows = range(5) for field in feature: field = field.lower() fields += ["shift(%s, %d)/close" % (field, d) if d != 0 else "%s/close" % field for d in windows] names += [field.upper() + str(d) for d in windows] # ================ volume =========== fields += ["shift(volume, %d)/(volume+1e-12)" % d if d != 0 else "volume/(volume+1e-12)" for d in windows] names += ["VOLUME" + str(d) for d in windows] # ================= rolling ==================== windows = [5, 10, 20, 30, 60] fields += ["shift(close, %d)/close" % d for d in windows] names += ["ROC%d" % d for d in windows] fields += ["mean(close, %d)/close" % d for d in windows] names += ["MA%d" % d for d in windows] fields += ["std(close, %d)/close" % d for d in windows] names += ["STD%d" % d for d in windows] #fields += ["slope(close, %d)/close" % d for d in windows] #names += ["BETA%d" % d for d in windows] fields += ["max(high, %d)/close" % d for d in windows] names += ["MAX%d" % d for d in windows] fields += ["min(low, %d)/close" % d for d in windows] names += ["MIN%d" % d for d in windows] fields += ["quantile(close, %d, 0.8)/close" % d for d in windows] names += ["QTLU%d" % d for d in windows] fields += ["quantile(close, %d, 0.2)/close" % d for d in windows] names += ["QTLD%d" % d for d in windows] #fields += ["ts_rank(close, %d)" % d for d in windows] #names += ["RANK%d" % d for d in windows] fields += ["(close-min(low, %d))/(max(high, %d)-min(low, %d)+1e-12)" % (d, d, d) for d in windows] names += ["RSV%d" % d for d in windows] fields += ["idxmax(high, %d)/%d" % (d, d) for d in windows] names += ["IMAX%d" % d for d in windows] fields += ["idxmin(low, %d)/%d" % (d, d) for d in windows] names += ["IMIN%d" % d for d in windows] fields += ["(idxmax(high, %d)-idxmin(low, %d))/%d" % (d, d, d) for d in windows] names += ["IMXD%d" % d for d in windows] fields += ["corr(close, log(volume+1), %d)" % d for d in windows] names += ["CORR%d" % d for d in windows] fields += ["corr(close/shift(close,1), log(volume/shift(volume, 1)+1), %d)" % d for d in windows] names += ["CORD%d" % d for d in windows] fields += ["mean(close>shift(close, 1), %d)" % d for d in windows] names += ["CNTP%d" % d for d in windows] fields += ["mean(close<shift(close, 1), %d)" % d for d in windows] names += ["CNTN%d" % d for d in windows] fields += ["mean(close>shift(close, 1), %d)-mean(close<shift(close, 1), %d)" % (d, d) for d in windows] names += ["CNTD%d" % d for d in windows] fields += [ "sum(greater(close-shift(close, 1), 0), %d)/(sum(Abs(close-shift(close, 1)), %d)+1e-12)" % (d, d) for d in windows ] names += ["SUMP%d" % d for d in windows] fields += [ "sum(greater(shift(close, 1)-close, 0), %d)/(sum(Abs(close-shift(close, 1)), %d)+1e-12)" % (d, d) for d in windows ] names += ["SUMN%d" % d for d in windows] fields += [ "(sum(greater(close-shift(close, 1), 0), %d)-sum(greater(shift(close, 1)-close, 0), %d))" "/(sum(Abs(close-shift(close, 1)), %d)+1e-12)" % (d, d, d) for d in windows ] names += ["SUMD%d" % d for d in windows] fields += ["mean(volume, %d)/(volume+1e-12)" % d for d in windows] names += ["VMA%d" % d for d in windows] fields += ["std(volume, %d)/(volume+1e-12)" % d for d in windows] names += ["VSTD%d" % d for d in windows] fields += [ "std(Abs(close/shift(close, 1)-1)*volume, %d)/(mean(Abs(close/shift(close, 1)-1)*volume, %d)+1e-12)" % (d, d) for d in windows ] names += ["WVMA%d" % d for d in windows] fields += [ "sum(greater(volume-shift(volume, 1), 0), %d)/(sum(Abs(volume-shift(volume, 1)), %d)+1e-12)" % (d, d) for d in windows ] names += ["VSUMP%d" % d for d in windows] fields += [ "sum(greater(shift(volume, 1)-volume, 0), %d)/(sum(Abs(volume-shift(volume, 1)), %d)+1e-12)" % (d, d) for d in windows ] names += ["VSUMN%d" % d for d in windows] fields += [ "(sum(greater(volume-shift(volume, 1), 0), %d)-sum(greater(shift(volume, 1)-volume, 0), %d))" "/(sum(Abs(volume-shift(volume, 1)), %d)+1e-12)" % (d, d, d) for d in windows ] names += ["VSUMD%d" % d for d in windows] return fields, names
吾日三省吾身
“若无闲事在心头,便是人间好时节”。
这里的闲事,就是短期内,没有确定deadline要做的事情。
当然,本身很多事情,随着时间的推移,变得可有可无,或者就是随手的事情。
但在强迫症的眼里,既然要做,那就尽快做完,然后痛快地玩。
强迫症很难与杂事共处,尤其是带着不确定的事情。
要做就尽快做,不做就不做。
确实很多时候可以带来自律,给人专业,靠谱的感觉。
但过犹不及,对于家人,身边的朋友有时候也会造成困扰。
历史文章:
Quantlab3.3代码发布:全新引擎 | 静待花开:年化13.9%,回撤小于15% | lightGBM实现排序学习
创业板指布林带突破策略:年化12.8%,回撤20%+| Alphalens+streamlit单因子分析框架(代码+数据)
去掉底层回测引擎,完全自研,增加超参数优化,因子自动挖掘,机器模型交易。
飞狐量化——AI驱动的量化。(持续给大家写代码的,交付最前沿AI量化技术和策略的星球)AI量化实验室——2024量化投资的星辰大海
关于我:CFA,北大光华金融硕士,十年量化投资实战。?/?CTO,全栈技术,AI大模型?。——应该是金融圈最懂技术的男人。