Baseline strategy 01:
1. First apply the SVR-based filter to remove the outlier rows.
INFO:__main__:train_data len:2888 index len:61
outlier model:SVR
                                                name                  model       mean       std
0  38['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', '...  DecisionTreeRegressor  -0.254583  0.015420
1  38['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', '...  RandomForestRegressor  -0.126443  0.014312
2  38['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', '...           XGBRegressor  -0.107743  0.014618
3  38['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', '...                    SVR  -0.128184  0.022065
4  38['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', '...       LinearRegression  -0.093924  0.014470
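The log does not show how the SVR filter picks the outlier rows. One common reading is residual-based filtering: fit an SVR, then drop the rows whose residual lies far from the mean residual. The sketch below assumes that reading; the function name `find_outlier_rows`, the 3-sigma cutoff, and the synthetic data are all illustrative and are not taken from the original code.

```python
import numpy as np
from sklearn.svm import SVR


def find_outlier_rows(X, y, n_sigma=3.0):
    """Return indices of rows whose SVR residual is more than
    n_sigma standard deviations away from the mean residual."""
    model = SVR()  # default RBF kernel; hyperparameters are an assumption
    model.fit(X, y)
    residuals = y - model.predict(X)
    z = (residuals - residuals.mean()) / residuals.std()
    return np.where(np.abs(z) > n_sigma)[0]


# Synthetic data for illustration only (not the data from the log).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, -0.5, 0.2]) + rng.normal(scale=0.1, size=200)
y[:5] += 10.0  # inject five gross outliers into the first five rows

outlier_idx = find_outlier_rows(X, y)
X_clean = np.delete(X, outlier_idx, axis=0)
y_clean = np.delete(y, outlier_idx)
print(len(y), len(outlier_idx), len(y_clean))
```

The returned index array plays the role of the `index len:61` entry in the log: it lists the rows to drop before the models are re-scored.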
2. PCA filtering
                                                name                  model       mean       std
0  37['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V8', '...  DecisionTreeRegressor  -0.246533  0.019663
1  37['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V8', '...  RandomForestRegressor  -0.129520  0.014540
2  37['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V8', '...           XGBRegressor  -0.106539  0.016954
3  37['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V8', '...                    SVR  -0.131678  0.021323
4  37['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V8', '...       LinearRegression  -0.093725  0.015912
3. Drop 4 features
drop 4 feature:
                                                name                  model       mean       std
0  33['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V8', '...  DecisionTreeRegressor  -0.240781  0.026782
1  33['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V8', '...  RandomForestRegressor  -0.128367  0.011586
2  33['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V8', '...           XGBRegressor  -0.105745  0.018051
3  33['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V8', '...                    SVR  -0.131549  0.020833
4  33['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V8', '...       LinearRegression  -0.092577  0.015777
4. Drop the 6 features whose distributions differ (one of them had already been dropped earlier, so the feature count actually falls by 5)
                                                name                  model       mean       std
0  28['V0', 'V1', 'V2', 'V3', 'V4', 'V8', 'V12', ...  DecisionTreeRegressor  -0.232744  0.023978
1  28['V0', 'V1', 'V2', 'V3', 'V4', 'V8', 'V12', ...  RandomForestRegressor  -0.124071  0.015044
2  28['V0', 'V1', 'V2', 'V3', 'V4', 'V8', 'V12', ...           XGBRegressor  -0.106402  0.017246
3  28['V0', 'V1', 'V2', 'V3', 'V4', 'V8', 'V12', ...                    SVR  -0.133891  0.027492
4  28['V0', 'V1', 'V2', 'V3', 'V4', 'V8', 'V12', ...       LinearRegression  -0.098457  0.016716
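The notes do not say how a feature was judged to have a "different distribution", nor which two samples were compared. A plausible approach, assumed here, is to compare each column in the training set against the same column in the test set with the two-sample Kolmogorov-Smirnov statistic and drop the columns above a cutoff. The helper names, the 0.3 threshold, and the toy columns `A` and `B` are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic:
    the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for v in a + b:  # the gap can only change at observed values
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def columns_to_drop(train, test, threshold=0.3):
    """train/test map column name -> list of values.
    The threshold is an arbitrary illustrative cutoff."""
    return [c for c in train if ks_statistic(train[c], test[c]) > threshold]


# Toy columns: A overlaps between the two sets, B is shifted by 1.0.
train = {"A": [0.1, 0.2, 0.3, 0.4, 0.5], "B": [0.1, 0.2, 0.3, 0.4, 0.5]}
test = {"A": [0.15, 0.25, 0.35, 0.45, 0.55], "B": [1.1, 1.2, 1.3, 1.4, 1.5]}
dropped = columns_to_drop(train, test)
print(dropped)
```

In practice `scipy.stats.ks_2samp` computes the same statistic together with a p-value; the hand-written version is used here only to keep the sketch dependency-free.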

Baseline strategy 01, variant 01
Move the feature-dropping steps of the baseline strategy to the very beginning: drop features first, then drop rows. In other words, reassemble the steps in the order 3, 4, 1, 2.
tmp_02
                                                name                  model       mean       std
0  33['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V8', '...  DecisionTreeRegressor  -0.238354  0.012864
1  33['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V8', '...  RandomForestRegressor  -0.130878  0.017220
2  33['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V8', '...           XGBRegressor  -0.106126  0.016203
3  33['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V8', '...                    SVR  -0.133249  0.021006
4  33['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V8', '...       LinearRegression  -0.093221  0.016074

Baseline strategy 01, variant 01, variant 01
Add min-max scaling to (0, 1) at the start of the pipeline.
Final result:
                                                name                  model       mean       std
0  28['V0', 'V1', 'V2', 'V3', 'V4', 'V8', 'V12', ...  DecisionTreeRegressor  -0.240605  0.019270
1  28['V0', 'V1', 'V2', 'V3', 'V4', 'V8', 'V12', ...  RandomForestRegressor  -0.126842  0.017003
2  28['V0', 'V1', 'V2', 'V3', 'V4', 'V8', 'V12', ...           XGBRegressor  -0.109181  0.017691
3  28['V0', 'V1', 'V2', 'V3', 'V4', 'V8', 'V12', ...                    SVR  -0.106070  0.020124
4  28['V0', 'V1', 'V2', 'V3', 'V4', 'V8', 'V12', ...       LinearRegression  -0.100764  0.017924
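Min-max scaling maps each feature column linearly onto [0, 1]. An RBF-kernel SVR works on distances between rows, so it is sensitive to feature scale, which is consistent with the SVR row improving once scaling is added. The following is a minimal NumPy sketch of the transform; whether the original pipeline used a hand-written version like this or scikit-learn's `MinMaxScaler` is not stated, and the name `min_max_scale` is illustrative.

```python
import numpy as np


def min_max_scale(X):
    """Rescale every column of X to the range [0, 1].
    Constant columns are mapped to 0 instead of dividing by zero."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # guard for constant columns
    return (X - col_min) / col_range


# Toy matrix: three columns on very different scales, one of them constant.
X = np.array([[1.0, 100.0, 7.0],
              [2.0, 300.0, 7.0],
              [3.0, 200.0, 7.0]])
X_scaled = min_max_scale(X)
print(X_scaled)
```

When a separate test set has to be transformed, the minimum and range learned from the training data should be reused rather than recomputed.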

Baseline strategy 02
Run through every step again.
Initial:
                                                name                  model       mean       std
0  38['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', '...  DecisionTreeRegressor  -0.301752  0.046047
1  38['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', '...  RandomForestRegressor  -0.161876  0.024799
2  38['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', '...           XGBRegressor  -0.136554  0.025657
3  38['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', '...                    SVR  -0.161049  0.033630
4  38['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', '...       LinearRegression  -0.119611  0.023264
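Each table in these notes lists a per-model `mean` and `std`. A table of that shape can be produced with k-fold cross-validation, as sketched below. The scoring metric is an assumption: the negative means suggest scikit-learn's `neg_mean_squared_error`, but the log does not confirm it. The helper `score_models`, the fold count, and the synthetic data are illustrative, and only two of the five models are included to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def score_models(models, X, y, cv=5):
    """Cross-validate each model and return (name, mean, std) rows."""
    rows = []
    for model in models:
        scores = cross_val_score(
            model, X, y, cv=cv, scoring="neg_mean_squared_error"  # assumed metric
        )
        rows.append((type(model).__name__, scores.mean(), scores.std()))
    return rows


# Synthetic linear data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.0, 0.5, -0.5, 0.0]) + rng.normal(scale=0.1, size=300)

results = score_models(
    [DecisionTreeRegressor(random_state=0), LinearRegression()], X, y
)
for name, mean, std in results:
    print(f"{name:<24}{mean:>10.6f}{std:>10.6f}")
```

With this convention a value closer to zero is better, which is how the tables in these notes are read.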
L1
INFO:__main__:drop_columns(l1):len(3)['V31', 'V15', 'V21']
remain feature_columns:len(35)['V36', 'V7', 'V8', 'V5', 'V37', 'V24', 'V29', 'V25', 'V35', 'V28', 'V32', 'V34', 'V13', 'V20', 'V18', 'V23', 'V22', 'V30', 'V19', 'V33', 'V26', 'V16', 'V11', 'V4', 'V9', 'V14', 'V17', 'V12', 'V6', 'V3', 'V2', 'V1', 'V10', 'V0', 'V27']
                                                name                  model       mean       std
0  35['V36', 'V7', 'V8', 'V5', 'V37', 'V24', 'V29...  DecisionTreeRegressor  -0.314407  0.055367
1  35['V36', 'V7', 'V8', 'V5', 'V37', 'V24', 'V29...  RandomForestRegressor  -0.156533  0.022986
2  35['V36', 'V7', 'V8', 'V5', 'V37', 'V24', 'V29...           XGBRegressor  -0.140072  0.024400
3  35['V36', 'V7', 'V8', 'V5', 'V37', 'V24', 'V29...                    SVR  -0.159603  0.034162
4  35['V36', 'V7', 'V8', 'V5', 'V37', 'V24', 'V29...       LinearRegression  -0.118851  0.023679
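The `drop_columns(l1)` log line suggests that features are removed by an L1-penalized model, though the exact selector is not shown. A minimal sketch under that assumption: fit a `Lasso` regression and drop every column whose coefficient is shrunk to exactly zero. The function name `l1_drop_columns`, the `alpha` value, and the feature names `F0` to `F3` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso


def l1_drop_columns(X, y, names, alpha=0.1):
    """Return the names of the columns whose Lasso coefficient is zero."""
    model = Lasso(alpha=alpha).fit(X, y)
    return [n for n, c in zip(names, model.coef_) if c == 0.0]


# Synthetic data: only the first two columns influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=500)
names = ["F0", "F1", "F2", "F3"]

dropped = l1_drop_columns(X, y, names)
remaining = [n for n in names if n not in dropped]
print("drop:", dropped, "remain:", remaining)
```

Because the L1 penalty acts on coefficient magnitudes, the set of dropped columns depends on the scale of the features, so rescaling the inputs can change which columns are removed.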
L2
INFO:__main__:drop_columns(l2):len(3)['V32', 'V13', 'V34']
remain feature_columns:len(32)['V36', 'V8', 'V7', 'V5', 'V37', 'V24', 'V29', 'V25', 'V35', 'V28', 'V20', 'V18', 'V22', 'V30', 'V23', 'V33', 'V19', 'V26', 'V9', 'V4', 'V16', 'V11', 'V14', 'V17', 'V12', 'V3', 'V6', 'V2', 'V1', 'V10', 'V0', 'V27']
                                                name                  model       mean       std
0  32['V36', 'V8', 'V7', 'V5', 'V37', 'V24', 'V29...  DecisionTreeRegressor  -0.304581  0.048247
1  32['V36', 'V8', 'V7', 'V5', 'V37', 'V24', 'V29...  RandomForestRegressor  -0.163034  0.026763
2  32['V36', 'V8', 'V7', 'V5', 'V37', 'V24', 'V29...           XGBRegressor  -0.137111  0.022508
3  32['V36', 'V8', 'V7', 'V5', 'V37', 'V24', 'V29...                    SVR  -0.157262  0.031500
4  32['V36', 'V8', 'V7', 'V5', 'V37', 'V24', 'V29...       LinearRegression  -0.118579  0.023579
PCA
                                                name                  model       mean       std
0  31['V8', 'V5', 'V37', 'V24', 'V29', 'V25', 'V3...  DecisionTreeRegressor  -0.306925  0.035987
1  31['V8', 'V5', 'V37', 'V24', 'V29', 'V25', 'V3...  RandomForestRegressor  -0.155268  0.027546
2  31['V8', 'V5', 'V37', 'V24', 'V29', 'V25', 'V3...           XGBRegressor  -0.134823  0.024214
3  31['V8', 'V5', 'V37', 'V24', 'V29', 'V25', 'V3...                    SVR  -0.161478  0.030688
4  31['V8', 'V5', 'V37', 'V24', 'V29', 'V25', 'V3...       LinearRegression  -0.117744  0.025338
SVR
                                                name                  model       mean       std
0  31['V8', 'V5', 'V37', 'V24', 'V29', 'V25', 'V3...  DecisionTreeRegressor  -0.246123  0.012659
1  31['V8', 'V5', 'V37', 'V24', 'V29', 'V25', 'V3...  RandomForestRegressor  -0.126124  0.016573
2  31['V8', 'V5', 'V37', 'V24', 'V29', 'V25', 'V3...           XGBRegressor  -0.105483  0.016860
3  31['V8', 'V5', 'V37', 'V24', 'V29', 'V25', 'V3...                    SVR  -0.130822  0.020550
4  31['V8', 'V5', 'V37', 'V24', 'V29', 'V25', 'V3...       LinearRegression  -0.093589  0.016626
Drop the 6 features with differing distributions
                                                name                  model       mean       std
0  25['V8', 'V37', 'V24', 'V29', 'V25', 'V35', 'V...  DecisionTreeRegressor  -0.237193  0.008986
1  25['V8', 'V37', 'V24', 'V29', 'V25', 'V35', 'V...  RandomForestRegressor  -0.123084  0.015617
2  25['V8', 'V37', 'V24', 'V29', 'V25', 'V35', 'V...           XGBRegressor  -0.105656  0.016326
3  25['V8', 'V37', 'V24', 'V29', 'V25', 'V35', 'V...                    SVR  -0.137090  0.029198
4  25['V8', 'V37', 'V24', 'V29', 'V25', 'V35', 'V...       LinearRegression  -0.099489  0.017874

Baseline strategy 02, variant 01
Before the initial step, apply min-max normalization to (0, 1) on all features.
Result after normalization:
                                                name                  model       mean       std
0  38['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', '...  DecisionTreeRegressor  -0.293681  0.055959
1  38['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', '...  RandomForestRegressor  -0.153561  0.025228
2  38['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', '...           XGBRegressor  -0.136521  0.025700
3  38['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', '...                    SVR  -0.129900  0.021798
4  38['V0', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', '...       LinearRegression  -0.119611  0.023264
The SVR result improves markedly after normalization (mean from -0.161049 to -0.129900).
L1
drop_columns(l1):len(8)['V20', 'V16', 'V13', 'V15', 'V31', 'V34', 'V32', 'V25']
                                                name                  model       mean       std
0  30['V36', 'V7', 'V8', 'V37', 'V29', 'V24', 'V5...  DecisionTreeRegressor  -0.311610  0.059784
1  30['V36', 'V7', 'V8', 'V37', 'V29', 'V24', 'V5...  RandomForestRegressor  -0.160163  0.023681
2  30['V36', 'V7', 'V8', 'V37', 'V29', 'V24', 'V5...           XGBRegressor  -0.135760  0.025817
3  30['V36', 'V7', 'V8', 'V37', 'V29', 'V24', 'V5...                    SVR  -0.125948  0.021718
4  30['V36', 'V7', 'V8', 'V37', 'V29', 'V24', 'V5...       LinearRegression  -0.118137  0.022044
L2: no change
drop_columns(l2):len(0)[]
IF01_branch01: V16 is no longer present, so PCA is turned off.
IF01_branch02: special-case the L1 step so that V16 is kept, then continue with PCA here.
                                                name                  model       mean       std
0  29['V8', 'V37', 'V5', 'V29', 'V24', 'V35', 'V2...  DecisionTreeRegressor  -0.306813  0.038430
1  29['V8', 'V37', 'V5', 'V29', 'V24', 'V35', 'V2...  RandomForestRegressor  -0.158987  0.026039
2  29['V8', 'V37', 'V5', 'V29', 'V24', 'V35', 'V2...           XGBRegressor  -0.133585  0.025091
3  29['V8', 'V37', 'V5', 'V29', 'V24', 'V35', 'V2...                    SVR  -0.126072  0.022261
4  29['V8', 'V37', 'V5', 'V29', 'V24', 'V35', 'V2...       LinearRegression  -0.116638  0.024050
IF01_branch01: SVR
outlier model:SVR
                                                name                  model       mean       std
0  30['V36', 'V8', 'V7', 'V37', 'V5', 'V29', 'V24...  DecisionTreeRegressor  -0.250153  0.021936
1  30['V36', 'V8', 'V7', 'V37', 'V5', 'V29', 'V24...  RandomForestRegressor  -0.131978  0.016049
2  30['V36', 'V8', 'V7', 'V37', 'V5', 'V29', 'V24...           XGBRegressor  -0.112816  0.017442
3  30['V36', 'V8', 'V7', 'V37', 'V5', 'V29', 'V24...                    SVR  -0.104641  0.016567
4  30['V36', 'V8', 'V7', 'V37', 'V5', 'V29', 'V24...       LinearRegression  -0.097441  0.013544
IF01_branch02:
outlier model:SVR
                                                name                  model       mean       std
0  29['V8', 'V37', 'V5', 'V29', 'V24', 'V35', 'V2...  DecisionTreeRegressor  -0.241877  0.016470
1  29['V8', 'V37', 'V5', 'V29', 'V24', 'V35', 'V2...  RandomForestRegressor  -0.131061  0.015154
2  29['V8', 'V37', 'V5', 'V29', 'V24', 'V35', 'V2...           XGBRegressor  -0.111531  0.016127
3  29['V8', 'V37', 'V5', 'V29', 'V24', 'V35', 'V2...                    SVR  -0.104441  0.017566
4  29['V8', 'V37', 'V5', 'V29', 'V24', 'V35', 'V2...       LinearRegression  -0.095736  0.016731
IF01_branch02: drop the feature columns whose distributions are inconsistent
                                                name                  model       mean       std
0  23['V8', 'V37', 'V29', 'V24', 'V35', 'V18', 'V...  DecisionTreeRegressor  -0.243059  0.015813
1  23['V8', 'V37', 'V29', 'V24', 'V35', 'V18', 'V...  RandomForestRegressor  -0.128891  0.017159
2  23['V8', 'V37', 'V29', 'V24', 'V35', 'V18', 'V...           XGBRegressor  -0.108672  0.016404
3  23['V8', 'V37', 'V29', 'V24', 'V35', 'V18', 'V...                    SVR  -0.104399  0.019012
4  23['V8', 'V37', 'V29', 'V24', 'V35', 'V18', 'V...       LinearRegression  -0.099833  0.018218