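Algorithms_Deep LSTM Notes

This post is meant as a review for readers who already have some background. It is not suitable for complete beginners; for an introduction, see the first entry in this post's references.

Structure_Static Overview Diagram

HK IPO Subscription_Predicting HK IPO Subscription Returns with Machine Learning 02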

Date  Name  Prediction 01 (original algorithm)  Prediction 02 (new algorithm)  Actual  Direction correct
20200101  麗年國際控股  32.071895  14
20200101  CTR Holdings  70.130692  14
20200101  尚晉(國際)控股  4.307061  34.09
20200101  文業集團控股  51.175728  14
20200101  曠世控股  14.986902  43.75
20200101  三和精化集團  12.567931  12
20200101  華和控股集團  2.952783  0.8
20200101  新石文化投資  16.344639  121.21
20200101  北控城市資源集團  -16.629482  -8.87
20200101  九毛九國際控股  -0.087655  57.73
20200101  雋思集團控股  -7.234183  0.67
20200101  Infinity Logistics and Transport Ventures  -10.584425  0.67
20200101  上海建橋教育集團  1.26449  1.03
20200101  佳辰控股集團  -9.197691  15.09
20200101  艾德韋宣集團控股  9.163534  26.67
20200101  滙景控股  0.030239  5.29
20200101  驢跡科技控股  -6.050739  -2

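HK IPO Subscription_Feature Analysis Visualization

Greenshoe option (no difference)

HK IPO Subscription_Predicting HK IPO Subscription Returns with Machine Learning

I recently came across Hong Kong IPO subscription, which is said to be a good way to earn some pocket money.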

For a beginner's introduction, see this Xueqiu article: https://xueqiu.com/3831498220/132942898
Guides compiled by experienced investors: http://www.360doc.com/content/19/0625/15/64736033_844765859.shtml and https://zhuanlan.zhihu.com/p/88357819

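Projects_Machine Learning Competition Reading

Human Resources and Social Security Competition, algorithm track: solution write-up + third place + team 三马一曹

https://tianchi.aliyun.com/forum/postDetail?spm=5176.12282027.0.0.485a311fu9lqGt&postId=2981
Visit frequency
Number of distinct hospitals visited
Summary statistics for each fee type, including the maximum, minimum, and mean
Pick the 24 most frequently occurring drugs and, for each social insurance user, sum the amount spent on each of them. This describes the user's drug purchasing behavior.
Hospital fraud rate: sort the hospital IDs by fraud rate, then bin all hospitals to build a hospital fraud-level feature that describes which hospitals a user tends to visit.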
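The first three features in the list above are per-user aggregates, which could be sketched with a pandas `groupby`. This is only an illustration: the column names (`person_id`, `hospital_id`, `fee`) and the records are made up, not taken from the competition data.

```python
import pandas as pd

# Hypothetical visit records, one row per visit (column names are assumptions).
visits = pd.DataFrame({
    "person_id":   [1, 1, 1, 2, 2],
    "hospital_id": ["A", "A", "B", "C", "C"],
    "fee":         [100.0, 50.0, 30.0, 200.0, 40.0],
})

# Per-user aggregates: visit count, distinct hospitals, and fee max/min/mean.
features = visits.groupby("person_id").agg(
    visit_count=("hospital_id", "count"),
    hospital_nunique=("hospital_id", "nunique"),
    fee_max=("fee", "max"),
    fee_min=("fee", "min"),
    fee_mean=("fee", "mean"),
)
print(features)
```

Each output row is one user, so the result can be joined onto a label table keyed by the same ID.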
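We decided to select features by the feature-importance scores output by the model. After comparing experiments with different numbers of features, we kept the top 150 features and retrained the model on the samples.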
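A minimal sketch of importance-based selection, assuming the scores are already available as a name-to-score mapping. The helper `select_top_k` and the example scores are hypothetical; the team's actual model and code are not shown in the source.

```python
from typing import Dict, List

def select_top_k(importances: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Made-up importance scores, e.g. as a tree-based model might report them.
scores = {"visit_count": 0.40, "fee_mean": 0.25, "hospital_nunique": 0.20, "fee_min": 0.15}
top2 = select_top_k(scores, k=2)
print(top2)
```

Repeating the train-and-evaluate loop for several values of `k` gives the kind of comparison described above.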

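Reading_Machine Learning Practice

Reading: Machine Learning in Action (《机器学习实战》)

1. kNN

Reading_Machine Learning

Textbook: Machine Learning (《机器学习》)

Chapter 2: findsSG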

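Algorithms_ARIMA Time Series Analysis Summary

Introduction

This post walks through the steps of the ARIMA workflow and covers only the engineering techniques and methods; for the mathematical derivation, consult the relevant papers.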
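The individual steps are not included in this excerpt. As a small illustration of one standard piece of any ARIMA workflow, the sketch below shows differencing (the "I", or order `d`), which removes trend before the AR and MA parts are fitted. The helper name `difference` and the sample series are assumptions made for the example.

```python
from typing import List

def difference(series: List[float], d: int = 1) -> List[float]:
    """Apply first-order differencing d times (the "I" in ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

# A series with a linear trend becomes constant after one difference.
trend = [3.0, 5.0, 7.0, 9.0, 11.0]
print(difference(trend, d=1))  # [2.0, 2.0, 2.0, 2.0]
print(difference(trend, d=2))  # [0.0, 0.0, 0.0]
```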

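Algorithms_PValue: A Personal Understanding

I have been running into p-values a lot lately. I had seen them many times before, but each time I understood them in the moment and forgot a few days later, so here is a write-up.

Definition of the p-value [from: Baidu Baike]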
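The quoted definition is not included in this excerpt. As a rough illustration of the usual textbook meaning (the probability, under the null hypothesis, of a result at least as extreme as the one observed), here is a one-sided binomial example. The function name and the coin-flip scenario are assumptions made for the sketch.

```python
from math import comb

def binomial_p_value(k: int, n: int, p0: float = 0.5) -> float:
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

# Null hypothesis: the coin is fair. Observation: 9 heads in 10 flips.
p = binomial_p_value(9, 10)
print(round(p, 4))  # 0.0107
```

Here the value is 11/1024, so at a 0.05 significance level the fair-coin hypothesis would be rejected.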

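Algorithms_Pros and Cons of the Top Ten Machine Learning Algorithms

Top ten machine learning algorithms
Summary of the top ten data mining algorithms: core ideas, pros and cons, application areas, and the pros and cons of data mining
Classification: C4.5, CART, Adaboost, NaiveBayes, KNN, SVM
Clustering: KMeans
Statistical learning: EM
Association analysis: Apriori
Link mining: PageRank

EM can also be used for clustering, but its iterations are slow and it performs much worse than KMeans, while KMeans clusters nearly as well as EM. For that reason KMeans, not EM, is the usual choice for clustering. EM's main use is parameter estimation, so it is placed under statistical learning. SVM also contributes a good deal to regression analysis and statistics and holds an established place among classification algorithms; after some thought I still placed SVM under classification.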
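To make the clustering side of this comparison concrete, here is a minimal pure-Python sketch of the standard KMeans (Lloyd) iteration on 2-D points. The function name, the fixed initial centers, and the sample data are assumptions for the example, and it makes no attempt to measure speed against EM.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def kmeans(points: List[Point], centers: List[Point],
           iters: int = 10) -> Tuple[List[Point], List[int]]:
    """Lloyd iterations: assign each point to its nearest center, then recompute means."""
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the closest center by squared Euclidean distance.
        labels = [
            min(range(len(centers)),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its members (keep it if empty).
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])
        centers = new_centers
    return centers, labels

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(data, centers=[data[0], data[3]])
print(labels)
```

Each iteration costs only a distance computation per point and center, which is part of why KMeans is usually the cheaper choice.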

Preprocessing_pandas Data Preprocessing

Common Python methods for data preprocessing

01 Reading data
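The body of this section is not included in the excerpt. A minimal sketch of a typical first step with `pandas.read_csv` follows; the inline CSV text stands in for a real file path, and its columns and values are made up.

```python
import io
import pandas as pd

# Inline CSV text stands in for a real file path (the data here is made up).
csv_text = "id,name,score\n1,a,0.5\n2,b,\n3,c,0.9\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)        # column types inferred by pandas
print(df.isna().sum())  # missing values per column
```

Checking dtypes and missing counts right after loading shows which columns need cleaning first.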

Projects_kaggleTitanic

Practice project: Titanic

Data overview

Projects_Tianchi Industrial Steam Volume Prediction: Feature Engineering

01_columns_info

print(columns_info)
type count count_rate unique_count unique_set mean std min 25% 50% 75% max
target float 2888 1.000 1916 [(0.669, 7), (0.8170000000000001, 7), (0.451, ... 0.126353 0.983966 -3.044 -0.35025 0.3130 0.79325 2.538
V0 float 2888 1.000 1801 [(0.875, 7), (0.425, 7), (0.65, 6), (0.4679999... 0.123048 0.928031 -4.335 -0.29700 0.3590 0.72600 2.121
V1 float 2888 1.000 1759 [(0.38, 8), (0.402, 7), (0.449, 7), (0.47, 7),... 0.056068 0.941515 -5.122 -0.22625 0.2725 0.59900 1.918
V2 float 2888 1.000 1948 [(0.066, 6), (0.777, 6), (0.887, 5), (0.858, 5... 0.289720 0.911236 -3.420 -0.31300 0.3860 0.91825 2.828
V3 float 2888 1.000 1820 [(-0.321, 9), (-0.196, 8), (-0.218, 7), (-0.07... -0.067790 0.970298 -3.956 -0.65225 -0.0445 0.62400 2.457
V4 float 2888 1.000 1824 [(-0.049, 7), (0.125, 6), (0.01300000000000000... 0.012921 0.888377 -4.742 -0.38500 0.1100 0.55025 2.689
V5 float 2888 1.000 1452 [(-0.452, 8), (-0.475, 7), (-0.196, 7), (-0.22... -0.558565 0.517957 -2.182 -0.85300 -0.4660 -0.15400 0.489
V6 float 2888 1.000 1834 [(0.517, 6), (0.861, 6), (0.506, 6), (-0.02799... 0.182892 0.918054 -4.576 -0.31000 0.3880 0.83125 1.895
V7 float 2888 1.000 1353 [(0.474, 9), (0.75, 8), (0.64, 8), (1.077, 8),... 0.116155 0.955116 -5.048 -0.29500 0.3440 0.78225 1.918
V8 float 2888 1.000 1807 [(0.857, 9), (0.75, 7), (0.595, 7), (0.348, 7)... 0.177856 0.895444 -4.692 -0.15900 0.3620 0.72600 2.245
V9 float 2888 1.000 72 [(0.042, 1219), (-0.39, 355), (0.904, 308), (0... -0.169452 0.953813 -12.891 -0.39000 0.0420 0.04200 1.335
V10 float 2888 1.000 1888 [(0.26899999999999996, 7), (-2.583, 7), (-2.58... 0.034319 0.968272 -2.584 -0.42050 0.1570 0.61925 4.830
V11 float 2888 1.000 1766 [(0.282, 23), (0.28800000000000003, 7), (0.408... -0.364465 0.858504 -3.160 -0.80325 -0.1120 0.24700 1.455
V12 float 2888 1.000 1841 [(-0.151, 7), (0.374, 6), (-0.125, 5), (0.718,... 0.023177 0.894092 -5.165 -0.41900 0.1230 0.61600 2.657
V13 float 2888 1.000 1935 [(0.20199999999999999, 6), (-0.535, 6), (0.839... 0.195738 0.922757 -3.675 -0.39800 0.2895 0.86425 2.475
V14 float 2888 1.000 1935 [(0.204, 6), (-0.484, 6), (-0.264, 6), (-0.261... 0.016081 1.015585 -2.455 -0.66800 -0.1610 0.82975 2.558
V15 float 2888 1.000 1781 [(-0.847, 7), (0.055, 6), (-0.662, 6), (-0.223... 0.096146 1.033048 -2.903 -0.66225 -0.0005 0.73000 4.314
V16 float 2888 1.000 1773 [(0.35100000000000003, 8), (0.466, 6), (0.081,... 0.113505 0.983128 -5.981 -0.30000 0.3060 0.77425 2.861
V17 float 2888 1.000 99 [(0.165, 721), (-0.366, 473), (0.43, 456), (-0... -0.043458 0.655857 -2.224 -0.36600 0.1650 0.43000 2.023
V18 float 2888 1.000 940 [(0.069, 30), (0.07400000000000001, 29), (0.07... 0.055034 0.953466 -3.582 -0.36750 0.0820 0.51325 4.441
V19 float 2888 1.000 2066 [(-1.361, 21), (0.20199999999999999, 5), (0.67... -0.114884 1.108859 -3.704 -0.98750 -0.0005 0.73725 3.431
V20 float 2888 1.000 1676 [(0.414, 8), (-0.319, 8), (0.147, 7), (0.71900... -0.186226 0.788511 -3.402 -0.67550 -0.1565 0.30400 3.525
V21 float 2888 1.000 1815 [(-0.135, 6), (0.285, 6), (0.237, 6), (0.066, ... -0.056556 0.781471 -2.643 -0.51700 -0.0565 0.43150 2.259
V22 float 2888 1.000 73 [(0.133, 456), (0.314, 336), (-0.063, 333), (0... 0.302893 0.639186 -1.375 -0.06300 0.2165 0.87200 2.018
V23 float 2888 1.000 760 [(0.342, 52), (0.34299999999999997, 50), (0.34... 0.155978 0.978757 -5.542 0.09725 0.3380 0.36825 1.906
V24 float 2888 1.000 682 [(-1.1909999999999998, 120), (-1.3219999999999... -0.021813 1.033403 -1.344 -1.19100 0.0950 0.93125 2.423
V25 float 2888 1.000 1781 [(-0.006, 9), (-0.14, 7), (-0.153, 7), (0.221,... -0.051679 0.915957 -3.808 -0.55725 -0.0760 0.35600 7.284
V26 float 2888 1.000 1898 [(0.474, 6), (0.05, 6), (0.127, 6), (-0.265, 5... 0.072092 0.889771 -5.131 -0.45200 0.0750 0.64425 2.980
V27 float 2888 1.000 983 [(0.312, 14), (0.337, 14), (0.439, 13), (0.275... 0.272407 0.270374 -1.164 0.15775 0.3250 0.44200 0.925
V28 float 2888 1.000 576 [(-0.45799999999999996, 126), (-0.456, 124), (... 0.137712 0.929899 -2.435 -0.45500 -0.4470 0.73000 4.671
V29 float 2888 1.000 1851 [(-0.20600000000000002, 7), (-0.654, 6), (0.26... 0.097648 1.061200 -2.912 -0.66400 -0.0230 0.74525 4.580
V30 float 2888 1.000 1636 [(-4.497, 9), (-0.022000000000000002, 7), (-0.... 0.055477 0.901934 -4.507 -0.28300 0.0535 0.48800 2.689
V31 float 2888 1.000 1703 [(0.498, 7), (0.478, 7), (0.114, 7), (0.513, 6... 0.127791 0.873028 -5.859 -0.17025 0.2995 0.63500 2.013
V32 float 2888 1.000 1748 [(-4.0489999999999995, 9), (-0.065, 8), (0.734... 0.020806 0.902584 -4.053 -0.40725 0.0390 0.55700 2.395
V33 float 2888 1.000 429 [(-0.04, 351), (0.419, 278), (-0.843, 205), (0... 0.007801 1.006995 -4.627 -0.49900 -0.0400 0.46200 5.465
V34 float 2888 1.000 419 [(0.16, 450), (0.273, 373), (-0.29, 324), (0.0... 0.006715 1.003291 -4.789 -0.29000 0.1600 0.27300 5.110
V35 float 2888 1.000 224 [(0.364, 1157), (0.8390000000000001, 316), (-0... 0.197764 0.985675 -5.695 -0.20250 0.3640 0.60200 2.324
V36 float 2888 1.000 1847 [(-2.608, 17), (-2.5639999999999996, 13), (0.4... 0.030658 0.970812 -2.608 -0.41300 0.1370 0.64425 5.238
V37 float 2888 1.000 2009 [(-0.677, 6), (0.18100000000000002, 6), (-0.81... -0.130330 1.017196 -3.630 -0.79825 -0.1855 0.49525 3.000
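The code that builds `columns_info` is not shown in this excerpt. A table with the same columns could be assembled roughly as below. This is a sketch under assumptions: the helper `build_columns_info` and the demo frame are hypothetical, and the author's own implementation may differ (for example, it prints the type as `float` rather than `float64`).

```python
from collections import Counter
import pandas as pd

def build_columns_info(df: pd.DataFrame, top_n: int = 3) -> pd.DataFrame:
    """Per-column summary: dtype, non-null count and rate, unique values, describe() stats."""
    info = pd.DataFrame({
        "type": df.dtypes.astype(str),
        "count": df.count(),
        "count_rate": df.count() / len(df),
        "unique_count": df.nunique(),
        # Most frequent (value, occurrences) pairs per column.
        "unique_set": pd.Series(
            {c: Counter(df[c].dropna()).most_common(top_n) for c in df.columns}
        ),
    })
    # describe() supplies mean, std, min, quartiles, and max for numeric columns.
    stats = df.describe().T.drop(columns="count")
    return info.join(stats)

demo = pd.DataFrame({"target": [0.1, 0.1, 0.5, 0.9], "V0": [1.0, 2.0, 2.0, 4.0]})
columns_info = build_columns_info(demo)
print(columns_info)
```

A low `unique_count` relative to `count`, as for V9, V17, and V22 in the table above, suggests a feature that behaves more like a discrete variable.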

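Projects_Tianchi Industrial Steam Volume Prediction

Feature engineering

1. Observing the correlation between features and the target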
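One common way to carry out this observation is to compute the Pearson correlation of each feature with the target and rank by absolute value. The sketch below uses a small made-up frame; only the column naming (`V0`, `V1`, ..., `target`) follows the competition data.

```python
import pandas as pd

# Small made-up frame; only the column naming follows the competition data.
df = pd.DataFrame({
    "V0":     [1.0, 2.0, 3.0, 4.0, 5.0],
    "V1":     [5.0, 4.0, 3.0, 2.0, 1.0],
    "V2":     [1.0, -1.0, 1.0, -1.0, 1.0],
    "target": [2.0, 4.0, 6.0, 8.0, 10.0],
})

# Pearson correlation of each feature with the target, strongest first.
corr = df.drop(columns="target").corrwith(df["target"])
corr = corr.reindex(corr.abs().sort_values(ascending=False).index)
print(corr)
```

Features with correlation near zero are candidates for closer inspection or removal, although a weak linear correlation alone does not prove a feature is useless.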

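Projects_Tianchi Electricity Forecasting

Competition: [智造扬中] 大航杯 Electricity AI Competition
URL: https://tianchi.aliyun.com/competition/entrance/231602/information
I gave up on this project halfway, mainly because the data is on the large side: on my local machine, a single SQL query with aggregate functions took more than 10 minutes.
The Alibaba Cloud environment had its own problems. For one, its SQL is an Alibaba-customized dialect (similar to Spark SQL) that is quite inconvenient to use (no UPDATE support, only SELECT). For another, it does not support pandas, which made feature engineering a headache.
So I eventually gave up.
At its core, this is a time series problem.

Data preparation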

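Projects_Tianchi Mobile Recommendation

Problem

Problem description

Projects_kaggle House Price Prediction 01

Converted from ipynb (corresponding notebook file, image paths need to be regenerated: python_myproject/kaggle_housePrice/house_price01.ipynb)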

%run MyTools.py
# Standard library
import re
# Numerics, data handling, and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import norm, skew
# scikit-learn (sklearn.grid_search was removed in 0.20; GridSearchCV now lives in sklearn.model_selection)
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
%matplotlib inline

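Projects_Tianchi Jinnan Digital Manufacturing

Title: Jinnan Digital Manufacturing Algorithm Challenge [Track 1]
URL: https://tianchi.aliyun.com/competition/entrance/231695/information

Features 01_Feature Observation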
