项目_天池电力预测

题目:[智造扬中]大航杯电力AI大赛
地址:https://tianchi.aliyun.com/competition/entrance/231602/information
这个项目做了一半就放弃了,主要是数据偏大,本机使用单sql+聚集函数都需10分钟以上.
使用阿里云环境,也有问题,一方面由于sql是阿里定制sql(类似spark的sql),使用挺大的不方便(不支持update,只能select),另一方面,不支持pandas,特征工程时比较头大.
所以后面就放弃了.
这个问题本质上属于时间序列

数据准备

数据导入mysql表power
报名后可将数据下载本机,导入到mysql数据库中,相关sql参考附件的sql脚本(参考附录sql01)

数据预处理和可视化

数据观察

用电量数据点分布情况

按照日期查看数据分布情况,可见整体趋势向上,而且分层角度看延续趋势较明显。

总记录:885468
<5 数据 155441
<10 数据165834
<100数据 295132
频率图

可见数据0-1区间有异常(数据中没有0,所以0-1区间就是1)

异常数据的分布频率

取值为1的分布,稳定下降,看作企业新开张状态可能更合适。

取值为2的分布

按日期汇总总电量


整体平稳,但201502月中旬和201602中旬有下降,应是春节放假导致。同理可猜测周末和国家法定假日一样会受到影响。

日汇总合法企业数量(合法企业用电量大于等于3)

合法企业日均用电量

月角度的每日平均

月初相对较低,月末较高,可能由于地区经济整体向上(TODO),所以月末相对月初耗电量更高。

注意到前面的锯齿消失了?为何?todo

周角度每日平均

可见周一和周末较其他时间确实略有下降

月角度日总和


去掉第31日

周角度日总和


随机抽取3组公司数据,发现曲线变动过大,各种形态都有。考虑到最终预测的是和,所以暂且将每日的随机扰动看作白噪声,这样多个企业加和后白噪声会有很多抵消。

企业规模角度10份,每份方差方差角度5份,然后每份随机抽取5图,观察有无特征

算法选择和集成

模型评估

参考SQL

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
01.	创表,建索引,导数据,去首行非法数据

02.
1,create table power(record_date date,user_id varchar(8),power_consumption int(16));
2,alter table power add index power_i_rdui(record_date,user_id);
3,load data infile 'd:/TDDOWNLOAD/Tianchi_power/Tianchi_power.csv'
into table power character set utf8
fields terminated by ',' optionally enclosed by '"' escaped by '"'
lines terminated by '\r\n';
4,delete from power where record_date='0000-00-00';,

03. 新增power_sum_rd汇总表,按月,周导出到文件
1,Create table power_sum_rd AS SELECT t.record_date,
sum(t.power_consumption) AS power_consumption,
dayofmonth(t.record_date)as day_of_month,
dayofweek(t.record_date)as day_of_week
FROM power t
GROUP BY t.record_date;

统计合法企业数量和时间关系
select t.record_date,count(*) from power t where t.power_consumption >2 group by t.record_date into outfile'd:/power_count_com2.csv' fields terminated by "," lines terminated by "\r\n";

统计合法企业日均用电量
select t.record_date,avg(t.power_consumption) from power t where t.power_consumption >2 group by t.record_date into outfile'd:/power_avg_com2.csv' fields terminated by "," lines terminated by "\r\n";

月/周视角每日平均值
select t.day_of_week,avg(t.power_consumption) as power_consumption from power_sum_rd t group by t.day_of_week into outfile'd:/power_avg_rd_week.csv' fields terminated by "," lines terminated by "\r\n";
select t.day_of_month,avg(t.power_consumption) as power_consumption from power_sum_rd t group by t.day_of_month into outfile'd:/power_avg_rd_month.csv' fields terminated by "," lines terminated by "\r\n";
月/周视角每日合法企业平均值
select t.day_of_week,avg(t.power_consumption) as power_consumption from power_sum_rd t where t.power_consumption >2 group by t.day_of_week into outfile'd:/power_avg_rd_week_com2.csv' fields terminated by "," lines terminated by "\r\n";
select t.day_of_month,avg(t.power_consumption) as power_consumption from power_sum_rd t where t.power_consumption >2 group by t.day_of_month into outfile'd:/power_avg_rd_month_com2.csv' fields terminated by "," lines terminated by "\r\n";

总电角度
4, select t.day_of_month,sum(t.power_consumption) as power_consumption from power_sum_rd t group by t.day_of_month into outfile'd:/power_sum_month.csv'fields terminated by "," lines terminated by "\r\n";
5, select t.day_of_week,sum(t.power_consumption) as power_consumption from power_sum_rd t group by t.day_of_week into outfile'd:/power_sum_week.csv'fields terminated by "," lines terminated by "\r\n";

总企业数角度
6,select t.day_of_month,count(*) as count from power_sum_rd t where t.power_consumption>2 group by t.day_of_month into outfile'd:/power_count_month.csv'fields terminated by "," lines terminated by "\r\n";
7.select t.day_of_week,count(*) as count from power_sum_rd t where t.power_consumption>2 group by t.day_of_week into outfile'd:/power_count_week.csv'fields terminated by "," lines terminated by "\r\n";

04. 统计异常数据随时间分布情况
1. select t.record_date,count(*) as count from power t where t.power_consumption<=1 group by t.record_date into outfile'd:/power_1.csv' fields terminated by "," lines terminated by "\r\n";
2. select t.record_date,count(*) as count from power t where t.power_consumption=2 group by t.record_date into outfile'd:/power_2.csv' fields terminated by "," lines terminated by "\r\n";

"create table {0}_train AS
SELECT tf.user_id as user_id,\
tf.item_id as item_id,\
tu.bt0101 as tubt0101,\
tu.bt0102 as tubt0102,\
tu.bt0103 as tubt0103,\
tu.bt0104 as tubt0104,\
tu.bt010104 as tubt010104,\
tu.bt010204 as tubt010204,\
tu.bt010304 as tubt010304,\
tu.bt0201 as tubt0201,\
tu.bt0202 as tubt0202,\
tu.bt0203 as tubt0203,\
tu.bt0204 as tubt0204,\
tu.bt020104 as tubt020104,\
tu.bt020204 as tubt020204,\
tu.bt020304 as tubt020304,\
tu.bt0301 as tubt0301,\
tu.bt0302 as tubt0302 ,\
tu.bt0303 as tubt0303,\
tu.bt0304 as tubt0304,\
tu.bt030104 as tubt030104,\
tu.bt030204 as tubt030204,\
tu.bt030304 as tubt030304,\
tu.bt0401 as tubt0401,\
tu.bt0402 as tubt0402,\
tu.bt0403 as tubt0403,\
tu.bt0404 as tubt0404,\
tu.bt040104 as tubt040104,\
tu.bt040204 as tubt040204,\
tu.bt040304 as tubt040304,\
tu.bt0501 as tubt0501,\
tu.bt0502 as tubt0502,\
tu.bt0503 as tubt0503,\
tu.bt0504 as tubt0504,\
tu.bt050104 as tubt050104,\
tu.bt050204 as tubt050204,\
tu.bt050304 as tubt050304,\
tu.bt0601 as tubt0601,\
tu.bt0602 as tubt0602,\
tu.bt0603 as tubt0602,\
tu.bt0604 as tubt0604,\
tu.bt060104 as tubt060104,\
tu.bt060204 as tubt060204,\
tu.bt060304 as tubt060304,\
tu.bt0701 as tubt0701 ,\
tu.bt0702 as tubt0702,\
tu.bt0703 as tubt0703,\
tu.bt0704 as tubt0704,\
tu.bt070104 as tubt070104,\
tu.bt070204 as tubt070204,\
tu.bt070304 as tubt070304\
ti.user_id as user_id,\
ti.bt0101 as tibt0101,\
ti.bt0102 as tibt0102,\
ti.bt0103 as tibt0103,\
ti.bt0104 as tibt0104,\
ti.bt010104 as tibt010104,\
ti.bt010204 as tibt010204,\
ti.bt010304 as tibt010304,\
ti.bt0201 as tibt0201,\
ti.bt0202 as tibt0202,\
ti.bt0203 as tibt0203,\
ti.bt0204 as tibt0204,\
ti.bt020104 as tibt020104,\
ti.bt020204 as tibt020204,\
ti.bt020304 as tibt020304,\
ti.bt0301 as tibt0301,\
ti.bt0302 as tibt0302 ,\
ti.bt0303 as tibt0303,\
ti.bt0304 as tibt0304,\
ti.bt030104 as tibt030104,\
ti.bt030204 as tibt030204,\
ti.bt030304 as tibt030304,\
ti.bt0401 as tibt0401,\
ti.bt0402 as tibt0402,\
ti.bt0403 as tibt0403,\
ti.bt0404 as tibt0404,\
ti.bt040104 as tibt040104,\
ti.bt040204 as tibt040204,\
ti.bt040304 as tibt040304,\
ti.bt0501 as tibt0501,\
ti.bt0502 as tibt0502,\
ti.bt0503 as tibt0503,\
ti.bt0504 as tibt0504,\
ti.bt050104 as tibt050104,\
ti.bt050204 as tibt050204,\
ti.bt050304 as tibt050304,\
ti.bt0601 as tibt0601,\
ti.bt0602 as tibt0602,\
ti.bt0603 as tibt0602,\
ti.bt0604 as tibt0604,\
ti.bt060104 as tibt060104,\
ti.bt060204 as tibt060204,\
ti.bt060304 as tibt060304,\
ti.bt0701 as tibt0701 ,\
ti.bt0702 as tibt0702,\
ti.bt0703 as tibt0703,\
ti.bt0704 as tibt0704,\
ti.bt070104 as tibt070104,\
ti.bt070204 as tibt070204,\
ti.bt070304 as tibt070304\
tc.user_id as user_id,\
tc.bt0101 as tcbt0101,\
tc.bt0102 as tcbt0102,\
tc.bt0103 as tcbt0103,\
tc.bt0104 as tcbt0104,\
tc.bt010104 as tcbt010104,\
tc.bt010204 as tcbt010204,\
tc.bt010304 as tcbt010304,\
tc.bt0201 as tcbt0201,\
tc.bt0202 as tcbt0202,\
tc.bt0203 as tcbt0203,\
tc.bt0204 as tcbt0204,\
tc.bt020104 as tcbt020104,\
tc.bt020204 as tcbt020204,\
tc.bt020304 as tcbt020304,\
tc.bt0301 as tcbt0301,\
tc.bt0302 as tcbt0302 ,\
tc.bt0303 as tcbt0303,\
tc.bt0304 as tcbt0304,\
tc.bt030104 as tcbt030104,\
tc.bt030204 as tcbt030204,\
tc.bt030304 as tcbt030304,\
tc.bt0401 as tcbt0401,\
tc.bt0402 as tcbt0402,\
tc.bt0403 as tcbt0403,\
tc.bt0404 as tcbt0404,\
tc.bt040104 as tcbt040104,\
tc.bt040204 as tcbt040204,\
tc.bt040304 as tcbt040304,\
tc.bt0501 as tcbt0501,\
tc.bt0502 as tcbt0502,\
tc.bt0503 as tcbt0503,\
tc.bt0504 as tcbt0504,\
tc.bt050104 as tcbt050104,\
tc.bt050204 as tcbt050204,\
tc.bt050304 as tcbt050304,\
tc.bt0601 as tcbt0601,\
tc.bt0602 as tcbt0602,\
tc.bt0603 as tcbt0602,\
tc.bt0604 as tcbt0604,\
tc.bt060104 as tcbt060104,\
tc.bt060204 as tcbt060204,\
tc.bt060304 as tcbt060304,\
tc.bt0701 as tcbt0701 ,\
tc.bt0702 as tcbt0702,\
tc.bt0703 as tcbt0703,\
tc.bt0704 as tcbt0704,\
tc.bt070104 as tcbt070104,\
tc.bt070204 as tcbt070204,\
tc.bt070304 as tcbt070304\
tui.user_id as user_id,\
tui.bt0101 as tuibt0101,\
tui.bt0102 as tuibt0102,\
tui.bt0103 as tuibt0103,\
tui.bt0104 as tuibt0104,\
tui.bt010104 as tuibt010104,\
tui.bt010204 as tuibt010204,\
tui.bt010304 as tuibt010304,\
tui.bt0201 as tuibt0201,\
tui.bt0202 as tuibt0202,\
tui.bt0203 as tuibt0203,\
tui.bt0204 as tuibt0204,\
tui.bt020104 as tuibt020104,\
tui.bt020204 as tuibt020204,\
tui.bt020304 as tuibt020304,\
tui.bt0301 as tuibt0301,\
tui.bt0302 as tuibt0302 ,\
tui.bt0303 as tuibt0303,\
tui.bt0304 as tuibt0304,\
tui.bt030104 as tuibt030104,\
tui.bt030204 as tuibt030204,\
tui.bt030304 as tuibt030304,\
tui.bt0401 as tuibt0401,\
tui.bt0402 as tuibt0402,\
tui.bt0403 as tuibt0403,\
tui.bt0404 as tuibt0404,\
tui.bt040104 as tuibt040104,\
tui.bt040204 as tuibt040204,\
tui.bt040304 as tuibt040304,\
tui.bt0501 as tuibt0501,\
tui.bt0502 as tuibt0502,\
tui.bt0503 as tuibt0503,\
tui.bt0504 as tuibt0504,\
tui.bt050104 as tuibt050104,\
tui.bt050204 as tuibt050204,\
tui.bt050304 as tuibt050304,\
tui.bt0601 as tuibt0601,\
tui.bt0602 as tuibt0602,\
tui.bt0603 as tuibt0602,\
tui.bt0604 as tuibt0604,\
tui.bt060104 as tuibt060104,\
tui.bt060204 as tuibt060204,\
tui.bt060304 as tuibt060304,\
tui.bt0701 as tuibt0701 ,\
tui.bt0702 as tuibt0702,\
tui.bt0703 as tuibt0703,\
tui.bt0704 as tuibt0704,\
tui.bt070104 as tuibt070104,\
tui.bt070204 as tuibt070204,\
tui.bt070304 as tuibt070304\
tuc.user_id as user_id,\
tuc.bt0101 as tucbt0101,\
tuc.bt0102 as tucbt0102,\
tuc.bt0103 as tucbt0103,\
tuc.bt0104 as tucbt0104,\
tuc.bt010104 as tucbt010104,\
tuc.bt010204 as tucbt010204,\
tuc.bt010304 as tucbt010304,\
tuc.bt0201 as tucbt0201,\
tuc.bt0202 as tucbt0202,\
tuc.bt0203 as tucbt0203,\
tuc.bt0204 as tucbt0204,\
tuc.bt020104 as tucbt020104,\
tuc.bt020204 as tucbt020204,\
tuc.bt020304 as tucbt020304,\
tuc.bt0301 as tucbt0301,\
tuc.bt0302 as tucbt0302 ,\
tuc.bt0303 as tucbt0303,\
tuc.bt0304 as tucbt0304,\
tuc.bt030104 as tucbt030104,\
tuc.bt030204 as tucbt030204,\
tuc.bt030304 as tucbt030304,\
tuc.bt0401 as tucbt0401,\
tuc.bt0402 as tucbt0402,\
tuc.bt0403 as tucbt0403,\
tuc.bt0404 as tucbt0404,\
tuc.bt040104 as tucbt040104,\
tuc.bt040204 as tucbt040204,\
tuc.bt040304 as tucbt040304,\
tuc.bt0501 as tucbt0501,\
tuc.bt0502 as tucbt0502,\
tuc.bt0503 as tucbt0503,\
tuc.bt0504 as tucbt0504,\
tuc.bt050104 as tucbt050104,\
tuc.bt050204 as tucbt050204,\
tuc.bt050304 as tucbt050304,\
tuc.bt0601 as tucbt0601,\
tuc.bt0602 as tucbt0602,\
tuc.bt0603 as tucbt0602,\
tuc.bt0604 as tucbt0604,\
tuc.bt060104 as tucbt060104,\
tuc.bt060204 as tucbt060204,\
tuc.bt060304 as tucbt060304,\
tuc.bt0701 as tucbt0701 ,\
tuc.bt0702 as tucbt0702,\
tuc.bt0703 as tucbt0703,\
tuc.bt0704 as tucbt0704,\
tuc.bt070104 as tucbt070104,\
tuc.bt070204 as tucbt070204,\
tuc.bt070304 as tucbt070304\
tf.label\
FROM {0}_feature tf left outer\
JOIN {0}_ut tu\
ON tf.user_id = tu.user_id\
AND tf.time_box=tu.time_box left outer\
JOIN {0}_it ti\
ON tf.item_id = ti.item_id\
AND tf.time_box=ti.time_box left outer\
JOIN {0}_ct tc\
ON tf.item_catagory = tc.item_catagory\
AND tf.time_box= tc.time_box left outer\
JOIN {0}_uit tui\
ON tf.user_id = tui.user_id\
AND tf.item_id = tui.item_id\
AND tf.time_box=tui.time_box left outer\
JOIN {0}_uct tuc\
ON tf.user_id = tuc.user_id\
AND tf.item_catagory = tuc.item_catagory\
AND tf.time_box=tuc.time_box "
Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×