Happy Learning Starts with Translation, Part 0: Time Series Forecasting with the Long Short-Term Memory Network in Python

The Long Short-Term Memory recurrent neural network has the promise of learning long sequences of observations.

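The LSTM recurrent neural network shows promise for learning from long sequences of observations. (Read literally, the sentence says "learning long sequences of observations", which sounds odd.)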

It seems a perfect match for time series forecasting, and in fact, it may be.

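It looks like a perfect fit for time series forecasting, and in fact it may be. (As a technique it is probably fine for qualitative analysis. Precise prediction of things like stock prices is another matter, one I will study once I am an expert.)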

In this tutorial, you will discover how to develop an LSTM forecast model for a one-step univariate time series forecasting problem.

In this tutorial, you will learn how to develop an LSTM model for a univariate time series forecasting problem.

After completing this tutorial, you will know:

After finishing this tutorial, you will know: (Good thing it says "know". Promising that I would master it would be naive.)

How to develop a baseline of performance for a forecast problem.

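How to establish a performance baseline for a forecasting problem. (I finally see why I struggle with English: English speakers put the qualifier after the noun. Given a data array, they think first of the data along the last dimension, whereas we think first of dimension 0. That is a difference in habits of thought.)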

How to design a robust test harness for one-step time series forecasting.

How to design a powerful testing tool for single-step time series forecasting.

How to prepare data, develop, and evaluate an LSTM recurrent neural network for time series forecasting.

How to prepare the data for, build, and evaluate an LSTM recurrent neural network for time series forecasting.

Let’s get started.

Let's begin!

Tutorial Overview

Overview of the tutorial

  1. Shampoo Sales Dataset
  2. Test Setup
  3. Persistence Model Forecast (whatever that means)
  4. LSTM Data Preparation
  5. LSTM Model Development
  6. LSTM Forecast
  7. Complete LSTM Example
  8. Develop a Robust Result (an odd phrase; what does it mean?)
  9. Tutorial Extensions
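
The persistence model in item 3 is the usual naive baseline for forecasting: the observation at the previous time step is reused as the prediction for the current one. The sketch below is a minimal illustration of that idea, not the tutorial's own code. The function names are made up for this example, and the five values are the first rows of the shampoo dataset printed later in this post.

```python
from math import sqrt

def persistence_forecast(history):
    # hypothetical helper: predict each step with the value of the step before it
    return list(history[:-1])

def rmse(actual, predicted):
    # root mean squared error between two equal-length sequences
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))

# first five monthly values of the shampoo dataset
sales = [266.0, 145.9, 183.1, 119.3, 180.3]
predictions = persistence_forecast(sales)   # forecasts for months 2 to 5
actual = sales[1:]                          # what was observed in months 2 to 5
baseline_rmse = rmse(actual, predictions)
print('persistence RMSE: %.3f' % baseline_rmse)
```

Any later model, the LSTM included, only earns its keep if its error on the same test data is lower than this baseline.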

Python Environment

This tutorial assumes you have a Python SciPy environment installed. You can use either Python 2 or 3 with this tutorial.

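This tutorial assumes you have installed the Python SciPy stack (why "have ... installed" rather than "have installed"? Is the author teasing an English beginner like me?). You can use either Python 2 or 3 for this tutorial.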

You must have Keras (2.0 or higher) installed with either the TensorFlow or Theano backend.

You must already have Keras (2.0 or later) installed, with either the TensorFlow or the Theano backend.

The tutorial also assumes you have scikit-learn, Pandas, NumPy and Matplotlib installed.

The tutorial also assumes you have already installed scikit-learn, Pandas, NumPy, and Matplotlib.

If you need help with your environment, see this post:

If you need help setting up the environment, see the post below.

Shampoo Sales Dataset

This dataset describes the monthly number of sales of shampoo over a 3-year period.

The dataset describes monthly shampoo sales counts over a three-year period.

The units are a sales count and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright, and Hyndman (1998).

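The unit is a sales count, and there are 36 observations. (What is being counted is unclear. My guess is thousands of bottles, or kilograms, or the sales of one small shop: a shampoo maker selling about 200 bottles a month would probably fold within half a year. Never mind, it is only data.) The original dataset is credited to Makridakis, Wheelwright, and Hyndman (1998), the authors of the textbook "Forecasting: Methods and Applications".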

You can download and learn more about the dataset here. (Watch out: the file from this source ends with a text description that has to be deleted, otherwise the code fails.)

Update: here is a direct link for the dataset, ready to use: shampoo.csv (the file from this link works as is)

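Download the dataset to your current working directory with the name "shampoo-sales.csv". Note that you may need to delete the footer information added by DataMarket.

Download the dataset to your current working directory and rename it to "shampoo-sales.csv". Note that you may need to delete the footer information added by DataMarket. (This applies to the first download link; if you leave the footer in, the code is certain to fail.)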
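
One way to avoid deleting the footer by hand is to filter the file before parsing it. The sketch below is only an illustration under stated assumptions: it assumes every data row has exactly two comma-separated fields with a numeric second field, the helper names `is_data_row` and `strip_footer` are hypothetical, and the footer text in the sample is a stand-in for whatever the downloaded file actually contains.

```python
def is_data_row(line):
    # a data row looks like "1-01",266.0: two fields, the second one numeric
    fields = line.strip().split(',')
    if len(fields) != 2:
        return False
    try:
        float(fields[1])
    except ValueError:
        return False
    return True

def strip_footer(text):
    # keep the header line, then every line that parses as a data row
    lines = text.splitlines()
    header, body = lines[0], lines[1:]
    kept = [header] + [line for line in body if is_data_row(line)]
    return '\n'.join(kept) + '\n'

# stand-in sample: two data rows followed by an illustrative footer (assumed text)
raw = '"Month","Sales"\n"1-01",266.0\n"1-02",145.9\n\nSales of shampoo over a three year period\n'
clean = strip_footer(raw)
print(clean)
```

The cleaned text can be written back to shampoo-sales.csv, or wrapped in io.StringIO and passed straight to read_csv.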

The example below loads and creates a plot of the loaded dataset.

The example below loads the dataset and plots it (that is how I read it, anyway).

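# load and plot dataset
from datetime import datetime  # standard library; recent pandas no longer exports datetime
from pandas import read_csv
from matplotlib import pyplot
# load dataset
def parser(x):
	# month labels look like '1-01' (year 1, month 01); prefixing '190' gives '1901-01'
	return datetime.strptime('190'+x, '%Y-%m')
# note: squeeze=True is accepted only by pandas releases before 2.0
series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)
# summarize first few rows
print(series.head())
# line plot
series.plot()
pyplot.show()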

A CSV file stores data in rows and columns; it can be opened, viewed, and edited as plain text.

header=0 means the first row holds the column titles, and the data starts on the next row.

header : int or list of ints, default ‘infer’

Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
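
To make the header behaviour concrete, here is a small sketch on an inlined two-row sample. The replacement names `month` and `units` are made up for illustration. It shows header=0 taking the column names from the first line, header=0 combined with names replacing them, and header=None treating the first line as data.

```python
from io import StringIO

import pandas as pd

sample = '"Month","Sales"\n"1-01",266.0\n"1-02",145.9\n'

# header=0: the first line supplies the column names, data starts on the next line
inferred = pd.read_csv(StringIO(sample), header=0)
print(list(inferred.columns))

# header=0 plus names: the first line is still consumed as a header, but renamed
renamed = pd.read_csv(StringIO(sample), header=0, names=['month', 'units'])
print(list(renamed.columns))

# header=None: the first line is treated as data, so the frame has one more row
raw = pd.read_csv(StringIO(sample), header=None)
print(len(raw))
```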


parse_dates=[0] means the dates are in column 0.

parse_dates : boolean or list of ints or names or list of lists or dict, default False

  • boolean. If True -> try parsing the index.
  • list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
  • list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
  • dict, e.g. {'foo': [1, 3]} -> parse columns 1, 3 as date and call result 'foo'

If a column or index contains an unparseable date, the entire column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv

Note: A fast-path exists for iso8601-formatted dates.
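
The advice above about calling pd.to_datetime after pd.read_csv can be sketched as follows. This is one possible way to load the series without the date_parser and squeeze arguments, not the tutorial's own code. The data is an inlined copy of the first rows, and prefixing '190' follows the tutorial's parser function.

```python
from io import StringIO

import pandas as pd

sample = '"Month","Sales"\n"1-01",266.0\n"1-02",145.9\n"1-03",183.1\n'

# read the month labels as plain strings and use them as the index
frame = pd.read_csv(StringIO(sample), header=0, index_col=0)

# build '1901-01'-style strings, then parse them with an explicit format
labels = ['190' + label for label in frame.index]
frame.index = pd.to_datetime(labels, format='%Y-%m')
frame.index.name = 'Month'

# select the single data column to get a Series
series = frame['Sales']
print(series.head())
```

For the real file, replace StringIO(sample) with the path 'shampoo-sales.csv'.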

index_col=0 makes column 0 the index column, so the dates serve as the row labels.

index_col : int or sequence or False, default None

Column to use as the row labels of the DataFrame. If a sequence is given, a MultiIndex is used. If you have a malformed file with delimiters at the end of each line, you might consider index_col=False to force pandas to _not_ use the first column as the index (row names)

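squeeze=True does not change how the output is printed. It makes read_csv return a Series instead of a one-column DataFrame; change it to False and compare the type of the result to see the difference.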

squeeze : boolean, default False

If the parsed data only contains one column then return a Series
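
The effect of squeezing can be seen without reading any file. The sketch below uses DataFrame.squeeze on a small hand-built table; the values are the first rows of the dataset, and the second column name `Returns` is made up purely to show the two-column case.

```python
import pandas as pd

# a one-column table, which is what read_csv returns when squeeze is not requested
frame = pd.DataFrame({'Sales': [266.0, 145.9, 183.1]})
print(type(frame).__name__)

# squeezing along the columns axis turns the single column into a Series
series = frame.squeeze('columns')
print(type(series).__name__)

# with two columns there is nothing to squeeze, so a DataFrame comes back
wide = pd.DataFrame({'Sales': [266.0], 'Returns': [3.0]})
still_wide = wide.squeeze('columns')
print(type(still_wide).__name__)
```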

date_parser=parser makes read_csv call parser in place of the default date parser.

date_parser : function, default None

Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion. Pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs: 1) Pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.
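
To see what the tutorial's parser function does when it receives a single label string, it can be called directly. In this sketch the three labels are illustrative inputs in the dataset's year-month format, not rows quoted from the file.

```python
from datetime import datetime

def parser(x):
    # same conversion as the tutorial's parser: '1-01' -> '1901-01' -> datetime
    return datetime.strptime('190' + x, '%Y-%m')

# year 1 month 01, year 2 month 06, year 3 month 12
labels = ['1-01', '2-06', '3-12']
parsed = [parser(label) for label in labels]
for label, stamp in zip(labels, parsed):
    print(label, '->', stamp.date())
```

Because the format has no day field, strptime fills in the first day of the month, which is why every index entry in the printed series ends in -01.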

Running the example loads the dataset as a Pandas Series and prints the first 5 rows.

Running the example loads the dataset as a Pandas Series and prints the first five rows.

Month
1901-01-01 266.0
1901-02-01 145.9
1901-03-01 183.1
1901-04-01 119.3
1901-05-01 180.3
Name: Sales, dtype: float64

A line plot of the series is then created showing a clear increasing trend.

A line plot is created, and it shows a clear upward trend. (Hmm, really? From what I know of stocks, this thing is due for a big pullback soon.)

Line Plot of Monthly Shampoo Sales Dataset


That is enough for this installment. One round of Honor of Kings, then back to studying.