Stock Data Analysis: Maotai (茅台)

Latest recommended article published 2024-03-01 17:03:24

lxh5431

Original link: https://www.kesci.com/mw/project/5ff56ecf840381003b053735

List the currently mounted dataset directory

!ls /home/kesci/input/
maotai4154

List the files in the personal persistent workspace

!ls /home/kesci/work/
input lost+found test.xlsx visualize

List the packages installed in the current kernel

!pip list --format=columns
Package Version

absl-py 0.7.1
alabaster 0.7.12
allennlp 0.8.3
altair 1.2.0
ansiwrap 0.8.4
appdirs 1.4.4
arrow 0.12.1
asn1crypto 0.24.0
astor 0.8.0
async-generator 1.10
atomicwrites 1.3.0
attrs 19.3.0
audioread 2.1.8
auto-sklearn 0.5.2
aws-sam-translator 1.12.0
aws-xray-sdk 2.4.2
awscli 1.18.106
Babel 2.7.0
backcall 0.1.0
beautifulsoup4 4.7.1
bert-tensorflow 1.0.1
black 19.10b0
blaze 0.10.1
bleach 3.0.2
blinker 1.4
blis 0.2.4
bokeh 1.0.2
boto 2.49.0
boto3 1.14.29
botocore 1.17.29
bs4 0.0.1
bunch 1.0.1
catboost 0.14.2
certifi 2018.11.29
cffi 1.11.4
cfn-lint 0.22.1
chardet 3.0.4
Click 7.0
cloudpickle 1.2.1
colorama 0.4.3
colorlover 0.3.0
conda 4.4.10
configparser 3.7.4
ConfigSpace 0.4.10
conllu 0.11
cryptography 2.7
cufflinks 0.12.1
cycler 0.10.0
cymem 2.0.2
Cython 0.29.7
cytoolz 0.8.2
dask 2.1.0
datashape 0.5.2
DateTime 4.3
deap 1.3.0
decorator 4.4.2
defusedxml 0.6.0
Delorean 0.6.0
dill 0.3.0
docker 4.0.2
docopt 0.6.2
docutils 0.15.2
ecdsa 0.13.2
editdistance 0.5.3
entrypoints 0.2.3
enum34 1.1.6
fasttext 0.8.22
filelock 3.0.12
findspark 1.3.0
fitter 1.0.8
flaky 3.6.0
Flask 1.1.1
Flask-Cors 3.0.8
flatbuffers 1.11
ftfy 5.5.1
funcsigs 1.0.2
funcy 1.12
future 0.17.1
gast 0.2.2
gensim 3.7.3
Geohash 1.0
gevent 1.4.0
gpxpy 1.1.2
graphviz 0.8.4
greenlet 0.4.15
grpcio 1.22.0
h5py 2.8.0rc1
haversine 0.4.5
heamy 0.0.7
hmmlearn 0.2.1
humanize 0.5.1
idna 2.6
imagesize 1.1.0
importlib-metadata 1.7.0
ipykernel 5.1.0
ipython 7.2.0
ipython-genutils 0.2.0
ipython-sql 0.3.9
ipywidgets 7.5.0
itsdangerous 1.1.0
jedi 0.13.2
jieba 0.39
Jinja2 2.10
jmespath 0.10.0
joblib 0.13.2
jsondiff 1.1.2
jsonnet 0.13.0
jsonpatch 1.23
jsonpickle 1.2
jsonpointer 2.0
jsonschema 3.2.0
jupyter 1.0.0
jupyter-client 6.1.6
jupyter-console 6.0.0
jupyter-core 4.6.3
jupyter-kernel-gateway 1.2.0
Keras 2.2.4
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
kiwisolver 1.1.0
klab-autotime 0.0.2
langid 1.1.6
liac-arff 2.4.0
librosa 0.6.1
lightgbm 2.2.3
line-profiler 2.1.2
llvmlite 0.29.0
lockfile 0.12.2
lxml 4.2.1
Markdown 3.1.1
MarkupSafe 1.1.0
matplotlib 3.1.1
matplotlib-venn 0.11.5
MDP 3.5
missingno 0.4.0
mistune 0.8.4
ml-metrics 0.1.4
mlxtend 0.16.0
mock 3.0.5
modin 0.5.0
more-itertools 7.1.0
moto 1.3.9
mpld3 0.3
mplleaflet 0.0.5
mpmath 1.1.0
msgpack 0.5.6
multipledispatch 0.6.0
murmurhash 1.0.2
mxnet 1.4.1
nbclient 0.4.1
nbconvert 5.6.1
nbformat 5.0.7
nest-asyncio 1.4.0
netaddr 0.7.19
networkx 2.3
nibabel 2.4.1
NiftyNet 0.3.0
nltk 3.4.1
nose 1.3.7
notebook 4.4.1
numba 0.44.1
numexpr 2.6.9
numpy 1.16.3
numpydoc 0.9.1
odo 0.5.0
onnx 1.3.0
opencv-python 4.1.0.25
orderedmultidict 1.0
overrides 1.9
packaging 19.0
paddlepaddle 1.5.0
pandas 0.24.2
pandas-profiling 1.4.2
pandocfilters 1.4.2
papermill 2.1.1
parsimonious 0.8.1
parso 0.3.1
pathspec 0.8.0
patsy 0.5.1
pexpect 4.6.0
pickleshare 0.7.5
Pillow 5.3.0
pip 9.0.1
plac 0.9.6
plotly 3.9.0
pluggy 0.12.0
preshed 2.0.1
prettytable 0.7.2
prometheus-client 0.5.0
prompt-toolkit 2.0.7
protobuf 3.8.0
psutil 5.6.3
psycopg2-binary 2.8.2
ptyprocess 0.6.0
pudb 2017.1
py 1.8.0
py-cpuinfo 5.0.0
py-lz4framed 0.13.0
py4j 0.10.7
pyasn1 0.4.8
pybind11 2.3.0
pycosat 0.6.3
pycparser 2.18
pydot 1.2.3
pyecharts 1.1.0
pygal 2.4.0
Pygments 2.3.1
pyLDAvis 2.1.1
pyltr 0.2.4
PyMySQL 0.10.1
pynisher 0.5.0
pyOpenSSL 17.5.0
pyparsing 2.1.10
pyrfr 0.7.4
pyrsistent 0.16.0
PySocks 1.6.7
pyspark 2.4.2
pystan 2.18.0.0
pytest 5.0.1
python-dateutil 2.8.1
python-jose 3.0.1
python-Levenshtein 0.12.0
python-speech-features 0.6
pytorch-pretrained-bert 0.6.2
pytz 2019.1
PyWavelets 1.0.3
PyYAML 5.3.1
pyzmq 19.0.1
qtconsole 4.5.1
randomgen 1.16.6
rarfile 3.0
ray 0.6.6
recordio 0.1.7
redis 3.2.1
regex 2019.6.8
requests 2.18.4
resampy 0.2.1
responses 0.10.6
retrying 1.3.3
rsa 4.5
ruamel-yaml 0.15.35
s2sphere 0.2.4
s3transfer 0.3.3
sacred 0.6.10
scikit-image 0.14.2
scikit-learn 0.21.1
scipy 1.2.0
seaborn 0.9.0
Send2Trash 1.5.0
setuptools 49.2.0
SexMachine 0.1.1
six 1.15.0
smac 0.8.0
smart-open 1.8.4
smhasher 0.150.1
snowballstemmer 1.9.0
soupsieve 1.9.2
spacy 2.1.4
Sphinx 2.1.2
sphinx-rtd-theme 0.4.3
sphinxcontrib-applehelp 1.0.1
sphinxcontrib-devhelp 1.0.1
sphinxcontrib-htmlhelp 1.0.2
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.2
sphinxcontrib-serializinghtml 1.1.3
SQLAlchemy 1.3.5
sqlparse 0.3.0
srsly 0.0.7
sshpubkeys 3.1.0
statsmodels 0.9.0
sympy 1.2
tables 3.4.2
tenacity 6.2.0
tensorboard 1.13.1
tensorboardX 1.8
tensorflow 1.13.1
tensorflow-estimator 1.13.0
termcolor 1.1.0
terminado 0.8.1
testpath 0.4.2
textblob 0.15.1
textwrap3 0.9.2
Theano 1.0.4
thinc 7.0.4
toml 0.10.1
toolz 0.8.2
torch 1.1.0
torchvision 0.3.0
tornado 4.5.3
TPOT 0.6.8
tqdm 4.49.0
traitlets 4.3.3
trueskill 0.4.4
typed-ast 1.4.1
typing 3.7.4
typing-extensions 3.7.4
tzlocal 1.5.1
Unidecode 1.1.1
update-checker 0.16
urllib3 1.22
urwid 2.0.1
vega 2.4.0
vida 0.3
wasabi 0.2.2
wcwidth 0.1.7
webencodings 0.5.1
websocket-client 0.56.0
Werkzeug 0.15.4
wheel 0.30.0
widgetsnbextension 3.5.0
word2number 1.1
Wordbatch 1.3.8
wordcloud 1.5.0
wrapt 1.11.2
xgboost 0.82
xlearn 0.40a1
xlrd 1.2.0
xmltodict 0.12.0
zipp 3.1.0
zope.interface 4.6.0
You are using pip version 9.0.1, however version 20.3.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

Show how long each cell takes to run

%load_ext klab-autotime
import pandas as pd
from pandas import DataFrame,Series
import numpy as np
# load the required packages

df = pd.read_csv('/home/kesci/input/maotai4154/maotai.csv')  # read Maotai's historical stock prices
df.head()
Unnamed: 0 date open close high low volume code
0 0 2001-08-27 5.392 5.554 5.902 5.132 406318.00 600519
1 1 2001-08-28 5.467 5.759 5.781 5.407 129647.79 600519
2 2 2001-08-29 5.777 5.684 5.781 5.640 53252.75 600519
3 3 2001-08-30 5.668 5.796 5.860 5.624 48013.06 600519
4 4 2001-08-31 5.804 5.782 5.877 5.749 23231.48 600519
df.drop(labels='Unnamed: 0', inplace=True, axis=1)  # drop the first (unnamed) column, modifying df in place
df.head()  # later we will order the rows by date, as a time series
date open close high low volume code
0 2001-08-27 5.392 5.554 5.902 5.132 406318.00 600519
1 2001-08-28 5.467 5.759 5.781 5.407 129647.79 600519
2 2001-08-29 5.777 5.684 5.781 5.640 53252.75 600519
3 2001-08-30 5.668 5.796 5.860 5.624 48013.06 600519
4 2001-08-31 5.804 5.782 5.877 5.749 23231.48 600519
df.info()  # the 'date' column does not appear to be a datetime type
nums = [1, 2, 3, 4, 5]
print(np.median(nums))  # unrelated scratch check of np.median
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4621 entries, 2001-08-27 to 2020-12-31
Data columns (total 6 columns):
open 4621 non-null float64
close 4621 non-null float64
high 4621 non-null float64
low 4621 non-null float64
volume 4621 non-null float64
code 4621 non-null int64
dtypes: float64(5), int64(1)
memory usage: 252.7 KB
3.0
df['date'] = pd.to_datetime(df['date'])  # convert 'date' to a datetime type
df.info()  # 'date' is now datetime64[ns]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4621 entries, 0 to 4620
Data columns (total 7 columns):
date 4621 non-null datetime64[ns]
open 4621 non-null float64
close 4621 non-null float64
high 4621 non-null float64
low 4621 non-null float64
volume 4621 non-null float64
code 4621 non-null int64
dtypes: datetime64[ns](1), float64(5), int64(1)
memory usage: 252.8 KB
df.set_index('date', inplace=True)  # use date as the index
df.head()  # the conversion worked
open close high low volume code
date
2001-08-27 5.392 5.554 5.902 5.132 406318.00 600519
2001-08-28 5.467 5.759 5.781 5.407 129647.79 600519
2001-08-29 5.777 5.684 5.781 5.640 53252.75 600519
2001-08-30 5.668 5.796 5.860 5.624 48013.06 600519
2001-08-31 5.804 5.782 5.877 5.749 23231.48 600519
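The three preparation steps above (dropping `Unnamed: 0`, converting `date`, making it the index) can also be done right when the file is loaded. Here is a minimal sketch. It assumes `maotai.csv` has the layout that `df.head()` shows: an unnamed leading column followed by the data. `load_prices` and `CSV_SAMPLE` are hypothetical names, and the sample rows are copied from that output.

```python
import io

import pandas as pd

# A few rows copied from the df.head() output above, in the layout maotai.csv
# appears to have: an unnamed leading column, then the data columns.
CSV_SAMPLE = """,date,open,close,high,low,volume,code
0,2001-08-27,5.392,5.554,5.902,5.132,406318.00,600519
1,2001-08-28,5.467,5.759,5.781,5.407,129647.79,600519
2,2001-08-29,5.777,5.684,5.781,5.640,53252.75,600519
"""


def load_prices(source):
    """Load the price table with 'date' parsed as datetime and used as the index."""
    table = pd.read_csv(source, parse_dates=["date"])
    # read_csv labels the empty leading header 'Unnamed: 0'; drop that column
    table = table.drop(columns="Unnamed: 0")
    return table.set_index("date")


sample = load_prices(io.StringIO(CSV_SAMPLE))
print(sample.dtypes)
print(sample.index)
```

In the notebook, the same call would be `load_prices('/home/kesci/input/maotai4154/maotai.csv')`.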
m5 = df['close'].rolling(5).mean()    # 5-day moving average of the closing price
m30 = df['close'].rolling(30).mean()  # 30-day moving average of the closing price
print(m30)
date
2001-08-27 NaN
2001-08-28 NaN
2001-08-29 NaN
2001-08-30 NaN
2001-08-31 NaN
2001-09-03 NaN
2001-09-04 NaN
2001-09-05 NaN
2001-09-06 NaN
2001-09-07 NaN
2001-09-10 NaN
2001-09-11 NaN
2001-09-12 NaN
2001-09-13 NaN
2001-09-14 NaN
2001-09-17 NaN
2001-09-18 NaN
2001-09-19 NaN
2001-09-20 NaN
2001-09-21 NaN
2001-09-24 NaN
2001-09-25 NaN
2001-09-26 NaN
2001-09-27 NaN
2001-09-28 NaN
2001-10-08 NaN
2001-10-09 NaN
2001-10-10 NaN
2001-10-11 NaN
2001-10-12 5.696633
…
2020-11-20 1709.602333
2020-11-23 1710.922333
2020-11-24 1711.739000
2020-11-25 1711.668333
2020-11-26 1711.981333
2020-11-27 1712.844667
2020-11-30 1713.341667
2020-12-01 1713.441667
2020-12-02 1713.597667
2020-12-03 1713.814333
2020-12-04 1716.407000
2020-12-07 1722.053667
2020-12-08 1729.553667
2020-12-09 1735.393333
2020-12-10 1740.623333
2020-12-11 1745.622667
2020-12-14 1750.789000
2020-12-15 1754.822667
2020-12-16 1759.503333
2020-12-17 1764.290333
2020-12-18 1769.103000
2020-12-21 1772.434000
2020-12-22 1776.967333
2020-12-23 1780.644000
2020-12-24 1783.829000
2020-12-25 1787.995667
2020-12-28 1792.760667
2020-12-29 1797.800667
2020-12-30 1805.779000
2020-12-31 1815.039333
Name: close, Length: 4621, dtype: float64
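`rolling(n).mean()` averages each value with the n-1 values before it. The first n-1 results therefore have no full window and are NaN, which is why the first 29 values of `m30` above are NaN. The small sketch below uses made-up prices. It also shows `min_periods`, which lets the window start partially filled instead.

```python
import pandas as pd

# Toy closing prices (hypothetical values, not Maotai data)
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# Mean of the current value and the previous 2; the first 2 positions are NaN
ma3 = close.rolling(3).mean()
print(ma3.tolist())  # [nan, nan, 11.0, 12.0, 13.0, 14.0]

# Allow partial windows at the start instead of NaN
ma3_partial = close.rolling(3, min_periods=1).mean()
print(ma3_partial.tolist())  # [10.0, 10.5, 11.0, 12.0, 13.0, 14.0]
```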
import matplotlib.pyplot as plt  # import the plotting module
%matplotlib inline
plt.plot(m5[4000:4500])
plt.plot(m30[4000:4500])
# compare the 5-day and 30-day moving averages over rows 4000 to 4499
[<matplotlib.lines.Line2D at 0x7f0ec9c36e48>]
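Labelling the two lines and adding a legend makes the crossings easier to read off the chart. Below is a minimal sketch with a hypothetical helper `plot_averages` and toy data. In the notebook it would be called as `plot_averages(m5, m30, 4000, 4500)`.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_averages(fast, slow, start, stop, labels=("5-day MA", "30-day MA")):
    """Plot two moving averages over rows [start, stop) with axis labels and a legend."""
    fast, slow = fast.iloc[start:stop], slow.iloc[start:stop]  # positional slice
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(fast.index.to_pydatetime(), fast.values, label=labels[0])
    ax.plot(slow.index.to_pydatetime(), slow.values, label=labels[1])
    ax.set_xlabel("date")
    ax.set_ylabel("average closing price")
    ax.legend()
    return fig, ax


# Toy data standing in for the closing prices (hypothetical values)
idx = pd.date_range("2020-01-01", periods=60, freq="D")
close = pd.Series(range(60), index=idx, dtype=float)
fig, ax = plot_averages(close.rolling(5).mean(), close.rolling(30).mean(), 0, 60)
```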

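Finding golden-cross and death-cross dates
Compare a short-period indicator line with a longer-period one:
If the short-term line crosses above the long-term moving average, this is called a "golden cross".
If the short-term line crosses below the long-term moving average, this is called a "death cross".
A golden cross is usually read as a signal to buy, a death cross as a signal to sell.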
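To make the two definitions concrete: a golden cross occurs on a day when the short-term line is above the long-term line but was below it the day before. A death cross is the reverse. The minimal sketch below uses made-up numbers (`short` and `long_` are hypothetical series) and compares each day with the previous one via `shift(1)`.

```python
import pandas as pd

# Hypothetical short- and long-term moving averages over five consecutive days
short = pd.Series([1.0, 2.0, 4.0, 3.0, 1.0])
long_ = pd.Series([2.0, 2.5, 3.0, 3.5, 3.0])

above = short > long_  # short-term line above the long-term line
below = short < long_  # short-term line below the long-term line

# Compare each day with the previous one; the first day has no previous day
golden = above & below.shift(1, fill_value=False)  # crossed upward today
death = below & above.shift(1, fill_value=False)   # crossed downward today

print(golden[golden].index.tolist())  # [2]
print(death[death].index.tolist())    # [3]
```

In this sketch, days on which the two lines are exactly equal count as neither above nor below. A cross that passes through an exact tie is therefore not reported.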

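ma5 = ma5[30:]
ma30 = ma30[30:]  # drop the first 30 days so that neither series contains NaN and the two stay aligned

NameError Traceback (most recent call last)
in
----> 1 ma5 = ma5[30:]
2 ma30 = ma30[30:]  # drop the first 30 days so that neither series contains NaN and the two stay aligned

NameError: name 'ma5' is not defined
This cell fails because the moving averages computed above are named m5 and m30, not ma5 and ma30. The cells below therefore run on the full-length m5 and m30. On rows where the 30-day average is still NaN, both comparisons evaluate to False, which is why the golden-cross list further down starts with dates from August 2001.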
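Suppose the goal is to discard the warm-up rows where the 30-day average is still NaN. Shortening the masks and then indexing the full-length table with them does not work, because pandas rejects a boolean Series that does not cover every row. One alignment-safe option is to read the dates off the mask itself. Below is a minimal sketch on toy data, in which `fast`, `slow`, `valid` and `death_dates` are hypothetical names.

```python
import pandas as pd

# Toy closing prices (hypothetical values) with a DatetimeIndex
idx = pd.date_range("2021-01-01", periods=8, freq="D")
close = pd.Series([5.0, 4.0, 3.0, 4.0, 6.0, 5.0, 3.0, 2.0], index=idx)

fast = close.rolling(2).mean()  # stands in for m5
slow = close.rolling(4).mean()  # stands in for m30; NaN for the first 3 rows

# Keep only rows where both averages exist, so the two Series stay aligned
valid = fast.notna() & slow.notna()
fast, slow = fast[valid], slow[valid]

below = fast < slow
above = fast > slow
# Death cross: below today, above yesterday (the first row has no "yesterday")
death = below & above.shift(1, fill_value=False)

# Take the dates from the mask itself; indexing the full-length table with
# this shorter mask would raise an unalignable boolean indexer error.
death_dates = death[death].index
print(list(death_dates))
```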
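s1 = m5 < m30  # True where the short-term average is below the long-term one
s2 = m5 > m30  # True where the short-term average is above the long-term one
# short- and long-term conditions, as boolean Series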
deat_ex = s1 & s2.shift(1)  # death cross: below today (s1) and above yesterday (s2 shifted down one row); & requires both
deat_data = df.loc[deat_ex].index  # look up the death-cross dates in the original table
deat_data
DatetimeIndex(['2002-01-17', '2002-01-30', '2002-03-29', '2002-07-29',
'2002-12-27', '2003-03-17', '2003-04-22', '2003-06-20',
'2003-06-30', '2003-08-04', '2004-02-27', '2004-05-11',
'2004-06-07', '2004-08-20', '2004-11-23', '2005-04-20',
'2005-05-16', '2005-06-15', '2005-09-27', '2006-07-10',
'2006-07-31', '2006-08-24', '2006-09-13', '2007-02-08',
'2007-04-23', '2007-05-09', '2007-07-12', '2007-09-12',
'2007-11-12', '2007-11-22', '2008-01-31', '2008-03-18',
'2008-05-23', '2008-08-12', '2008-12-31', '2009-03-12',
'2009-04-30', '2009-08-20', '2009-09-02', '2009-10-20',
'2009-12-18', '2010-01-22', '2010-02-26', '2010-06-23',
'2010-10-15', '2010-11-02', '2010-12-24', '2011-03-02',
'2011-03-30', '2011-09-08', '2011-12-08', '2012-07-24',
'2012-08-02', '2012-08-15', '2012-09-21', '2012-11-07',
'2012-12-25', '2013-01-18', '2013-03-18', '2013-06-21',
'2013-07-12', '2013-10-25', '2013-11-26', '2013-12-04',
'2014-04-01', '2014-04-30', '2014-08-22', '2014-09-16',
'2014-10-13', '2014-11-21', '2015-01-19', '2015-06-17',
'2015-07-17', '2015-09-28', '2015-11-26', '2015-12-10',
'2016-01-05', '2016-08-05', '2016-08-18', '2016-11-21',
'2017-07-06', '2017-09-08', '2017-11-29', '2018-02-05',
'2018-03-27', '2018-06-28', '2018-07-23', '2018-07-31',
'2018-10-15', '2018-12-25', '2019-05-10', '2019-07-19',
'2019-11-28', '2020-01-03', '2020-02-28', '2020-03-18',
'2020-08-10', '2020-09-21', '2020-10-27'],
dtype='datetime64[ns]', name='date', freq=None)
gold_ex = ~(s1 | s2.shift(1))  # golden cross: neither s1 today nor s2 yesterday is True
gold_data = df.loc[gold_ex].index  # look up the golden-cross dates in the original table
gold_data
DatetimeIndex(['2001-08-27', '2001-08-28', '2001-08-29', '2001-08-30',
'2001-08-31', '2001-09-03', '2001-09-04', '2001-09-05',
'2001-09-06', '2001-09-07',
…
'2019-01-03', '2019-06-14', '2019-08-13', '2020-01-02',
'2020-02-19', '2020-03-03', '2020-04-02', '2020-08-19',
'2020-10-14', '2020-11-05'],
dtype='datetime64[ns]', name='date', length=129, freq=None)
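Why does `gold_data` start with a run of dates from August 2001? `~(s1 | s2.shift(1))` is True on any day on which the short line is neither below the long line today nor above it yesterday. Rows where the long average is still NaN satisfy that, because comparisons with NaN are False. The minimal sketch below reproduces the effect on made-up numbers, next to a stricter condition (above today, below yesterday). It treats the missing first "yesterday" as False, which is consistent with the 2001-08-27 entry above. `loose` and `strict` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Hypothetical averages; the long one is NaN for the first two rows (its
# warm-up period), just as m30 is NaN for the first 29 rows of the real data.
m_short = pd.Series([1.0, 2.0, 1.0, 3.0, 4.0])
m_long = pd.Series([np.nan, np.nan, 2.0, 2.0, 2.0])

short_below = m_short < m_long  # comparisons with NaN give False
short_above = m_short > m_long

# The notebook's golden-cross condition (missing "yesterday" treated as False)
loose = ~(short_below | short_above.shift(1, fill_value=False))
# Stricter: above today and below yesterday
strict = short_above & short_below.shift(1, fill_value=False)

print(loose[loose].index.tolist())    # [0, 1, 3]
print(strict[strict].index.tolist())  # [3]
```

Keeping only 2012 onward, as the next cell does, sidesteps the warm-up rows. The loose condition can in principle still fire on days when the two averages are exactly equal.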
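a1 = pd.Series(data=1, index=gold_data)  # golden-cross dates as the index, value 1
a2 = pd.Series(data=0, index=deat_data)  # death-cross dates as the index, value 0
a = pd.concat([a1, a2])  # merge golden and death crosses (Series.append was removed in pandas 2.0)
a = a.sort_index()  # order by date
s = a['2012':'2020']  # keep the signals from 2012 through 2020
s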
date
2012-02-10 1
2012-07-24 0
2012-07-25 1
2012-08-02 0
2012-08-09 1
2012-08-15 0
2012-09-12 1
2012-09-21 0
2012-09-27 1
2012-11-07 0
2012-12-21 1
2012-12-25 0
2013-01-10 1
2013-01-18 0
2013-03-12 1
2013-03-18 0
2013-04-17 1
2013-06-21 0
2013-07-03 1
2013-07-12 0
2013-10-22 1
2013-10-25 0
2013-11-11 1
2013-11-26 0
2013-11-28 1
2013-12-04 0
2014-01-23 1
2014-04-01 0
2014-04-03 1
2014-04-30 0
…
2018-03-27 0
2018-05-09 1
2018-06-28 0
2018-07-18 1
2018-07-23 0
2018-07-25 1
2018-07-31 0
2018-09-20 1
2018-10-15 0
2018-12-04 1
2018-12-25 0
2019-01-03 1
2019-05-10 0
2019-06-14 1
2019-07-19 0
2019-08-13 1
2019-11-28 0
2020-01-02 1
2020-01-03 0
2020-02-19 1
2020-02-28 0
2020-03-03 1
2020-03-18 0
2020-04-02 1
2020-08-10 0
2020-08-19 1
2020-09-21 0
2020-10-14 1
2020-10-27 0
2020-11-05 1
Length: 97, dtype: int64
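The backtest below assumes that the merged signals alternate between 1 (buy) and 0 (sell). The minimal sketch below builds such a series with `pd.concat` and checks that no two consecutive signals are equal. The helper `merge_signals` is hypothetical. The four dates are taken from the output above.

```python
import pandas as pd


def merge_signals(golden_dates, death_dates):
    """Combine golden-cross (1) and death-cross (0) dates into one time-ordered Series."""
    buys = pd.Series(1, index=pd.DatetimeIndex(golden_dates))
    sells = pd.Series(0, index=pd.DatetimeIndex(death_dates))
    return pd.concat([buys, sells]).sort_index()


signals = merge_signals(["2020-01-02", "2020-02-19"], ["2020-01-03", "2020-02-28"])
# True when every signal differs from the one before it (buy, sell, buy, ...)
alternates = bool((signals.diff().dropna() != 0).all())
print(signals.tolist(), alternates)  # [1, 0, 1, 0] True
```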
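first_money = 100000  # initial capital: 100,000 yuan
money = first_money   # cash currently available
hold = 0              # number of Maotai shares currently held
for i in range(0, len(s)):  # walk through the golden/death cross dates in order
    if s.iloc[i] == 1:  # golden cross: buy
        time = s.index[i]              # date of the signal
        p = df.loc[time]['open']       # opening price on that day
        hand = p * 100                 # cost of one lot (100 shares)
        hand_count = money // hand     # maximum number of lots we can afford
        hold = hand_count * 100        # number of shares bought
        money -= hold * p              # cash left after buying as much as possible
    else:  # death cross: sell everything
        deat_time = s.index[i]                 # date of the death cross
        p_death = df.loc[deat_time]['open']    # opening price on that day
        money += p_death * hold                # proceeds from selling all shares at the open
        hold = 0                               # the position is now empty

last_money = hold * df['close'].iloc[-1]  # if the last signal was a golden cross, value the remaining shares at the final close
l = int(money + last_money - first_money)  # profit = cash + value of shares held - initial capital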
l
1212457
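For reuse and testing, the loop can be wrapped in a function. The minimal sketch below keeps the notebook's assumptions: trades happen at the opening price, only whole lots of 100 shares are bought, and no fees or taxes are modelled. `backtest` is a hypothetical helper, and the three-day example uses made-up prices.

```python
import pandas as pd


def backtest(signals, open_prices, final_close, capital=100000, lot=100):
    """Replay buy (1) / sell (0) signals at the opening price and return the profit.

    Same rules as the loop above: on a buy signal, spend as much cash as
    possible on whole lots; on a sell signal, sell every share held. Unlike the
    loop, a repeated buy adds to the position instead of overwriting it.
    """
    cash, shares = capital, 0
    for date, signal in signals.items():
        price = open_prices[date]
        if signal == 1:
            lots = cash // (price * lot)   # whole lots we can afford
            cash -= lots * lot * price
            shares += lots * lot
        else:
            cash += shares * price         # sell everything at the open
            shares = 0
    # Shares still held after the last signal are valued at the final close
    return cash + shares * final_close - capital


# Made-up three-day example: buy at 10, sell at 12, buy again at 11, close at 13
dates = pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"])
signals = pd.Series([1, 0, 1], index=dates)
opens = pd.Series([10.0, 12.0, 11.0], index=dates)
profit = backtest(signals, opens, final_close=13.0, capital=2500)
print(profit)  # 800.0
```

On the notebook's data, `int(backtest(s, df['open'], df['close'].iloc[-1]))` should give the same figure as the loop, because the signals in `s` alternate.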