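Running models locally was too slow, so this week I started using Google Colab to run my machine learning models.
Google Colab first needs to be connected to Google Drive:
from google.colab import drive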
drive.mount('/content/drive')
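The mount call only works inside a Colab runtime. Here is a minimal sketch for a notebook that should also run locally without crashing. The helper name `mount_drive_if_colab` is hypothetical and not part of any library.

```python
def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab; do nothing elsewhere."""
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import drive
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


mounted = mount_drive_if_colab()
print("Drive mounted:", mounted)
```

Outside Colab this prints `Drive mounted: False` and the rest of the notebook can fall back to local paths.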
Then use terminal commands such as cd and ls to find the file path. Note that in Colab, terminal commands must be prefixed with "!".
import os
!pwd #get current working directory
!ls #list all files in current directory
os.chdir('/content/drive/MyDrive/your data folder')
#set working directory
!ls #check files
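One detail worth knowing: each "!" command runs in its own subshell, so `!cd some_dir` does not change the notebook's working directory. That is why `os.chdir` is used above. A small sketch of the same navigation in pure Python, with a temporary directory standing in for the Drive folder (the file name `example.csv` is made up):

```python
import os
import tempfile

start_dir = os.getcwd()                      # same information as !pwd

with tempfile.TemporaryDirectory() as data_dir:
    # create a placeholder file so the listing is not empty
    open(os.path.join(data_dir, "example.csv"), "w").close()

    os.chdir(data_dir)                       # persists across cells, unlike !cd
    files = sorted(os.listdir("."))          # same information as !ls
    print(files)

    os.chdir(start_dir)                      # step out before the directory is removed
```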
Read the data with pandas
import pandas as pd
protein = pd.read_csv("Protreins.csv")
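After loading, it helps to check the shape and column types before modelling. A minimal sketch, assuming a tiny made-up CSV held in memory instead of the real file (the column names `id` and `mass` are invented for illustration):

```python
import io

import pandas as pd

# made-up stand-in for a CSV file stored on Drive
csv_text = "id,mass\nP1,10.5\nP2,\nP3,7.2\n"
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # column types inferred by pandas
print(sample.head())   # first rows; the empty field is read as NaN
```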
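Also note that sklearn.cross_validation is no longer supported by current versions of sklearn (the module was removed in version 0.20). For example,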
from sklearn.cross_validation import KFold
needs to be replaced with
from sklearn.model_selection import KFold
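The import is not the only change: in `sklearn.model_selection` the `KFold` constructor takes `n_splits`, and the data is passed to `split()` rather than to the constructor. A minimal sketch on toy data (the array `X` is made up for illustration):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # 10 toy samples with 2 features each
kf = KFold(n_splits=5, shuffle=True, random_state=0)

folds = []
for train_idx, test_idx in kf.split(X):
    # each iteration yields index arrays, not the data itself
    folds.append((train_idx, test_idx))
    print("train:", train_idx, "test:", test_idx)
```

With 10 samples and 5 splits, each fold holds out 2 samples and trains on the other 8.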
Drop rows containing NA (dropna returns a new DataFrame, so assign the result)
df = df.dropna()
Count the NAs in each row (without axis=1, df.isna().sum() counts per column instead)
df.isna().sum(axis=1)
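The difference between per-column and per-row counts, and the fact that `dropna` does not modify the frame in place, are easy to check on a small example. A sketch with a made-up DataFrame (`df_demo` and its columns are invented for illustration):

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
})

na_per_column = df_demo.isna().sum()         # default axis=0: one count per column
na_per_row = df_demo.isna().sum(axis=1)      # axis=1: one count per row
cleaned = df_demo.dropna()                   # new frame; df_demo is left unchanged

print(na_per_column)
print(na_per_row)
print(len(df_demo), len(cleaned))
```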
Create an array based on numerical ranges
import numpy as np
np.arange(start=1, stop=10, step=2)  # array([1, 3, 5, 7, 9])
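Note that `stop` is exclusive, which is why a range meant to include 500 is written with a stop of 501. A short sketch:

```python
import numpy as np

odds = np.arange(start=1, stop=10, step=2)
print(odds)                  # [1 3 5 7 9]; 10 itself is never included

tens = np.arange(0, 501, 10)
print(tens[0], tens[-1])     # 0 500
print(len(tens))             # 51
```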
Set the x-axis tick positions automatically
import matplotlib.pyplot as plt
plt.xticks(np.arange(0, 501, 10))
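A self-contained sketch of the tick setup on a made-up curve. The `Agg` backend line is an assumption for running outside a notebook and can be dropped in Colab. The spacing of 50 is also my choice, since a step of 10 over this range produces 51 ticks and the labels tend to overlap.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; not needed in Colab
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 501)
fig, ax = plt.subplots()
ax.plot(x, np.sqrt(x))           # toy data, only there to give the axis a range

ticks = np.arange(0, 501, 50)
plt.xticks(ticks)                # acts on the current axes, here ax
plt.xlabel("x")

tick_positions = ax.get_xticks()
print(tick_positions)
```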