Python Learning Notes
Python basic concepts
Python is an interpreted, object-oriented, dynamically typed high-level programming language.
(Search for Python extension packages to download and install extra packages.)
Python packages
Pandas
pandas offers data structures and operations for manipulating numerical tables and time series:
- pd.Series
// create a series
In [3]: s = pd.Series([1, 1, 12, 6, np.nan])
In [4]: s
Out[4]:
0     1.0
1     1.0
2    12.0
3     6.0
4     NaN
dtype: float64
Slice operation in series:
1\ To print elements from the beginning up to a position, use [:Index] (Index itself is not included)
2\ To print all elements except the last Index ones, use [:-Index]
3\ To print elements from a specific Index to the end, use [Index:] (Index itself is included)
4\ To print elements within a range, use [Start Index:End Index] (Start Index is included, End Index is not)
5\ To print the whole Series with a slice, use [:]
6\ To print the whole Series in reverse order, use [::-1]
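The slicing rules above can be sketched on the same Series. This is a minimal example that uses .iloc to make the position-based slicing explicit; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 1, 12, 6, np.nan])

first_two = s.iloc[:2]       # positions 0 and 1; position 2 is excluded
without_last = s.iloc[:-1]   # everything except the last element
from_two = s.iloc[2:]        # position 2 (included) through the end
middle = s.iloc[1:3]         # positions 1 and 2; position 3 is excluded
whole = s.iloc[:]            # the whole Series
reversed_s = s.iloc[::-1]    # the whole Series in reverse order

print(first_two.tolist())    # [1.0, 1.0]
print(middle.tolist())       # [1.0, 12.0]
```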
- pd.DataFrame:
// create a DataFrame from a Series
In [1]: df = s.to_frame()
In [2]: df.columns = ['Rate']
//selecting rows by condition:
1\ use .loc with a boolean mask to do conditional selection, e.g. the share of total Cost where DMA is 'Other':
data.loc[data['DMA']=='Other', 'Cost'].sum()/data['Cost'].sum()
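A small runnable sketch of the conditional selection above. The rows are made up for illustration; only the column names DMA and Cost come from the note.

```python
import pandas as pd

# Hypothetical data; only the column names come from the note.
data = pd.DataFrame({
    'DMA': ['Other', 'New York', 'Other', 'Chicago'],
    'Cost': [10.0, 40.0, 30.0, 20.0],
})

mask = data['DMA'] == 'Other'  # boolean Series, True for matching rows
other_share = data.loc[mask, 'Cost'].sum() / data['Cost'].sum()

print(other_share)  # 0.4
```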
- dictionary
//create a dictionary (avoid naming it dict, which shadows the built-in type)
scores = {'Pheobe': [95, 93, 11, 65], 'luis': [99, 97, 66, 70]}
- how to join two tables
df_off = df_off.set_index('trans_dt').join(
    total_gmb.set_index('trans_dt'),
    how='left', lsuffix='', rsuffix='_total', sort=False).reset_index()
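To see what the join produces, here is a sketch on two tiny tables. The rows and the gmb column are assumptions made for illustration; only df_off, total_gmb and trans_dt come from the note. The merge call at the end is one possible alternative that avoids the set_index/reset_index round trip.

```python
import pandas as pd

# Hypothetical tables; the gmb column and all values are made up.
df_off = pd.DataFrame({'trans_dt': ['2021-01-01', '2021-01-02', '2021-01-03'],
                       'gmb': [5.0, 7.0, 9.0]})
total_gmb = pd.DataFrame({'trans_dt': ['2021-01-01', '2021-01-02'],
                          'gmb': [50.0, 70.0]})

# Left join on trans_dt; the overlapping column from the right table gets '_total'.
joined = df_off.set_index('trans_dt').join(
    total_gmb.set_index('trans_dt'),
    how='left', lsuffix='', rsuffix='_total', sort=False).reset_index()

# Alternative: merge on the key column directly.
merged = df_off.merge(total_gmb, on='trans_dt', how='left',
                      suffixes=('', '_total'))

print(list(joined.columns))  # ['trans_dt', 'gmb', 'gmb_total']
```

Rows of df_off without a match in total_gmb are kept, with NaN in gmb_total.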
Encountered errors and solutions:
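1. datetime module:
AttributeError: module 'datetime' has no attribute 'strptime'
Cause: strptime is a method of the datetime class inside the datetime module, not of the module itself.
Solutions:
from datetime import datetime
datetime.strptime(date, "%Y-%m-%d")
or
# keep "import datetime" and call the method through the class
datetime.datetime.strptime(date, "%Y-%m-%d")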
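Both import styles side by side, as a minimal sketch. The alias dt_module and the sample date string are assumptions used only to keep the module and the class apart in one script.

```python
import datetime as dt_module      # the module (aliased here for clarity)
from datetime import datetime     # the class inside the module

date = "2021-03-15"  # hypothetical input string

parsed_via_class = datetime.strptime(date, "%Y-%m-%d")
parsed_via_module = dt_module.datetime.strptime(date, "%Y-%m-%d")

# The module itself has no strptime attribute, which is what triggers the error.
module_has_strptime = hasattr(dt_module, "strptime")

print(parsed_via_class)     # 2021-03-15 00:00:00
print(module_has_strptime)  # False
```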
2. Non-breaking space (\xa0) in a string
In[5]: validation_data1['VERTICAL'].unique()
Out[5]: array(['Home\xa0&\xa0Garden', 'Other', 'Electronics'], dtype=object)
\xa0 is the non-breaking space in Latin-1 (ISO 8859-1), i.e. chr(160). Replace it with a normal space:
string = string.replace('\xa0', ' ')
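For a whole pandas column, the same replacement can be applied through the .str accessor. A sketch, assuming a frame that reproduces the values shown in the output above:

```python
import pandas as pd

# Hypothetical frame reproducing the values shown in the note.
validation_data1 = pd.DataFrame(
    {'VERTICAL': ['Home\xa0&\xa0Garden', 'Other', 'Electronics']})

# regex=False treats '\xa0' as a literal string, not a pattern.
validation_data1['VERTICAL'] = validation_data1['VERTICAL'].str.replace(
    '\xa0', ' ', regex=False)

cleaned = validation_data1['VERTICAL'].unique().tolist()
print(cleaned)  # ['Home & Garden', 'Other', 'Electronics']
```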
3. 'pandas' has no attribute 'ewma'
# exponentially weighted moving average with span=size (pd.ewma was removed from newer pandas versions)
rol_weighted_mean = pd.ewma(timeSeries, span=size)
Change to
rol_weighted_mean = timeSeries.ewm(span=size).mean()
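A runnable sketch of the replacement call. The data and the span are made up; timeSeries and size keep the names used in the note.

```python
import pandas as pd

timeSeries = pd.Series([1.0, 2.0, 3.0, 4.0])  # hypothetical data
size = 3                                       # hypothetical span

# span=3 corresponds to a smoothing factor alpha = 2 / (span + 1) = 0.5
rol_weighted_mean = timeSeries.ewm(span=size).mean()

print(rol_weighted_mean.iloc[0])  # 1.0
```

The result has the same length and index as the input, so it can be plotted next to the original series.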
4. Fixing overlapping plot elements (axes, labels, titles):
fig = plt.figure()
# ... draw the subplots, then:
fig.tight_layout()
or
plt.tight_layout()
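A minimal sketch of where the tight_layout call goes. The 2x2 grid, the data and the labels are assumptions; the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here for headless runs
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2)
for i, ax in enumerate(axes.flat):
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_title(f"Panel {i}")
    ax.set_xlabel("x")
    ax.set_ylabel("y")

# Call after all axes, titles and labels exist so the spacing accounts for them.
fig.tight_layout()
```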
5. How to select rows whose string contains a specific label
original array:
array(['Evergreen, Core, GEO1',
'PL_TS_Control, Evergreen, Promoted Listings',
'Evergreen, GEO4, Core', 'Evergreen, GEO2, Core',
'Evergreen, Core, GEO 5', 'Evergreen, GEO3, Core',
'Evergreen, PL_TS_Treatment, Promoted Listings',
'Trading Cards, Strategic', 'CR, Strategic',
'Evergreen, FBK, Core', 'Evergreen, Core, CTRL',
'Watches, Strategic', 'Watches_SSC_Treatment_GEOB, Strategic',
'Sneakers, Strategic', 'Watches_SSC_Control_GEOA, Strategic',
'Evergreen, Core', 'Evergreen, Core, C2C',
'Strategic, Sneaker_Showcase'], dtype=object)
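Keep rows where 'Strategic' is either the first label or everything after the first comma:
data_test = data.loc[data['Labels on Campaign'].apply(
    lambda x: (x.split(',', 1)[1] == ' Strategic') or (x.split(',', 1)[0] == 'Strategic'))]
Result: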
array(['Trading Cards, Strategic', 'CR, Strategic', 'Watches, Strategic',
'Watches_SSC_Treatment_GEOB, Strategic', 'Sneakers, Strategic',
'Watches_SSC_Control_GEOA, Strategic',
'Strategic, Sneaker_Showcase'], dtype=object)
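A possible variant of the filter, sketched on a few of the strings above. It splits each value on commas and strips whitespace, so it does not raise on a value without a comma. Note that it is slightly broader than the original: it matches 'Strategic' at any position in the list. The helper name has_label is hypothetical.

```python
import pandas as pd

# A few of the label strings from the array above, in a hypothetical frame.
data = pd.DataFrame({'Labels on Campaign': [
    'Evergreen, Core, GEO1',
    'Trading Cards, Strategic',
    'Evergreen, Core',
    'Strategic, Sneaker_Showcase',
]})

def has_label(labels, wanted='Strategic'):
    # Split on commas and strip whitespace so position and spacing do not matter.
    return wanted in [part.strip() for part in labels.split(',')]

data_test = data.loc[data['Labels on Campaign'].apply(has_label)]
selected = data_test['Labels on Campaign'].tolist()

print(selected)  # ['Trading Cards, Strategic', 'Strategic, Sneaker_Showcase']
```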
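- Shallow copy and deep copy
dict2 = dict1 # no copy: dict2 is just another reference to the same object
dict3 = dict1.copy() # shallow copy: the top-level dict is copied, but nested objects are still shared references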
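The difference shows up once the original is modified. A sketch with made-up contents for dict1; dict4 and copy.deepcopy are added here to cover the deep-copy case.

```python
import copy

dict1 = {'user': 'demo', 'num': [1, 2, 3]}  # hypothetical data

dict2 = dict1                  # same object, no copy
dict3 = dict1.copy()           # shallow copy: new dict, shared nested list
dict4 = copy.deepcopy(dict1)   # deep copy: nested list is copied too

dict1['user'] = 'root'         # top-level change
dict1['num'].remove(1)         # change inside the nested list

print(dict2)  # {'user': 'root', 'num': [2, 3]}
print(dict3)  # {'user': 'demo', 'num': [2, 3]}
print(dict4)  # {'user': 'demo', 'num': [1, 2, 3]}
```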
- Difference between is and ==
'b is a' returns True when a and b point to the same object
'b == a' returns True when a and b have equal values
a = [1,2,3]
b = a #b refers to the same list object as a (no copy)
c = a[:] #shallow copy of a made with the slice operator
if b == a:
    print('True1')
if b is a:
    print('True2')
if c == a:
    print('True3')
if c is a:
    print('True4')
[out]: True1
True2
True3