Keras: list, ndarray, Series, DataFrame

list:


https://foofish.net/python-list-top10.html
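The list is one of the most commonly used data types. The article linked above collects the 10 most-viewed StackOverflow questions and answers about list operations; if you run into these problems during development, try working out a solution yourself first.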

In [55]: s1 = []

In [56]: x1 = [1,2]

In [57]: y1 = [3,4]

In [58]: z1 = [5,6]

// combine three lists into one larger list

In [59]: [x1,y1,z1]
Out[59]: [[1, 2], [3, 4], [5, 6]]

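// s1 already holds one [x1, y1, z1] here (the earlier s1.append([x1, y1, z1]) is not shown)
In [63]: np.array(s1).shape
Out[63]: (1, 3, 2)
// append another large list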

In [64]: s1.append([x1, y1, z1])

In [65]: s1
Out[65]: [[[1, 2], [3, 4], [5, 6]], [[1, 2], [3, 4], [5, 6]]]

In [67]: np.array(s1).shape
Out[67]: (2, 3, 2)
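
The nesting shown in this session can be replayed as a short script. This is a minimal sketch that reuses the session's variable names; shape_one and shape_two are names introduced here for illustration.

```python
import numpy as np

x1, y1, z1 = [1, 2], [3, 4], [5, 6]

s1 = []
s1.append([x1, y1, z1])          # one block of 3 rows x 2 columns
shape_one = np.array(s1).shape   # (1, 3, 2)

s1.append([x1, y1, z1])          # a second identical block
shape_two = np.array(s1).shape   # (2, 3, 2)

print(shape_one, shape_two)
```

Each extra level of list nesting becomes one more leading axis once the list is converted with np.array.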
 

NumPy ndarray: the multidimensional array object


In [97]: data = randn(2, 3, 4)  // assumes: from numpy.random import randn

In [98]: data
Out[98]: 
array([[[-1.77893616,  1.32541875, -0.98059457, -0.72900211],
        [-0.81315267,  0.63041693,  0.37501996, -0.31711792],
        [-1.13708265,  1.20609483, -1.84456137, -0.60516487]],

       [[ 2.46355383, -1.16851347, -0.52487837,  0.53655977],
        [-0.28845058,  0.31156087, -2.91496284,  1.16156673],
        [ 0.29344483, -0.91912533,  0.4226712 ,  0.60789032]]])

ndarray attributes


In [99]: data.ndim
Out[99]: 3

In [100]: data.shape
Out[100]: (2, 3, 4)  // sizes along axis 0, 1 and 2

In [101]: data.dtype
Out[101]: dtype('float64')

Creating an ndarray


1. The simplest way to create an ndarray is the array function, which accepts any sequence-like object such as a list or a tuple.
In [103]: np.array([2,3,4])
Out[103]: array([2, 3, 4])

In [102]: np.array((1,2))
Out[102]: array([1, 2])

2. Convenience functions
np.zeros(10) / np.ones(10), np.zeros((a, b)) / np.ones((a, b))
np.arange(10)

randn(3,2)  // random numbers drawn from the standard normal distribution
array([[ 0.3171505 , -0.5381273 ],
       [ 1.20714189,  0.74829474],
       [ 0.11972618, -1.89287799]])
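
The creation helpers listed above can be tried together in one script. This is a minimal sketch; the seeded generator from np.random.default_rng is a substitution for the bare randn call in the session, chosen only so the run is repeatable, and the variable names are illustrative.

```python
import numpy as np

zeros_1d = np.zeros(10)          # ten float64 zeros
ones_2d = np.ones((2, 3))        # the shape is passed as a tuple
counts = np.arange(10)           # integers 0..9, similar to range(10)

rng = np.random.default_rng(0)   # seeded generator (the seed is arbitrary)
normal = rng.standard_normal((3, 2))  # standard normal samples, like randn(3, 2)

print(zeros_1d.shape, ones_2d.shape, counts.shape, normal.shape)
```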

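Basic indexing and slicing


Indexing NumPy arrays is a rich topic, because there are many ways to select a subset or a single element.
1. One-dimensional arrays
These behave much like a list, but assigning a scalar to a slice broadcasts it across the whole slice (a list has no such feature), and the assignment modifies the original array in place.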
In [111]: arr = np.arange(5)

In [112]: arr
Out[112]: array([0, 1, 2, 3, 4])

In [113]: arr[3:]
Out[113]: array([3, 4])

In [114]: arr[:2] = 9

In [115]: arr
Out[115]: array([9, 9, 2, 3, 4])
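
The difference from a list can be made concrete. The sketch below is illustrative only; view, lst, list_error and independent are names introduced here, not part of the session.

```python
import numpy as np

arr = np.arange(5)
arr[:2] = 9                 # the scalar is broadcast to both positions
# arr is now [9, 9, 2, 3, 4]

view = arr[3:]              # a view onto arr, not a copy
view[0] = -1                # also changes arr[3]

lst = list(range(5))
lst[:2] = [9, 9]            # a list needs an iterable on the right-hand side
list_error = None
try:
    lst[:2] = 9             # assigning a scalar to a list slice fails
except TypeError as exc:
    list_error = type(exc).__name__

independent = arr[3:].copy()   # copy() detaches the slice from arr
independent[0] = 100           # arr is unaffected

print(arr, lst, list_error)
```

When a slice must not write back into the source array, calling .copy() on it is the usual way to get independent data.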

2. Higher-dimensional arrays
In [119]: arr2d = randn(3,4)

In [120]: arr2d
Out[120]: 
array([[ 0.79971818,  1.50851931,  0.48822392,  0.70054322],
       [-0.02592012, -0.80977049,  1.10848665,  0.98746061],
       [-0.32560101,  0.55448242, -0.16523043,  1.1026722 ]])

In [121]: arr2d[0]  // element 0 along axis 0 (the first row)
Out[121]: array([0.79971818, 1.50851931, 0.48822392, 0.70054322])

In [122]: arr2d[0:2]  // slice [0:2] along axis 0
Out[122]: 
array([[ 0.79971818,  1.50851931,  0.48822392,  0.70054322],
       [-0.02592012, -0.80977049,  1.10848665,  0.98746061]])

In [123]: arr2d[0:2, 1:]  // slice along axes 0 and 1 (axis 0 first, then axis 1)
Out[123]: 
array([[ 1.50851931,  0.48822392,  0.70054322],
       [-0.80977049,  1.10848665,  0.98746061]])

In [124]: arr2d[2,1]   // selecting a single element; both forms work
Out[124]: 0.5544824201923262

In [125]: arr2d[2][1]
Out[125]: 0.5544824201923262
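
With random data it is hard to see which elements a slice picks. The sketch below repeats the same indexing expressions on a small array of consecutive integers (an illustrative stand-in, not the session's data).

```python
import numpy as np

arr2d = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

row0 = arr2d[0]            # first row, shape (4,)
top = arr2d[0:2]           # first two rows, shape (2, 4)
block = arr2d[0:2, 1:]     # rows 0-1, columns 1-3, shape (2, 3)
elem_a = arr2d[2, 1]       # single element
elem_b = arr2d[2][1]       # same element, reached in two indexing steps

print(row0, block, elem_a, elem_b)
```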

3. Boolean indexing of NumPy arrays


In [126]:  languages = np.array(['c','perl','python','c','python','perl','java'])
In [135]: data = np.random.randn(7,3)
In [137]: data
Out[137]: 
array([[-0.33489167, -0.24631403,  0.90335029],
       [-0.39345562, -0.72420883,  0.24052579],
       [-2.08528837,  2.19331772, -1.08433669],
       [ 0.64976211, -1.32106489,  0.26910594],
       [ 1.40548013, -1.22632999, -0.44270916],
       [ 1.74141036, -0.06534211,  0.51472613],
       [-0.26326706,  1.70945702,  0.24703332]])

In [138]: data[languages == 'java']
Out[138]: array([[-0.26326706,  1.70945702,  0.24703332]])

In [139]: data[languages == 'python']
Out[139]: 
array([[-2.08528837,  2.19331772, -1.08433669],
       [ 1.40548013, -1.22632999, -0.44270916]])
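
The comparison languages == 'python' produces a Boolean array, and indexing with it keeps only the rows where it is True. The sketch below shows this on deterministic data (an illustrative replacement for the random matrix) and also how masks can be combined and negated.

```python
import numpy as np

languages = np.array(['c', 'perl', 'python', 'c', 'python', 'perl', 'java'])
data = np.arange(21).reshape(7, 3)   # row i is [3i, 3i+1, 3i+2]

mask = languages == 'python'          # Boolean array of length 7
python_rows = data[mask]              # rows 2 and 4

# Combine conditions with | and &, not the Python keywords `or` / `and`.
either = data[(languages == 'python') | (languages == 'java')]

# ~ negates a mask.
not_c = data[~(languages == 'c')]

print(mask, python_rows, either.shape, not_c.shape)
```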
 

Series


An object similar to a one-dimensional array, but made up of two parts: an index and the values.
In [140]: from pandas import Series,DataFrame
In [141]: import pandas as pd

In [142]: obj = Series([4, -5, 7])

In [143]: obj
Out[143]:  // index on the left, values on the right
0    4
1   -5
2    7
dtype: int64

In [145]: obj.values
Out[145]: array([ 4, -5,  7])

In [146]: obj.index
Out[146]: RangeIndex(start=0, stop=3, step=1)

// the index can be built from meaningful strings
In [147]: obj = Series([4, -5, 7], index=['a', 'b', 'c'])

In [148]: obj
Out[148]: 
a    4
b   -5
c    7
dtype: int64

In [149]: obj['a']
Out[149]: 4

In [150]: obj[['a', 'b']]
Out[150]: 
a    4
b   -5
dtype: int64

In [151]: obj['a', 'b']  // KeyError: ('a', 'b'); to select several labels, pass a list as in In [150]
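
The label lookups from this session can be collected in one script. This is a minimal sketch; by_loc and tuple_lookup_failed are names introduced here, and the .loc form is shown as an explicit alternative rather than something the original session used.

```python
import pandas as pd

obj = pd.Series([4, -5, 7], index=['a', 'b', 'c'])

single = obj['a']              # scalar lookup by label
subset = obj[['a', 'b']]       # a list of labels returns a Series
by_loc = obj.loc[['a', 'b']]   # explicit label-based form, same result

tuple_lookup_failed = False
try:
    obj['a', 'b']              # the tuple ('a', 'b') is treated as a single key
except KeyError:
    tuple_lookup_failed = True

print(single, subset.tolist(), tuple_lookup_failed)
```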

DataFrame: a tabular data structure

read_csv

In [4]: columns = ['user','activity','timestamp', 'x-axis', 'y-axis', 'z-axis']
   ...: df = pd.read_csv('data/WISDM_ar_v1.1_raw.txt', header = None, names = columns)
   ...: df = df.dropna()
   ...: 

DataFrame attributes

In [5]: type(df)
Out[5]: pandas.core.frame.DataFrame

In [6]: df.head()
Out[6]: 
   user activity       timestamp    x-axis     y-axis    z-axis
0    33  Jogging  49105962326000 -0.694638  12.680544  0.503953
1    33  Jogging  49106062271000  5.012288  11.264028  0.953424
2    33  Jogging  49106112167000  4.903325  10.882658 -0.081722
3    33  Jogging  49106222305000 -0.612916  18.496431  3.023717
4    33  Jogging  49106332290000 -1.184970  12.108489  7.205164

In [7]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1098203 entries, 0 to 1098203
Data columns (total 6 columns):
user         1098203 non-null int64
activity     1098203 non-null object
timestamp    1098203 non-null int64
x-axis       1098203 non-null float64
y-axis       1098203 non-null float64
z-axis       1098203 non-null float64
dtypes: float64(3), int64(2), object(1)
memory usage: 58.7+ MB

In [8]: df.shape
Out[8]: (1098203, 6)

In [21]: type(df['activity'])
Out[21]: pandas.core.series.Series

value_counts()

df['activity'].value_counts()
Out[19]: 
Walking       424397
Jogging       342176
Upstairs      122869
Downstairs    100427
Sitting        59939
Standing       48395
Name: activity, dtype: int64
value_counts(self, normalize=False, sort=True, ascending=False, bins=None, dropna=True) unbound pandas.core.series.Series method
    Returns object containing counts of unique values.

    
    The resulting object will be in descending order so that the
    first element is the most frequently-occurring element.
    Excludes NA values by default.
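
A small example makes the docstring concrete. The activity values below are made up for illustration; normalize=True is one of the parameters listed in the signature above.

```python
import pandas as pd

activity = pd.Series(['Walking', 'Jogging', 'Walking', 'Sitting', 'Walking', 'Jogging'])

counts = activity.value_counts()                  # sorted, most frequent first
shares = activity.value_counts(normalize=True)    # fractions that sum to 1

print(counts)
print(shares)
```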

// Boolean indexing on a Series


In [22]: df['activity'] == 'Sitting'
Out[22]: 
0          False
1          False
2          False
3          False
4          False
5          False

df[df['activity'] == activity]  // the DataFrame that remains after Boolean indexing

df[df['activity'] == activity][['x-axis', 'y-axis', 'z-axis']]  // slicing the DataFrame: keep only certain columns
df[df['activity'] == 'Sitting'][['x-axis', 'y-axis', 'z-axis']][:200]  // take the first 200 rows along axis 0
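
The chained selection above can also be written as a single .loc call. The sketch below compares the two on a tiny made-up frame (the values are illustrative, not WISDM data); the .loc variant is offered as an alternative, not as what the original code did.

```python
import pandas as pd

df = pd.DataFrame({
    'activity': ['Jogging', 'Sitting', 'Sitting', 'Walking', 'Sitting'],
    'x-axis': [0.1, 0.2, 0.3, 0.4, 0.5],
    'y-axis': [1.1, 1.2, 1.3, 1.4, 1.5],
    'z-axis': [2.1, 2.2, 2.3, 2.4, 2.5],
})

mask = df['activity'] == 'Sitting'

# Chained form, as in the text: filter rows, pick columns, take the first rows.
chained = df[mask][['x-axis', 'y-axis', 'z-axis']][:2]

# Single-step form: rows and columns selected in one .loc call.
single_step = df.loc[mask, ['x-axis', 'y-axis', 'z-axis']].head(2)

print(single_step)
```

Both forms keep the original row labels (1 and 2 here), which is why the index of the result does not start at 0.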


In [26]: !head -10 data/WISDM_ar_v1.1_raw.txt
33,Jogging,49105962326000,-0.6946377,12.680544,0.50395286
33,Jogging,49106062271000,5.012288,11.264028,0.95342433
33,Jogging,49106112167000,4.903325,10.882658,-0.08172209
33,Jogging,49106222305000,-0.61291564,18.496431,3.0237172
33,Jogging,49106332290000,-1.1849703,12.108489,7.205164
33,Jogging,49106442306000,1.3756552,-2.4925237,-6.510526
33,Jogging,49106542312000,-0.61291564,10.56939,5.706926
33,Jogging,49106652389000,-0.50395286,13.947236,7.0553403
33,Jogging,49106762313000,-8.430995,11.413852,5.134871
33,Jogging,49106872299000,0.95342433,1.3756552,1.6480621
In [21]: for i in range(0, len(df) - N_TIME_STEPS, step):
    ...:     xs = df['x-axis'].values[i: i + N_TIME_STEPS]
    ...:     ys = df['y-axis'].values[i: i + N_TIME_STEPS]
    ...:     zs = df['z-axis'].values[i: i + N_TIME_STEPS]
    ...:     label = stats.mode(df['activity'][i: i + N_TIME_STEPS])[0][0]
    ...:     segments.append([xs, ys, zs])
    ...:     labels.append(label)
    ...:     
In [24]: np.array(segments).shape
Out[24]: (54901, 3, 200)  // each segment has 3 rows of 200 elements; the first row holds the 200 x-axis values
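
The windowing loop can be tried on a small synthetic frame. This is a sketch under stated assumptions: the data are made up, N_TIME_STEPS and step are set to small illustrative values (the session's shapes imply N_TIME_STEPS = 200, and the step is not shown), and pandas Series.mode() is swapped in for stats.mode to pick the most frequent label.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the WISDM frame (values are made up for illustration).
df = pd.DataFrame({
    'activity': ['Jogging'] * 6 + ['Walking'] * 6,
    'x-axis': np.arange(12, dtype=float),
    'y-axis': np.arange(12, dtype=float) + 100,
    'z-axis': np.arange(12, dtype=float) + 200,
})

N_TIME_STEPS = 4   # window length (illustrative value)
step = 2           # stride between window starts (assumed value)

segments, labels = [], []
for i in range(0, len(df) - N_TIME_STEPS, step):
    window = df.iloc[i: i + N_TIME_STEPS]
    segments.append([window['x-axis'].values,
                     window['y-axis'].values,
                     window['z-axis'].values])
    # Most frequent activity in the window; Series.mode() stands in for stats.mode.
    labels.append(window['activity'].mode().iloc[0])

segments = np.array(segments)
print(segments.shape)   # (number of windows, 3, N_TIME_STEPS)
print(labels)
```

When two activities are equally frequent in a window, Series.mode() returns both in sorted order, so .iloc[0] picks the alphabetically first one.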

In [25]: np.array(segments)[0]
Out[25]: 
array([[ -0.6946377 ,   5.012288  ,   4.903325  ,  -0.61291564,
         -1.1849703 ,   1.3756552 ,  -0.61291564,  -0.50395286,
         -8.430995  ,   0.95342433,  -8.19945   ,   1.4165162 ,
         -1.879608  ,  -6.1291566 ,   5.829509  ,   6.2789803 ,

reshaped_segments = np.asarray(segments, dtype= np.float32).reshape(-1, N_TIME_STEPS, N_FEATURES)
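// np.asarray converts its input to an ndarray, without copying when the input is already an ndarray of the requested dtype
In [29]: reshaped_segments.shape
Out[29]: (54901, 200, 3)  // each row has 3 elements, but all of them are x-axis data, not an x/y/z triple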

In [30]: reshaped_segments[0]
Out[30]: 
array([[ -0.6946377 ,   5.012288  ,   4.903325  ],
       [ -0.61291564,  -1.1849703 ,   1.3756552 ],
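
The output above shows that reshape only regroups the values in memory order; it does not move the axes. The sketch below contrasts reshape with transpose on one tiny segment. The numbers are made up, and whether the original pipeline intended a transpose is not stated in the source; this only illustrates the difference.

```python
import numpy as np

N_TIME_STEPS, N_FEATURES = 4, 3

# One segment laid out as in the loop above: one row per axis.
segment = np.array([[0, 1, 2, 3],          # x values
                    [10, 11, 12, 13],      # y values
                    [20, 21, 22, 23]])     # z values
segments = segment[np.newaxis]             # shape (1, 3, 4)

reshaped = segments.reshape(-1, N_TIME_STEPS, N_FEATURES)
transposed = segments.transpose(0, 2, 1)

print(reshaped[0][0])     # first row after reshape: three consecutive x values
print(transposed[0][0])   # first row after transpose: one x, one y, one z value
```

Both results have shape (1, 4, 3), so the shape alone cannot reveal which operation was applied.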

labels = np.asarray(pd.get_dummies(labels), dtype = np.float32)
labels.shape: (54901, 6)
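
The one-hot step can be checked on a short label list. This is a minimal sketch with made-up labels; class_names and decoded are names introduced here to show how a one-hot row maps back to its label.

```python
import numpy as np
import pandas as pd

labels = ['Jogging', 'Walking', 'Jogging', 'Sitting']

dummies = pd.get_dummies(labels)                  # one column per distinct label
one_hot = np.asarray(dummies, dtype=np.float32)   # plain float32 matrix

class_names = list(dummies.columns)               # columns come out in sorted order
decoded = [class_names[i] for i in one_hot.argmax(axis=1)]

print(class_names)
print(one_hot)
```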

// every training sample contains x/y/z data
X_train, X_test, y_train, y_test = train_test_split(
        reshaped_segments, labels, test_size=0.2, random_state=RANDOM_SEED)


N_CLASSES = 6
N_HIDDEN_UNITS = 64
def create_LSTM_model(inputs):
    # named variable tensors: weights: hidden (3, 64), output (64, 6)
    # biases: hidden (64,), output (6,)

    W = {
        'hidden': tf.Variable(tf.random_normal([N_FEATURES, N_HIDDEN_UNITS])),
        'output': tf.Variable(tf.random_normal([N_HIDDEN_UNITS, N_CLASSES]))
    }
    biases = {
        'hidden': tf.Variable(tf.random_normal([N_HIDDEN_UNITS], mean=1.0)),
        'output': tf.Variable(tf.random_normal([N_CLASSES]))
    }
    
    # transform the input data
    X = tf.transpose(inputs, [1, 0, 2])  # swap axes 0 and 1: (batch, N_TIME_STEPS, N_FEATURES) -> (N_TIME_STEPS, batch, N_FEATURES)
    X = tf.reshape(X, [-1, N_FEATURES])
    hidden = tf.nn.relu(tf.matmul(X, W['hidden']) + biases['hidden'])
    hidden = tf.split(hidden, N_TIME_STEPS, 0)

    # Stack 2 LSTM layers
    lstm_layers = [tf.contrib.rnn.BasicLSTMCell(N_HIDDEN_UNITS, forget_bias=1.0) for _ in range(2)]
    lstm_layers = tf.contrib.rnn.MultiRNNCell(lstm_layers)

    outputs, _ = tf.contrib.rnn.static_rnn(lstm_layers, hidden, dtype=tf.float32)

    # Get output for the last time step
    lstm_last_output = outputs[-1]

    return tf.matmul(lstm_last_output, W['output']) + biases['output']
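
The transpose, reshape and split steps at the start of create_LSTM_model are easier to follow with concrete shapes. The sketch below mirrors them in NumPy rather than TensorFlow; the sizes are small illustrative values and the random weights are stand-ins, so it only traces shapes and element placement, not the trained model.

```python
import numpy as np

batch, N_TIME_STEPS, N_FEATURES, N_HIDDEN_UNITS = 5, 4, 3, 8

rng = np.random.default_rng(0)
inputs = rng.standard_normal((batch, N_TIME_STEPS, N_FEATURES))
W_hidden = rng.standard_normal((N_FEATURES, N_HIDDEN_UNITS))
b_hidden = rng.standard_normal(N_HIDDEN_UNITS)

X = np.transpose(inputs, (1, 0, 2))                 # (N_TIME_STEPS, batch, N_FEATURES)
X = X.reshape(-1, N_FEATURES)                       # (N_TIME_STEPS * batch, N_FEATURES)
hidden = np.maximum(X @ W_hidden + b_hidden, 0.0)   # ReLU of the dense layer
steps = np.split(hidden, N_TIME_STEPS, axis=0)      # one (batch, N_HIDDEN_UNITS) array per time step

print(len(steps), steps[0].shape)
```

Because the time axis is moved to the front before the reshape, steps[t][b] is the hidden activation of sample b at time step t, which is the per-step list layout that static_rnn consumes.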

X: input
X = tf.placeholder(tf.float32, [None, N_TIME_STEPS, N_FEATURES], name="input")
Y = tf.placeholder(tf.float32, [None, N_CLASSES])

pred_Y = create_LSTM_model(X)
pred_softmax = tf.nn.softmax(pred_Y, name="y_")  # apply softmax to the logits

# loss function
L2_LOSS = 0.0015
l2 = L2_LOSS * sum(tf.nn.l2_loss(tf_var) for tf_var in tf.trainable_variables())
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = pred_Y, labels = Y)) + l2

# learning: how to optimize
LEARNING_RATE = 0.0025
optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE).minimize(loss)
correct_pred = tf.equal(tf.argmax(pred_softmax, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, dtype=tf.float32))
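
The loss and accuracy expressions above can be worked through by hand in NumPy. This is a sketch, not the TensorFlow implementation: the logits, targets and the stand-in weights list are made-up values, and the L2 term uses the definition of tf.nn.l2_loss, which is sum(w ** 2) / 2.

```python
import numpy as np

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
targets = np.array([[1.0, 0.0, 0.0],      # one-hot labels
                    [0.0, 0.0, 1.0]])

# Numerically stable softmax: subtract the row maximum before exponentiating.
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# Cross-entropy per sample, then the batch mean.
cross_entropy = -(targets * np.log(probs)).sum(axis=1)
mean_loss = cross_entropy.mean()

# L2 penalty over all trainable variables.
L2_LOSS = 0.0015
weights = [np.ones((3, 4)), np.ones(4)]   # stand-in variables (hypothetical)
l2 = L2_LOSS * sum((w ** 2).sum() / 2 for w in weights)

loss = mean_loss + l2

# Accuracy: fraction of rows where the predicted and the true class agree.
accuracy = (probs.argmax(axis=1) == targets.argmax(axis=1)).mean()

print(loss, accuracy)
```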

# training
N_EPOCHS = 50
BATCH_SIZE = 1024

for i in range(1, N_EPOCHS + 1):
    for start, end in zip(range(0, train_count, BATCH_SIZE), range(BATCH_SIZE, train_count + 1,BATCH_SIZE)):
        sess.run(optimizer, feed_dict={X: X_train[start:end],Y: y_train[start:end]})

// list output as printed by Python 2; in Python 3 wrap each call in list(...)
In [51]: range(0, 30, 10)  // range(start, train_count, batch_size)
Out[51]: [0, 10, 20]

In [52]: range(10, 31, 10)  // range(batch_size, train_count + 1, batch_size)
Out[52]: [10, 20, 30]

In [54]: zip(range(0, 30, 10), range(10, 31, 10))
Out[54]: [(0, 10), (10, 20), (20, 30)]
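
The batch boundaries built by the training loop above can be wrapped in a small helper to see what happens when train_count is not a multiple of the batch size. batch_bounds is a hypothetical helper introduced here for illustration; it reproduces the zip/range pattern and nothing else.

```python
def batch_bounds(train_count, batch_size):
    """Return (start, end) pairs the way the training loop above builds them."""
    starts = range(0, train_count, batch_size)
    ends = range(batch_size, train_count + 1, batch_size)
    return list(zip(starts, ends))   # zip stops at the shorter range

even = batch_bounds(30, 10)     # train_count divisible by batch_size
uneven = batch_bounds(35, 10)   # the last 5 samples end up in no batch

print(even)
print(uneven)
```

With this pattern a trailing partial batch is silently skipped in every epoch, which is worth knowing when the training set size is not a multiple of BATCH_SIZE.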


In [49]: pd.get_dummies(hot2name).head()


Out[49]: 
   Downstairs  Jogging  Sitting  Standing  Upstairs  Walking
0           0        1        0         0         0        0
1           0        1        0         0         0        0
2           0        1        0         0         0        0
3           0        1        0         0         0        0
4           0        1        0         0         0        0
