Demo data df_nba:
+-------------+-----------+---------+--------+--------+
| name        | team      | poision | height | weight |
+-------------+-----------+---------+--------+--------+
| jordan      | Bulls     | SG      | 198    | 98.0   |
| kobe        | Lakers    | SG      | 198    | 96.0   |
| Oneil       | Lakers    | C       | 216    | 147.0  |
| McGradyg    | Rockets   | SF      | 203    | 101.0  |
| jordan      | Bulls     | SG      | 198    | 98.0   |
| kobe        | Lakers    | SG      | 198    | 96.0   |
| Larry_Bird  | Boston    | PF      | 206    | 100.0  |
| Iverson     | 76s       | PG      | 183    | 75.0   |
| kobe        | Lakers    | SG      | 198    | 96.0   |
| jordan      | Bulls     | SG      | 198    | 98.0   |
| Olajuwon    | Rockets   | C       | 208    | 116.0  |
+-------------+-----------+---------+--------+--------+
df_res:
+-------------+-----------+---------+--------+--------+
| name        | team      | poision | height | weight |
+-------------+-----------+---------+--------+--------+
| James       | Cavaliers | PF      | 206    | 113.0  |
| Jabbar      | Cavaliers | PF      | 228    | 120.0  |
| Chamberlain | 76s       | PF      | 216    | 125.0  |
| Russell     | Boston    | PF      | 206    | 98.0   |
+-------------+-----------+---------+--------+--------+
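For readers who want to reproduce the examples, here is a minimal sketch of how the two demo tables could be built. The source does not show how they were created, so the construction below is an assumption; the column name `poision` is kept exactly as it appears in the tables.

```python
import pandas as pd

# Column names copied from the demo tables (spelling kept as in the source).
columns = ["name", "team", "poision", "height", "weight"]

df_nba = pd.DataFrame(
    [
        ["jordan", "Bulls", "SG", 198, 98.0],
        ["kobe", "Lakers", "SG", 198, 96.0],
        ["Oneil", "Lakers", "C", 216, 147.0],
        ["McGradyg", "Rockets", "SF", 203, 101.0],
        ["jordan", "Bulls", "SG", 198, 98.0],
        ["kobe", "Lakers", "SG", 198, 96.0],
        ["Larry_Bird", "Boston", "PF", 206, 100.0],
        ["Iverson", "76s", "PG", 183, 75.0],
        ["kobe", "Lakers", "SG", 198, 96.0],
        ["jordan", "Bulls", "SG", 198, 98.0],
        ["Olajuwon", "Rockets", "C", 208, 116.0],
    ],
    columns=columns,
)

df_res = pd.DataFrame(
    [
        ["James", "Cavaliers", "PF", 206, 113.0],
        ["Jabbar", "Cavaliers", "PF", 228, 120.0],
        ["Chamberlain", "76s", "PF", 216, 125.0],
        ["Russell", "Boston", "PF", 206, 98.0],
    ],
    columns=columns,
)

print(df_nba.shape, df_res.shape)
```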
Change the data type of a column
df['col_name'] = df['col_name'].astype('int')  # convert the col_name column of df to int
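A small sketch of how the conversion behaves, using made-up values rather than the demo tables. Two points are worth checking before relying on `astype('int')`: converting floats drops the fractional part, and a value that cannot be parsed raises an error. `pd.to_numeric` with `errors="coerce"` is shown as one possible alternative for dirty input.

```python
import pandas as pd

# Hypothetical sample values, not taken from the demo tables.
df = pd.DataFrame({"col_name": ["1", "2", "3"], "weight": [98.0, 96.5, 147.9]})

# Numeric strings are parsed into integers.
df["col_name"] = df["col_name"].astype("int")

# Floats are truncated toward zero, not rounded: 96.5 becomes 96.
df["weight"] = df["weight"].astype("int")

# For input that may contain unparseable entries, to_numeric can turn them into NaN
# instead of raising; the result is then a float column.
raw = pd.Series(["198", "n/a", "216"])
heights = pd.to_numeric(raw, errors="coerce")

print(df.dtypes)
print(heights)
```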
Set difference between two DataFrame objects
res = df[~(df['a'].isin(a_to_drop))]  # keep the rows of df whose 'a' value does not appear in a_to_drop
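The line above compares a single column. The sketch below uses small made-up frames (`df` and `other` are illustrative, not the demo tables) to show that column-based filter, followed by one possible way to take a difference over whole rows with `merge(..., indicator=True)`. The second approach is an addition, not something the original note uses.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["jordan", "kobe", "James", "Russell"],
    "team": ["Bulls", "Lakers", "Cavaliers", "Boston"],
})
other = pd.DataFrame({
    "a": ["James", "Russell"],
    "team": ["Cavaliers", "Celtics"],
})

# Column-based difference: drop rows whose 'a' value occurs in the other frame.
a_to_drop = other["a"]
res = df[~df["a"].isin(a_to_drop)]

# Row-based difference: a left merge with indicator=True adds a '_merge' column;
# rows marked 'left_only' have no exact match in the other frame.
merged = df.merge(other, how="left", on=["a", "team"], indicator=True)
row_diff = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(res)
print(row_diff)
```

Note that `Russell` is removed by the column-based filter but kept by the row-based one, because his `team` value differs between the two frames.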
Remove duplicate rows
- df_nba.drop_duplicates(subset=['name','team'], keep='last', inplace=False)
+------------+---------+---------+--------+--------+
| name       | team    | poision | height | weight |
+------------+---------+---------+--------+--------+
| Oneil      | Lakers  | C       | 216    | 147.0  |
| McGradyg   | Rockets | SF      | 203    | 101.0  |
| Larry_Bird | Boston  | PF      | 206    | 100.0  |
| Iverson    | 76s     | PG      | 183    | 75.0   |
| kobe       | Lakers  | SG      | 198    | 96.0   |
| jordan     | Bulls   | SG      | 198    | 98.0   |
| Olajuwon   | Rockets | C       | 208    | 116.0  |
+------------+---------+---------+--------+--------+
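The `keep` argument decides which copy of each duplicate survives, which is why `kobe` and `jordan` appear near the bottom of the result above. A short sketch on a reduced, made-up frame compares the three accepted values (`"first"`, `"last"`, and `False`):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["jordan", "kobe", "Oneil", "jordan", "kobe"],
    "team": ["Bulls", "Lakers", "Lakers", "Bulls", "Lakers"],
})

# keep="last": the final occurrence of each (name, team) pair survives.
last = df.drop_duplicates(subset=["name", "team"], keep="last")

# keep="first" (the default): the earliest occurrence survives.
first = df.drop_duplicates(subset=["name", "team"], keep="first")

# keep=False: every row that has a duplicate is dropped.
unique_only = df.drop_duplicates(subset=["name", "team"], keep=False)

print(list(last.index), list(first.index), list(unique_only.index))
```

The original row labels are preserved in each result; chaining `.reset_index(drop=True)` renumbers them if needed.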
Concatenate data
- data = pd.concat([df_nba, df_res], axis = 0)
+-------------+-----------+---------+--------+--------+
| name        | team      | poision | height | weight |
+-------------+-----------+---------+--------+--------+
| jordan      | Bulls     | SG      | 198    | 98.0   |
| kobe        | Lakers    | SG      | 198    | 96.0   |
| Oneil       | Lakers    | C       | 216    | 147.0  |
| McGradyg    | Rockets   | SF      | 203    | 101.0  |
| jordan      | Bulls     | SG      | 198    | 98.0   |
| kobe        | Lakers    | SG      | 198    | 96.0   |
| Larry_Bird  | Boston    | PF      | 206    | 100.0  |
| Iverson     | 76s       | PG      | 183    | 75.0   |
| kobe        | Lakers    | SG      | 198    | 96.0   |
| jordan      | Bulls     | SG      | 198    | 98.0   |
| Olajuwon    | Rockets   | C       | 208    | 116.0  |
| James       | Cavaliers | PF      | 206    | 113.0  |
| Jabbar      | Cavaliers | PF      | 228    | 120.0  |
| Chamberlain | 76s       | PF      | 216    | 125.0  |
| Russell     | Boston    | PF      | 206    | 98.0   |
+-------------+-----------+---------+--------+--------+
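One detail the printed table hides is the index: with `axis=0`, `pd.concat` stacks rows and keeps each frame's original row labels, so labels can repeat. The sketch below, on two small made-up frames (`top` and `bottom` are illustrative names), shows this and the `ignore_index=True` option that renumbers the result.

```python
import pandas as pd

top = pd.DataFrame({"name": ["jordan", "kobe"], "height": [198, 198]})
bottom = pd.DataFrame({"name": ["James", "Jabbar"], "height": [206, 228]})

# Rows are stacked; the original labels 0, 1 from each frame are kept.
stacked = pd.concat([top, bottom], axis=0)

# ignore_index=True discards the old labels and assigns 0..n-1.
renumbered = pd.concat([top, bottom], axis=0, ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```

Repeated labels matter for label-based selection: `stacked.loc[0]` returns two rows here, while `renumbered.loc[0]` returns one.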
Shuffle data
- data = df_nba.sample(frac=1)
+------------+---------+---------+--------+--------+
| name       | team    | poision | height | weight |
+------------+---------+---------+--------+--------+
| jordan     | Bulls   | SG      | 198    | 98.0   |
| jordan     | Bulls   | SG      | 198    | 98.0   |
| kobe       | Lakers  | SG      | 198    | 96.0   |
| McGradyg   | Rockets | SF      | 203    | 101.0  |
| jordan     | Bulls   | SG      | 198    | 98.0   |
| kobe       | Lakers  | SG      | 198    | 96.0   |
| Larry_Bird | Boston  | PF      | 206    | 100.0  |
| Iverson    | 76s     | PG      | 183    | 75.0   |
| Olajuwon   | Rockets | C       | 208    | 116.0  |
| kobe       | Lakers  | SG      | 198    | 96.0   |
| Oneil      | Lakers  | C       | 216    | 147.0  |
+------------+---------+---------+--------+--------+
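`sample(frac=1)` returns all rows in a random order, so the output above will differ from run to run. As a sketch on a small made-up frame, passing `random_state` makes the shuffle repeatable, and `reset_index(drop=True)` replaces the shuffled row labels with a fresh 0..n-1 index. The seed value 42 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["jordan", "kobe", "Oneil", "Iverson"],
    "height": [198, 198, 216, 183],
})

# frac=1 samples 100% of the rows without replacement, i.e. a permutation.
shuffled = df.sample(frac=1, random_state=42)

# The shuffled frame still carries the old row labels; drop them for a clean index.
shuffled = shuffled.reset_index(drop=True)

print(shuffled)
```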