Some pandas usage notes

1. Reading a JSON file with pandas

import pandas as pd
pathfile = 'xxx.json'
data = pd.read_json(pathfile)

The type of data is <class 'pandas.core.frame.DataFrame'>.
Reference: Python Machine Learning (83): Reading JSON Data with pandas

2. Date conversion with pandas
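
The original note leaves this section empty, so the following is only a minimal sketch of one common conversion, assuming date strings laid out like the "created_at" field in section 5. The two sample strings are made up for illustration; pd.to_datetime with an explicit format is the only pandas call used.

```python
import pandas as pd

# Hypothetical sample strings, laid out like "created_at" in section 5
raw = pd.Series([
    "Mon Oct 29 11:30:05 +0800 2012",
    "Sun Jan 06 23:07:51 +0800 2013",
])

# %a weekday, %b month, %d day, %H:%M:%S time, %z UTC offset, %Y year
parsed = pd.to_datetime(raw, format="%a %b %d %H:%M:%S %z %Y")

print(parsed.dt.year.tolist())
print(parsed.dt.strftime("%Y-%m-%d %H:%M:%S").tolist())
```

Passing format explicitly avoids relying on pandas guessing the layout, which matters for non-ISO strings like these.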

3. Fixing garbled Chinese characters when writing CSV files with pandas
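
The original note gives no details here, but the example in section 4 writes with encoding='utf-8-sig', so this sketch assumes that is the intended fix. The sample data and the temporary output path are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data containing Chinese text
df = pd.DataFrame({"name": ["张三", "李四"], "city": ["北京", "上海"]})

# Hypothetical output path inside a temporary directory
out_file = os.path.join(tempfile.mkdtemp(), "demo.csv")

# utf-8-sig prepends a byte order mark (BOM), which Excel uses to detect UTF-8
df.to_csv(out_file, index=False, encoding="utf-8-sig")

with open(out_file, "rb") as f:
    head = f.read(3)
print(head)  # the first three bytes of the file are the BOM

# Reading back with the same encoding strips the BOM again
df_back = pd.read_csv(out_file, encoding="utf-8-sig")
print(df_back)
```

Plain 'utf-8' produces the same text without the BOM; the garbling usually shows up when a spreadsheet program opens such a file and assumes a different encoding.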

4. pd.DataFrame

Code:
import pandas as pd
a = [['a','b','c','d'], ['e','f','g','h']]
a_df = pd.DataFrame(a)
print(a_df)
print(type(a_df))

Output:
   0  1  2  3
0  a  b  c  d
1  e  f  g  h
<class 'pandas.core.frame.DataFrame'>

Code:
# A flat list becomes a single column in the DataFrame; transpose it to get a single row
b = ['a','b','c','d']
b_df = pd.DataFrame(b)
print(b_df)
print(type(b_df))

b_df_T = b_df.T
print(b_df_T)
print(type(b_df_T))


Output:
   0
0  a
1  b
2  c
3  d
<class 'pandas.core.frame.DataFrame'>
   0  1  2  3
0  a  b  c  d
<class 'pandas.core.frame.DataFrame'>
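
As a possible alternative to transposing, the flat list can be wrapped in an outer list so that pandas treats it as one row from the start. This is a small sketch reusing the list b from above; row_df and col_df are names chosen for this example.

```python
import pandas as pd

b = ['a', 'b', 'c', 'd']

row_df = pd.DataFrame([b])   # a list containing one row -> shape (1, 4)
col_df = pd.DataFrame(b)     # a flat list -> shape (4, 1)

print(row_df.shape)
print(col_df.shape)

# The wrapped list holds the same values as the transposed column
print(row_df.values.tolist() == col_df.T.values.tolist())
```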

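  • A real-world example:
    Sometimes a single row (a flat list) has to be written to a CSV file. After all_content = pd.DataFrame(all_content), the data has become a single column, so all_content.to_csv() goes wrong. The fix is to track a flag and transpose when the data is a single column.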
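import numpy as np
import pandas as pd

all_content = []
for row in rows:  # `rows` is a placeholder for wherever the rows come from
    if len(all_content) == 0:
        all_content = row
        flag = True       # only one row so far, still 1-D
    else:
        all_content = np.vstack((all_content, row))  # np.row_stack is a deprecated alias of np.vstack
        flag = False      # at least two rows, already 2-D

all_content = pd.DataFrame(all_content)
if flag:
    all_content = all_content.T   # turn the single column back into a single row
all_content.to_csv(out_file, index=False, header=header, encoding='utf-8-sig')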
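
One way to avoid the flag altogether, offered here as a sketch rather than as the author's method, is to force the data to be two-dimensional before building the DataFrame. The helper name rows_to_frame is hypothetical; np.atleast_2d is a standard NumPy function.

```python
import numpy as np
import pandas as pd


def rows_to_frame(rows):
    """Build a DataFrame with one row per record, even for a single flat row."""
    # np.atleast_2d turns shape (n,) into (1, n) and leaves 2-D input unchanged
    return pd.DataFrame(np.atleast_2d(rows))


one_row = rows_to_frame(['a', 'b', 'c', 'd'])
two_rows = rows_to_frame([['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']])

print(one_row.shape, two_rows.shape)
```

Either frame can then be passed to to_csv unchanged, because a single row no longer collapses into a column.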

5. pd.read_json()

I recently processed some data (already anonymized) in the following format:

# Raw data
[
 {
  "reposts_count": 0, 
  "favorited": 0, 
  "update_time": "Sun Jan 06 23:07:51 +0800 2000", 
  "original_pic": "", 
  "text": " 哈哈@123123123", 
  "created_at": "Mon Oct 29 11:30:05 +0800 2000", 
  "mid": 123123123123123123, 
  "annotations": "", 
  "source": "<a href=\"http:/>", 
  "user": {
    "id": 123123123,
    "idstr": "123123123",
    "screen_name": "xxxxxx",
    "name": "xxxxxxxx",
    "location": "China",
    "gender": "m",
    "statuses_count": 133,
    "favourites_count": 0
  },
  "in_reply_to_screen_name": "", 
  "in_reply_to_user_id": 0, 
  "comments_count": 2
 }, 
 {
  "reposts_count": 0, 
  "favorited": 0, 
  "update_time": "Sun Jan 06 23:07:51 +0800 2010", 
  "original_pic": "", 
  "text": " 哈哈哈!你好!!", 
  "created_at": "Mon Oct 29 11:30:05 +0800 2010", 
  "mid": 456456456465456456, 
  "annotations": "", 
  "source": "<a href=\"http:/>", 
  "user": {
    "id": 456456456,
    "idstr": "456456456",
    "screen_name": "yyyyyyyy",
    "name": "yyyyyyyy",
    "location": "China",
    "gender": "f",
    "statuses_count": 133,
    "favourites_count": 0
  },
  "in_reply_to_screen_name": "", 
  "in_reply_to_user_id": 0, 
  "comments_count": 2
 }
]

The goal is to extract fields from the file 123456.json (or 123456.txt) shown above, for example the content of "text" and the "id" inside "user". This can be done as follows:

import pandas as pd

datafile = pd.read_json("123456.json", encoding='utf-8')   # omitting encoding='utf-8' sometimes raises an error
# datafile = pd.read_json("123456.txt", encoding='utf-8')  # a .txt file with the same content can also be read by pd.read_json
print("type(datafile): ", type(datafile))      # <class 'pandas.core.frame.DataFrame'>
print("datafile:\n", datafile)

num_shape = datafile.shape[0]   # number of records (rows)
print("\nThe file contains", num_shape, "records.")


data_text = datafile['text']
print("\ntype(data_text): ", type(data_text))    # <class 'pandas.core.series.Series'>
print("data_text:\n", data_text)

data_user = datafile['user']    # each element is a dict built from the nested "user" object
print("\ntype(data_user): ", type(data_user))   # <class 'pandas.core.series.Series'>
print("data_user:\n", data_user)

for i in range(num_shape):
    print(f"\nRecord {i}:")
    text = data_text[i]
    print("type(text):", type(text))
    print("text:", text)
    uid = str(data_user[i]['id'])
    print("type(uid):", type(uid))
    print("uid:", uid)


Output:
type(datafile):  <class 'pandas.core.frame.DataFrame'>
datafile:
    reposts_count  favorited  ... in_reply_to_user_id comments_count
0              0          0  ...                   0              2
1              0          0  ...                   0              2
[2 rows x 13 columns]

The file contains 2 records.

type(data_text):  <class 'pandas.core.series.Series'>
data_text:
0    哈哈@123123123
1    哈哈哈!你好!!
Name: text, dtype: object

type(data_user):  <class 'pandas.core.series.Series'>
data_user:
0    {'id': 123123123, 'idstr': '123123123', 'scree...}
1    {'id': 456456456, 'idstr': '456456456', 'scree...}
Name: user, dtype: object

Record 0:
type(text): <class 'str'>
text: 哈哈@123123123
type(uid): <class 'str'>
uid: 123123123

Record 1:
type(text): <class 'str'>
text: 哈哈哈!你好!!
type(uid): <class 'str'>
uid: 456456456
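
When the nested "user" fields are needed as ordinary columns, pd.json_normalize may be a convenient alternative to looping over the rows. The sketch below uses two trimmed-down, made-up records in the same shape as the sample data; in practice the list would come from loading the JSON file first, which is assumed here and not shown.

```python
import pandas as pd

# Two trimmed-down, hypothetical records in the same shape as the sample data
records = [
    {"text": "hello", "user": {"id": 123123123, "gender": "m"}},
    {"text": "hi there", "user": {"id": 456456456, "gender": "f"}},
]

# Nested dict keys become dotted column names such as "user.id"
flat = pd.json_normalize(records)
print(flat.columns.tolist())

# Convert the ids to strings in one step instead of row by row
uids = flat["user.id"].astype(str).tolist()
print(uids)
```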