1. JSON handling
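import json
json_content = response.json()  # response is the Response object fetched from a web page; .json() parses its JSON body into Python objects (usually a dict)
print(json.dumps(json_content, indent=2))  # json.dumps serializes those objects back into a str; indent=2 pretty-prints it, so the nested, dictionary-like structure is easy to read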
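A minimal, self-contained sketch of the same round trip, assuming a hard-coded JSON string in place of a live web response (the fields below are made up for illustration). json.loads parses JSON text into Python objects, and json.dumps(..., indent=2) turns them back into a readable string.

```python
import json

# Hypothetical JSON text standing in for a web response body
raw = '{"items": [{"creation_date": 1420070503, "is_answered": true}], "has_more": false}'

data = json.loads(raw)                     # JSON text -> Python dict
print(type(data))                          # <class 'dict'>
print(data["items"][0]["creation_date"])   # 1420070503

pretty = json.dumps(data, indent=2)        # Python dict -> indented JSON string
print(pretty)
```

If response comes from the requests library (the notes don't say which library is used), response.json() already performs the json.loads step, which is why the notes above go straight to json.dumps.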
2. Converting Unix time to readable time
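e.g.: creation_date = 1420070503 (seconds since 1970-01-01 00:00:00 UTC, i.e. 01-01-2015 00:01:43 UTC)
Python:
import time

def print_creation_dates_is_answered_json(response):
    # response is the parsed JSON (a dict) whose "items" list holds records with a Unix "creation_date"
    fmt = '%m-%d-%Y %H:%M:%S'  # named fmt so the built-in format() isn't shadowed
    for item in response["items"]:
        # gmtime converts the timestamp to a UTC struct_time; strftime renders it using fmt
        print(time.strftime(fmt, time.gmtime(item["creation_date"])))  # e.g. 01-01-2015 00:00:58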
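A quick check of the conversion on the example timestamp 1420070503. The datetime variant is an alternative added here for comparison, not something the original notes use; both produce the same UTC string.

```python
import time
from datetime import datetime, timezone

fmt = '%m-%d-%Y %H:%M:%S'
ts = 1420070503  # the creation_date example

# time module: timestamp -> UTC struct_time -> formatted string
via_time = time.strftime(fmt, time.gmtime(ts))

# datetime module: timestamp -> timezone-aware UTC datetime -> formatted string
via_datetime = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(fmt)

print(via_time)      # 01-01-2015 00:01:43
print(via_datetime)  # 01-01-2015 00:01:43
```

Using gmtime (or tz=timezone.utc) keeps the result independent of the machine's local time zone; time.localtime would shift it.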
3. Processing an XML dataset
<data>
    <row Id="001" PostTypeId="2" ParentId="277" CreationDate="2015-01-01" OwenUserId="123" Score="3" Tags="test" />
    <row Id="111" PostTypeId="2" ParentId="266" CreationDate="2015-01-01" OwenUserId="321" Score="1" Tags="No.2" />
    <row Id="100" PostTypeId="1" ViewCount="123456" AnswerCount="654321" CreationDate="2015-01-01" OwenUserId="222" Score="6" Tags="test123" />
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
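import xml.etree.ElementTree as ET

tree = ET.parse(r"F:\MyDownloads\Download\stackoverflow-posts-2015.xml")  # raw string so the backslashes in the Windows path aren't treated as escapes
root = tree.getroot()
print(root)  # the root node is the outermost element, here <data>
for child in root:
    print(child.attrib)  # attributes of each direct child, as a dict
print(root.findall("row"))  # all direct children of root whose tag is "row"
# Element.text: the text inside the element, e.g. <rank>68</rank> gives '68'
# Element.get(key, default=None): the value of attribute key, or default if the element has no such attribute
for child in root.findall("row"):
    print(child)  # each <row> element
    # print("**", child.text)  # text is None here: <row .../> elements are self-closing
    print(child.get("Id"), child.get("Tags"))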
Output:
<Element 'data' at 0x000001CEECDEADB8>
{'Id': '001', 'PostTypeId': '2', 'ParentId': '277', 'CreationDate': '2015-01-01', 'OwenUserId': '123', 'Score': '3', 'Tags': 'test'}
{'Id': '111', 'PostTypeId': '2', 'ParentId': '266', 'CreationDate': '2015-01-01', 'OwenUserId': '321', 'Score': '1', 'Tags': 'No.2'}
{'Id': '100', 'PostTypeId': '1', 'ViewCount': '123456', 'AnswerCount': '654321', 'CreationDate': '2015-01-01', 'OwenUserId': '222', 'Score': '6', 'Tags': 'test123'}
{'name': 'Panama'}
[<Element 'row' at 0x000001CEECDF6868>, <Element 'row' at 0x000001CEECDF68B8>, <Element 'row' at 0x000001CEECDF6908>]
<Element 'row' at 0x000001CEECDF6868>
001 test
<Element 'row' at 0x000001CEECDF68B8>
111 No.2
<Element 'row' at 0x000001CEECDF6908>
100 test123
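To try the parsing code without the Stack Overflow dump, here is a self-contained sketch that parses a shortened copy of the sample XML from a string with ET.fromstring (which returns the root Element directly, so there is no getroot() step). It also shows Element.find plus .text for child elements, and Element.get with a default for an attribute only some rows have.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample document above
xml_text = """<data>
  <row Id="001" PostTypeId="2" ParentId="277" Score="3" Tags="test" />
  <row Id="100" PostTypeId="1" ViewCount="123456" Score="6" Tags="test123" />
  <country name="Panama">
    <rank>68</rank>
    <year>2011</year>
  </country>
</data>"""

root = ET.fromstring(xml_text)  # returns the root Element (<data>)

# findall("row") matches direct children of root tagged "row"
rows = root.findall("row")
ids_and_tags = [(row.get("Id"), row.get("Tags")) for row in rows]
print(ids_and_tags)  # [('001', 'test'), ('100', 'test123')]

# get() with a default for an attribute only some rows have
view_counts = [row.get("ViewCount", "0") for row in rows]
print(view_counts)  # ['0', '123456']

# .text holds the text between an element's start and end tags
country = root.find("country")
print(country.get("name"), country.find("rank").text)  # Panama 68

# Attribute values are always strings; convert explicitly when needed
scores = [int(row.get("Score")) for row in rows]
print(sum(scores))  # 9
```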
4. Pandas DataFrame handling
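import pandas as pd
df = pd.DataFrame(data=data, columns=['A', 'B', 'C'])  # data can be a 2-D array, a dict, or another DataFrame
df.to_csv("filename.csv")  # write the contents of df to a CSV file (the index becomes the first column unless index=False)
df = pd.read_csv("filename.csv")  # read_csv is a pandas function (pd.read_csv), not a DataFrame method; it returns a new DataFrame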
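A runnable sketch of the create / write / read cycle, with made-up column values. It writes to an in-memory buffer (io.StringIO) instead of filename.csv so no file is left behind; passing a path works the same way. index=False keeps the row index out of the CSV; otherwise read_csv brings it back as an extra unnamed column.

```python
import io
import pandas as pd

# Build a DataFrame from a dict (column name -> values); values are made up
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z'], 'C': [0.5, 1.5, 2.5]})

# Or from a 2-D list, naming the columns explicitly
df2 = pd.DataFrame([[4, 'w', 3.5]], columns=['A', 'B', 'C'])

buf = io.StringIO()
df.to_csv(buf, index=False)   # write without the row index
buf.seek(0)                   # rewind the buffer before reading

df_back = pd.read_csv(buf)    # pd.read_csv, not df.read_csv
print(df_back)
```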
Other commonly used functions include groupby, concat, and so on.
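A small sketch of those two functions, using made-up data shaped like the XML rows above: concat stacks DataFrames that share columns, and groupby splits rows by a key column so each group can be aggregated.

```python
import pandas as pd

# Two made-up DataFrames with the same columns
part1 = pd.DataFrame({'Tags': ['test', 'No.2'], 'Score': [3, 1]})
part2 = pd.DataFrame({'Tags': ['test', 'test123'], 'Score': [4, 6]})

# concat stacks the rows; ignore_index=True renumbers the index 0..n-1
combined = pd.concat([part1, part2], ignore_index=True)

# groupby + sum: total Score per tag (group keys come out sorted)
totals = combined.groupby('Tags')['Score'].sum()
print(totals)
```

Without ignore_index=True the combined index would be 0, 1, 0, 1, keeping each part's original labels.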