CS506 HW1.1

Latest recommended article published 2023-05-14 16:31:31

jweixuan

Category: Homework

Article link: https://blog.csdn.net/jweixuan/article/details/79215670

1. Handling JSON

import json

json_content = response.json() (response is the data fetched from the web page; its type is Response, and .json() parses its JSON body into Python objects such as dicts and lists)

print(json.dumps(json_content, indent=2)) converts the parsed JSON back into a str; printing it shows the JSON content clearly, laid out like a dictionary
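
The two calls above can be tried without a network request. The following is a minimal sketch, assuming a made-up payload (raw_body is hypothetical and only stands in for the body of a real response); json.loads plays the role that response.json() plays on a Response object.

```python
import json

# Hypothetical payload standing in for the body of an HTTP response.
raw_body = '{"items": [{"creation_date": 1420070503, "is_answered": true}], "has_more": false}'

# json.loads parses JSON text into Python objects (dict, list, str, int, bool, None).
json_content = json.loads(raw_body)

# json.dumps serializes the Python objects back into a str;
# indent=2 puts each key on its own line with two-space indentation.
pretty = json.dumps(json_content, indent=2)
print(pretty)

# The parsed content behaves like an ordinary dictionary.
first_item = json_content["items"][0]
print(first_item["creation_date"], first_item["is_answered"])
```

Note that the JSON literal true becomes the Python value True after parsing, and json.dumps writes it back as true.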

2. Converting Unix Time to Readable Time

e.g.: creation_date = 1420070503
Python:
import time
def print_creation_dates_is_answered_json(response):
    date_format = '%m-%d-%Y %H:%M:%S'
    for i in response["items"]:
        # gmtime converts a timestamp to a UTC struct_time; strftime renders it using date_format
        print(time.strftime(date_format, time.gmtime(i["creation_date"])))  # 1420070503 -> '01-01-2015 00:01:43'
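
As a self-contained check of the conversion, here is a small sketch that formats timestamps with time.gmtime and, as an alternative, with the datetime module. The helper names format_unix_time and format_unix_time_dt and the sample dictionary are assumptions made for illustration; they are not part of the homework code.

```python
import time
from datetime import datetime, timezone

DATE_FORMAT = "%m-%d-%Y %H:%M:%S"

def format_unix_time(timestamp):
    """Format a Unix timestamp (seconds since the epoch) as a UTC string."""
    return time.strftime(DATE_FORMAT, time.gmtime(timestamp))

def format_unix_time_dt(timestamp):
    """Same conversion, written with a timezone-aware datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime(DATE_FORMAT)

# Hypothetical sample shaped like the parsed JSON used above.
sample = {"items": [{"creation_date": 1420070503}, {"creation_date": 1420070400}]}

formatted = [format_unix_time(item["creation_date"]) for item in sample["items"]]
print(formatted)  # ['01-01-2015 00:01:43', '01-01-2015 00:00:00']
```

Both helpers work in UTC, so the result does not depend on the local time zone of the machine running the script.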

3. Processing an XML Dataset

XML file contents:

<?xml version="1.0"?>
<data>
<row Id="001" PostTypeId="2" ParentId="277" CreationDate="2015-01-01" OwenUserId="123" Score="3" Tags="test" />
<row Id="111" PostTypeId="2" ParentId="266" CreationDate="2015-01-01" OwenUserId="321" Score="1" Tags="No.2" />
<row Id="100" PostTypeId="1" ViewCount="123456" AnswerCount="654321" CreationDate="2015-01-01" OwenUserId="222" Score="6" Tags="test123" />
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>

--------------------------------------------------------------------------------------------

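import xml.etree.ElementTree as ET
# Raw string so the backslashes in the Windows path are not treated as escape sequences
tree = ET.parse(r"F:\MyDownloads\Download\stackoverflow-posts-2015.xml")
root = tree.getroot()
print(root)  # the root node is the outermost element, here <data>
for child in root:
    print(child.attrib)  # attributes of each direct child, as a dict

print(root.findall("row"))  # find all direct children of the root whose tag is row

# Element.text returns the text of the current element; for <rank>56</rank> it is '56'
# Element.get(key, default=None) returns the value of the attribute named key, or default if the element has no such attribute

for child in root.findall("row"):
    print(child)  # each row node
    # print("**", child.text)
    print(child.get("Id"), child.get("Tags"))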

Output:

<Element 'data' at 0x000001CEECDEADB8>
{'Id': '001', 'PostTypeId': '2', 'ParentId': '277', 'CreationDate': '2015-01-01', 'OwenUserId': '123', 'Score': '3', 'Tags': 'test'}
{'Id': '111', 'PostTypeId': '2', 'ParentId': '266', 'CreationDate': '2015-01-01', 'OwenUserId': '321', 'Score': '1', 'Tags': 'No.2'}
{'Id': '100', 'PostTypeId': '1', 'ViewCount': '123456', 'AnswerCount': '654321', 'CreationDate': '2015-01-01', 'OwenUserId': '222', 'Score': '6', 'Tags': 'test123'}
{'name': 'Panama'}
[<Element 'row' at 0x000001CEECDF6868>, <Element 'row' at 0x000001CEECDF68B8>, <Element 'row' at 0x000001CEECDF6908>]
<Element 'row' at 0x000001CEECDF6868>
001 test
<Element 'row' at 0x000001CEECDF68B8>
111 No.2
<Element 'row' at 0x000001CEECDF6908>
100 test123
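
The same calls can be tried without the file on disk. The sketch below parses a trimmed copy of the XML shown earlier from a string with ET.fromstring and collects values into plain lists. The variable names and the "0" default for a missing ViewCount are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample file, kept inline so the example is self-contained.
XML_TEXT = """<?xml version="1.0"?>
<data>
<row Id="001" PostTypeId="2" ParentId="277" Score="3" Tags="test" />
<row Id="100" PostTypeId="1" ViewCount="123456" Score="6" Tags="test123" />
<country name="Panama">
<rank>68</rank>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>"""

# fromstring returns the root element directly (no tree.getroot() step).
root = ET.fromstring(XML_TEXT)

rows = root.findall("row")  # direct children of <data> with tag "row"
ids = [row.get("Id") for row in rows]

# get() falls back to the default when the attribute is absent.
view_counts = [row.get("ViewCount", "0") for row in rows]

# find() returns the first matching child; .text is the text between the tags.
country = root.find("country")
rank = country.find("rank").text
neighbors = [n.get("name") for n in country.findall("neighbor")]

print(ids, view_counts, rank, neighbors)
```

Attribute values and element text always come back as strings, so a value such as Score or rank needs int() before any arithmetic.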