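HTTP stands for HyperText Transfer Protocol.
[N-COUNT] A protocol is a set of rules for exchanging information between computers.
In other words, HTTP is a transfer protocol: the set of rules both sides must follow when exchanging information.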
A typical URL can be understood as follows:
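As a rough illustration of how a URL breaks down, the sketch below splits the URL used later in these notes into its protocol, host, and document path with the standard library's urllib.parse. The fallback to port 80 is an assumption that holds for plain http URLs with no explicit port.

```python
from urllib.parse import urlsplit

# URL taken from Assignment 1 in these notes
parts = urlsplit("http://data.pr4e.org/intro-short.txt")

print(parts.scheme)    # protocol: http
print(parts.hostname)  # host: data.pr4e.org
print(parts.path)      # document: /intro-short.txt

# No port is written in the URL, so parts.port is None;
# for http we assume the default port 80.
port = parts.port or 80
print(port)
```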
1. Common TCP ports
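For reference, a few well-known port assignments can be kept in a small lookup table. This is only a sketch: the selection of services and the names WELL_KNOWN_TCP_PORTS and default_port are my own, chosen for illustration.

```python
# A handful of IANA well-known ports (DNS also uses UDP on the same number)
WELL_KNOWN_TCP_PORTS = {
    "ftp": 21,
    "ssh": 22,
    "telnet": 23,
    "smtp": 25,
    "dns": 53,
    "http": 80,
    "pop3": 110,
    "imap": 143,
    "https": 443,
}


def default_port(service):
    """Look up the usual port for a service name (hypothetical helper)."""
    return WELL_KNOWN_TCP_PORTS[service.lower()]


print(default_port("HTTP"))   # 80
print(default_port("https"))  # 443
```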
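2. Python code
The first line imports socket, the second line creates a socket, and the third line connects. The part highlighted in purple is the target host, and the part highlighted in green is the port.
******** The third line may well fail with an exception, for example when the host cannot be reached.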
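Since the connect step is the one most likely to fail, one way to guard it is to wrap it in try/except. The sketch below is one possible approach, not the course's code: the helper name try_connect and the 5-second timeout are assumptions made for illustration.

```python
import socket


def try_connect(host, port, timeout=5.0):
    """Return a connected socket, or None if the connection fails.

    Hypothetical helper; the timeout value is an arbitrary choice.
    """
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.settimeout(timeout)
    try:
        # This is the step that can fail: unknown host name,
        # connection refused, network unreachable, or timeout.
        mysock.connect((host, port))
    except OSError as err:
        # socket.gaierror and timeouts are subclasses of OSError
        print(f"Could not connect to {host}:{port}: {err}")
        mysock.close()
        return None
    return mysock
```

A call such as try_connect('data.pr4e.org', 80) would then return either a usable socket or None, so the rest of the program can decide what to do instead of crashing.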
3. Homework: PY4E - Python for Everybody
Assignment 1
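import socket

# Create a TCP socket and connect to the web server on port 80
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
# Send an HTTP/1.0 GET request; the blank line (\r\n\r\n) ends the request headers
cmd = 'GET http://data.pr4e.org/intro-short.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

# Read the response in chunks of up to 512 bytes until the server closes the connection
while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(), end='')

mysock.close()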
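The program above prints the response headers and the document together. To look at individual headers, the raw bytes can be split at the first blank line. The sketch below assumes a complete response is already in memory; the SAMPLE bytes are made up for illustration and are not the real server's reply, and split_response is a hypothetical helper.

```python
def split_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # Headers and body are separated by the first blank line
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Made-up response used only to exercise the function
SAMPLE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)

status, headers, body = split_response(SAMPLE)
print(status)
print(headers["content-type"])
print(body.decode())
```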
Assignment 2:
Scraping Numbers from HTML using BeautifulSoup. In this assignment you will write a Python program similar to http://www.py4e.com/code3/urllink2.py. The program will use urllib to read the HTML from the data files below, parse the data, extract the numbers, and compute the sum of the numbers in the file.
# To run this, download the BeautifulSoup zip file
# http://www.py4e.com/code3/bs4.zip
# and unzip it in the same directory as this file
from urllib.request import urlopen
from bs4 import BeautifulSoup
import ssl
import re
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
#url = input('Enter - ')
url = "http://py4e-data.dr-chuck.net/comments_1452627.html"
html = urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, "html.parser")
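# Retrieve every tag in the document
tags = soup()
add = 0
for tag in tags:
    # Skip empty tags, which have no first child to inspect
    if not tag.contents:
        continue
    # Match a first child of the form <span class="comments">NUMBER
    num = re.findall(r'^<span class="comments">([0-9]+)', str(tag.contents[0]))
    for i in num:
        add = add + int(i)
print(add)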
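The same idea (add up the numbers inside the span tags with class "comments") can also be tried without downloading anything. The sketch below swaps BeautifulSoup for the standard library's html.parser so it runs with no extra install. The SAMPLE table is made up for illustration and only assumes the data file uses this span layout; CommentSum and sum_comments are hypothetical names.

```python
from html.parser import HTMLParser


class CommentSum(HTMLParser):
    """Add up the integers found inside <span class="comments"> elements."""

    def __init__(self):
        super().__init__()
        self.in_span = False
        self.total = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "comments") in attrs:
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False

    def handle_data(self, data):
        # Only count text that sits inside a matching span
        if self.in_span and data.strip().isdigit():
            self.total += int(data)


def sum_comments(html):
    parser = CommentSum()
    parser.feed(html)
    parser.close()
    return parser.total


# Made-up sample in the assumed layout of the assignment's data file
SAMPLE = (
    '<table>'
    '<tr><td>Alice</td><td><span class="comments">97</span></td></tr>'
    '<tr><td>Bob</td><td><span class="comments">3</span></td></tr>'
    '</table>'
)

print(sum_comments(SAMPLE))  # 100
```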