Python-Networks and Sockets (Class Notes)

Sober97

Last modified: 2023-09-10 16:54:13

Tags: python socket

First published: 2020-05-08 12:43:55

Article link: https://blog.csdn.net/Sober97/article/details/105993073

Reading a web page with a socket

import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
cmd = 'GET http://data.pr4e.org/intro-short.txt HTTP/1.0\r\n\r\n'.encode()  # str -> bytes (UTF-8 by default)
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if len(data) < 1:  # an empty result means the server closed the connection
        break
    print(data.decode(), end='')  # bytes -> str (Unicode)

mysock.close()

Output:

HTTP/1.1 200 OK
Date: Fri, 08 May 2020 04:37:08 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Sat, 13 May 2017 11:22:22 GMT
ETag: "1d3-54f6609240717"
Accept-Ranges: bytes
Content-Length: 467
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: text/plain

Why should you learn to write programs?

Writing programs (or programming) is a very creative
and rewarding activity.  You can write programs for
many reasons, ranging from making your living to solving
a difficult data analysis problem to having fun to helping
someone else solve a problem.  This book assumes that
everyone needs to know how to program, and that once
you know how to program you will figure out what you want
to do with your newfound skills.
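
The response above has two parts: a status line plus headers, and then, after a blank line, the body. The following is a minimal sketch of separating them, assuming the whole response has already been collected into one bytes object. The function name `split_response` and the sample bytes are made up for illustration and are not part of the original notes.

```python
def split_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # Headers and body are separated by the first blank line (CRLF CRLF).
    head, _, body = raw.partition(b'\r\n\r\n')
    lines = head.decode('iso-8859-1').split('\r\n')
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Hypothetical sample response, shortened from the output shown above.
sample = (b'HTTP/1.1 200 OK\r\n'
          b'Content-Length: 5\r\n'
          b'Content-Type: text/plain\r\n'
          b'\r\n'
          b'hello')

status_line, headers, body = split_response(sample)
print(status_line)
print(headers['content-type'])
print(body.decode())
```

Keeping the body as bytes until the headers have been inspected makes it possible to pick the decoding from `Content-Type` instead of assuming UTF-8.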

How it works:
A socket carries bytes, not strings, so encode before send and decode after recv.
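
The encode-then-send and recv-then-decode round trip can be tried without a network connection. The sketch below assumes a local `socket.socketpair()` stands in for the client and the server; `send_text` and `recv_text` are hypothetical helper names. It also uses an incremental decoder, because a multi-byte UTF-8 character can be split across two `recv` calls, in which case calling `decode()` on a single chunk would raise an error.

```python
import codecs
import socket


def send_text(sock, text, encoding='utf-8'):
    # str -> bytes, then sendall() so the whole payload goes out.
    sock.sendall(text.encode(encoding))


def recv_text(sock, chunk_size=512, encoding='utf-8'):
    # The incremental decoder buffers partial characters between chunks.
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    while True:
        data = sock.recv(chunk_size)
        if not data:  # empty bytes: the other side closed the connection
            break
        parts.append(decoder.decode(data))
    parts.append(decoder.decode(b'', final=True))
    return ''.join(parts)


message = 'GET 网页 HTTP/1.0\r\n\r\n'
left, right = socket.socketpair()
send_text(left, message)
left.close()
# A tiny chunk size forces a multi-byte character to span two chunks.
received = recv_text(right, chunk_size=4)
right.close()
print(received == message)
```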
A simpler approach is urllib, which can read a web page in much the same way as reading a file.

import urllib.request, urllib.parse, urllib.error
fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')  # works much like opening a file
for line in fhand:
    print(line.decode().strip())
# But soft what light through yonder window breaks
# It is the east and Juliet is the sun
# Arise fair sun and kill the envious moon
# Who is already sick and pale with grief
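
The call above assumes the request succeeds and that the page is UTF-8. A slightly more defensive sketch follows; `read_lines` is a hypothetical helper name, and falling back to UTF-8 when the server does not declare a charset is an assumption. The demo uses a `data:` URL so that it runs without network access.

```python
import urllib.error
import urllib.request


def read_lines(url, default_charset='utf-8', timeout=10):
    """Return the decoded lines of the resource at url, or [] on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Use the charset declared in Content-Type when there is one.
            charset = response.headers.get_content_charset() or default_charset
            return response.read().decode(charset).splitlines()
    except urllib.error.URLError as err:
        print('Could not read', url, '-', err.reason)
        return []


# Offline demo: a data: URL carrying two short lines of text.
demo_url = 'data:text/plain;charset=utf-8,But%20soft%0AIt%20is%20the%20east'
for line in read_lines(demo_url):
    print(line)
```

Passing 'http://data.pr4e.org/romeo.txt' instead of the demo URL should behave like the loop above, provided the site is reachable.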

import urllib.request, urllib.parse, urllib.error
fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')  # works much like opening a file
counts = dict()
for line in fhand:
    words = line.decode().split()
    for word in words:
        counts[word] = counts.get(word, 0) + 1  # get() returns the current count for word, or 0 if it has not been seen yet
print(counts)
# {'But': 1, 'soft': 1, 'what': 1, 'light': 1, 'through': 1, 'yonder': 1, 'window': 1, 'breaks': 1, 'It': 1, 'is': 3, 'the': 3, 'east': 1, 'and': 3, 'Juliet': 1, 'sun': 2, 'Arise': 1, 'fair': 1, 'kill': 1, 'envious': 1, 'moon': 1, 'Who': 1, 'already': 1, 'sick': 1, 'pale': 1, 'with': 1, 'grief': 1}
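
The same counting can also be written with `collections.Counter` from the standard library, which supplies the "start at 0" behaviour of `counts.get(word, 0)` automatically. This is an alternative sketch, not the method used in the notes; `count_words` is a hypothetical helper, and it takes already-decoded strings so that it can be tried without fetching the file.

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated words across an iterable of str lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


# First two lines of romeo.txt, as shown in the output earlier.
sample_lines = [
    'But soft what light through yonder window breaks',
    'It is the east and Juliet is the sun',
]
counts = count_words(sample_lines)
print(counts['is'], counts['the'], counts['moon'])
```

With the urllib response, each line would need `line.decode()` before being passed in, for example `count_words(line.decode() for line in fhand)`.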

import urllib.request, urllib.parse, urllib.error
fhand = urllib.request.urlopen('http://www.dr-chuck.com/page1.htm')
for line in fhand:
    print(line.decode().strip())
# <h1>The First Page</h1>
# <p>
# If you like, you can switch to the
# <a href="http://www.dr-chuck.com/page2.htm">
# Second Page</a>.
# </p>

Basic use of BeautifulSoup (it parses the page and organizes its contents into a tree structure)

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup

url = input('Enter - ')
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))

# Example URL to enter:
# http://www.dr-chuck.com/page1.htm
# Output:
# http://www.dr-chuck.com/page2.htm
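
For comparison, the same link extraction can be sketched with only the standard library, which is useful when bs4 is not installed. This swaps BeautifulSoup for `html.parser.HTMLParser`, so it is a different technique from the one above. The class name `LinkCollector` is hypothetical, and resolving relative links with `urllib.parse.urljoin` is an addition the original script does not make.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url=''):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                # Relative links become absolute; absolute ones are unchanged.
                self.links.append(urljoin(self.base_url, value))


# The page1.htm markup shown earlier, plus one relative link for illustration.
html = '''<h1>The First Page</h1>
<p>
If you like, you can switch to the
<a href="http://www.dr-chuck.com/page2.htm">
Second Page</a> or the <a href="page3.htm">third</a>.
</p>'''

collector = LinkCollector('http://www.dr-chuck.com/page1.htm')
collector.feed(html)
for link in collector.links:
    print(link)
```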

Summary:
