Python_urllib.urlopen

emmmmmmT

Article link: https://blog.csdn.net/qwer1203355251/article/details/50802134

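Using file.read()

Overview

The read() method reads up to size characters (text mode) or bytes (binary mode) from a file. If size is omitted or negative, it reads everything up to the end of the file.

Syntax

The syntax of read() is:

fileObject.read([size])

Parameters

size -- optional; the number of characters (text mode) or bytes (binary mode) to read from the file.

Return value

Returns the data read from the file: a str in text mode, bytes in binary mode. At the end of the file it returns an empty string (or empty bytes).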
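The difference between characters and bytes only shows up with non-ASCII data, so the minimal sketch below writes a small UTF-8 file (made-up content in a temporary directory) and reads it once in text mode and once in binary mode. It also shows that omitting size reads to the end and that a read at EOF returns an empty string.

```python
import os
import tempfile

# Create a small UTF-8 sample file so the example runs anywhere (content is made up).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write("1:你好\n".encode("utf-8"))

# Text mode: size counts characters, and read() returns str.
with open(path, "r", encoding="utf-8") as f:
    head = f.read(3)   # at most 3 characters
    rest = f.read()    # size omitted: read to the end of the file
    eof = f.read()     # already at EOF: returns an empty string

# Binary mode: size counts bytes, and read() returns bytes.
with open(path, "rb") as f:
    head_bytes = f.read(3)        # at most 3 bytes
    remaining_bytes = f.read(-1)  # a negative size also means "read everything"

print(repr(head))        # '1:你'
print(repr(rest))        # '好\n'
print(repr(eof))         # ''
print(repr(head_bytes))  # b'1:\xe4'
```

In binary mode read(3) stops in the middle of the three-byte UTF-8 encoding of 你, which is why text mode is the safer choice for text.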

Example

The file runoob.txt contains:

1:www.runoob.com
2:www.runoob.com
3:www.runoob.com
4:www.runoob.com
5:www.runoob.com

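Reading the first 10 characters of the file:

#!/usr/bin/env python3

# Open the file for reading ("rw+" is not a valid mode in Python 3; "r" is enough here)
with open("runoob.txt", "r", encoding="utf-8") as fo:
    print("File name: ", fo.name)

    line = fo.read(10)  # read at most 10 characters
    print("String read: %s" % (line))
# Leaving the with block closes the file, so no explicit fo.close() is needed

The output of the example above is:

File name:  runoob.txt
String read: 1:www.runo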
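To read the whole file in a loop rather than just its first 10 characters, call read(size) repeatedly until it returns an empty string. The sketch below recreates runoob.txt in a temporary directory; the helper name read_in_chunks is hypothetical.

```python
import os
import tempfile

# Recreate the five-line runoob.txt in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "runoob.txt")
with open(path, "w", encoding="utf-8") as f:
    for i in range(1, 6):
        f.write("%d:www.runoob.com\n" % i)


def read_in_chunks(file_path, size=10):
    """Hypothetical helper: yield the file's contents `size` characters at a time."""
    with open(file_path, "r", encoding="utf-8") as fo:
        while True:
            chunk = fo.read(size)
            if not chunk:  # read() returns '' once the end of the file is reached
                break
            yield chunk


chunks = list(read_in_chunks(path, 10))
print(chunks[0])    # 1:www.runo
print(len(chunks))  # 9 (85 characters in chunks of 10)
```

Reading in fixed-size chunks keeps memory use bounded, which matters more for large files than for this five-line example.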

Fetching a web page

import urllib.request

response = urllib.request.urlopen('http://placekitten.com')
html = response.read()  # read() with no size returns the whole body as bytes
print(html)
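A slightly more careful version uses the response as a context manager, so the connection is closed when you are done, and passes a timeout, since without one urlopen() falls back to the global socket default, which is no timeout unless changed. In this sketch a tiny local http.server stands in for http://placekitten.com so it runs offline; the Handler class, the fetch() helper, and the page body are hypothetical.

```python
import http.server
import threading
import urllib.request


class Handler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local page standing in for the real site."""

    def do_GET(self):
        body = b"<!DOCTYPE html><title>demo</title>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the console quiet
        pass


def fetch(url, timeout=10):
    """Return the raw body of `url` as bytes; the with block closes the response."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


server = http.server.HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

html = fetch(url)
print(html)  # b'<!DOCTYPE html><title>demo</title>'

server.shutdown()
server.server_close()
```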

Output:

b'<!DOCTYPE html>\n\n\n\n\n <html lang="en" class="no-js"> \n<head xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:og="http://ogp.me/ns#"><meta charset="utf-8">\n\n\t\n\n\t<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">\n\t<meta http-equiv="imagetoolbar" content="no">\n\n\t\n\t<title>{placekitten}</title>\n\t<meta name="description" content="Kitten-themed placeholder images for developers :3">\n\t<meta name="author" content="Mark James (http://mark.james.name)">\n\t<meta name="keywords" content="kittens, placeholders, images, cats, felines, kittycats, hot pussy photos">\n\t<meta name="copyright" content="Mark James. Images © people on the internet.">\n\t<meta name="robots" content="all">\n\t<meta

Add this line to decode the bytes as UTF-8:

html = html.decode('utf-8')

Output:

<!DOCTYPE html>




 <html lang="en" class="no-js"> 
<head xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:og="http://ogp.me/ns#"><meta charset="utf-8">

   

   <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
   <meta http-equiv="imagetoolbar" content="no">

   
   <title>{placekitten}</title>
   <meta name="description" content="Kitten-themed placeholder images for developers :3">
   <meta name="author" content="Mark James (http://mark.james.name)">
   <meta name="keywords" content="kittens, placeholders, images, cats, felines, kittycats, hot pussy photos">
   <meta name="copyright" content="Mark James. Images © people on the internet.">
   <meta name="robots" content="all">
   <meta name="viewport" content="width=device-width; initial-scale=1.0;">

   
   <meta name="DC.identifier" content="http://example.com" />
   <meta name="DC.creator" lang="en" content="Mark James">
   <meta name="DC.date.created" lang="en" content="2011-02-26">
   <meta name="DC.date.modifed" lang="en" content="2005-02-26">
   <meta name="DC.format" lang="en" content="text/html">
   <meta name="DC.language" content="en">
   <meta name="DC.coverage" lang="en" content="UK">
   <meta name="DC.publisher" lang="en" content="Mark James" />

   
   <meta property="og:site_name" content="placekitten">
   <meta property="og:title" content="placekitten">
   <meta property="og:description" content="Kitten-themed placeholder images for developers :3">
   <meta property="og:type" content="website">
   <meta property="og:image" content="http://placekitten.com/100/100">
   <meta property="og:url" content="http://placekitten.com">

   
   <link rel="home" href="/">

   
   <link rel="shortcut icon" href="/favicon.ico">
   <link rel="apple-touch-icon" href="/apple-touch-icon.png">

   
   <link rel="stylesheet" href="assets/css/html5kit/style.css">
   <link rel="stylesheet" href="assets/css/placekitten.css">

   
   <script>
       var _gaq = _gaq || [];
       _gaq.push(['_setAccount', 'UA-21694863-1']);
       _gaq.push(['_trackPageview']);
       (function() {
           var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
           ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
           var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
       })();
   </script>

</head>
<body class="page-homepage template-homepage section-homepage environment-live">

   <div id="container">

       <div id="primary-content" role="main">

           <div id="unit-kitten-content" class="unit">

               <header id="header">
                   <h1>placekitten</h1>
               </header>

               <p title="It fits a gap in the market. "><strong>A quick and simple service for getting pictures of kittens for use as placeholders in your designs or code.</strong> Just put your image size (width <span class="amp">&</span> height) after our URL and you'll get a placeholder.</p>

               <p class="links">
                   Like this: <a href="/200/300">http://placekitten.com/200/300</a><br>
                   or: <a href="/g/200/300">http://placekitten.com/g/200/300</a>
               </p>

               <img id="image-0" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/408/287.jpg" alt="">

           </div>

           <div id="unit-many-many-pictures-of-small-cats" class="unit">
               <img id="image-1" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/200/287.jpg" alt="">
               <img id="image-2" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/200/140.jpg" alt="">
               <img id="image-3" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/200/139.jpg" alt="">
               <img id="image-4" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/200/286.jpg" alt="">
               <img id="image-5" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/96/140.jpg" alt="">
               <img id="image-6" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/96/139.jpg" alt="">
               <img id="image-7" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/200/138.jpg" alt="">
           </div>

       </div>

       <footer id="footer" role="contentinfo">
           <p>Delivered by <a href="http://mark.james.name">Mark James</a> <span class="amp">&</span> Photographers (<a href="attribution.html">Image attribution</a>) — <a href="http://placehold.it/">Inspired by placehold.it</a></p>
       </footer>

   </div>

</body>
</html>

The output is now much more readable.
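Hard-coding 'utf-8' works for this page because it is UTF-8 (its meta charset="utf-8" tag is visible in the output). A server can also declare the encoding in the Content-Type response header, and the sketch below uses that charset when present and falls back to UTF-8 otherwise. It does not look at meta tags inside the HTML. The read_text() helper is hypothetical, and a data: URL (which urllib.request opens itself) stands in for a GBK-encoded page so no network access is needed.

```python
import urllib.parse
import urllib.request


def read_text(url, default="utf-8"):
    """Hypothetical helper: fetch `url` and decode it with the declared charset."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        # get_content_charset() returns None when Content-Type carries no charset.
        charset = response.headers.get_content_charset() or default
    return raw.decode(charset)


# A data: URL whose Content-Type declares charset=gbk, standing in for a GBK web page.
url = "data:text/html;charset=gbk," + urllib.parse.quote_from_bytes("你好".encode("gbk"))
print(read_text(url))  # 你好
```

Decoding these GBK bytes with 'utf-8' instead would raise UnicodeDecodeError, which is the failure this approach avoids.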

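urllib has four submodules: urllib.request, urllib.error, urllib.parse, and urllib.robotparser. This post mainly covers some simple uses of urllib.request.

The first is the urlopen function, which opens a URL.

urlopen returns a file-like object that can be read like a file, and it also supports the following three methods:

info(): returns an object holding the headers sent back by the remote server.
getcode(): returns the HTTP status code of the response; for an HTTP request, 200 means the request succeeded and 404 means the URL was not found.
geturl(): returns the URL of the resource actually retrieved, which differs from the requested URL if a redirect was followed.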
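The three methods can be tried offline against a small local server. Note that urlopen() follows redirects automatically and raises urllib.error.HTTPError for error statuses such as 404 instead of returning a response, so the 404 code is read from the exception, which also has getcode(). The Handler class and the /old, /new paths are hypothetical.

```python
import http.server
import threading
import urllib.error
import urllib.request


class Handler(http.server.BaseHTTPRequestHandler):
    """Hypothetical server: /old redirects to /new, /new returns a page, else 404."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/new":
            body = b"hello"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):  # silence per-request logging
        pass


server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

with urllib.request.urlopen(base + "/old") as response:
    status = response.getcode()                  # status of the final response
    final_url = response.geturl()                # /new, because the redirect was followed
    content_type = response.info()["Content-Type"]
    body = response.read()

missing_status = None
try:
    urllib.request.urlopen(base + "/missing")
except urllib.error.HTTPError as err:
    missing_status = err.getcode()               # the 404 arrives as an exception

print(status, final_url, content_type, body, missing_status)

server.shutdown()
server.server_close()
```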
