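Using file.read()
Overview
The read() method reads the specified number of bytes from a file. If size is omitted or negative, it reads everything up to the end of the file.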
Syntax
The syntax of the read() method is:
fileObject.read([size])
Parameters
- size -- the number of bytes to read from the file. Optional.
Return value
Returns the bytes read from the file.
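The effect of the size argument can be sketched as follows, assuming Python 3. The file name sample.txt and its contents are made up for illustration. Note that in Python 3 a file opened in text mode counts characters and returns a str, while a file opened in binary mode ("rb") counts bytes and returns bytes.

```python
import os
import tempfile

# Hypothetical sample file, created only so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("1:www.runoob.com\n2:www.runoob.com\n")

with open(path, "r", encoding="utf-8") as f:
    first = f.read(5)   # at most 5 characters
    rest = f.read()     # no size given: everything that is left
    at_end = f.read()   # already at end of file: an empty string

with open(path, "rb") as f:
    raw = f.read(-1)    # a negative size also reads everything, here as bytes

print(repr(first))
print(repr(at_end))
print(type(raw).__name__)
```

Because read() returns an empty result at end of file, an empty string (or empty bytes) is the usual signal that there is nothing more to read.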
Example
The file runoob.txt contains:
1:www.runoob.com
2:www.runoob.com
3:www.runoob.com
4:www.runoob.com
5:www.runoob.com
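Read the first 10 characters of the file:
#!/usr/bin/python3
# -*- coding: UTF-8 -*-

# Open the file for reading and writing
fo = open("runoob.txt", "r+")
print("File name:", fo.name)

# Read at most 10 characters from the start of the file
line = fo.read(10)
print("String read: %s" % line)

# Close the file
fo.close()
The example above prints:
File name: runoob.txt
String read: 1:www.runo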
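To read a whole file piece by piece rather than only its first 10 characters, read() can be called in a loop until it returns an empty result. The following is a minimal sketch, not part of the original example: the helper name read_in_chunks is hypothetical, and io.StringIO stands in for an opened text file so the code runs without runoob.txt.

```python
import io


def read_in_chunks(file_obj, chunk_size=10):
    """Yield successive chunks until read() returns an empty result."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            break  # empty string (or bytes) means end of file
        yield chunk


# io.StringIO behaves like a file opened in text mode.
sample = io.StringIO("1:www.runoob.com\n2:www.runoob.com\n")
chunks = list(read_in_chunks(sample))
print(chunks)
```

The same loop works with a real file object returned by open(), and it keeps memory use bounded by chunk_size for large files.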
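Reading a web page
import urllib.request
# Open the URL; the response can be read like a file
response = urllib.request.urlopen('http://placekitten.com')
# read() returns the page body as bytes
html = response.read()
print(html)
Output: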
b'<!DOCTYPE html>\n<!--[if lt IE 7 ]> <html lang="en" class="no-js ie6"> <![endif]-->\n<!--[if IE 7 ]> <html lang="en" class="no-js ie7"> <![endif]-->\n<!--[if IE 8 ]> <html lang="en" class="no-js ie8"> <![endif]-->\n<!--[if IE 9 ]> <html lang="en" class="no-js ie9"> <![endif]-->\n<!--[if (gt IE 9)|!(IE)]><!--> <html lang="en" class="no-js"> <!--<![endif]-->\n<head xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:og="http://ogp.me/ns#"><meta charset="utf-8">\n\n\t<!--\n\n\tBREAKING NEWS: KITTENS FKN LOVE YARN\n\n __ __,\n \\,`~"~` /\n .-=-. / . .\\\n / .-. \\ { = Y}=\n (_/ \\ \\ \\ / \n \\ \\ _/`\'`\'`b\n \\ `.__.-\'` \\-._\n | \'.__ `\'-;_\n | _.\' `\'-.__)\n \\ ;_..-`\'/ // \\\n | / / | // |\n \\ \\ \\__) \\ // /\n \\__) \'.// .\'\n `\'-\'`\n\n\t-->\n\n\t<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">\n\t<meta http-equiv="imagetoolbar" content="no">\n\n\t<!-- Standard Metadata -->\n\t<title>{placekitten}</title>\n\t<meta name="description" content="Kitten-themed placeholder images for developers :3">\n\t<meta name="author" content="Mark James (http://mark.james.name)">\n\t<meta name="keywords" content="kittens, placeholders, images, cats, felines, kittycats, hot pussy photos">\n\t<meta name="copyright" content="Mark James. Images © people on the internet.">\n\t<meta name="robots" content="all">\n\t<meta
html = html.decode('utf-8')
print(html)
Output:
<!DOCTYPE html>
<!--[if lt IE 7 ]> <html lang="en" class="no-js ie6"> <![endif]-->
<!--[if IE 7 ]> <html lang="en" class="no-js ie7"> <![endif]-->
<!--[if IE 8 ]> <html lang="en" class="no-js ie8"> <![endif]-->
<!--[if IE 9 ]> <html lang="en" class="no-js ie9"> <![endif]-->
<!--[if (gt IE 9)|!(IE)]><!--> <html lang="en" class="no-js"> <!--<![endif]-->
<head xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:og="http://ogp.me/ns#"><meta charset="utf-8">
<!--
BREAKING NEWS: KITTENS FKN LOVE YARN
__ __,
\,`~"~` /
.-=-. / . .\
/ .-. \ { = Y}=
(_/ \ \ \ /
\ \ _/`'`'`b
\ `.__.-'` \-._
| '.__ `'-;_
| _.' `'-.__)
\ ;_..-`'/ // \
| / / | // |
\ \ \__) \ // /
\__) '.// .'
`'-'`
-->
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta http-equiv="imagetoolbar" content="no">
<!-- Standard Metadata -->
<title>{placekitten}</title>
<meta name="description" content="Kitten-themed placeholder images for developers :3">
<meta name="author" content="Mark James (http://mark.james.name)">
<meta name="keywords" content="kittens, placeholders, images, cats, felines, kittycats, hot pussy photos">
<meta name="copyright" content="Mark James. Images © people on the internet.">
<meta name="robots" content="all">
<meta name="viewport" content="width=device-width; initial-scale=1.0;">
<!-- Dublin Core Metadata -->
<meta name="DC.identifier" content="http://example.com" />
<meta name="DC.creator" lang="en" content="Mark James">
<meta name="DC.date.created" lang="en" content="2011-02-26">
<meta name="DC.date.modifed" lang="en" content="2005-02-26">
<meta name="DC.format" lang="en" content="text/html">
<meta name="DC.language" content="en">
<meta name="DC.coverage" lang="en" content="UK">
<meta name="DC.publisher" lang="en" content="Mark James" />
<!-- Open Graph Metadata -->
<meta property="og:site_name" content="placekitten">
<meta property="og:title" content="placekitten">
<meta property="og:description" content="Kitten-themed placeholder images for developers :3">
<meta property="og:type" content="website">
<meta property="og:image" content="http://placekitten.com/100/100">
<meta property="og:url" content="http://placekitten.com">
<!-- Navigation -->
<link rel="home" href="/">
<!-- Icons -->
<link rel="shortcut icon" href="/favicon.ico">
<link rel="apple-touch-icon" href="/apple-touch-icon.png">
<!-- CSS -->
<link rel="stylesheet" href="assets/css/html5kit/style.css">
<link rel="stylesheet" href="assets/css/placekitten.css">
<!-- Analytics -->
<script>
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-21694863-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body class="page-homepage template-homepage section-homepage environment-live">
<div id="container">
<div id="primary-content" role="main">
<div id="unit-kitten-content" class="unit">
<header id="header">
<h1>placekitten</h1>
</header><!-- /#header -->
<p title="It fits a gap in the market. "><strong>A quick and simple service for getting pictures of kittens for use as placeholders in your designs or code.</strong> Just put your image size (width <span class="amp">&</span> height) after our URL and you'll get a placeholder.</p>
<p class="links">
Like this: <a href="/200/300">http://placekitten.com/200/300</a><br>
or: <a href="/g/200/300">http://placekitten.com/g/200/300</a>
</p>
<img id="image-0" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/408/287.jpg" alt="">
</div>
<div id="unit-many-many-pictures-of-small-cats" class="unit">
<img id="image-1" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/200/287.jpg" alt="">
<img id="image-2" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/200/140.jpg" alt="">
<img id="image-3" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/200/139.jpg" alt="">
<img id="image-4" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/200/286.jpg" alt="">
<img id="image-5" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/96/140.jpg" alt="">
<img id="image-6" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/96/139.jpg" alt="">
<img id="image-7" src="http://placekitten.com.s3.amazonaws.com/homepage-samples/200/138.jpg" alt="">
</div>
</div><!-- /#primary-content -->
<footer id="footer" role="contentinfo">
<p>Delivered by <a href="http://mark.james.name">Mark James</a> <span class="amp">&</span> Photographers (<a href="attribution.html">Image attribution</a>) — <a href="http://placehold.it/">Inspired by placehold.it</a></p>
</footer><!-- /#footer -->
</div><!-- /#container -->
</body>
</html>
The decoded output is much easier to read.
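The example above hard-codes 'utf-8'. One possible refinement is to take the charset from the Content-Type header when the server sends one. The sketch below works offline under stated assumptions: decode_body is a hypothetical helper, and the Message object and the bytes literal stand in for response.headers and response.read().

```python
from email.message import Message


def decode_body(raw, headers, default="utf-8"):
    """Decode raw bytes using the charset named in headers, else default."""
    charset = headers.get_content_charset() or default
    # errors="replace" avoids an exception on bytes that do not fit the charset
    return raw.decode(charset, errors="replace")


# Stand-ins for response.headers and response.read() (illustrative values).
headers = Message()
headers["Content-Type"] = "text/html; charset=utf-8"
raw = b'<title>{placekitten}</title>\n<meta charset="utf-8">\n'

text = decode_body(raw, headers)
print(raw)   # bytes: shown as one line with \n escapes
print(text)  # str: line breaks are rendered
```

With a live response, the call would look like decode_body(response.read(), response.headers), since the headers object of an HTTP response is a subclass of email.message.Message.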
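The urllib package has four submodules: urllib.request, urllib.error, urllib.parse, and urllib.robotparser. This section covers a few simple uses of urllib.request.
The first is the urlopen function, which opens a URL.
urlopen returns a file-like object that can be used like a file and also supports the following three methods: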
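- info(): returns an object representing the header information sent back by the remote server.
- getcode(): returns the HTTP status code. For an HTTP request, 200 means the request completed successfully and 404 means the URL was not found.
- geturl(): returns the URL that was requested.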
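The three methods can be tried without touching the network. This is a sketch under stated assumptions: HelloHandler, fetch_info, and the loopback server are made up for illustration, and proxies are disabled so that the request reaches the local server directly. Since Python 3.9 the documentation marks info(), getcode(), and geturl() as deprecated in favor of the headers, status, and url attributes.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short UTF-8 page."""

    def do_GET(self):
        body = "<h1>hello</h1>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def fetch_info(url):
    """Open url and collect what info(), getcode(), and geturl() report."""
    with urllib.request.urlopen(url) as response:
        return {
            "content_type": response.info().get("Content-Type"),
            "code": response.getcode(),
            "url": response.geturl(),
            "body": response.read(),
        }


# Assumption: ignore any proxy configured in the environment so the
# loopback request goes straight to the local server.
urllib.request.install_opener(
    urllib.request.build_opener(urllib.request.ProxyHandler({}))
)

# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

local_url = "http://127.0.0.1:%d/" % server.server_port
result = fetch_info(local_url)
server.shutdown()
server.server_close()

print(result["code"])
print(result["url"])
print(result["content_type"])
print(result["body"].decode("utf-8"))
```

Against a real site the same fetch_info call applies unchanged. If the server redirects, geturl() reports the final URL, which may differ from the one passed in.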