Beginning Python (Python基础教程) study notes, part 15

Latest recommended article published on 2023-12-30 15:57:14

retacn


Tags: python web.xml java

Python and the World Wide Web

1 Screen scraping

Extracting information with urllib and re

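from urllib.request import urlopen
import re

# Capture the href and the link text of anchors inside <h3> headings
p = re.compile('<h3><a .*?><a .*? href="(.*?)">(.*?)</a>')
# read() returns bytes in Python 3, so decode before matching
text = urlopen('http://python.org/community/jobs').read().decode('utf-8')
for url, name in p.findall(text):
    print('%s (%s)' % (name, url))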
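To see what such a pattern captures without touching the network, here is a minimal sketch that runs a simplified pattern against an inline HTML string. The sample markup, the simplified pattern, and the name `extract_links` are assumptions made for illustration; the layout of the real page may differ.

```python
import re

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = '''
<h3><a href="/jobs/1">Python Developer</a></h3>
<h3><a href="/jobs/2">Data Engineer</a></h3>
<p><a href="/about">About</a></p>
'''

# Simplified pattern: a link directly inside an <h3> heading.
# Group 1 is the href value, group 2 is the link text.
LINK_IN_H3 = re.compile(r'<h3><a\s+href="(.*?)">(.*?)</a>')

def extract_links(html):
    """Return a list of (url, name) tuples for links inside <h3> tags."""
    return LINK_IN_H3.findall(html)

for url, name in extract_links(SAMPLE_HTML):
    print('%s (%s)' % (name, url))
```

The link inside the `<p>` element is skipped because the pattern requires an enclosing `<h3>`. Regular expressions like this are brittle: a small change in the markup breaks the match.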

Tidy and XHTML parsing

Tidy is a tool for repairing ill-formed, sloppy HTML.

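# Fix HTML with tidy
from subprocess import Popen, PIPE

# Read the messy HTML as bytes, because the pipe expects bytes in Python 3
text = open('messy.html', 'rb').read()
tidy = Popen('tidy', stdin=PIPE, stdout=PIPE, stderr=PIPE)
# communicate() writes the input, closes stdin, and reads both output pipes,
# which avoids a deadlock when tidy produces a lot of output
out, err = tidy.communicate(text)
print(out.decode('utf-8', 'replace'))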

Parsing HTML with HTMLParser

# A screen-scraping program that uses the HTMLParser module
from urllib.request import urlopen
from html.parser import HTMLParser

class Scraper(HTMLParser):
    in_h3 = False
    in_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'h3':
            self.in_h3 = True
        if tag == 'a' and 'href' in attrs:
            # Start collecting the text of this link
            self.in_link = True
            self.chunks = []
            self.url = attrs['href']

    def handle_data(self, data):
        if self.in_link:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == 'h3':
            self.in_h3 = False
        if tag == 'a':
            # Only report links that sit inside an <h3> heading
            if self.in_h3 and self.in_link:
                print('%s (%s)' % (''.join(self.chunks), self.url))
            self.in_link = False

# read() returns bytes in Python 3; feed() expects str
text = urlopen('http://python.org/community/jobs').read().decode('utf-8')
parser = Scraper()
parser.feed(text)
parser.close()
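The same event-driven approach can be tried offline. The sketch below is a variant that collects results in a list instead of printing them and is fed an inline string; the class name `LinkCollector` and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (text, url) pairs for links that appear inside <h3> tags."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.in_link = False
        self.chunks = []
        self.url = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'h3':
            self.in_h3 = True
        if tag == 'a' and 'href' in attrs:
            self.in_link = True
            self.chunks = []
            self.url = attrs['href']

    def handle_data(self, data):
        if self.in_link:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == 'h3':
            self.in_h3 = False
        if tag == 'a':
            if self.in_h3 and self.in_link:
                self.links.append((''.join(self.chunks), self.url))
            self.in_link = False

# Hypothetical sample markup; note the nested <b> inside the first link.
SAMPLE_HTML = (
    '<h3><a href="/jobs/1">Python <b>Developer</b></a></h3>'
    '<p><a href="/about">About</a></p>'
)

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
collector.close()
print(collector.links)
```

Because the text is gathered chunk by chunk, nested markup inside the link does not break the extraction, which is where this approach tends to be more robust than a single regular expression.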

Beautiful Soup is a library for parsing and inspecting ill-formed HTML.

2 Creating dynamic web pages with CGI

CGI stands for Common Gateway Interface.

A Prepare the web server

B Add the pound-bang (shebang) line

Linux:

#!/usr/bin/env python

or

#!/usr/bin/python

Windows:

#!c:\python32\python.exe

C Set the file permissions

On Linux the script must be made executable, for example:

chmod 755 someScript.cgi
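The same permission bits can also be set from Python. This is a minimal sketch for a POSIX system; it uses a temporary file as a stand-in for someScript.cgi, so the file name here is an assumption.

```python
import os
import stat
import tempfile

# Create a throwaway file standing in for someScript.cgi.
fd, path = tempfile.mkstemp(suffix='.cgi')
os.close(fd)

# 0o755 = read/write/execute for the owner, read/execute for group and others.
os.chmod(path, 0o755)
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))

os.remove(path)
```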

A simple CGI script

#!D:\Python32\python.exe
print('Content-type: text/html')
print()  # blank line that ends the headers
print('Hello, world!')
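The blank line matters because it separates the HTTP headers from the body. A small sketch of that layout, using a hypothetical helper named `build_cgi_response` that is not part of any library:

```python
def build_cgi_response(body, content_type='text/html'):
    """Return the text a CGI script writes to stdout: headers, blank line, body."""
    headers = 'Content-type: %s' % content_type
    # The empty line between the headers and the body is mandatory.
    return headers + '\n\n' + body + '\n'

response = build_cgi_response('Hello, world!')
print(response, end='')
```

Without the empty line, the server would likely treat the body as a malformed header and report an error.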

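This program was tested under Tomcat, where CGI support must first be enabled with the following changes.

Configuration steps:

Edit conf/web.xml and uncomment the following two blocks: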

<servlet>
    <servlet-name>cgi</servlet-name>
    <servlet-class>org.apache.catalina.servlets.CGIServlet</servlet-class>
    <init-param>
        <param-name>debug</param-name>
        <param-value>0</param-value>
    </init-param>
    <init-param>
        <param-name>cgiPathPrefix</param-name>
        <param-value>WEB-INF/cgi</param-value>
    </init-param>
    <load-on-startup>5</load-on-startup>
</servlet>

<servlet-mapping>
    <servlet-name>cgi</servlet-name>
    <url-pattern>/cgi-bin/*</url-pattern>
</servlet-mapping>

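Edit conf/context.xml and add the privileged attribute:

<Context privileged="true">...</Context>

Place the CGI program in the WEB-INF/cgi directory.

On Linux, give the CGI program execute permission.

Restart the Tomcat server.

Access the program at http://localhost:8089/cgi-bin/somescript.cgi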

Debugging with cgitb

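#!D:\Python32\python.exe
# Use cgitb for debugging; turn it off once development is finished
import cgitb
cgitb.enable()

print('Content-type: text/html')
print()  # blank line that ends the headers
print(1/0)  # deliberately raises ZeroDivisionError
print('Hello, world!')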

The page then shows cgitb's formatted traceback for the ZeroDivisionError raised by 1/0.
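The cgitb module was deprecated in Python 3.11 and removed in Python 3.13. On newer interpreters a rough stand-in can be built with the traceback module. The sketch below is an assumption about how one might do that, not a reimplementation of cgitb; the names `run_with_traceback_page` and `broken_page` are hypothetical.

```python
import html
import traceback

def run_with_traceback_page(func):
    """Call func(); on error, return an HTML page containing the traceback."""
    try:
        return func()
    except Exception:
        # Escape the traceback so it is displayed as text, not parsed as HTML.
        details = html.escape(traceback.format_exc())
        return '<html><body><pre>%s</pre></body></html>' % details

def broken_page():
    return str(1 / 0)  # raises ZeroDivisionError, like the script above

page = run_with_traceback_page(broken_page)
print(page)
```

As with cgitb, a page like this exposes internal details and is best disabled once development is finished.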

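Using the cgi module

An HTML form supplies key-value pairs to the CGI script; the FieldStorage class of the cgi module retrieves these fields inside the script.

form = cgi.FieldStorage()

name = form['name'].value

Example code: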

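#!D:\Python32\python.exe
# Use cgitb for debugging; turn it off once development is finished
import cgi
import cgitb
cgitb.enable()

# Read the form value, falling back to 'world' when the field is missing
form = cgi.FieldStorage()
name = form.getvalue('name', 'world')

print('Content-type: text/html')
print()  # blank line that ends the headers
# print(1/0)
print('Hello, %s!' % name)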

You can test it directly with a GET request:

http://localhost:8089/cgi-bin/somescript.cgi?name=retacn
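The cgi module was deprecated in Python 3.11 and removed in Python 3.13. For GET requests, a similar lookup can be sketched with urllib.parse. The helper name `get_field` is hypothetical; in a real CGI script the query string would normally come from the QUERY_STRING environment variable rather than from a full URL.

```python
from urllib.parse import urlsplit, parse_qs

def get_field(url, field, default=None):
    """Return the first value of a query-string field, or default if absent."""
    query = urlsplit(url).query
    values = parse_qs(query).get(field)
    return values[0] if values else default

url = 'http://localhost:8089/cgi-bin/somescript.cgi?name=retacn'
print(get_field(url, 'name', 'world'))
print(get_field('http://localhost:8089/cgi-bin/somescript.cgi', 'name', 'world'))
```

The second call has no name parameter, so it falls back to the default, mirroring the second argument of getvalue.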

A simple form

Example code:

#!D:\Python32\python.exe
# A simple form
import cgi

form = cgi.FieldStorage()
name = form.getvalue('name', 'world')

# The blank line after the Content-type header separates headers from the body
print("""Content-type: text/html

<html>
<head>
<title>Greeting Page</title>
</head>
<body>
<h1>Hello, %s!</h1>
<form action='formTest.cgi'>
Change name <input type='text' name='name'>
<input type='submit'>
</form>
</body>
</html>
""" % name)

The resulting page shows the greeting heading followed by a form for changing the name.
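Because the name comes straight from user input, inserting it into the page unchanged lets a visitor inject markup. A minimal sketch of escaping the value first, assuming a hypothetical `render_greeting` helper and a shortened template:

```python
import html

# Shortened, hypothetical version of the greeting page template.
PAGE_TEMPLATE = """<html>
<head><title>Greeting Page</title></head>
<body>
<h1>Hello, %s!</h1>
</body>
</html>"""

def render_greeting(name):
    """Fill the template with an HTML-escaped name."""
    return PAGE_TEMPLATE % html.escape(name)

print(render_greeting('<script>alert(1)</script>'))
```

With escaping in place, the angle brackets arrive in the browser as `&lt;` and `&gt;` and are displayed as text instead of being interpreted as tags.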

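mod_python

mod_python is an extension to the Apache web server that embeds the Python interpreter in Apache itself.

With mod_python you can hook into Apache's internals.

Built-in web handlers:

CGI handler

PSP handler

Publisher handler

Installing mod_python

CGI handler

PSP

Publisher

Web application frameworks

Web services: scraping done right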
