Data scraping with PHP + Snoopy

A PHP scraping library
Snoopy is a PHP class that simulates a web browser: it can fetch page content and submit forms.


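Snoopy's features:
1. Fetches the content of a web page: fetch
2. Fetches the text of a web page with the HTML tags removed: fetchtext
3. Fetches the links and forms of a web page: fetchlinks, fetchform
4. Supports proxy hosts
5. Supports basic username/password authentication
6. Supports setting user_agent, referer, cookies, and header content
7. Follows browser redirects, with a configurable redirect depth
8. Expands links found in a page into fully qualified URLs (the default)
9. Submits data and returns the response
10. Follows HTML frames
11. Passes cookies along on redirects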
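
Several of the features above are switched on through public properties of the Snoopy object. A rough sketch follows, assuming the property names declared by Snoopy 1.x (proxy_host, proxy_port, user, pass, maxredirs, expandlinks, maxframes, passcookies); check them against your copy of Snoopy.class.php. The helper name configure_client, the proxy address, and the credentials are made-up placeholders.

```php
<?php
// Hypothetical helper: applies options from the feature list to a client object.
// The property names are assumed to match those declared by Snoopy 1.x.
function configure_client(object $client): object
{
    $client->proxy_host  = "proxy.example.com"; // feature 4: placeholder proxy host
    $client->proxy_port  = "8080";              // feature 4: placeholder proxy port
    $client->user        = "demo";              // feature 5: placeholder basic-auth user
    $client->pass        = "secret";            // feature 5: placeholder basic-auth password
    $client->maxredirs   = 2;                   // feature 7: follow at most two redirects
    $client->expandlinks = true;                // feature 8: expand links to full URLs
    $client->maxframes   = 1;                   // feature 10: follow one level of frames
    $client->passcookies = true;                // feature 11: keep cookies across redirects
    return $client;
}
```

With the library loaded, you would pass a real client, for example configure_client(new Snoopy), before calling fetch.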


Fetching the content of a given page
<?php
$url = "http://www.codesky.net"; 
include("lib/Snoopy.class.php"); 
$snoopy = new Snoopy; 
$snoopy->fetch($url); // fetch the whole page
$content = $snoopy->results; // the fetched content is stored in results
// print_r($content);


// Fetch only the text
$snoopy->fetchtext('http://tech.163.com/internet'); // fetch the page with the HTML tags stripped
$text = $snoopy->results;
print_r($text);


// Fetch all the links
$snoopy->fetchlinks($url); // fetchlinks is a method and needs the page URL
$link = $snoopy->results;
print_r($link);
?>
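
To get a feel for what the text and link results look like without touching the network, here is a small sketch built only on PHP's own functions. It is not Snoopy's implementation; the sample markup is invented, and the regular expression is a simplification that only handles quoted href attributes.

```php
<?php
// Invented sample markup standing in for a fetched page.
$html = "<html><body>\n<h1>News</h1>\n<p>Read <a href=\"/a.html\">A</a> and "
      . "<a href=\"http://example.com/b\">B</a>.</p>\n</body></html>";

// Roughly what a text-only fetch yields: tags removed, whitespace collapsed.
$text = trim(preg_replace('/\s+/', ' ', strip_tags($html)));

// Roughly what a link fetch yields: the href value of every anchor tag.
preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\']+)["\']/i', $html, $matches);
$links = $matches[1];

print_r($text);
print_r($links);
```

Note that the relative link /a.html stays relative here; according to the feature list, Snoopy expands such links into full URLs by default.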


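Next, let's spoof the IP address, the browser, and the session.
Note: spoofing the IP here really means forging an HTTP header, so an IP that the server reads from REMOTE_ADDR cannot be faked.
Sites that read the IP from HTTP headers instead (the kind that try to see through proxies) can be fed any IP you choose.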






<?php
include("lib/Snoopy.class.php"); 
$snoopy = new Snoopy; 
$formvars["username"] = "admin"; 
$formvars["pwd"] = "admin"; 
$action = "http://www.codesky.net"; 


$snoopy->cookies["PHPSESSID"] = 'fc106b1918bd522cc863f36890e6fff7'; // spoof the session ID
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)"; // spoof the browser
$snoopy->referer = "http://s.jb51.net"; // spoof the referring page (HTTP_REFERER)
$snoopy->rawheaders["Pragma"] = "no-cache"; // cache-control HTTP header
$snoopy->rawheaders["X-Forwarded-For"] = "127.0.0.101"; // spoof the IP; the header is spelled with hyphens, and PHP exposes it as HTTP_X_FORWARDED_FOR
$snoopy->submit($action,$formvars); 
echo $snoopy->results; 
?> 
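
On the receiving side, the difference between the two kinds of IP check can be sketched as follows. This is only an illustration of the note about REMOTE_ADDR: the function names ip_from_headers and ip_from_socket are hypothetical, the $request array stands in for PHP's $_SERVER, and 203.0.113.7 is a documentation-range address used as the "real" client.

```php
<?php
// Hypothetical server-side helpers; $server stands in for PHP's $_SERVER array.

// Trusts a client-supplied header first, so a forged X-Forwarded-For wins.
function ip_from_headers(array $server): string
{
    if (!empty($server['HTTP_X_FORWARDED_FOR'])) {
        // The header may hold a comma-separated chain; take the first entry.
        $parts = explode(',', $server['HTTP_X_FORWARDED_FOR']);
        return trim($parts[0]);
    }
    return $server['REMOTE_ADDR'] ?? '';
}

// Uses only the address of the connecting peer, which request headers cannot change.
function ip_from_socket(array $server): string
{
    return $server['REMOTE_ADDR'] ?? '';
}

$request = [
    'REMOTE_ADDR'          => '203.0.113.7',  // address of the actual connection
    'HTTP_X_FORWARDED_FOR' => '127.0.0.101',  // value sent by the scraper
];

echo ip_from_headers($request), "\n"; // the forged value
echo ip_from_socket($request), "\n";  // the real address
```

A site that logs or filters by the first function sees whatever the scraper sends; a site that uses the second is unaffected by the forged header.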
