Java example: scraping website updates and emailing them (Spring Boot)

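At our internet company, the sales team takes on projects and has to keep a close eye on the announcements on 金采网 (cfcpn.com), but they often miss them. The idea was to use Python or Java to fetch the tenders published each day and send them to a mailbox by email.

When I first heard this requirement I was completely lost. I had only just installed Python and had no idea how to write it, so I searched online for existing write-ups. Unfortunately I knew nothing about crawler techniques and was not going to get there in a short time, so in the end I went back to what I do for a living and wrote it in Java.

Requirement: fetch the announcements published each day at http://www.cfcpn.com/plist/caigou and http://www.cfcpn.com/plist/zhengji, then email them to a configurable list of recipient mailboxes.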


How the email looks: the blue text is a hyperlink that opens the corresponding article on the site.


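Scraping page updates and sending them to a mailbox takes two steps:

1. Get the information you want from the web page

2. Send that information to the mailbox

The following two articles showed me how to get this done.

How to write a web crawler in Java to fetch web pages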
https://jingyan.baidu.com/album/2c8c281db5f6970009252a60.html?picindex=7

Sending email with Java (QQ Mail)
https://jingyan.baidu.com/album/c910274bb41859cd361d2d03.html?picindex=4

Now for the code. The whole project is built on Spring Boot.


Full code:

1. pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<groupId>com.feeling.listener</groupId>
	<artifactId>catchdemo</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<packaging>jar</packaging>
	<properties>
		<spring.version>4.3.9.RELEASE</spring.version>
		<slf4j.version>1.7.12</slf4j.version>
		<log4j.version>1.2.14</log4j.version>
		<commons.logging.version>1.1.1</commons.logging.version>
		<commons.pool.version>1.6</commons.pool.version>
		<logback.logstash.version>4.9</logback.logstash.version>
	</properties>

	<!-- Required by Spring Boot -->
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>1.4.5.RELEASE</version>
		<relativePath /> <!-- lookup parent from repository -->
	</parent>


	<dependencies>
		
		<!-- https://mvnrepository.com/artifact/commons-httpclient/commons-httpclient -->
		<dependency>
			<groupId>commons-httpclient</groupId>
			<artifactId>commons-httpclient</artifactId>
			<version>3.1</version>
		</dependency>
		<!-- JavaMail dependencies for sending mail through QQ Mail -->
		<dependency>
			<groupId>javax.activation</groupId>
			<artifactId>activation</artifactId>
			<version>1.1</version>
		</dependency>
		<dependency>
			<groupId>javax.mail</groupId>
			<artifactId>mail</artifactId>
			<version>1.4</version>
		</dependency>
		<!-- jsoup: loads and parses the web page from a file -->
		<dependency>
			<groupId>org.jsoup</groupId>
			<artifactId>jsoup</artifactId>
			<version>1.7.3</version>
		</dependency>

		<!-- Required by Spring Boot -->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
		
		<!-- Spring Boot web starter -->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>
		
		
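		<!-- Builds on the basic IoC functionality with extended services, plus support for many enterprise features: mail, task scheduling, JNDI lookup, EJB integration, remoting, caching, and several view-layer frameworks. -->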
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-context</artifactId>
		</dependency>
		<!-- Spring core utilities -->
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-core</artifactId>
		</dependency>
		<!-- Basic implementation of Spring IoC: reading configuration files, creating and managing beans, and so on. -->
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-beans</artifactId>
		</dependency>
		
		<!-- Extensions to the Spring context: integration support for third-party libraries such as JavaMail and Quartz -->
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-context-support</artifactId>
		</dependency>
		
		
		
		
		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-api</artifactId>
		</dependency>
		
		
		<dependency>
			<groupId>commons-fileupload</groupId>
			<artifactId>commons-fileupload</artifactId>
			<version>1.3.1</version>
		</dependency>
		<dependency>
			<groupId>org.apache.httpcomponents</groupId>
			<artifactId>httpclient</artifactId>
		</dependency>
		
		
		
		
		
		
	</dependencies>
	<build>
		<plugins>
			<!-- JDK (compiler) level configuration -->
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<configuration>
					<version>2.5.1</version>
					<source>1.7</source>
					<target>1.7</target>
				</configuration>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-jar-plugin</artifactId>
				<configuration>
					<archive>
						<addMavenDescriptor>false</addMavenDescriptor>
						<manifest>
							<addClasspath>true</addClasspath>
							<classpathPrefix>lib/</classpathPrefix>
							<mainClass>com.feeling.mail.CatchDemoApplication</mainClass>
						</manifest>
					</archive>
				</configuration>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-dependency-plugin</artifactId>
				<executions>
					<execution>
						<id>copy</id>
						<phase>package</phase>
						<goals>
							<goal>copy-dependencies</goal>
						</goals>
						<configuration>
							<outputDirectory>${project.build.directory}/lib</outputDirectory>
						</configuration>
					</execution>
				</executions>
			</plugin>
		</plugins>
		<!-- Without the resources configuration, the contents of webapp are not packaged into the jar by default; see http://blog.csdn.net/u012849872/article/details/51035938 -->
		<resources>
			<!-- Copy the JSP files into META-INF when packaging -->
			<resource>
				<!-- Which directory's resource files the resources plugin should process -->
				<directory>src/main/webapp</directory>
				<!-- Package the contents of src/main/webapp under META-INF/resources -->
				<targetPath>META-INF/resources</targetPath>
				<includes>
					<include>**/**</include>
				</includes>
			</resource>
			<resource>
				<directory>src/main/resources</directory>
				<includes>
					<include>**/**</include>
				</includes>
				<filtering>false</filtering>
			</resource>
		</resources>


	</build>

</project>
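2. params.properties: you need to fill in the parameters in this file yourself

# SMTP host, fixed
email.host=smtp.qq.com
# Sender mailbox
email.sendMail=
# Authorization code generated when enabling the QQ Mail service (see the second article); this project does not need the QQ password or the mailbox's real password
email.sendPassword=
# Multiple recipients, separated by commas
email.receiveMail=a.@qq.com,b.@qq.com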
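
One detail worth checking when filling in this file: in the `.properties` format only lines starting with `#` or `!` are comments, so a note written after a value with `//` becomes part of the value. The sketch below only illustrates that behaviour with the standard `java.util.Properties` loader; `PropertiesCommentDemo` and `valueOf` are hypothetical names, not part of the project.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical demo class, not part of the original project.
final class PropertiesCommentDemo {

    // Loads .properties text from a string and returns the value stored under the key.
    static String valueOf(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked just in case.
            throw new IllegalStateException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // "//" does not start a comment, so the note ends up inside the value.
        System.out.println(valueOf("email.host=smtp.qq.com// fixed\n", "email.host"));
        // A "#" line is a real comment and leaves the value clean.
        System.out.println(valueOf("# fixed\nemail.host=smtp.qq.com\n", "email.host"));
    }
}
```

If the SMTP host or the recipient list picks up such trailing text, the connection or the address parsing is likely to fail, so it is safer to keep notes on their own `#` lines.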
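3. CatchDemoApplication.java: the Spring Boot startup class

No database is configured in this project, so the following annotation with its exclusions is required; otherwise startup fails with an error.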

@SpringBootApplication(exclude = {DataSourceAutoConfiguration.class,DataSourceTransactionManagerAutoConfiguration.class,HibernateJpaAutoConfiguration.class})

package com.feeling.mail;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration;
import org.springframework.boot.autoconfigure.jdbc.DataSourceTransactionManagerAutoConfiguration;
import org.springframework.boot.autoconfigure.orm.jpa.HibernateJpaAutoConfiguration;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.scheduling.annotation.EnableScheduling;


@EnableScheduling // enables scheduled tasks
@ComponentScan(basePackages={"com.feeling.mail.*"})
@SpringBootApplication(exclude = {DataSourceAutoConfiguration.class,DataSourceTransactionManagerAutoConfiguration.class,HibernateJpaAutoConfiguration.class})
public class CatchDemoApplication {
	
	public static void main(String[] args) {
		SpringApplication.run(CatchDemoApplication.class, args);
	}
	
	

}

4. CatchSchdule.java: the scheduled task

package com.feeling.mail.batch;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import com.feeling.mail.controller.CatchNews;

@Component
public class CatchSchdule {
	
	
	@Autowired
	private CatchNews catchnews;
	
	
	// Daily at 15:23:00 (commented out)
//	@Scheduled(cron = "0 23 15 * * ?") 
	// Every 3 minutes
	@Scheduled(cron = "0 0/3 * * * ?") 
	public void CatchNewToSendMail() throws Exception {
		// 1. Procurement announcements
		catchnews.catchNews("http://www.cfcpn.com/plist/caigou","采购公告","D:\\caigou.html");
		
			
		// 2. Sourcing/solicitation announcements
		catchnews.catchNews("http://www.cfcpn.com/plist/zhengji","寻源/征集公告","D:\\zhengji.html");
		
	}
	
	

}
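
A note on the two cron expressions above: Spring's cron format has six fields (second, minute, hour, day of month, month, day of week), so `0 0/3 * * * ?` fires every three minutes and `0 23 15 * * ?` fires once a day at 15:23:00. With the three-minute schedule, each run mails today's announcements again. One way to avoid the repeats is to remember which links have already been sent. The following is only a minimal sketch under assumptions: `SentLinkTracker` and `filterNew` are hypothetical names that do not exist in the project, and the set lives in memory, so it is lost on restart.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the original project.
final class SentLinkTracker {

    // Links that were already mailed; kept in memory only, so a restart clears it.
    private final Set<String> sent = new HashSet<String>();

    // Returns the links not seen before, in their original order, and records them as sent.
    synchronized List<String> filterNew(List<String> links) {
        List<String> fresh = new ArrayList<String>();
        for (String link : links) {
            if (sent.add(link)) { // add() returns false when the link was already recorded
                fresh.add(link);
            }
        }
        return fresh;
    }
}
```

The scheduled method would then send mail only when `filterNew` returns a non-empty list. Storing the links in a database instead, as the comments in CatchNews suggest, would also survive restarts.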


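5. CatchNews.java: fetches the project information published on the page today. This step matters: if you want to scrape updates from a different page, this is the code you need to change.

The code checks the date and only assembles entries published today.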
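
The date check can be sketched in isolation. The listing takes the date with a fixed `substring(5, 15)`, which depends on the exact layout of the date text on the page. As a hedged alternative, the sketch below looks for the first `yyyy-MM-dd` pattern instead; `PublishDateFilter` and its methods are hypothetical names, and the sample strings in the usage note are made up, not taken from the site.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original project.
final class PublishDateFilter {

    // Matches a date written as four digits, two digits, two digits, separated by hyphens.
    private static final Pattern DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    // Returns the first yyyy-MM-dd date found in the text, or null if there is none.
    static String extractDate(String text) {
        Matcher m = DATE.matcher(text);
        return m.find() ? m.group() : null;
    }

    // True when the text carries exactly the given day (formatted as yyyy-MM-dd).
    static boolean isPublishedOn(String text, String day) {
        return day.equals(extractDate(text));
    }

    // Today's date in the same yyyy-MM-dd format that dateToString uses in the listing.
    static String today() {
        return new SimpleDateFormat("yyyy-MM-dd").format(new Date());
    }
}
```

For example, `PublishDateFilter.isPublishedOn("Posted: 2018-03-01 10:00", PublishDateFilter.today())` is true only when run on 2018-03-01, and text without a date never matches, so a layout change on the page leads to an empty result instead of a `StringIndexOutOfBoundsException`.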

package com.feeling.mail.controller;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Scanner;

import org.apache.http.HttpEntity;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;


/**
 * 
 * @author liangwenbo The catchNews method is called by the scheduler to fetch the announcements and send them by email
 */
@Component
public class CatchNews {

	
	@Autowired
	private SendMail sendmail;

	
	
	public void catchNews(String url, String title, String fileName) throws Exception {
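		// 1. Fetch the 金采网 announcement list page and save it as an HTML file
		// Two arguments: the URL of the page to fetch, and the path and name of the output file
		getHtml(url, fileName);
		// 2. Get the href and subject of the <a></a> tag of each entry
		File input = new File(fileName);
		// Parse the file; the base URL argument is optional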
		Document doc = Jsoup.parse(input, "UTF-8", url);
		Date now = new Date();
		String nowDate = dateToString(now);
		String context = "";
		
		// Email content: title, time, URL, brief description
		Elements contents = doc.select(".cfcpn_list_content");

		for (Element content : contents) {
			
			
			// Iterate over all entries
			// Get the date
			String dateDate = content.select(".cfcpn_list_date").first().text();
			String date = dateDate.substring(5, 15);
			// Only entries dated today
			if (nowDate.equals(date)) {
				context = context +  content.toString()+"<br /><br />";
			
			}

		}
		System.out.println(context);

		
		// 4. Send the collected content to the configured mailboxes, with the title as the subject
		
		sendmail.sendMail(title+nowDate, context.replace("/front/newcontext", "http://www.cfcpn.com/front/newcontext"));
	
		// Look the URL up in the database: if the same URL is not found, send the mail and store it; if it is found, do nothing
		// Make the lookup more precise

	}

	private static void getHtml(String httpUrl, String filename) throws ClientProtocolException, IOException {

		// try-with-resources closes the client, response, scanner and writer even if an exception is thrown
		try (CloseableHttpClient httpClient = HttpClients.createDefault();
				CloseableHttpResponse response = httpClient.execute(new HttpGet(httpUrl))) {
			HttpEntity entity = response.getEntity();
			// Get the InputStream from the entity and write the response body to the file.
			// Read and write as UTF-8 so the file matches the charset passed to Jsoup.parse.
			try (Scanner sc = new Scanner(entity.getContent(), "UTF-8");
					PrintWriter os = new PrintWriter(filename, "UTF-8")) {
				while (sc.hasNextLine()) {
					os.println(sc.nextLine());
				}
			}
		}

	}
	
	public  String dateToString(Date date){
		SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
		return sdf.format(date);
	}

}
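
The `context.replace("/front/newcontext", ...)` call in `catchNews` turns the site-relative links into absolute ones so they still work inside an email. A more general way to do the same thing with the standard library alone is to resolve each href against the URL of the page it came from. This is only a sketch: `LinkResolver` and `toAbsolute` are hypothetical names, and the article path in the usage note is invented for illustration.

```java
import java.net.URI;

// Hypothetical helper, not part of the original project.
final class LinkResolver {

    // Resolves an href against the page it was found on.
    // Site-relative hrefs get the page's scheme and host; absolute hrefs come back unchanged.
    static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }
}
```

With this, `LinkResolver.toAbsolute("http://www.cfcpn.com/plist/caigou", "/front/newcontext/123")` gives `http://www.cfcpn.com/front/newcontext/123`, and an href that is already absolute is not prefixed a second time. Note that `URI.create` throws `IllegalArgumentException` for hrefs containing characters that are illegal in a URI, such as spaces, so real page data may need cleaning first.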
6. SendMail.java: sends the email

package com.feeling.mail.controller;

import java.util.Date;
import java.util.Properties;

import javax.mail.Message;
import javax.mail.Message.RecipientType;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.PropertySource;
import org.springframework.stereotype.Component;


/**
 * Sends email from Java
 */
@Component
@PropertySource(value = "classpath:params.properties")
public class SendMail {

	@Value("${email.host}")
	private String host;
	@Value("${email.sendMail}")
	private String sendMail;
	@Value("${email.sendPassword}")
	private String sendPassword;
	@Value("${email.receiveMail}")
	private String receiveMail;
	
	
	

	public void sendMail(String title, String context) throws Exception {
		// Mail parameter settings
		Properties props = new Properties();
		props.setProperty("mail.transport.protocol", "smtp");
		props.setProperty("mail.smtp.host", host);
		props.setProperty("mail.smtp.auth", "true");
		props.setProperty("mail.smtp.socketFactory.class", "javax.net.ssl.SSLSocketFactory");
		props.setProperty("mail.smtp.port", "465");
		props.setProperty("mail.smtp.socketFactory.port", "465");

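		// Create the session from the configuration; it handles the exchange with the mail server
		Session session = Session.getDefaultInstance(props);
		session.setDebug(true);// debug mode prints a detailed send log

		// Create the message
		Message message = createMineMessage(session,title,context);
		// Get the transport object from the session
		Transport transport = session.getTransport();
		// Connect with the mailbox account and password; the authenticated mailbox must match the sender address in the Message, otherwise an error is raised
		transport.connect(sendMail, sendPassword);
		// Send the message
		transport.sendMessage(message, message.getAllRecipients());
		// Close the connection
		transport.close();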

		

	}

	private Message createMineMessage(Session session,  String title, String context) throws Exception {

		Message message = new MimeMessage(session);
		
		// Set the sender's display name
		String nick=javax.mail.internet.MimeUtility.encodeText("金采网");
				
		
		// Sender address
		message.setFrom(new InternetAddress(nick+"<"+sendMail+">"));
		// Recipient addresses; several may be given, separated by commas
		message.setRecipients(RecipientType.TO, InternetAddress.parse(receiveMail));
		// Set the subject
		message.setSubject(title);
		// Set the body, sent as HTML
		message.setContent(context,"text/html;charset=utf-8");


		message.setSentDate(new Date());
		// Save the changes
		message.saveChanges();
		return message;
	}

}

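How to package the project and deploy it to a Linux server:

pom.xml must specify the startup class (the mainClass entry of maven-jar-plugin)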



Then see: https://www.cnblogs.com/larryzeal/p/6253356.html

http://www.linuxidc.com/Linux/2013-06/86588.htm

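Package the project

Go to the project's directory inside the workspace

Mine is E:\eclipse_workspace\apollo\catchdemo\target

Create a folder named demo and put the items highlighted in blue into it (with the pom.xml above, these would presumably be the jar and the lib directory).

Upload the folder to the Linux server and cd into that directory. Run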

java -jar catchdemo-0.0.1-SNAPSHOT.jar   

and the application starts.

