RSS: An Old, Niche Way of Reading

CrazyDragon_King

Article link: https://blog.csdn.net/qq_40734247/article/details/105416907

RSS

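RSS (Really Simple Syndication) is an XML-based format for packaging and delivering content that has been widely adopted on the Internet. Most web data is exchanged as JSON these days, which shows how old RSS is. RSS and Atom feeds are XML-based semantic web content that a client-side parser can use as a data source. In short, a feed can be thought of as an XML file that carries data.

RSS has since declined. I only came across it by chance at university and tried it out (it felt novel and fun), but I never went deep, because you have to find the feeds yourself, which is a bit of a hassle. RSS used to be used mostly for subscribing to blogs, but with the rise of the mobile Internet and Web 2.0, blogs are no longer as prominent as they once were. Compared with long-form text with a few pictures, very short posts and other kinds of services are what is popular now. So RSS has remained a niche thing that ordinary users rarely run into.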

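RSS subscription

Generally, if a website shows an icon that resembles the Wi-Fi symbol, like the one in the figure below, it offers an RSS feed you can subscribe to. For sites without RSS, it seems a feed can be produced by other means; I have not tried this myself, but it should be useful.
[Figure: the RSS feed icon]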

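Advantages of RSS

Centralized: the articles, audio and video from the sites you like are gathered in one reader, so you no longer have to visit each site to check for updates. (Most people follow more than one source.)
Efficient: a feed usually shows only the beginning of each article rather than the whole text, so a quick glance is enough to decide whether it is worth reading. If it looks interesting, you open the full article.
Hand-picked: you choose your sources freely, without being steered by algorithmic recommendations.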

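RSS format

Hierarchy: rss -> channel -> item
The document has a root node rss; rss contains one channel, and the channel contains a number of item elements. (The elements at the top of the channel describe the feed itself and can be ignored.)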


<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" title="XSL Formatting" href="/static_files/rss/rss.xsl" media="all" ?>
<rss version="2.0">
    <channel>
        <title>qq_40734247的博客</title>
        <image>
            <link>https://blog.csdn.net/</link>
            <url>https://csdnimg.cn/release/phoenix/static_blog/images/logo.gif</url>
        </image>
        <description></description>
        <link>https://blog.csdn.net/qq_40734247</link>
        <language>zh-cn</language>
        <generator>https://blog.csdn.net/</generator>
        <ttl>5</ttl>
        <copyright>
            <![CDATA[Copyright &copy; qq_40734247]]>
        </copyright>
        <pubDate>2020/04/09 18:28:27</pubDate>
        <item>
            <title>
                <![CDATA[[原]HttpClient的Fluent API]]>
            </title>
            <link>https://blog.csdn.net/qq_40734247/article/details/105249539</link>
            <guid>https://blog.csdn.net/qq_40734247/article/details/105249539</guid>
            <author>qq_40734247</author>
            <pubDate>2020/04/01 18:10:39</pubDate>
            <description>
                <![CDATA[
                    Fluent API
                        As of version of 4.2 HttpClient comes with an easy to use facade API based on the concept of a fluent interface. Fluent facade API exposes only the most fundamental functions of HttpClient...                    <div>
                        作者：qq_40734247 发表于 2020/04/01 18:10:39 <a href="https://blog.csdn.net/qq_40734247/article/details/105249539">原文链接</a> https://blog.csdn.net/qq_40734247/article/details/105249539                    </div><div>
                        阅读：31 评论：1 <a href="https://blog.csdn.net/qq_40734247/article/details/105249539#comments" target="_blank">查看评论</a></div>
                    ]]>
            </description>
            <category></category>
        </item>
    </channel>
</rss>
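
The rss -> channel -> item hierarchy can be walked with nothing but the JDK. The following is a minimal sketch, not the code used later in this article (which relies on dom4j): it swaps in the JDK's built-in DOM parser so that it runs without extra dependencies. The class name RssOutline, the helper firstChild and the tiny inline feed are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class RssOutline {

    /** Returns the trimmed <title> text of every <item> directly under <channel>. */
    static List<String> itemTitles(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        Element rss = doc.getDocumentElement();        // root node: <rss>
        Element channel = firstChild(rss, "channel");  // <rss> -> <channel>
        List<String> titles = new ArrayList<>();
        if (channel == null) {
            return titles;                             // not an RSS 2.0 document
        }
        NodeList children = channel.getChildNodes();   // <channel> -> <item>*
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node instanceof Element && "item".equals(node.getNodeName())) {
                Element title = firstChild((Element) node, "title");
                // getTextContent() also returns the text inside a CDATA section.
                titles.add(title == null ? "" : title.getTextContent().trim());
            }
        }
        return titles;
    }

    /** Returns the first direct child element with the given name, or null if there is none. */
    private static Element firstChild(Element parent, String name) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node instanceof Element && name.equals(node.getNodeName())) {
                return (Element) node;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // A made-up feed with the same shape as the sample above.
        String xml = "<rss version=\"2.0\"><channel><title>feed</title>"
                + "<item><title><![CDATA[first]]></title></item>"
                + "<item><title>second</title></item>"
                + "</channel></rss>";
        System.out.println(itemTitles(xml));
    }
}
```

Only direct children are inspected, so the channel's own title is not mistaken for an item title.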

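Parsing an RSS file

Although hardly anyone uses RSS nowadays, it is still a good option for people who like to read. You now have to find a good client and good feeds yourself, but if you really want to try, it should not be hard. Because the format is fixed, a so-called reader is essentially an RSS parser that also does a good job in other respects. Here we parse a feed according to its format; it felt quite novel to work with.

Approach and code

This is just simple XML parsing, so the code is simple too. I first request the RSS feed and get the XML text of the response, then take the item nodes in it, wrap their information in objects, and process those objects.

Maven dependencies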

<dependencies>
	 <!-- https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient -->
	<dependency>
	    <groupId>org.apache.httpcomponents</groupId>
	    <artifactId>httpclient</artifactId>
	    <version>4.5.6</version>
	</dependency>
	
	 <!-- fluent API https://mvnrepository.com/artifact/org.apache.httpcomponents/fluent-hc -->
	<dependency>
	    <groupId>org.apache.httpcomponents</groupId>
	    <artifactId>fluent-hc</artifactId>
	    <version>4.5.6</version>
	</dependency>
	
	<!-- https://mvnrepository.com/artifact/dom4j/dom4j -->
	<dependency>
	    <groupId>dom4j</groupId>
	    <artifactId>dom4j</artifactId>
	    <version>1.6.1</version>
	</dependency>

</dependencies>

The Article class

Only four fields are extracted here: title, link, pubDate and description.

package com.dragon.rss;

public class Article {
	
	private String title;        // title
	private String link;         // link
	private String pubDate;      // publication date
	private String description;  // description
	
	public String getTitle() {
		return title;
	}
	public void setTitle(String title) {
		this.title = title;
	}
	public String getLink() {
		return link;
	}
	public void setLink(String link) {
		this.link = link;
	}
	
	public String getPubDate() {
		return pubDate;
	}
	public void setPubDate(String pubDate) {
		this.pubDate = pubDate;
	}
	public String getDescription() {
		return description;
	}
	public void setDescription(String description) {
		this.description = description;
	}
	
	@Override
	public String toString() {
		return "<h1>" + title + "</h1>\n"  +
	           "<a href=\""+ link + "\">" + title + "</a>" +
				"<p>" + pubDate + "</p>\n" +
				description + "\n";  // description itself appears to be an HTML fragment
	}
}
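
One thing to keep in mind with the toString method above: description is already an HTML fragment, but title is plain text that gets placed inside HTML tags, so a title containing characters such as < or & could break the generated page. A minimal sketch of an escaping helper follows. It is an addition for illustration and not part of the original code; the names HtmlText and escape are hypothetical.

```java
// Hypothetical helper, not part of the article's original code.
final class HtmlText {

    /** Escapes the characters that have a special meaning in HTML text and attribute values. */
    static String escape(String text) {
        if (text == null) {
            return "";  // treat a missing value as empty text
        }
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;");  break;
                case '<': sb.append("&lt;");   break;
                case '>': sb.append("&gt;");   break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("<h1>" + escape("Map<String, \"V\"> & friends") + "</h1>");
    }
}
```

In toString, the idea would be to wrap title and link with such a helper and leave description untouched.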

The RSS class

The RSS class requests the RSS data, parses it, wraps it in objects, and assembles the HTML text.

package com.dragon.rss;

import java.io.BufferedOutputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.config.CookieSpecs;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.fluent.Executor;
import org.apache.http.client.fluent.Request;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

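/**
 * An RSS class that fetches and parses feed data.
 * 
 * Main steps:
 * 1. Fetch the RSS XML data.
 * 2. Parse it and extract the information of each entry (wrapped in an object).
 * 3. Output the result (as an HTML file that can be opened in a browser).
 * */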
public class RSS {
	private final int TIME_OUT = 15*1000;
	
	public void start(String url) {
		try {
			this.toFile(this.createHtml(this.resolveRSSXML(this.getRSSXML(url))));
		} catch (ClientProtocolException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
	
	
	/**
	 * @param url the URL of the RSS feed
	 * @throws IOException 
	 * @throws ClientProtocolException 
	 * */
	public String getRSSXML(String url) throws ClientProtocolException, IOException {
		// Avoids the "Invalid cookie header" exception.
		RequestConfig config = RequestConfig.custom()
				.setCookieSpec(CookieSpecs.STANDARD)
				.build();
		
		// Create the HttpClient, but do not use it directly; requests go through the Fluent API.
		CloseableHttpClient httpclient = HttpClients.custom()
				.setDefaultRequestConfig(config)
				.build();
		
		// Create the executor that wraps our own HttpClient.
		Executor executor = Executor.newInstance(httpclient);
		// Create the request.
		Request request = Request.Get(url)
				.socketTimeout(TIME_OUT)
				.connectTimeout(TIME_OUT);
				
		// Convert the response body to a string.
		return executor.execute(request).returnContent().asString(Charset.forName("UTF-8"));
	}
	
	
	/**
	 * Parses the item elements and wraps each one in an Article object.
	 * Hierarchy: rss -> channel -> item
	 * Common child elements: title, link, description
	 * */
	public List<Article> resolveRSSXML(String xml) {
		System.out.println(xml);
		try {
			Document doc = DocumentHelper.parseText(xml);
			Element root = doc.getRootElement(); // root node
			Element channel = root.element("channel");   // get channel; at first I overlooked it and got nothing back
			
			@SuppressWarnings("unchecked")
			List<Element> itemList = channel.elements("item");
			System.out.println("Number of articles: " + itemList.size());
			return itemList.stream()
					.map(RSS::createArticle)
					.collect(Collectors.toList());
		} catch (DocumentException e) {
			e.printStackTrace();
		}
		// Return an empty list rather than null so that createHtml does not throw a NullPointerException.
		return java.util.Collections.emptyList();
	}
	
	
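	// Kept static so it can be referenced as RSS::createArticle. An instance method referenced through
	// the class name takes its receiver as the first argument, which does not fit here.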
	private static Article createArticle(Element item) {
		Article article = new Article();
		Element title = item.element("title");
		Element link = item.element("link");
		Element description = item.element("description");
		Element pubDate = item.element("pubDate");
		
		article.setTitle(title.getTextTrim());
		article.setLink(link.getTextTrim());
		article.setPubDate(pubDate.getTextTrim());
		article.setDescription(description.getTextTrim());
		
		return article;
	}
	
	public String createHtml(List<Article> articleList) {
		String head = "<!DOCTYPE html>\r\n" + 
				"<html>\r\n" + 
				"	<head>\r\n" + 
				"		<meta charset=\"utf-8\"/>\r\n" + 
				"		<title>RSS Subscription</title>\r\n" + 
				"	</head>\r\n" + 
				"	<body>";
		
		String tail = "</body>\r\n" + 
				"</html>";
		
		// Build the string with a Java 8 collector: no delimiter, head as prefix, tail as suffix.
		return articleList.stream()
			.map(article->article.toString())
			.collect(Collectors.joining("", head, tail));
	}
	
	public void toFile(String html) {
		String filename = System.currentTimeMillis() + ".html";
		try (BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream("rss/" + filename))) {
			out.write(html.getBytes("UTF-8"));  // write to the file
		} catch (FileNotFoundException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}

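Notes:
1. I use the Fluent API here, but I ran into an Invalid cookie header exception. The fix I found is to set the cookie policy when creating the HttpClient object, so with the Fluent API I have to create an Executor (the only way to make it use my own HttpClient object), which makes the code rather verbose. Using HttpClient directly would have been simpler. Still, this is an experiment, and trying things out never hurts.

2. Printing the information to the console is not a convenient way to view it, so I convert it to HTML using the template shown below. Of each item's four fields, three go into tags of their own; the fourth (description) appears to be an HTML fragment already and needs no separate tag. HTML is used because description can contain image links, which a browser then displays directly.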

<!DOCTYPE html>
<html>
	<head>
		<meta charset="utf-8"/>
		<title>RSS Subscription</title>
	</head>
	<body>
	<!-- The following three tags hold an item's information. -->
		<h1></h1>
		<a></a>
		<p></p>
	</body>
</html>

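3. So note the toString method of the Article class: it assembles the corresponding HTML fragment.

The createHtml method uses a Java 8 collector, which is very convenient: it avoids explicit string concatenation or StringBuilder boilerplate. You may need to know a little about collectors, but here it is enough to remember what joining does. Its three arguments are, in order, the delimiter, the prefix and the suffix.

So the effect is: each Article produces an HTML fragment; head becomes the prefix of the string, the fragments are appended one after another with no delimiter between them, and tail is appended at the end, which yields one complete HTML string.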

public String createHtml(List<Article> articleList) {
	String head = "<!DOCTYPE html>\r\n" + 
			"<html>\r\n" + 
			"	<head>\r\n" + 
			"		<meta charset=\"utf-8\"/>\r\n" + 
			"		<title>RSS Subscription</title>\r\n" + 
			"	</head>\r\n" + 
			"	<body>";
	
	String tail = "</body>\r\n" + 
			"</html>";
	
	// Build the string with a Java 8 collector: no delimiter, head as prefix, tail as suffix.
	return articleList.stream()
		.map(article->article.toString())
		.collect(Collectors.joining("", head, tail));
}
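
To see the three arguments of Collectors.joining in isolation, here is a small standalone sketch. The class name JoiningDemo, the method wrap and the sample fragments are made up for illustration; only Collectors.joining(delimiter, prefix, suffix) itself comes from the JDK.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, for illustration only.
final class JoiningDemo {

    /** Concatenates the fragments with no delimiter and wraps them in a <body> element. */
    static String wrap(List<String> fragments) {
        return fragments.stream()
                .collect(Collectors.joining("", "<body>", "</body>"));
    }

    public static void main(String[] args) {
        // Empty delimiter, as in createHtml: the fragments are simply placed side by side.
        System.out.println(wrap(Arrays.asList("<p>a</p>", "<p>b</p>")));
        // prints <body><p>a</p><p>b</p></body>

        // A non-empty delimiter only appears between elements, never at the ends.
        System.out.println(Stream.of("a", "b", "c")
                .collect(Collectors.joining(", ", "[", "]")));
        // prints [a, b, c]
    }
}
```

With an empty list, the result is just the prefix followed by the suffix, so an empty feed still yields a complete (if empty) page.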

The RSSBootStrape main class

package com.dragon.rss;

import java.io.IOException;

import org.apache.http.client.ClientProtocolException;

public class RSSBootStrape {
	public static void main(String[] args) throws ClientProtocolException, IOException {
		// http://zhihu.com/rss Zhihu Daily
		// https://blog.csdn.net/qq_40734247/rss/list a CSDN blogger's blog
		// http://blog.sina.com.cn/rss/1286528122.xml
		// https://www.williamlong.info/rss.xml Moonlight Blog (月光博客)
		
		String rssUrl = "http://zhihu.com/rss";  // Zhihu Daily feed
		RSS rss = new RSS();
		rss.start(rssUrl);
	}
}

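Results

Create an rss folder in the project directory, or specify an absolute path for the generated file instead. After running the program, refresh the folder and you will find an HTML file. Open it in a browser to see the content.

[Screenshot: RSS feed content from Zhihu Daily]

[Screenshot: RSS feed content from a CSDN blog]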

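Notes

I tried parsing RSS purely out of curiosity, so the resulting page is ugly. The point was to understand how it works, and that was fun. There are still plenty of RSS feeds to subscribe to, and anyone interested can pick an RSS reader and give it a try (a good reader matters a lot). Most content today is recommended by algorithms; if you would rather not have a machine shape what you read, RSS is worth trying.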
