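I recently wanted to write a client for OSChina (开源中国) but didn't want to make up the data myself, and that is how I discovered jsoup. With jsoup, anything that is present in the HTML a site returns can be parsed out. jsoup is an open-source library for parsing page source; it extracts elements from a page according to rules you give it, and unlike many other HTML parsing libraries, its way of selecting content closely resembles CSS and jQuery selectors.
Here is the news list on the web page next to the final result in the app (the web screenshot was taken two hours after the phone screenshot, so the two don't match exactly 0.0):
jsoup's selection rules are not hard to pick up:
Open the OSChina news page, right-click and view the source, then find the <div> that holds the news items.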
<div class="panel" id='RecentNewsList'>
<h3 class='tabs'>
<ul>
<li><a href="/news/list" class='active'>全部资讯</a></li>
<li><a href="/news/list?show=industry">综合资讯</a></li>
<li><a href="/news/list?show=project">软件更新资讯</a></li>
</ul>
</h3>
<ul class='List'>
<li>
<h2><a href="/news/72054/mybatis-spring-boot-1-0-2" target="_blank">Mybatis Spring Boot 1.0.2 发布</a></h2>
<p class='date'><a href="http://my.oschina.net/u/2305107">淡漠悠然</a> 发布于 1小时前 - 3评</p>
<p class='detail'>Mybatis Spring Boot 1.0.2 发布了,Mybatis Spring Boot 是 MyBatis 和 Spring Boot 的集成。 暂无相关改进说明,查看项目提交记录,可点击这里。 下载: Source code (zip) Source code (tar.gz)...</p>
<p class='more'><a href="/news/72054/mybatis-spring-boot-1-0-2" target="_blank" class='more'>显示全文</a></p>
</li>
<li>
<a href="/news/72053/windows-10" target="_blank" class='img'><img src="/img/logo/windows.gif?t=1451964198000" border='0'/></a>
<h2><a href="/news/72053/windows-10" target="_blank">Win10 年度最重大更新:代码、理想与爱</a></h2>
<p class='date'><a href="http://my.oschina.net/osadmin">oschina</a> 发布于 5小时前 - 29评</p>
<p class='detail'>微软 Build 2016 大会,让围绕 Windows 10 的软件开发变得格外瞩目。 在这次,微软依旧在开发者大会上带来了 Windows 10、平台软件开发、HoloLens、人工智能方面的技术进展,当然最让普通消费者关心的仍然是 Wind...</p>
<p class='newsImg'><a title="Win10 年度最重大更新:代码、理想与爱" href="/news/72053/windows-10" target="_blank"><img alt="...." src="http://static.oschina.net/uploads/space/2016/0331/115526_bt3I_1774694.png"></a></p>
<p class='more'><a href="/news/72053/windows-10" target="_blank" class='more'>显示全文</a></p>
</li>
All of the news items sit inside the div whose id is RecentNewsList, and jsoup provides a getElementById() method for exactly this:
Document doc = Jsoup.connect(path).timeout(5000).get();
Element masthead = doc.getElementById("RecentNewsList");
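masthead now holds everything inside the div whose id is RecentNewsList. Looking more closely, the news items all live in the ul whose class is List, one <li></li> per item. jsoup's select() method takes a selector whose space-separated parts each narrow the match to descendants of the previous part: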
Elements articleElements = masthead.select("ul.List li");
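ul.List means the ul whose class is List, so articleElements contains every <li></li> news item. For a single item (articleElement, one entry of articleElements), the title, summary, and post info can be fetched like this: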
Elements titleElement = articleElement.select("h2 a");
Elements summaryElement = articleElement.select("p.detail");
Elements timeElement = articleElement.select("p.date");
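titleElement, summaryElement, and timeElement now hold the title, the summary, and the post info respectively.
To use articleElements as the data source, loop over its length, pull out those three values for each item, assign them to an Article object, and add each Article to an ArrayList. That ArrayList then serves as the data source for the ListView adapter.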
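The extraction and loop described above can be tried off-device before wiring them into Android. The following is only a minimal sketch: the class name NewsParseSketch, the trimmed SAMPLE_HTML (with English stand-in text), and the use of a plain String[] row instead of Article are all assumptions made to keep the example self-contained. It needs the jsoup jar on the classpath and uses Jsoup.parse() on a string instead of a network request.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class NewsParseSketch {

    // Trimmed-down stand-in for the markup shown earlier (assumption: English
    // placeholder text; the real page has many more <li> items and attributes).
    static final String SAMPLE_HTML =
          "<div class='panel' id='RecentNewsList'>"
        + "<h3 class='tabs'><ul><li><a href='/news/list'>All news</a></li></ul></h3>"
        + "<ul class='List'>"
        + "<li><h2><a href='/news/72054/mybatis-spring-boot-1-0-2'>Mybatis Spring Boot 1.0.2</a></h2>"
        + "<p class='date'>posted 1 hour ago</p>"
        + "<p class='detail'>MyBatis and Spring Boot integration.</p></li>"
        + "<li><h2><a href='/news/72053/windows-10'>Win10 update</a></h2>"
        + "<p class='date'>posted 5 hours ago</p>"
        + "<p class='detail'>Build 2016 news.</p></li>"
        + "</ul></div>";

    /** Returns one {title, summary, postTime} row per news item found in the HTML. */
    static List<String[]> parse(String html) {
        List<String[]> rows = new ArrayList<String[]>();
        Document doc = Jsoup.parse(html);
        Element container = doc.getElementById("RecentNewsList");
        if (container == null) {
            // The id is missing, e.g. because the page layout changed.
            return rows;
        }
        // "ul.List li" matches only <li> elements under the ul with class List,
        // so the <li> entries of the tab bar are skipped.
        for (Element item : container.select("ul.List li")) {
            String title = item.select("h2 a").text();
            String summary = item.select("p.detail").text();
            String postTime = item.select("p.date").text();
            rows.add(new String[] { title, summary, postTime });
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : parse(SAMPLE_HTML)) {
            System.out.println(row[0] + " | " + row[1] + " | " + row[2]);
        }
    }
}
```

In the app itself each row would become new Article(title, summary, postTime), as in the MainActivity listing.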
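Now let's go through the implementation step by step:
1. Create a new Android project, download the jsoup jar, and add it to the project. After ADT was upgraded to version 20 it could no longer load the jar this way, so adding it through Build Path may fail with java.lang.NoClassDefFoundError: org/jsoup/Jsoup. If you hit that error, remove the jar from the build path, copy jsoup.jar directly into the libs folder, and clean the project.
2. Create the Article entity class, which is the item type for the ListView adapter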
package com.example.osnews;
public class Article {
private String title="";
private String summary="";
private String postTime="";
public Article(String title,String summary,String postTime){
this.title=title;
this.summary=summary;
this.postTime=postTime;
}
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
public String getSummary() {
return summary;
}
public void setSummary(String summary) {
this.summary = summary;
}
public String getPostTime() {
return postTime;
}
public void setPostTime(String postTime) {
this.postTime = postTime;
}
}
3. Create the ListView adapter, ListAdapter
package com.example.osnews;
import java.util.ArrayList;
import android.content.Context;
import android.view.LayoutInflater;
import android.view.View;
import android.view.ViewGroup;
import android.widget.BaseAdapter;
import android.widget.TextView;
public class ListAdapter extends BaseAdapter {
private ArrayList<Article> mArticleList;
private int resourceId;
private Context ctx;
public ListAdapter(Context context, int textViewResourceId, ArrayList<Article> objects) {
resourceId = textViewResourceId;
this.mArticleList = objects;
this.ctx = context;
}
@Override
public int getCount() {
return mArticleList.size();
}
@Override
public Article getItem(int position) {
return mArticleList.get(position);
}
@Override
public long getItemId(int position) {
return position;
}
@Override
public View getView(int position, View convertView, ViewGroup parent) {
Article article = getItem(position);
View view;
ViewHolder viewHolder;
if (convertView == null) {
// Pass the parent (without attaching) so the item's layout params are honored
view = LayoutInflater.from(ctx).inflate(resourceId, parent, false);
viewHolder = new ViewHolder();
viewHolder.title = (TextView) view.findViewById(R.id.title);
viewHolder.summary = (TextView) view.findViewById(R.id.summary);
viewHolder.postTime = (TextView) view.findViewById(R.id.postTime);
view.setTag(viewHolder);
} else {
view=convertView;
viewHolder = (ViewHolder) view.getTag();
}
viewHolder.title.setText(article.getTitle());
viewHolder.summary.setText(article.getSummary());
viewHolder.postTime.setText(article.getPostTime());
return view;
}
static class ViewHolder {
public TextView title;
public TextView summary;
public TextView postTime;
}
}
4. The layout files: just a title at the top and a ListView below it; see the downloadable source for the details.
5. MainActivity.java
package com.example.osnews;
import java.io.IOException;
import java.util.ArrayList;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import android.annotation.SuppressLint;
import android.app.Activity;
import android.os.AsyncTask;
import android.os.Bundle;
import android.os.Handler;
import android.os.Message;
import android.util.Log;
import android.view.Menu;
import android.view.MenuItem;
import android.view.Window;
import android.widget.ListView;
public class MainActivity extends Activity {
private ListView listview;
private String path = "http://www.oschina.net/news";
private ListAdapter adapter;
@SuppressLint("HandlerLeak")
private Handler handler = new Handler(){
public void handleMessage(Message msg){
switch(msg.what){
case 1 :
listview.setAdapter(adapter);
break;
default:
break;
}
}
};
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
requestWindowFeature(Window.FEATURE_NO_TITLE);
setContentView(R.layout.activity_main);
listview = (ListView) this.findViewById(R.id.listview);
new GetListData().execute(path);
}
class GetListData extends AsyncTask<String, Void, ArrayList<Article>> {
@Override
protected ArrayList<Article> doInBackground(String... arg0) {
ArrayList<Article> articleList =new ArrayList<Article>();
try {
Document doc = Jsoup.connect(path).timeout(5000).get();
Element masthead = doc.getElementById("RecentNewsList");
// getElementById() returns null if the page no longer contains this id
if (masthead != null) {
Elements articleElements = masthead.select("ul.List li");
for(int i = 0; i < articleElements.size(); i++) {
Element articleElement = articleElements.get(i);
Elements titleElement = articleElement.select("h2 a");
Elements summaryElement = articleElement.select("p.detail");
Elements timeElement = articleElement.select("p.date");
String title = titleElement.text();
String summary = summaryElement.text();
//if(summary.length() > 70)
// summary = summary.substring(0, 70);
String postTime = timeElement.text();
Log.i("title", title);
Log.i("summary", summary);
Log.i("postTime", postTime);
Article article = new Article(title,summary,postTime);
articleList.add(article);
}
}
} catch (IOException e) {
e.printStackTrace();
}
return articleList;
}
@Override
protected void onPostExecute(ArrayList<Article> articleList) {
super.onPostExecute(articleList);
adapter = new ListAdapter(MainActivity.this,R.layout.item_article_list,articleList);
Log.i("adapter", "----------"+adapter.isEmpty());
Message msg = Message.obtain();
msg.what=1;
handler.sendMessage(msg);
}
}
@Override
public boolean onCreateOptionsMenu(Menu menu) {
getMenuInflater().inflate(R.menu.main, menu);
return true;
}
@Override
public boolean onOptionsItemSelected(MenuItem item) {
int id = item.getItemId();
if (id == R.id.action_settings) {
return true;
}
return super.onOptionsItemSelected(item);
}
}
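The two commented-out lines in doInBackground suggest capping the summary at 70 characters. One way to do that is a small helper; the sketch below is only an illustration, and the class name SummaryText, the method name truncate, and the trailing "..." marker are assumptions that are not part of the original project.

```java
class SummaryText {

    /**
     * Returns text unchanged when it fits within maxChars; otherwise cuts it
     * to at most maxChars characters and appends "...". A null text yields "".
     */
    static String truncate(String text, int maxChars) {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must not be negative");
        }
        if (text == null) {
            return "";
        }
        if (text.length() <= maxChars) {
            return text;
        }
        int end = maxChars;
        // Step back one char if the cut would split a surrogate pair.
        if (end > 0 && Character.isHighSurrogate(text.charAt(end - 1))) {
            end--;
        }
        return text.substring(0, end) + "...";
    }

    public static void main(String[] args) {
        System.out.println(truncate("A fairly long summary line", 8));
    }
}
```

In doInBackground it could be called as summary = SummaryText.truncate(summaryElement.text(), 70); in place of the commented-out substring code.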
6. Add the internet permission to AndroidManifest.xml
<uses-permission android:name="android.permission.INTERNET"></uses-permission>
Source download: http://download.csdn.net/detail/chunxiao123ouc/9478282