Fetching and Modifying Web Page Data on Android

Latest recommended article published 2024-06-03 18:40:26

一杯清泉

Tags: WebView, web page, H5, data scraping

Article link: https://blog.csdn.net/yoonerloop/article/details/59158191

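WebView is commonly used on Android to load and display web pages. Sometimes, though, an app needs to extract data from a page at run time, process it, or even modify the page's content so that it displays dynamically, and WebView alone cannot do that. A recent project of mine had exactly this requirement: load local H5 (HTML5) content, modify it dynamically, and then preview the result. The steps below describe how it was implemented.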

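I. WebView overview
WebView is a control that renders web pages and was originally built on the WebKit engine. Older and newer Android versions ship different engine versions, and from Android 4.4 onward WebView is based on Chromium. Besides the usual View properties and settings, it offers extensive control over URL requests, page loading, rendering, and page interaction.
1. Common settings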
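// Enable JavaScript
wvWebView.getSettings().setJavaScriptEnabled(true);
// Enable zoom support
wvWebView.getSettings().setSupportZoom(true);
// Hide the on-screen zoom buttons
wvWebView.getSettings().setDisplayZoomControls(false);
// Enable the built-in zoom mechanism
wvWebView.getSettings().setBuiltInZoomControls(true);
// Use a wide viewport so the page can be laid out wider than the view
wvWebView.getSettings().setUseWideViewPort(true);
// Fit the content to the screen (note: SINGLE_COLUMN is deprecated in current SDKs)
wvWebView.getSettings().setLayoutAlgorithm(WebSettings.LayoutAlgorithm.SINGLE_COLUMN);
// Zoom out so the content fits the screen width
wvWebView.getSettings().setLoadWithOverviewMode(true);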
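2. Ways to load a page
WebView has three commonly used loading methods: loadUrl, loadData, and loadDataWithBaseURL.
(1) loadUrl loads a page directly from a URL.
(2) wvWebView.loadData(String data, String mimeType, String encoding);
Parameter 1: the page content as a string. Parameter 2: the MIME type of the data, typically "text/html". Parameter 3: the encoding.
With this method, special characters in the data (for example #, %, \ and ?) must be escaped, or the whole string must be Base64-encoded with the encoding set to "base64". Take care when loading content that contains such characters, such as CSS.
(3) wvWebView.loadDataWithBaseURL(String baseUrl, String data, String mimeType, String encoding, String historyUrl);
Parameter 1: the base URL against which relative paths in the page are resolved, that is, the root location of the page's resources. Parameter 2: the page content as a string. Parameter 3: the MIME type, typically "text/html". Parameter 4: the encoding. Parameter 5: the URL to use as the history entry, usually null.
Method (1) is the usual choice. However, a page often references several other files, for example an HTML file with multiple JS, CSS, and image resources. If such a page is supplied as a string through method (2), those relative references cannot be resolved, so the page renders incompletely and images are missing. In that case use method (3), which is more capable than method (2).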
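
The two points above (escaping for loadData and the role of the base URL) can be tried out in plain Java, outside Android. The following is a minimal sketch, not the author's code: the class name WebViewLoadingSketch, its method names, and the com.example paths are all hypothetical, and it uses java.util.Base64 and java.net.URI from the JDK (on Android, java.util.Base64 needs API level 26 or later). The WebView calls appear only in comments.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper illustrating loadData escaping and base-URL resolution. */
final class WebViewLoadingSketch {

    /** Encodes the HTML as UTF-8 bytes, then as Base64 text, so no character needs escaping. */
    static String toBase64(String html) {
        return Base64.getEncoder().encodeToString(html.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns the path that a relative reference resolves to against the given base URL. */
    static String resolvePath(String baseUrl, String relative) {
        return URI.create(baseUrl).resolve(relative).getPath();
    }

    public static void main(String[] args) {
        // '#', '%' and '?' are exactly the characters that cause trouble in loadData
        String html = "<p style=\"color:#f00\">100% done?</p>";
        String encoded = toBase64(html);
        // On Android the call would then be:
        // wvWebView.loadData(encoded, "text/html", "base64");
        System.out.println(encoded);

        // Relative references are resolved against the base URL's directory
        String baseUrl = "file:///data/data/com.example/files/web/"; // assumed location
        System.out.println(resolvePath(baseUrl, "css/style.css"));
        // On Android the call would then be:
        // wvWebView.loadDataWithBaseURL(baseUrl, html, "text/html", "utf-8", null);
    }
}
```

Resolving a reference by hand like this is a quick way to check that the base URL you pass actually points at the folder holding the page's resources.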

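II. The jsoup parser

jsoup is a powerful HTML parser for Java that wraps many methods for parsing HTML and querying it with CSS-style selectors. It can retrieve information from a page by keyword, class selector, id selector, attribute, attribute value, and so on. It can also set attributes, insert data, and edit a page as a standalone document.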

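1. Initializing jsoup

Add the jsoup jar to the project. The static method Jsoup.parse converts page data supplied as a string, an input stream, a file, or a URL into a Document object, which you can then operate on. For example: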

Document document = Jsoup.parse(html);

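2. Getting data. The commonly used methods are listed below.

(1) Getting elements

getElementById(String id): get an element by id
getElementsByTag(String tag): get elements by tag name
getElementsByClass(String className): get elements by class
getElementsByAttribute(String key): get elements by attribute name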

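(2) Getting specific elements and their text

By selector: Elements elementsBuyerName = document.select(".buyerName");
By keyword: Elements elementsBuyerName = document.select(":containsOwn(货物)"); // 货物 ("goods") is the text to search for

The result is a list (Elements); iterate over it to get the values you need.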

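(3) Setting values

elementsBuyerName.get(0).text("This is a new value"); // set the element's text
document.select(".code").remove(); // remove the matching elements

These methods are enough to read and edit the data in a page.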

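III. A concrete use case

1. In Android Studio, create an assets resource folder under main, and copy the web content folder into it, including the related images, JS, CSS, and other resources.

2. At a suitable point, copy the web resources from the assets folder into a local directory on the device.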

    // Requires: android.content.Context, java.io.File, java.io.FileOutputStream,
    // java.io.IOException, java.io.InputStream
    public static void copyAssetsToDst(Context context, String srcPath, String dstPath) {
        try {
            String[] fileNames = context.getAssets().list(srcPath);
            if (fileNames != null && fileNames.length > 0) {
                // srcPath is a directory inside assets
                File dir = new File(context.getFilesDir(), dstPath);
                if (dir.exists()) {
                    return; // already copied on an earlier run
                }
                dir.mkdirs();
                for (String fileName : fileNames) {
                    // At the assets root srcPath is "", so do not prepend a separator
                    String childSrc = srcPath.isEmpty() ? fileName : srcPath + File.separator + fileName;
                    copyAssetsToDst(context, childSrc, dstPath + File.separator + fileName);
                }
            } else {
                // srcPath is a single file: copy its bytes
                File outFile = new File(context.getFilesDir(), dstPath);
                // try-with-resources closes both streams even if the copy fails
                try (InputStream is = context.getAssets().open(srcPath);
                     FileOutputStream fos = new FileOutputStream(outFile)) {
                    byte[] buffer = new byte[1024];
                    int byteCount;
                    while ((byteCount = is.read(buffer)) != -1) {
                        fos.write(buffer, 0, byteCount);
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
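
The recursive copy can be exercised on a desktop JVM before wiring it into an app. The sketch below is a stand-in, not the Android code: it swaps AssetManager for plain java.io.File, always copies (it has no "already copied" shortcut), and the names DirCopySketch, copyTree, and demo are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

/** Hypothetical desktop stand-in for the recursive assets copy. */
final class DirCopySketch {

    /** Recursively copies src to dst, creating destination directories as needed. */
    static void copyTree(File src, File dst) {
        try {
            if (src.isDirectory()) {
                Files.createDirectories(dst.toPath());
                File[] children = src.listFiles();
                if (children == null) {
                    return; // directory could not be listed
                }
                for (File child : children) {
                    copyTree(child, new File(dst, child.getName()));
                }
            } else {
                Files.copy(src.toPath(), dst.toPath(), StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a tiny sample page tree in a temp directory, copies it, and returns the copy's root. */
    static File demo() {
        try {
            File src = Files.createTempDirectory("web-src").toFile();
            File css = new File(src, "css");
            Files.createDirectories(css.toPath());
            Files.write(new File(src, "index.html").toPath(),
                    "<html></html>".getBytes(StandardCharsets.UTF_8));
            Files.write(new File(css, "style.css").toPath(),
                    "p{}".getBytes(StandardCharsets.UTF_8));
            File dst = new File(Files.createTempDirectory("web-dst").toFile(), "web");
            copyTree(src, dst);
            return dst;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Copied to " + demo());
    }
}
```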

3. Read the local web page file into a string. Once the data has been fetched from the network, find and replace the relevant fields in the page.

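    // Requires: java.io.BufferedReader, java.io.FileInputStream, java.io.IOException,
    // java.io.InputStreamReader, java.nio.charset.StandardCharsets
    public static String readFile(String path) throws IOException {
        StringBuilder stringBuilder = new StringBuilder();
        // Assumes the page is UTF-8 rather than relying on the platform default charset
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String content;
            while ((content = bufferedReader.readLine()) != null) {
                // readLine() strips the line break, so restore it; otherwise a "//" comment
                // in inline JavaScript would swallow the code that follows it
                stringBuilder.append(content).append('\n');
            }
        }
        return stringBuilder.toString();
    }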

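Once the page is in memory, parse it with Document document = Jsoup.parse(html); to obtain a Document object.

Elements elementsr = document.select(".className"); // use the page's actual class selector
elementsr.get(0).text("replacement text");

Then call String html = document.outerHtml(); to produce the edited page as a string.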

4. After the replacement, write the string back to the corresponding file in the local directory.

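    // Requires: java.io.BufferedWriter, java.io.FileOutputStream, java.io.IOException,
    // java.io.OutputStreamWriter, java.nio.charset.StandardCharsets
    public static void writeFile(String str, String path) {
        // Assumes UTF-8 output; try-with-resources closes the writer even on failure
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8))) {
            // Write the whole string; a length of str.length() - 1 would drop the last character
            out.write(str);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }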
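
A round trip is a quick way to check that reading and writing preserve the page exactly, including line breaks and the final character. The sketch below is one possible check rather than the article's code: the class name HtmlFileRoundTrip and its methods are hypothetical, it assumes UTF-8, and it uses java.nio.file, which on Android requires API level 26 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical round-trip check for saving and reloading an edited page. */
final class HtmlFileRoundTrip {

    /** Writes the complete string as UTF-8 bytes. */
    static void write(Path path, String html) {
        try {
            Files.write(path, html.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the whole file as UTF-8, keeping every line break. */
    static String read(Path path) {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes html to a temporary file, reads it back, and deletes the file. */
    static String roundTrip(String html) {
        try {
            Path tmp = Files.createTempFile("page", ".html");
            try {
                write(tmp, html);
                return read(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The "//" comment only stays harmless if the line break after it survives
        String html = "<script>\n// note\nvar a = 1;\n</script>";
        System.out.println(roundTrip(html).equals(html));
    }
}
```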

5. To preview the page, load it from the local directory.

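wvWebView.loadUrl("file:///data/data/<package name>/<folder name>/<file name>/file.html");

Note: the URL must have the form file:///<file path>. Passing a bare file path does not load correctly: the JS files, images, and other resources under the same folder fail to load, and the page renders incorrectly.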
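
One way to avoid hand-writing the file:/// prefix is to build the URL from a File. This is a minimal sketch under assumptions: FileUrlSketch and toFileUrl are hypothetical names, the com.example path is only an illustration, and the code relies on absolute paths starting with "/" as they do on Android and other Unix-like systems.

```java
import java.io.File;

/** Hypothetical helper that turns a local file into a file:/// URL for loadUrl. */
final class FileUrlSketch {

    /**
     * Builds "file://" plus the percent-encoded absolute path.
     * Because the path itself starts with "/", the result has three slashes.
     */
    static String toFileUrl(File file) {
        return "file://" + file.toURI().getRawPath();
    }

    public static void main(String[] args) {
        File page = new File("/data/data/com.example/files/web/file.html"); // assumed location
        // On Android the call would then be: wvWebView.loadUrl(toFileUrl(page));
        System.out.println(toFileUrl(page));
    }
}
```

Going through toURI() also percent-encodes characters such as spaces, which a plain string concatenation of the path would leave unescaped.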
