[Android] Using a Scraper to Package a Movie Website into an App

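I used to connect my computer to the TV over HDMI to watch shows and movies, but fast-forwarding or doing anything else was a hassle every time, and the apps on my phone have too little content. So I decided to package a movie website into an app!
The app is more or less done. I have been busy lately and have had little time to work on it, so the UI is ugly. Here I will walk through my main ideas; building it yourself is where the fun is!
Before starting, let's sort out the approach: how do we move a website into an app?
There are two ways:
1. Wrap the whole site in a WebView. Quick and dirty, done in a few dozen lines of code! But the user experience is poor: lots of ads, small fonts, and awkward controls.
2. Use a scraper to fetch only what we need, then display the data in our own UI!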

Let's go
First, let's put together a rough UI.
 

ButterKnife.bind(this);
// Tab labels: 首页 = Home, 电影 = Movies, 电视剧 = TV series, 动漫 = Anime
mTabBar.init(getSupportFragmentManager())
        .setImgSize(50, 50)
        .setFontSize(10)
        .setTabPadding(4, 6, 10)
        .setChangeColor(Color.RED, Color.DKGRAY)
        .addTabItem("首页", R.mipmap.ic_launcher_round, HomeFragment.class)
        .addTabItem("电影", R.mipmap.ic_launcher_round, MoveFragment.class)
        .addTabItem("电视剧", R.mipmap.ic_launcher_round, TVplayFragment.class)
        .addTabItem("动漫", R.mipmap.ic_launcher_round, CartoonFragment.class);



In total, 4 Fragments are attached to one Activity. Each Fragment has
a RecyclerView at the top and
the BottomTabBar at the bottom.

 

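With the UI in place, all that is missing is the data.
Add the image-loading and networking dependencies:
compile 'com.squareup.picasso:picasso:2.5.2'
compile 'com.mcxiaoke.volley:library:1.0.19'
Open the website, press F12 to view the page source, and find the data we need.

First fetch the whole page with a network request, then use a regular expression to pull out the data we need: image, link, title, and rating. Create a Bean class to hold the data.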

// One list entry scraped from the site: title, rating, poster image URL, and detail-page link.
public class DataBean {
    private String dataName;
    private String dataScore;
    private String dataImg;
    private String dataNetWork;

    public String getDataName() {
        return dataName;
    }

    public void setDataName(String dataName) {
        this.dataName = dataName;
    }

    public String getDataScore() {
        return dataScore;
    }

    public void setDataScore(String dataScore) {
        this.dataScore = dataScore;
    }

    public String getDataImg() {
        return dataImg;
    }

    public void setDataImg(String dataImg) {
        this.dataImg = dataImg;
    }

    public String getDataNetWork() {
        return dataNetWork;
    }

    public void setDataNetWork(String dataNetWork) {
        this.dataNetWork = dataNetWork;
    }
}
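
To see how the capture groups of the list-page pattern line up with the bean fields, here is a minimal standalone sketch in plain Java. The pattern is the one used by the list request in this post; the `ListPageParser` class, the `Item` holder, and the sample markup built by `sampleItem` are made up for illustration, and the real page markup may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ListPageParser {
    // Pattern taken from the list request in this post.
    // Groups: 1 = detail link, 2 = poster image, 3 = rating, 4 = title.
    private static final Pattern ITEM = Pattern.compile(
            "<li><div class=li-box><div class=img-box></div><a href=\"(.+?)\"><img src=\"(.+?)\" onerror=\".+?\"><span class=back></span><span>(.+?)</span></div><P><a href=\".+?\" target=\"_blank\">(.+?)</a></P></li>");

    /** Hypothetical immutable holder standing in for DataBean. */
    static final class Item {
        final String link;
        final String img;
        final String score;
        final String name;

        Item(String link, String img, String score, String name) {
            this.link = link;
            this.img = img;
            this.score = score;
            this.name = name;
        }
    }

    /** Collects every list entry found in the page source. */
    static List<Item> parse(String html) {
        List<Item> items = new ArrayList<Item>();
        Matcher m = ITEM.matcher(html);
        while (m.find()) {
            items.add(new Item(m.group(1), m.group(2), m.group(3), m.group(4)));
        }
        return items;
    }

    /** Builds made-up markup shaped to fit the pattern (illustration only). */
    static String sampleItem(String link, String img, String score, String name) {
        return "<li><div class=li-box><div class=img-box></div><a href=\"" + link + "\">"
                + "<img src=\"" + img + "\" onerror=\"this.src=''\">"
                + "<span class=back></span><span>" + score + "</span></div>"
                + "<P><a href=\"" + link + "\" target=\"_blank\">" + name + "</a></P></li>";
    }

    public static void main(String[] args) {
        String html = sampleItem("/a/1.html", "http://img.example.com/1.jpg", "8.5", "First")
                + sampleItem("/a/2.html", "http://img.example.com/2.jpg", "7.0", "Second");
        for (Item it : parse(html)) {
            System.out.println(it.name + " | " + it.score + " | " + it.link + " | " + it.img);
        }
    }
}
```

A pattern this tightly bound to the markup stops matching as soon as the site changes its HTML, so expect to revisit it.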


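Display the data in the UI with a RecyclerView, and use an interface callback to set the click handler so that tapping an image opens the details page.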

        // Needs java.util.ArrayList, java.util.List, java.util.regex.Matcher and java.util.regex.Pattern.
        RequestQueue queue = Volley.newRequestQueue(getActivity());
        MyStringRequest stringRequest = new MyStringRequest(getHosturl(), new Response.Listener<String>() {
            @Override
            public void onResponse(String response) {
                // Capture groups: 1 = detail link, 2 = poster image, 3 = rating, 4 = title
                String regEx = "<li><div class=li-box><div class=img-box></div><a href=\"(.+?)\"><img src=\"(.+?)\" onerror=\".+?\"><span class=back></span><span>(.+?)</span></div><P><a href=\".+?\" target=\"_blank\">(.+?)</a></P></li>";
                final Matcher matcher = Pattern.compile(regEx).matcher(response);
                new Thread(new Runnable() {
                    @Override
                    public void run() {
                        // Parse off the main thread into a local list, so the adapter's data
                        // is never modified while the RecyclerView is reading it.
                        final List<DataBean> parsed = new ArrayList<DataBean>();
                        while (matcher.find()) {
                            DataBean dataBean = new DataBean();
                            dataBean.setDataNetWork(matcher.group(1));
                            dataBean.setDataImg(matcher.group(2));
                            dataBean.setDataScore(matcher.group(3));
                            dataBean.setDataName(matcher.group(4));
                            parsed.add(dataBean);
                        }
                        // Publish the result on the UI thread only after parsing has finished.
                        mRecyclerView.post(new Runnable() {
                            @Override
                            public void run() {
                                mData.clear();
                                mData.addAll(parsed);
                                if (mHomeAdapter == null) {
                                    mHomeAdapter = new homeAdapter(getContext(), mData, mGridLayoutManager);
                                }
                                mRecyclerView.setAdapter(mHomeAdapter);
                                mHomeAdapter.notifyDataSetChanged();
                                // "第 N 页" means "Page N"
                                mTvThisPage.setText("第" + mThisPage + "页");
                                Toast.makeText(getContext(), "第" + mThisPage + "页", Toast.LENGTH_SHORT).show();
                                mHomeAdapter.setItemClickListener(new homeAdapter.OnItemClickListener() {
                                    @Override
                                    public void onItemClick(int position) {
                                        // Pass title, poster, and detail-page URL on to the details page.
                                        String url = mData.get(position).getDataNetWork();
                                        String title = mData.get(position).getDataName();
                                        String img_url = mData.get(position).getDataImg();
                                        String requestUrl = getHosturl() + url;
                                        Intent intent = new Intent(getContext(), DetailsActivity.class);
                                        intent.putExtra("title", title);
                                        intent.putExtra("img_url", img_url);
                                        intent.putExtra("requestUrl", requestUrl);
                                        startActivity(intent);
                                    }
                                });
                            }
                        });
                    }
                }).start();
            }
        }, new Response.ErrorListener() {
            @Override
            public void onErrorResponse(VolleyError error) {
                Log.e(TAG, error.getMessage(), error);
                // The page failed to load, so step back to the previous page number.
                mThisPage--;
                // Toast text: "Network unstable! Loading failed! Please try again later!"
                Toast.makeText(getContext(), "网络不稳定!加载失败!请稍后重试!", Toast.LENGTH_SHORT).show();
            }
        });
        queue.add(stringRequest);
    }
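
The click handler builds the detail URL by concatenating `getHosturl()` and the scraped href. Whether that always yields a valid URL depends on the shape of the href, which the post does not show. As a hedged alternative, `java.net.URI.resolve` handles root-relative, page-relative, and absolute links alike. The `LinkResolver` class and the paths below are invented for illustration.

```java
import java.net.URI;

final class LinkResolver {
    /** Resolves an href found on pageUrl into an absolute URL string. */
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "http://m.yiybb.com/dianying/";
        // Hypothetical hrefs: root-relative, page-relative, and already absolute.
        System.out.println(resolve(page, "/x/1.html"));
        System.out.println(resolve(page, "List_2.html"));
        System.out.println(resolve(page, "http://example.com/a.html"));
    }
}
```

`URI.create` throws `IllegalArgumentException` on malformed input, so scraped hrefs containing spaces or other illegal characters would need cleaning first.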



 

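To make maintenance easier, the home page is wrapped up as a base class. The movie, TV series, and anime pages extend it with minor changes, and that gives us all 4 pages.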
 

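public class MoveFragment extends HomeFragment implements View.OnClickListener {

    @Override
    protected void intitView() {
        // Point the shared list logic at the movie section before the base class sets up its views.
        hostUrl = "http://m.yiybb.com/dianying/";
        super.intitView();
        mThisPage = 1;
        listTpye = "List_15_";
        mTvThisPage.setVisibility(View.VISIBLE);
    }

    @Override
    protected void initData() {
        super.initData();
    }

    @Override
    protected String setTitle() {
        return "电影"; // "Movies"
    }

    @Override
    protected void setViibilly() {
        // Unlike the home page, this page shows the paging controls.
        page.setVisibility(View.VISIBLE);
    }
    // (The rest of the class, including onClick, is not shown in the post.)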
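
The base-class idea can be seen in isolation in plain Java. In this sketch every class and method name (`BasePage`, `HomePage`, `MoviePage`, `hostUrl`, `title`, `showsPager`, `describe`) is hypothetical and not taken from the app. Only the movie section URL comes from the post, and using the site root as the home URL is an assumption. The base class owns the shared flow, and each page overrides only what differs.

```java
abstract class BasePage {
    /** URL this page scrapes; each subclass supplies its own. */
    protected abstract String hostUrl();

    /** Title shown for the page. */
    protected abstract String title();

    /** Paging controls are hidden unless a subclass opts in. */
    protected boolean showsPager() {
        return false;
    }

    /** Shared logic lives here once and uses the subclass-specific pieces. */
    final String describe() {
        return title() + " <- " + hostUrl() + (showsPager() ? " (paged)" : "");
    }
}

final class HomePage extends BasePage {
    @Override
    protected String hostUrl() {
        return "http://m.yiybb.com/"; // assumed: site root
    }

    @Override
    protected String title() {
        return "Home";
    }
}

final class MoviePage extends BasePage {
    @Override
    protected String hostUrl() {
        return "http://m.yiybb.com/dianying/";
    }

    @Override
    protected String title() {
        return "Movies";
    }

    @Override
    protected boolean showsPager() {
        return true;
    }
}

final class PagesDemo {
    public static void main(String[] args) {
        BasePage[] pages = { new HomePage(), new MoviePage() };
        for (BasePage p : pages) {
            System.out.println(p.describe());
        }
    }
}
```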



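Step 2:
Next, the details page.
Read the URL, image link, and title passed from the previous page via the Intent, fetch the whole page from that URL, then use a regular expression to match the remaining data we need.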

 
 

       // Needs java.util.ArrayList, java.util.List, java.util.regex.Matcher and java.util.regex.Pattern.
       RequestQueue queue = Volley.newRequestQueue(this);
       MyStringRequest stringRequest = new MyStringRequest(mRequestUrl, new Response.Listener<String>() {
           @Override
           public void onResponse(String response) {
               // Capture groups: 1 = episode link (without the leading "/"), 2 = episode title
               String regEx = "<li><a href=\"/(.+?)\" >(.+?)</a></li>";
               /*
               Sample of the surrounding markup on the details page:
               <div class="movie"><ul><div class="img"><div class="img-box-2"></div><img src="http://88.meenke.com/img_buyhi/201805/2018052876045761.jpg" alt="帝王攻略" border="0"></div><h1>帝王攻略</h1><li>更新至:[17]</li><li>年 代:2018</li><li>类 型:<a href="/dhp_lianzai/Index.html" target="_blank">动画连载</a></li><li class="cksc"><a id="shoucang" href="#sc">收藏</a></li></ul></div>
                */
               final Matcher matcher = Pattern.compile(regEx).matcher(response);
               new Thread(new Runnable() {
                   @Override
                   public void run() {
                       // Parse off the main thread into a local list first.
                       final List<DetailsBean> parsed = new ArrayList<DetailsBean>();
                       while (matcher.find()) {
                           DetailsBean dataBean = new DetailsBean();
                           dataBean.setUrl(matcher.group(1));
                           dataBean.setTitle(matcher.group(2));
                           parsed.add(dataBean);
                       }
                       // Touch mData and the adapter on the UI thread only, after parsing is done.
                       mDaRecyView.post(new Runnable() {
                           @Override
                           public void run() {
                               mData.addAll(parsed);
                               DetailAdapter adapter = new DetailAdapter(DetailsActivity.this, mData, mGridLayoutManager);
                               mDaRecyView.setAdapter(adapter);
                               adapter.setItemClickListener(new DetailAdapter.OnItemClickListener() {
                                   @Override
                                   public void onItemClick(int position) {
                                       // Open the playback page for the tapped episode.
                                       String url = "http://m.yiybb.com/" + mData.get(position).getUrl();
                                       Intent intent = new Intent(DetailsActivity.this, PlayActivity.class);
                                       intent.putExtra("url", url);
                                       startActivity(intent);
                                   }
                               });
                           }
                       });
                   }
               }).start();
           }
       }, new Response.ErrorListener() {
           @Override
           public void onErrorResponse(VolleyError error) {
               Log.e(TAG, error.getMessage(), error);
               // Toast text: "Network unstable!"
               Toast.makeText(DetailsActivity.this, "网络不稳定!", Toast.LENGTH_SHORT).show();
           }
       });
       queue.add(stringRequest);
   }



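Step 3:
The most important part is the playback page, and it took some time.
At first I followed the same approach as the first two steps, but no matter what I tried I could not scrape the link. Only after opening the original site did I find that the playback link is loaded dynamically and, on top of that, encoded in 3 layers! Damn!
Open the playback page, press F12, select Sources, and view the source.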

 

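<div class="playing"> is the player container, but it is empty, which shows that its content is loaded dynamically.
Keep reading the source and look at the script tags to see what they do: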
<script language="javascript">var StrHtml;var url=set_code(unescape("JN0HT%250G%256B%256BJeN.Ty6l.167%256Bx%256BlDvoVW%256B1dTw8xE.mkyw",0,0));var nexturl="no";var nextpath="no";var Player={Url:url,Height:240,Width:600,Show:null};function $ShowPlayer(w,h){document.write($Showhtml());}</script>
<script language="javascript" src="Play/23.js"></script><script language="javascript">$ShowPlayer(600,240);</script>
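JN0HT%250G%256B%256BJeN.Ty6l.167%256Bx%256BlDvoVW%256B1dTw8xE.mkyw is the link after the first layer of encoding. How it was produced does not matter to us; this value can be scraped.
unescape() is a built-in JS function that decodes percent-escapes (%XX and %uXXXX).
set_code() handles a second layer of encoding on top of that.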
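
To make the first two operations concrete, here is a minimal plain-Java sketch that pulls the encoded string out of the script tag and undoes the percent-escaping. The extraction pattern matches the `unescape("...",0,0)` call in the script above; the `PlayPageDecoder` class and its method names are made up, and `unescape` here is a hand-written stand-in for the JS function, not a library call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlayPageDecoder {
    // Matches the argument of unescape("...",0,0) in the page's inline script.
    private static final Pattern ENCODED = Pattern.compile("unescape\\(\"(.+?)\",0,0\\)");

    /** Returns the encoded string from the page source, or null if it is absent. */
    static String extractEncoded(String pageSource) {
        Matcher m = ENCODED.matcher(pageSource);
        return m.find() ? m.group(1) : null;
    }

    /** Single-pass decoder for %XX and %uXXXX; invalid escapes are copied unchanged. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 1 < s.length() && s.charAt(i + 1) == 'u' && isHex(s, i + 2, 4)) {
                    out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 6;
                    continue;
                }
                if (isHex(s, i + 1, 2)) {
                    out.append((char) Integer.parseInt(s.substring(i + 1, i + 3), 16));
                    i += 3;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    /** True if s has len hexadecimal digits starting at index start. */
    private static boolean isHex(String s, int start, int len) {
        if (start + len > s.length()) {
            return false;
        }
        for (int k = start; k < start + len; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String script = "var url=set_code(unescape(\"JN0HT%250G%256B%256BJeN.Ty6l.167%256Bx%256BlDvoVW%256B1dTw8xE.mkyw\",0,0));";
        String encoded = extractEncoded(script);
        System.out.println(encoded);
        System.out.println(unescape(encoded));
    }
}
```

Note that each `%25` decodes to a literal `%`, so the output still contains `%` sequences; those belong to the next layer and are not decoded again in this pass.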


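With the main logic located, follow the trail.
Refresh the page with the Network tab open to see what gets loaded, focusing on the JS files. It turns out these few files are loaded: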


 

Go through them one by one and search for the set_code function.


 

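Found the set_code function:
A quick look shows that it in turn calls ____e() and ____d().
You might think that gives us the link. I thought so too at first, but the result still failed to open in the browser.
Go back to Sources and look at the source again.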

<script language="javascript" src="Play/23.js"></script><script language="javascript">$ShowPlayer(600,240);</script>
This is the main logic that dynamically loads the tag. Find Play/23.js in the source:
function $ShowPlayer(width,height){
        StrHtml = '<iframe id="ffplayer" src="/ck/ck.html?'+url+'|" width="96%" height="94%"  allowfullscreen="true" frameborder="no" border="0" marginwidth="0" marginheight="0" scrolling="no"></iframe>';
        document.write(StrHtml);
}

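The previously encoded URL gets concatenated once more!
Now that we have found the encoding functions, let's build a utility to decode the link.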

 

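<!DOCTYPE html>
<html>

        <head>
                <meta charset="utf-8" />
                <title></title>
        </head>

        <body>
                <script language="JavaScript">
                        var StrHtml;
                        // Plain alphabet, split into comma-separated groups.
                        function ____e(){
                                return"0123456789,ABCDEFG,HIJKLMN,OPQRST,UVWXYZ,abcdefg,hijklmn,opqrst,uvwxyz"
                                }
                        // Scrambled alphabet; the character at each index pairs with ____e() at the same index.
                        function ____d(){
                                return"4560123987,GFEDCBA,MHIJLNK,PQRSTO,ZUVWXY,gfedcba,mhijlnk,pqrsto,zuvwxy"
                                }
                        // Character substitution: en truthy maps plain -> scrambled, otherwise scrambled -> plain.
                        // isN truthy restricts the mapping to the digit group.
                        function set_code(s,en,isN){
                                var e_s = en?____e():____d(), d_s = en?____d():____e(),str="";
                                e_s=isN?e_s.split(",")[0]:e_s,d_s=isN?d_s.split(",")[0]:d_s;
                                for(var i=0;i<s.length;i++){
                                        var n=e_s.indexOf(s.charAt(i));
                                        if(n!=-1){
                                                str+=d_s.charAt(n)
                                        }else{
                                                str+=s.charAt(i)
                                        }
                                }
                                return str
                        };

                        // Called from the app with the scraped string; rebuilds the player URL and opens it.
                        function getUrl(first_url){
                                // Same result as the site's set_code(unescape(first_url,0,0)):
                                // en and isN are falsy either way.
                                var url = set_code(unescape(first_url), 0, 0);
                                StrHtml="http://m.yiybb.com/ck/ck.html?" +url+ "|";
                                window.open(StrHtml);
                        }

                </script>
        </body>

</html>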
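
The post runs set_code inside a WebView. As an aside, the function is a plain character-substitution table, so it could also be ported to Java and run without a WebView. The sketch below is such a port, assuming the JS shown above is the complete function. The `SetCode` class name is made up, and the port covers only this layer, not whatever /ck/ck.html does with the result.

```java
final class SetCode {
    // Same two alphabets as ____e() and ____d() in the page's JS.
    private static final String E =
            "0123456789,ABCDEFG,HIJKLMN,OPQRST,UVWXYZ,abcdefg,hijklmn,opqrst,uvwxyz";
    private static final String D =
            "4560123987,GFEDCBA,MHIJLNK,PQRSTO,ZUVWXY,gfedcba,mhijlnk,pqrsto,zuvwxy";

    /**
     * Port of set_code(s, en, isN): en = true maps plain -> scrambled,
     * en = false maps scrambled -> plain; isN = true limits the mapping to digits.
     */
    static String setCode(String s, boolean en, boolean isN) {
        String from = en ? E : D;
        String to = en ? D : E;
        if (isN) {
            from = from.split(",")[0];
            to = to.split(",")[0];
        }
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int n = from.indexOf(c);
            // Characters outside the table (such as '%' or '.') pass through unchanged.
            out.append(n != -1 ? to.charAt(n) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "%0G%6B" is a fragment of the sample string after unescape().
        System.out.println(setCode("%0G%6B", false, false));
    }
}
```

Because each comma-separated group is a permutation of itself, calling `setCode` with `en = true` on a decoded string returns the original input, which makes a convenient self-check.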



Next, put the HTML file at assets/index.html
 

        // Configure the WebView first, then load the local helper page.
        WebSettings settings = mWebView.getSettings();
        settings.setJavaScriptEnabled(true);
        settings.setDomStorageEnabled(true);
        mWebView.addJavascriptInterface(this, "android");
        mWebView.setWebChromeClient(webChromeClient);
        mWebView.setWebViewClient(mWebViewClient);
        mWebView.loadUrl("file:///android_asset/index.html");


This lets the WebView interact with the local HTML file.
Next, fetch the first-layer encoded link from the playback page:

       RequestQueue queue = Volley.newRequestQueue(this);
       MyStringRequest stringRequest = new MyStringRequest(mUrl, new Response.Listener<String>() {
           @RequiresApi(api = Build.VERSION_CODES.KITKAT)
           @Override
           public void onResponse(String response) {
               Log.d(TAG, response);
               // Capture group 1 = the encoded string passed to unescape(...) in the page script
               String regEx = "unescape\\(\"(.+?)\",0,0\\)";
               Pattern pattern = Pattern.compile(regEx);
               mMatcher = pattern.matcher(response);

               new Thread(new Runnable() {
                   @Override
                   public void run() {
                       while (mMatcher.find()) {
                           Log.e(TAG, mMatcher.group(1));
                           mFirst_url = mMatcher.group(1);
                           final String requestUrl = "javascript:getUrl('" + mFirst_url + "')";
                           Log.e(TAG, requestUrl);
                           // WebView methods must be called on the UI thread.
                           mWebView.post(new Runnable() {
                               @Override
                               public void run() {
                                   mWebView.loadUrl(requestUrl);
                               }
                           });
                       }
                   }
               }).start();
           }
       }, new Response.ErrorListener() {
           @Override
           public void onErrorResponse(VolleyError error) {
               Log.e(TAG, error.getMessage(), error);
               // Toast text: "Network unstable! Loading failed! Please try again later!"
               Toast.makeText(PlayActivity.this, "网络不稳定!加载失败!请稍后重试!", Toast.LENGTH_SHORT).show();
           }
       });
       queue.add(stringRequest);
   }
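
The `javascript:getUrl('...')` string is built by pasting scraped text between single quotes. That works for the sample value, but a quote or backslash in the scraped text would break the call. A small helper can escape the argument first. This is a sketch under that assumption; the `JsCall` class and its methods are hypothetical and not part of the app.

```java
final class JsCall {
    /** Escapes a value so it can sit inside a single-quoted JS string literal. */
    static String quote(String s) {
        return s.replace("\\", "\\\\")
                .replace("'", "\\'")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }

    /** Builds a javascript: URL that calls fn with one string argument. */
    static String build(String fn, String arg) {
        return "javascript:" + fn + "('" + quote(arg) + "')";
    }

    public static void main(String[] args) {
        System.out.println(build("getUrl", "JN0HT%250G%256B"));
        System.out.println(build("getUrl", "it's"));
    }
}
```

The result would be passed to `mWebView.loadUrl(...)` exactly as the code above does with `requestUrl`.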




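Call the getUrl function in the helper HTML page we wrote,


pass in the extracted parameter, and then load the page.


All done!
There are quite a few steps and my explanation is a bit messy; if anything is unclear, leave a comment!
I tested on a beat-up phone running Android 4.3 without problems. Nothing else was tested, so there may be plenty of bugs. I had no time for the UI. The source is released, so modify it yourself; for fun only!
Update 8.26
Fixed invalid links on the details page where Xigua videos could not play
Fixed incorrect parameters on the playback page

For details, see

[Android] Using a Scraper to Package a Movie Website into an App (updated 8.26)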
https://www.52pojie.cn/thread-785672-1-1.html
