爬虫数据展示网站

一、项目要求

  • 基于第一个项目爬虫爬取的数据(3-5个数据源),完成数据展示网站。
  • 基本要求:
  • 1、用户可注册登录网站,非注册用户不可登录查看数据
  • 2、用户注册、登录、查询等操作记入数据库中的日志
  • 3、爬虫数据查询结果列表支持分页和排序
  • 4、用Echarts或者D3实现3个以上的数据分析图表展示在网站中
  • 5、实现一个管理端界面,可以查看(查看用户的操作记录)和管理(停用启用)注册用户。

二、网站展示

1、用户可注册登录网站,非注册用户不可登录查看数据

建立表格

user用于保存用户信息(id,用户名,密码,登陆时间)

 user_action用于记录用户的登录和查询信息

如图,在crawl里面表格已经建立好了,包含上次爬虫数据存储表以及新建的表

建立MySQL定义确认

module.exports = {
    mysql: {
        host: 'localhost',
        user: 'root',
        password: 'root',
        database:'crawl',
        // 最大连接数,默认为10
        connectionLimit: 10
    }
};

建立调用数据的路由

登录页面路径

对用户有适当的提示:

1.用户不存在!请检查后输入

2.用户名或密码错误!请检查后输入

注册页面路径

用户输入后有提示:

1.用户已存在!

2.成功注册!请登录

 用户退出登录路径(退出登录返回登录页面)

建立登录以及注册网站(注册成功跳转登录页面)

<!DOCTYPE html>
<html ng-app="login">
<head>
    <meta charset="utf-8" />
    <title>Login</title>
    <link rel="stylesheet" href="http://cdn.bootcss.com/bootstrap/3.3.0/css/bootstrap.min.css">
    <script src="https://cdn.staticfile.org/jquery/3.2.1/jquery.min.js"></script>
    <script src="https://cdn.staticfile.org/popper.js/1.12.5/umd/popper.min.js"></script>
    <script src="https://cdn.staticfile.org/twitter-bootstrap/4.1.0/js/bootstrap.min.js"></script>
<!--    <script src="../node_modules/angular/angular.min.js"></script>-->
    <script src="/angular/angular.min.js"></script>

<!--    引入自己的样式与js-->
    <link rel="stylesheet" type="text/css" href="stylesheets/index.css">
    <script type="text/javascript" src="javascripts/index.js"></script>
    <script>
        var app = angular.module('login', []);
        app.controller('loginCtrl', function ($scope, $http, $timeout) {

            // 登录时,检查用户输入的账户密码是否与数据库中的一致
            $scope.check_pwd = function () {
                var data = JSON.stringify({
                    username: $scope.username,
                    password: $scope.password
                });
                $http.post("/users/login", data)
                    .then(
                    function (res) {
                        if(res.data.msg=='ok') {
                            window.location.href='/news.html';
                        }else{
                            $scope.msg=res.data.msg;
                        }
                    },
                        function (err) {
                        $scope.msg = err.data;
                    });

            };
            //增加注册用户
            $scope.doAdd = function () {
                // 检查用户注册时,输入的两次密码是否一致
                if($scope.add_password!==$scope.confirm_password){
                    // $timeout(function () {
                    //     $scope.msg = '两次密码不一致!';
                    // },100);
                    $scope.msg = '两次密码不一致!';
                }
                else {
                    var data = JSON.stringify({
                        username: $scope.add_username,
                        password: $scope.add_password
                    });
                    $http.post("/users/register", data)
                        .then(function (res) {
                            if(res.data.msg=='成功注册!请登录') {
                                $scope.msg=res.data.msg;
                                $timeout(function () {
                                    window.location.href='index.html';
                                },2000);

                            } else {
                                $scope.msg = res.data.msg;
                            }
                        }, function (err) {
                            $scope.msg = err.data;
                        });
                }
            };
        });
    </script>
</head>
<body>
<div class="container" ng-controller="loginCtrl">
    <div class="row">
        <div class="col-md-6 col-md-offset-3">
            <div class="panel panel-login">
                <div class="panel-heading">
                    <div class="row">
                        <div class="col-xs-6">
                            <a href="#" class="active" id="login-form-link">Login</a>
                        </div>
                        <div class="col-xs-6">
                            <a href="#" id="register-form-link">Register</a>
                        </div>
                    </div>
                    <hr>
                </div>
                <div class="panel-body">
                    <div class="row">
                        <div class="col-lg-12">
                            <form id="login-form" method="post" role="form" style="display: block;">
<!--                                登陆部分-->
                                <div class="form-group">
                                    <input ng-model="username" tabindex="1" class="form-control" placeholder="Username" value=""/>
                                </div>
                                <div class="form-group">
                                    <input type="password" ng-model="password" tabindex="2" class="form-control" placeholder="Password">
                                </div>
<!--                                <div class="form-group text-center">-->
<!--                                    <input type="checkbox" tabindex="3" class="" name="remember" id="remember">-->
<!--                                    <label for="remember"> Remember Me</label>-->
<!--                                </div>-->

                                <div class="form-group">
                                    <div class="row">
                                        <div class="col-sm-6 col-sm-offset-3">
                                            <button id="login-submit" tabindex="4" class="form-control btn btn-login" ng-click="check_pwd()">LOG IN</button>
                                        </div>
                                    </div>
                                </div>
                            </form>
                            <form id="register-form" method="post" role="form" style="display: none;">
                                <div class="form-group">
                                    <input ng-model="add_username" tabindex="1" class="form-control" placeholder="Username" value=""/>
                                </div>

                                <div class="form-group">
                                    <input type="password" ng-model="add_password" tabindex="2" class="form-control" placeholder="Password">
                                </div>

                                <div class="form-group">
                                    <input type="password" ng-model="confirm_password" tabindex="2" class="form-control" placeholder="Confirm Password">
                                </div>
                                <div class="form-group">
                                    <div class="row">
                                        <div class="col-sm-6 col-sm-offset-3">
                                            <button tabindex="4" class="form-control btn btn-register" ng-click="doAdd()">Register Now</button>
                                        </div>
                                    </div>
                                </div>

                            </form>

                        </div>
                    </div>
                </div>
<!--                <div class="alert alert-warning alert-dismissible fade show">-->
<!--                    <button type="button" class="close" data-dismiss="alert">&times;</button>-->
<!--                    <strong>警告!</strong>{{msg}}-->
<!--                </div>-->
            </div>
            <div class="alert alert-warning" ng-if="msg && msg!='ok'">
                <a href="#" class="close" data-dismiss="alert">&times;</a>
                <strong>Waring!</strong>
            </div>
        </div>
    </div>
</div>
</body>

登录网站建立成功

2、用户注册、登录、查询等操作记入数据库中的日志

在app.js中下载安装包npm install cookie-parser、npm install express、npm install express-session、npm install morgan、npm install nodejieba

 完整的app.js代码如下

var createError = require('http-errors');
var express = require('express');
var path = require('path');
var cookieParser = require('cookie-parser');
var session = require('express-session');
var logger = require('morgan');
var logDAO = require('./dao/logDAO.js');
var usersRouter = require('./routes/users');
var newsRouter = require('./routes/news');


var app = express();

//设置session
app.use(session({
  secret: 'sessiontest',//与cookieParser中的一致
  resave: true,
  saveUninitialized: false, // 是否保存未初始化的会话
  cookie : {
    maxAge : 1000 * 60 * 60, // 设置 session 的有效时间,单位毫秒
  },
}));

let method = '';
app.use(logger(function (tokens, req, res) {
  console.log('打印的日志信息:');
  var request_time = new Date();
  var request_method = tokens.method(req, res);
  var request_url = tokens.url(req, res);
  var status = tokens.status(req, res);
  var remote_addr = tokens['remote-addr'](req, res);
  if(req.session){
    var username = req.session['username']||'notlogin';
  }else {
    var username = 'notlogin';
  }

  // 直接将用户操作记入mysql中
  if(username!='notlogin'){
    logDAO.userlog([username,request_time,request_method,request_url,status,remote_addr], function (success) {
      console.log('成功保存!');
    })
  }
  console.log('请求时间  = ', request_time);
  console.log('请求方式  = ', request_method);
  console.log('请求链接  = ', request_url);
  console.log('请求状态  = ', status);
  console.log('请求长度  = ', tokens.res(req, res, 'content-length'),);
  console.log('响应时间  = ', tokens['response-time'](req, res) + 'ms');
  console.log('远程地址  = ', remote_addr);
  console.log('远程用户  = ', tokens['remote-user'](req, res));
  console.log('http版本  = ', tokens['http-version'](req, res));
  console.log('浏览器信息 = ', tokens['user-agent'](req, res));
  console.log('用户 = ', username);
  console.log(' ===============',method);
}, ));


app.use(express.json());
app.use(express.urlencoded({ extended: false }));
app.use(cookieParser());
app.use(express.static(path.join(__dirname, 'public')));
app.use('/angular', express.static(path.join(__dirname , '/node_modules/angular')));

// app.use('/', indexRouter);
app.use('/users', usersRouter);
app.use('/news', newsRouter);

// catch 404 and forward to error handler
app.use(function(req, res, next) {
  next(createError(404));
});

// error handler
app.use(function(err, req, res, next) {
  // set locals, only providing error in development
  res.locals.message = err.message;
  res.locals.error = req.app.get('env') === 'development' ? err : {};

  // render the error page
  res.status(err.status || 500);
  // res.render('error');
});

module.exports = app;

将用户信息保存在mysql的表格中

保存的操作日志如图

3、爬虫数据查询词支持布尔表达式,查询结果列表支持分页排序

登录进网站,在该段代码中引入查询页面search.html

<html ng-app="news">
<head>
    <meta charset="utf-8">
    <title>News</title>

    <link rel="stylesheet" href="https://cdn.staticfile.org/twitter-bootstrap/3.3.7/css/bootstrap.min.css">
    <script src="https://cdn.staticfile.org/jquery/2.1.1/jquery.min.js"></script>
    <script src="http://cdn.bootcss.com/bootstrap/3.3.4/js/bootstrap.min.js"></script>
    <script type="text/javascript" src="https://cdn.jsdelivr.net/npm/echarts@4.7.0/dist/echarts.min.js"></script>
    <script src='javascripts/dist/echarts-wordcloud.min.js'></script>
    <script src="/angular/angular.min.js"></script>

    <script src="javascripts/news.js" type="text/javascript"></script>

</head>
<body ng-controller="news_Ctrl" ng-init="isShow=false">
<nav class="navbar navbar-inverse navbar-fixed-top">
    <div class="container">
        <div class="navbar-header">
            <a class="navbar-brand" href="#">News</a>
        </div>
        <div id="navbar" class="navbar-collapse collapse">
            <ul class="nav navbar-nav">
                <li ><a ng-click="showSearch()">检索</a></li>
                <li class="dropdown">
                    <a href="#" class="dropdown-toggle" data-toggle="dropdown">图片<span class="caret"></span></a>
                    <ul class="dropdown-menu">
                        <li><a ng-click="histogram()">柱状图</a></li>
                        <li><a ng-click="pie()">饼状图</a></li>
                        <li><a ng-click="line()">折线图</a></li>
                        <li><a ng-click="wordcloud()">词云</a></li>
                    </ul>
                </li>
                <li>
                    <a href="#" class="dropdown-toggle" data-toggle="dropdown">账号管理<span class="caret"></span></a>
                    <ul class="dropdown-menu">
                        <li class="dropdown-header">账号</li>
                        <li><a ng-click="logout()">退出登录</a></li>
                    </ul>

                </li>
            </ul>

        </div>

    </div>

</nav>
<!--    所有的图片都绘制在main1位置-->
<span ng-hide="isShow" id="main1" style="width: 1000px;height:600px;position:fixed; top:70px;left:80px"></span>

<div ng-show="isShow" style="width: 1300px;position:relative; top:70px;left: 80px">
    <!--    查询页面-->
    <div ng-include="'search.html'"></div>

</div>
</body>

</html>

查询页路由(routes/news.js)

查询页实现布尔表达式

查询页结果的列表支持分页排序

首先在查询页面添加页面按钮

再看分页的js代码(可以通过改变第70行代码来改变每一页显示条数)

该网页此时显示页面的js代码(设置的是框里最多展示5个页码)

实现效果

 还可以加入首页和末页按钮,更改js代码及html代码

4、用Echarts或者D3实现3个以上的数据分析图表展示在网站中

四个图的js代码

    $scope.histogram = function () {
        $scope.isShow = false;
        $http.get("/news/histogram")
            .then(
                function (res) {

                    if(res.data.message=='url'){
                        window.location.href=res.data.result;
                    }else {

                        // var newdata = washdata(data);
                        let xdata = [], ydata = [], newdata;

                        var pattern = /\d{4}-(\d{2}-\d{2})/;
                        res.data.result.forEach(function (element) {
                            // "x":"2020-04-28T16:00:00.000Z" ,对x进行处理,只取 月日
                            xdata.push(pattern.exec(element["x"])[1]);
                            ydata.push(element["y"]);
                        });
                        newdata = {"xdata": xdata, "ydata": ydata};

                        var myChart = echarts.init(document.getElementById('main1'));

                        // 指定图表的配置项和数据
                        var option = {
                            title: {
                                text: '新闻发布数 随时间变化'
                            },
                            tooltip: {},
                            legend: {
                                data: ['新闻发布数']
                            },
                            xAxis: {
                                data: newdata["xdata"]
                            },

                            yAxis: {},
                            series: [{
                                name: '新闻数目',
                                type: 'bar',
                                data: newdata["ydata"]
                            }]
                        };
                        // 使用刚指定的配置项和数据显示图表。
                        myChart.setOption(option);
                    }
                },
                function (err) {
                    $scope.msg = err.data;
                });

    };
    $scope.pie = function () {
        $scope.isShow = false;
        $http.get("/news/pie").then(
            function (res) {
                if(res.data.message=='url'){
                    window.location.href=res.data.result;
                }else {
                    let newdata = [];

                    var pattern = /责任编辑:(.+)/;//匹配名字
                    res.data.result.forEach(function (element) {
                        // "x":  责任编辑:李夏君 ,对x进行处理,只取 名字
                        newdata.push({name: pattern.exec(element["x"])[1], value: element["y"]});

                    });

                    var myChart = echarts.init(document.getElementById('main1'));
                    var app = {};
                    option = null;
                    // 指定图表的配置项和数据
                    var option = {
                        title: {
                            text: '作者发布新闻数量',
                            x: 'center'
                        },
                        tooltip: {
                            trigger: 'item',
                            formatter: "{a} <br/>{b} : {c} ({d}%)"
                        },
                        legend: {
                            orient: 'vertical',
                            left: 'left',
                            // data: ['直接访问', '邮件营销', '联盟广告', '视频广告', '搜索引擎']
                        },
                        series: [
                            {
                                name: '访问来源',
                                type: 'pie',
                                radius: '55%',
                                center: ['50%', '60%'],
                                data: newdata,
                                itemStyle: {
                                    emphasis: {
                                        shadowBlur: 10,
                                        shadowOffsetX: 0,
                                        shadowColor: 'rgba(0, 0, 0, 0.5)'
                                    }
                                }
                            }
                        ]
                    };
                    // myChart.setOption(option);
                    app.currentIndex = -1;

                    setInterval(function () {
                        var dataLen = option.series[0].data.length;
                        // 取消之前高亮的图形
                        myChart.dispatchAction({
                            type: 'downplay',
                            seriesIndex: 0,
                            dataIndex: app.currentIndex
                        });
                        app.currentIndex = (app.currentIndex + 1) % dataLen;
                        // 高亮当前图形
                        myChart.dispatchAction({
                            type: 'highlight',
                            seriesIndex: 0,
                            dataIndex: app.currentIndex
                        });
                        // 显示 tooltip
                        myChart.dispatchAction({
                            type: 'showTip',
                            seriesIndex: 0,
                            dataIndex: app.currentIndex
                        });
                    }, 1000);
                    if (option && typeof option === "object") {
                        myChart.setOption(option, true);
                    }
                    ;
                }
            });
    };
    $scope.line = function () {
        $scope.isShow = false;
        $http.get("/news/line").then(
            function (res) {
                if(res.data.message=='url'){
                    window.location.href=res.data.result;
                }else {
                    var myChart = echarts.init(document.getElementById("main1"));
                    option = {
                        title: {
                            text: '"疫情"该词在新闻中的出现次数随时间变化图'
                        },
                        xAxis: {
                            type: 'category',
                            data: Object.keys(res.data.result)
                        },
                        yAxis: {
                            type: 'value'
                        },
                        series: [{
                            data: Object.values(res.data.result),
                            type: 'line',
                            itemStyle: {normal: {label: {show: true}}}
                        }],

                    };

                    if (option && typeof option === "object") {
                        myChart.setOption(option, true);
                    }
                }

            });
    };
    $scope.wordcloud = function () {
        $scope.isShow = false;
        $http.get("/news/wordcloud").then(
            function (res) {
                if(res.data.message=='url'){
                    window.location.href=res.data.result;
                }else {
                    var mainContainer = document.getElementById('main1');

                    var chart = echarts.init(mainContainer);

                    var data = [];
                    for (var name in res.data.result) {
                        data.push({
                            name: name,
                            value: Math.sqrt(res.data.result[name])
                        })
                    }

                    var maskImage = new Image();
                    maskImage.src = './images/logo.png';

                    var option = {
                        title: {
                            text: '所有新闻内容 jieba分词 的词云展示'
                        },
                        series: [{
                            type: 'wordCloud',
                            sizeRange: [12, 60],
                            rotationRange: [-90, 90],
                            rotationStep: 45,
                            gridSize: 2,
                            shape: 'circle',
                            maskImage: maskImage,
                            drawOutOfBound: false,
                            textStyle: {
                                normal: {
                                    fontFamily: 'sans-serif',
                                    fontWeight: 'bold',
                                    // Color can be a callback function or a color string
                                    color: function () {
                                        // Random color
                                        return 'rgb(' + [
                                            Math.round(Math.random() * 160),
                                            Math.round(Math.random() * 160),
                                            Math.round(Math.random() * 160)
                                        ].join(',') + ')';
                                    }
                                },
                                emphasis: {
                                    shadowBlur: 10,
                                    shadowColor: '#333'
                                }
                            },
                            data: data
                        }]
                    };

                    maskImage.onload = function () {
                        // option.series[0].data = data;
                        chart.clear();
                        chart.setOption(option);
                    };

                    window.onresize = function () {
                        chart.resize();
                    };
                }

            });
    }

});

四个图的路由代码

router.get('/histogram', function(request, response) {
    //sql字符串和参数
    console.log(request.session['username']);

    //sql字符串和参数
    if (request.session['username']===undefined) {
        // response.redirect('/index.html')
        response.json({message:'url',result:'/index.html'});
    }else {
        var fetchSql = "select publish_date as x,count(publish_date) as y from fetches group by publish_date order by publish_date;";
        newsDAO.query_noparam(fetchSql, function (err, result, fields) {
            response.writeHead(200, {
                "Content-Type": "application/json",
                "Cache-Control": "no-cache, no-store, must-revalidate",
                "Pragma": "no-cache",
                "Expires": 0
            });
            response.write(JSON.stringify({message:'data',result:result}));
            response.end();
        });
    }
});


router.get('/pie', function(request, response) {
    //sql字符串和参数
    console.log(request.session['username']);

    //sql字符串和参数
    if (request.session['username']===undefined) {
        // response.redirect('/index.html')
        response.json({message:'url',result:'/index.html'});
    }else {
        var fetchSql = "select author as x,count(author) as y from fetches group by author;";
        newsDAO.query_noparam(fetchSql, function (err, result, fields) {
            response.writeHead(200, {
                "Content-Type": "application/json",
                "Cache-Control": "no-cache, no-store, must-revalidate",
                "Pragma": "no-cache",
                "Expires": 0
            });
            response.write(JSON.stringify({message:'data',result:result}));
            response.end();
        });
    }
});

router.get('/line', function(request, response) {
    //sql字符串和参数
    console.log(request.session['username']);

    //sql字符串和参数
    if (request.session['username']===undefined) {
        // response.redirect('/index.html')
        response.json({message:'url',result:'/index.html'});
    }else {
        var keyword = '疫情'; //也可以改进,接受前端提交传入的搜索词
        var fetchSql = "select content,publish_date from fetches where content like'%" + keyword + "%' order by publish_date;";
        newsDAO.query_noparam(fetchSql, function (err, result, fields) {
            response.writeHead(200, {
                "Content-Type": "application/json",
                "Cache-Control": "no-cache, no-store, must-revalidate",
                "Pragma": "no-cache",
                "Expires": 0
            });
            response.write(JSON.stringify({message:'data',result:myfreqchangeModule.freqchange(result, keyword)}));
            response.end();
        });
    }
});

router.get('/wordcloud', function(request, response) {
    //sql字符串和参数
    console.log(request.session['username']);

    //sql字符串和参数
    if (request.session['username']===undefined) {
        // response.redirect('/index.html')
        response.json({message:'url',result:'/index.html'});
    }else {
        var fetchSql = "select content from fetches;";
        newsDAO.query_noparam(fetchSql, function (err, result, fields) {
            response.writeHead(200, {
                "Content-Type": "application/json",
                "Cache-Control": "no-cache, no-store, must-revalidate",
                "Pragma": "no-cache",
                "Expires": 0
            });
            response.write(JSON.stringify({message:'data',result:mywordcutModule.wordcut(result)}));//返回处理过的数据
            response.end();
        });
    }
});


module.exports = router;

数据分析图结果展示

词云

通过更改代码还可以得到不同形状的词云

柱状图

折线图

总结:经过一学期的web编程课,从最开始爬虫和做网页时感到困难重重到最终圆满的完成了实验项目而感到欣喜自豪,我从这门课上不仅学习到了知识,还学习到了如何向周围的老师同学寻求帮助,解决问题。再次感谢大家的帮助,感谢老师和助教不厌其烦的耐心讲解!

  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 2
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值