Everything about running a Scrapy crawler in the background on a server

Running a crawler in the background on a server

I. Connecting remotely over SSH

ssh root@100.100.100.100

Then enter the password when prompted.

II. Installing the database (MySQL)

sudo apt-get update
sudo apt-get install mysql-server

Configuring MySQL (user name and password)

sudo mysql_secure_installation

The configuration process prints the following:

Securing the MySQL server deployment.

Connecting to MySQL using a blank password.

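# Asks whether to install the password validation plugin (the kind that requires upper case, lower case, and digits in a password). I skipped it to save trouble.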
VALIDATE PASSWORD PLUGIN can be used to test passwords
and improve security. It checks the strength of password
and allows the users to set only those passwords which are
secure enough. Would you like to setup VALIDATE PASSWORD plugin?

Press y|Y for Yes, any other key for No: No

# Set the password
Please set the password for root here.

New password: 

Re-enter new password: 

# Asks whether to remove the default anonymous user, which can log in without a password (typing just 'mysql' is enough). I did not remove it.
By default, a MySQL installation has an anonymous user,
allowing anyone to log into MySQL without having to have
a user account created for them. This is intended only for
testing, and to make the installation go a bit smoother.
You should remove them before moving into a production
environment.

Remove anonymous users? (Press y|Y for Yes, any other key for No) : No

 ... skipping.

# Asks whether to disallow remote root login to MySQL. I chose to disallow it.
Normally, root should only be allowed to connect from
'localhost'. This ensures that someone cannot guess at
the root password from the network.

Disallow root login remotely? (Press y|Y for Yes, any other key for No) : Yes
Success.

# Asks whether to remove the default 'test' database, which exists for testing. I did not remove it.
By default, MySQL comes with a database named 'test' that
anyone can access. This is also intended only for testing,
and should be removed before moving into a production
environment.


Remove test database and access to it? (Press y|Y for Yes, any other key for No) : No

 ... skipping.
 
# Asks whether to reload the privilege tables now so the changes take effect. Yes.
Reloading the privilege tables will ensure that all changes
made so far will take effect immediately.

Reload privilege tables now? (Press y|Y for Yes, any other key for No) : Yes
Success.

III. Transferring files with sftp

Use sftp to transfer the database schema SQL file and the project code.

Connecting with sftp

sftp user@100.100.100.100

Upload

put local_path remote_path

Download

get remote_path local_path

IV. Importing the MySQL database schema locally

# Create a new database
create database myDatabaseName;
# Select the database
use myDatabaseName;
# Import the database SQL file
source myDatabaseName.sql

V. Setting up the Python environment

The crawler uses Scrapy, so its environment has to be set up separately.

pip3 install scrapy
pip3 install scrapy-splash
pip3 install PyMySQL
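
Before starting a long crawl, it helps to confirm that the three packages can actually be found by the interpreter that will run it. The following is a minimal sketch using only the standard library; the helper name missing_modules is my own, and the list uses the import names (scrapy, scrapy_splash, pymysql) rather than the pip names.

```python
import importlib.util

# Import names (not pip names) of the three packages installed above.
REQUIRED_MODULES = ["scrapy", "scrapy_splash", "pymysql"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("missing modules:", missing if missing else "none")
```

Run it with the same python3 that will later run the crawler; an empty result suggests the pip3 installs landed in the right environment.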

VI. Running the project in the background on the server

nohup python3 -u run.py > log.log 2>&1 &

The output is written to the log file log.log.

Check the process status

ps aux
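
The full ps aux listing is long, so it can be convenient to keep only the lines that mention the crawler script. This is a small sketch rather than part of the original setup: the function names are hypothetical, and it assumes the process was started with run.py on its command line as in the nohup command above.

```python
import subprocess


def matching_processes(ps_output, needle):
    """Return the lines of `ps aux` output that contain `needle`."""
    lines = ps_output.splitlines()
    # The first line is the column header (USER PID %CPU ...); skip it.
    return [line for line in lines[1:] if needle in line]


def crawler_processes(needle="run.py"):
    """Run `ps aux` and keep only the lines that mention `needle`."""
    result = subprocess.run(
        ["ps", "aux"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,
    )
    return matching_processes(result.stdout, needle)
```

Calling crawler_processes() on the server returns one line per matching process; an empty list suggests the crawler has exited, in which case log.log is the place to look.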

VII. Problems encountered

1. Python cannot connect to MySQL

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/scrapy/crawler.py", line 89, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
pymysql.err.InternalError: (1698, "Access denied for user 'root'@'localhost'")

Solution:

https://stackoverflow.com/questions/39281594/error-1698-28000-access-denied-for-user-rootlocalhost

On some systems, such as Ubuntu, MySQL uses the UNIX auth_socket plugin by default.

This basically means that db_users using it are authenticated by the system user credentials. You can check whether your root user is set up this way with the following:

$ sudo mysql -u root # I had to use "sudo" since this is a new installation

mysql> USE mysql;
mysql> SELECT User, plugin FROM mysql.user;

+------------------+-----------------------+
| User             | plugin                |
+------------------+-----------------------+
| root             | auth_socket           |
| mysql.sys        | mysql_native_password |
| debian-sys-maint | mysql_native_password |
+------------------+-----------------------+

As the query result shows, the root user is using the auth_socket plugin.

There are 2 ways to solve this:

  1. You can set the root user to use the mysql_native_password plugin
  2. You can create a new db_user with your system_user (recommended)

Option 1:

$ sudo mysql -u root # I had to use "sudo" since this is a new installation

mysql> USE mysql;
mysql> UPDATE user SET plugin='mysql_native_password' WHERE User='root';
mysql> FLUSH PRIVILEGES;
mysql> exit;

$ service mysql restart

Option 2: (replace YOUR_SYSTEM_USER with the username you have)

$ sudo mysql -u root # I had to use "sudo" since this is a new installation

mysql> USE mysql;
mysql> CREATE USER 'YOUR_SYSTEM_USER'@'localhost' IDENTIFIED BY '';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'YOUR_SYSTEM_USER'@'localhost';
mysql> UPDATE user SET plugin='auth_socket' WHERE User='YOUR_SYSTEM_USER';
mysql> FLUSH PRIVILEGES;
mysql> exit;

$ service mysql restart

Remember that if you use option #2, you'll have to connect to mysql as your system username (mysql -u YOUR_SYSTEM_USER).

Note: On some systems (e.g., Debian stretch) ‘auth_socket’ plugin is called ‘unix_socket’, so the corresponding SQL command should be: UPDATE user SET plugin='unix_socket' WHERE User='YOUR_SYSTEM_USER';

Update: from @andy's comment, it seems that MySQL 8.x.x updated/replaced auth_socket with caching_sha2_password. I don't have a system set up with MySQL 8.x.x to test this; however, the steps above should help you understand the issue. Here's the reply:

One change as of MySQL 8.0.4 is that the new default authentication plugin is ‘caching_sha2_password’. The new ‘YOUR_SYSTEM_USER’ will have this auth plugin and you can login from the bash shell now with “mysql -u YOUR_SYSTEM_USER -p” and provide the password for this user on the prompt. No need for the “UPDATE user SET plugin” step. For the 8.0.4 default auth plugin update see, https://mysqlserverteam.com/mysql-8-0-4-new-default-authentication-plugin-caching_sha2_password/
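
On the Python side, the error comes from the crawler logging in as root with a password while root is tied to auth_socket. A variation on the options above, and not something the quoted answer prescribes, is to give the crawler its own MySQL account with password authentication and read its credentials from the environment. The sketch below only builds the keyword arguments for pymysql.connect(); the account name crawler and the CRAWLER_DB_* variable names are hypothetical.

```python
import os


def mysql_connect_kwargs(env=None):
    """Build keyword arguments for pymysql.connect() from environment variables.

    The CRAWLER_DB_* names and the default account 'crawler' are hypothetical;
    substitute whatever account you created for the crawler.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("CRAWLER_DB_HOST", "localhost"),
        "user": env.get("CRAWLER_DB_USER", "crawler"),
        "password": env.get("CRAWLER_DB_PASSWORD", ""),
        "database": env.get("CRAWLER_DB_NAME", "myDatabaseName"),
        "charset": "utf8mb4",
    }


# Usage inside a pipeline (requires PyMySQL and a running MySQL server):
#   import pymysql
#   connection = pymysql.connect(**mysql_connect_kwargs())
```

Keeping the password out of the project code also means the same files can be uploaded over sftp without leaking credentials.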

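2. Importing the remote database locally

There are two ways to import a remote database locally:
The first is to connect to the remote server directly with mysqldump, but since I had disabled remote access in the remote MySQL, I chose the second.
The second is to generate an SQL file on the remote server, download it, and import it into the local database.
This is where the problem appeared:
Because this crawl includes binary image data, the exported SQL file was 2.7 GB.
When I imported it into the local database, I got: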

MySQL server has gone away

Searching turned up the following page, where I found my fix:
mysql-server-has-gone-away-solutions

MySQL max_allowed_packet
max_allowed_packet is the maximum size of one packet. The default size of 4MB helps MySQL server catch large (possibly incorrect) packets. As of MySQL 8, the server default has been increased to 64MB. If mysqld receives a packet that is too large, it assumes that something is wrong and closes the connection. To fix, you should increase the max_allowed_packet in my.cnf, then restart MySQL. The max for this setting is 1GB.

Check my original max_allowed_packet

show global variables like 'max_allowed_packet';
+--------------------+----------+
| Variable_name      | Value    |
+--------------------+----------+
| max_allowed_packet | 67108864 |
+--------------------+----------+
1 row in set (0.00 sec)

Change it

set global max_allowed_packet = 268435456;

Importing the SQL file again succeeded!
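
Both values are given in bytes, which makes them hard to read at a glance. The short calculation below converts them; the helper name to_mib is my own.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


old_limit = 67108864   # value reported by `show global variables`
new_limit = 268435456  # value passed to `set global`

print(to_mib(old_limit))  # 64.0
print(to_mib(new_limit))  # 256.0
```

So the limit went from 64 MB to 256 MB, still well under the 1 GB maximum quoted above. Note that a value set with set global lasts only until the server restarts and applies to connections opened afterwards; to keep it, put it in my.cnf as the quoted advice says.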

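3. Limiting Scrapy's log output

By default, Scrapy logs at the DEBUG level.
Because this crawl includes images, each item printed to the log is far too long to read comfortably.
So I use the logging facility that ships with Scrapy to filter the output.
First, the log levels, from most to least severe:

CRITICAL – critical errors
ERROR – regular errors
WARNING – warning messages
INFO – informational messages
DEBUG – debugging messages

Log settings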

LOG_ENABLED
LOG_ENCODING
LOG_FILE
LOG_LEVEL
LOG_STDOUT

We need to set the following in settings.py:

LOG_LEVEL = 'WARNING'

This trims the output.
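
Scrapy's logging is built on Python's standard logging module, so the effect of the threshold can be illustrated without Scrapy at all. The sketch below logs one message per level and collects what gets through; the handler class, the function name, and the message texts are made up for the illustration.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted records in a list so the effect is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}: {record.getMessage()}")


def emitted_at(level_name):
    """Log one message per level and return those that pass `level_name`."""
    logger = logging.getLogger(f"demo.{level_name}")
    logger.propagate = False  # keep the demo out of the root logger
    logger.setLevel(getattr(logging, level_name))
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        logger.debug("scraped item with a very long image payload")
        logger.info("crawled page")
        logger.warning("retrying request")
        logger.error("spider error")
        logger.critical("shutting down")
    finally:
        logger.removeHandler(handler)
    return handler.messages


print(emitted_at("DEBUG"))    # all five messages
print(emitted_at("WARNING"))  # only WARNING, ERROR and CRITICAL
```

With the threshold at WARNING, the DEBUG line that would carry the scraped item is dropped, which is exactly what shortens log.log.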
