Parsing Apache Logs into a SQL Database

allway2

Tags: database, apache, sql

Article link: https://blog.csdn.net/allway2/article/details/126070718

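Background

Parsing Apache HTTPD access logs into a database structure lets you run queries and reports against them, which helps you understand web traffic and detect problems. Using the pip-installable apache_log_parser package [1] together with the sqlite3 module [2] from the Python standard library, we can quickly parse these access logs and insert the entries into a sqlite3 database.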

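Method Notes

A few points about the approach taken in the code:

sqlite3 was chosen for its native support and ease of use, but other DBMSs can be used in a similar way.
The schema has no auto-incrementing id column, because SQLite advises against such columns unless they are necessary [4]. You may want to add one for other database systems.
apache_log_parser produces a dictionary of the parts of an Apache log entry. The available dictionary keys therefore depend on the log pattern of the access.log that is passed to the make_parser() function.
The columns in the code snippet are only a subset of the keys and values returned by the parser. You can see every key and value generated from a given line by calling pprint() on the parsed result.
An extra 'date' key and value is added in the code (in a format friendly to the SQLite date functions [3]) to allow date-based queries and criteria.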
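
Two of the notes above can be checked with nothing but the standard sqlite3 module. The following is a minimal sketch, assuming an in-memory database and made-up sample rows: it shows that a table declared without an id column still has SQLite's implicit rowid, and that an ISO 8601 date string (the kind produced by date().isoformat()) is accepted by SQLite's date functions.

```python
import sqlite3

# In-memory database with the same schema as the article; rows are made-up samples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (status INTEGER, request_method TEXT, request_url TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        (200, "GET", "/index.html", "2018-11-25"),
        (404, "GET", "/notthere", "2018-11-25"),
    ],
)

# No id column was declared, yet every ordinary SQLite table has an implicit rowid.
rowids = [row[0] for row in conn.execute("SELECT rowid FROM logs ORDER BY rowid")]

# ISO 8601 strings (YYYY-MM-DD) are valid input to SQLite's date functions.
next_day = conn.execute("SELECT date('2018-11-25', '+1 day')").fetchone()[0]
weekday = conn.execute("SELECT strftime('%w', '2018-11-25')").fetchone()[0]

print(rowids, next_day, weekday)
conn.close()
```

Storing the date as text in this form keeps the column human-readable while still allowing date arithmetic and range comparisons in queries.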

Code

#!/usr/bin/env python3

import sys
import sqlite3

import apache_log_parser

if len(sys.argv) != 2:
    print("Usage:", sys.argv[0], "/path/to/access.log")
    sys.exit(1)

conn = sqlite3.connect('/tmp/logs.db')
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS logs (
        status INTEGER,
        request_method TEXT,
        request_url TEXT,
        date TEXT
    )
""")

# Pattern below is from the LogFormat setting in apache2.conf/httpd.conf file
# You will likely need to change this value to the pattern your system uses
parser = apache_log_parser.make_parser(
    "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\""
)

log_file = sys.argv[1]

with open(log_file) as f:
    for line in f:
        d = parser(line)

        # Line below adds minimalistic date stamp column
        # in format that sqlite3 date functions can work with
        d['date'] = d['time_received_datetimeobj'].date().isoformat()

        cur.execute("""
            INSERT INTO logs ( status, request_method, request_url, date)
            VALUES (:status, :request_method, :request_url, :date)
        """, d)

cur.close()
conn.commit()
conn.close()
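
If installing apache_log_parser is not an option, the same four columns can be extracted with a hand-written regular expression. The sketch below is not the library's implementation; it is a standard-library substitute that assumes exactly the log format used above, and the names LINE_RE and parse_line are hypothetical. It also assumes an English locale for the month abbreviation and does not handle escaped quotes inside the quoted fields. Lines that do not match return None so a caller can skip them.

```python
import re
from datetime import datetime

# Hand-written pattern for: %h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"
# (hypothetical helper, not part of apache_log_parser)
LINE_RE = re.compile(
    r'^(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)(?: \S+)?" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$'
)


def parse_line(line):
    """Return a dict with the article's four columns, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Apache's %t looks like 25/Nov/2018:20:13:28 -0500
    received = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "status": int(m.group("status")),
        "request_method": m.group("method"),
        "request_url": m.group("url"),
        "date": received.date().isoformat(),
    }


sample = '::1 - - [25/Nov/2018:20:13:28 -0500] "GET /index.html HTTP/1.1" 200 3380 "-" "User agent string"'
print(parse_line(sample))
```

The returned dictionary uses the same keys as the named placeholders in the INSERT statement, so it can be passed to cur.execute() in the same way as the library's dictionary.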

Sample Run

Sample access.log entries:

# cat /var/log/apache2/access.log
::1 - - [25/Nov/2018:20:13:28 -0500] "GET /index.html HTTP/1.1" 200 3380 "-" "User agent string"
::1 - - [25/Nov/2018:12:15:26 -0500] "GET /notthere HTTP/1.1" 404 498 "-" "User agent string"

Run the code above:

$ ./httpd_log_parser.py /var/log/apache2/access.log

Check the results:

$ sqlite3 -header -column /tmp/logs.db 'SELECT * FROM logs'
status      request_method  request_url  date
----------  --------------  -----------  ----------
200         GET             /index.html  2018-11-25
404         GET             /notthere    2018-11-25

Sample Queries

Once the log entries are loaded, you can start running useful queries against them.

Number of requests by response status:

SELECT count(1) as nReqs,
       status
FROM logs
GROUP BY (status)
ORDER BY nReqs DESC;

nReqs       status
----------  ----------
4           200
2           404

The most common 404 URLs:

SELECT count(1) as nReqs,
       request_url
FROM logs
WHERE status = 404
GROUP BY (request_url)
ORDER BY nReqs DESC;

nReqs       request_url
----------  -----------
2           /notthere
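
The date column exists to make date-based queries possible. As a hedged illustration, the sketch below loads a few made-up rows into an in-memory database and counts requests per day within a two-day window; the sample data and the chosen window are assumptions, not output from a real log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (status INTEGER, request_method TEXT, request_url TEXT, date TEXT)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        (200, "GET", "/index.html", "2018-11-24"),
        (200, "GET", "/index.html", "2018-11-25"),
        (404, "GET", "/notthere", "2018-11-25"),
        (200, "GET", "/index.html", "2018-11-27"),
    ],
)

# Requests per day between two dates (inclusive). ISO dates sort correctly as text,
# and date(:end, '-1 day') applies a SQLite date function to a bound parameter.
per_day = conn.execute(
    """
    SELECT date, count(1) AS nReqs
    FROM logs
    WHERE date BETWEEN date(:end, '-1 day') AND :end
    GROUP BY date
    ORDER BY date
    """,
    {"end": "2018-11-25"},
).fetchall()

print(per_day)
conn.close()
```

Against the real database, the same SELECT can be run from the sqlite3 command line by replacing :end with a literal date string.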

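Final Thoughts

The approach above processes log files after the Apache process has written them. This can be particularly useful because the log files can be offloaded to a different system and analyzed with the database of your choice, without affecting the performance of the web server. An alternative is something like mod_log_sql (also known as libapache2-mod-log-sql), a purpose-built Apache module that logs requests directly to a MySQL database at request time. Either option involves trade-offs, but hopefully the information provided gives you a head start. Take care, and good luck!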

References

[1] GitHub - rory/apache-log-parser (amandasaurus/apache-log-parser): Parses log lines from an apache log
[2] Python - sqlite3: DB-API 2.0 interface for SQLite databases (Python 3.10.5 documentation)
[3] SQLite - Date And Time Functions
[4] SQLite - SQLite Autoincrement
