Counting per-IP hits in an nginx access log with PHP

Latest recommended article published on 2022-11-16 14:54:49

weixin_34195364

Tags: ops, php, python

Original article: https://my.oschina.net/kaykay012/blog/706877

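The script reads the log in raw byte chunks instead of line by line, and it performs very well. Without further ado, here is the code.

Create the file test01.php: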


<?php
// File size: 4.8 GB
$filepath = '/usr/local/nginx/logs/acc.log';

$f = fopen($filepath, 'r');

$i = 0; // line count
$ipcount = []; // hit count per IP
while (!feof($f)) {
    $str = fread($f, 1024*1024*2); // read 2 MB per iteration
    
    $lines = explode("\n", $str);    
    foreach($lines as $line) {
        if(!$line){
            continue;
        }
        $ip = substr($line, 0, strpos($line, '- -')-1);
        if(ip2long($ip) === false) {
            continue;
        }
        $ipcount[$ip] = isset($ipcount[$ip]) ? $ipcount[$ip] : 0;
        $ipcount[$ip]++;
    }

    $num = substr_count($str, "\n");
    $i += $num;
}
fclose($f);

// Write the results to a file
$str1 = "Lines:$i \n";
file_put_contents('/tmp/a.log', $str1, FILE_APPEND);

foreach($ipcount as $ip=>$count){
    file_put_contents('/tmp/a.log', $ip . ':' . $count . "\n", FILE_APPEND);
}
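
The IP extraction step in the script above depends on each line starting with the client address followed by " - -", as in nginx's default log format. Below is a minimal sketch of that step in isolation. The helper name `extractIp` and the sample line are made up for illustration, and it swaps in two things the original does not use: it cuts at the first space instead of searching for `- -`, and it validates with `filter_var()` so that IPv6 addresses are also accepted (`ip2long()` only handles IPv4).

```php
<?php
// Hypothetical helper: extract the client IP from one access-log line.
// Assumes the line starts with "$remote_addr " as in nginx's default format.
function extractIp(string $line): ?string
{
    $end = strpos($line, ' ');      // the address ends at the first space
    if ($end === false) {
        return null;                // no space at all: not a log line we understand
    }
    $ip = substr($line, 0, $end);
    // FILTER_VALIDATE_IP accepts IPv4 and IPv6 addresses.
    return filter_var($ip, FILTER_VALIDATE_IP) !== false ? $ip : null;
}

// Made-up sample line for illustration.
$sample = '203.0.113.7 - - [10/Jul/2016:12:00:00 +0800] "GET / HTTP/1.1" 200 612';
var_dump(extractIp($sample));       // the address 203.0.113.7
var_dump(extractIp('garbage'));     // NULL
```

Returning null for anything that does not validate plays the same role as the `ip2long($ip) === false` check: malformed lines are skipped instead of being counted.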

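Results from a test run on the server.

Elapsed time:

[screenshot]

Output:

[screenshot]

Elapsed time when only counting lines:

[screenshot]

Comparison with the shell command wc:

[screenshot]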
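
The post does not show the line-count-only variant, so the following is only a sketch of what it presumably looks like: the same chunked loop with the IP parsing removed. The function name `countLines` and the in-memory test data are assumptions made for illustration.

```php
<?php
// Hypothetical helper: count "\n" bytes in a stream using fixed-size reads.
function countLines($stream, int $chunkSize = 2097152): int
{
    $lines = 0;
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false) {
            break;                  // read error: stop counting
        }
        $lines += substr_count($chunk, "\n");
    }
    return $lines;
}

// Usage on an in-memory stream with made-up content.
$stream = fopen('php://memory', 'w+');
fwrite($stream, "a\nb\nc\n");
rewind($stream);
echo countLines($stream, 4), "\n";  // prints 3
fclose($stream);
```

Because "\n" is a single byte, it can never be split across two reads, so the line total does not depend on where the chunk boundaries fall.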

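However, I noticed a problem: the data looks wrong, because a few addresses have been truncated, as shown here:

[screenshot]

My guess is that a read sometimes ends in the middle of an IP address, so the address is cut in two. I therefore revised the script to handle that case.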
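
To make that guess concrete, here is a small reproduction on an in-memory stream. It is a sketch, not the author's code: `countIpsNaive` is a hypothetical helper that mirrors the first script's loop, the two log lines are made up, and an 18-byte chunk stands in for the 2 MB read so that the boundary falls right after the first digit of the second address.

```php
<?php
// Hypothetical helper mirroring the first script's loop: it does nothing
// about lines that straddle a chunk boundary.
function countIpsNaive($stream, int $chunkSize): array
{
    $counts = [];
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        foreach (explode("\n", (string) $chunk) as $line) {
            $pos = strpos($line, ' - -');
            $ip = $pos === false ? '' : substr($line, 0, $pos);
            if (ip2long($ip) === false) {
                continue;           // not a valid IPv4 address
            }
            $counts[$ip] = ($counts[$ip] ?? 0) + 1;
        }
    }
    return $counts;
}

// Made-up two-line log held in memory.
$stream = fopen('php://memory', 'w+');
fwrite($stream, "10.0.0.1 - - [a]\n192.168.1.20 - - [b]\n");

rewind($stream);
print_r(countIpsNaive($stream, 1024)); // one read: 10.0.0.1 and 192.168.1.20

rewind($stream);
print_r(countIpsNaive($stream, 18));   // split read: 10.0.0.1 and 92.168.1.20
fclose($stream);
```

The truncated fragment "92.168.1.20" still passes `ip2long()`, which is why it ends up in the results instead of being skipped. The revised script that follows avoids this by rewinding the file pointer to just after the last newline in each chunk.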

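<?php
// File size: 4.8 GB
$filepath = '/usr/local/nginx/logs/acc.log';

$f = fopen($filepath, 'r');

$i = 0; // line count
$ipcount = []; // hit count per IP
$len = 1024*1024*2; // read 2 MB per iteration

while (!feof($f)) {
    $str = fread($f, $len);

    if (!feof($f)) {
        $pos1 = ftell($f); // current file pointer position

        // Position of the last "\n" in $str. nginx writes "\n" line endings,
        // so search for "\n" directly instead of the platform-dependent PHP_EOL.
        $pos2 = strrpos($str, "\n");

        // Skip the rewind if the chunk holds no newline at all; otherwise the
        // pointer would be moved back almost to where this read started.
        if ($pos2 !== false) {
            // Number of bytes after the last "\n", i.e. the incomplete trailing line.
            // strlen($str) is used because fread() may return fewer than $len bytes.
            $pos3 = strlen($str) - $pos2 - 1;

            fseek($f, $pos1 - $pos3); // move the pointer back to the start of that line

            $str = substr($str, 0, $pos2 + 1); // keep only the complete lines
        }
    }

    $lines = explode("\n", $str);
    foreach($lines as $line) {
        if(!$line){
            continue;
        }
        $ip = substr($line, 0, strpos($line, '- -')-1);
        if(ip2long($ip) === false) {
            continue;
        }
        $ipcount[$ip] = isset($ipcount[$ip]) ? $ipcount[$ip] : 0;
        $ipcount[$ip]++;
    }

    $num = substr_count($str, "\n");
    $i += $num;
}
fclose($f);

// Write the results to a file
$str1 = "Lines:$i \n";
file_put_contents('/tmp/a.log', $str1);

foreach($ipcount as $ip=>$count){
    file_put_contents('/tmp/a.log', $ip . ':' . $count . "\n", FILE_APPEND);
}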

And now it works perfectly.
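
A variation on the same fix, not from the original post: instead of seeking backwards, the incomplete trailing line can be kept in a variable and prepended to the next chunk. The sketch below assumes the same log format as the scripts above. The names `tally` and `countIps` are hypothetical, and the data is made up.

```php
<?php
// Hypothetical helper: add one line's IPv4 address to $counts, if it has one.
function tally(array &$counts, string $line): void
{
    $pos = strpos($line, ' - -');
    if ($pos === false) {
        return;
    }
    $ip = substr($line, 0, $pos);
    if (ip2long($ip) !== false) {
        $counts[$ip] = ($counts[$ip] ?? 0) + 1;
    }
}

// Hypothetical helper: chunked read that carries the trailing partial line
// over to the next chunk, so no fseek() is needed.
function countIps($stream, int $chunkSize = 2097152): array
{
    $counts = [];
    $carry = '';                     // incomplete line left over from the previous chunk
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false) {
            break;                   // read error: stop
        }
        $lines = explode("\n", $carry . $chunk);
        $carry = array_pop($lines);  // last element is '' or a partial line
        foreach ($lines as $line) {
            tally($counts, $line);
        }
    }
    tally($counts, $carry);          // final line when the input lacks a trailing "\n"
    return $counts;
}

// Usage on made-up data, with a tiny chunk size that splits the second address.
$stream = fopen('php://memory', 'w+');
fwrite($stream, "10.0.0.1 - - [a]\n192.168.1.20 - - [b]\n");
rewind($stream);
print_r(countIps($stream, 18));      // 10.0.0.1 and 192.168.1.20, both intact
fclose($stream);
```

This version also works on streams that cannot seek, such as a pipe, at the cost of one extra string concatenation per chunk.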

Reposted from: https://my.oschina.net/kaykay012/blog/706877