PHP 日志收集系统

最近业务中涉及到远程服务器的日志收集需求, 出于限制技术栈扩大的想法,使用PHP进行了实现.

实现过程中有些小小需要注意的点,记录如下:

1. 主动获取. 由于服务器较多, 如果使用Flume之类的架构, 需要在每台服务器上安装软件, 这就产生了运维成本 . 所以我们使用 收集端主动获取的方式. 不需要在生产者(服务端)安装软件.

2.SSH连接. 每台服务器都配置了SSH连接权限,使用PHP的 ssh2扩展即可远程连接并访问服务器内容.

3.服务器日志结构统一.  每台服务器上的日志文件都按同一目录 规则放置,以简化程序逻辑.

4.CLI运行. 收集是持续运行的程序,使用CLI模式,要注意,此时所使用的INI文件问题.

5.SSH连接异常.  有时,由于网络问题,导致SSH连接或验证失败, 延时重试即可.

6.日志截断与压缩. 通常,我们的运维会在每天的固定时间对日志进行截断和压缩, 这就有了两种类型的文件需要读取:压缩与未压缩的日志, 需要分别处理.

7.日志中的时间戳. 以秒为单位 的时间戳不足以区分请求, 我们增加$msec以毫秒计量, 同一毫秒内,同一IP来源,同一UA的可以认为是一个请求.

8.读取目录.  使用readdir即可读取SSH格式的远程目录, readdir("ssh2.sft://......"); 过滤掉不需要的文件后, 按文件创建时间排序,逐个处理.

9.读取压缩文件. 如果用file_get_contents会导致界面长期无响应, 我使用了fopen, fread 分步读取. 一次读取8K(再大也没有用了).  每读取一定次数后,输出一个进度显示.

10.压缩文件缓存. 读取成功后, 保存到缓存目录 , 以便备份以及下次使用. 如果程序出错或重新运行时, 先检查缓存目录, 如果有缓存文件,就不用从网络上读取了.

11.解压缩. 使用gzdecode即可. 这会导致PHP内存需要暴增, 调整PHP.INI吧, 把内存限制扩大.

12.压缩日志处理完成记录. 处理完成一个压缩文件后, 在数据库中记录下来, 以后PHP程序运行后,就不用重复处理了.

13.未压缩日志处理. 未压缩的日志表明,此日志仍在增长中. 不需要缓存. 使用数据库记录,当前文件指针(使用ftell,fseek). 记录文件创建日期.

14.未压缩日志判断. 当文件日期与记录的日期不同时, 或文件小于记录中的文件大小, 说明 此文件被更新了, 需要重置文件指针.

    否则可以直接定位(fseek),以继续从上次处理的位置进行.

15.日志行分解. 使用正则即可,根据空格及定界符进行区分. 也可使用logParser第三方类库来处理. 为节省内存开销.可使用Iterator 协程模式, 逐行返回.

16.日志判重. 事先读取每个服务器的最后 日志时间戳(毫秒)以及IP,UA. 

17.日志保存. 我是使用了MYSQL来保存日志. 每一行日志执行一次MYSQL会极大浪费运行时间, 可以累积4000行再一次性插入.

18.错误处理. 除了SSH连接失败外, 还会读取半行日志,导致分解失败, 此时也抛出异常. 由主程序捕获,并重新运行即可.

源程序如下:

<?php
/**
 * Created by IcePHP Framework.
 * User: 蓝冰大侠
 * Date: 2018/4/11
 * Time: 15:09
 */

class MLogImport
{
    /**
     * 当前正在处理的站点名称
     * @var string
     */
    private $feed;

    /**
     * 当前正在处理的站点主机地址
     * @var string
     */
    private $host;

    /**
     * 当前站点的登录名/站点名称
     * @var string
     */
    private $user;

    /**
     * 当前站点的登录密码
     * @var string
     */
    private $pass;

    /**
     * 当前站点的日志所在目录
     * @var string
     */
    private $logPath;

    //用于本地保存服务器日志的目录,仅备份压缩后的日志
    const CACHE_PATH = DIR_ROOT . 'run/serverLog/';

    /**
     * 当前站点的SSH连接
     * @var resource
     */
    private $sftp;

    /**
     * 当前正在处理的文件名
     * @var string
     */
    private $file;

    /**
     * 本站最后一条日志
     * @var array
     */
    private $lastRow;

    private $agentPatterns;
    private $agentReplaces;

    public function __construct()
    {
        $maps = [
            '/AppleWebKit\/[\d\.]*/i' => 'AppleWebKit/...',
            '/Mobile\/[\d\w]*/i' => 'Mobile/...',
            '/Safari\/[\d\.]*/i' => 'Safari/...',
            '/CriOS\/[\d\.]*/i' => 'CriOS/...',
            '/GSA\/[\d\.]*/i' => 'GSA/...',
            '/Version\/[\d\.]*/i' => 'Version/...',
            '/Chrome\/[\d\.]*/i' => 'Chrome/...',
            '/Edge\/[\d\.]*/i' => 'Edge/...',
            '/Firefox\/[\d\.]*/i' => 'Firefox/...',
            '/SamsungBrowser\/[\d\.]*/i' => 'SamsungBrowser/...',
            '/build\/[\d\w\-\.]*/i' => 'build/...',
            '/Silk\/[\d\.]*/i' => 'Silk/...',
            '/Crosswalk\/[\d\.]*/i' => 'Crosswalk/...',
            '/Gecko\/[\d\.]*/i' => 'Gecko/...',
            '/NTENTBrowser\/[\.\d]*/i' => 'NTENTBrowser/...',
            '/Snapchat\/[\d\w\-\.]*/i' => 'Snapchat/...',
            '/Java\/[\d\.]*/i' => 'Java/...',
            '/UCBrowser\/[\d\.]*/i' => 'UCBrowser',

            '/\(Linux[^\)]*SAMSUNG[^\)]*\)/i' => 'SAMSUNG...',
            '/\([^\)]*IPAD[^\)]*\)/i' => 'IPAD...',
            '/\([^\)]*SM-[^\)]*\)/i' => 'SM...',

            '/LG\-[\w\d]*/i' => 'LG...',
            '/LGL\d[\w\d]*/i' => 'LGL...',
            '/itel it\d*/i' => 'itel...',
            '/XT\d*/i' => 'XT...',
            '/TECNO\-[\w\d]*/i' => 'TECNO...',
            '/RCT[\d\w]*/i' => 'RCT...',
            '/Micromax\s[\w\d]*/i' => 'Micromax...',
            '/LGMS[\d]*/i' => 'LGMS...',
            '/GT\-[\w\d]*/i' => 'GT...',
            '/HUAWEI\s[\-\w\d]*/i' => 'HUAWEI...',
            '/Lenovo\s[\-\w\d]*/i' => 'Lenovo...',
            '/SCH\-[\w\d]*/i' => 'SCH...',
            '/rv\:[\d\.]*/i' => 'rv:...',
            '/Lumia\s\d+/i' => 'Lumia...',
            '/Instagram\s[\d\.]*/i' => 'Instagram...',

            '/iPhone OS 5[_\d]*/i' => 'iOS 5...',
            '/iPhone OS 6[_\d]*/i' => 'iOS 6...',
            '/iPhone OS 7[_\d]*/i' => 'iOS 7...',
            '/iPhone OS 8[_\d]*/i' => 'iOS 8...',
            '/iPhone OS 9[_\d]*/i' => 'iOS 9...',
            '/iPhone OS 10[_\d]*/i' => 'iOS 10...',
            '/iPhone OS 11[_\d]*/i' => 'iOS 11...',
            '/iOS 5[_\d]*/i' => 'iOS 5...',
            '/iOS 6[_\d]*/i' => 'iOS 6...',
            '/iOS 7[_\d]*/i' => 'iOS 7...',
            '/iOS 8[_\d]*/i' => 'iOS 8...',
            '/iOS 9[_\d]*/i' => 'iOS 9...',
            '/iOS 10[_\d]*/i' => 'iOS 10...',
            '/iOS 11[_\d]*/i' => 'iOS 11...',
            '/Android 2[\.\d]*/i' => 'Android 2...',
            '/Android 3[\.\d]*/i' => 'Android 3...',
            '/Android 4[\.\d]*/i' => 'Android 4...',
            '/Android 5[\.\d]*/i' => 'Android 5...',
            '/Android 6[\.\d]*/i' => 'Android 6...',
            '/Android 7[\.\d]*/i' => 'Android 7...',
            '/Android 8[\.\d]*/i' => 'Android 8...',

            '/QuantcastSDK[^\s]*(\s\(\d+\))?/i' => 'QuantcastSDK...',
        ];
        $this->agentPatterns = array_keys($maps);
        $this->agentReplaces = array_values($maps);
    }

    /**
     * 记录一个站点的账号,密码,日志路径
     * @param string $host 主机/账号
     * @param string $user 站点名称/登录名
     * @param string $pass 登录密码
     * @param string $logPath 日志文件路径
     */
    public function site(string $host, string $user, string $pass, string $logPath): void
    {
        $this->feed = $user;
        $this->host = $host;
        $this->user = $user;
        $this->pass = $pass;
        $this->logPath = $logPath;

        //重新连接的间隔时间
        $interval = 1;
        $connect = null;
        while (true) {
            //连接主机
            $connect = ssh2_connect($this->host, '22');

            //账号密码验证成功
            if (false !== ssh2_auth_password($connect, $this->user, $this->pass)) {
                break;
            }

            //间隔时间2秒,4,8,...
            $interval *= 2;
            echo "auth wrong at $this->host, retry after $interval seconds\r\n";

            //间隔指定 时间后,重新连接
            sleep($interval);
        }

        //登录成功
        echo "\r\nlogin $this->feed\r\n";

        //读取文件列表
        $this->sftp = ssh2_sftp($connect);
        if(!$this->sftp){
            throw new Exception('ssh2_sftp fail.');
        }
        $handle = opendir("ssh2.sftp://{$this->sftp}{$this->logPath}");   //ssh2.sftp://Resource #33/home/.....
        if (!$handle) {
            throw new Exception('open dir ssh2.sftp fail.');
        }

        $zippedFiles = [];
        $unzippedFile = '';
        while (false !== ($file = readdir($handle))) {
            $filePath = "ssh2.sftp://{$this->sftp}{$this->logPath}/$file";

            //必须是文件,目录的不要
            if (!is_file($filePath)) continue;

            //必须是访问日志
            if (left($file, 10) !== 'access.log') continue;

            //如果是压缩文件
            if (substr($file, -3) === '.gz') {
                //4.5之前的不处理(这天改格式了)
                if (substr($file, 11, 8) < '20180405') continue;
                $zippedFiles[] = $file;
            } else {
                $unzippedFile = $file;
            }
        }
        closedir($handle);

        //本站最后请求时间
        $this->lastRow = table('log')->row('*', ['feedName' => $this->feed], 'id desc')->toArray();

        //按创建时间正序排序
        asort($zippedFiles);

        //逐个文件处理压缩文件
        foreach ($zippedFiles as $file) {
            $this->file = $file;
            $this->zipped();
        }

        //如果有非压缩日志,处理
        if ($unzippedFile) {
            $this->file = $unzippedFile;
            $this->unzipped();
        }
    }


    /**
     * 读取远程 文件内容
     * @param $indicator string 远程 文件指示器
     * @param $size int 文件大小
     * @return Iterator 遍历器
     */
    private function readUnzipped(string $indicator, int $size): Iterator
    {
        echo "Begin read File:$this->file:" . STool::kmgt($size) . "\r\n";

        //打开文件,指向上次读取的位置
        $f = fopen($indicator, 'r');
        if (!$f) {
            return;
        }
        if ($this->offset) {
            fseek($f, $this->offset);
            echo "Seek to $this->offset\r\n";
        }

        //总行数
        $lines = 0;

        //逐行读取
        while (!feof($f)) {
            $lines++;
            $line = fgets($f);

            //更新偏移量
            $this->offset = ftell($f);

            //返回行数
            yield $line;

            //每200行输出一个显示
            if ($lines % 500 == 0) {
                echo "read $this->feed $this->file Lines:$lines\r\n";
            }
        }
        fclose($f);
        echo "read $this->feed $this->file Lines:$lines\r\n";
        echo "End.\r\n";
    }


    /**
     * 读取远程 文件内容
     * @return string 缓存文件路径
     */
    private function readZipped(): string
    {
        //构造远程文件地址
        $indicator = "ssh2.sftp://$this->sftp$this->logPath/$this->file";

        //文件大小
        $fileSize = filesize($indicator);
        $size = STool::kmgt($fileSize);

        //如果有缓存文件且缓存文件大小一致,则使用缓存文件
        $cacheFile = self::CACHE_PATH . $this->feed . '/' . $this->file;
        if (is_file($cacheFile) and filesize($cacheFile) == $fileSize) {
            echo "Read Zipped File From Cache:" . $this->file . ' ' . $size . "\r\n";
            return $cacheFile;
        }

        //从服务器读文件
        echo "Begin read File:{$this->file}:" . $size . "\r\n";
        $fileHandle = fopen($indicator, 'rb');
        if (!$fileHandle) {
            dump($indicator, 'OPEN FAIL');
            exit;
        }

        //读取远程文件内容
        $content = '';
        $i = 0;
        while (!feof($fileHandle)) {
            //每次能读回8K字节
            $content .= fread($fileHandle, 65536);

            //每128K显示一次读取进度
            $i++;
            if ($i % 16 == 0) {
                echo "$this->feed $this->file Reading :" . STool::kmgt(strlen($content)) . "/$size\r\n";
            }
        }
        fclose($fileHandle);

        //保存到缓存文件中
        echo "Save to cache:" . $cacheFile . " \r\n";
        makeDir(dirname($cacheFile));
        file_put_contents($cacheFile, $content);

        //返回压缩文件内容
        return $cacheFile;
    }

    /**
     * 字符串分行
     * @param string $content
     * @return Iterator
     */
    public function explode(string $content): Iterator
    {
        $size = strlen($content);
        $pointer = 0;
        while ($pointer < $size) {
            $next = strpos($content, "\n", $pointer);
            if ($next === false) {
                $line = substr($content, $pointer);
                $next = $size;
            } else {
                $line = substr($content, $pointer, $next - $pointer);
            }
            yield $line;
            $pointer = $next + 1;
        }
    }

    private function valid(string $url): bool
    {
        return false !== strpos($url, '/?s=') or false !== strpos($url, '/?ss=') or preg_match('/^\/.*\/.*\/$/i', $url);
    }

    /**
     * 处理一个压缩日志文件
     */
    private function zipped(): void
    {
        //检查文件已经处理过
        $fileTable = table('zipped');
        if ($fileTable->exist(['feedName' => $this->feed, 'fileName' => $this->file])) return;

        //读取文件内容
        $gz=gzopen($this->readZipped(),'r');

        echo "\r\nBegin Process File\r\n";
        //$memTable = $this->createTemporaryTable(uniqid('tmp_'));

        //要插入的日志表
        $logTable = table('log');

        //要插入的行缓冲区
        $rows = [];
        $insertRowsCount = 0;

        $content = null;
        $key=0;
        while(!gzeof($gz)) {
            $line=gzgets($gz);
            if ((++$key) % 30000 == 0) {
                echo "Analysis LINES:$key\r\n";
            }

            //空行不处理
            $line = trim($line);
            if (!$line) continue;

            //行分解
            $parts = $this->explodeLine($line);
            if (!$parts) continue;

            //判断 是否是 搜索 行
            if (!$this->valid($parts['url'])) continue;

            //检查是否已经处理过
            if ($this->lastRow) {
                if ($parts['timestamp'] < $this->lastRow['timestamp']) continue;
                if ($parts['timestamp'] == $this->lastRow['timestamp'] and $parts['url'] == $this->lastRow['url'] and $parts['ip'] == $this->lastRow['ip']) {
                    continue;
                }
            }

            //加入缓冲 区
            $parts['feedName'] = $this->feed;
            $rows[] = $parts;

            //每4000行执行一次插入,再多就会出现placeholder太多
            if (count($rows) >= 4000) {
                $logTable->inserts($rows);
                $insertRowsCount += count($rows);
                SDebug::clearMsgs();
                $rows = [];
            }
        }

        //处理最后剩余的行
        if (count($rows)) {
            $logTable->inserts($rows);
            $insertRowsCount += count($rows);
            SDebug::clearMsgs();
        }

        echo "insert LINES:$insertRowsCount\r\n";

        //标记此文件已经处理过
        //$fileTable->begin();
        //$this->move($memTable);
        $fileTable->insert(['feedName' => $this->feed, 'fileName' => $this->file]);
        //$fileTable->commit();
    }

    /**
     * 将临时表中的日志转移到正式表中
     * @param STable $memTable 临时表对象
     */
    private function move(STable $memTable)
    {
        $fields = ['feedName', 'accessTime', 'timestamp', 'ip', 'requestTime', 'responseTime', 'method', 'url', 'code', 'length', 'referrer', 'agentId', 'created', 'updated', 'forward'];;
        $fieldsStr = implode(',', $fields);
        $memTable->execute("Insert" . " Into log($fieldsStr) select $fieldsStr from " . $memTable->name());
        $memTable->deleteAll();
    }

    /**
     * 当前文件的偏移
     * @var int
     */
    private $offset;

    /**
     * 处理一个未压缩的日志文件
     */
    private function unzipped(): void
    {
        //检查上次处理情况
        $fileTable = table('unzipped');

        //如果没有记录,则生成一条初始记录
        if ($fileTable->notExist(['feedName' => $this->feed])) {
            $fileTable->insert(['feedName' => $this->feed, 'offset' => 0, 'size' => 0, 'timestamp' => 0]);
        }

        //取出处理信息,其中包含 offset(上次文件指针位置),size(上次文件大小), lasttime(上次最后时间)
        $info = $fileTable->row('*', ['feedName' => $this->feed]);

        //构造远程文件地址
        $indicator = "ssh2.sftp://$this->sftp$this->logPath/$this->file";

        //文件大小
        $fileSize = filesize($indicator);

        //文件变小了, 说明是新文件
        if ($fileSize < $info['size']) {
            $this->offset = 0;
        } else {
            // 取首行
            $f = fopen($indicator, 'r');
            $firstLine = fgets($f);
            fclose($f);
            $first = $this->explodeLine($firstLine);
            $timestamp = $first['timestamp'];
            if ($timestamp > $info['timestamp']) {
                $this->offset = 0;
            } else {
                $this->offset = $info['offset'];
            }
        }

        echo "\r\nBegin Process File\r\n";

        //要插入的日志表
        $logTable = table('log');

        //要插入的行缓冲区
        $rows = [];
        $insertedRowsCount = 0;

        $iterator = $this->readUnzipped($indicator, $fileSize);
        $lastTime = 0;
        foreach ($iterator as $key => $line) {
            //空行不处理
            $line = trim($line);
            if (!$line) continue;

            //分解 日志行
            $parts = $this->explodeLine($line);
            if (!$parts) continue;

            //判断 是否是 搜索 行
            if (!$this->valid($parts['url'])) continue;

            //判断是否已经导入
            if ($this->lastRow and (floatval($parts['timestamp']) < floatval($this->lastRow['timestamp']))) continue;
            $rows[] = array_merge($parts, [
                'feedName' => $this->feed
            ]);

            //最大的时间戳
            $lastTime = $parts['timestamp'];

            //批量插入
            if (count($rows) >= 100) {
                $insertedRowsCount += count($rows);
                $logTable->inserts($rows);
                $fileTable->update(['size' => $fileSize, 'offset' => $this->offset, 'timestamp' => $lastTime], ['feedName' => $this->feed]);
                echo "Insert LINES:$insertedRowsCount\r\n";
                SDebug::clearMsgs();
                $rows = [];
            }
        }

        //处理最后剩余的行
        if (count($rows)) {
            $insertedRowsCount += count($rows);
            $logTable->inserts($rows);
            $fileTable->update(['size' => $fileSize, 'offset' => $this->offset, 'timestamp' => $lastTime], ['feedName' => $this->feed]);
            echo "Insert LINES:$insertedRowsCount\r\n";
            SDebug::clearMsgs();
        }
    }

    /**
     * 分解一行日志
     * @param $line string
     * @return array
     * @throws Exception 匹配失败
     */
    private function explodeLine(string $line): array
    {
        //[08/Apr/2018:03:30:17 +0800] 1523129417.075 72.178.128.43 - 0.114 - "GET /index.php/blog/search/?s=lowering%20ldl%20cholesterol&subid=tgr_zhen_BFX0J1ILON6N__rmlwuf_73004751 HTTP/1.1" 499 0 "http://168634854.keywordblocks.com/Cholesterol_Hdl_Ldl_Ratio.cfm?&vsid=1661264105777118&vi=1523124812717930856&dytm=1523124813100&kbbq=%26sde%3D1%26adepth%3D1%26ddepth%3D3&tdAdd[]=%7C%40%7Csde%3D1%7C%40%7Cadepth%3D1%7C%40%7Cddepth%3D3&sbdrId=135&vgd_matchstr=CommercialUrlOn%7Chlid%3D2002&matchstring=CommercialUrlOn%7Chlid%3D2002&vgd_bdata=ss%3D320x568%7C%7CMM%3D1.0%7C%7Cbb%3D145%7C%7CMP%3D.*%2Fcholesterol-management%2F.*%7C%7Cfbb%3D0%7C%7CRB%3D34.18110604079318%7C%7Cbtd%3D2341877441767294977%7C%7Ccbid%3D34.18110604079318%7C%7CMB%3D15.0%7C%7CMC%3DAUTO%7C%7Curl_l%3D50%7C%7Chour_group_l%3D20%7C%7CRImp%3D9.0%7C%7Cbid%3D15.1%7C%7Cdevice_l%3D20%7C%7CisRef%3D0&verid=111299&acid=427573913652889251523124810846&hvsid=00001523124812870012196577713996&upk=1523124813.1380&sttm=1523124812870&=&kp=1&kbc=143697&bdrid=4&subBdr=135&kt=266&ki=5912010&ktd=274911461948&kbc2=rpc%3D0.14&fdkt=266&lkpgd=UUID%3Duuid_s8_3_1523124813_778621763%7C%7CSI%3D863%7C%7CMPTD%3D232%7C%7CPTD%3D6922032661652308480%7C%7CSID%3D14%7C%7CCI%3D863%7C%7CMN%3D8%7C%7Cerpm%3D-1.0%7C%7CMI%3D863%7C%7CKTGD%3D3866%7C%7CKSE%3D1523124813242%7C%7CAN%3D5%7C%7CHID%3D3%7C%7CPTD2%3D16896&&lktgd=3866&&fp=biwFab2EOSptF9Dp9P5pLIuIHpVTe2ha94T5u6HCtebISTPUlc1la6_ujtvHa-nb8nGHPkJ_EnIwZF7mo3KnR2p3XYd1wmF70O9szYDQ9ufyP0OyS-gxVg%3D%3D&c=O5LJq2Lix-2w0IdspaXDCw&cme=rs5xevxSmJb0u22ZZHKqUTjYupvdJAHcw4kmb0sBhK6UBgyb-EKIO8Yg8DI2Uv0ZcpIG4AQvPb75jBLoeAG5VMn2cBgcO0Er9uHnU2G2b5527aplb-EHrVG_De8s_c_9-9bkhpH6jUmk3eK5uGthWBagtuatdg2SBe72cEUSh9aPY9sVJrkoOPaGQsQOH5rqAz1TMLK3_fisF-ozH6JyNg%3D%3D%7C%7CNDHRnZ9Gz3KXlI-i9OnZqQ%3D%3D%7C5gDUJdTGiJzedmq9hanWYg%3D%3D%7CtrJ5NInYpv_AyRdJRHyQbAoA6iGqXTxu%7CRrUTbnOe6Nf9cTuAtIJVy9no3H-wuOVy%7CN7fu2vKt8_s%3D%7Cl44MelaykDW0jQJG6bjukdQlinX0DB9oV4Sm9gijr_bD43Zl1UaHw39JatxHgP46euFaB3PMdSqZJqb8JKnexHrlF_K3RJ5R%7CJf0d-WoAdPuDA6UD6Gc_F1zJX7Ucny7osFvXic8Z4MU%3D%7Cue9AR4Lxeuwq7AuXzY3UTfqQIZ7T1ETAepQ5ZjhMUrn8F4iL72pDJxv9w1vxSK2jeiEactQl6VTIdrnkiwcfmH0laLhDYgMhmFyUaT5z0ZmFu4kbMwh587f73k-Z2prl2NRyNqvZoZL_mL9UwcCaoUiGM916VV0SyiuEizF5kMH-PgMGZNtaVAulY1i6cP1h%7C&ib=0&cid=8CU12LGKP&crid=285618735&size=300x250&lpid=&tsid=1&ksu=233&chid=&https=0&kwdsMaxTm=400&ugd=3&maxProviderPixel1=%2F%2Fc.ad-srv.co%2Fpixel&maxProviderPixel2=%2F%2Fc.adyield.co%2Fpixel&rms=1523124813&&sc=TX&asn=11427&kals=base&kalog=SI%3D863%7C%7CTPTD%3D516%7C%7CCI%3D863%7C%7CUUID%3Duuid_s12_nc1b_4_1523124812_210054693%7C%7CSID%3D11%7C%7CHID%3D4%7C%7CMI%3D863%7C%7CMPTD%3D176&kasts=tstype%3DBASE_BAG%7C%7C&kata=8ce5&clsKb=2&ecref=w77E%3ASS7mE8NQ.BJGYO.NmYS7mE8NSuSTOj%2BTJeJjQ%2BImLY1j%2BD1zyJS%3Fx7YMN1YE18yzvJY4RuuT%26x7YM7JLYvBw17n8QnzmLY1jnjOjnNwmjJQ7JLmj%26x7YMQmxLNJv%26x7YMYJO8xYvG%26x7YMNmz7Jz7vfHFWiXW9FiuH%26yM7yv%26yM78vUBOofiiAXAf9uWuH%26yMOJvY%26yMOYv%26yM1Evu7f%26yMzBvy%26yMN8vu9HHHfuWWX%26yM18vX9hiuHFXXFA%26yMjEvi9fhX9A%26yMj8v%26UvBw17n8QnzmLY1jnjOjnNwmjJQ7JLmj%26yNj8Ov%3Dd9C%3DgdBfCqpRD%3DfKDVQK6rMLAkw9YDzi%20YAVJOGawyN077a0Ezk7z8903%20AuogkUFBTjI%20%3DK8jODdV1K8Qg4KTBMBNR&kct=20512&abpl=2" "Mozilla/5.0 (iPhone; CPU iPhone OS 11_2_6 like Mac OS X) AppleWebKit/604.5.6 (KHTML, like Gecko) Version/11.0 Mobile/15D100 Safari/604.1" "-"

        $ns = '([^\s]*)';
        $str = '"([^"]*)"';
        $datetime = '\[([^\]]*)\]';

        //正则匹配
        $matched = preg_match("/$datetime $ns $ns \- $ns $ns $str $ns $ns $str $str $str/i", $line, $matches);
        if (!$matched) {
            throw new Exception('NOT MATCH');
        }

        //空格区别 MODE URL HTTP协议
        list($mode, $url, $protocol) = explode(' ', $matches[6]);

        return [
            'accessTime' => datetime(strtotime($matches[1])), //访问时间(秒)
            'timestamp' => floatval($matches[2]),//访问时间戳(带毫秒)
            'ip' => $matches[3], //请求者IP
            'requestTime' => floatval($matches[4]), //Nginx处理请求的时间
            'responseTime' => floatval($matches[5]), //Nginx完成整个响应的时间
            'method' => $mode, //GET/POST/...
            'url' => $url, //请求地址
            'code' => $matches[7], //响应代码
            'length' => intval($matches[8]), //响应正文长度
            'referrer' => left($matches[9], 250), //引用
            'agentId' => $this->getAgentId($matches[10]), //用户代理
            'forward' => $matches[11] //真实IP
        ];
    }

    //获取Agent与ID的对应关系
    private function getAgentMap()
    {
        $rows = table('agent')->select('id,agent', null, 'agent')->toArray();
        return array_column($rows, 'id', 'agent');
    }

    //根据一个Agent,获取对应ID,如果没有则创建一个对应关系
    private function getAgentId($agent)
    {
        //如果UA为空
        if (!$agent) {
            return 0;
        }

        //静态内存缓存
        static $maps;
        if (!$maps) {
            $maps = $this->getAgentMap();
        }


        //缩减[FBAN/FBIOS;...]
        $sub = mid($agent, '[', ']');
        if ($sub) {
            $agent = str_replace('[' . $sub . ']', '[...]', $agent);
        }

        $agent = str_replace(' (KHTML, like Gecko)', '', $agent);

        //变种归并
        $agent = preg_replace($this->agentPatterns, $this->agentReplaces, $agent);

        //Agent缩减到250个字符
        $agent = left($agent, 191);
        if (!isset($maps[$agent])) {
            $id = table('agent')->insertIgnore(['agent' => $agent]);
            $maps[$agent] = $id;
        }

        return $maps[$agent];
    }

}

————————————————
版权声明:本文为CSDN博主「蓝冰大侠」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/bluehire/article/details/79985203

  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值