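A recent project required collecting logs from remote servers. To avoid widening our technology stack, I implemented the collector in PHP.
A few points needed attention along the way; I note them here: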
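1. Pull, not push. We have many servers, and an architecture such as Flume would require installing software on every one of them, which adds operations cost. So the collector pulls the logs itself, and nothing has to be installed on the producers (the servers).
2. SSH connection. Every server is configured to allow SSH access, so PHP's ssh2 extension is enough to connect remotely and read the server's files.
3. Uniform log layout. Log files on every server follow the same directory convention, which keeps the program logic simple.
4. CLI mode. The collector is a long-running program, so it runs in CLI mode. Note that the CLI may load a different INI file from the web server; running php --ini shows which one is in use.
5. SSH connection errors. Network problems occasionally make the SSH connection or authentication fail; retrying after a delay is enough.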
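6. Log rotation and compression. Our operations team truncates and compresses the logs at a fixed time every day, so there are two kinds of file to read, compressed and uncompressed, and each needs its own handling.
7. Timestamps in the log. A timestamp in whole seconds is not enough to tell requests apart, so we added $msec to the log format, which gives millisecond resolution. Entries with the same millisecond, the same source IP and the same UA can be treated as one request.
8. Reading the directory. opendir and readdir work on a remote directory through the SFTP wrapper: opendir("ssh2.sftp://......"), then readdir on the returned handle. After filtering out unwanted files, sort by file creation time and process the files one by one.
9. Reading compressed files. file_get_contents leaves the console unresponsive for a long time, so I read step by step with fopen and fread. Each fread call returns at most 8 KB here (asking for more makes no difference), and after a set number of reads the program prints a progress line.
10. Caching compressed files. After a successful read the file is saved to a cache directory, both as a backup and for later use. If the program fails or is rerun, it checks the cache directory first and skips the network read when a cached file exists.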
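11. Decompression. gzdecode does the job, but it makes PHP's memory use jump sharply because the whole file is decompressed in memory, so raise the memory limit in php.ini if you use it. (The source below reads the cached file line by line with gzopen and gzgets instead.)
12. Recording finished compressed logs. Once a compressed file has been processed, record it in the database so that later runs of the PHP program do not process it again.
13. Uncompressed logs. An uncompressed log is still growing, so it is not cached. The database records the current file pointer (obtained with ftell, restored with fseek) and the file's creation date.
14. Detecting a changed uncompressed log. If the file's date differs from the recorded date, or the file is smaller than the recorded size, the file has been replaced and the file pointer must be reset. Otherwise, seek straight to the recorded position (fseek) and continue from where the last run stopped.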
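15. Splitting log lines. A regular expression that splits on spaces and delimiters is enough; the third-party logParser library is another option. To save memory, use a generator (yield) that returns the lines one at a time as an Iterator.
16. Deduplication. Before processing, load each server's last stored log entry (its millisecond timestamp, IP and UA) so that lines already imported can be skipped.
17. Saving logs. I store the logs in MySQL. Running one INSERT per log line wastes a great deal of time, so accumulate 4000 rows and insert them in one statement.
18. Error handling. Besides SSH connection failures, the program sometimes reads a half-written log line, which fails to parse; this also throws an exception. The main program catches it and simply runs again.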
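Points 13 and 14 come down to one small decision, so here is a minimal sketch of it. The function name resumeOffset and the array keys offset, size and date are hypothetical, chosen only for illustration; the source further down stores its own fields and compares the first line's timestamp rather than a file date.

```php
<?php
declare(strict_types=1);

/**
 * Decide where to resume reading a log file that is still growing.
 *
 * $saved is what the previous run stored (hypothetical keys):
 *   'offset' => byte position just after the last processed line
 *   'size'   => file size seen by the previous run
 *   'date'   => file date seen by the previous run
 */
function resumeOffset(array $saved, int $currentSize, string $currentDate): int
{
    // A different date means the log was rotated and a new file took its place.
    if ($currentDate !== $saved['date']) {
        return 0;
    }

    // A file smaller than last time was truncated or replaced.
    if ($currentSize < $saved['size']) {
        return 0;
    }

    // Same file, still growing: continue after the last processed byte.
    return $saved['offset'];
}

$saved = ['offset' => 1200, 'size' => 1500, 'date' => '2018-04-11'];

echo resumeOffset($saved, 1800, '2018-04-11'), "\n"; // same file, grown: 1200
echo resumeOffset($saved, 300, '2018-04-11'), "\n";  // truncated: 0
echo resumeOffset($saved, 1800, '2018-04-12'), "\n"; // rotated: 0
```

The returned value is what would be passed to fseek before reading; after each batch, the position reported by ftell is written back to the database together with the current size and date.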
The source code follows:
<?php
/**
 * Created by IcePHP Framework.
 * User: 蓝冰大侠
 * Date: 2018/4/11
 * Time: 15:09
 */

class MLogImport
{
    /**
     * Name of the site currently being processed
     * @var string
     */
    private $feed;

    /**
     * Host address of the site currently being processed
     * @var string
     */
    private $host;

    /**
     * Login name / site name of the current site
     * @var string
     */
    private $user;

    /**
     * Login password of the current site
     * @var string
     */
    private $pass;

    /**
     * Directory holding the logs of the current site
     * @var string
     */
    private $logPath;

    // Local directory for server logs; only compressed logs are backed up here
    const CACHE_PATH = DIR_ROOT . 'run/serverLog/';

    /**
     * SFTP resource of the current site
     * @var resource
     */
    private $sftp;

    /**
     * Name of the file currently being processed
     * @var string
     */
    private $file;

    /**
     * Last log row stored for this site
     * @var array
     */
    private $lastRow;

    /**
     * Byte offset in the current uncompressed file
     * @var int
     */
    private $offset;

    private $agentPatterns;
    private $agentReplaces;

    public function __construct()
    {
        // Patterns that fold User-Agent variants (version numbers, device models) together
        $maps = [
            '/AppleWebKit\/[\d\.]*/i' => 'AppleWebKit/...',
            '/Mobile\/[\d\w]*/i' => 'Mobile/...',
            '/Safari\/[\d\.]*/i' => 'Safari/...',
            '/CriOS\/[\d\.]*/i' => 'CriOS/...',
            '/GSA\/[\d\.]*/i' => 'GSA/...',
            '/Version\/[\d\.]*/i' => 'Version/...',
            '/Chrome\/[\d\.]*/i' => 'Chrome/...',
            '/Edge\/[\d\.]*/i' => 'Edge/...',
            '/Firefox\/[\d\.]*/i' => 'Firefox/...',
            '/SamsungBrowser\/[\d\.]*/i' => 'SamsungBrowser/...',
            '/build\/[\d\w\-\.]*/i' => 'build/...',
            '/Silk\/[\d\.]*/i' => 'Silk/...',
            '/Crosswalk\/[\d\.]*/i' => 'Crosswalk/...',
            '/Gecko\/[\d\.]*/i' => 'Gecko/...',
            '/NTENTBrowser\/[\.\d]*/i' => 'NTENTBrowser/...',
            '/Snapchat\/[\d\w\-\.]*/i' => 'Snapchat/...',
            '/Java\/[\d\.]*/i' => 'Java/...',
            '/UCBrowser\/[\d\.]*/i' => 'UCBrowser',
            '/\(Linux[^\)]*SAMSUNG[^\)]*\)/i' => 'SAMSUNG...',
            '/\([^\)]*IPAD[^\)]*\)/i' => 'IPAD...',
            '/\([^\)]*SM-[^\)]*\)/i' => 'SM...',
            '/LG\-[\w\d]*/i' => 'LG...',
            '/LGL\d[\w\d]*/i' => 'LGL...',
            '/itel it\d*/i' => 'itel...',
            '/XT\d*/i' => 'XT...',
            '/TECNO\-[\w\d]*/i' => 'TECNO...',
            '/RCT[\d\w]*/i' => 'RCT...',
            '/Micromax\s[\w\d]*/i' => 'Micromax...',
            '/LGMS[\d]*/i' => 'LGMS...',
            '/GT\-[\w\d]*/i' => 'GT...',
            '/HUAWEI\s[\-\w\d]*/i' => 'HUAWEI...',
            '/Lenovo\s[\-\w\d]*/i' => 'Lenovo...',
            '/SCH\-[\w\d]*/i' => 'SCH...',
            '/rv\:[\d\.]*/i' => 'rv:...',
            '/Lumia\s\d+/i' => 'Lumia...',
            '/Instagram\s[\d\.]*/i' => 'Instagram...',
            '/iPhone OS 5[_\d]*/i' => 'iOS 5...',
            '/iPhone OS 6[_\d]*/i' => 'iOS 6...',
            '/iPhone OS 7[_\d]*/i' => 'iOS 7...',
            '/iPhone OS 8[_\d]*/i' => 'iOS 8...',
            '/iPhone OS 9[_\d]*/i' => 'iOS 9...',
            '/iPhone OS 10[_\d]*/i' => 'iOS 10...',
            '/iPhone OS 11[_\d]*/i' => 'iOS 11...',
            '/iOS 5[_\d]*/i' => 'iOS 5...',
            '/iOS 6[_\d]*/i' => 'iOS 6...',
            '/iOS 7[_\d]*/i' => 'iOS 7...',
            '/iOS 8[_\d]*/i' => 'iOS 8...',
            '/iOS 9[_\d]*/i' => 'iOS 9...',
            '/iOS 10[_\d]*/i' => 'iOS 10...',
            '/iOS 11[_\d]*/i' => 'iOS 11...',
            '/Android 2[\.\d]*/i' => 'Android 2...',
            '/Android 3[\.\d]*/i' => 'Android 3...',
            '/Android 4[\.\d]*/i' => 'Android 4...',
            '/Android 5[\.\d]*/i' => 'Android 5...',
            '/Android 6[\.\d]*/i' => 'Android 6...',
            '/Android 7[\.\d]*/i' => 'Android 7...',
            '/Android 8[\.\d]*/i' => 'Android 8...',
            '/QuantcastSDK[^\s]*(\s\(\d+\))?/i' => 'QuantcastSDK...',
        ];
        $this->agentPatterns = array_keys($maps);
        $this->agentReplaces = array_values($maps);
    }

    /**
     * Record a site's account, password and log path, then import its logs
     * @param string $host host address
     * @param string $user site name / login name
     * @param string $pass login password
     * @param string $logPath path of the log directory
     */
    public function site(string $host, string $user, string $pass, string $logPath): void
    {
        $this->feed = $user;
        $this->host = $host;
        $this->user = $user;
        $this->pass = $pass;
        $this->logPath = $logPath;

        // Delay between reconnection attempts, in seconds
        $interval = 1;
        $connect = null;
        while (true) {
            // Connect to the host
            $connect = ssh2_connect($this->host, 22);

            // Connected and password authentication succeeded
            if (false !== $connect and false !== ssh2_auth_password($connect, $this->user, $this->pass)) {
                break;
            }

            // Double the delay each time: 2, 4, 8, ... seconds
            $interval *= 2;
            echo "connect or auth failed at $this->host, retry after $interval seconds\r\n";

            // Wait, then reconnect
            sleep($interval);
        }

        // Logged in
        echo "\r\nlogin $this->feed\r\n";

        // Read the file list
        $this->sftp = ssh2_sftp($connect);
        if (!$this->sftp) {
            throw new Exception('ssh2_sftp fail.');
        }
        // Resolves to something like ssh2.sftp://Resource #33/home/.....
        $handle = opendir("ssh2.sftp://{$this->sftp}{$this->logPath}");
        if (!$handle) {
            throw new Exception('open dir ssh2.sftp fail.');
        }

        $zippedFiles = [];
        $unzippedFile = '';
        while (false !== ($file = readdir($handle))) {
            $filePath = "ssh2.sftp://{$this->sftp}{$this->logPath}/$file";

            // Files only, skip directories
            if (!is_file($filePath)) continue;

            // Access logs only
            if (left($file, 10) !== 'access.log') continue;

            if (substr($file, -3) === '.gz') {
                // Compressed file: skip anything before 2018-04-05 (the log format changed that day)
                if (substr($file, 11, 8) < '20180405') continue;
                $zippedFiles[] = $file;
            } else {
                $unzippedFile = $file;
            }
        }
        closedir($handle);

        // Last stored log row for this site
        $this->lastRow = table('log')->row('*', ['feedName' => $this->feed], 'id desc')->toArray();

        // Sort in ascending order of creation time (the date is part of the file name)
        asort($zippedFiles);

        // Process the compressed files one by one
        foreach ($zippedFiles as $file) {
            $this->file = $file;
            $this->zipped();
        }

        // Then the uncompressed log, if there is one
        if ($unzippedFile) {
            $this->file = $unzippedFile;
            $this->unzipped();
        }
    }

    /**
     * Read a remote uncompressed file line by line
     * @param string $indicator remote file URL
     * @param int $size file size
     * @return Iterator generator over the lines
     */
    private function readUnzipped(string $indicator, int $size): Iterator
    {
        echo "Begin read File:$this->file:" . STool::kmgt($size) . "\r\n";

        // Open the file and seek to where the last run stopped
        $f = fopen($indicator, 'r');
        if (!$f) {
            return;
        }
        if ($this->offset) {
            fseek($f, $this->offset);
            echo "Seek to $this->offset\r\n";
        }

        // Total number of lines read
        $lines = 0;

        // Read line by line
        while (!feof($f)) {
            $line = fgets($f);

            // fgets returns false at end of file
            if ($line === false) break;
            $lines++;

            // Update the offset
            $this->offset = ftell($f);

            // Hand the line to the caller
            yield $line;

            // Print progress every 500 lines
            if ($lines % 500 == 0) {
                echo "read $this->feed $this->file Lines:$lines\r\n";
            }
        }
        fclose($f);
        echo "read $this->feed $this->file Lines:$lines\r\n";
        echo "End.\r\n";
    }

    /**
     * Fetch a remote compressed file into the local cache
     * @return string path of the cache file
     */
    private function readZipped(): string
    {
        // Build the remote file URL
        $indicator = "ssh2.sftp://$this->sftp$this->logPath/$this->file";

        // File size
        $fileSize = filesize($indicator);
        $size = STool::kmgt($fileSize);

        // Use the cache file if it exists and has the same size
        $cacheFile = self::CACHE_PATH . $this->feed . '/' . $this->file;
        if (is_file($cacheFile) and filesize($cacheFile) == $fileSize) {
            echo "Read Zipped File From Cache:" . $this->file . ' ' . $size . "\r\n";
            return $cacheFile;
        }

        // Read the file from the server
        echo "Begin read File:{$this->file}:" . $size . "\r\n";
        $fileHandle = fopen($indicator, 'rb');
        if (!$fileHandle) {
            dump($indicator, 'OPEN FAIL');
            exit;
        }

        // Read the remote file content
        $content = '';
        $i = 0;
        while (!feof($fileHandle)) {
            // 64 KB is requested, but each call returns at most 8 KB
            $content .= fread($fileHandle, 65536);

            // Print progress every 16 reads (about 128 KB)
            $i++;
            if ($i % 16 == 0) {
                echo "$this->feed $this->file Reading :" . STool::kmgt(strlen($content)) . "/$size\r\n";
            }
        }
        fclose($fileHandle);

        // Save to the cache file
        echo "Save to cache:" . $cacheFile . " \r\n";
        makeDir(dirname($cacheFile));
        file_put_contents($cacheFile, $content);

        // Return the path of the cache file
        return $cacheFile;
    }

    /**
     * Split a string into lines
     * @param string $content
     * @return Iterator
     */
    public function explode(string $content): Iterator
    {
        $size = strlen($content);
        $pointer = 0;
        while ($pointer < $size) {
            $next = strpos($content, "\n", $pointer);
            if ($next === false) {
                $line = substr($content, $pointer);
                $next = $size;
            } else {
                $line = substr($content, $pointer, $next - $pointer);
            }
            yield $line;
            $pointer = $next + 1;
        }
    }

    // Whether the URL is a search request that should be imported
    private function valid(string $url): bool
    {
        return false !== strpos($url, '/?s=')
            or false !== strpos($url, '/?ss=')
            or preg_match('/^\/.*\/.*\/$/i', $url);
    }

    /**
     * Process one compressed log file
     */
    private function zipped(): void
    {
        // Skip files that have already been processed
        $fileTable = table('zipped');
        if ($fileTable->exist(['feedName' => $this->feed, 'fileName' => $this->file])) return;

        // Open the cached file for line-by-line decompression
        $gz = gzopen($this->readZipped(), 'r');
        if (!$gz) {
            throw new Exception('gzopen fail.');
        }
        echo "\r\nBegin Process File\r\n";
        //$memTable = $this->createTemporaryTable(uniqid('tmp_'));

        // Log table to insert into
        $logTable = table('log');

        // Buffer of rows waiting to be inserted
        $rows = [];
        $insertRowsCount = 0;
        $key = 0;
        while (!gzeof($gz)) {
            $line = gzgets($gz);
            if ($line === false) break;
            if ((++$key) % 30000 == 0) {
                echo "Analysis LINES:$key\r\n";
            }

            // Skip empty lines
            $line = trim($line);
            if (!$line) continue;

            // Split the line into fields
            $parts = $this->explodeLine($line);
            if (!$parts) continue;

            // Keep search requests only
            if (!$this->valid($parts['url'])) continue;

            // Skip lines that were imported before
            if ($this->lastRow) {
                if ($parts['timestamp'] < $this->lastRow['timestamp']) continue;
                if ($parts['timestamp'] == $this->lastRow['timestamp'] and $parts['url'] == $this->lastRow['url'] and $parts['ip'] == $this->lastRow['ip']) {
                    continue;
                }
            }

            // Add to the buffer
            $parts['feedName'] = $this->feed;
            $rows[] = $parts;

            // Insert every 4000 rows; more than that produces too many placeholders
            if (count($rows) >= 4000) {
                $logTable->inserts($rows);
                $insertRowsCount += count($rows);
                SDebug::clearMsgs();
                $rows = [];
            }
        }
        gzclose($gz);

        // Insert the remaining rows
        if (count($rows)) {
            $logTable->inserts($rows);
            $insertRowsCount += count($rows);
            SDebug::clearMsgs();
        }
        echo "insert LINES:$insertRowsCount\r\n";

        // Mark this file as processed
        //$fileTable->begin();
        //$this->move($memTable);
        $fileTable->insert(['feedName' => $this->feed, 'fileName' => $this->file]);
        //$fileTable->commit();
    }

    /**
     * Move the logs from the temporary table into the real table
     * @param STable $memTable temporary table object
     */
    private function move(STable $memTable)
    {
        $fields = ['feedName', 'accessTime', 'timestamp', 'ip', 'requestTime', 'responseTime', 'method', 'url', 'code', 'length', 'referrer', 'agentId', 'created', 'updated', 'forward'];
        $fieldsStr = implode(',', $fields);
        $memTable->execute("Insert" . " Into log($fieldsStr) select $fieldsStr from " . $memTable->name());
        $memTable->deleteAll();
    }

    /**
     * Process one uncompressed log file
     */
    private function unzipped(): void
    {
        // Look up the state left by the previous run
        $fileTable = table('unzipped');

        // Create an initial record if there is none
        if ($fileTable->notExist(['feedName' => $this->feed])) {
            $fileTable->insert(['feedName' => $this->feed, 'offset' => 0, 'size' => 0, 'timestamp' => 0]);
        }

        // Fetch the state: offset (last file pointer), size (last file size), timestamp (last log time)
        $info = $fileTable->row('*', ['feedName' => $this->feed]);

        // Build the remote file URL
        $indicator = "ssh2.sftp://$this->sftp$this->logPath/$this->file";

        // File size
        $fileSize = filesize($indicator);

        if ($fileSize < $info['size']) {
            // The file shrank, so it is a new file
            $this->offset = 0;
        } else {
            // Read the first line
            $f = fopen($indicator, 'r');
            $firstLine = fgets($f);
            fclose($f);
            $first = $this->explodeLine($firstLine);
            $timestamp = $first['timestamp'];

            // A first line newer than the last imported time also means a new file
            if ($timestamp > $info['timestamp']) {
                $this->offset = 0;
            } else {
                $this->offset = $info['offset'];
            }
        }
        echo "\r\nBegin Process File\r\n";

        // Log table to insert into
        $logTable = table('log');

        // Buffer of rows waiting to be inserted
        $rows = [];
        $insertedRowsCount = 0;
        $iterator = $this->readUnzipped($indicator, $fileSize);
        $lastTime = 0;
        foreach ($iterator as $key => $line) {
            // Skip empty lines
            $line = trim($line);
            if (!$line) continue;

            // Split the line into fields
            $parts = $this->explodeLine($line);
            if (!$parts) continue;

            // Keep search requests only
            if (!$this->valid($parts['url'])) continue;

            // Skip lines that were imported before
            if ($this->lastRow and (floatval($parts['timestamp']) < floatval($this->lastRow['timestamp']))) continue;

            $rows[] = array_merge($parts, [
                'feedName' => $this->feed
            ]);

            // Latest timestamp seen so far
            $lastTime = $parts['timestamp'];

            // Batch insert, saving the resume state with each batch
            if (count($rows) >= 100) {
                $insertedRowsCount += count($rows);
                $logTable->inserts($rows);
                $fileTable->update(['size' => $fileSize, 'offset' => $this->offset, 'timestamp' => $lastTime], ['feedName' => $this->feed]);
                echo "Insert LINES:$insertedRowsCount\r\n";
                SDebug::clearMsgs();
                $rows = [];
            }
        }

        // Insert the remaining rows
        if (count($rows)) {
            $insertedRowsCount += count($rows);
            $logTable->inserts($rows);
            $fileTable->update(['size' => $fileSize, 'offset' => $this->offset, 'timestamp' => $lastTime], ['feedName' => $this->feed]);
            echo "Insert LINES:$insertedRowsCount\r\n";
            SDebug::clearMsgs();
        }
    }

    /**
     * Split one log line into fields
     * @param string $line
     * @return array
     * @throws Exception when the line does not match
     */
    private function explodeLine(string $line): array
    {
        // Sample line (the referrer's very long query string is shortened to ... here):
        //[08/Apr/2018:03:30:17 +0800] 1523129417.075 72.178.128.43 - 0.114 - "GET /index.php/blog/search/?s=lowering%20ldl%20cholesterol&subid=tgr_zhen_BFX0J1ILON6N__rmlwuf_73004751 HTTP/1.1" 499 0 "http://168634854.keywordblocks.com/Cholesterol_Hdl_Ldl_Ratio.cfm?..." "Mozilla/5.0 (iPhone; CPU iPhone OS 11_2_6 like Mac OS X) AppleWebKit/604.5.6 (KHTML, like Gecko) Version/11.0 Mobile/15D100 Safari/604.1" "-"
        $ns = '([^\s]*)';
        $str = '"([^"]*)"';
        $datetime = '\[([^\]]*)\]';

        // Match with a regular expression
        $matched = preg_match("/$datetime $ns $ns \- $ns $ns $str $ns $ns $str $str $str/i", $line, $matches);
        if (!$matched) {
            throw new Exception('NOT MATCH');
        }

        // The request is "METHOD URL PROTOCOL", separated by spaces
        list($mode, $url, $protocol) = explode(' ', $matches[6]);

        return [
            'accessTime' => datetime(strtotime($matches[1])), // access time (seconds)
            'timestamp' => floatval($matches[2]),             // access timestamp (with milliseconds)
            'ip' => $matches[3],                              // client IP
            'requestTime' => floatval($matches[4]),           // time Nginx spent processing the request
            'responseTime' => floatval($matches[5]),          // time Nginx took to complete the whole response
            'method' => $mode,                                // GET/POST/...
            'url' => $url,                                    // request URL
            'code' => $matches[7],                            // response status code
            'length' => intval($matches[8]),                  // response body length
            'referrer' => left($matches[9], 250),             // referrer
            'agentId' => $this->getAgentId($matches[10]),     // user agent
            'forward' => $matches[11]                         // real IP
        ];
    }

    // Load the Agent => ID map
    private function getAgentMap()
    {
        $rows = table('agent')->select('id,agent', null, 'agent')->toArray();
        return array_column($rows, 'id', 'agent');
    }

    // Return the ID for an Agent, creating the mapping if it does not exist yet
    private function getAgentId($agent)
    {
        // Empty UA
        if (!$agent) {
            return 0;
        }

        // Static in-memory cache
        static $maps;
        if (!$maps) {
            $maps = $this->getAgentMap();
        }

        // Collapse [FBAN/FBIOS;...]
        $sub = mid($agent, '[', ']');
        if ($sub) {
            $agent = str_replace('[' . $sub . ']', '[...]', $agent);
        }
        $agent = str_replace(' (KHTML, like Gecko)', '', $agent);

        // Fold variants together
        $agent = preg_replace($this->agentPatterns, $this->agentReplaces, $agent);

        // Truncate the Agent to 191 characters
        $agent = left($agent, 191);
        if (!isset($maps[$agent])) {
            $id = table('agent')->insertIgnore(['agent' => $agent]);
            $maps[$agent] = $id;
        }
        return $maps[$agent];
    }
}
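
readZipped() above collects the whole remote file in $content before writing the cache file. As a possible variation, here is a minimal sketch that writes each chunk to the cache file as soon as it is read, so memory use stays flat whatever the file size. The function name copyInChunks and the progress callback are hypothetical, and a php://temp stream stands in for the ssh2.sftp:// stream so the sketch runs without a server.

```php
<?php
declare(strict_types=1);

/**
 * Copy a readable stream to a file in fixed-size chunks.
 *
 * @param resource      $source     any readable stream
 * @param string        $targetPath file to write
 * @param int           $chunkSize  bytes requested per fread call
 * @param callable|null $onProgress called with the running total every 16 reads
 * @return int number of bytes written
 */
function copyInChunks($source, string $targetPath, int $chunkSize = 8192, ?callable $onProgress = null): int
{
    $target = fopen($targetPath, 'wb');
    if ($target === false) {
        throw new RuntimeException("cannot open $targetPath for writing");
    }

    $total = 0;
    $reads = 0;
    while (!feof($source)) {
        $chunk = fread($source, $chunkSize);
        if ($chunk === false) {
            fclose($target);
            throw new RuntimeException('read failed');
        }

        $written = fwrite($target, $chunk);
        if ($written === false) {
            fclose($target);
            throw new RuntimeException('write failed');
        }
        $total += $written;

        // Report progress every 16 reads (128 KB with 8 KB chunks)
        if ($onProgress !== null && ++$reads % 16 === 0) {
            $onProgress($total);
        }
    }
    fclose($target);

    return $total;
}

// Stand-in for the remote file: 300000 bytes held in a temporary stream
$src = fopen('php://temp', 'w+b');
fwrite($src, str_repeat('x', 300000));
rewind($src);

$cache = tempnam(sys_get_temp_dir(), 'log');
$bytes = copyInChunks($src, $cache, 8192, function (int $done): void {
    echo "read $done bytes\n";
});
fclose($src);

clearstatcache(true, $cache);
$cachedSize = filesize($cache);
unlink($cache);

echo "copied $bytes bytes, cache file holds $cachedSize bytes\n";
```

If no progress output is needed, PHP's built-in stream_copy_to_stream does the same copy in a single call.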