Varnish: a high-performance accelerator

第九系艾文

Published 2022-08-19 12:09:44

Article link: https://blog.csdn.net/ly1358152944/article/details/126421617

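Web Page Cache:

Program execution exhibits locality:

Temporal locality

Spatial locality

cache: hit

Hot spots: a consequence of locality;

Freshness:

Cache space exhausted: evict by LRU

Expired: the entry is cleaned out of the cache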
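The LRU policy named above can be sketched in a few lines. This is an illustrative Python model only, not Varnish's implementation; the class name `LRUCache` and the idea of counting capacity in entries rather than bytes are assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None  # miss
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("/a", "page a")
cache.put("/b", "page b")
cache.get("/a")            # "/a" becomes the most recently used entry
cache.put("/c", "page c")  # the cache is full, so "/b" is evicted
```

Reading "/a" before inserting "/c" is what saves it from eviction; without that read, "/a" would have been the oldest entry.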

Cache hit ratio: hit/(hit+miss)

The value lies in (0,1)

Page hit ratio: measured by the number of pages

Byte hit ratio: measured by the size of the pages
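The two ratios can differ a lot for the same traffic. The sketch below computes both from a list of requests; the function name `hit_ratios` and the sample data are hypothetical and only serve to show the arithmetic.

```python
def hit_ratios(requests):
    """requests: iterable of (was_hit, size_in_bytes) tuples."""
    requests = list(requests)
    total_pages = len(requests)
    total_bytes = sum(size for _, size in requests)
    hit_pages = sum(1 for hit, _ in requests if hit)
    hit_bytes = sum(size for hit, size in requests if hit)
    return hit_pages / total_pages, hit_bytes / total_bytes


# Hypothetical traffic sample: three small hits and one large miss.
sample = [(True, 1000), (True, 1000), (True, 2000), (False, 12000)]
page_ratio, byte_ratio = hit_ratios(sample)
```

Here three of four pages are hits, but the single miss carries most of the bytes, so the byte hit ratio is much lower than the page hit ratio.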

Whether to cache:

Private data: private; may be stored only in a private cache;

Public data: public; may be stored in a public or a private cache;

Cache-related Header Fields

The most important caching header fields are:

Expires: the expiry time;

Expires:Thu, 22 Oct 2026 06:34:30 GMT

Cache-Control

Etag

Last-Modified

If-Modified-Since

If-None-Match

Vary

Age

Mechanisms for judging whether a cached copy is still valid:

Expiry time:

HTTP/1.0

Expires

HTTP/1.1

Cache-Control: max-age=

Cache-Control: s-maxage=
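As a rough illustration of how these expiry headers combine, the sketch below derives a freshness lifetime in the order a shared cache would normally apply: s-maxage first, then max-age, then Expires minus Date. The helper names `parse_cache_control` and `freshness_lifetime` are made up for this example, and header lookup is simplified to exact-case dictionary keys.

```python
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Split a Cache-Control value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"')
    return directives


def freshness_lifetime(headers, shared_cache=True):
    """Return the freshness lifetime in seconds, or None if none is given."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if shared_cache and "s-maxage" in cc:
        return int(cc["s-maxage"])  # applies to shared caches only
    if "max-age" in cc:
        return int(cc["max-age"])
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return int((expires - date).total_seconds())
    return None


both = {"Cache-Control": "max-age=60, s-maxage=600"}
legacy = {
    "Date": "Thu, 13 Aug 2026 01:05:12 GMT",
    "Expires": "Thu, 13 Aug 2026 02:05:12 GMT",
}
```

A private cache would call `freshness_lifetime(both, shared_cache=False)` and ignore s-maxage; the `legacy` headers show the HTTP/1.0 fallback.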

Conditional requests:

Last-Modified/If-Modified-Since

Etag/If-None-Match

Expires:Thu, 13 Aug 2026 02:05:12 GMT
Cache-Control:max-age=315360000
ETag:"1ec5-502264e2ae4c0"
Last-Modified:Wed, 03 Sep 2014 10:00:27 GMT
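Using the ETag and Last-Modified values from the headers above, a conditional request can be modeled as follows. This is a simplified sketch: the function name `revalidate` is hypothetical, ETags are compared as plain strings (no weak-validator handling), and If-None-Match is checked before If-Modified-Since.

```python
from email.utils import parsedate_to_datetime


def revalidate(request_headers, etag, last_modified):
    """Return 304 if the client's copy is still valid, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in candidates or "*" in candidates else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        modified = parsedate_to_datetime(last_modified)
        return 304 if modified <= since else 200
    return 200  # unconditional request: send the full response


ETAG = '"1ec5-502264e2ae4c0"'
LAST_MODIFIED = "Wed, 03 Sep 2014 10:00:27 GMT"
```

A 304 answer carries no body, which is why validation is much cheaper than refetching the object.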

cache-request-directive =

"no-cache"                          # a stored copy may not be used directly; it must be revalidated first
"no-store"                          # do not store the response
"max-age" "=" delta-seconds        
"max-stale" [ "=" delta-seconds ]  
"min-fresh" "=" delta-seconds      
"no-transform"                    
"only-if-cached"                  
cache-extension

cache-response-directive =

"public"
"private" [ "=" <"> 1#field-name <"> ]        # private caches only
"no-cache" [ "=" <"> 1#field-name <"> ]
"no-store"
"no-transform"
"must-revalidate"
"proxy-revalidate"
"max-age" "=" delta-seconds
"s-maxage" "=" delta-seconds         # shared (public) caches
cache-extension

Program architecture:

Manager process

Cacher process, which contains several types of threads:

accept, worker, expiry, ...

shared memory log:

Statistics: counters;

Log area: log records;

varnishlog, varnishncsa, varnishstat...

Configuration interface: VCL

Varnish Configuration Language,

VCL compiler --> C compiler --> shared object

Varnish program environment:

/etc/varnish/varnish.params: configures how the varnish service process works, e.g. the listening address and port and the cache storage mechanism;

/etc/varnish/default.vcl: configures the behavior of the Child/Cache threads;

Main program:

/usr/sbin/varnishd

CLI interface:

/usr/bin/varnishadm

Shared Memory Log tools:

/usr/bin/varnishhist

/usr/bin/varnishlog

/usr/bin/varnishncsa

/usr/bin/varnishstat

/usr/bin/varnishtop

Test tool:

/usr/bin/varnishtest

VCL reload program:

/usr/sbin/varnish_reload_vcl

Systemd Unit File:

/usr/lib/systemd/system/varnish.service

The varnish service;

/usr/lib/systemd/system/varnishlog.service

/usr/lib/systemd/system/varnishncsa.service

Services that persist the logs to disk;

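Varnish cache storage mechanisms (Storage Types):

· malloc[,size]

In-memory storage; [,size] sets the size; all cached objects are lost on restart;

· file[,path[,size[,granularity]]]

File storage, opaque (a black box); all cached objects are lost on restart;

· persistent,path,size

File storage, opaque (a black box); cached objects survive a restart; experimental;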

man varnishd

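varnishd options:

Program options: set in /etc/varnish/varnish.params

-a address[:port][,address[:port][...]]: the listening address; the default port is 6081;

-T address[:port]: the management address; the default port is 6082;

-s [name=]type[,options]: defines the cache storage mechanism;

-u user

-g group

-f config: the VCL configuration file;

-F: run in the foreground; used when debugging;

...

Runtime parameters: the DAEMON_OPTS variable in /etc/varnish/varnish.params

DAEMON_OPTS="-p thread_pool_min=5 -p thread_pool_max=500 -p thread_pool_timeout=300"

-p param=value: sets a runtime parameter and its value; may be repeated;

-r param[,param...]: makes the listed parameters read-only;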

vim /usr/lib/systemd/system/varnish.service

vim varnish.params

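VARNISH_LISTEN_PORT=6081    # local listening port; usually set to 80
VARNISH_ADMIN_LISTEN_ADDRESS=127.0.0.1  # management address
VARNISH_ADMIN_LISTEN_PORT=6082              # management port
VARNISH_SECRET_FILE=/etc/varnish/secret         # shared secret file for the management service
VARNISH_STORAGE="malloc,1G"        # use memory as the cache; cannot be changed at runtime; memory fragments over time, so file storage on an SSD is the usual choice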

vim default.vcl

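After writing the backend host IP, do not restart the service; reload the configuration and the site is reachable right away:

varnish_reload_vcl

Reloading the VCL configuration file:

varnish_reload_vcl
varnishadm -S /etc/varnish/secret -T [ADDRESS:]PORT
varnishadm -S /etc/varnish/secret -T 127.0.0.1:6082  # the IP may be omitted, but :6082 may not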

help [<command>]
ping [<timestamp>]
auth <response>
quit
banner
status
start
stop
vcl.load <configname> <filename>
vcl.inline <configname> <quoted_VCLstring>
vcl.use <configname>         # activate one of the loaded configurations
vcl.discard <configname>     # discard a configuration
vcl.list                     # list the configurations that compiled successfully
param.show [-l] [<param>]
param.set <param> <value>
panic.show
panic.clear
storage.list
vcl.show [-v] <configname>
backend.list [<backend_expression>]
backend.set_health <backend_expression> <state>
ban <field> <operator> <arg> [&& <field> <oper> <arg>]...
ban.list

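Configuration commands:

vcl.list

vcl.load: load and compile;

vcl.use: activate;

vcl.discard: delete;

vcl.show [-v] <configname>: show the details of the given configuration;

Runtime parameters:

param.show -l: show the parameter list;

param.show <PARAM>

param.set <PARAM> <VALUE>

Cache storage:

storage.list

Backend servers:

backend.list

To use several backends, the load-balancing module (directors) is needed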

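VCL:

A domain-specific configuration language;

State engines:

VCL has several state engines. The states are related but isolated from one another; each engine uses return(x) to name the engine that runs next;

vcl_hash --> return(lookup) --> vcl_hit (when the lookup is a hit)

Request processing flow:

(1) Receive the request: vcl_recv; decide whether it is cacheable;

(a) Cacheable: vcl_hash

(i) Hit: vcl_hit

(ii) Miss: vcl_miss --> vcl_fetch (the Varnish 3 name; Varnish 4 splits it into vcl_backend_fetch and vcl_backend_response)

(b) Not cacheable: vcl_fetch

(2) Respond: vcl_deliver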

State engine transitions:

request: vcl_recv

response: vcl_deliver

(1) vcl_hash -(hit)-> vcl_hit --> vcl_deliver

(2) vcl_hash -(miss)-> vcl_miss --> vcl_backend_fetch --> vcl_backend_response --> vcl_deliver

(3) vcl_hash -(purge)-> vcl_purge --> vcl_synth # cache purge

(4) vcl_hash -(pipe)-> vcl_pipe
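The four flows above can be read as a small transition table. The Python sketch below encodes them and walks a request through the engines; the edge labels (`hash`, `fetch`, `deliver`, ...) are simplified names chosen for this example and are not meant to be the exact return() keywords of any Varnish version.

```python
# Transition table built from the flows listed above.
NEXT = {
    "vcl_recv": {"hash": "vcl_hash"},
    "vcl_hash": {
        "hit": "vcl_hit",
        "miss": "vcl_miss",
        "purge": "vcl_purge",
        "pipe": "vcl_pipe",
    },
    "vcl_hit": {"deliver": "vcl_deliver"},
    "vcl_miss": {"fetch": "vcl_backend_fetch"},
    "vcl_backend_fetch": {"fetch": "vcl_backend_response"},
    "vcl_backend_response": {"deliver": "vcl_deliver"},
    "vcl_purge": {"synth": "vcl_synth"},
}


def walk(actions, start="vcl_recv"):
    """Follow a sequence of actions and return the engines visited."""
    path = [start]
    for action in actions:
        path.append(NEXT[path[-1]][action])
    return path


hit_path = walk(["hash", "hit", "deliver"])
miss_path = walk(["hash", "miss", "fetch", "fetch", "deliver"])
purge_path = walk(["hash", "purge", "synth"])
```

An action that is not defined for the current engine raises `KeyError`, which mirrors the rule that each engine only accepts a fixed set of next steps.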

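Two special engines:

vcl_init: VCL code that runs before any request is processed; mainly used to initialize VMODs;

vcl_fini: called when the VCL configuration is discarded, after all requests have finished; mainly used to clean up VMODs;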

VCL syntax:

(1) VCL files start with vcl 4.0; # required from version 4.0 on

(2) //, # and /* foo */ for comments;

(3) Subroutines are declared with the sub keyword, e.g. sub vcl_recv { ... };

(4) No loops, state-limited variables (built-in variables are restricted to particular engines); # conditionals are supported

(5) Terminating statements with a keyword for next action as argument of the return() function, i.e.: return(action); # ends the current state engine and hands over to the next one

(6) Domain-specific; # a configuration written for one domain applies only to that domain

The VCL Finite State Machine

(1) Each request is processed separately;

(2) Each request is independent from others at any given time;

(3) States are related, but isolated;

(4) return(action); exits one state and instructs Varnish to proceed to the next state;

(5) Built-in VCL code is always present and appended below your own VCL;

Three main syntax forms:

sub subroutine {
    ...
}

if CONDITION {
    ...
} else {
    ...
}

return(), hash_data()

VCL Built-in Functions and Keywords

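Functions:

regsub(str, regex, sub) # replaces only the first match

regsuball(str, regex, sub) # replaces every match

ban(boolean expression) # invalidates the matching objects in the cache

hash_data(input) # adds input to the hash computation

synthetic(str)

Keywords:

call subroutine, return(action), new, set, unset

Operators:

==, !=, ~, >, >=, <, <=

Logical operators: &&, ||, !

Assignment: =

Example: obj.hits

sub vcl_deliver {
    # Tell the client whether the object was served from cache.
    if (obj.hits>0) {
        set resp.http.X-Cache = "HIT via " + server.ip;
    } else {
        set resp.http.X-Cache = "MISS via " + server.ip;
    }
}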

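Variable types:

Built-in variables:

req.*: request; relates to the request message sent by the client;

req.http.*

req.http.User-Agent, req.http.Referer, ...

bereq.*: relates to the HTTP request varnish sends to the backend (BE) host;

bereq.http.*

beresp.*: relates to the response the BE host returns to varnish;

beresp.http.*

resp.*: relates to the response varnish sends to the client;

obj.*: attributes of the object stored in the cache; read-only;

Commonly used variables:

bereq.*:

bereq.http.HEADERS: a given header of the request; custom headers may be defined

bereq.method: the request method (bereq.request in Varnish 3);

bereq.url: the requested URL;

bereq.proto: the protocol version of the request;

bereq.backend: the backend host to use;

req.*:

req.http.Cookie: the value of the Cookie header in the client request;

req.http.User-Agent ~ "chrome" # match on the browser's User-Agent value

beresp.*, resp.*:

beresp.http.HEADERS

beresp.status: the response status code;

beresp.proto: the protocol version;

beresp.backend.name: the name of the BE host;

beresp.ttl: the remaining time for which the BE response may be cached;

obj.*

obj.hits: the number of times this object has been served from the cache;

obj.ttl: the object's ttl

server.*

server.ip

server.hostname

client.*

client.ip

User-defined: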

set

unset

Example 1: force requests for certain resources to bypass the cache:

sub vcl_recv {
    if (req.url ~ "(?i)^/(login|admin)") {
        return(pass);
    }
}
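To see which URLs the pattern in Example 1 catches, the same regular expression can be tried in Python. This is only a check of the pattern; the function name `recv_action` is hypothetical, and returning "hash" for everything else is a simplification of what the built-in vcl_recv does.

```python
import re

# Same pattern as in the VCL example; (?i) makes it case-insensitive.
BYPASS = re.compile(r"(?i)^/(login|admin)")


def recv_action(url):
    """Return 'pass' for URLs that must skip the cache, else 'hash'."""
    return "pass" if BYPASS.search(url) else "hash"


actions = {url: recv_action(url) for url in
           ["/login", "/Admin/users", "/index.html", "/blog/login"]}
```

Because the pattern is anchored with `^`, "/blog/login" is not matched and stays cacheable.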

[Screenshots: the response before and after loading the configuration]

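Example 2: for certain resource types, such as public images, remove the private marker (Set-Cookie) and force the time for which varnish may cache them;

sub vcl_backend_response {
    # Only override the TTL when the backend did not set s-maxage itself.
    if (beresp.http.cache-control !~ "s-maxage") {
        if (bereq.url ~ "(?i)\.(jpg|jpeg|png|gif|css|js)$") {
            unset beresp.http.Set-Cookie;
            set beresp.ttl = 3600s;
        }
    }
}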

Trimming cached objects: purge, ban

(1) Enable the purge operation

sub vcl_purge {
    return (synth(200,"Purged"));
}

(2) When to perform a purge

sub vcl_recv {
    if (req.method == "PURGE") {
        return(purge);
    }
    ...
}

Add access control rules for such requests:

acl purgers {
    "127.0.0.0"/8;
    "10.1.0.0"/16;
}
sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purgers) {
            return(synth(405,"Purging not allowed for " + client.ip));
        }
        return(purge);
    }
    ...
}
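The access check performed by the ACL above can be mimicked with Python's `ipaddress` module. This is a model of the decision only, using the same two networks; the function name `handle_purge` and its return values are assumptions made for the example.

```python
import ipaddress

# Same networks as the "purgers" ACL above.
PURGERS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("10.1.0.0/16"),
]


def handle_purge(method, client_ip):
    """Return 'purge', a (status, reason) tuple, or None for other methods."""
    if method != "PURGE":
        return None  # not a purge request; normal processing continues
    address = ipaddress.ip_address(client_ip)
    if not any(address in network for network in PURGERS):
        return (405, "Purging not allowed for " + client_ip)
    return "purge"


allowed = handle_purge("PURGE", "10.1.2.3")
denied = handle_purge("PURGE", "10.2.0.1")
```

Clients outside both networks get the same 405 message as in the VCL, so a PURGE from an arbitrary address cannot empty the cache.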

How to configure multiple backend hosts:

backend default {
    .host = "172.16.100.6";
    .port = "80";
}

backend appsrv {
    .host = "172.16.100.7";
    .port = "80";
}

sub vcl_recv {
    if (req.url ~ "(?i)\.php$") {
        set req.backend_hint = appsrv;
    } else {
        set req.backend_hint = default;
    }
    ...
}

Director:

A varnish module (VMOD);

It must be imported before use:

import directors;

Example:

import directors;    # load the directors
backend server1 {
    .host = 
    .port = 
}
backend server2 {
    .host = 
    .port = 
}
sub vcl_init {
    new GROUP_NAME = directors.round_robin();
    GROUP_NAME.add_backend(server1);
    GROUP_NAME.add_backend(server2);
}
sub vcl_recv {
    # send all traffic to the GROUP_NAME director:
    set req.backend_hint = GROUP_NAME.backend();
}
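What a round-robin director does with its backends can be pictured with a short Python model. The class below is not the directors VMOD; `RoundRobinDirector` is a hypothetical name, and only the two method names `add_backend` and `backend` are borrowed from the VCL above.

```python
from itertools import cycle


class RoundRobinDirector:
    """Hands out backends one after another, wrapping around at the end."""

    def __init__(self):
        self._backends = []
        self._cycle = None

    def add_backend(self, name):
        self._backends.append(name)
        self._cycle = cycle(self._backends)  # restart with the new list

    def backend(self):
        return next(self._cycle)


director = RoundRobinDirector()
director.add_backend("server1")
director.add_backend("server2")
picks = [director.backend() for _ in range(4)]
```

Each call to `backend()` corresponds to one request being assigned in vcl_recv, so consecutive requests alternate between the two servers.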

BE Health Check:

backend BE_NAME {
    .host =
    .port =
    .probe = {
        .url =
        .timeout =
        .interval =
        .window =
        .threshold =
    }
}

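.probe: defines the health-check method;

.url: the URL requested by the probe; defaults to "/";

.request: the exact request to send;

.request =

"GET /.healthtest.html HTTP/1.1"

"Host: www.magedu.com"

"Connection: close";

.window: how many of the most recent probes are considered when judging health;

.threshold: of the last .window probes, at least this many must have succeeded;

.interval: how often to probe;

.timeout: how long to wait for each probe;

.expected_response: the expected status code; defaults to 200;

Ways to configure a health check:

(1) probe PB_NAME { }

backend NAME {

.probe = PB_NAME;

...

}

(2) backend NAME {

.probe = {

...

}

}

Example: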

probe check {
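   .url="/.healthcheck";        # the page requested by the health check
   .window = 8;                 # number of most recent probes considered when judging backend health
   .threshold = 8;              # how many of the .window probes must succeed for the backend to count as healthy
   .interval = 2s;              # how often probes are sent; defaults to 5 seconds
   .timeout = 1s;               # timeout for each probe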
}
backend default {
    .host = "192.168.153.129";
    .port = "80";
    .probe=check;
}
backend two {
    .host = "192.168.153.130";
    .port = "80";
    .probe=check;
}
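The interplay of .window and .threshold is easier to see in a small simulation. The sketch below keeps the last `window` probe results and reports the backend healthy only when at least `threshold` of them succeeded; `ProbeWindow` is a hypothetical class, and details of the real probe logic (such as its initial state) are left out.

```python
from collections import deque


class ProbeWindow:
    """Sliding window over the most recent probe results."""

    def __init__(self, window, threshold):
        self.results = deque(maxlen=window)  # old results fall off the left
        self.threshold = threshold

    def record(self, ok):
        self.results.append(bool(ok))

    def healthy(self):
        return sum(self.results) >= self.threshold


# Same values as the "check" probe above: window = 8, threshold = 8.
probe = ProbeWindow(window=8, threshold=8)
for _ in range(8):
    probe.record(True)
healthy_at_start = probe.healthy()
probe.record(False)          # one failed probe
healthy_after_failure = probe.healthy()
```

With window and threshold both set to 8, a single failed probe marks the backend sick until eight consecutive probes succeed again; a lower threshold would tolerate occasional failures.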

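Varnish runtime parameters:

Thread model:

cache-worker

cache-main

ban lurker

acceptor:

epoll/kqueue:

...

Thread-related parameters:

Within a thread pool, each request is handled by one thread; the maximum number of worker threads determines how many requests varnish can serve concurrently;

thread_pools: Number of worker thread pools. Best kept at or below the number of CPU cores;

thread_pool_max: The maximum number of worker threads in each pool.

thread_pool_min: The minimum number of worker threads in each pool. In effect it is also the "maximum number of idle threads";

Maximum concurrent connections = thread_pools * thread_pool_max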
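As a worked example of the formula above, the sketch below reads thread_pool_max out of the DAEMON_OPTS string shown earlier and multiplies it by a pool count. The helper name `parse_params` is hypothetical, and `thread_pools = 2` is an assumed value that does not come from the article.

```python
import shlex


def parse_params(daemon_opts):
    """Collect the name=value pairs that follow each -p flag."""
    tokens = shlex.split(daemon_opts)
    params = {}
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-p":
            name, _, setting = value.partition("=")
            params[name] = setting
    return params


DAEMON_OPTS = "-p thread_pool_min=5 -p thread_pool_max=500 -p thread_pool_timeout=300"
params = parse_params(DAEMON_OPTS)

thread_pools = 2  # assumed value, not taken from the article
capacity = thread_pools * int(params["thread_pool_max"])
```

With these numbers the ceiling is 1000 concurrent requests; raising either factor raises it proportionally.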

thread_pool_timeout：Thread idle threshold. Threads in excess of thread_pool_min, which have been idle for at least this long, will be destroyed.

thread_pool_add_delay：Wait at least this long after creating a thread.

thread_pool_destroy_delay：Wait this long after destroying a thread.

How to set them:

param.set

To make them permanent:

varnish.params

DAEMON_OPTS="-p PARAM1=VALUE -p PARAM2=VALUE"

varnishstat -1 -f MAIN.threads # show only the given field

Varnish log area:

shared memory log

Counters

Log records

1. varnishstat - Varnish Cache statistics

-1

-1 -f FIELD_NAME

-l: lists the field names that can be passed to -f;

MAIN.cache_hit

MAIN.cache_miss

varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss

2. varnishtop - Varnish log entry ranking

-1 Instead of a continuously updated display, print the statistics once and exit.

-i taglist: may be given several times, and a single option may carry several tags;

-I <[taglist:]regex>

-x taglist: exclusion list

-X <[taglist:]regex>

3. varnishlog - Display Varnish logs

4. varnishncsa - Display Varnish logs in Apache / NCSA combined log format

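Built-in functions:

hash_data(): names the data fed into the hash; reducing variation in that data raises the hit ratio;

regsub(str,regex,sub): replaces the first match of regex in str with sub; mainly used for URL rewriting

regsuball(str,regex,sub): replaces every match of regex in str with sub;

return():

ban(expression)

ban_url(regex): bans every cached object whose URL matches regex;

synth(status,"STRING"): builds a synthetic response, e.g. the reply to a purge;

To keep the hit ratio high, and to stop it from dropping when one of the cache servers fails, use nginx's consistent hashing in front of them:

hash $request_uri consistent;       # use the consistent hashing algorithm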
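The difference between regsub and regsuball listed above is the same as limiting or not limiting the number of substitutions in Python's `re.sub`. The two wrappers below are illustrative stand-ins that reuse the VCL names; they use Python's regex dialect, which differs from Varnish's in some details.

```python
import re


def regsub(text, regex, sub):
    """Replace only the first match, like VCL's regsub()."""
    return re.sub(regex, sub, text, count=1)


def regsuball(text, regex, sub):
    """Replace every match, like VCL's regsuball()."""
    return re.sub(regex, sub, text)


first_only = regsub("/a//b//c", "//", "/")
every_match = regsuball("/a//b//c", "//", "/")
rewritten = regsub("/old/page", "^/old/", "/new/")
```

The last line shows the URL-rewrite use mentioned above: an anchored pattern matches at most once, so regsub is sufficient there.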
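Why consistent hashing protects the hit ratio can be shown with a minimal hash ring. The sketch below is a generic textbook ring, not nginx's implementation; the class name `ConsistentHashRing`, the server names, the use of SHA-256 and the 100 virtual points per server are all assumptions made for the example.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to nodes so that removing a node moves only its own keys."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, key):
        # First point clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[index % len(self._ring)][1]


ring = ConsistentHashRing(["cache1", "cache2", "cache3"])
urls = [f"/page/{i}" for i in range(200)]
before = {url: ring.lookup(url) for url in urls}
ring.remove("cache2")  # simulate one cache server failing
after = {url: ring.lookup(url) for url in urls}
```

Only the URLs that lived on the failed server are reassigned; every other URL keeps hitting the server that already has it cached, which is what keeps the overall hit ratio from collapsing.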
