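Varnish in Detail
Related Concepts
1. Web Page Cache
Two open-source programs are commonly used for web page caching: squid and varnish.
Program execution exhibits locality:
# Temporal locality: data that has just been accessed is likely to be accessed again soon
# Spatial locality: when a piece of data is accessed, the data around it is likely to be accessed as well
cache: hits
# Hot spots: a consequence of locality
# Freshness:
Cache space exhausted: evict with LRU (least recently used)
Expiry: expired entries are cleaned out of the cache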
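As an illustration of LRU eviction, here is a minimal Python sketch. The class name `LRUCache` and its `capacity` parameter are made up for this example; Varnish's own eviction logic is more elaborate, so treat this only as a model of the idea.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache: evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None  # cache miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("/a", "A")
cache.put("/b", "B")
cache.get("/a")       # "/a" becomes the most recently used entry
cache.put("/c", "C")  # the cache is full, so "/b" is evicted
```

After these calls, a lookup of "/b" misses while "/a" and "/c" are still cached.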
Cache hit ratio: hit/(hit+miss), range: [0,1]
# Page hit ratio: measured by the number of pages
# Byte hit ratio: measured by the size of the pages
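The two ratios can differ considerably for the same traffic. The following sketch computes both; the function name `hit_ratios` and the sample numbers are invented for illustration only.

```python
def hit_ratios(requests):
    """Return (page_hit_ratio, byte_hit_ratio).

    requests: iterable of (was_hit, size_in_bytes) tuples.
    """
    requests = list(requests)
    if not requests:
        return 0.0, 0.0
    hit_sizes = [size for was_hit, size in requests if was_hit]
    page_ratio = len(hit_sizes) / len(requests)
    total_bytes = sum(size for _, size in requests)
    byte_ratio = sum(hit_sizes) / total_bytes if total_bytes else 0.0
    return page_ratio, byte_ratio


# Three small hits and one large miss (made-up sample data).
sample = [(True, 100), (True, 100), (True, 100), (False, 700)]
page_ratio, byte_ratio = hit_ratios(sample)
```

With this sample, three of four pages are served from the cache, but only 300 of 1000 bytes are, which is why both metrics are worth tracking.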
Whether to cache:
# Private data: private, may be stored only in a private cache
# Public data: public, may be stored in a public or a private cache
2. Cache-Related Header Fields
The most important caching header fields are:
# Expires: expiry time;
Expires:Thu, 22 Oct 2026 06:34:30 GMT
# Cache-Control:max-age=
# Etag
# If-None-Match
# Last-Modified
# If-Modified-Since
# Vary
# Age
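Mechanisms for judging cache validity:
# Expiry time:
HTTP/1.0
Expires: absolute expiry time
HTTP/1.1
Cache-Control: max-age=
Cache-Control: s-maxage=
# Conditional requests:
Last-Modified/If-Modified-Since: validated against the file's modification timestamp
Etag/If-None-Match: validated against the file's checksum (entity tag)
# Example: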
Expires:Thu, 13 Aug 2026 02:05:12 GMT
Cache-Control:max-age=315360000
ETag:"1ec5-502264e2ae4c0"
Last-Modified:Wed, 03 Sep 2014 10:00:27 GMT
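To make the expiry mechanisms concrete, the sketch below derives a freshness lifetime from response headers, preferring s-maxage (for shared caches), then max-age, then Expires minus Date, which is the precedence HTTP/1.1 caches follow. The function name `freshness_lifetime` and the `shared_cache` flag are assumptions of this example, and the Cache-Control parsing is deliberately simplistic.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, shared_cache=True):
    """Return the freshness lifetime in seconds, or None if it cannot be derived."""
    directives = {}
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value
    # s-maxage only applies to shared caches such as a reverse proxy.
    if shared_cache and "s-maxage" in directives:
        return int(directives["s-maxage"])
    if "max-age" in directives:
        return int(directives["max-age"])
    # HTTP/1.0 fallback: absolute expiry time relative to the response date.
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return int((expires - date).total_seconds())
    return None
```

For the example headers above, max-age=315360000 takes precedence over the Expires date.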
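Cache levels:
# Private cache: the local cache built into the user agent
# Public cache: the cache of a reverse proxy server
Request flow:
User-Agent <--> private cache <--> public cache <--> public cache 2 <--> Original Server
3. Cache-related content in request and response messages
A request message tells the cache how it may use cached content to answer the request:
cache-request-directive =
"no-cache"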
| "no-store"
| "max-age" "=" delta-seconds
| "max-stale" [ "=" delta-seconds ]
| "min-fresh" "=" delta-seconds
| "no-transform"
| "only-if-cached"
| cache-extension
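A response message tells the cache server how to store the content returned by the upstream server:
cache-response-directive =
"public"
| "private" [ "=" <"> 1#field-name <"> ]
| "no-cache" [ "=" <"> 1#field-name <"> ]
# Cacheable, but must be revalidated before being served to the client, i.e. a conditional request must be sent to verify that the cached copy is still valid
| "no-store"
# The response must not be stored in the cache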
| "no-transform"
| "must-revalidate"
| "proxy-revalidate"
| "max-age" "=" delta-seconds
| "s-maxage" "=" delta-seconds
| cache-extension
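A small sketch of how a cache might read these directives. The helper names `parse_cache_control` and `shared_cache_may_store` are invented for this example, and the parser splits naively on commas, so it does not handle quoted field-name lists such as private="a, b".

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into {directive: argument_or_None}."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') if sep else None
    return directives


def shared_cache_may_store(response_cache_control):
    """True if a shared (public) cache is allowed to store the response."""
    directives = parse_cache_control(response_cache_control)
    return "no-store" not in directives and "private" not in directives
```

For instance, a response carrying "private, max-age=60" may be kept by the browser's cache but not by a reverse proxy.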
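varnish
1. Concepts
varnish is a high-performance open-source HTTP cache accelerator that also provides reverse proxy functionality. It caches in memory, which is far faster to read and write than disk, but the cached data is lost on restart. It supports precise cache lifetime settings, VCL makes configuration and management flexible, and its management features are powerful.
varnish official site: http://www.varnish-cache.org/
Varnish Book site: http://book.varnish-software.com/4.0/
varnish comes in two editions: Community and Enterprise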
This is Varnish Cache, a high-performance HTTP accelerator.
2. Program Architecture
Manager process
Cacher process, which contains several types of threads:
# accept, worker, expiry, ...
shared memory log:
# Statistics: counters
# Log area: log records
varnishlog, varnishncsa, varnishstat...
Configuration interface: VCL (Varnish Configuration Language)
# vcl compiler --> c compiler --> shared object
3. Program Environment
/etc/varnish/varnish.params # configures how the varnish service process runs, e.g. listening address and port, storage mechanism
/etc/varnish/default.vcl # configures the caching policy of the Child/Cache process
/usr/sbin/varnishd # main program
/usr/bin/varnishadm # varnish management tool, CLI interface
/usr/bin/varnishhist # Shared Memory Log tool
/usr/bin/varnishlog # Shared Memory Log tool
/usr/bin/varnishncsa # Shared Memory Log tool
/usr/bin/varnishstat # Shared Memory Log tool
/usr/bin/varnishtop # Shared Memory Log tool
/usr/bin/varnishtest # test tool
/usr/sbin/varnish_reload_vcl # reloads the VCL configuration file
# Systemd Unit File:
/usr/lib/systemd/system/varnish.service # varnish service
/usr/lib/systemd/system/varnishlog.service # persists the log to disk
/usr/lib/systemd/system/varnishncsa.service # persists the log to disk
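4. Cache Storage Mechanisms
varnish cache storage mechanisms (Storage Types):
-s [name=]type[,options]
Three storage mechanisms:
· malloc[,size] # in-memory storage; [,size] sets the size; all cached objects are lost on restart
· file[,path[,size[,granularity]]] # disk file storage, opaque (black box); all cached objects are lost on restart
· persistent,path,size # file storage, opaque (black box); cached objects survive a restart; experimental
5. varnish Program Options
Program options: the /etc/varnish/varnish.params file
-a address[:port][,address[:port][...]] # port 6081 by default
-T address[:port] # port 6082 by default
-s [name=]type[,options] # defines the cache storage mechanism
-u user
-g group
-f config # VCL configuration file
-F # run in the foreground
Runtime parameters: the /etc/varnish/varnish.params file, DAEMON_OPTS
DAEMON_OPTS="-p thread_pool_min=5 -p thread_pool_max=500 -p thread_pool_timeout=300"
-p param=value # sets a runtime parameter and its value; may be repeated
-r param[,param...] # makes the listed parameters read-only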
6. Reloading the VCL configuration file (two methods)
- Method 1
[root@Tang ~]# varnish_reload_vcl
Loading vcl from /etc/varnish/default.vcl
Current running config name is
Using new config name reload_2019-10-10T00:16:11
VCL compiled.
VCL 'reload_2019-10-10T00:16:11' now active
available 0 boot
active 0 reload_2019-10-10T00:16:11
Done
- Method 2
[root@Tang varnish]# varnishadm -S secret -T 127.0.0.1:6082
200
-----------------------------
Varnish Cache CLI 1.0
-----------------------------
Linux,3.10.0-957.el7.x86_64,x86_64,-smalloc,-smalloc,-hcritbit
varnish-4.0.5 revision 07eff4c29
Type 'help' for command list.
Type 'quit' to close CLI session.
varnish> help
200
... ...
vcl.load <configname> <filename>
vcl.inline <configname> <quoted_VCLstring>
vcl.use <configname>
vcl.discard <configname>
vcl.list
... ...
varnish> vcl.load test1 default.vcl
200
VCL compiled.
varnish> vcl.list
200
active 0 boot
available 0 test1
varnish> vcl.use test1
200
VCL 'test1' now active
varnish> vcl.list
200
available 0 boot
active 0 test1
7. Introduction to varnishadm
varnishadm -S /etc/varnish/secret -T [ADDRESS:]PORT # log in
Command list:
help [<command>]
ping [<timestamp>]
auth <response>
quit
banner
status
start
stop
vcl.load <configname> <filename>
vcl.inline <configname> <quoted_VCLstring>
vcl.use <configname>
vcl.discard <configname>
vcl.list
param.show [-l] [<param>]
param.set <param> <value>
panic.show
panic.clear
storage.list
vcl.show [-v] <configname>
backend.list [<backend_expression>]
backend.set_health <backend_expression> <state>
ban <field> <operator> <arg> [&& <field> <oper> <arg>]...
ban.list
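Configuration-related:
vcl.list # list the loaded VCL configurations
vcl.load # load and compile
vcl.use # activate
vcl.discard # delete
vcl.show [-v] <configname> # show the contents of the named configuration
Runtime parameters:
param.show -l # show the parameter list
param.show <PARAM>
param.set <PARAM> <VALUE>
Cache storage:
storage.list
Backend servers:
backend.list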
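8. Introduction to VCL
VCL is divided into several domains, each with its own dedicated configuration statements. VCL has multiple state engines; the states are related, but the state engines are isolated from one another. Each state engine uses return(x) to name the engine that comes next, and each state engine corresponds to one configuration section in the VCL file, namely a subroutine.
Cache hit flow:
vcl_hash --> return(lookup) --> hit --> vcl_hit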
8.1 State engines and the varnish processing flow
The VCL Finite State Machine
- Each request is processed separately;
- Each request is independent from others at any given time;
- States are related, but isolated;
- return(action); exits one state and instructs Varnish to proceed to the next state;
- Built-in VCL code is always present and appended below your own VCL.
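How varnish processes a request:
- After varnish receives a client request, the vcl_recv state engine handles it. Requests it cannot recognize are handed to vcl_pipe via pipe, requests that should be looked up in the cache are handed to vcl_hash via hash (lookup in Varnish 3), and requests that must not be cached are handed to vcl_pass via pass
- vcl_hash computes the hash key and looks the object up in the cache; the result is either hit (found in the cache) or miss (not found)
- vcl_hit hands the cached object to vcl_deliver via deliver, and once processing is complete the data is returned to the client
- vcl_miss hands the request to the backend side via fetch (vcl_backend_fetch in Varnish 4, vcl_fetch in Varnish 3), which retrieves the content from the backend server
- vcl_backend_response processes the content returned by the backend server and passes the result on to vcl_deliver
- vcl_deliver returns the final response to the client
[Figure: relationships between the state engines]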
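The flow described above can be sketched as a small transition table. This is a simplified Python model, not Varnish code: the table `TRANSITIONS` and the helper `walk` are invented here, the hit/miss entries stand for the outcome of the cache lookup rather than explicit return() actions, and many states (vcl_purge, vcl_synth, vcl_backend_error) are left out.

```python
# (current state, action or lookup outcome) -> next state, Varnish 4 naming
TRANSITIONS = {
    ("vcl_recv", "hash"): "vcl_hash",
    ("vcl_recv", "pass"): "vcl_pass",
    ("vcl_recv", "pipe"): "vcl_pipe",
    ("vcl_hash", "hit"): "vcl_hit",
    ("vcl_hash", "miss"): "vcl_miss",
    ("vcl_hit", "deliver"): "vcl_deliver",
    ("vcl_miss", "fetch"): "vcl_backend_fetch",
    ("vcl_pass", "fetch"): "vcl_backend_fetch",
    ("vcl_backend_fetch", "fetch"): "vcl_backend_response",
    ("vcl_backend_response", "deliver"): "vcl_deliver",
}


def walk(actions, start="vcl_recv"):
    """Follow a list of actions and return the list of states visited."""
    path = [start]
    for action in actions:
        path.append(TRANSITIONS[(path[-1], action)])
    return path


hit_path = walk(["hash", "hit", "deliver"])
miss_path = walk(["hash", "miss", "fetch", "fetch", "deliver"])
```

A cache hit visits four states, while a miss takes the longer route through the backend-side states before reaching vcl_deliver.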
8.2 Viewing the default VCL configuration
varnish> vcl.list
200
available 0 boot
active 0 reload_2019-10-10T00:16:11
varnish> vcl.show -v boot
200
// VCL.SHOW 0 1226 input
#
# This is an example VCL file for Varnish.
... ...
8.3 Default vcl_recv configuration
sub vcl_recv {
if (req.method == "PRI") {
/* We do not support SPDY or HTTP/2.0 */
return (synth(405));
}
if (req.method != "GET" &&
req.method != "HEAD" &&
req.method != "PUT" &&
req.method != "POST" &&
req.method != "TRACE" &&
req.method != "OPTIONS" &&
req.method != "DELETE") {
/* Non-RFC2616 or CONNECT which is weird. */
return (pipe);
}
if (req.method != "GET" && req.method != "HEAD") {
/* We only deal with GET and HEAD by default */
return (pass);
}
if (req.http.Authorization || req.http.Cookie) {
/* Not cacheable by default */
return (pass);
}
return (hash);
}
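The decision logic of this built-in vcl_recv can be restated as a plain function, which makes the order of the checks easier to see. This is a Python model only: the name `builtin_recv` is invented, and header names are assumed to arrive in the canonical capitalization shown.

```python
KNOWN_METHODS = {"GET", "HEAD", "PUT", "POST", "TRACE", "OPTIONS", "DELETE"}


def builtin_recv(method, headers):
    """Return the action the built-in vcl_recv shown above would take."""
    if method == "PRI":
        return "synth(405)"  # SPDY / HTTP/2 preface is not supported
    if method not in KNOWN_METHODS:
        return "pipe"        # unknown method: shovel bytes to the backend
    if method not in ("GET", "HEAD"):
        return "pass"        # only GET and HEAD are cached by default
    if "Authorization" in headers or "Cookie" in headers:
        return "pass"        # personalized requests are not cached by default
    return "hash"            # proceed to the cache lookup
```

Note that any request carrying a Cookie header bypasses the cache under these defaults, which is a common reason for a low hit ratio.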
8.4 Client-side state engines
vcl_recv, vcl_pass, vcl_hit, vcl_miss, vcl_pipe, vcl_purge, vcl_synth, vcl_deliver
vcl_recv:
hash:vcl_hash
pass: vcl_pass
pipe: vcl_pipe
synth: vcl_synth
purge: vcl_hash --> vcl_purge
vcl_hash:
lookup:
hit: vcl_hit
miss: vcl_miss
pass, hit_for_pass: vcl_pass
purge: vcl_purge
8.5 Backend-side state engines
# vcl_backend_fetch
# vcl_backend_response
# vcl_backend_error
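8.6 Two special engines
vcl_init # VCL code executed before any request is processed; mainly used to initialize VMODs
vcl_fini # called when the VCL configuration is discarded, after all requests have finished; mainly used to clean up VMODs
8.7 VCL syntax
(1) VCL files start with vcl 4.0; # the file must begin with vcl 4.0;
(2) //, # and /* foo */ for comments;
(3) Subroutines are declared with the sub keyword; # e.g. sub vcl_recv { ... }
(4) No loops, state-limited variables; # built-in variables are only available in certain engines
(5) Terminating statements with a keyword for next action as argument of the return() function, i.e.: return(action);
# used to move from one state engine to the next;
(6) Domain-specific;
VCL Syntax and Variables
1. Three main syntax forms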
sub subroutine {
...
}
if CONDITION {
...
} else {
...
}
return()
hash_data()
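2. VCL built-in functions, keywords and operators
Built-in Functions:
# regsub(str, regex, sub)
# regsuball(str, regex, sub)
# ban(boolean expression)
# hash_data(input)
# synthetic(str)
hash_data() # specifies the data fed into the hash; reducing variation in it improves the hit ratio;
regsub(str,regex,sub) # replaces the first match of regex in str with sub; mainly used for URL rewriting
regsuball(str,regex,sub) # replaces every match of regex in str with sub;
return() # moves on to another state engine
ban(expression) # invalidates all cached objects matching the expression
ban_url(regex) # bans all cached objects whose URL matches regex (Varnish 3 function; VCL 4.0 uses ban())
synth(status,"STRING") # generates a synthetic response with the given status code and reason, e.g. to answer a purge request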
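The difference between regsub and regsuball is the same as replacing once versus replacing everywhere. The Python analogue below uses `re.sub`; the wrapper names simply mirror the VCL functions, and VCL itself uses PCRE, so regex details may differ slightly.

```python
import re


def regsub(s, regex, sub):
    """Replace only the first match of regex in s (like VCL regsub)."""
    return re.sub(regex, sub, s, count=1)


def regsuball(s, regex, sub):
    """Replace every match of regex in s (like VCL regsuball)."""
    return re.sub(regex, sub, s)


first_only = regsub("/a//b//c", "//", "/")
everywhere = regsuball("/a//b//c", "//", "/")
```

Here `first_only` still contains one doubled slash, while `everywhere` is fully normalized; normalizing URLs like this before hashing is one way to reduce variation and raise the hit ratio.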
Keywords:
# call subroutine
# return(action)
# new
# set
# unset
Operators:
# ==, !=, ~, >, >=, <, <=
# Logical operators: &&, ||, !
# Assignment: =
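3. Variable types
3.1 Built-in variables
req.* # request: the request message sent by the client
# req.http.*
# req.http.User-Agent, req.http.Referer, ...
bereq.* # the HTTP request sent by varnish to the backend (BE) host
# bereq.http.*
beresp.* # the response message sent by the BE host to varnish
# beresp.http.*
resp.* # the response sent by varnish to the client
obj.* # attributes of the object stored in the cache; read-only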
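3.1.1 Common variables 1
req.*:
req.http.HEADERS # request headers
req.method # request method (req.request in Varnish 3)
req.url # requested URL
req.proto # protocol version of the request
req.backend_hint # the backend host to use (req.backend in Varnish 3)
req.http.Cookie # value of the Cookie header in the client's request
req.http.User-Agent ~ "chrome" # matches on the client's user agent
bereq.*:
bereq.http.HEADERS # request headers
bereq.method # request method (bereq.request in Varnish 3)
bereq.url # requested URL
bereq.proto # protocol version of the request
bereq.backend # the backend host to use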
3.1.2 Common variables 2
resp.*:
resp.http.HEADERS # response headers
resp.status # response status code
resp.proto # protocol version
resp.reason # reason phrase of the response
beresp.*:
beresp.http.HEADERS # response headers
beresp.status # response status code
beresp.proto # protocol version
beresp.backend.name # name of the BE host
beresp.ttl # remaining time for which the BE host's response may be cached
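3.1.3 Common variables 3
obj.*
obj.hits # number of times this object has been served from the cache
obj.ttl # the object's ttl
server.*
server.ip
server.hostname
client.*
client.ip
3.2 User-defined variables
set # set a variable
unset # remove a variable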
4. VCL syntax walkthrough and examples
4.1 obj.hits is a built-in variable holding the number of times a cached object has been hit
# must be placed in sub vcl_deliver
if (obj.hits>0) {
set resp.http.X-Cache = "HIT via " + server.ip;
} else {
set resp.http.X-Cache = "MISS via " + server.ip;
}
Demo example:
sub vcl_deliver {
if (obj.hits>0) {
set resp.http.X-Cache-Tang = "HIT via " + server.ip;
} else {
set resp.http.X-Cache-Tang = "MISS via " + server.ip;
}
}
4.2 Forcing requests for certain resources to bypass the cache
# (?i) makes the match case-insensitive
sub vcl_recv {
if (req.url ~ "(?i)^/(login|admin)") {
return(pass);
}
}
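4.3 Overriding resource markers and setting how long varnish caches them
For certain resource types, such as public images, remove the marker that makes them private and force the length of time varnish may cache them. This is defined in vcl_backend_response.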
if (beresp.http.cache-control !~ "s-maxage") {
if (bereq.url ~ "(?i)\.(jpg|jpeg|png|gif|css|js)$") {
unset beresp.http.Set-Cookie;
set beresp.ttl = 3600s;
}
}
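4.4 Adding the client's IP address to the request for log analysis
if () # a header (string) in the parentheses is true when it is set and false when it is absent; compare numeric values explicitly, e.g. req.restarts == 0
sub vcl_recv {
if (req.restarts == 0) {
if (req.http.X-Forwarded-For) {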
set req.http.X-Forwarded-For = req.http.X-Forwarded-For + "," + client.ip;
} else {
set req.http.X-Forwarded-For = client.ip;
}
}
}
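The same header handling, written as a Python model to show the two cases (header already present or not) and why the restart counter is checked. The function name `add_forwarded_for` and the `restarts` parameter are assumptions of this sketch.

```python
def add_forwarded_for(headers, client_ip, restarts=0):
    """Return a copy of headers with client_ip recorded in X-Forwarded-For."""
    updated = dict(headers)
    if restarts > 0:
        return updated  # already handled on the first pass; do not append twice
    existing = headers.get("X-Forwarded-For")
    if existing:
        # A proxy in front of us already added addresses: append ours.
        updated["X-Forwarded-For"] = existing + ", " + client_ip
    else:
        updated["X-Forwarded-For"] = client_ip
    return updated
```

The backend can then log the first address in the list as the original client.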
4.5 Removing cached objects (purge)
- Enabling the purge operation
sub vcl_purge {
    return (synth(200,"Purged"));
}
- Triggering a purge
sub vcl_recv {
    if (req.method == "PURGE") {
        return(purge);
    }
    ...
}
- Purging with curl
~]# curl -X PURGE http://192.168.1.11 # removes the cached object that the request hits
- Restricting purge requests with an acl (access control list) so that only certain hosts may purge
acl purgers {
    "127.0.0.0"/8;
    "10.1.0.0"/16;
}
sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purgers) {
            return(synth(405,"Purging not allowed for " + client.ip));
        }
        return(purge);
    }
    ...
}
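The ACL check amounts to testing whether the client address falls inside one of the listed networks. A rough Python model using the standard `ipaddress` module follows; the function name `handle_purge` and its return convention (a status/reason tuple, or None for non-PURGE requests) are invented for this sketch.

```python
import ipaddress

# Same networks as the purgers acl above.
PURGERS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("10.1.0.0/16"),
]


def handle_purge(method, client_ip):
    """Return (status, reason) for PURGE requests, or None for other methods."""
    if method != "PURGE":
        return None
    addr = ipaddress.ip_address(client_ip)
    if not any(addr in net for net in PURGERS):
        return (405, "Purging not allowed for " + client_ip)
    return (200, "Purged")
```

A PURGE from 10.1.2.3 is accepted, whereas one from 192.168.1.50 is rejected with 405.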
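4.6 Removing cached objects (banning)
- Banning with varnishadm
varnishadm: ban <field> <operator> <arg>
Example:
ban req.url ~ ^/javascripts # invalidates the cached objects whose URL starts with /javascripts
- Defining it in the configuration file with the ban() function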
if (req.method == "BAN") {
ban("req.http.host == " + req.http.host + " && req.url == " + req.url);
# Throw a synthetic page so the request won't go to the backend.
return(synth(200, "Ban added"));
}
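Unlike a purge, which removes one object immediately, a ban is a rule that is recorded and then checked against objects older than the ban. The sketch below models that idea in Python; the class `BanCache` and its methods are invented, it matches on the URL only, and it evaluates bans lazily at lookup time, ignoring Varnish's background ban lurker.

```python
import re


class BanCache:
    """Toy cache where bans invalidate older objects whose URL matches."""

    def __init__(self):
        self._objects = {}  # url -> (body, sequence number when stored)
        self._bans = []     # (sequence number when added, compiled URL regex)
        self._seq = 0

    def _tick(self):
        self._seq += 1
        return self._seq

    def store(self, url, body):
        self._objects[url] = (body, self._tick())

    def ban(self, url_regex):
        self._bans.append((self._tick(), re.compile(url_regex)))

    def lookup(self, url):
        entry = self._objects.get(url)
        if entry is None:
            return None
        body, stored_at = entry
        for banned_at, pattern in self._bans:
            # Only bans added after the object was stored apply to it.
            if banned_at > stored_at and pattern.search(url):
                del self._objects[url]
                return None
        return body


cache = BanCache()
cache.store("/javascripts/app.js", "old js")
cache.store("/index.html", "home")
cache.ban("^/javascripts")
```

After the ban, the JavaScript object is treated as a miss while /index.html is still served; an object stored again after the ban is not affected by it.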
4.7 Using multiple backend hosts
backend default {
.host = "172.16.100.6";
.port = "80";
}
backend appsrv {
.host = "172.16.100.7";
.port = "80";
}
# Requests for dynamic content go to the appsrv backend host; everything else goes to the default backend host
sub vcl_recv {
if (req.url ~ "(?i)\.php$") {
set req.backend_hint = appsrv;
} else {
set req.backend_hint = default;
}
...
}
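5. Directors: introduction and examples
A director is provided by a varnish module (VMOD);
it must be imported before use:
import directors; # the directors module must be imported, otherwise directors cannot be used
5.1 Basic usage
Configuration approach:
- Define the backend hosts
- Define a group (cluster) of backend hosts and choose its scheduling method
- Refer to the group by name when selecting a backend
Configuration syntax: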
import directors; # load the directors
backend server1 {
.host =
.port =
}
backend server2 {
.host =
.port =
}
sub vcl_init {
new GROUP_NAME = directors.round_robin();
GROUP_NAME.add_backend(server1);
GROUP_NAME.add_backend(server2);
}
sub vcl_recv {
# send all traffic to the GROUP_NAME director:
set req.backend_hint = GROUP_NAME.backend();
}
5.2 Cookie-based session stickiness
sub vcl_init {
new h = directors.hash();
h.add_backend(one, 1); // backend 'one' with weight '1'
h.add_backend(two, 1); // backend 'two' with weight '1'
}
sub vcl_recv {
// pick a backend based on the cookie header of the client
set req.backend_hint = h.backend(req.http.cookie);
}
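To show what the two scheduling methods do, here is a rough Python model. The classes `RoundRobin` and `HashDirector` are invented for this sketch; in particular, the hash function and the weight handling are assumptions and not the algorithm the directors VMOD actually uses.

```python
import hashlib
from itertools import cycle


class RoundRobin:
    """Hand out backends in turn."""

    def __init__(self, backends):
        self._cycle = cycle(backends)

    def backend(self):
        return next(self._cycle)


class HashDirector:
    """Map the same key (e.g. a Cookie header) to the same backend every time."""

    def __init__(self):
        self._slots = []

    def add_backend(self, name, weight=1):
        self._slots.extend([name] * weight)  # a heavier backend gets more slots

    def backend(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self._slots[int.from_bytes(digest[:8], "big") % len(self._slots)]


rr = RoundRobin(["server1", "server2"])
rr_sequence = [rr.backend() for _ in range(4)]

h = HashDirector()
h.add_backend("one", 1)
h.add_backend("two", 1)
```

Because the backend is derived only from the key, every request carrying the same cookie lands on the same backend, which is what makes the session sticky.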
5.3 Example 1
[root@Tang varnish]# cat default.vcl
vcl 4.0;
import directors;
backend vhost1 {
.host = "192.168.100.101";
.port = "8080";
}
backend vhost2 {
.host = "192.168.100.102";
.port = "8080";
}
backend vhost3 {
.host = "192.168.100.103";
.port = "8080";
}
sub vcl_init {
new v = directors.round_robin();
v.add_backend(vhost1);
v.add_backend(vhost2);
v.add_backend(vhost3);
}
sub vcl_recv {
set req.backend_hint = v.backend();
}
varnish> backend.list
200
Backend name Refs Admin Probe
default(192.168.100.100,,8080) 3 probe Healthy (no probe)
vhost1(192.168.100.101,,8080) 1 probe Healthy (no probe)
vhost2(192.168.100.102,,8080) 1 probe Healthy (no probe)
vhost3(192.168.100.103,,8080) 1 probe Healthy (no probe)
5.4 Example 2
backend imgsrv1 {
.host = "192.168.10.11";
.port = "80";
}
backend imgsrv2 {
.host = "192.168.10.12";
.port = "80";
}
backend appsrv1 {
.host = "192.168.10.21";
.port = "80";
}
backend appsrv2 {
.host = "192.168.10.22";
.port = "80";
}
sub vcl_init {
new imgsrvs = directors.random();
imgsrvs.add_backend(imgsrv1,10);
imgsrvs.add_backend(imgsrv2,20);
new staticsrvs = directors.round_robin();
staticsrvs.add_backend(appsrv1);
staticsrvs.add_backend(appsrv2);
new appsrvs = directors.hash();
appsrvs.add_backend(appsrv1,1);
appsrvs.add_backend(appsrv2,1);
}
sub vcl_recv {
if (req.url ~ "(?i)\.(css|js)$") {
set req.backend_hint = staticsrvs.backend();
} elseif (req.url ~ "(?i)\.(jpg|jpeg|png|gif)$") {
set req.backend_hint = imgsrvs.backend();
} else {
set req.backend_hint = appsrvs.backend(req.http.cookie);
}
}
6. Backend host health checks: introduction and examples (BE Health Check)
6.1 Health check parameters
backend BE_NAME {
.host =
.port =
.probe = {
.url=
.timeout=
.interval=
.window=
.threshold=
}
}
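6.2 Meaning of the health check parameters
.probe # defines the health check method
.url # URL requested by the check; defaults to "/"
.request # the exact request to send
.request =
"GET /.healthtest.html HTTP/1.1"
"Host: www.magedu.com"
"Connection: close"
.window # how many of the most recent checks are considered when judging health
.threshold # at least this many of the last .window checks must have succeeded
.interval # how often to check
.timeout # timeout for each check
.expected_response # expected status code; defaults to 200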
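The interplay of .window and .threshold is a sliding-window vote over recent probe results. A minimal Python model follows; the class `ProbeWindow` is invented for illustration and ignores other probe settings such as .initial.

```python
from collections import deque


class ProbeWindow:
    """Backend is healthy if at least `threshold` of the last `window` probes succeeded."""

    def __init__(self, window, threshold):
        self.threshold = threshold
        self._results = deque(maxlen=window)  # old results fall off automatically

    def record(self, ok):
        self._results.append(bool(ok))

    def healthy(self):
        return sum(self._results) >= self.threshold


probe = ProbeWindow(window=5, threshold=4)
for _ in range(5):
    probe.record(True)
after_five_ok = probe.healthy()
probe.record(False)
after_one_failure = probe.healthy()   # 4 of the last 5 succeeded
probe.record(False)
after_two_failures = probe.healthy()  # only 3 of the last 5 succeeded
```

With window 5 and threshold 4, a single failed probe is tolerated, but two failures within the window mark the backend as sick until enough later probes succeed.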
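6.3 Ways to configure health checks
Approach: either define a named probe and reference it from a backend, or define the probe inline inside the backend.
Syntax:
(1) probe PB_NAME { }
backend NAME {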
.probe = PB_NAME;
...
}
(2) backend NAME {
.probe = {
...
}
}
6.4 Health check configuration example
probe check {
.url = "/.healthcheck.html";
.window = 5;
.threshold = 4;
.interval = 2s;
.timeout = 1s;
}
backend default {
.host = "10.1.0.68";
.port = "80";
.probe = check;
}
backend appsrv {
.host = "10.1.0.69";
.port = "80";
.probe = check;
}
7. Setting backend host attributes
backend BE_NAME {
...
.connect_timeout = 0.5s;
.first_byte_timeout = 20s;
.between_bytes_timeout = 5s;
.max_connections = 50;
}
varnish Runtime Parameters
1. Thread model
cache-worker
cache-main
ban lurker
acceptor
epoll/kqueue
...
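2. Thread-related parameters
Within a thread pool, each request is handled by one thread; the maximum number of worker threads therefore determines how many requests varnish can serve concurrently.
thread_pools:Number of worker thread pools. Preferably less than or equal to the number of CPU cores
thread_pool_max:The maximum number of worker threads in each pool.
thread_pool_min:The minimum number of worker threads in each pool. In effect this is also the maximum number of idle threads kept alive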
thread_pool_timeout:Thread idle threshold. Threads in excess of thread_pool_min, which have been idle for at least this long, will be destroyed.
thread_pool_add_delay:Wait at least this long after creating a thread.
thread_pool_destroy_delay:Wait this long after destroying a thread.
Maximum concurrent connections = thread_pools * thread_pool_max
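As a quick worked example of this formula: the sketch below assumes thread_pools=2 (an assumed value, check it with param.show thread_pools) together with the thread_pool_max=500 from the DAEMON_OPTS line shown earlier. The function name is invented.

```python
def max_worker_threads(thread_pools, thread_pool_max):
    """Upper bound on concurrently served requests: pools times threads per pool."""
    return thread_pools * thread_pool_max


# thread_pools=2 is an assumption; thread_pool_max=500 comes from DAEMON_OPTS above.
capacity = max_worker_threads(thread_pools=2, thread_pool_max=500)
```

Under these assumptions varnish could serve at most 1000 requests at the same time.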
3. Timer-related parameters
send_timeout:Send timeout for client connections. If the HTTP response hasn't been transmitted in this many seconds the session is closed.
timeout_idle:Idle timeout for client connections.
timeout_req: Max time to receive the client's request headers, measured from first non-white-space character to double CRLF.
cli_timeout:Timeout for the child's replies to CLI requests from the mgt_param.
4. Setting parameters temporarily
varnishadm:
param.set <PARAM> <VALUE>
5. Setting parameters permanently
Edit the varnish.params configuration file:
DAEMON_OPTS="-p PARAM1=VALUE -p PARAM2=VALUE"
varnish Log Area
shared memory log
# Counters
# Log messages
1. varnishstat - Varnish Cache statistics
-1
-1 -f FIELD_NAME
-l: lists the field names that can be given to the -f option
MAIN.cache_hit # number of cache hits
MAIN.cache_miss # number of cache misses
Examples:
# varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss
# varnishstat -l -f MAIN -f MEMPOOL
2. varnishtop - Varnish log entry ranking
-1 # Instead of a continuously updated display, print the statistics once and exit.
-i taglist # several -i options may be given, or one option followed by several tags
-I <[taglist:]regex>
-x taglist # exclusion list
-X <[taglist:]regex>
3. varnishlog - Display Varnish logs
4. varnishncsa - Display Varnish logs in Apache / NCSA combined log format