33. varnish

Web Page Cache:

squid --> varnish (the two main caching proxies; varnish is the newer product and builds on the strengths of earlier designs)

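Program execution exhibits locality:
	Temporal locality: data that has just been accessed is likely to be accessed again soon;
	Spatial locality: when a piece of data is accessed, the data around it is also likely to be accessed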
	
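cache: hit
	
	Hot spot: locality;
		Validity over time:
			Cache space exhausted: evict by LRU (least recently used);
			Expired: stale entries are cleaned out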
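
A minimal sketch of LRU eviction as described above, written in Python purely for illustration. The class name `LRUCache` and the sample URLs are made up; Varnish's own storage and expiry code is considerably more involved.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache: evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                  # cache miss
        self.items.move_to_end(key)      # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("/a", "A")
cache.put("/b", "B")
cache.get("/a")          # "/a" becomes the most recently used entry
cache.put("/c", "C")     # cache is full, so "/b" is evicted
print(list(cache.items))
```

Here `/b` is evicted rather than `/a`, because `/a` was touched more recently.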
			
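Cache hit ratio: hit/(hit+miss)
	Range: 0 to 1
	Page hit ratio: measured by the number of pages served from cache
	Byte hit ratio: measured by the volume (bytes) of pages served from cache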
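
The two ratios can differ a lot for the same traffic. The following Python sketch computes both from a made-up list of requests; the sample sizes and the helper name `hit_ratio` are assumptions for illustration only.

```python
def hit_ratio(hits, misses):
    """hit / (hit + miss); returns 0.0 when there was no traffic."""
    total = hits + misses
    return hits / total if total else 0.0


# (size_in_bytes, served_from_cache) per request; invented sample data
requests = [(1000, True), (1000, True), (8000, False), (500, True)]

hit_count = sum(1 for _, hit in requests if hit)
miss_count = len(requests) - hit_count
hit_bytes = sum(size for size, hit in requests if hit)
miss_bytes = sum(size for size, hit in requests if not hit)

page_ratio = hit_ratio(hit_count, miss_count)   # counts every page equally
byte_ratio = hit_ratio(hit_bytes, miss_bytes)   # weights pages by their size
print(page_ratio, round(byte_ratio, 3))
```

Three of four pages hit, but the single miss is the largest object, so the byte hit ratio is much lower than the page hit ratio.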
	
Cacheable or not:
	Private data: private; may be stored in a private cache only;
	Public data: public; may be stored in a public or a private cache;

Cache-related Header Fields
	The most important caching header fields are:


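	Mechanisms for judging cache validity:
		Expiration time:
			HTTP/1.0
				Expires: absolute expiry time (an HTTP-date in GMT; relies on the clocks of server and cache agreeing)
			HTTP/1.1
				Cache-Control: max-age=      relative time in seconds; no clock or timezone dependency
				Cache-Control: s-maxage=     relative time for shared (public) caches
		Conditional requests:
			Last-Modified/If-Modified-Since: validated by the file's modification timestamp;
			ETag/If-None-Match: validated by a tag (e.g. a checksum) that identifies the content;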
			
		Expires:Thu, 13 Aug 2026 02:05:12 GMT
		Cache-Control:max-age=315360000
		ETag:"1ec5-502264e2ae4c0"
		Last-Modified:Wed, 03 Sep 2014 10:00:27 GMT
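
As a rough illustration of how a cache might derive a freshness lifetime from these headers, here is a Python sketch. It assumes the usual precedence (s-maxage for shared caches, then max-age, then Expires minus Date); the function names and the `Date` values in the usage lines are invented for the example, and real caches handle many more cases.

```python
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Split a Cache-Control value into a {directive: argument-or-None} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(headers, shared_cache=False):
    """Return the freshness lifetime in seconds, or None if none is stated."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if shared_cache and cc.get("s-maxage") is not None:
        return int(cc["s-maxage"])          # applies to shared caches only
    if cc.get("max-age") is not None:
        return int(cc["max-age"])           # relative time, no clock issues
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return int((expires - date).total_seconds())
    return None


both = {"Cache-Control": "public, max-age=60, s-maxage=600"}
old_style = {
    "Date": "Wed, 03 Sep 2014 10:00:27 GMT",      # hypothetical Date header
    "Expires": "Wed, 03 Sep 2014 11:00:27 GMT",   # hypothetical Expires header
}
print(freshness_lifetime(both, shared_cache=True))
print(freshness_lifetime(both, shared_cache=False))
print(freshness_lifetime(old_style))
```

A shared cache such as varnish would pick the `s-maxage` value, while a browser's private cache would use `max-age`.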
		
	Cache tiers:
		Private cache: the local cache built into the user agent;
		Public cache: the cache of a reverse proxy server;
		
		User-Agent <--> private cache <--> public cache <--> public cache 2 <--> Origin Server

Request directives tell the cache how it may use cached content to answer the request:
	cache-request-directive = 
		"no-cache"
		| "no-store"                         
		| "max-age" "=" delta-seconds        
		| "max-stale" [ "=" delta-seconds ]  
		| "min-fresh" "=" delta-seconds      
		| "no-transform"                    
		| "only-if-cached"                  
		| cache-extension                    

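Response directives tell the cache how to store the content returned by the upstream server:
	cache-response-directive =
		"public"      may be stored by both public and private caches
		| "private" [ "=" <"> 1#field-name <"> ] 	may be stored by private caches only
		| "no-cache" [ "=" <"> 1#field-name <"> ]	may be stored, but must be revalidated before being served to a client, i.e. the cache must issue a conditional request to confirm the entry is still valid
		| "no-store"	the response must not be stored in any cache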
		| "no-transform"                        
		| "must-revalidate"                     
		| "proxy-revalidate"                  
		| "max-age" "=" delta-seconds           
		| "s-maxage" "=" delta-seconds          
		| cache-extension    
		 
Open-source solutions:
	squid:
	varnish:
		
	varnish official site: http://www.varnish-cache.org/
		Community
		Enterprise
		
		 This is Varnish Cache, a high-performance HTTP accelerator. 
		
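	Program architecture:
		Manager process
		Cacher process, which contains several types of threads (similar to nginx):
			accept, worker, expiry, ... 
		shared memory log:
			Statistics: counters;
			Log area: log records;
				varnishlog, varnishncsa, varnishstat... 
			
		Configuration interface: VCL. Policy is written in VCL, translated to C, and then compiled to binary code
			Varnish Configuration Language, 
				vcl compiler --> c compiler --> shared object 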

				
	varnish program environment:
	    Two main configuration files
			/etc/varnish/varnish.params: configures how the varnish service process runs, e.g. listening address and port, cache storage;
			/etc/varnish/default.vcl: configures the caching policy of the Child/Cache process;
		Main program:
			/usr/sbin/varnishd
		CLI interface:
			/usr/bin/varnishadm
		Tools that read the Shared Memory Log:
			/usr/bin/varnishhist
			/usr/bin/varnishlog
			/usr/bin/varnishncsa
			/usr/bin/varnishstat
			/usr/bin/varnishtop		
		Test tool:
			/usr/bin/varnishtest
		VCL reload script:
			/usr/sbin/varnish_reload_vcl
		Systemd Unit File:
			/usr/lib/systemd/system/varnish.service
				the varnish service
			/usr/lib/systemd/system/varnishlog.service
			/usr/lib/systemd/system/varnishncsa.service	
				services that persist the log (by default the log lives in shared memory and is lost on restart; enable one of these daemons to keep it on disk)
				
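	varnish's three cache storage mechanisms (Storage Types):
		-s [name=]type[,options]
		
		· malloc[,size]
			In-memory storage; [,size] sets the amount of space; all cached objects are lost on restart;
		· file[,path[,size[,granularity]]]
			Disk file storage, opaque (a black box); all cached objects are lost on restart;
		· persistent,path,size
			File storage, opaque; cached objects survive a restart; experimental;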
			
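	varnish program options:
		Program options: the /etc/varnish/varnish.params file (mainly options passed to the varnishd process)
			-a address[:port][,address[:port][...]]: listening address; port 6081 by default; 
			-T address[:port]: management interface address; port 6082 by default;
			-s [name=]type[,options]: defines the cache storage mechanism;
			-u user
			-g group
			-f config: the VCL configuration file;
			-F: run in the foreground;
			...
		Runtime parameters: the /etc/varnish/varnish.params file, DAEMON_OPTS
			DAEMON_OPTS="-p thread_pool_min=5 -p thread_pool_max=500 -p thread_pool_timeout=300"
			
			-p param=value: sets a runtime parameter to a value; may be repeated;
			-r param[,param...]: makes the listed parameters read-only; 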
			
	Reloading the VCL configuration file:
		~ ]# varnish_reload_vcl
			
	varnishadm (client for the management interface; can connect remotely)
		-S /etc/varnish/secret -T [ADDRESS:]PORT 
  
		help [<command>]
		ping [<timestamp>]
		auth <response>
		quit
		banner
		status
		start
		stop
		vcl.load <configname> <filename>
		vcl.inline <configname> <quoted_VCLstring>
		vcl.use <configname>
		vcl.discard <configname>
		vcl.list
		param.show [-l] [<param>]
		param.set <param> <value>
		panic.show
		panic.clear
		storage.list
		vcl.show [-v] <configname>
		backend.list [<backend_expression>]
		backend.set_health <backend_expression> <state>
		ban <field> <operator> <arg> [&& <field> <oper> <arg>]...
		ban.list	
		
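		Configuration related:
			vcl.list 
			vcl.load: load and compile;
			vcl.use: activate;
			vcl.discard: delete;
			vcl.show [-v] <configname>: show the contents of the named configuration;
			
		Runtime parameters:
			param.show -l: list all parameters;
			param.show <PARAM>
			param.set <PARAM> <VALUE>
			
		Cache storage:
			storage.list
			
		Backend servers:
			backend.list 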
			
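	VCL:
		A domain-specific configuration language;
		
		state engine;
		
		VCL has several state engines. The states are related to one another, but the engines themselves are isolated; each engine uses return(x) to name the engine that runs next; each engine corresponds to one section of the vcl file, i.e. a subroutine
		
			vcl_recv --> return(hash) --> vcl_hash --> return(lookup) --> vcl_hit (on a cache hit)
			
		Default (built-in) vcl_recv: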
		
			sub vcl_recv {
				if (req.method == "PRI") {
					/* We do not support SPDY or HTTP/2.0 */
					return (synth(405));
				}
				if (req.method != "GET" &&
				req.method != "HEAD" &&
				req.method != "PUT" &&
				req.method != "POST" &&
				req.method != "TRACE" &&
				req.method != "OPTIONS" &&
				req.method != "DELETE") {
					/* Non-RFC2616 or CONNECT which is weird. */
					return (pipe);
				}

				if (req.method != "GET" && req.method != "HEAD") {
					/* We only deal with GET and HEAD by default */
					return (pass);
				}
				if (req.http.Authorization || req.http.Cookie) {
					/* Not cacheable by default */
					return (pass);
				}
				return (hash);
			}
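
The decision order in the built-in vcl_recv can be restated as a small Python function, which may make the precedence easier to see. This is only a model of the VCL above; the function name `builtin_recv_action` and the returned strings are illustrative.

```python
SUPPORTED_METHODS = {"GET", "HEAD", "PUT", "POST", "TRACE", "OPTIONS", "DELETE"}


def builtin_recv_action(method, headers):
    """Mirror the checks of the built-in vcl_recv, in the same order."""
    names = {name.lower() for name in headers}   # header names are case-insensitive
    if method == "PRI":
        return "synth(405)"      # SPDY / HTTP/2 preface is rejected
    if method not in SUPPORTED_METHODS:
        return "pipe"            # unknown method or CONNECT: just shuffle bytes
    if method not in ("GET", "HEAD"):
        return "pass"            # only GET and HEAD are looked up in the cache
    if "authorization" in names or "cookie" in names:
        return "pass"            # personalised requests are not cached by default
    return "hash"                # proceed to vcl_hash and the cache lookup


print(builtin_recv_action("GET", {"Host": "example.com"}))
print(builtin_recv_action("GET", {"Cookie": "sid=1"}))
print(builtin_recv_action("POST", {}))
print(builtin_recv_action("CONNECT", {}))
```

Note that any request carrying a Cookie header is passed to the backend, which is why cookie handling matters so much for the hit ratio.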
		
			
		Client Side:
			vcl_recv, vcl_pass, vcl_hit, vcl_miss, vcl_pipe, vcl_purge, vcl_synth, vcl_deliver
			
			vcl_recv:
				hash:vcl_hash
				pass: vcl_pass 
				pipe: vcl_pipe
				synth: vcl_synth
				purge: vcl_hash --> vcl_purge
				
			vcl_hash:
				lookup:
					hit: vcl_hit
					miss: vcl_miss
					pass, hit_for_pass: vcl_pass
					purge: vcl_purge
			
		Backend Side:
			vcl_backend_fetch, vcl_backend_response, vcl_backend_error
	
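		Two special engines:
			vcl_init: VCL code run before any request is processed; mainly used to initialize VMODs;
			vcl_fini: called when the VCL configuration is discarded, after all requests using it have finished; mainly used to clean up VMODs;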
		
	VCL syntax:
		(1) VCL files start with vcl 4.0;
		(2) //, # and /* foo */ for comments;
		(3) Subroutines are declared with the sub keyword, e.g. sub vcl_recv { ... }
		(4) No loops; variables are state-limited (each built-in variable is only available in certain engines);
		(5) Terminating statements with a keyword for next action as argument of the return() function, i.e.: return(action); this is how one state engine hands over to the next;
		(6) Domain-specific;
		
	The VCL Finite State Machine
		(1) Each request is processed separately;
		(2) Each request is independent from others at any given time;
		(3) States are related, but isolated;
		(4) return(action); exits one state and instructs Varnish to proceed to the next state;
		(5) Built-in VCL code is always present and appended below your own VCL;
		
	Three main syntactic forms:
		sub subroutine {
			...
		}
		
		if (CONDITION) {
			...
		} else {	
			...
		}
		
		return(), hash_data()
		
	VCL Built-in Functions and Keywords
		Functions:
			regsub(str, regex, sub)
			regsuball(str, regex, sub)
			ban(boolean expression)
			hash_data(input)
			synthetic(str)
			
		Keywords:
			call subroutine, return(action),new,set,unset 
			
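		Operators:
			==, !=, ~, !~, >, >=, <, <=
			Logical operators: &&, ||, !
			Assignment: =
			
		Example: obj.hits is a built-in variable holding the number of times a cached object has been served from the cache (use it in vcl_deliver);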
			if (obj.hits>0) {
				set resp.http.X-Cache = "HIT via " + server.ip;
			} else {
				set resp.http.X-Cache = "MISS via " + server.ip;
			}
					
	
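	Variable types: (used to read or modify the request and response messages directly)
		Built-in variables:
			req.*: request; relates to the request message sent by the client;
				req.http.*
					req.http.User-Agent, req.http.Referer, ...   
			bereq.*: relates to the HTTP request varnish sends to the backend (BE) host;
				bereq.http.*
			beresp.*: relates to the response message the BE host returns to varnish;
				beresp.http.*
			resp.*: relates to the response varnish sends to the client;
			obj.*: attributes of an object stored in the cache; read-only;
			
			Commonly used variables:
				bereq.*, req.*:
					bereq.http.HEADERS: reference a header
					bereq.method: the request method;
					bereq.url: the requested url;
					bereq.proto: the protocol version of the request;
					bereq.backend: the backend host to use;
					
					req.http.Cookie: the value of the Cookie header in the client's request; 
					req.http.User-Agent ~ "chrome"
					
					
				beresp.*, resp.*:
					beresp.http.HEADERS
					beresp.status: the response status code;
					beresp.proto: the protocol version;
					beresp.backend.name: the name of the backend the response came from;
					beresp.ttl: how much longer the BE host's response may be cached;
					
				obj.*
					obj.hits: the number of times this object has been served from the cache;
					obj.ttl: the object's remaining ttl
					
				server.*
					server.ip
					server.hostname
				client.*
					client.ip					
			
		User-defined:
			set 
			unset 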

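Where each variable is available:
[Figure: table of built-in variables and the subroutines in which each is readable or writable; image not included]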

	Example 1: force requests for certain resources to bypass the cache:
		sub vcl_recv {
			if (req.url ~ "(?i)^/(login|admin)") {
				return(pass);
			}
		}
			
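	Example 2: for certain resource types, such as public images, drop the marker that makes them private and force how long varnish may cache them; defined in vcl_backend_response:
		if (beresp.http.cache-control !~ "s-maxage") {
			if (bereq.url ~ "(?i)\.(jpg|jpeg|png|gif|css|js)$") {
				unset beresp.http.Set-Cookie;	# drop the cookie so the response is no longer private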
				set beresp.ttl = 3600s;
			}
		}
		
	Example 3: defined in vcl_recv; pass the client IP on to the backend
		if (req.restarts == 0) {
			if (req.http.X-Forwarded-For) {
				set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
			} else {
				set req.http.X-Forwarded-For = client.ip;
			}
		}		
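
The header handling in Example 3 amounts to "append to the existing list, or start a new one". A Python restatement, with an invented helper name and sample addresses, might look like this:

```python
def append_forwarded_for(headers, client_ip):
    """Return a copy of headers with client_ip appended to X-Forwarded-For."""
    updated = dict(headers)
    existing = updated.get("X-Forwarded-For")
    if existing:
        # another proxy already recorded addresses: extend the list
        updated["X-Forwarded-For"] = existing + ", " + client_ip
    else:
        # first proxy on the path: start the list
        updated["X-Forwarded-For"] = client_ip
    return updated


first_hop = append_forwarded_for({}, "203.0.113.7")
second_hop = append_forwarded_for(first_hop, "10.1.0.5")
print(second_hop["X-Forwarded-For"])
```

The leftmost address is the original client, which is what the backend usually wants to log.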
			
	Trimming cached objects: purge, ban  
		(1) Enabling the purge operation (removes the single cache entry that matches)
			sub vcl_purge {
				return (synth(200,"Purged"));
			}
			
		(2) When to perform a purge
			sub vcl_recv {
				if (req.method == "PURGE") {
					return(purge);
				}
				...
			}
			
		Adding an access control rule for such requests:
			acl purgers {  
				"127.0.0.0"/8;
				"10.1.0.0"/16;
			}
			
			sub vcl_recv {
				if (req.method == "PURGE") {
					if (!client.ip ~ purgers) {	# return 405 if the client IP is not in the ACL
						return(synth(405,"Purging not allowed for " + client.ip));
					}
					return(purge);
				}
				...
			}
			
		Banning: invalidates every cached object that matches
			(1) varnishadm:
				ban <field> <operator> <arg>
				
				Example, on the command line:
					ban req.url ~ ^/javascripts
					
			(2) In the configuration file, using the ban() function;
			
			Example:
				if (req.method == "BAN") {
					ban("req.http.host == " + req.http.host + " && req.url == " + req.url);
					# Throw a synthetic page so the request won't go to the backend.
					return(synth(200, "Ban added"));
				}				
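
To contrast the two operations, the sketch below models a cache as a Python dict keyed by (host, url): purge removes exactly one entry, while a ban removes every entry whose URL matches a regex. The data and function names are invented, and the ban is applied eagerly here for simplicity; in Varnish a ban is recorded and evaluated later, not executed as an immediate sweep.

```python
import re

# toy cache keyed by (host, url); contents are made up
cache = {
    ("www.example.com", "/javascripts/app.js"): "js-1",
    ("www.example.com", "/javascripts/lib.js"): "js-2",
    ("www.example.com", "/index.html"): "page",
}


def purge(cache, host, url):
    """Remove the single entry for this exact host and URL; True if it existed."""
    return cache.pop((host, url), None) is not None


def ban_by_url(cache, pattern):
    """Remove every entry whose URL matches the regex; return how many."""
    regex = re.compile(pattern)
    doomed = [key for key in cache if regex.search(key[1])]
    for key in doomed:
        del cache[key]
    return len(doomed)


purged = purge(cache, "www.example.com", "/index.html")
banned = ban_by_url(cache, r"^/javascripts")
print(purged, banned, len(cache))
```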
			
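	Using multiple backend hosts: define each backend first, optionally group the backends (with a director), then select the backend or group per request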
		backend default {
			.host = "172.16.100.6";
			.port = "80";
		}

		backend appsrv {
			.host = "172.16.100.7";
			.port = "80";
		}
		
		sub vcl_recv {				
			if (req.url ~ "(?i)\.php$") {
				set req.backend_hint = appsrv;
			} else {
				set req.backend_hint = default;
			}	
			
			...
		}
		
	
		
	Director:
		a varnish module (VMOD); 
			must be imported before use:
				import directors;
		
		Example:
			import directors;    # load the directors module

			backend server1 {
				.host = 
				.port = 
			}
			backend server2 {
				.host = 
				.port = 
			}

			sub vcl_init {
				new GROUP_NAME = directors.round_robin();
				GROUP_NAME.add_backend(server1);  
				GROUP_NAME.add_backend(server2);
			}

			sub vcl_recv {
				# send all traffic to the GROUP_NAME director:
				set req.backend_hint = GROUP_NAME.backend();
			}
			
		Cookie-based session stickiness:
			sub vcl_init {
				new h = directors.hash();
				h.add_backend(one, 1);   // backend 'one' with weight '1'
				h.add_backend(two, 1);   // backend 'two' with weight '1'
			}

			sub vcl_recv {
				// pick a backend based on the cookie header of the client
				set req.backend_hint = h.backend(req.http.cookie);
			}				
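
The idea behind hash-based stickiness is that the same key always maps to the same backend. The Python sketch below shows one simple way to do that with weights; it is not the algorithm vmod_directors actually uses, and the function name and sample cookies are hypothetical.

```python
import hashlib


def pick_backend(backends, key):
    """Pick a backend deterministically from (name, integer_weight) pairs."""
    total = sum(weight for _, weight in backends)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # map the hash onto the range [0, total) and walk the weighted list
    point = int.from_bytes(digest[:8], "big") % total
    for name, weight in backends:
        if point < weight:
            return name
        point -= weight


backends = [("one", 1), ("two", 1)]
first = pick_backend(backends, "sessionid=abc123")
again = pick_backend(backends, "sessionid=abc123")
print(first == again)
```

Because the choice depends only on the cookie value, a client keeps reaching the same backend as long as its cookie and the backend list stay unchanged.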
		
	BE Health Check:
		backend BE_NAME {
			.host =  
			.port = 
			.probe = {
				.url= 
				.timeout= 
				.interval= 
				.window=
				.threshold=
			}
		}
		
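		.probe: defines how health is checked;
			.url: the URL requested by the probe, "/" by default; 
			.request: the exact request to send;
				.request = 
					"GET /.healthtest.html HTTP/1.1"
					"Host: www.magedu.com"
					"Connection: close";
			.window: how many of the most recent probes are considered when judging health; 
			.threshold: at least this many of the last .window probes must have succeeded for the backend to count as healthy;
			.interval: how often to probe; 
			.timeout: how long to wait for a probe to complete;
			.expected_response: the expected status code, 200 by default;
			
		Two ways to configure health checks:
			(1) probe PB_NAME  { }
			     backend NAME {
				.probe = PB_NAME;
				...
			     }
			     
			(2) backend NAME  {
				.probe = {
					...
				}
			}

		Example: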
			probe check {
				.url = "/.healthcheck.html";
				.window = 5;
				.threshold = 4;
				.interval = 2s;
				.timeout = 1s;
			}

			backend default {
				.host = "10.1.0.68";
				.port = "80";
				.probe = check;
			}

			backend appsrv {
				.host = "10.1.0.69";
				.port = "80";
				.probe = check;
			}	
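
How .window and .threshold interact can be modelled with a fixed-length history of probe results. This Python sketch uses the values from the example above (window 5, threshold 4); the class name is invented, and details such as the initial state of a backend are left out.

```python
from collections import deque


class ProbeWindow:
    """Keep the last `window` probe results; healthy if enough succeeded."""

    def __init__(self, window, threshold):
        self.threshold = threshold
        self.results = deque(maxlen=window)   # old results fall off automatically

    def record(self, ok):
        self.results.append(bool(ok))

    def healthy(self):
        return sum(self.results) >= self.threshold


probe = ProbeWindow(window=5, threshold=4)
for _ in range(5):
    probe.record(True)
all_good = probe.healthy()        # 5 of the last 5 succeeded

probe.record(False)
one_failure = probe.healthy()     # 4 of the last 5 succeeded

probe.record(False)
two_failures = probe.healthy()    # 3 of the last 5 succeeded
print(all_good, one_failure, two_failures)
```

With these settings a single failed probe is tolerated, and the second failure inside the window marks the backend as sick.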
			
	Setting backend host attributes:
		backend BE_NAME {
			...
			.connect_timeout = 0.5s;
			.first_byte_timeout = 20s;
			.between_bytes_timeout = 5s;
			.max_connections = 50;
		}
			
			
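	 varnish runtime parameters:
		Thread model:
			cache-worker
			cache-main
			ban lurker
			acceptor:
			epoll/kqueue:
			...
			
		Thread-related parameters:
			Within a thread pool, each request is handled by one thread; the maximum number of worker threads determines how many requests varnish can serve concurrently;
			
			thread_pools: Number of worker thread pools. Best kept less than or equal to the number of CPU cores; 
			thread_pool_max: The maximum number of worker threads in each pool.
			thread_pool_min: The minimum number of worker threads in each pool. In effect this is also the number of idle threads that are kept alive;
			
				Maximum concurrent connections = thread_pools * thread_pool_max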
				
			thread_pool_timeout:Thread idle threshold.  Threads in excess of thread_pool_min, which have been idle for at least this long, will be destroyed.
			thread_pool_add_delay:Wait at least this long after creating a thread.
			thread_pool_destroy_delay:Wait this long after destroying a thread.
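
As a worked example of the formula above, the following Python sketch reads the `-p` settings from a DAEMON_OPTS string and computes the upper bound on worker threads. The helper name is invented, and the fallback of 2 thread pools when the parameter is not set is an assumption; check `param.show thread_pools` on the actual installation.

```python
import shlex


def parse_params(daemon_opts):
    """Collect '-p name=value' pairs from a DAEMON_OPTS string."""
    tokens = shlex.split(daemon_opts)
    params = {}
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-p" and "=" in value:
            name, _, val = value.partition("=")
            params[name] = val
    return params


opts = "-p thread_pool_min=5 -p thread_pool_max=500 -p thread_pool_timeout=300"
params = parse_params(opts)

thread_pools = int(params.get("thread_pools", 2))   # assumed default when unset
max_workers = thread_pools * int(params["thread_pool_max"])
min_workers = thread_pools * int(params["thread_pool_min"])
print(min_workers, max_workers)
```

Under these assumptions varnish would keep at least 10 worker threads and grow to at most 1000.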
			
		Timer-related parameters:
			send_timeout:Send timeout for client connections. If the HTTP response hasn't been transmitted in this many seconds the session is closed.
			timeout_idle:Idle timeout for client connections. 
			timeout_req: Max time to receive the client's request headers, measured from first non-white-space character to double CRLF.
			cli_timeout:Timeout for the child's replies to CLI requests from the manager process.
		
			How to set (takes effect immediately, lost on restart):
				varnishadm param.set <PARAM> <VALUE>
			
			How to make a setting permanent:
				varnish.params
					DAEMON_OPTS="-p PARAM1=VALUE -p PARAM2=VALUE"
					
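	varnish log area:
		shared memory log 
			counters
			log records
			
		1. varnishstat - Varnish Cache statistics
			-1: print the statistics once and exit
			-1 -f FIELD_NAME: print only the named field
			-l: list the field names that can be given to -f;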
			
			MAIN.cache_hit 
			MAIN.cache_miss
			
			# varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss
			# varnishstat -l -f MAIN -f MEMPOOL
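
The two counters above are enough to compute the hit ratio. The Python sketch below parses text shaped like `varnishstat -1` output (name, value, rate, description); the sample lines and numbers are invented, and the exact column layout may differ between versions, so treat the parsing as an assumption.

```python
SAMPLE_OUTPUT = """\
MAIN.cache_hit            1200         2.50 Cache hits
MAIN.cache_miss            300         0.62 Cache misses
"""


def parse_counters(text):
    """Map counter name -> integer value from 'name value ...' lines."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


counters = parse_counters(SAMPLE_OUTPUT)
hits = counters["MAIN.cache_hit"]
misses = counters["MAIN.cache_miss"]
ratio = hits / (hits + misses) if hits + misses else 0.0
print(ratio)
```

In practice the text would come from running `varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss` rather than from a hard-coded string.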
			
		2、varnishtop - Varnish log entry ranking
			-1     Instead of a continuously updated display, print the statistics once and exit.
			-i taglist: include only these tags; may be given several times, or once with several tags;
			-I <[taglist:]regex>
			-x taglist: exclude these tags
			-X  <[taglist:]regex>
			
		3、varnishlog - Display Varnish logs
			
		4、 varnishncsa - Display Varnish logs in Apache / NCSA combined log format
		
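	Built-in functions:
		hash_data(): specifies the data fed into the hash; reducing variation between requests raises the hit ratio;
		regsub(str,regex,sub): replaces the first match of regex in str with sub; mainly used for URL rewriting
		regsuball(str,regex,sub): replaces every match of regex in str with sub;
		return():
		ban(expression) 
		ban_url(regex): bans all cached objects whose URL matches regex (Varnish 3.x; in VCL 4.0 use ban("req.url ~ " + regex) instead);
		synth(status,"STRING"): generates a synthetic response with the given status and reason, e.g. to answer a purge request;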
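
The difference between regsub and regsuball is the same as replacing the first match versus all matches. A Python analogue using the standard `re` module is sketched below; the wrapper names simply mirror the VCL functions, and VCL's regex dialect (PCRE) is not identical to Python's.

```python
import re


def regsub(text, regex, sub):
    """Replace only the first match, like VCL regsub()."""
    return re.sub(regex, sub, text, count=1)


def regsuball(text, regex, sub):
    """Replace every match, like VCL regsuball()."""
    return re.sub(regex, sub, text)


# typical normalisation before hashing: strip a leading "www."
host = regsub("www.example.com", r"^www\.", "")

# collapse repeated slashes in a URL: first match only vs. all matches
once = regsub("/a//b//c", r"/+", "/")
everywhere = regsuball("/a//b//c", r"/{2,}", "/")
print(host, once, everywhere)
```

Normalising the host or URL this way before the lookup means fewer distinct cache keys for the same content.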

Example:
vcl 4.0;

import directors;

backend imgsrv1 {
	.host = "192.168.10.11";
	.port = "80";
}

backend imgsrv2 {
	.host = "192.168.10.12";
	.port = "80";
}	

backend appsrv1 {
	.host = "192.168.10.21";
	.port = "80";
}

backend appsrv2 {
	.host = "192.168.10.22";
	.port = "80";
}

sub vcl_init {
	# images: weighted random selection
	new imgsrvs = directors.random();
	imgsrvs.add_backend(imgsrv1,10);
	imgsrvs.add_backend(imgsrv2,20);
	
	# static files: round-robin over the app servers
	new staticsrvs = directors.round_robin();
	staticsrvs.add_backend(appsrv1);
	staticsrvs.add_backend(appsrv2);
	
	# dynamic content: hash director, used for cookie-based stickiness
	new appsrvs = directors.hash();
	appsrvs.add_backend(appsrv1,1);
	appsrvs.add_backend(appsrv2,1);		
}

sub vcl_recv {
	if (req.url ~ "(?i)\.(css|js)$") {
		set req.backend_hint = staticsrvs.backend();
	} elseif (req.url ~ "(?i)\.(jpg|jpeg|png|gif)$") {
		set req.backend_hint = imgsrvs.backend();
	} else {		
		set req.backend_hint = appsrvs.backend(req.http.cookie);
	}
}