A/B Testing with nginx

Latest recommended article published on 2024-01-19 14:42:09

yanglefeng2

Tags: testing, operations, php

Article link: https://blog.csdn.net/yanglefeng2/article/details/84126017

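To be clear up front: the A/B testing discussed here has nothing to do with the ApacheBench tool. It refers to testing that measures the conversion rate of web pages (the original post links to an explanation). A/B testing is a statistical method used by many large companies today. Once you adopt it, there is no more arguing over whether image A or image B is better: the data decides.

Running an A/B test before releasing a new version of a website is good practice. So how should it be implemented at the server and code-structure level? That is the question this article explores.

Scope: this article only considers A/B testing of static pages. A/B testing of dynamic requests is easy to implement, so it is not covered.

Initial idea: page redirection

Suppose the target page's URL is http://www.example.com/index.html. You could add a snippet of random-redirect code to index.html, for example: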

<script>
if(Math.random()>0.5){
	location.href='a.html';
}else{
	location.href='b.html';
}
</script>

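The biggest advantage of this approach is that it is simple to implement and allows all kinds of custom checks.

Its biggest drawback is that the page URL changes, and our goal is for users not to notice that they are being A/B tested, so this approach is ruled out.

1: Using the random_index directive

Our company's servers run nginx, so let's first see whether nginx has a built-in solution.

A Google search turned up a directive called random_index (the original post links to the source article). It is used as follows: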

server {
   listen 80;
   server_name www.example1.com;
   location / {
      root /var/www/www.example1.com/test_index;
      random_index on;
   }
}

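This directive picks a random file from the directory and serves it as the index page for requests ending in a slash. Note that the module providing it is not built by default; nginx has to be compiled with --with-http_random_index_module. The approach has one drawback: the same user may get a different page on every refresh.

2: Using the nginx userid module

The article describing this method is linked from the original post; its author is 大虾. Since I have not tried it myself, I am reproducing the configuration from that article as-is: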

userid on;
userid_name uid;
userid_domain milanoo.com;
userid_path /;
userid_expires 365d;
if ( $uid_set ~ "^uid=(.{9})(.)(.+)$" ) {
set $serp $2;
set $uid $1$2$3;
}
if ( $uid_got ~ "^uid=(.{9})(.)(.+)$" ) {
set $serp $2;
set $uid $1$2$3;
}
set $fa A;
if ( $serp ~ "(A|B|C|D)" ) {
set $fa B;
}
# Variants C and D can be carved out of $serp the same way, but each share must be a multiple of 1/16.
log_format main '$uid_got - $serp - $uid_set'; # debug log
access_log logs/php.log main; # debug
# To run an A/B test, just regex-match on $serp.
fastcgi_pass 127.0.0.1:9000;
fastcgi_param FA $fa; # pass the variant to PHP as $_SERVER['FA']
fastcgi_param UID $uid; # pass to PHP as $_SERVER['UID']

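That feels a bit complicated, so I do not plan to use this method.

3: Using ip_hash

ip_hash is also a built-in nginx directive, normally used for load balancing. Configuration: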
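To make the selection rule in that configuration easier to follow, here is a minimal C++ sketch of the same decision, written for illustration only. The function name variantFromUid is hypothetical, and the sketch assumes the variable value has the shape the regex expects: the prefix "uid=", nine characters, one deciding character, then at least one more character.

```cpp
#include <string>

// Mirrors the configuration above: the 10th character after "uid=" plays the
// role of $serp. If it is A, B, C or D the visitor gets variant 'B',
// otherwise (or when the value does not match the regex) variant 'A'.
char variantFromUid(const std::string& uidVar) {
    const std::string prefix = "uid=";
    // "^uid=(.{9})(.)(.+)$" needs 9 + 1 + at least 1 characters after the prefix.
    if (uidVar.compare(0, prefix.size(), prefix) != 0 ||
        uidVar.size() < prefix.size() + 11) {
        return 'A';
    }
    const char serp = uidVar[prefix.size() + 9];
    return (serp >= 'A' && serp <= 'D') ? 'B' : 'A';
}
```

Four of the sixteen hex digits select variant B, so B would receive 4/16 = 25% of visitors if that character were uniformly distributed. Whether it actually is depends on how nginx builds the identifier, which this sketch does not verify.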

upstream backend {
ip_hash;
server 211.100.26.100:80;
server 211.100.26.101:80;
}

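This method is simple to implement, but when this was written ip_hash could not be combined with weight to split traffic unevenly (the nginx documentation states that weights can be used with ip_hash starting from versions 1.3.1 and 1.2.2). Requests from the same IP always get the same result.

This approach has a drawback. Suppose your server hosts thousands of static pages and only one of them needs A/B testing: you still have to copy all of the code to every machine.

4: A second IP-matching method

This method takes advantage of nginx's regex matching. Here is the code: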

	location / {
            if ($remote_addr ~ "[02468]$") { 
                rewrite ^(.+)$ /experiment$1 last; 
            }
            rewrite ^(.+)$ /main$1 last;
        }

        location /main {
            internal;
            proxy_pass http://www.reddit.com/r/lisp;
        }

        location /experiment {
            internal;
            proxy_pass http://www.reddit.com/r/haskell;
        }

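A brief explanation of the code: $remote_addr is the user's IP address. The first if matches when the last digit of the IP's fourth octet is 0, 2, 4, 6 or 8, and routes the request to experiment. The ratio can be adjusted as well. For example, to show the experiment page roughly 20% of the time, match two of the ten digits: $remote_addr ~ "[05]$". Matching a single digit, such as "[0]$", gives roughly 10%. A given IP gets the same result on every visit.

This is the method I currently use most.

5: Writing your own FastCGI program

This approach is my own idea. I wrote a FastCGI program called abtest.cgi.

Usage: /abtest.cgi?templist=a.html;b.html;c.html;d.html. The templist parameter lists the different pages to choose from.

I only need to embed this call in the page via SSI, and the user is none the wiser.

Here is the code I wrote: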
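The share of traffic that a last-digit regex captures can be checked by enumerating the 256 possible values of the fourth octet. The following is a small sketch, not part of the original setup: the function names are hypothetical, and it assumes every last-octet value is equally likely, which real traffic will only approximate.

```cpp
#include <string>

// Counts how many of the 256 possible last-octet values (0..255) end in one
// of the given decimal digits, e.g. "02468" for the regex "[02468]$".
int countMatchingOctets(const std::string& digits) {
    int count = 0;
    for (int octet = 0; octet <= 255; ++octet) {
        const char last = static_cast<char>('0' + octet % 10);
        if (digits.find(last) != std::string::npos) {
            ++count;
        }
    }
    return count;
}

// Share of last-octet values matched, assuming each value is equally likely.
double matchingShare(const std::string& digits) {
    return countMatchingOctets(digits) / 256.0;
}
```

Under that assumption, countMatchingOctets("02468") is 128 (exactly 50%), countMatchingOctets("0") is 26 (about 10%), and countMatchingOctets("05") is 52 (about 20%).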

#include "fcgi_server.h"
#include "fcgi_processor.h"
#include "fcgi_stdio.h"
#include <string>
#include <iostream>
#include <fstream>
#include <stdlib.h>
#include <string.h>
#include <vector>
#include <map>
using namespace std;

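/*
[Generic A/B test class] Notes:
This CGI is accessed as:
/abtest?ip=$REMOTE_ADDR&templist=page_a.html;page_b.html;page_c.html
The ip parameter is the user's IP address and templist is the list of templates.
(Note: HandleRequest below takes the IP from GetClientIp(), not from an ip parameter.)

Approach:
Split the user's IP address into four integers and add them up, then take the sum modulo the number of templates to get a template index.
Then read the template at that index and output it.
*/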

#define BLOCK_SIZE 40960

// Splits str on the first character of sep; used below for the IP hash and the template list
vector<string> explode(const char *sep , const char* str){
	string ss(str),block;
	vector<string> parts;
	size_t i,last_i = 0;

	for(i=0;i<ss.size();i++){
		if(ss[i] == sep[0]){
			block = ss.substr(last_i , i - last_i);
			parts.push_back(block);
			last_i = i + 1;
		}
	}
	// The remainder after the last separator is the final piece
	block = ss.substr(last_i);
	parts.push_back(block);

	return parts;
}
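// Converts a dotted IPv4 address to an int (the sum of its four octets) for hashing
int ip2int(const char* str){
	const char *sep = ".";
	vector<string> ips;
	ips = explode(sep , str);
	if(ips.size() != 4){
		// Not a dotted IPv4 address: fall back to 0, i.e. the first template
		return 0;
	}
	return atoi(ips[0].c_str())+atoi(ips[1].c_str())+atoi(ips[2].c_str())+atoi(ips[3].c_str());
}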
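// Reads the contents of a static file for output
string file_get_contents(string filename){
	ifstream input;
	char c;
	string html;

	// Added safeguard: refuse names that could escape the template directory
	if(filename.find("..") != string::npos)
	{
		return "404 File not found";
	}

	filename = "/path/to/your/document/abtest/" + filename;
	input.open(filename.c_str());
	if(input.fail())
	{
		cerr<<" File not found"<<endl;
		return "404 File not found";
	}

	// Loop on get() itself so that no stray character is appended at EOF,
	// and grow a string so the page size is not limited by a fixed buffer
	while(input.get(c))
	{
		html += c;
	}

	input.close();

	return html;
}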
// Picks the template for the given IP and returns its contents
string ip2html(std::string ip_addr , std::string templist){
	int value = ip2int(ip_addr.c_str());
	string mytemp;
	const char *sep = ";";
	vector<string> temps;

	temps = explode(sep , templist.c_str());
	mytemp = temps[value % temps.size()];
	return file_get_contents(mytemp);
}

class MyProcessor : public FcgiProcessor
{
public:
	void HandleRequest(FcgiRequest* pRequest, FcgiResponse* pResponse)
	{
		pResponse->SetContentType("text/html");
		pResponse->PrintHeader();

		std::string myip = pRequest->GetClientIp();
		std::string templist = pRequest->GetParam("templist");

		// Print through "%s" so that any '%' in the page is not treated as a format specifier
		printf("%s" , ip2html(myip , templist).c_str());
		printf("<div style='display:none;'>");
		printf("IP:[%s]\n" , myip.c_str());
		printf("</div>");
	}
};

int main()
{
	MyProcessor processor;
	FcgiServer svr(&processor);
	svr.Run();
	return 0; 
}

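(Please excuse the poor quality of this code. I am just a PHPer; I only studied C++ at university and have forgotten most of it.)

One benefit of this approach is that you can implement many custom features with only small changes to the code above. Of course, this assumes you know C++ and FastCGI.

Seeing that I wrote the FastCGI endpoint in C++, you might ask why I did not simply use PHP. The main reason is that, in my view, PHP is too heavyweight: at around a thousand requests per second, PHP would certainly fall over.

6: Using SSI conditionals

Since conditionals can be used in nginx.conf, could the conditionals in SSI be used to achieve the same thing? That would make the approach more general.

I spent an afternoon yesterday experimenting: it worked under Apache but not under nginx. A Google search suggested that nginx's SSI support is not yet good enough.

Hopefully future versions of nginx will keep improving this area!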
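The selection rule described in the program's comments (add up the four octets, then take the remainder by the number of templates) can be tried on its own, without the FastCGI wrapper. The following is a minimal sketch using only the standard library. The names splitOn, octetSum and pickTemplate are hypothetical, and the input validation is an addition that the original program does not have.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Splits text on a single delimiter character.
std::vector<std::string> splitOn(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::istringstream in(text);
    std::string part;
    while (std::getline(in, part, delim)) {
        parts.push_back(part);
    }
    return parts;
}

// Sums the four octets of a dotted IPv4 address. Returns -1 when the input
// is not four decimal parts, each in the range 0..255.
int octetSum(const std::string& ip) {
    if (ip.empty() || ip.back() == '.') {
        return -1;
    }
    const std::vector<std::string> parts = splitOn(ip, '.');
    if (parts.size() != 4) {
        return -1;
    }
    int sum = 0;
    for (const std::string& p : parts) {
        if (p.empty() || p.size() > 3 ||
            p.find_first_not_of("0123456789") != std::string::npos) {
            return -1;
        }
        const int value = std::atoi(p.c_str());
        if (value > 255) {
            return -1;
        }
        sum += value;
    }
    return sum;
}

// Returns the template chosen for this IP, or "" when either input is unusable.
std::string pickTemplate(const std::string& ip, const std::string& templist) {
    const std::vector<std::string> temps = splitOn(templist, ';');
    const int sum = octetSum(ip);
    if (temps.empty() || sum < 0) {
        return "";
    }
    return temps[sum % temps.size()];
}
```

For example, 192.168.1.10 sums to 371, and 371 % 3 is 2, so pickTemplate("192.168.1.10", "a.html;b.html;c.html") selects c.html. Because only the sum matters, different addresses with the same octet sum always land on the same template.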