Simulating Web Page Behavior: Practice, Part 3

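Now let's look at how to fetch the captcha image. Wherever a captcha appears, there is usually a refresh button next to it, and the code behind that button is what requests the captcha URL. If you have read the earlier post "Simulating Web Page Behavior: Tools" (《模拟网页行为之工具篇》), that code is easy to locate. Once you have found it, look at the screenshot below: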


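It shows that the captcha is fetched from https://ipin.siren24.com/stickyCaptcha. That alone is not enough, though. As covered in the previous part on cookies, the captcha is only valid if the refresh request carries the session cookie. To see where that cookie comes from, look at the next screenshot: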


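The cookie is marked hostOnly. A host-only cookie was set without a Domain attribute, so it is sent back only to the exact host that set it (here ipin.siren24.com), not to its subdomains or parent domain; in practice this means we have to read it for that host's URL. With this packet-capture analysis we now have a basic picture of the captcha flow.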
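To make the host-only rule concrete, here is a minimal sketch of the cookie domain check described in RFC 6265 (sections 5.1.3 and 5.4). The function name `CookieMatchesHost` is hypothetical, and the sketch deliberately ignores the RFC's special handling of IP-address hosts and of a leading dot in the Domain attribute.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-case a host name so the comparison is case-insensitive, as RFC 6265 canonicalizes hosts.
static std::string ToLowerHost(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper: should a cookie stored for cookieDomain be sent to requestHost?
// hostOnly == true  -> only when the request host equals the cookie domain exactly.
// hostOnly == false -> also when the request host is a subdomain of the cookie domain.
// Simplified: IP-address hosts and leading dots in Domain are not handled.
bool CookieMatchesHost(const std::string& cookieDomain, bool hostOnly, const std::string& requestHost)
{
    const std::string domain = ToLowerHost(cookieDomain);
    const std::string host = ToLowerHost(requestHost);
    if (host == domain)
        return true;
    if (hostOnly)
        return false;
    // Domain cookie: the host must end with "." + domain, e.g. www.ipin.siren24.com.
    const std::string suffix = "." + domain;
    return host.size() > suffix.size() &&
           host.compare(host.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

For example, a host-only cookie set by ipin.siren24.com is not sent to www.ipin.siren24.com (`CookieMatchesHost("ipin.siren24.com", true, "www.ipin.siren24.com")` is false), while a cookie with `Domain=ipin.siren24.com` is.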

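The remaining work therefore breaks down into two steps:

1. Get the cookie for https://ipin.siren24.com/stickyCaptcha.

2. Refresh the captcha with that cookie attached and read the image data.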


Reading the host-only cookie in C++ (through WinINet):

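#include <windows.h>
#include <wininet.h>   // InternetGetCookieA
#include <string>
#include <vector>
#pragma comment(lib, "wininet.lib")

std::string CWebLoginDlg::GetCookie( std::string url )
{
	// Start with an empty buffer: passing dwSize == 0 makes WinINet fail with
	// ERROR_INSUFFICIENT_BUFFER and report the required size (bytes, including the '\0').
	std::vector<char> buffer(1, '\0');
	DWORD dwSize = 0;
	while (!InternetGetCookieA(url.c_str(), NULL, buffer.data(), &dwSize))
	{
		if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
		{
			// e.g. ERROR_NO_MORE_ITEMS: no cookie is stored for this URL
			ATLTRACE("cookie is null");
			return std::string();
		}
		// Grow the buffer to the reported size and try again.
		buffer.assign(dwSize + 1, '\0');
	}
	// The cookie data is null-terminated; building the string from it
	// keeps the terminator out of the result.
	return std::string(buffer.data());
}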
The url argument is https://ipin.siren24.com/stickyCaptcha.


If the cookie is HttpOnly, it has to be read differently, because InternetGetCookie does not return HttpOnly cookies. The C++ code:

// Needs <wininet.h> and <vector>; link wininet.lib.
std::wstring CWebLoginDlg::GetCookieEx( std::wstring url )
{
	// InternetGetCookieEx returns HttpOnly cookies only when
	// INTERNET_COOKIE_HTTPONLY (0x00002000) is passed in dwFlags.
	std::vector<wchar_t> buffer(1, L'\0');
	DWORD dwSize = 0;
	while (!InternetGetCookieExW(url.c_str(), L"JSESSIONID", buffer.data(), &dwSize,
		INTERNET_COOKIE_HTTPONLY, NULL))
	{
		if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
		{
			ATLTRACE("cookie is null");
			return std::wstring();
		}
		// Allocate dwSize + 1 wide characters: large enough whether WinINet
		// reports the size in characters or in bytes, and the whole buffer is zeroed.
		buffer.assign(dwSize + 1, L'\0');
	}
	// The data is null-terminated; don't copy the terminator into the string.
	return std::wstring(buffer.data());
}

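To refresh the captcha and download the image data I use the libcurl library. Any page interaction that goes over plain HTTP(S) can be done with curl, but here we only fetch the captcha image. Once the cookie has been obtained in the previous step, format it as shown below and pass it as the cookie argument of the function that fetches the response. In C++: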

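std::string cookie = GetCookie("https://ipin.siren24.com/stickyCaptcha");
// Append domain and path so libcurl knows which requests the cookie belongs to.
// (hostOnly is not a Set-Cookie attribute libcurl recognizes; it skips the token.)
// Building the line with std::string avoids overflowing a fixed char[1024] for long cookies.
std::string nline = cookie + "; domain=ipin.siren24.com; path=/; hostOnly";
std::string ret;   // receives the raw captcha image bytes
m_pCurlClient->GetURLResource("https://ipin.siren24.com/stickyCaptcha", nline, ret);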
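One caveat about this format: GetURLResource below turns the string into a single `Set-Cookie:` line for `CURLOPT_COOKIELIST`, and a Set-Cookie line describes exactly one cookie. If InternetGetCookieA returns several cookies (`a=1; b=2`), libcurl keeps only the first pair as the cookie and treats the rest as unknown attributes, which it skips. The sketch below, using a hypothetical helper named `BuildSetCookieLines`, splits the string into one line per cookie. Each line already carries the `Set-Cookie:` prefix, so it would be passed directly to `curl_easy_setopt(curl, CURLOPT_COOKIELIST, line.c_str())`, once per line, rather than through GetURLResource's own prefixing.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: converts the "name1=value1; name2=value2" string returned by
// InternetGetCookieA into one "Set-Cookie: ..." line per cookie, each with the given
// domain and path, so that every cookie reaches libcurl's cookie engine.
std::vector<std::string> BuildSetCookieLines(const std::string& cookies,
                                             const std::string& domain,
                                             const std::string& path = "/")
{
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    while (start < cookies.size())
    {
        std::string::size_type end = cookies.find(';', start);
        if (end == std::string::npos)
            end = cookies.size();
        std::string pair = cookies.substr(start, end - start);

        // Trim the spaces around each pair and skip empty pieces (e.g. after a trailing ';').
        const std::string::size_type first = pair.find_first_not_of(' ');
        const std::string::size_type last = pair.find_last_not_of(' ');
        if (first != std::string::npos && pair.find('=') != std::string::npos)
        {
            pair = pair.substr(first, last - first + 1);
            lines.push_back("Set-Cookie: " + pair + "; domain=" + domain + "; path=" + path);
        }
        start = end + 1;
    }
    return lines;
}
```

Because the string returned by InternetGetCookieA is already in `Cookie` header form, another option is libcurl's `CURLOPT_COOKIE`, which sends `name1=value1; name2=value2` as-is without going through the cookie engine.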

GetURLResource is implemented as follows:

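struct MemoryStruct {
	char *memory;   // heap buffer with everything received so far (kept null-terminated)
	size_t size;    // number of bytes received, not counting the terminator
};

// libcurl calls this for every chunk of the response body. It is passed to
// CURLOPT_WRITEFUNCTION as a plain function pointer, so it must be declared static in CurlClient.
size_t CurlClient::WriteMemoryCallback(void *contents, size_t size, size_t nmemb, void *userp)
{
	size_t realsize = size * nmemb;
	if (userp == NULL)
	{
		return realsize;
	}
	struct MemoryStruct *mem = (struct MemoryStruct *)userp;

	// Grow through a temporary pointer: if realloc fails, the old block stays valid
	// and is still freed by the caller instead of leaking.
	char *grown = (char*)realloc(mem->memory, mem->size + realsize + 1);
	if(grown == NULL) {
		/* out of memory! */
		printf("not enough memory (realloc returned NULL)\n");
		return 0;   // a count different from realsize makes libcurl abort with CURLE_WRITE_ERROR
	}
	mem->memory = grown;

	memcpy(&(mem->memory[mem->size]), contents, realsize);
	mem->size += realsize;
	mem->memory[mem->size] = 0;

	return realsize;
}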

bool CurlClient::GetURLResource( std::string url, std::string cookie, std::string &rev)
{
	// Only URLs that start with "https" need the SSL options below.
	bool ssl = (url.compare(0, 5, "https") == 0);
	struct MemoryStruct chunk;
	chunk.memory = (char*)malloc(1);
	chunk.size = 0;
	if (chunk.memory == NULL)
		return false;

	// Initialized so the function reports failure if curl_easy_init() fails.
	CURLcode res = CURLE_FAILED_INIT;
	CURL *curl = curl_easy_init();
	if (curl)
	{
		if (!cookie.empty())
		{
			// Hand the cookie to libcurl's cookie engine as a "Set-Cookie:" header line.
			// std::string avoids overflowing a fixed 1024-byte buffer with long cookies.
			std::string line = "Set-Cookie: " + cookie;
			curl_easy_setopt(curl, CURLOPT_COOKIELIST, line.c_str());
		}

		curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
		// In multi-threaded mode, spread threads across the user-agent list.
		size_t agentIndex = m_Multi ? GetCurrentThreadId() % m_UserAgentList.size() : 0;
		curl_easy_setopt(curl, CURLOPT_USERAGENT, m_UserAgentList[agentIndex].c_str());
		if (ssl)
		{
			// Skips certificate checks: convenient for testing, but open to interception.
			curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
			curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
		}
		curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &CurlClient::WriteMemoryCallback);
		curl_easy_setopt(curl, CURLOPT_WRITEDATA, (void*)&chunk);
		// libcurl reads these numeric options as long, hence the L suffix.
		curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
		curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L);
		curl_easy_setopt(curl, CURLOPT_FORBID_REUSE, 1L); // multi-threaded use: close the connection as soon as the transfer ends
		curl_easy_setopt(curl, CURLOPT_TIMEOUT, 30L);
		curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, 15L);
		curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
		res = curl_easy_perform(curl);
		if (res != CURLE_OK)
		{
			m_Error = std::string("curl error: ") + curl_easy_strerror(res);
		}
		if (chunk.memory != NULL)
			rev = std::string(chunk.memory, chunk.size);   // binary-safe copy of the body
		curl_easy_cleanup(curl);
	}
	free(chunk.memory);
	return res == CURLE_OK;
}
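MemoryStruct with malloc/realloc works, but in C++ the same accumulation can be done with a `std::string`, which already handles growth and keeps any `'\0'` bytes in binary image data. The sketch below is an alternative to WriteMemoryCallback, not the code used above; `WriteToString` is a hypothetical name. It would be registered with `CURLOPT_WRITEFUNCTION` and given a `std::string*` through `CURLOPT_WRITEDATA`.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Alternative write callback (sketch): appends every chunk libcurl delivers to the
// std::string passed through CURLOPT_WRITEDATA. The signature matches what
// CURLOPT_WRITEFUNCTION expects: size_t(char*, size_t, size_t, void*).
static std::size_t WriteToString(char* contents, std::size_t size, std::size_t nmemb, void* userp)
{
    const std::size_t realsize = size * nmemb;
    if (userp == nullptr)
        return realsize;   // no destination: discard the data but let the transfer continue

    std::string* out = static_cast<std::string*>(userp);
    try
    {
        // append(ptr, n) copies exactly n bytes, so '\0' bytes in image data are preserved.
        out->append(contents, realsize);
    }
    catch (const std::bad_alloc&)
    {
        // Exceptions must not escape into libcurl's C code. Returning a count different
        // from realsize makes libcurl abort the transfer with CURLE_WRITE_ERROR.
        return 0;
    }
    return realsize;
}
```

With this callback, GetURLResource could call `rev.clear()` and pass `&rev` directly as `CURLOPT_WRITEDATA`, dropping MemoryStruct, malloc and free.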

With the steps above, the captcha image data can be retrieved.
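Before saving or displaying the data, it can be worth checking that the server really returned an image and not an HTML error page. A minimal sketch, assuming the captcha comes back as PNG, JPEG or GIF (the post does not say which format stickyCaptcha returns); `DetectImageExtension` is a hypothetical helper that inspects the file signature ("magic bytes") at the start of the data.

```cpp
#include <string>

// Hypothetical helper: guesses the image format of the downloaded bytes from their
// leading signature. Returns a file extension, or an empty string if the data does
// not start with a PNG, JPEG or GIF signature (for example, an HTML error page).
std::string DetectImageExtension(const std::string& data)
{
    auto startsWith = [&data](const std::string& magic) {
        return data.size() >= magic.size() && data.compare(0, magic.size(), magic) == 0;
    };
    if (startsWith(std::string("\x89PNG\r\n\x1a\n", 8)))   // PNG: 8-byte signature
        return "png";
    if (startsWith(std::string("\xFF\xD8\xFF", 3)))        // JPEG: SOI marker + next marker
        return "jpg";
    if (startsWith("GIF87a") || startsWith("GIF89a"))      // GIF: both versions
        return "gif";
    return "";
}
```

If it returns an extension, the bytes in `ret` can be written unchanged to a file such as captcha.png through a `std::ofstream` opened with `std::ios::binary`; an empty result suggests looking at `m_Error` or the response body instead.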
