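I recently wanted to register a domain name and tried many candidates on Wanwang (万网), but almost all of them were already taken. I had heard that double-pinyin domains (two pinyin syllables joined together) are popular, so I decided to write a script to find out which of them are still unregistered.
I. Query interface
A quick search showed that Wanwang's domain query interface is simple to use. The query URL has this format: http://panda.www.net.cn/cgi-bin/check.cgi?area_domain=aaa.com
Return codes and their meanings: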
210 : Domain name is available
211 : Domain name is not available
212 : Domain name is invalid
214 : Unknown error
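As a small illustration of the codes above, here is a sketch that maps a response body to a status. The names CheckStatus and fromResponse are made up for this example, and it assumes the body carries the code right after an `<original>` tag, which is what the validator later in this post looks for; the full response format is not documented here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original program.
enum CheckStatus {
    AVAILABLE(210), NOT_AVAILABLE(211), INVALID(212), UNKNOWN_ERROR(214), UNRECOGNIZED(-1);

    // Assumption: the body contains "<original>NNN : message".
    private static final Pattern CODE = Pattern.compile("<original>\\s*(\\d{3})");

    final int code;

    CheckStatus(int code) {
        this.code = code;
    }

    // Returns the status for the first code found, or UNRECOGNIZED if none matches.
    static CheckStatus fromResponse(String body) {
        if (body == null) {
            return UNRECOGNIZED;
        }
        Matcher m = CODE.matcher(body);
        if (!m.find()) {
            return UNRECOGNIZED;
        }
        int n = Integer.parseInt(m.group(1));
        for (CheckStatus s : values()) {
            if (s.code == n) {
                return s;
            }
        }
        return UNRECOGNIZED;
    }
}
```

Keeping an explicit UNRECOGNIZED value makes it easy to tell a normal "not available" answer apart from a body that does not look like a query result at all.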
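II. Approach
1. DomainGenerator reads pinyin.txt to get all valid pinyin syllables, then iterates over them to build the double-pinyin domains. I found this pinyin list online, so it may contain mistakes or omissions.
2. Create the domain-checking threads (DomainRunner). Each thread uses Apache HttpClient to call Wanwang's domain query interface.
3. Each thread calls DomainValidator to check the returned result.
4. The ResultRunner thread writes the available domains to domain.txt.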
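Step 1 boils down to pairing every syllable with every syllable, itself included, so n syllables give n * n candidates. A minimal sketch of just that step, with a hypothetical class name and a three-syllable sample list standing in for pinyin.txt:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class DoublePinyinDomains {

    // Builds every ordered pair of syllables and appends the given suffix.
    static List<String> generate(List<String> syllables, String suffix) {
        List<String> domains = new ArrayList<>(syllables.size() * syllables.size());
        for (String first : syllables) {
            for (String second : syllables) {
                domains.add(first + second + suffix);
            }
        }
        return domains;
    }

    public static void main(String[] args) {
        // Sample syllables; the real program reads them from pinyin.txt.
        List<String> sample = Arrays.asList("ai", "bao", "chi");
        for (String domain : generate(sample, ".com")) {
            System.out.println(domain);
        }
    }
}
```

Because the count grows quadratically, a list of a few hundred syllables already yields well over a hundred thousand domains to check, which is why the checking step needs more than one thread.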
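III. Core code
DomainGenerator.java: the entry point. It reads the pinyin list, builds the domains to check, and creates the checking threads and the result-handling thread.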
package com.learnworld;
import java.util.List;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.ArrayList;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
public class DomainGenerator {
public static void main(String[] args){
// pinyin syllables read from pinyin.txt
List<String> items = new ArrayList<String>();
// domains waiting to be checked; offer() silently drops entries once the
// queue is full, so the capacity must be at least items.size() squared
ArrayBlockingQueue<String> taskQueue = new ArrayBlockingQueue<String>(163620);
// available domains waiting to be written to the file
LinkedBlockingQueue<String> resultQueue = new LinkedBlockingQueue<String>();
// counts unavailable domains, used for progress output
AtomicInteger count = new AtomicInteger(0);
// Httpclient initialization
PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
cm.setMaxTotal(20);
cm.setDefaultMaxPerRoute(20);
CloseableHttpClient httpClient = HttpClients.custom().setConnectionManager(cm).build();
try {
// pinyin.txt holds the valid pinyin syllables, one per line
BufferedReader reader = new BufferedReader(new FileReader("pinyin.txt"));
// domain.txt receives the available domains
BufferedWriter writer = new BufferedWriter(new FileWriter("domain.txt"));
String item = null;
while((item = reader.readLine()) != null){
items.add(item);
}
// generate domain list
for (String item1 : items){
for (String item2 : items) {
taskQueue.offer(item1 + item2 + ".com");
}
}
int domainThreadNum = 3;
CountDownLatch downLatch = new CountDownLatch(domainThreadNum);
ExecutorService executor = Executors.newFixedThreadPool(domainThreadNum + 1);
// start domain check thread
for(int i = 0; i < domainThreadNum; i++){
executor.execute(new DomainRunner(taskQueue, resultQueue, downLatch, count, httpClient));
}
// start result handle thread
executor.execute(new ResultRunner(resultQueue, writer));
downLatch.await();
System.out.println("All tasks are done!");
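// TODO: consider a volatile flag to stop ResultRunner instead of an interrupt
executor.shutdownNow();
// wait for ResultRunner to exit before the writer is closed below
executor.awaitTermination(10, java.util.concurrent.TimeUnit.SECONDS);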
reader.close();
writer.close();
httpClient.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
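DomainRunner: the domain-checking thread. It reads domains from domainQueue and calls the query interface to check each one. If a domain is available, it is put into resultQueue to be written to the file.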
package com.learnworld;
import java.io.IOException;
import java.util.Calendar;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.http.HttpEntity;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;
public class DomainRunner implements Runnable {
private ArrayBlockingQueue<String> domainQueue;
private LinkedBlockingQueue<String> resultQueue;
private CountDownLatch downLatch;
private AtomicInteger count;
private CloseableHttpClient httpClient;
public DomainRunner(ArrayBlockingQueue<String> domainQueue,
LinkedBlockingQueue<String> resultQueue, CountDownLatch downLatch,
AtomicInteger count, CloseableHttpClient httpClient) {
super();
this.domainQueue = domainQueue;
this.resultQueue = resultQueue;
this.downLatch = downLatch;
this.count = count;
this.httpClient = httpClient;
}
@Override
public void run() {
String domain = null;
while ((domain = domainQueue.poll()) != null) {
boolean isDomainAvailable = false;
RequestConfig requestConfig = RequestConfig.custom()
.setSocketTimeout(5000)
.setConnectTimeout(5000)
.setConnectionRequestTimeout(5000)
.build();
HttpGet httpGet = new HttpGet("http://panda.www.net.cn/cgi-bin/check.cgi?area_domain=" + domain);
httpGet.setConfig(requestConfig);
httpGet.setHeader("Connection", "close");
HttpContext context = new BasicHttpContext();
CloseableHttpResponse response = null;
try {
response = httpClient.execute(httpGet, context);
HttpEntity entity = response.getEntity();
int status = response.getStatusLine().getStatusCode();
if (status >= 200 && status < 300) {
String resultXml = EntityUtils.toString(entity);
isDomainAvailable = DomainValidator.isAvailableDomainForResponse(resultXml);
EntityUtils.consumeQuietly(entity);
} else {
System.out.println(domain + " check error.");
}
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
httpGet.releaseConnection();
if (response != null) {
response.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
// result handle
if(isDomainAvailable) {
resultQueue.offer(domain);
} else {
int totalInvalid = count.addAndGet(1);
if (totalInvalid % 100 == 0) {
System.out.println(totalInvalid + " " + Calendar.getInstance().getTime());
}
}
}
downLatch.countDown();
}
}
DomainValidator: inspects the response returned by Wanwang and decides whether the domain is available.
package com.learnworld;
public class DomainValidator {
public static boolean isAvailableDomainForResponse(String responseXml){
if(responseXml == null || responseXml.isEmpty()){
return false;
}
if(responseXml.contains("<original>210")){
return true;
} else if(responseXml.contains("<original>211")
|| responseXml.contains("<original>212")
|| responseXml.contains("<original>214")){
return false;
} else {
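// unexpected response (possibly the IP is being blocked): back off for 60 seconds
System.out.println("api callback error!");
try {
Thread.sleep(60000);
} catch (InterruptedException e) {
// restore the interrupt flag so the calling thread can see it
Thread.currentThread().interrupt();
}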
return false;
}
}
}
ResultRunner: the result-handling thread. It writes the available domains to domain.txt.
package com.learnworld;
import java.io.BufferedWriter;
import java.util.concurrent.LinkedBlockingQueue;
public class ResultRunner implements Runnable{
private LinkedBlockingQueue<String> resultQueue;
BufferedWriter writer;
public ResultRunner(LinkedBlockingQueue<String> resultQueue,
BufferedWriter writer) {
super();
this.resultQueue = resultQueue;
this.writer = writer;
}
@Override
public void run() {
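String result = null;
try {
try {
// take() blocks and never returns null, so this loop only ends
// when shutdownNow() interrupts the thread
while ((result = resultQueue.take()) != null) {
writer.write(result);
writer.newLine();
writer.flush();
}
} catch (InterruptedException e) {
// write out whatever is still queued so that no result is lost
while ((result = resultQueue.poll()) != null) {
writer.write(result);
writer.newLine();
}
writer.flush();
}
} catch (Exception e) {
e.printStackTrace();
}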
}
}
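The TODO in DomainGenerator mentions a volatile flag for stopping ResultRunner. Another common option is a "poison pill": a sentinel value put on the queue after the last real result. The sketch below is one possible shape, not the code I actually ran; the class name PoisonPillWriter and the sentinel string are assumptions made for the example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical alternative to ResultRunner, for illustration only.
class PoisonPillWriter implements Runnable {
    // Assumed sentinel; it must never be a real domain name.
    static final String POISON = "__END__";

    private final BlockingQueue<String> queue;
    private final BufferedWriter writer;

    PoisonPillWriter(BlockingQueue<String> queue, BufferedWriter writer) {
        this.queue = queue;
        this.writer = writer;
    }

    @Override
    public void run() {
        try {
            String item;
            // Everything queued ahead of the sentinel is written before the loop ends.
            while (!POISON.equals(item = queue.take())) {
                writer.write(item);
                writer.newLine();
            }
            writer.flush();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            // Preserve the interrupt for whoever owns this thread.
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        StringWriter out = new StringWriter();
        Thread t = new Thread(new PoisonPillWriter(queue, new BufferedWriter(out)));
        t.start();
        queue.put("aibao.com");
        queue.put("baochi.com");
        queue.put(POISON); // producer side: no more results will arrive
        t.join();
        System.out.print(out);
    }
}
```

With this shape the main thread would put the sentinel on resultQueue after downLatch.await() returns and then wait for the writer thread to finish, so no interrupt is needed and nothing queued earlier is skipped.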
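IV. Summary
1. The first version of the program was single-threaded and performed poorly: every 100 domains took about 90 s, mainly because of network I/O latency. After I changed the code to use multiple threads, with two checking threads, every 100 domains took about 30 s.
2. Adding more checking threads speeds things up, but I recommend no more than three, for two reasons:
1) Wanwang uses Alibaba Cloud's filtering technology. If one IP sends a large number of requests within a short period, that IP is added to a block list. I started with 100 threads and was blocked in less than a minute.
2) When the request rate is high, network connections are not released promptly, many TCP connections stay in the TIME_WAIT state, and BindException errors follow.
3. I went through all the double-pinyin domains; about 10,000 of them are currently unregistered, and the results are in the attachment. I also went through all letter-only domains of four characters or fewer, and every one of them is already registered.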
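Given the blocking described in point 2, one way to keep the request rate bounded regardless of thread count is a shared throttle that spaces requests out. This is only a sketch I did not use in the run above: RequestThrottle is a made-up name, and the interval that stays under Wanwang's limit is unknown, so any value passed in is a guess.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: enforces a minimum gap between requests across all threads.
class RequestThrottle {
    private final long minIntervalNanos;
    private long nextSlot;

    RequestThrottle(long minIntervalMillis) {
        this.minIntervalNanos = TimeUnit.MILLISECONDS.toNanos(minIntervalMillis);
        this.nextSlot = System.nanoTime();
    }

    // Reserves the next free slot and returns how long the caller must wait, in nanoseconds.
    synchronized long reserve(long now) {
        // Compare by difference, as recommended for System.nanoTime() values.
        long slot = (nextSlot - now > 0) ? nextSlot : now;
        nextSlot = slot + minIntervalNanos;
        return slot - now;
    }

    // Blocks the calling thread until its reserved slot arrives.
    void acquire() throws InterruptedException {
        long waitNanos = reserve(System.nanoTime());
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
    }
}
```

Each DomainRunner would share one instance and call acquire() right before httpClient.execute(...), so the total request rate stays the same whether there are three threads or thirty.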
If you want to register a double-pinyin domain, you had better hurry!