SpamCheck
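Many services monitor spammers and tell clients whether a given host is a known spammer. These real-time blackhole lists have to answer as quickly as possible and carry a very heavy load, possibly millions of queries.
Answering quickly calls for caching where possible, and the load can be spread across distributed servers. Such a service could be built on a web server, SOAP, UDP, a custom protocol, and so on, but in practice it can be implemented with DNS: reverse the bytes of the IP address, append the blackhole list's domain, and look the resulting name up. If the lookup succeeds, the address is listed as a spammer; if it fails, it is not. For example, 192.0.2.1 is checked by resolving 1.2.0.192.sbl.spamhaus.org.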
import java.net.InetAddress;
import java.net.UnknownHostException;

public class SpamCheck {
    public static final String BLACKHOLE = "sbl.spamhaus.org";

    public static void main(String[] args) {
        for (String arg : args) {
            if (isSpammer(arg)) {
                System.out.println(arg + " is a known spammer.");
            } else {
                System.out.println(arg + " appears legitimate.");
            }
        }
    }

    private static boolean isSpammer(String arg) {
        try {
            InetAddress address = InetAddress.getByName(arg);
            byte[] quad = address.getAddress();
            // Build the query name: address bytes in reverse order, then the list's domain
            String query = BLACKHOLE;
            for (byte octet : quad) {
                // Java bytes are signed, so map negative values back to 128-255
                int unsignedByte = octet < 0 ? octet + 256 : octet;
                query = unsignedByte + "." + query;
            }
            // If the query name resolves, the address is on the list
            InetAddress listing = InetAddress.getByName(query);
            System.out.println(listing.getHostName());
            System.out.println(listing.getHostAddress());
            return true;
        } catch (UnknownHostException e) {
            // Either arg itself does not resolve, or the address is not listed
            return false;
        }
    }
}
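When using this technique, keep track of changes to the blackhole list and to its addresses. Failure modes also need to be considered, for example the server coming under attack and refusing to answer any request.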
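One way to guard against a blackhole server that stops answering is to put a time limit on each check. The following is a minimal sketch under stated assumptions, not part of the original program: `TimedCheck`, `checkWithTimeout`, and the `fallback` parameter are hypothetical names introduced here, and the check itself is passed in as a `Callable` so the helper does not depend on any particular list.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedCheck {
    // Hypothetical helper: runs the check on a worker thread and returns
    // `fallback` if it fails or does not finish within timeoutMillis.
    static boolean checkWithTimeout(Callable<Boolean> check, long timeoutMillis, boolean fallback) {
        // Daemon thread, so a lookup that never returns cannot keep the JVM alive
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "blackhole-check");
            thread.setDaemon(true);
            return thread;
        });
        Future<Boolean> future = executor.submit(check);
        try {
            Boolean result = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return result != null ? result : fallback;
        } catch (TimeoutException | ExecutionException ex) {
            // No answer in time, or the check itself threw
            return fallback;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return fallback;
        } finally {
            future.cancel(true);
            executor.shutdownNow();
        }
    }
}
```

A caller inside SpamCheck might write `TimedCheck.checkWithTimeout(() -> isSpammer(host), 2000, false)`, which treats an unresponsive list as "not listed". Whether to fail open or closed is a policy decision. Creating an executor per call keeps the sketch short; a shared pool would probably be preferable under real load.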
Processing Web Server Logfiles
205.160.186.76 unknown - [17/Jun/2013:22:53:58 -0500] "GET /bgs/greenbg.gif HTTP 1.0" 200 50
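The record above says that a browser at 205.160.186.76 requested the resource /bgs/greenbg.gif, that the request succeeded (status 200), and that the response was 50 bytes long.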
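To make the fields of such a record explicit, here is a small parsing sketch. It assumes each entry sits on a single line laid out as host, identity, user, bracketed timestamp, quoted request, status, and byte count, as in the sample. `LogLine` and its field names are hypothetical and are not used by the programs in this section.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLine {
    // Assumed layout: host ident user [timestamp] "request" status bytes
    private static final Pattern FORMAT = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\d+|-)$");

    final String host;
    final String timestamp;
    final String request;
    final int status;
    final long bytes;

    private LogLine(String host, String timestamp, String request, int status, long bytes) {
        this.host = host;
        this.timestamp = timestamp;
        this.request = request;
        this.status = status;
        this.bytes = bytes;
    }

    // Returns null when the line does not match the assumed layout
    static LogLine parse(String entry) {
        Matcher m = FORMAT.matcher(entry.trim());
        if (!m.matches()) {
            return null;
        }
        // Some servers log "-" instead of a byte count; treat that as 0
        long bytes = "-".equals(m.group(7)) ? 0 : Long.parseLong(m.group(7));
        return new LogLine(m.group(1), m.group(4), m.group(5),
                           Integer.parseInt(m.group(6)), bytes);
    }
}
```

For the sample record, `LogLine.parse(...)` yields host `205.160.186.76`, status `200`, and `bytes` of 50. The programs in this section only need the first field, which is why they split on the first space instead.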
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class Weblog {
    public static void main(String[] args) {
        try (FileInputStream fin = new FileInputStream(args[0]);
             Reader in = new InputStreamReader(fin);
             BufferedReader bin = new BufferedReader(in)) {
            for (String entry = bin.readLine(); entry != null; entry = bin.readLine()) {
                // separate out the IP address
                int index = entry.indexOf(' ');
                if (index < 0) {
                    // no space in the line, so there is nothing to split; pass it through
                    System.err.println(entry);
                    continue;
                }
                String ip = entry.substring(0, index);
                String theRest = entry.substring(index);
                // Ask DNS for the hostname and print it out
                try {
                    InetAddress address = InetAddress.getByName(ip);
                    System.out.println(address.getHostName() + theRest);
                } catch (UnknownHostException ex) {
                    System.err.println(entry);
                }
            }
        } catch (IOException ex) {
            System.out.println("Exception: " + ex);
        }
    }
}
InetAddress caches lookup results, so a repeated IP address does not trigger another DNS query.
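How long results stay cached is controlled by the security properties `networkaddress.cache.ttl` (successful lookups) and `networkaddress.cache.negative.ttl` (failed lookups), both in seconds. The sketch below shows one way to set them; `DnsCacheSettings` and `configure` are hypothetical names, and it assumes the call happens before the first lookup, since the JVM typically reads these values only once.

```java
import java.security.Security;

class DnsCacheSettings {
    // Hypothetical helper: sets the InetAddress cache lifetimes in seconds.
    // Assumption: called early in main, before any InetAddress lookup.
    static void configure(int positiveSeconds, int negativeSeconds) {
        Security.setProperty("networkaddress.cache.ttl",
                             Integer.toString(positiveSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl",
                             Integer.toString(negativeSeconds));
    }
}
```

For a one-off log-processing run, a long positive lifetime is probably harmless, because the addresses in the log were recorded in the past anyway.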
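The program above can be made much faster, though, because it spends most of its time waiting for DNS to respond. Multithreading fits this problem well: one thread reads the log entries and hands each entry to other threads for the lookup. A log file can contain a huge number of entries, however, and starting one thread per entry would quickly bring the VM to its knees, so a thread pool is used instead.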
import java.net.InetAddress;
import java.util.concurrent.Callable;

public class LookupTask implements Callable<String> {
    private final String line;

    public LookupTask(String line) {
        this.line = line;
    }

    @Override
    public String call() {
        try {
            // separate out the IP address
            int index = line.indexOf(' ');
            String address = line.substring(0, index);
            String theRest = line.substring(index); // still starts with the separating space
            String hostname = InetAddress.getByName(address).getHostName();
            return hostname + theRest;
        } catch (Exception ex) {
            // malformed line or failed lookup: fall back to the original line
            return line;
        }
    }
}
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedList;
import java.util.Queue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Requires Java 7 for try-with-resources and multi-catch
public class PooledWeblog {
    private static final int NUM_THREADS = 4;

    public static void main(String[] args) throws IOException {
        ExecutorService executor = Executors.newFixedThreadPool(NUM_THREADS);
        Queue<LogEntry> results = new LinkedList<>();
        try {
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    new FileInputStream(args[0]), StandardCharsets.UTF_8))) {
                for (String entry = in.readLine(); entry != null; entry = in.readLine()) {
                    LookupTask task = new LookupTask(entry);
                    Future<String> future = executor.submit(task);
                    results.add(new LogEntry(entry, future));
                }
            }
            // Start printing the results. This blocks each time a result isn't ready.
            for (LogEntry result : results) {
                try {
                    System.out.println(result.future.get());
                } catch (InterruptedException | ExecutionException ex) {
                    System.out.println(result.original);
                }
            }
        } finally {
            // Shut down even if reading fails, so the pool threads do not keep the JVM alive
            executor.shutdown();
        }
    }

    // Pairs an original log line with the pending result of its lookup
    private static class LogEntry {
        String original;
        Future<String> future;

        LogEntry(String original, Future<String> future) {
            this.original = original;
            this.future = future;
        }
    }
}
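By a not entirely scientific measurement, this version is 10 to 50 times faster than the first one.
The design still has a downside. A log file can be very large, so the queue can grow very large and the program can consume a great deal of memory. One way to avoid this is to move output into its own thread that shares the queue with the input thread, so that entries processed early are printed right away instead of waiting until every entry has been queued. This raises another problem: a separate signal is needed to tell the output thread that it is finished, because an empty queue does not prove that all the work is done. The simplest approach is to count the entries read and stop once the number of entries written matches it.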
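The design just described can be sketched roughly as follows. This is one possible arrangement under stated assumptions, not the original author's code: `StreamingLookup`, `process`, and `Pending` are hypothetical names, the lookup and the output are passed in as functions so the sketch does not depend on DNS, and a bounded `ArrayBlockingQueue` is swapped in for the `LinkedList` so that the reader blocks instead of letting the queue grow.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Function;

class StreamingLookup {
    // Pairs a line with its pending result, like LogEntry in PooledWeblog
    private static final class Pending {
        final String original;
        final Future<String> future;
        Pending(String original, Future<String> future) {
            this.original = original;
            this.future = future;
        }
    }

    // Hypothetical helper: resolves each line on a pool and passes results to sink in input order
    static void process(Iterable<String> lines, Function<String, String> lookup,
                        Consumer<String> sink, int threads, int capacity) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        BlockingQueue<Pending> queue = new ArrayBlockingQueue<>(capacity);
        AtomicLong total = new AtomicLong(-1); // -1 while input is still being read

        Thread writer = new Thread(() -> {
            long written = 0;
            try {
                // Done only when the input count is known and every entry is written
                while (total.get() < 0 || written < total.get()) {
                    Pending next = queue.poll(50, TimeUnit.MILLISECONDS);
                    if (next == null) {
                        continue; // empty queue is not a finish signal; re-check the counts
                    }
                    try {
                        sink.accept(next.future.get());
                    } catch (ExecutionException ex) {
                        sink.accept(next.original); // lookup failed: emit the line unchanged
                    }
                    written++;
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        long read = 0;
        try {
            for (String line : lines) {
                Callable<String> task = () -> lookup.apply(line);
                queue.put(new Pending(line, executor.submit(task))); // blocks while the queue is full
                read++;
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        } finally {
            total.set(read); // publish the final input count to the writer
            executor.shutdown();
        }
        try {
            writer.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With the classes from this section, a caller could pass `line -> new LookupTask(line).call()` as the lookup and `System.out::println` as the sink. Because each submitted task is also placed in the bounded queue, the number of entries held in memory stays near `capacity` regardless of the size of the log file.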