Chapter 4: Internet Addresses - Some Useful Programs

SpamCheck

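Many services monitor spammers and tell clients whether a given host is a known spammer. These real-time blackhole lists have to respond as quickly as possible while handling a very heavy load, potentially millions of queries.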

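The response has to be as fast as possible, ideally served from a cache, and the load can be spread across distributed servers. Such a service could be built on a web server, SOAP, UDP, a custom protocol, and so on. In practice, it can simply be implemented with DNS: to check an address, reverse its bytes, append the domain of the blackhole service, and look up the resulting name. If the name resolves, the address is on the list.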

	import java.net.InetAddress;
	import java.net.UnknownHostException;

	public class SpamCheck {

		public static final String BLACKHOLE = "sbl.spamhaus.org";

		public static void main(String[] args) {
			for (String arg : args) {
				if (isSpammer(arg)) {
					System.out.println(arg + " is a known spammer.");
				} else {
					System.out.println(arg + " appears legitimate.");
				}
			}
		}

		private static boolean isSpammer(String arg) {
			try {
				InetAddress address = InetAddress.getByName(arg);
				// Build the query name: the octets in reverse order, then the
				// blackhole domain. This only makes sense for IPv4 addresses.
				byte[] quad = address.getAddress();
				String query = BLACKHOLE;
				for (byte octet : quad) {
					int unsignedByte = octet & 0xFF; // Java bytes are signed
					query = unsignedByte + "." + query;
				}
				// Look the name up once; any answer means the address is listed.
				InetAddress listing = InetAddress.getByName(query);
				System.out.println(listing.getHostName());
				System.out.println(listing.getHostAddress());
				return true;
			} catch (UnknownHostException e) {
				// Either the host itself is unknown or it is not on the list.
				return false;
			}
		}
	}

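If you use this technique, keep up with changes to the blackhole list and its addresses. Also plan for failure cases, such as the server coming under attack and no longer answering any requests.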
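One way to reduce the risk of a misbehaving list is to accept only answers that look like real listings. The sketch below assumes the list follows the common DNSBL convention of answering listed addresses with A records in 127.0.0.0/8; check the list's own documentation for its actual return codes. The class and method names (`DnsblCheck`, `queryName`, `looksLikeListing`, `isListed`) are illustrative and not part of the original program.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class DnsblCheck {
    // Builds the DNSBL query name: IPv4 octets in reverse order, then the zone.
    static String queryName(byte[] ipv4, String zone) {
        if (ipv4.length != 4) {
            throw new IllegalArgumentException("IPv4 address expected");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = ipv4.length - 1; i >= 0; i--) {
            sb.append(ipv4[i] & 0xFF).append('.');
        }
        return sb.append(zone).toString();
    }

    // ASSUMPTION: a genuine listing is an IPv4 answer inside 127.0.0.0/8.
    static boolean looksLikeListing(byte[] answer) {
        return answer.length == 4 && (answer[0] & 0xFF) == 127;
    }

    // Performs the actual DNS lookups; needs network access.
    static boolean isListed(String host, String zone) {
        try {
            byte[] ip = InetAddress.getByName(host).getAddress();
            if (ip.length != 4) {
                return false; // this sketch does not handle IPv6
            }
            InetAddress answer = InetAddress.getByName(queryName(ip, zone));
            return looksLikeListing(answer.getAddress());
        } catch (UnknownHostException e) {
            return false; // unknown host, or not on the list
        }
    }

    public static void main(String[] args) {
        for (String arg : args) {
            boolean listed = isListed(arg, "sbl.spamhaus.org");
            System.out.println(arg + (listed ? " is listed." : " is not listed."));
        }
    }
}
```

With this check, a resolver that starts returning an ordinary public address for every name no longer causes every host to be treated as a spammer.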


Processing Web Server Logfiles

205.160.186.76 unknown - [17/Jun/2013:22:53:58 -0500] "GET /bgs/greenbg.gif HTTP 1.0" 200 50

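The entry above says that a browser at 205.160.186.76 requested the resource /bgs/greenbg.gif, that the request succeeded (status code 200), and that the resource was 50 bytes long.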
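To make the fields of such an entry concrete, here is a minimal sketch that splits one entry with a regular expression. It assumes the entry sits on a single line and follows the layout of the sample above (address, two identity fields, bracketed timestamp, quoted request, status, size). The class name `LogLineParser` and the pattern are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineParser {
    // address, identity, user, [timestamp], "request", status, size
    private static final Pattern ENTRY = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\]\\s+\"([^\"]*)\" (\\d{3}) (\\d+|-)$");

    // Returns {ip, timestamp, request, status, bytes}, or null if the line
    // does not match the assumed layout.
    static String[] parse(String line) {
        Matcher m = ENTRY.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group(1), m.group(4), m.group(5), m.group(6), m.group(7)
        };
    }

    public static void main(String[] args) {
        String sample = "205.160.186.76 unknown - [17/Jun/2013:22:53:58 -0500] "
            + "\"GET /bgs/greenbg.gif HTTP 1.0\" 200 50";
        String[] f = parse(sample);
        // prints: ip=205.160.186.76 status=200 bytes=50
        System.out.println("ip=" + f[0] + " status=" + f[3] + " bytes=" + f[4]);
    }
}
```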

import java.io.*;
import java.net.*;

public class Weblog {
  public static void main(String[] args) {
    try (FileInputStream fin = new FileInputStream(args[0]);
      Reader in = new InputStreamReader(fin);
      BufferedReader bin = new BufferedReader(in)) {
      for (String entry = bin.readLine();
        entry != null;
        entry = bin.readLine()) {
        // separate out the IP address
        int index = entry.indexOf(' ');
        if (index < 0) {
          // blank or malformed line: pass it through unchanged
          System.err.println(entry);
          continue;
        }
        String ip = entry.substring(0, index);
        String theRest = entry.substring(index);
        // Ask DNS for the hostname and print it out
        try {
          InetAddress address = InetAddress.getByName(ip);
          System.out.println(address.getHostName() + theRest);
        } catch (UnknownHostException ex) {
          System.err.println(entry);
        }
      }
    } catch (IOException ex) {
      System.err.println("Exception: " + ex);
    }
  }
}

InetAddress caches lookup results, so a repeated lookup of the same IP address does not go back to DNS.
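How long results stay in that cache is controlled by two Java security properties: `networkaddress.cache.ttl` for successful lookups and `networkaddress.cache.negative.ttl` for failed ones, both in seconds. The sketch below sets them programmatically. The values 3600 and 10 are arbitrary examples, the class name `DnsCacheSettings` is made up for illustration, and the properties are best set before the first lookup because the JVM may read them only once.

```java
import java.security.Security;

class DnsCacheSettings {
    // Sets example time-to-live values (in seconds) for InetAddress's cache.
    static void apply(int positiveTtl, int negativeTtl) {
        Security.setProperty("networkaddress.cache.ttl",
            Integer.toString(positiveTtl));
        Security.setProperty("networkaddress.cache.negative.ttl",
            Integer.toString(negativeTtl));
    }

    public static void main(String[] args) {
        apply(3600, 10); // assumed values; tune for your own log files
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

For a log file in which the same few addresses appear again and again, a longer positive TTL means most entries can be answered from the cache.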

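The program above can be made much faster, though, because it spends most of its time waiting for DNS to answer. This is a good fit for multithreading: one thread reads the log entries and hands each one to other threads for processing. A log file can contain a very large number of entries, however, and starting one thread per entry would quickly overwhelm the VM, so a thread pool is the right tool here.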

import java.net.InetAddress;
import java.util.concurrent.Callable;

public class LookupTask implements Callable<String> {
  private final String line;
  public LookupTask(String line) {
    this.line = line;
  }
  @Override
  public String call() {
    try {
      // separate out the IP address
      int index = line.indexOf(' ');
      String address = line.substring(0, index);
      // theRest starts with the separating space, so no extra space is needed
      String theRest = line.substring(index);
      String hostname = InetAddress.getByName(address).getHostName();
      return hostname + theRest;
    } catch (Exception ex) {
      // lookup failed or the line is malformed: return it unchanged
      return line;
    }
  }
}

	import java.io.*;
	import java.util.*;
	import java.util.concurrent.*;

	// Requires Java 7 for try-with-resources and multi-catch
	public class PooledWeblog {
	  private final static int NUM_THREADS = 4;

	  public static void main(String[] args) throws IOException {
	    ExecutorService executor = Executors.newFixedThreadPool(NUM_THREADS);
	    Queue<LogEntry> results = new LinkedList<LogEntry>();
	    try (BufferedReader in = new BufferedReader(
	        new InputStreamReader(new FileInputStream(args[0]), "UTF-8"))) {
	      for (String entry = in.readLine(); entry != null; entry = in.readLine()) {
	        LookupTask task = new LookupTask(entry);
	        Future<String> future = executor.submit(task);
	        LogEntry result = new LogEntry(entry, future);
	        results.add(result);
	      }
	    } finally {
	      // No more tasks will be submitted. Lookups already queued still run,
	      // and the pool threads exit afterwards even if reading failed.
	      executor.shutdown();
	    }
	    // Start printing the results. This blocks each time a result isn't ready.
	    for (LogEntry result : results) {
	      try {
	        System.out.println(result.future.get());
	      } catch (InterruptedException | ExecutionException ex) {
	        System.out.println(result.original);
	      }
	    }
	  }

	  private static class LogEntry {
	    String original;
	    Future<String> future;
	    LogEntry(String original, Future<String> future) {
	      this.original = original;
	      this.future = future;
	    }
	  }
	}

By a not entirely scientific measurement, this approach is 10 to 50 times faster than the first one.

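This program still has a design downside. The log file may be very large, so the queue can grow very large and the program can use a great deal of memory. One way to avoid this is to move the output work into a separate thread that shares a queue with the input thread, so that entries processed early can be printed right away instead of waiting until every entry has been queued. That raises another problem: you need a separate signal to tell the output thread that it is finished, because an empty queue does not prove that all the output has been written. The simplest approach is to count the input entries and the output entries and stop when the two counts match.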
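The shared-queue design described above could be sketched as follows. This is one possible arrangement, not the original author's code: the class `StreamingLookup` and its `process` method are hypothetical, the lookup is passed in as a function so the sketch runs without a network, and a bounded `ArrayBlockingQueue` is an added assumption that makes the reader wait when output falls behind. Completion is detected by counting, as the text suggests; a special end-of-input marker placed in the queue is a common alternative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;

class StreamingLookup {
    // Hypothetical helper: runs `lookup` on a pool and hands results to `sink`
    // in input order while input is still being read. Returns the output count.
    static int process(Iterable<String> lines, Function<String, String> lookup,
                       Consumer<String> sink, int threads, int capacity) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Bounded queue: put() blocks the reader when the writer falls behind.
        BlockingQueue<Future<String>> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger queued = new AtomicInteger();
        AtomicInteger written = new AtomicInteger();
        AtomicBoolean inputDone = new AtomicBoolean(false);

        Thread writer = new Thread(() -> {
            try {
                // Done only when input has ended AND the counts match;
                // an empty queue on its own proves nothing.
                while (!(inputDone.get() && written.get() == queued.get())) {
                    Future<String> next = queue.poll(50, TimeUnit.MILLISECONDS);
                    if (next == null) {
                        continue;
                    }
                    try {
                        sink.accept(next.get());
                    } catch (ExecutionException ex) {
                        sink.accept("lookup failed: " + ex.getCause());
                    }
                    written.incrementAndGet();
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        try {
            try {
                for (String line : lines) {
                    Callable<String> task = () -> lookup.apply(line);
                    queue.put(pool.submit(task));
                    queued.incrementAndGet();
                }
            } finally {
                inputDone.set(true); // the input count is now final
                pool.shutdown();     // queued lookups still run to completion
            }
            writer.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // stop waiting, keep the flag
        }
        return written.get();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a 1", "b 2", "c 3");
        // String::toUpperCase stands in for the DNS lookup.
        process(lines, String::toUpperCase, System.out::println, 4, 2);
    }
}
```

Because the queue holds at most `capacity` pending results, memory use stays bounded no matter how long the log file is, and the output order still matches the input order.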













