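When you write a crawler in Java, you can hardly avoid multithreading. Either you need to crawl several sites at once, or you find that one thread per site is simply too slow; a single thread working on one site also tends to leave CPU resources underused. Once you use multiple threads, you also need some way to control them, or a thread pool. When I was building our current crawling framework, I used java.util.concurrent.ExecutorService as the thread pool. The usage looks roughly like this:
The java.util.concurrent.Executors class provides a number of static factory methods for creating thread pools: 1. Fixed-size thread pool: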
package BackStage;

import java.util.concurrent.Executors;
import java.util.concurrent.ExecutorService;

public class JavaThreadPool {
    public static void main(String[] args) {
        // Create a reusable thread pool with a fixed number of threads
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Create objects that implement Runnable (Thread itself implements Runnable)
        Thread t1 = new MyThread();
        Thread t2 = new MyThread();
        Thread t3 = new MyThread();
        Thread t4 = new MyThread();
        Thread t5 = new MyThread();
        // Hand the tasks to the pool; the pool's own worker threads call run()
        pool.execute(t1);
        pool.execute(t2);
        pool.execute(t3);
        pool.execute(t4);
        pool.execute(t5);
        // Shut down the pool once the submitted tasks have finished
        pool.shutdown();
    }
}

class MyThread extends Thread {
    @Override
    public void run() {
        System.out.println(Thread.currentThread().getName() + " is running...");
    }
}
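The fixed-size pool is only one of the factory methods on Executors. As a rough sketch, the snippet below runs the same batch of trivial tasks on three pools created with newFixedThreadPool, newCachedThreadPool and newSingleThreadExecutor. The class name PoolFactories and the helper runAll are made up for this illustration and are not part of the crawling framework.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper class, for illustration only.
class PoolFactories {
    // Submits `tasks` trivial jobs to the pool, shuts the pool down,
    // waits for it to drain, and returns how many jobs actually ran.
    static int runAll(ExecutorService pool, int tasks) {
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown(); // no new tasks; already submitted ones still run
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return done.get();
    }

    public static void main(String[] args) {
        // Fixed number of worker threads
        System.out.println(runAll(Executors.newFixedThreadPool(2), 5));
        // Creates threads on demand and reuses idle ones
        System.out.println(runAll(Executors.newCachedThreadPool(), 5));
        // A single worker thread; tasks run one after another
        System.out.println(runAll(Executors.newSingleThreadExecutor(), 5));
    }
}
```

Which factory fits a crawler depends on the workload; a fixed-size pool is the simplest way to cap the number of concurrent connections.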
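Later I found that ExecutorService did not do as much as I had hoped; for my purposes it offered little more than a container for threads. So I switched to java.lang.ThreadGroup. ThreadGroup has a number of advantages, the most important being that it lets you enumerate its threads, so you can see which threads are still running and, by elimination, which have already finished. The ThreadGroup usage code is as follows: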
class MyThread extends Thread {
    boolean stopped;

    MyThread(ThreadGroup tg, String name) {
        super(tg, name);
        stopped = false;
    }

    @Override
    public void run() {
        System.out.println(Thread.currentThread().getName() + " starting.");
        try {
            for (int i = 1; i < 1000; i++) {
                System.out.print(".");
                Thread.sleep(250);
                // Check the stop flag under the same lock that myStop() uses
                synchronized (this) {
                    if (stopped)
                        break;
                }
            }
        } catch (InterruptedException exc) {
            System.out.println(Thread.currentThread().getName() + " interrupted.");
        }
        System.out.println(Thread.currentThread().getName() + " exiting.");
    }

    // Ask this thread to leave its loop at the next check
    synchronized void myStop() {
        stopped = true;
    }
}

public class Main {
    public static void main(String[] args) throws Exception {
        ThreadGroup tg = new ThreadGroup("My Group");

        MyThread thrd = new MyThread(tg, "MyThread #1");
        MyThread thrd2 = new MyThread(tg, "MyThread #2");
        MyThread thrd3 = new MyThread(tg, "MyThread #3");

        thrd.start();
        thrd2.start();
        thrd3.start();

        Thread.sleep(1000);

        System.out.println(tg.activeCount() + " threads in thread group.");

        // enumerate() returns how many entries it filled in; only read that
        // many, because a thread may have exited since activeCount() was called
        Thread[] thrds = new Thread[tg.activeCount()];
        int count = tg.enumerate(thrds);
        for (int i = 0; i < count; i++)
            System.out.println(thrds[i].getName());

        thrd.myStop();

        Thread.sleep(1000);

        System.out.println(tg.activeCount() + " threads in tg.");
        // Interrupt every thread that is still in the group
        tg.interrupt();
    }
}
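As the code above shows, ThreadGroup has the following advantages over ExecutorService:
1. ThreadGroup can enumerate its threads, so you know which have finished and which are still running.
2. ThreadGroup.activeCount() tells you how many threads are active, which lets you control how many new threads you add.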
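To make the second point concrete, here is a minimal sketch of throttling thread creation with activeCount(). The class GroupThrottle, the method runThrottled, the group name "crawlers" and the 50 ms sleep standing in for a page fetch are all assumptions made for this illustration. Note that the Javadoc describes the value returned by activeCount() as an estimate, so treat the limit as approximate.

```java
// Hypothetical helper class, for illustration only.
class GroupThrottle {
    // Starts `total` short-lived workers in one ThreadGroup, pausing whenever
    // activeCount() reports `limit` or more live threads.
    // Returns the number of workers that were started.
    static int runThrottled(int total, int limit) {
        ThreadGroup group = new ThreadGroup("crawlers");
        Thread[] workers = new Thread[total];
        int started = 0;
        try {
            for (int i = 0; i < total; i++) {
                // Wait until the group has room for another thread
                while (group.activeCount() >= limit) {
                    Thread.sleep(10);
                }
                workers[i] = new Thread(group, () -> {
                    try {
                        Thread.sleep(50); // stand-in for fetching a page
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }, "worker-" + i);
                workers[i].start();
                started++;
            }
            // Wait for the remaining workers to finish
            for (Thread t : workers) {
                if (t != null) {
                    t.join();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return started;
    }

    public static void main(String[] args) {
        System.out.println(runThrottled(6, 2) + " workers started");
    }
}
```

Because only one thread starts the workers here, polling activeCount() is enough to keep the number of live threads near the limit; with several producers you would need additional synchronization.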