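When you write a crawler in Java, you inevitably end up using multiple threads: you need to crawl several sites at once, crawling a site with a single thread is far too slow, and a single thread working on one site also makes poor use of the CPU because it spends most of its time waiting. Once you use multiple threads, you also have to control them, usually with a thread pool. While building our current crawling framework, I used java.util.concurrent.ExecutorService as the thread pool. The code looked roughly like this: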
The java.util.concurrent.Executors class provides many static factory methods for creating thread pools. 1. A fixed-size thread pool:
    package BackStage;

    import java.util.concurrent.Executors;
    import java.util.concurrent.ExecutorService;

    public class JavaThreadPool {
        public static void main(String[] args) {
            // Create a reusable thread pool with a fixed number of threads
            ExecutorService pool = Executors.newFixedThreadPool(2);
            // Create objects that implement Runnable (Thread itself implements Runnable)
            Thread t1 = new MyThread();
            Thread t2 = new MyThread();
            Thread t3 = new MyThread();
            Thread t4 = new MyThread();
            Thread t5 = new MyThread();
            // Hand the tasks to the pool; the pool's own worker threads call run()
            pool.execute(t1);
            pool.execute(t2);
            pool.execute(t3);
            pool.execute(t4);
            pool.execute(t5);
            // Shut down the pool (already submitted tasks still run)
            pool.shutdown();
        }
    }

    class MyThread extends Thread {
        @Override
        public void run() {
            System.out.println(Thread.currentThread().getName() + " is running...");
        }
    }
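Besides newFixedThreadPool, Executors has other factory methods such as newCachedThreadPool and newSingleThreadExecutor. The following is only a minimal sketch of how they can be swapped in; the class name PoolFactories and the helper runTasks are made up for illustration and are not part of the crawling framework mentioned above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper class, for illustration only
public class PoolFactories {
    // Runs `tasks` trivial jobs on the given pool and returns how many completed
    static int runTasks(ExecutorService pool, int tasks) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(done::incrementAndGet);
        }
        // Stop accepting new tasks, then wait for the submitted ones to finish
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            pool.shutdownNow(); // give up on anything still running
        }
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Fixed number of worker threads
        System.out.println(runTasks(Executors.newFixedThreadPool(2), 5));
        // Creates threads on demand and reuses idle ones
        System.out.println(runTasks(Executors.newCachedThreadPool(), 5));
        // Exactly one worker thread; tasks run one after another
        System.out.println(runTasks(Executors.newSingleThreadExecutor(), 5));
    }
}
```

Which factory fits a crawler depends on the workload; a fixed pool is the usual choice when you want a hard cap on concurrent connections.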
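Later I found that ExecutorService did less than I had hoped; for my purposes it was little more than a container for threads. So I switched to java.lang.ThreadGroup. ThreadGroup has a number of advantages, the most important being that it lets you enumerate its threads, so you can tell which threads have finished and which are still running. Here is how I use ThreadGroup: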
    class MyThread extends Thread {
        // Only read and written while holding this object's lock
        boolean stopped;

        MyThread(ThreadGroup tg, String name) {
            super(tg, name);
            stopped = false;
        }

        @Override
        public void run() {
            System.out.println(Thread.currentThread().getName() + " starting.");
            try {
                for (int i = 1; i < 1000; i++) {
                    System.out.print(".");
                    Thread.sleep(250);
                    synchronized (this) {
                        if (stopped)
                            break;
                    }
                }
            } catch (InterruptedException exc) {
                System.out.println(Thread.currentThread().getName() + " interrupted.");
            }
            System.out.println(Thread.currentThread().getName() + " exiting.");
        }

        synchronized void myStop() {
            stopped = true;
        }
    }

    public class Main {
        public static void main(String[] args) throws Exception {
            ThreadGroup tg = new ThreadGroup("My Group");

            MyThread thrd = new MyThread(tg, "MyThread #1");
            MyThread thrd2 = new MyThread(tg, "MyThread #2");
            MyThread thrd3 = new MyThread(tg, "MyThread #3");

            thrd.start();
            thrd2.start();
            thrd3.start();

            Thread.sleep(1000);

            System.out.println(tg.activeCount() + " threads in thread group.");

            // activeCount() is only an estimate, so rely on the count enumerate() returns
            Thread[] thrds = new Thread[tg.activeCount()];
            int count = tg.enumerate(thrds);
            for (int i = 0; i < count; i++)
                System.out.println(thrds[i].getName());

            // Ask the first thread to stop, then give it time to exit
            thrd.myStop();

            Thread.sleep(1000);

            System.out.println(tg.activeCount() + " threads in tg.");
            // Interrupt every thread still running in the group
            tg.interrupt();
        }
    }
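As the code above shows, ThreadGroup has the following advantages over ExecutorService:
1. ThreadGroup can enumerate its threads, so you can tell which threads have finished and which are still running.
2. ThreadGroup.activeCount() gives an estimate of how many threads are currently active, so you can limit how many new threads you start.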
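The second point can be sketched as follows: before starting a new worker, poll activeCount() and wait until the group has room. This is only an illustration under assumptions; the class GroupThrottle, the method runThrottled, the group name "crawlers" and the sleep-based fake workload are all hypothetical and stand in for real crawling code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, for illustration only
public class GroupThrottle {
    // Starts `total` workers in one ThreadGroup, waiting whenever `maxActive`
    // are already running, and returns the highest activeCount() observed.
    static int runThrottled(int total, int maxActive) throws InterruptedException {
        ThreadGroup group = new ThreadGroup("crawlers"); // assumed group name
        List<Thread> started = new ArrayList<>();
        int peak = 0;
        for (int i = 0; i < total; i++) {
            // activeCount() is only an estimate, so poll until there is room
            while (group.activeCount() >= maxActive) {
                Thread.sleep(10);
            }
            Thread worker = new Thread(group, () -> {
                try {
                    Thread.sleep(50); // stands in for fetching one page
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "crawler-" + i);
            worker.start();
            started.add(worker);
            peak = Math.max(peak, group.activeCount());
        }
        // Wait for every worker to finish before reporting
        for (Thread t : started) {
            t.join();
        }
        return peak;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak active threads: " + runThrottled(6, 2));
    }
}
```

Because only one thread starts workers here, the polling loop is enough to keep the group at or below the limit; with several producers you would need extra synchronization around the check and the start.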
Reposted from: http://www.cnblogs.com/jimmy0756/archive/2011/04/18/2019439.html