A Simple Thread Pool Implementation

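     I have been writing a web crawler lately, and wrote the simple thread pool below for it. I am posting it here as a starting point and would welcome any feedback.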

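     Early in the project, while browsing the JavaEye forum, I came across this design hint:

             Threads are passive: someone else (a supervisor) assigns work to them and wakes them up.

             There is a main thread, the "supervisor", which checks whether the task queue holds any tasks. If it does, the supervisor takes one task, hands it to an "idle" thread, and calls notify on that thread. (http://www.iteye.com/topic/104432)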

     I used the same idea here; see the implementation below.
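     Before the real classes, here is a rough sketch of that hand-off idea in isolation. It is not the crawler's code: `HandoffWorker` and `HandoffDemo` are hypothetical names made up for illustration, there is only one worker, and the calling thread plays the supervisor. The worker sleeps in `wait()` until a task is stored, and the supervisor blocks in `assign` until the worker is idle again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical demo: the calling thread acts as the supervisor. */
class HandoffDemo {
    static List<String> runDemo() {
        final List<String> log = Collections.synchronizedList(new ArrayList<String>());
        HandoffWorker worker = new HandoffWorker();
        worker.start();
        try {
            for (int i = 1; i <= 3; i++) {
                final int n = i;
                worker.assign(() -> log.add("task " + n)); // blocks until the worker is idle
            }
            worker.shutdown(); // waits for the last task, then stops the worker
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}

/** Hypothetical passive worker: sleeps until the supervisor hands it a task. */
class HandoffWorker extends Thread {
    private Runnable task;          // guarded by this
    private boolean running = true; // guarded by this

    /** Supervisor side: wait until the worker is idle, store the task, wake the worker. */
    synchronized void assign(Runnable newTask) throws InterruptedException {
        while (task != null) {
            wait();
        }
        task = newTask;
        notifyAll();
    }

    /** Supervisor side: let the current task finish, then ask the worker to exit. */
    synchronized void shutdown() throws InterruptedException {
        while (task != null) {
            wait();
        }
        running = false;
        notifyAll();
    }

    @Override
    public void run() {
        while (true) {
            Runnable current;
            synchronized (this) {
                // Guarded wait: re-check the condition after every wake-up.
                while (running && task == null) {
                    try {
                        wait();
                    } catch (InterruptedException e) {
                        return; // treat interruption as a request to stop
                    }
                }
                if (task == null) {
                    return; // shut down with nothing pending
                }
                current = task;
            }
            current.run(); // run the task without holding the lock
            synchronized (this) {
                task = null; // idle again
                notifyAll(); // wake a supervisor blocked in assign() or shutdown()
            }
        }
    }
}
```

     The sketch puts every `wait()` inside a `while` loop that re-checks its condition, which keeps it correct if a thread wakes up without a matching notify.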

     First, the thread pool class:

    

package jk.spider.core.task.threading;

import jk.spider.core.task.WorkerTask;
import jk.spider.core.task.dispatch.DispatcherTask;

import org.apache.log4j.Logger;

/**
 * Thread pool
 * @author kqy
 * @date 2008-12-30
 * @version 2.0
 */

public class WorkerThreadPool extends ThreadGroup {
	protected static final Logger log = Logger.getLogger(WorkerThreadPool.class);
	protected DispatcherThread dispatcherThread;

	protected WorkerThread[] pool;

	protected int poolSize;

	public WorkerThreadPool(String poolName, String threadName, int poolSize) {
		super(poolName);
		this.poolSize = poolSize;
		// Dispatcher thread (the "supervisor")
		dispatcherThread = new DispatcherThread(this, threadName + " dispatcher");
		pool = new WorkerThread[poolSize];
		for (int i = 0; i < poolSize; i++) {
			pool[i] = new WorkerThread(this, threadName, i);
			synchronized (this) {
				try {
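					// Start the worker, then wait until it signals that it is up;
					// the worker itself then sleeps until a task is assigned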
					pool[i].start(); 
					wait();
				} catch (InterruptedException e) {
					Thread.currentThread().interrupt();
				}
			}
		}
	}

	/**
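	 * Assigns a worker task to the pool, which picks a worker thread to run it.
	 * The pool first looks for an idle worker thread.
	 * If one exists it is woken up; otherwise the caller waits until a worker becomes idle.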
	 * @param task
	 */
	public synchronized void assign(WorkerTask task) {
		while (true) {
			for (int i = 0; i < poolSize; i++) {
				if (pool[i].isAvailable()) {  // can this thread accept a task?
					pool[i].assign(task); // hand over the task and wake the thread
					return;
				}
			}
			try {
				wait();
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			}
		}
	}

	/**
	 * Assigns a dispatcher task to the pool
	 * 
	 * @param task
	 */
	public void assignGroupTask(DispatcherTask task) {
		dispatcherThread.assign(task);
	}

	/**
	 * Shuts down the thread pool
	 * 
	 */
	public void stopAll() {
		for (int i = 0; i < pool.length; i++) {
			WorkerThread thread = pool[i];
			thread.stopRunning();
		}
	}

	public int getSize() {
		return poolSize;
	}
}

 

 

   Next, the worker thread:

   

package jk.spider.core.task.threading;

import jk.spider.core.task.WorkerTask;

import org.apache.log4j.Logger;

/**
 * Worker thread
 * @author kqy
 * @date 2008-12-30
 * @version 2.0
 */
public class WorkerThread extends Thread {
	protected static final Logger log = Logger.getLogger(WorkerThread.class);
	// Idle
	public static final int WORKERTHREAD_IDLE = 0;
	// Blocked
	public static final int WORKERTHREAD_BLOCKED = 1;
	// Busy
	public static final int WORKERHTREAD_BUSY = 2;
	
	protected int state;
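	// Whether a task has been assigned to this thread.
	// volatile: isAvailable() reads it from the dispatching thread without holding this thread's lock
	protected volatile boolean assigned;
	// Whether this thread has been started and is running (volatile for the same reason)
	protected volatile boolean running;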
	
	protected WorkerThreadPool pool;
	
	protected WorkerTask task;
	
	public WorkerThread(WorkerThreadPool pool, String name, int i) {
		super(pool, name + " " + i);
		this.pool = pool;
		running = false;
		assigned = false;
		state = WORKERTHREAD_IDLE;
	}
	
	/**
	 * Checks whether a task can still be assigned to this thread
	 * @return true if the thread is running and has no task assigned
	 */
	public boolean isAvailable() {
		return (!assigned) && running;
	}
	
	public boolean isOccupied() {
		return assigned;
	}
	
	/**
	 * Assigns a new task and marks this thread as not accepting any further tasks
	 * @param task
	 */
	public synchronized void assign(WorkerTask task) {
		if(!running) {
			throw new RuntimeException("THREAD NOT RUNNING, CANNOT ASSIGN TASK !!!");
		}
		if(assigned) {
			throw new RuntimeException("THREAD ALREADY ASSIGNED !!!");
		}
		
		this.task = task;
		assigned = true;
		notify();
	}
	
	public int getStates() {
		return state;
	}
	
	public synchronized void run() {
		running = true;
		log.info("Worker thread ( " + this.getName() + " ) born...");
		
		synchronized(pool) {
			pool.notify(); // wake the pool constructor so it can start the next thread
		}
		
		while(running) {
			if(assigned) {
				state = WORKERTHREAD_BLOCKED;
				task.prepare(); // preparation step
				state = WORKERHTREAD_BUSY;
				try {
					task.execute(); // run the task
				} catch (Exception e) {
					log.fatal("PANIC! Task " + task + " threw an exception!", e);
				}
			
				synchronized(pool) {
					assigned = false;
					task = null;
					state = WORKERTHREAD_IDLE;
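					pool.notify(); // wake a caller of pool.assign() that is waiting for a free worker
					this.notify(); // wake a caller of stopRunning() that is waiting for this task to finish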
				}
			}
			try {
				wait();
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			}
		}
		log.info("Worker thread (" + this.getName() + ") dying");
	}
	
	/**
	 * Stops this worker thread, waiting first if a task is still assigned
	 */
	public synchronized void stopRunning() {
		if( !running ) {
			throw new RuntimeException ("THREAD NOT RUNNING - CANNOT STOP !");
		}
		if ( assigned ) {
            try {
                this.wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        running = false;
        notify();
	}

	
}

 

   We still need the supervisor, which assigns work and wakes the threads. This part is simple: it mainly ends up calling the assign method of WorkerThreadPool.

  

package jk.spider.core.task.threading;

import jk.spider.core.task.dispatch.DispatcherTask;

public class DispatcherThread extends Thread {
	protected DispatcherTask task;
	public DispatcherThread(ThreadGroup group, String name) {
		super(group, name);
	}
	
	public void assign(DispatcherTask task) {
		this.task = task;
		start();
	}
	
	public void run() {
		synchronized(task) {
			task.execute();
			task.notify();
		}
	}
}

 

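   DispatcherTask is an abstract class. In the crawler I designed several task dispatchers, all of which inherit from DispatcherTask. Below it we look at the concrete dispatcher responsible for handing out fetch tasks.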

  

package jk.spider.core.task.dispatch;

import jk.spider.core.SpiderController;
import jk.spider.core.task.Task;
import jk.spider.core.task.WorkerTask;
import jk.spider.core.task.threading.WorkerThreadPool;

public abstract class DispatcherTask implements WorkerTask {
	protected SpiderController controller;
	protected WorkerThreadPool pool;
	protected boolean running;
	
	public DispatcherTask(SpiderController controller, WorkerThreadPool pool) {
		this.controller = controller;
		this.pool = pool;
		this.running = true;
	}
	
	public SpiderController getSpiderController() {
		return this.controller;
	}
	
	public void shutdown() {
		this.running = false;
	}
}

 

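  DispatchSpiderTasks.java is the concrete dispatcher. It hands out tasks and wakes threads, so instead of the pool being passively told to run something, the dispatcher actively triggers the pool's threads.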

 

package jk.spider.core.task.dispatch;

import jk.spider.core.SpiderController;
import jk.spider.core.task.WorkerTask;
import jk.spider.core.task.threading.WorkerThreadPool;

import org.apache.log4j.Logger;

public class DispatchSpiderTasks extends DispatcherTask {
	protected static final Logger log = Logger.getLogger(DispatchSpiderTasks.class);
	protected WorkerThreadPool spiders;

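	public DispatchSpiderTasks(SpiderController controller, WorkerThreadPool spiders) {
		super(controller, spiders);
		// Without this assignment the field stays null and execute() fails with a NullPointerException
		this.spiders = spiders;
	}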

	public int getType() {
		return WorkerTask.WORKERTASK_SPIDERTASK;
	}

	public void prepare() { }

	public void execute() {
		log.info("Spider task dispatcher running ...");
		while(running) {
			try {
				// Take a task from the Scheduler; the queue is a LinkedBlockingQueue (JDK 1.5)
				spiders.assign(controller.getContent().getSpiderTask());
			} catch (InterruptedException e) {
				log.warn("DispatchSpiderTasks InterruptedException -> ", e);
				running = false;
			}
		}
		log.info("Spider task dispatcher dying ...");
	}
}
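  The loop above blocks on the queue and stops once it is interrupted. As a standalone sketch of that pattern, and only under my own assumptions: `QueueDispatcher`, `QueueDispatcherDemo`, the `STOP` sentinel and the example.com URLs are all made up for illustration, and plain strings stand in for the crawler's tasks. The only real API used is `LinkedBlockingQueue.take()`, which blocks while the queue is empty and throws `InterruptedException` if the thread is interrupted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical demo: feed two URLs to the dispatcher, then stop it. */
class QueueDispatcherDemo {
    static List<String> runDemo() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        QueueDispatcher dispatcher = new QueueDispatcher(queue);
        Thread thread = new Thread(dispatcher, "dispatcher");
        thread.start();
        queue.add("http://example.com/a");
        queue.add("http://example.com/b");
        queue.add(QueueDispatcher.STOP); // sentinel, so the demo ends deterministically
        try {
            thread.join(); // after join() the handled list is safe to read
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return dispatcher.getHandled();
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}

/** Hypothetical dispatcher loop: take work from a blocking queue until told to stop. */
class QueueDispatcher implements Runnable {
    static final String STOP = "<stop>"; // made-up sentinel value

    private final BlockingQueue<String> queue;
    private final List<String> handled = new ArrayList<>(); // written only by the dispatcher thread

    QueueDispatcher(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    List<String> getHandled() {
        return handled;
    }

    @Override
    public void run() {
        try {
            while (true) {
                String url = queue.take(); // blocks while the queue is empty
                if (STOP.equals(url)) {
                    break;
                }
                handled.add(url); // a real dispatcher would pass this on to the pool
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and leave the loop
            Thread.currentThread().interrupt();
        }
    }
}
```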

 

  Task.java is self-explanatory, so I will not go into detail.

 

package jk.spider.core.task;

/**
 * Common interface for spider tasks; tasks are added to the Scheduler,
 * and the thread pool takes them from the Scheduler
 * @author kqy
 *
 */


public interface Task {
	

	/**
	 * Runs the task; the thread pool calls this method to execute it
	 */
	public void execute();
}

 

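       That was a long write-up. JDK 1.5 and later already ship a thread pool implementation that is more convenient to use, but to get a firmer grasp of threads I consulted some references and implemented this simple pool myself. I learned a lot, and wait(), notify() and notifyAll() no longer make my head spin.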
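       For comparison, here is a minimal sketch of the JDK route using `java.util.concurrent` from JDK 1.5 and later. It is not part of the crawler: `ExecutorPoolDemo` and the squaring tasks are just an illustration, and the lambda syntax assumes Java 8 or newer. `Executors.newFixedThreadPool` creates the workers, `submit` queues a task, and `Future.get` waits for its result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical demo: sum the squares of 1..10 on a fixed pool of four threads. */
class ExecutorPoolDemo {
    static int runDemo() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> futures = new ArrayList<>();
        try {
            for (int i = 1; i <= 10; i++) {
                final int n = i;
                futures.add(pool.submit(() -> n * n)); // queued, then run by an idle worker
            }
            int sum = 0;
            for (Future<Integer> future : futures) {
                sum += future.get(); // blocks until that task has finished
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // stop accepting tasks and let the workers exit
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints 385
    }
}
```

       The executor keeps its own task queue and idle-worker bookkeeping, which is the part the hand-written pool does with wait() and notify().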

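       Practice really is the best teacher...

       After a while the crawler was finished. It was improved gradually along the way and now has the extensibility and configurability a crawler needs.

       Looking back on it now is another round of learning: I can see that many of the design choices were rather naive.

       I still have a long way to go and will keep working at it...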

 
