Developing a Simple Web Crawler (4) (C# Version)

 

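The AbsThreadManager class:

AbsThreadManager's main job is to start and manage the WorkThread worker threads and the monitor thread. Each WorkThread object corresponds one-to-one to a Thread object, and the two are wrapped together in an ObjThread object. Here is the ObjThread source first: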

namespace WebSpider
{
    internal class ObjThread
    {
        private WorkThread _workThread;

        private System.Threading.Thread _thread;

        internal WorkThread WorkThread { get { return _workThread; } set { _workThread = value; } }

        internal System.Threading.Thread Thread { get { return _thread; } set { _thread = value; } }

    }
}

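The ObjThread class is very simple: it holds just one Thread object and one WorkThread object.

AbsThreadManager uses a List<ObjThread> to keep track of the Thread objects and their WorkThread objects. It also adds a monitor thread that checks the state of the worker threads; if a worker thread has died, the monitor thread starts a new one in its place. The source is as follows: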

using System;
using System.Collections.Generic;

namespace WebSpider
{
    public abstract class AbsThreadManager
    {
        public int _maxThread = Convert.ToInt32(System.Configuration.ConfigurationManager.AppSettings["MaxCount"]);

        internal List<ObjThread> list = new List<ObjThread>();

        // volatile: the flag is written by Stop() and read by the monitor thread.
        private volatile bool _isRun = false;

        private System.Threading.Thread _watchThread = null;

        /// <summary>
        /// Current depth (the number of URLs currently held in UrlStack).
        /// </summary>
        public int Current { get { return UrlStack.Instance.Count; } }

        /// <summary>
        /// Starts the service.
        /// </summary>
        /// <param name="url">Seed URL</param>
        public void Start(string url)
        {
            UrlStack.Instance.Push(url);

            _isRun = true;

            for (int i = 0; i < _maxThread; i++)
            {
                AddObjThread();
            }
            _watchThread = new System.Threading.Thread(Watch);
            _watchThread.Start();
        }


        private void AddObjThread()
        {
            ObjThread thread = new ObjThread();
            thread.WorkThread = new WorkThread();
            thread.WorkThread.ChainMain.SetProcessHandler(GetChainHeader());
            thread.Thread = new System.Threading.Thread(thread.WorkThread.Start);
            list.Add(thread);
            thread.Thread.Start();
        }

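        /// <summary>
        /// Stops the service.
        /// </summary>
        public void Stop()
        {
            _isRun = false;
            // The monitor thread may sleep for up to 5 seconds before it sees the flag.
            // The null check covers a Stop() call that was not preceded by Start().
            if (_watchThread != null)
            {
                _watchThread.Join();
            }
            foreach (ObjThread obj in list)
            {
                obj.WorkThread.Stop();
                // Thread.Abort is a forced stop. It works on .NET Framework but
                // throws PlatformNotSupportedException on .NET Core / .NET 5+.
                obj.Thread.Abort();
                obj.Thread.Join();
            }
            list.Clear();
        }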

        /// <summary>
        /// Supplies the head node of the chain of responsibility.
        /// </summary>
        /// <returns>The user-defined chain</returns>
        protected abstract AbsChain GetChainHeader();

        internal void Watch()
        {
            while (_isRun)
            {
                try
                {
                    // Keep only the workers that are still running.
                    list.RemoveAll(t => !(t.WorkThread.IsRun && t.Thread.IsAlive));

                    // Start replacements until the pool is back to _maxThread.
                    int newCount = _maxThread - list.Count;

                    for (int i = 0; i < newCount; i++)
                    {
                        AddObjThread();
                    }
                }
                catch (System.Exception ex)
                {
                    // Record the failure instead of swallowing it silently, then keep monitoring.
                    System.Diagnostics.Trace.TraceError(ex.ToString());
                }

                // Sleep outside the try block so a repeated failure cannot become a busy loop.
                System.Threading.Thread.Sleep(5 * 1000);
            }
        }
    }
}
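
The monitor-and-restart idea can also be tried in isolation, without the rest of the assembly. The following is only a minimal sketch under stated assumptions: WatchdogSketch, RunWatchdog, StartWorker and all of their parameters are hypothetical names invented for this illustration and are not part of WebSpider. The workers simply sleep for a moment and exit, which stands in for a worker thread that has died.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical, self-contained miniature of the Watch() loop above.
public static class WatchdogSketch
{
    // Starts `count` workers, then checks them every `pollMs` milliseconds
    // until `keepRunning` returns false. Returns how many workers were restarted.
    public static int RunWatchdog(int count, ThreadStart work, Func<bool> keepRunning, int pollMs)
    {
        List<Thread> pool = new List<Thread>();
        int restarts = 0;

        for (int i = 0; i < count; i++)
        {
            pool.Add(StartWorker(work));
        }

        while (keepRunning())
        {
            // Drop every worker whose thread has ended.
            pool.RemoveAll(t => !t.IsAlive);

            // Refill the pool to its configured size.
            while (pool.Count < count)
            {
                pool.Add(StartWorker(work));
                restarts++;
            }

            Thread.Sleep(pollMs);
        }

        // Wait for the remaining workers to finish before returning.
        foreach (Thread t in pool)
        {
            t.Join();
        }
        return restarts;
    }

    private static Thread StartWorker(ThreadStart work)
    {
        Thread t = new Thread(work);
        t.IsBackground = true; // do not keep the process alive because of a worker
        t.Start();
        return t;
    }

    public static void Main()
    {
        DateTime deadline = DateTime.UtcNow.AddMilliseconds(300);

        // Each worker exits after about 10 ms, simulating a dead worker thread.
        int restarts = RunWatchdog(2, () => Thread.Sleep(10), () => DateTime.UtcNow < deadline, 50);

        Console.WriteLine("restarts: " + restarts);
    }
}
```

The exact restart count depends on thread timing, so it will vary from run to run. As in AbsThreadManager, the pool list is only touched by the monitoring loop while it is running, which is why the sketch does not need a lock.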

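This class has only one abstract method, protected abstract AbsChain GetChainHeader(). Users override GetChainHeader to return an object of a class derived from AbsChain, and that object is then set as the _handler of ChainMain.

At this point, all the classes in the Spider assembly have been covered. The next article will use this assembly to build a complete spider program.

To be continued...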
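
As a rough illustration of that override, here is a minimal sketch that compiles on its own. Everything in it is an assumption made for the example: the AbsChain shown is a bare stand-in (the real class from the earlier articles has its own members), AbsThreadManagerSketch keeps only the abstract hook, and PrintTitleChain, MySpiderManager, CreateHeaderForWorker and the Name property are hypothetical names.

```csharp
using System;

namespace WebSpiderSketch
{
    // Hypothetical stand-in for AbsChain; the real base class is not reproduced here.
    public abstract class AbsChain
    {
        public abstract string Name { get; }
    }

    // Hypothetical chain node written by the user of the assembly.
    public class PrintTitleChain : AbsChain
    {
        public override string Name { get { return "PrintTitleChain"; } }
    }

    // Reduced stand-in for AbsThreadManager that keeps only the abstract hook.
    public abstract class AbsThreadManagerSketch
    {
        protected abstract AbsChain GetChainHeader();

        // Plays the role of AddObjThread: asks the subclass for the head of the chain.
        public AbsChain CreateHeaderForWorker()
        {
            return GetChainHeader();
        }
    }

    public class MySpiderManager : AbsThreadManagerSketch
    {
        protected override AbsChain GetChainHeader()
        {
            // A new instance per call, so each worker gets its own chain object.
            return new PrintTitleChain();
        }
    }

    public static class Program
    {
        public static void Main()
        {
            AbsThreadManagerSketch manager = new MySpiderManager();
            Console.WriteLine(manager.CreateHeaderForWorker().Name); // prints "PrintTitleChain"
        }
    }
}
```

With the real assembly, the subclass would presumably derive from AbsThreadManager itself, and the caller would then use Start(url) with a seed URL and Stop() as shown in the listing above.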
