XFJ 2013-11-5

1. Overview

。Recorded: 2013-11-5

。Server: XFJ

。Symptom:

A bill was never delivered, yet the sender's status showed "delivered".


2. Analysis and handling

The following entries were found in that day's log:

[2013-11-05 00:00:56:942][thread 2484][1][20][0][]CBBoxPlugin::HandleInput_i processing message 5:809(source:1 10146,dest:1 10170)...
[2013-11-05 00:00:56:942][thread 2484][5][20][0][]CAPBase::GetOrg orgid=10170,ret=-30974,error:DB_RUNRECOVERY: Fatal error, run database recovery.
[2013-11-05 00:00:56:942][thread 2484][5][20][0][]CAPBase::GetOrg orgid=10170,ret=-30974,error:DB_RUNRECOVERY: Fatal error, run database recovery.
[2013-11-05 00:00:56:942][thread 2484][5][20][0][]CBBoxPlugin::HandleInput_i message 5:809(source:1 10146,dest:1 10170) transmission loop detected, message discarded.
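When a loop is detected, the local node discards the message but still acknowledges it at the transport layer, so the sender sees "delivered" while the receiver has no data. In the 11-05 log, every GetOrg call made from HandleInput_i failed from the very start of the log (the log was not checked exhaustively, but no exception was found) until the server was restarted at 15:05:40, after which everything worked normally.
This shows that the DB_RUNRECOVERY error above does not clear by itself.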
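In the log, a failed GetOrg is immediately followed by the loop verdict, which suggests the handler does not distinguish a database error from a destination it cannot resolve. A minimal sketch, assuming the handler can see the raw return code of the lookup, of keeping the two cases apart so that a broken store never leads to a transport-level acknowledgement. LookupResult, Disposition, classify, decide and kNotFound are hypothetical names; only the -30974 value comes from the log above.

```cpp
#include <cassert>

// Value reported for DB_RUNRECOVERY in the log above (ret=-30974); real code
// should compare against the DB_RUNRECOVERY macro from db.h.
constexpr int kRunRecovery = -30974;
// Placeholder for the store's "key not found" code (DB_NOTFOUND in db.h);
// the value is arbitrary and only used by this sketch.
constexpr int kNotFound = -1;

enum class LookupResult { Found, NotFound, StoreError };

enum class Disposition {
    Deliver,      // destination org resolved: process the message normally
    DropAndAck,   // treated as a loop: discard and ack (the behaviour in the log)
    RejectNoAck   // store failure: no transport ack, sender keeps it pending
};

// Map a raw lookup return code onto the three cases.
LookupResult classify(int ret) {
    if (ret == 0) return LookupResult::Found;
    if (ret == kNotFound) return LookupResult::NotFound;
    return LookupResult::StoreError;   // e.g. kRunRecovery
}

// Only a genuine "not found" may take the drop-and-acknowledge path.
Disposition decide(LookupResult r) {
    switch (r) {
    case LookupResult::Found:      return Disposition::Deliver;
    case LookupResult::NotFound:   return Disposition::DropAndAck;
    case LookupResult::StoreError: return Disposition::RejectNoAck;
    }
    return Disposition::RejectNoAck;   // unreachable; keeps compilers quiet
}
```

With this split, a DB_RUNRECOVERY lookup leaves the message unacknowledged, so the sender's status would not change to "delivered".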


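DB_RUNRECOVERY is the only clue.
? How did it arise? Was it caused by concurrency?
? Can it be reproduced?
? How can the system recover from this error?

The problem does not occur every time; it has appeared only twice, both times on the XFJ system.
What is different about XFJ compared with the other systems?
---Environment: the OS, service pack and library files were checked; nothing unusual was found.
---Which modules are new or modified?
Because the fault is intermittent, the fact that it has not appeared on other servers does not rule out a general cause.
A concurrency test was therefore run (the test code is in section 3). The same error did not occur, so concurrency is tentatively ruled out as the cause.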


The flags passed to DbEnv::open are: DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL | DB_THREAD

About DB_RUNRECOVERY:
http://www-rohan.sdsu.edu/doc/BerkeleyDB/ref/program/errorret.html#DB_RUNRECOVERY
There exists a class of errors that Berkeley DB considers fatal to an entire Berkeley DB environment. An example of this type of error is a corrupted database page. The only way to recover from these failures is to have all threads of control exit the Berkeley DB environment, run recovery of the environment, and re-enter Berkeley DB. (It is not strictly necessary that the processes exit, although that is the only way to recover system resources, such as file descriptors and memory, allocated by Berkeley DB.)

When this type of error is encountered, the error value DB_RUNRECOVERY is returned. This error can be returned by any Berkeley DB interface. Once DB_RUNRECOVERY is returned by any interface, it will be returned from all subsequent Berkeley DB calls made by any threads of control participating in the environment.
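The quoted text says that once DB_RUNRECOVERY has been returned, every further call in the environment fails, which matches the log: GetOrg failed from the start of the 11-05 log until the 15:05:40 restart. A minimal sketch of latching that condition so it is noticed at once, assuming every Berkeley DB return code can be passed through one helper. EnvHealth and kRunRecovery are hypothetical names, and -30974 is simply the value printed in the log (real code should use the DB_RUNRECOVERY macro from db.h).

```cpp
#include <atomic>
#include <cassert>

// Value of DB_RUNRECOVERY as printed in the log (ret=-30974).
constexpr int kRunRecovery = -30974;

// Hypothetical process-wide latch. Because every later call in the environment
// will also fail, the first DB_RUNRECOVERY is recorded once and never cleared.
class EnvHealth {
public:
    // Pass the return code of each Berkeley DB call through here; ret is returned unchanged.
    int check(int ret) {
        if (ret == kRunRecovery) fatal_.store(true, std::memory_order_relaxed);
        return ret;
    }
    // True once all threads must leave the environment, recovery must be run
    // and the environment reopened (in practice: restart the process).
    bool needs_restart() const { return fatal_.load(std::memory_order_relaxed); }
private:
    std::atomic<bool> fatal_{false};
};
```

A watchdog could poll needs_restart() and raise an alarm or perform a controlled restart, instead of letting the server run in this state for hours.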

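Several modules on this system differ from those on the other servers:
。lm: to support this project, when the 2800-Indication protocol handler fires the BILL_ACCEPT event, a member was added to the event object (of type CBillAcceptEvent):
    CMsg *data_; ///< bill message
    When the event is fired, a reference to the message is added:
    e->data_ = msg->Duplicate();
    but no destructor was implemented, so the message object is never destroyed.
。sheet_catcher: bill capture module
  Intercepts bill messages that match configured conditions and can process them; supports Lua scripts.
。dxi_ddd: database-to-database mapping interface module

RegularString in sheet_catcher leaks memory, but it runs only at startup, so the leak does not keep growing.
No logic defect that could leak memory was found in dxi_ddd.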

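The lm code that leaks memory:
    CBillAcceptEvent *e = new CBillAcceptEvent;
    e->orgid_ = dest_id_; ///< receiving org ID
    e->pub_orgid_ = sheet->GetSheetTypeInfo()->pub_orgid();
    e->sheet_type_ = sheet->GetSheetTypeInfo()->id();
    e->src_orgid_ = sheet->GetSrcOrgID();
    e->sheet_id_ = sheet->GetSheetID();
    e->data_ = msg->Duplicate();
    if (event_controller_->pulse(BILL_ACCEPT,e)) {
        // ... (rest of the handler not shown)

Line 7 (e->data_ = msg->Duplicate();): the reference added by Duplicate() is never Released when the CBillAcceptEvent is destroyed.

Fix:
Add a destructor (in lm_event.h):
    ~CBillAcceptEvent() {
        // Release the reference taken by msg->Duplicate() when the event was fired.
        // data_ must be initialised to 0 in the constructor for this check to be safe.
        if (data_) data_->Release();
    }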
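To see why the missing destructor leaks and why the added one fixes it, here is a self-contained sketch of the reference-counting contract. It assumes CMsg::Duplicate() increments a reference count and returns the same object, and that Release() deletes the object when the count reaches zero; Msg and BillAcceptEvent are hypothetical stand-ins, not the real lm classes. Two details go beyond the fix above: data_ is initialised to null so the destructor is safe when no message was attached, and copying is disabled so the same reference is never released twice.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for CMsg with intrusive reference counting.
class Msg {
public:
    static std::atomic<int> live;          // number of Msg objects not yet destroyed
    Msg() { ++live; }                      // created with one reference
    Msg *Duplicate() { refs_.fetch_add(1); return this; }
    void Release() { if (refs_.fetch_sub(1) == 1) delete this; }
private:
    ~Msg() { --live; }                     // only Release() may destroy a Msg
    std::atomic<int> refs_{1};
};
std::atomic<int> Msg::live{0};

// Hypothetical stand-in for CBillAcceptEvent after the fix.
struct BillAcceptEvent {
    Msg *data_ = nullptr;                  // null unless a message was attached
    BillAcceptEvent() = default;
    // A copied event would Release() the same reference twice, so forbid copies.
    BillAcceptEvent(const BillAcceptEvent &) = delete;
    BillAcceptEvent &operator=(const BillAcceptEvent &) = delete;
    // Drop the reference taken with Duplicate() when the event is destroyed.
    ~BillAcceptEvent() { if (data_) data_->Release(); }
};
```

Without the destructor, the reference count never returns to zero after the event is deleted, so every BILL_ACCEPT leaks one message.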

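3. Concurrency test code

First, multithreaded add, update, query and delete operations were run concurrently against org_db; no problem appeared after continuous running. Operations on user_db were then added, and again no problem was found.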

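#include <cstdint>   // intptr_t
#include <cstdlib>   // rand, srand
#include <windows.h> // GetTickCount, GetCurrentThreadId, Sleep
#include <ace/Thread_Manager.h>
// plus the project headers declaring CAPPlugin, ORGINFO, USERINFO, IORGINFO and IUSERINFO

ACE_Thread_Manager g_thr_mgr;

// Test worker; arg selects the operation: 1 = add, 2 = query, 3 = update, 4 = delete.
// Each worker loops forever, sleeping 100 ms after every 5 passes.
ACE_THR_FUNC_RETURN test1_proc(void *arg) {
	// Casting void* straight to short does not compile on 64-bit builds; go through intptr_t.
	const int flag = static_cast<int>(reinterpret_cast<intptr_t>(arg));
	const unsigned int org_num = 100;

	// Seed once per thread (the MSVC CRT keeps rand() state per thread). Reseeding
	// with GetTickCount() on every pass gives threads in the same tick identical ids.
	srand(GetTickCount() ^ GetCurrentThreadId());

	int k = 0;
	do {
		CAPPlugin *ap = CAPPlugin::instance();
		switch (flag) {
		case 1: ///< add orgs 1..org_num and one user per org
			for (unsigned int i = 0; i < org_num; i++) {
				ORGINFO *org = dynamic_cast<ORGINFO*>(ap->GetOrg(i+1));
				if (org) { // org already exists
					org->Release();
					continue;
				}
				org = new ORGINFO;
				org->org_id_ = i+1;
				if (ap->AddOrg(org)) {
				//	ap->nlogger_->log(LO_STDOUT|LO_FILE,SEVERITY_ERROR,"AddOrg failed.\n");
				}
				USERINFO *user = dynamic_cast<USERINFO*>(ap->GetUser(org->org_id_*10000+i));
				if (user) { // user already exists
					user->Release();
					continue;
				}
				user = new USERINFO;
				user->user_serial_ = org->org_id_*10000+i;
				ap->AddUser(user);
			}
			break;
		case 2: ///< query a random org (ids 0..199 against orgs 1..100, so many lookups miss) and one of its users
			{
				unsigned long orgid = rand()%200;
				IORGINFO *org = ap->GetOrg(orgid);
				if (org == 0) {
				//	ap->nlogger_->log(LO_STDOUT|LO_FILE,SEVERITY_ERROR,"GetOrg: org %lu not found.\n",orgid);
				}
				else {
					org->Release();
					unsigned long uid = rand()%10000;
					unsigned long userserial = orgid*10000+uid;
					IUSERINFO *user = ap->GetUser(userserial);
					if (user)
						user->Release();
				}
			}
			break;
		case 3: ///< update a random org and one of its users
			{
				unsigned long orgid = rand()%200;
				IORGINFO *org = ap->GetOrg(orgid);
				if (org) {
					ORGINFO *o = dynamic_cast<ORGINFO*>(org);
					if (o) // never pass a null pointer to UpdateOrg
						ap->UpdateOrg(o);
					org->Release();
					unsigned long uid = rand()%10000;
					unsigned long userserial = orgid*10000+uid;
					IUSERINFO *user = ap->GetUser(userserial);
					if (user) {
						ap->UpdateUser(user);
						user->Release();
					}
				}
			}
			break;
		case 4: ///< delete a random org and one of its users
			{
				unsigned long orgid = rand()%200;
				ap->RemoveOrg(orgid);

				unsigned long uid = rand()%10000;
				unsigned long userserial = orgid*10000+uid;
				ap->RemoveUserBySerial(userserial);
			}
			break;
		}
		if (++k == 5) { // throttle: sleep 100 ms every 5 passes
			Sleep(100);
			k = 0;
		}
	} while (1);

	return 0; // not reached
}

void CAPPlugin::test() {
	///< number of threads for add, query, update and delete, respectively
	const int thr_num[] = {30,30,10,10};
	for (size_t i = 0; i < sizeof(thr_num)/sizeof(thr_num[0]); i++) {
		// the operation code (1..4) is passed to test1_proc as its argument
		g_thr_mgr.spawn_n(thr_num[i], test1_proc, reinterpret_cast<void*>(static_cast<intptr_t>(i+1)));
	}
}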
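The harness in section 3 depends on ACE and the Windows API, runs forever and has no pass/fail result. A portable sketch of the same thread mix (30 add, 30 query, 10 update, 10 delete) using std::thread, a private std::mt19937 per thread and a bounded number of iterations is shown below. OrgStore, worker and run_stress are hypothetical: OrgStore is a mutex-protected std::map standing in for CAPPlugin, so this shows the shape of a bounded stress test rather than exercising Berkeley DB itself.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Hypothetical in-memory stand-in for the org store (the real test drives CAPPlugin).
class OrgStore {
public:
    bool add(unsigned long id) { std::lock_guard<std::mutex> g(m_); return orgs_.emplace(id, 0).second; }
    bool get(unsigned long id) { std::lock_guard<std::mutex> g(m_); return orgs_.count(id) != 0; }
    bool update(unsigned long id) {
        std::lock_guard<std::mutex> g(m_);
        auto it = orgs_.find(id);
        if (it == orgs_.end()) return false;
        ++it->second;                      // "modify" = bump a version counter
        return true;
    }
    bool remove(unsigned long id) { std::lock_guard<std::mutex> g(m_); return orgs_.erase(id) != 0; }
    std::size_t size() { std::lock_guard<std::mutex> g(m_); return orgs_.size(); }
private:
    std::mutex m_;
    std::map<unsigned long, int> orgs_;
};

enum Op { kAdd = 1, kQuery, kUpdate, kRemove };

// One worker: a fixed number of passes and a private RNG seeded once per thread.
void worker(OrgStore &store, Op op, unsigned seed, int iterations) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<unsigned long> pick(0, 199);  // orgs 1..100 exist at most
    for (int n = 0; n < iterations; ++n) {
        unsigned long id = pick(rng);
        switch (op) {
        case kAdd:    for (unsigned long i = 1; i <= 100; ++i) store.add(i); break;
        case kQuery:  store.get(id);    break;
        case kUpdate: store.update(id); break;
        case kRemove: store.remove(id); break;
        }
    }
}

// Same thread mix as CAPPlugin::test(), but every thread finishes and is joined.
void run_stress(OrgStore &store, int iterations) {
    const int thr_num[] = {30, 30, 10, 10};
    std::vector<std::thread> threads;
    unsigned seed = 1;
    for (int op = 0; op < 4; ++op)
        for (int t = 0; t < thr_num[op]; ++t)
            threads.emplace_back(worker, std::ref(store), static_cast<Op>(op + 1),
                                 seed++, iterations);
    for (auto &th : threads) th.join();
}
```

To test the real store, the OrgStore calls could be replaced with the CAPPlugin calls from test1_proc, with each worker counting non-zero return codes so that a DB_RUNRECOVERY during the run shows up as a failure count.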


