Column by 小宝老豆


Fixing an ASE server process that goes down right after a large memory configuration

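Recently the ASE database behind an operations system, hosted in the machine room of a mobile carrier that partners with our company, kept writing this message to its error log:

The 8K memory pool of named cache default data cache (cache id 0, cachelet id 1) is configured too small for current demands (state 2). Transaction progress may cease or response time may increase.

In other words, the default data cache is too small for the current workload, so transactions may stall or response times may increase.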

Without optimizing the Java application, the most direct fix is to give the database server more memory. Here is how this case was handled:

1. Check the total physical memory of the database server (Linux)

cat /proc/meminfo

The server has 32 GB of physical memory.
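The check in step 1 can also be scripted. The following is a minimal sketch that extracts the MemTotal line from /proc/meminfo-style text and converts it to GiB. The function name and the sample text are illustrative assumptions, not output from the server in this case.

```python
def mem_total_gib(meminfo_text: str) -> float:
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:       33554432 kB"
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemTotal line not found")


# Illustrative sample only; on a real server, pass
# open("/proc/meminfo").read() instead.
SAMPLE = "MemTotal:       33554432 kB\nMemFree:         1048576 kB\n"
print(f"{mem_total_gib(SAMPLE):.1f} GiB")  # 32.0 GiB
```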

       

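2. Confirm that this is a dedicated database server: apart from what the operating system needs, all memory is available to the database service.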


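3. Check the maximum memory configured for the database server

sp_configure "max memory"

Only 6 GB had been given to the database. This was an oversight on our side: deployment, installation, and configuration are handled by the operations team, and they made this basic mistake.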


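4. Going by the usual sizing rules for physical memory, giving the database 25 GB in total is no problem at all, with 20 GB for the default data cache. I contacted the operations team right away.

Both parameters are dynamic and take effect immediately, so I suggested they change them at once.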


5. First raise the server's max memory from the original 5242880 2K pages to 12582912 2K pages (24 GB). This succeeded.

sp_configure "max memory",12582912
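Because "max memory" is expressed in 2K units, it is easy to slip when converting from gigabytes. A small sketch of the conversion in both directions (the helper names are made up for illustration):

```python
UNIT_BYTES = 2048  # "max memory" is given in 2K units


def gib_to_2k_units(gib: int) -> int:
    """Convert a size in GiB to the number of 2K units."""
    return gib * 1024**3 // UNIT_BYTES


def units_to_gib(units: int) -> float:
    """Convert a number of 2K units back to GiB."""
    return units * UNIT_BYTES / 1024**3


print(gib_to_2k_units(24))     # 12582912
print(units_to_gib(12582912))  # 24.0
```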


6. Set "allocate max shared memory". This succeeded.

sp_configure "allocate max shared memory", 1


7. Raise "default data cache" to 16 GB. The server process went down immediately; this is where the trouble started.

sp_cacheconfig "default data cache","16384M"

The error log at that moment showed:

 Allocating a shared memory segment of size 17510002688 bytes.

 os_create_region: shmget(0xf5035c7a): No space left on device

kbcreate: couldn't create server region 0.

Retrying shared memory allocation with smaller size 8755001344 bytes.

........................

As the log shows, the server kept retrying the shared memory allocation on its own, each time with a smaller size.


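8. Given the error messages and the fact that the server process was down, the priority was to keep the impact on users to a minimum.

I immediately edited the server's external configuration file (this is the file to edit when the server cannot start normally),

setting max memory to 10 GB and the default data cache to 8 GB. The database server then restarted successfully.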


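9. Once the server was back up, I analyzed the database error log and checked the operating system's sysctl.conf.

The database's max memory can only be set to 24 GB if the operating system allows at least 24 GB of shared memory,

and only then can the default data cache be sized close to 24 GB.

As root: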

vi /etc/sysctl.conf

kernel.core_uses_pid = 1
kernel.shmmax = 32212254720
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_rmem = 32768
net.ipv4.tcp_wmem = 32768
net.ipv4.tcp_sack = 0

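kernel.shmmax looks large enough: it allows 30 GB. But the shmmax kernel parameter only defines the maximum size of a single shared memory segment.

So could the limit on the total amount of shared memory be the problem? Looking at the configuration above again,

there is no kernel.shmall setting.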
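Spotting a missing key by eye in a long sysctl.conf is error-prone. As a rough sketch, the file can be parsed and checked for the three shared memory parameters. The parser and the shortened sample below are illustrative assumptions; a key missing from the file simply means the kernel default applies.

```python
def parse_sysctl(text: str) -> dict:
    """Parse sysctl.conf-style text into a {key: value} dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Shortened copy of the configuration shown above
SAMPLE = """kernel.core_uses_pid = 1
kernel.shmmax = 32212254720
net.ipv4.tcp_sack = 0
"""

conf = parse_sysctl(SAMPLE)
wanted = ("kernel.shmmax", "kernel.shmall", "kernel.shmmni")
missing = [key for key in wanted if key not in conf]
print(missing)  # ['kernel.shmall', 'kernel.shmmni']
```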

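The kernel.shmall parameter controls the total number of shared memory pages. The Linux shared memory page size is 4 KB, and the size of a shared memory segment is always a whole multiple of the page size. If a single shared memory segment can be as large as 30 GB, the number of pages required is 30 GB / 4 KB = 31457280 KB / 4 KB = 7864320 pages. So on a 64-bit system with 30 GB of physical memory, kernel.shmall = 7864320 is what is needed (almost 4 times the previous setting of 2097152). With that in place, shmmax can be raised to 30 GB.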
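The page arithmetic above can be sketched as a small calculation. It assumes a 4 KB page size (worth confirming with `getconf PAGE_SIZE`) and, as a simplification, ignores shared memory segments that other processes already hold. The helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages; confirm with `getconf PAGE_SIZE`


def shmall_pages_for(total_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Pages needed to cover total_bytes, rounded up to whole pages."""
    return -(-total_bytes // page_size)


def fits_under_shmall(request_bytes: int, shmall_pages: int,
                      page_size: int = PAGE_SIZE) -> bool:
    """True if a single request of request_bytes fits within shmall.

    Simplification: segments already allocated by others are ignored.
    """
    return shmall_pages_for(request_bytes, page_size) <= shmall_pages


print(shmall_pages_for(30 * 1024**3))           # 7864320
# The 17510002688-byte request from the step 7 error log:
print(fits_under_shmall(17510002688, 2097152))  # False
print(fits_under_shmall(17510002688, 7864320))  # True
```

With shmall at 2097152 pages (8 GB), the segment requested in step 7 cannot be granted, which is consistent with shmget returning "No space left on device".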

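kernel.shmmni ----
The shmmni kernel parameter is the maximum number of shared memory segments (note that the parameter is shmmni, not shmmin; shmmin is the minimum segment size). The default for shmmni is 4096, which is normally more than enough.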


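That more or less pinned down the problem, so I immediately added the following line (268435456 pages of 4 KB each is 1 TB):

kernel.shmall = 268435456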


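10. I adjusted the database server memory again, setting max memory back to 24 GB

and the default data cache to 16 GB. The server still went down immediately. So where was the problem?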


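11. I searched Google for the messages in the database error log (the same ones as in the previous crash).

Everything I found still pointed to the operating system's kernel shared memory settings.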


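12. Then it occurred to me: since the machine had not been rebooted, could the memory the database wanted already be held and locked by other processes,

so that the database could not claim it? I started looking for material on this, for both the operating system and the database.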


13. I found the official documentation on the os_create_region errors seen in step 7:

os_create_region Errors

Error message text

os_create_region: shmget (0x%x): %s

os_create_region: Shared memory segment %d is in the way

os_create_region: uninitialized shared memory descriptor

os_create_region: shmat (%d): %s

os_create_region: can't allocate %d bytes

This error may be caused by a hardware problem.

Explanation

Adaptive Server uses the following functions to manage shared memory:

  • os_get_shmid - create a shared memory identifier

  • os_create_region - create a region based on a shared memory identifier

  • os_attach_region - attach to a region based on a shared memory identifier

  • os_detach_region - detach from (and delete) the shared region

  • os_format_shmid - format a shared memory identifier for printing

When os_create_region errors occur, Adaptive Server will not start.

The message texts shown here apply to UNIX systems only. Other operating systems raise slightly different errors.

os_create_region: shmget (0x%x): %s

This message is written to the error log when Adaptive Server fails to get a shared memory segment. In this message, %x is a shared memory key based on the shared memory identifier and %s is an operating system error message.

os_create_region: Shared memory segment %d is in the way

This error follows the shmget message and is also written to the Adaptive Server error log. A value of -1 for %d means the region does not exist.

os_create_region: uninitialized shared memory descriptor

During creation of a shared memory region, Adaptive Server attempts to validate the descriptor for the memory region. This message is written to the error log if the descriptor is found to be invalid.

os_create_region: shmat (%d): %s

This message is written to the error log when Adaptive Server fails to attach at an address. In this message, %d is the shared memory identifier and %s is an operating system error message.

os_create_region: can't allocate %d bytes

Adaptive Server was unable to allocate the number of bytes it requested for the shared memory region.

Action

  1. At the operating system level, check which shared memory processes are using and whether shared memory segments are being used by Adaptive Server.

    To check this on UNIX, run this command as the "sybase" user:

    % ipcs -m
    IPC status from workstation1 as of Fri May 26 14:08:25 1995
    T     ID     KEY         MODE     OWNER    GROUP
    Shared Memory:
    m    256 0x699b7e24 --rw-------   sybase   sybase
    m    257 0x699b7e25 --rw-------   sybase   sybase

    If shared memory segments are being used by Adaptive Server, reboot the operating system to clear shared memory or remove them using the ipcrm operating system command.

    Before removing the shared memory segments, identify the process that created them using the command "ipcs -ma" to make sure you only remove the appropriate segments.

  2. Check the $SYBASE directory to determine whether there are any *.krg or *.srg files left from an abnormal Adaptive Server exit. If any such files exist, delete them.

  3. os_create_region errors can occur when shared memory is not configured properly on your operating system. For information about configuring shared memory properly, refer to the Adaptive Server installation and configuration guide for your platform.

Shared Memory Error on Digital Unix

os_create_region: can't allocate %d bytes indicates that one or more kernel parameters needs to be reset. Logically, resetting shm-max should allow Adaptive Server to configure shared memory. However, other operating system kernel parameters also affect allocation. Consult your operating system documentation for details.

Additional information

Refer to the operating system man pages for the shmget() and shmat() system calls.

Refer to the operating system man pages for ipcs and ipcrm.


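Following the instructions above, I deleted the *.krg and *.srg files and rebooted the server. That solved the problem: the server now runs with 25 GB max memory and an 18 GB default data cache.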
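Before deleting anything, it helps to list what is actually there. The sketch below looks for leftover *.krg and *.srg files directly in a given directory, as Action 2 of the documentation describes for $SYBASE. The function name and the MYSERVER file names in the demo are hypothetical, and the demo runs against a temporary directory rather than a real installation.

```python
import tempfile
from pathlib import Path


def leftover_region_files(sybase_dir: str) -> list:
    """Return *.krg and *.srg files found directly in sybase_dir, sorted."""
    root = Path(sybase_dir)
    found = []
    for pattern in ("*.krg", "*.srg"):
        found.extend(root.glob(pattern))
    return sorted(found)


# Demo against a throwaway directory; on a real server, pass the
# value of the SYBASE environment variable instead.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("MYSERVER.krg", "MYSERVER.srg", "interfaces"):
        (Path(tmp) / name).touch()
    print([p.name for p in leftover_region_files(tmp)])
    # ['MYSERVER.krg', 'MYSERVER.srg']
```

Only review the list while the server is stopped; a running Adaptive Server still relies on its region files.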



