In the VxWorks 5.5 shell, the following command sets a hardware breakpoint:
-> bh address, access, task, count, quiet
access: 0 - instruction,
1 - read/write data,
2 - read data,
3 - write data
For example, to monitor data writes to the address 0x27b5600, use:
-> bh 0x27b5600, 3, 0, 0, 0
When any task tries to write data to the address 0x27b5600, execution breaks and that task is suspended.
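As a small illustration of the argument order, the command line above can be assembled from an address and an access code. This is only a host-side sketch: the enumerator and function names are hypothetical, and only the numeric access values come from the list above.

```c
#include <stdio.h>

/* Access codes accepted by bh, as listed above. The enumerator names
 * are hypothetical; only the numeric values come from the text. */
enum bh_access {
    BH_ACCESS_INSTRUCTION = 0,
    BH_ACCESS_READ_WRITE  = 1,
    BH_ACCESS_READ        = 2,
    BH_ACCESS_WRITE       = 3
};

/* Build the shell command line for a hardware breakpoint on addr.
 * task, count and quiet are left at 0, as in the example above.
 * Returns the value of snprintf (characters needed, excluding NUL). */
int format_bh_command(char *buf, size_t size, unsigned long addr,
                      enum bh_access access)
{
    return snprintf(buf, size, "bh 0x%lx, %d, 0, 0, 0", addr, (int)access);
}
```

Calling format_bh_command with 0x27b5600 and BH_ACCESS_WRITE produces the same text as the write breakpoint shown above.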
Here is an example of how to debug a stack overflow using a hardware breakpoint. It comes from an IPv6 CR, which makes a good demonstration.
---------------------
1. Background
---------------------
In IPv6, when an interface is configured with a new address, the switch sends out an NS (Neighbor Solicitation) message to determine whether the address is already used by another switch.
If it is, the switch receives an NA (Neighbor Advertisement) message in response and gives up the address. This process is called DAD (Duplicate Address Detection). DAD is performed for both the IPv6 management interface and the other general IPv6 interfaces.
----------------
2. Problem
----------------
When the tester assigns the same IPv6 management address to different switches, she gets the following error message:
SW WARNING checkStack: task: 2 tid: 0x27699a8 name: tNetTask size: 9984 cur: 248 high: 9984 margin: 0
It means that the stack of the task tNetTask has overflowed or been corrupted while processing the incoming DAD NA message.
----------------------
3. Investigation
----------------------
This issue might be caused by stack overflow or by corruption, so we need to reproduce it and analyze the stack information.
Step (1): Make the related tasks breakable. Since tNetTask is the task that overflows in this case, we start with it.
In the shell, run the following command:
-> taskOptionsSet(tNetTask, 7, 5)
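The arguments of taskOptionsSet are the task, a mask of the option bits to change, and the new values for those bits. The arithmetic can be sketched on the host as follows. This assumes that tNetTask starts with options 0x7 and that the kernel combines the values as (old & ~mask) | new; the helper name is hypothetical and this is not the kernel source.

```c
#include <stdint.h>

/* Task option bits as defined in the VxWorks 5.x taskLib.h. */
#define VX_SUPERVISOR_MODE 0x0001u
#define VX_UNBREAKABLE     0x0002u
#define VX_DEALLOC_STACK   0x0004u

/* Assumed behaviour of taskOptionsSet(tid, mask, newOptions): the bits
 * selected by mask are cleared, then the new option bits are set. */
uint32_t task_options_after_set(uint32_t current, uint32_t mask,
                                uint32_t new_options)
{
    return (current & ~mask) | new_options;
}
```

With mask 7 and new options 5, a task that starts at 0x7 ends up at 0x5: VX_UNBREAKABLE is cleared while the other two bits stay set, which agrees with the "options: 0x5" line that ti prints in Step (2).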
Step (2): Select the address to be monitored.
We need to select an address in the stack of tNetTask to monitor.
In the shell, the following command shows general stack information for the task tNetTask.
-> ti tNetTask
---------------------------------------------------------------------------------------------------------------
NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY
------------- ----------- -------- ---- ------------ ------- -------- ------- -----
tNetTask netTask 2692518 50 READY 1423c0 2692420 0 0
stack: base 0x2692518 end 0x268fe08 size 9984 high 2344 margin 7640
options: 0x5
VX_SUPERVISOR_MODE VX_DEALLOC_STACK
VxWorks Events
--------------
Events Pended on : Not Pended
Received Events : 0x0
Options : N/A
r0 = 0 sp = 2692420 r2 = 0 r3 = 0
r4 = 0 r5 = 0 r6 = 0 r7 = 0
r8 = 0 r9 = 0 r10 = 0 r11 = 0
r12 = 0 r13 = 0 r14 = 0 r15 = 0
r16 = 0 r17 = 0 r18 = 0 r19 = 0
r20 = 0 r21 = 0 r22 = 0 r23 = 0
r24 = 0 r25 = 0 r26 = 0 r27 = 0
r28 = 0 r29 = ffffffff r30 = b030 r31 = 17e0700
msr = b030 lr = 0 ctr = 0 pc = 1423c0
cr = 20000043 xer = 0
value = 0 = 0x0
-------------------------------------------------------------------------------------------------------------
As we can see, the stack end address is 0x268fe08. Let us display the memory near this address.
-> d 0x268fe08, 20, 4
-------------------------------------------------------------------------------------------------
0268fe00: 744e6574 5461736b * tNetTask*
0268fe10: 00eeeeee eeeeeeee eeeeeeee eeeeeeee *................*
0268fe20: eeeeeeee eeeeeeee eeeeeeee eeeeeeee *................*
0268fe30: eeeeeeee eeeeeeee eeeeeeee eeeeeeee *................*
0268fe40: eeeeeeee eeeeeeee eeeeeeee eeeeeeee *................*
0268fe50: eeeeeeee eeeeeeee *................*
value = 21 = 0x15
--------------------------------------------------------------------------------------------------
As shown above, the name of tNetTask is stored at its stack end address. Normally it does not change unless the stack overflows or is corrupted, so let us monitor this address.
-> bh 0x268fe08,3,0,0,0
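The 0xee bytes in the dump are the pattern VxWorks writes into a task stack at creation (unless the task is spawned with VX_NO_STACK_FILL). The high and margin figures printed by ti can be understood as a scan for how much of that pattern is still intact. A minimal host-side sketch of such a scan, with hypothetical function names, not the actual kernel routine:

```c
#include <stddef.h>

#define STACK_FILL_BYTE 0xeeu   /* fill pattern seen in the dump above */

/* Count how many bytes, starting at the stack end (the lowest address
 * of a descending stack), still hold the fill pattern. */
size_t stack_untouched_bytes(const unsigned char *stack_end, size_t size)
{
    size_t n = 0;
    while (n < size && stack_end[n] == STACK_FILL_BYTE)
        n++;
    return n;
}

/* High-water mark: the part of the stack that has been written to. */
size_t stack_high_water(const unsigned char *stack_end, size_t size)
{
    return size - stack_untouched_bytes(stack_end, size);
}
```

When the very first byte at the stack end no longer holds the pattern, the high-water mark equals the full stack size and the margin is 0, which is the situation reported in the checkStack warning.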
Step (3): Reproduce the problem
When I reproduce the problem, the hardware breakpoint triggers with the following information:
------------------------------------------------------------------------------------------------------------------------------------------------
Break at 0x0268fe08: G_MacAddrCapacity+0x4933c0 Task: 0x2692518 (tNetÞ®/}DìWò¸°:Ú7ðPð)
------------------------------------------------------------------------------------------------------------------------------------------------
Evidently the address 0x268fe08 is overwritten by tNetTask itself, so I can guess that the problem is an overflow of its own stack rather than corruption by another task. But I still need to dump and analyze the stack information to confirm this and to find the reason for the overflow.
Step (4): Dump and Analyze the stack of tNetTask
This time, we cannot display the information of tNetTask using "ti tNetTask" as before, since the task name stored at the stack end has been corrupted.
-> ti tNetTask
----------------------------------------
Undefined symbol: tNetTask
-----------------------------------------
We can use its TID instead. The TID of tNetTask, 0x2692518, is shown in the ti output in Step (2) and in the break message in Step (3). We could also get the TID using the command "i".
-> ti 0x2692518
----------------------------------------------------------------------------------------------------------------------
NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
tNetÞ®/}DnetTask 2692518 50 SUSPEND a0d08 268f8b0 0 0
stack: base 0x2692518 end 0x268fe08 size 9984 high 9984 margin 0
options: 0x5
VX_SUPERVISOR_MODE VX_DEALLOC_STACK
VxWorks Events
--------------
Events Pended on : Not Pended
Received Events : 0x0
Options : N/A
r0 = ba78c4 sp = 268f8b0 r2 = 0 r3 = 12be6e8
r4 = 268fe0c r5 = 412 r6 = 0 r7 = 3e07841c
r8 = 0 r9 = 1520000 r10 = 14c r11 = 0
r12 = 0 r13 = 0 r14 = 0 r15 = 0
r16 = 0 r17 = 0 r18 = 0 r19 = 124d1b8
r20 = 2690b40 r21 = 420 r22 = 124d1bc r23 = 2690e60
r24 = 0 r25 = 0 r26 = 2690d40 r27 = 4
r28 = 268f930 r29 = 268f930 r30 = 15235a8 r31 = 2690d60
msr = b030 lr = 107a04 ctr = 137 pc = a0d08
cr = 20842043 xer = 0
value = 0 = 0x0
----------------------------------------------------------------------------------------------------------------------
We can see that tNetTask is suspended by the hardware breakpoint. The sp register holds the address of the top stack frame, 0x268f8b0, which is lower than the stack end address 0x268fe08. Since the stack grows from high addresses to low addresses, the task has run past the end of its stack.
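The numbers in the two ti outputs can be checked with a little arithmetic. The sketch below is host-side C with hypothetical helper names; the addresses are the ones printed above.

```c
#include <stdint.h>

/* Bytes currently in use on a descending stack: distance from the
 * stack base (highest address) down to sp. */
uint32_t stack_current_usage(uint32_t base, uint32_t sp)
{
    return base - sp;
}

/* Bytes by which sp has gone below the stack end (lowest valid
 * address); 0 when sp is still inside the stack. */
uint32_t stack_overflow_bytes(uint32_t end, uint32_t sp)
{
    return sp < end ? end - sp : 0;
}
```

For the idle task in Step (2), 0x2692518 - 0x2692420 gives 248 bytes in use, the same figure as the cur value in the checkStack warning. At the breakpoint, 0x268fe08 - 0x268f8b0 gives 0x558, so sp is already 1368 bytes below the stack end.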
VxWorks has a shell command to do a stack trace on a task:
-> tt 0x2692518
--------------------------------------------------
trcStack aborted: error in top frame
--------------------------------------------------
In our case it doesn't work, since the overflowed part of the stack might have been corrupted by other tasks. I have to walk the call stack by hand.
-> d 0x268f8b0, 50, 4
--------------------------------------------------------------------------------------------------
0268f8b0: 0268f8d0 00000000 00000000 00000000 *.h..............*
0268f8c0: 00000000 0268f930 015235a8 02690d60 *.....h.0.R5..i.`*
0268f8d0: 0268f910 00ba78c4 00000000 00000000 *.h....x.........*
0268f8e0: 00000000 00000000 00000000 00000000 *................*
0268f8f0: 00000000 00000000 02690d40 02690e60 *.........i.@.i.`*
0268f900: 0268f930 0268f920 02690d60 02690d60 *.h.0.h. .i.`.i.`*
0268f910: 026909a0 004ca2ec 00000000 00000000 *.i...L..........*
0268f920: 00000000 00000000 00000000 00000000 *................*
0268f930: 00000000 00000000 00000000 00000000 *................*
0268f940: 00000000 00000000 00000000 00000000 *................*
0268f950: 00000000 00000000 00000000 00000000 *................*
0268f960: 00000000 00000000 00000000 00000000 *................*
0268f970: 00000000 00000000 *................*
value = 21 = 0x15
--------------------------------------------------------------------------------------------------
The word at address 0x268f8b0 has the value 0x0268f8d0, which is the address of the next stack frame (the caller's frame). Let us analyze this stack frame:
-------------------------------------------------------------------------------------------------
0268f8d0: 0268f910 00ba78c4 00000000 00000000 *.h....x.........*
-------------------------------------------------------------------------------------------------
The word at address 0x0268f8d4 is the return address. We can look up the function it belongs to.
-> lkAddr 0x00ba78c4
----------------------------------------------------------
0x00ba780c BF_set_key text
0x00ba7a30 BIO_new text
0x00ba7ac8 BIO_set text
0x00ba7b80 BIO_free text
0x00ba7c50 BIO_read text
0x00ba7d8c BIO_write text
0x00ba7efc BIO_puts text
0x00ba8014 BIO_gets text
0x00ba813c BIO_int_ctrl text
0x00ba8164 BIO_ptr_ctrl text
0x00ba81a0 BIO_ctrl text
0x00ba82b8 BIO_callback_ctrl text
value = 0 = 0x0
-----------------------------------------------------------
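Reading this listing amounts to finding the symbol with the highest address that is not above the target. A minimal sketch of that lookup over a sorted table follows; the struct and function names are hypothetical, and it assumes every function has an entry in the table, which may not hold for static functions.

```c
#include <stddef.h>
#include <stdint.h>

struct symbol {
    uint32_t    addr;   /* start address of the symbol */
    const char *name;
};

/* Return the symbol with the highest address <= target, or NULL when
 * target lies below the first entry. The table must be sorted by
 * ascending address. */
const struct symbol *find_enclosing_symbol(const struct symbol *table,
                                           size_t count, uint32_t target)
{
    const struct symbol *best = NULL;
    for (size_t i = 0; i < count && table[i].addr <= target; i++)
        best = &table[i];
    return best;
}
```

With the first entries of the listing above, the address 0x00ba78c4 falls between 0x00ba780c and 0x00ba7a30, so the lookup returns the entry for BF_set_key.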
So it belongs to the function BF_set_key. Using the same method, we can finally reconstruct the whole call stack as follows:
-------------------------------------
vxTaskEntry()
netTask()
dec21x40RxIntHandle()
dec21x40Recv()
endRcvRtnCall()
muxReceive()
endEtherInputHookRtn()
rcip6InputSniffer()
ipv6ProcessFrame()
ifyDipRx()
processIngressPacket()
ifyRpcInProcLocalPkt()
v6ProcLocalPkt()
v6InnerProcLocalPtk()
v6NdRx()
v6procNbrAdv()
ifyDADComplete()
duReport()
bf_encrypt_NP_info()
BF_set_key()
------------------------------------
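The manual walk used to build this list can be sketched in C. The code below works on a captured dump rather than on live memory and assumes the frame layout described above: the first word of a frame points to the caller's frame, and the word at offset 4 of the caller's frame holds the return address. All type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* A captured window of stack memory: words[i] is the 32-bit value
 * found at address base + 4*i. */
struct stack_image {
    uint32_t        base;
    const uint32_t *words;
    size_t          count;
};

/* Read the word at addr. Returns 0, leaving *out untouched, when addr
 * is unaligned or outside the image. */
static int image_read(const struct stack_image *img, uint32_t addr,
                      uint32_t *out)
{
    if (addr < img->base || (addr - img->base) % 4 != 0)
        return 0;
    size_t idx = (addr - img->base) / 4;
    if (idx >= img->count)
        return 0;
    *out = img->words[idx];
    return 1;
}

/* Follow the back chain starting at sp. For each caller frame inside
 * the image, store its saved return address (frame + 4) in ret_addrs.
 * Returns the number of addresses stored. */
size_t walk_back_chain(const struct stack_image *img, uint32_t sp,
                       uint32_t *ret_addrs, size_t max)
{
    size_t n = 0;
    uint32_t frame = sp;
    while (n < max) {
        uint32_t caller, lr;
        if (!image_read(img, frame, &caller))
            break;
        /* On a descending stack every caller frame sits higher. */
        if (caller <= frame || !image_read(img, caller + 4, &lr))
            break;
        ret_addrs[n++] = lr;
        frame = caller;
    }
    return n;
}
```

Fed with the words dumped from 0x268f8b0 to 0x268f91f, the walk yields 0x00ba78c4 and then 0x004ca2ec, and stops because the next frame, 0x026909a0, lies outside the dumped window. Each return address is then resolved with lkAddr as shown above.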
---------------------
4. Root Cause
---------------------
After some investigation, the call stack itself shows no errors. But when I look into the code of the function bf_encrypt_NP_info, I find that it declares a huge local struct:
int bf_encrypt_NP_info(const unsigned char *inText, char *retText)
{
char iv[8];
int enc_data_length=0;
BF_KEY key;
…
}
typedef struct bf_key_st
{
BF_LONG P[BF_ROUNDS+2];
BF_LONG S[4*256]; /* 4*4*256 = 4096 bytes */
} BF_KEY;
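The full size of the structure can be checked on a host compiler. The sketch below assumes that BF_ROUNDS is 16 and that BF_LONG is a 32-bit unsigned type, as in common Blowfish headers; neither value is shown in the excerpt above. The helper fits_in_margin is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions: BF_ROUNDS is 16 and BF_LONG is 32 bits wide. */
#define BF_ROUNDS 16
typedef uint32_t BF_LONG;

typedef struct bf_key_st {
    BF_LONG P[BF_ROUNDS + 2];   /* 18 * 4 =   72 bytes */
    BF_LONG S[4 * 256];         /* 1024 * 4 = 4096 bytes */
} BF_KEY;

/* Returns 1 when a local object of frame_bytes fits in the stack
 * margin that is still free, 0 otherwise. */
int fits_in_margin(size_t margin_bytes, size_t frame_bytes)
{
    return frame_bytes <= margin_bytes;
}
```

Under these assumptions sizeof(BF_KEY) is 4168 bytes, so this single local variable takes roughly 40 percent of the 9984-byte stack of tNetTask before the rest of the call chain is counted.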
In Step (2), we could see that the stack size of tNetTask is only 9984 bytes, much less than the 81232 bytes of tMainTask. When the function bf_encrypt_NP_info is called, its local variables use up the free space of the stack, which makes it overflow.