https://www.tibcommunity.com/docs/DOC-2232
DATALOSS Advisories
DATALOSS advisories indicate network transmission problems. These advisories point to situations that defeat the Rendezvous reliable delivery protocols and require investigation.
.PTP indicates that a point-to-point message was lost.
.BCAST indicates that a broadcast message was lost.
DATALOSS.OUTBOUND
Clients of the sending daemon present DATALOSS.OUTBOUND advisories.
Example:
{ADV_CLASS="ERROR" ADV_SOURCE="SYSTEM" ADV_NAME="DATALOSS.OUTBOUND.BCAST" ADV_DESC="dataloss: remote daemon asking for retransmission after we timed out the data" host="xx.xxx.x.xxx" lost=y}
DATALOSS.INBOUND
Clients of the receiving daemon present DATALOSS.INBOUND advisories.
Example:
{ADV_CLASS="ERROR" ADV_SOURCE="SYSTEM" ADV_NAME="DATALOSS.INBOUND.PTP" ADV_DESC="dataloss: remote daemon did not satisfy our retransmission requests" host="xx.x.x.xxx" lost=y}
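When advisories are collected as text in the form shown in the examples above, it can help to split them into fields before filtering or counting. The following is a minimal sketch, assuming the advisory is available as a single line of key=value pairs exactly as printed above. The function name parse_advisory and the sample host and lost values are hypothetical and are not part of Rendezvous.

```python
import re

# Matches key="quoted value" or key=bare_value, as in the examples above.
_FIELD = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_advisory(text):
    """Split the textual form of an advisory into a dict of field -> string."""
    body = text.strip()
    # Drop the surrounding braces if present.
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]
    fields = {}
    for key, quoted, bare in _FIELD.findall(body):
        # Exactly one of the two value groups is non-empty for a given match.
        fields[key] = bare if bare else quoted
    return fields


# Hypothetical sample: the host and lost values are made up for illustration.
sample = (
    '{ADV_CLASS="ERROR" ADV_SOURCE="SYSTEM" '
    'ADV_NAME="DATALOSS.INBOUND.PTP" '
    'ADV_DESC="dataloss: remote daemon did not satisfy our retransmission requests" '
    'host="192.0.2.10" lost=3}'
)

advisory = parse_advisory(sample)
print(advisory["ADV_NAME"], advisory["host"], advisory["lost"])
```

All values are returned as strings; convert the lost field with int() only after confirming it is numeric in your own logs.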
FAQs
- Is the IP address in the advisory message for the sending or receiving daemon?
For DATALOSS.OUTBOUND advisories, the IP address is the host of the receiving daemon; for DATALOSS.INBOUND advisories, it is the host of the sending daemon. In both cases, the IP address identifies the 'other' host involved.
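The rule in this answer can be written down as a small lookup, which may be convenient when annotating a log of advisories. This is only an illustrative sketch; the function name describe_host is hypothetical and is not part of any Rendezvous API.

```python
def describe_host(adv_name):
    """Return which daemon the advisory's host field refers to.

    adv_name is the ADV_NAME value, for example "DATALOSS.OUTBOUND.BCAST".
    """
    parts = adv_name.split(".")
    if len(parts) < 2 or parts[0] != "DATALOSS":
        raise ValueError(f"not a DATALOSS advisory: {adv_name!r}")
    direction = parts[1]
    if direction == "OUTBOUND":
        # Presented to clients of the sending daemon, so host is the receiver.
        return "receiving daemon"
    if direction == "INBOUND":
        # Presented to clients of the receiving daemon, so host is the sender.
        return "sending daemon"
    raise ValueError(f"unknown DATALOSS direction: {direction!r}")


print(describe_host("DATALOSS.OUTBOUND.BCAST"))
print(describe_host("DATALOSS.INBOUND.PTP"))
```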
- How can I find out on which subject data has been lost?
Because DATALOSS means that the packet was never seen (the loss is detected from sequence numbers only), it is not possible to determine the subject on which the data was lost.
DATALOSS advisories can report only a packet 'gap' and an associated source IP address, not a subject.
Packets are assembled into messages, or messages are extracted from packets. Once all packets for a message are present, a routine retrieves the subject from the message. If one or more packets are missed and cannot be recovered, it is not possible to know which subjects the lost data related to.
Additionally, the sending transport (client) cannot be identified, because the protocol is anonymous by design. If non-anonymous (identified) behaviour is required, certified messaging can be used, which identifies the transport (application endpoint) with a unique identifier.
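To illustrate why only a 'gap' can be reported, the sketch below detects missing packets purely from sequence numbers. It is a simplified model and not the actual Rendezvous protocol: the function names find_gaps and count_lost and the sample sequence numbers are hypothetical. The point is that the receiver knows which sequence numbers never arrived, but nothing about the contents (and therefore the subjects) of those packets.

```python
def find_gaps(received_seqs):
    """Return (first_missing, last_missing) ranges in a stream of sequence numbers."""
    ordered = sorted(set(received_seqs))
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            # Everything strictly between prev and cur never arrived.
            gaps.append((prev + 1, cur - 1))
    return gaps


def count_lost(gaps):
    """Total number of missing packets across all gaps."""
    return sum(last - first + 1 for first, last in gaps)


# Hypothetical sequence numbers seen from one sending host.
seen = [1, 2, 3, 6, 7, 10]
gaps = find_gaps(seen)
print(gaps)              # [(4, 5), (8, 9)]
print(count_lost(gaps))  # 4
```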
- What does the ‘lost’ field in a DATALOSS advisory message represent?
The lost parameter represents the total number of packets lost.
- My Rendezvous daemon is running with a reliability parameter of 10 seconds instead of the recommended 60 seconds. Could this be related to the DATALOSS advisories I am seeing?
Yes. Decreasing retention time decreases reliability and increases the probability of lost data.
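The effect of a shorter retention time can be shown with a deliberately simplified model: the sender keeps each packet for the reliability interval, and a retransmission request can be satisfied only if it arrives within that interval. The function name can_retransmit and the 25-second delay are hypothetical; the real daemon's timing behaviour is more involved than this sketch.

```python
def can_retransmit(sent_at, requested_at, reliability_seconds):
    """True if the sender would still hold the packet when the request arrives.

    Times are in seconds on a common clock; this is a simplified model only.
    """
    return (requested_at - sent_at) <= reliability_seconds


# A retransmission request arrives 25 seconds after the packet was sent,
# for example after a short network interruption (hypothetical numbers).
print(can_retransmit(0, 25, reliability_seconds=10))  # False
print(can_retransmit(0, 25, reliability_seconds=60))  # True
```

In this model, the same 25-second delay leads to lost data with a 10-second retention time but is recoverable with 60 seconds.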
Document References
RV Concepts manual, Appendix A: System Advisory Messages: DATALOSS
Troubleshooting
The ADV_DESC field of a DATALOSS advisory message can provide some further details about why dataloss has occurred.
- The following description is included in the DATALOSS advisory: ADV_DESC="dataloss: unable to interpret incoming packet host=xxx.xxx.xx.xx" - what does this mean?
This error indicates that data was lost because Rendezvous cannot interpret the incoming packets from the host (xxx.xxx.xx.xx). You could receive this error if a message was not sent by a Rendezvous application.
- I am seeing one or more of the following descriptions in the ADV_DESC field of the DATALOSS advisory.
ADV_DESC="dataloss: remote daemon already timed out the data"
ADV_DESC="dataloss: remote daemon did not acknowledge our transmission"
ADV_DESC="dataloss: remote daemon asking for retransmission after we timed out the data"
ADV_DESC="dataloss: remote daemon did not satisfy our retransmission request(s)"
These descriptions indicate that data has been lost because the sending daemon no longer retains it.
- Potential causes of DATALOSS
There can be several causes for the receipt of DATALOSS advisories:
- A hardware component is experiencing intermittent failure, for example a faulty network card, a loose connection, or a frayed wire.
- The network is saturated.
- A daemon process is starved for CPU cycles; that is, the computer is too heavily loaded or the priority of the daemon process is too low.
- The daemon is running with a "-reliability" parameter lower than 60 seconds.
Information to be sent to TIBCO Support
Please open an SR with TIBCO Support and upload the following:
1. Run "iniftst" (found under <TIBRV_HOME>/bin directory) on all the affected machines (daemons/clients) and capture the output.
2. Run tibrvlisten on the subject "_RV.>" for approximately 20 minutes.
3. Please send a raw packet capture using rvtrace or tcpdump sniffer tools:
- Using rvtrace (located in <TIBRV>/bin directory): 'rvtrace -w <capture file name>'. Note: without a license, rvtrace will stop capturing packets after 10 minutes.
- Using tcpdump: 'tcpdump -s 250 -w <output_file_name>'
Documentation on rvtrace can be found in Chapter 12 of the RV Administration manual. Please review the sections on "Limitations" and "Performance Effects" before running rvtrace.
4. Run "netstat -s" twice, before and after running rvtrace, and submit the output for review.
5. Monitor and submit details of the memory and CPU usage of the affected hosts.
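Before submitting the two "netstat -s" outputs from step 4, it can be useful to see which counters changed between the runs. The sketch below is one possible way to compare them, assuming the Linux-style layout in which each counter line is indented and begins with a number under an unindented section header. The output format of "netstat -s" differs between operating systems, so the parsing may need adjusting. The function names and the sample text are hypothetical.

```python
import re

# Indented line that starts with a number, e.g. "    100 packets received".
_COUNTER = re.compile(r"^\s+(\d+)\s+(.*\S)\s*$")


def parse_netstat(text):
    """Map "Section: counter description" -> integer value."""
    counters = {}
    section = ""
    for line in text.splitlines():
        if line and not line[0].isspace():
            # Unindented lines such as "Udp:" start a new section.
            section = line.rstrip(":").strip()
            continue
        match = _COUNTER.match(line)
        if match:
            counters[f"{section}: {match.group(2)}"] = int(match.group(1))
    return counters


def changed_counters(before_text, after_text):
    """Return the increase of every counter present in both outputs that changed."""
    before = parse_netstat(before_text)
    after = parse_netstat(after_text)
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


# Hypothetical excerpts; real output contains many more sections and counters.
before = "Udp:\n    100 packets received\n    2 packet receive errors\n"
after = "Udp:\n    250 packets received\n    9 packet receive errors\n"
for name, delta in changed_counters(before, after).items():
    print(f"{name}: +{delta}")
```

Submit the raw outputs to TIBCO Support regardless; a comparison like this is only a convenience for spotting counters, such as receive errors, that grew during the capture.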
Copyright © TIBCO Software Inc. All rights reserved. www.tibco.com