UDP configuration:

The UDP protocol bundles small messages into larger ones and sends them together to reduce network overhead. The relevant settings are:

  • bundler_type: selects the bundler implementation.
      old -> DefaultBundler: uses a TimerScheduler to send the messages.
      new -> TransferQueueBundler: holds the messages in an internal LinkedBlockingQueue. Messages are added to the queue until their total size reaches max_bundle_size or the queue is 90% full (bundler_capacity), at which point they are sent.

  • max_bundle_timeout: only used by the DefaultBundler; this is the scheduling delay (in milliseconds) before queued messages are sent.

  • mcast_send_buf_size, mcast_recv_buf_size, ucast_send_buf_size, ucast_recv_buf_size: send and receive buffer sizes of the multicast and unicast sockets.

  • timer_type, timer.min_threads, timer.max_threads, timer.keep_alive_time, timer.queue_max_size: the timer thread pool runs scheduled tasks such as bundling.

  • thread_pool.min_threads, thread_pool.max_threads, thread_pool.keep_alive_time, thread_pool.queue_enabled, thread_pool.queue_max_size, thread_pool.rejection_policy: this thread pool handles received batches of messages and executes the BatchHandler.

  • enable_batching: pass a batch of messages up the stack at once instead of processing them one by one. Defaults to true.
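
Put together, the settings above might appear on the UDP element roughly as follows. This is only a sketch: the attribute names are the ones listed above, but every value is an illustrative assumption rather than a recommendation, and not every attribute is available in every JGroups release.

```xml
<!-- Illustrative values only; check each attribute against your JGroups version. -->
<!-- max_bundle_timeout is listed for completeness; it only matters with bundler_type="old". -->
<UDP
    bundler_type="new"
    max_bundle_size="64K"
    max_bundle_timeout="30"
    ucast_send_buf_size="640K"
    ucast_recv_buf_size="20M"
    mcast_send_buf_size="640K"
    mcast_recv_buf_size="25M"
    timer_type="new3"
    timer.min_threads="2"
    timer.max_threads="4"
    timer.keep_alive_time="3000"
    timer.queue_max_size="500"
    thread_pool.min_threads="2"
    thread_pool.max_threads="8"
    thread_pool.keep_alive_time="5000"
    thread_pool.queue_enabled="true"
    thread_pool.queue_max_size="10000"
    thread_pool.rejection_policy="discard"
    enable_batching="true" />
```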


Other protocols in the stack:

  • FD: failure detection

  • pbcast.NAKACK2, UNICAST: message reliability. Each uses an internal sliding window to ensure that messages are delivered in order.

  • STABLE: message stability. Ensures that messages have been seen by every member of the cluster by periodically broadcasting the latest stable message.
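
As a rough illustration, these protocols could be declared in the stack like this. The fragment is a sketch, not a complete stack: the attribute values are assumptions chosen for illustration, and STABLE is written with its pbcast prefix as it appears in configuration files.

```xml
<!-- Fragment only; a real stack also needs a transport, discovery, GMS and so on. -->
<!-- All attribute values below are assumed, not tuned. -->
<FD timeout="3000" max_tries="3" />
<pbcast.NAKACK2 xmit_interval="500" />
<UNICAST />
<pbcast.STABLE desired_avg_gossip="50000" max_bytes="4M" />
```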


Monitoring:


java -cp jgroups-xxx.jar -Djava.net.preferIPv4Stack=true org.jgroups.tests.Probe jmx=[Protocol name]

This lets you view counters such as num_msgs_sent, num_msgs_received, num_bytes_sent, num_bytes_received, num_rejected_msgs and num_oob_msgs_received for the given protocol.



JGroups optimization for invalidation usage:

  • The JGroups configuration shipped by default is taken directly from the JGroups documentation. It does not take into account our requirements or later improvements in JGroups, and it provides functionality we do not need.


  • Flow Control: We use MFC and UFC and by default allow them 4M credits. This seems to hurt the performance of import and catalog sync, as the system soon blocks waiting for more credits. Disabling flow control floods the network and quickly causes problems. With 200M, CPU utilisation climbed to 100%, but timeouts on cluster sync and heartbeat messages then caused the importing node to be evicted from the cluster, which also caused problems. With 40M, performance was much better than with 4M and no errors were reported. Recommendation:

    <UFC max_credits="40M" min_threshold="0.4" />
    <MFC max_credits="40M" min_threshold="0.4" />


  • BARRIER and pbcast.STATE_TRANSFER: Not useful for invalidation usage (order is not important), so they can be removed.


  • MERGE: We use MERGE2. The more recent MERGE3 should be used instead, as it uses a more efficient algorithm.


  • FRAG2: We set frag_size to 60K by default, but this can be optimised by using the network's max frame size.


  • UDP:
    Buffers: set the send and receive buffer sizes to match your operating system settings.
    Timer: set timer_type="new3".
    Thread pool: ensure thread_pool.enabled="true" and thread_pool.queue_enabled="true" are set. Make the queue very large (1000000 or more is fine) and increase the number of threads to twice the number of cores on the machine.
    Bundling: this is very important. Set max_bundle_timeout="5" or less; the value is in milliseconds, and the 30 in the shipped configuration is a long delay for invalidation messages. max_bundle_size should be slightly bigger than your FRAG2 frag_size setting.
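
The recommendations in this list could be combined along these lines. This is a sketch under stated assumptions, not a tested configuration: thread_pool.max_threads="16" assumes an 8-core machine, max_bundle_size="64K" is simply a value slightly above the 60K frag_size, and the buffer sizes are left out because they depend on the operating system. BARRIER and pbcast.STATE_TRANSFER are omitted on purpose.

```xml
<!-- Fragment only; protocols not discussed above are elided. -->
<!-- Assumed: 8 cores (16 threads), 64K bundle size. -->
<UDP
    timer_type="new3"
    thread_pool.enabled="true"
    thread_pool.queue_enabled="true"
    thread_pool.queue_max_size="1000000"
    thread_pool.max_threads="16"
    max_bundle_timeout="5"
    max_bundle_size="64K" />
<!-- ... discovery and other protocols ... -->
<MERGE3 />
<!-- ... failure detection, reliability, stability, membership ... -->
<UFC max_credits="40M" min_threshold="0.4" />
<MFC max_credits="40M" min_threshold="0.4" />
<FRAG2 frag_size="60K" />
```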


To simulate two network interfaces on my CentOS virtual server, I defined an alias for the eth0 interface. In a real customer environment, skip this step, assuming a dedicated network interface for cluster messages is already configured.

sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0:0

# Paste the lines below and change them to match your local network settings:

DEVICE=eth0:0
NETWORK=10.5.0.0
NETMASK=255.255.255.0
IPADDR=10.5.0.246

# end of ifcfg-eth0:0 settings

sudo ifup eth0:0

ifconfig #verify you have an eth0:0 interface


JGroups UDP settings:
mcast_addr="230.0.0.1"
bind_addr="10.5.0.246"

Multicast traffic sometimes uses IPv6 network interfaces even when we bind explicitly to an IPv4 address. To make sure the JVM uses IPv4, add the following settings:
-Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Addresses=true


Verification:

netstat -ng
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      224.0.75.75
lo              1      224.0.0.1
eth0            1      224.0.75.75
eth0            1      230.0.0.1
eth0            1      224.0.0.1
lo              1      ff02::1
eth0            1      ff02::202
eth0            1      ff02::1:ffa4:7c61
eth0            1      ff02::1
eth1            1      ff02::1



netstat -anp | grep 230.0.0.1
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
udp        0      0 230.0.0.1:9997              0.0.0.0:*                               3761/java     


Server console log:
GMS: address=node-1, cluster=broadcast, physical address=10.5.0.246:38815
Or check the unicast socket with netstat:
netstat -an | grep udp| grep  10.5.0.246
udp        0      0 10.5.0.246:38815            0.0.0.0:*   



sudo tcpdump -ni eth0:0 udp port 9997

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0:0, link-type EN10MB (Ethernet), capture size 65535 bytes
16:05:11.457605 IP 10.5.0.246.palace-6 > 230.0.0.1.palace-6: UDP, length 78
16:05:11.963756 IP 10.5.0.246.palace-6 > 230.0.0.1.palace-6: UDP, length 78
16:05:13.184506 IP 10.5.0.246.palace-6 > 230.0.0.1.palace-6: UDP, length 91



If problems occur, also make sure that the JGroups settings receive_on_all_interfaces and send_on_all_interfaces are set to false. According to the JGroups documentation they should be false by default, at least receive_on_all_interfaces.
Other JGroups settings that could be explored are receive_interfaces and send_interfaces. It is unclear whether, in a large environment, separating receive and send traffic onto their own interfaces would help performance.
For stricter control, monitoring or troubleshooting, bind_port and port_range could also be explored.

Besides multicast traffic, JGroups also uses unicast sockets for node-to-node communication. The physical address of a node is its unicast socket. If a network filter blocks communication on these sockets, the cluster cannot form and errors similar to this appear in the log:
[OOB-434,broadcast,node-17] [UDP] node-17: no physical address for 8a0e9532-54df-d91e-79af-185b2cadaf1f, dropping message.
By default the unicast socket uses a random port number. This is fine in most environments, but where iptables or another local firewall is enabled you need to set bind_port and port_range, otherwise the cluster cannot form. Define your port range with these two values and add rules to your iptables configuration that allow communication on those ports.
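
A fixed unicast port range might be configured as in the sketch below. The port numbers are assumptions chosen for illustration; the firewall rules then need to allow UDP traffic on the same range between all cluster nodes.

```xml
<!-- Assumed values: the unicast socket binds to the first free port from 7800 up to 7810. -->
<UDP
    bind_addr="10.5.0.246"
    bind_port="7800"
    port_range="10" />
```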


On most servers the out-of-the-box sysctl settings do not seem to be optimal for JGroups. Check the console log right after JGroups starts for warnings similar to the following, and ask your administrator to adjust the settings accordingly.
WARN  [main] [UDP] [JGRP00014] the send buffer of socket DatagramSocket was set to 640KB, but the OS only allocated 229.38KB. This might lead to performance problems. Please set your max send buffer in the OS correctly (e.g. net.core.wmem_max on Linux)    
WARN  [main] [UDP] [JGRP00014] the receive buffer of socket DatagramSocket was set to 20MB, but the OS only allocated 229.38KB. This might lead to performance problems. Please set your max receive buffer in the OS correctly (e.g. net.core.rmem_max on Linux)     
[java] WARN  [main] [UDP] [JGRP00014] the send buffer of socket MulticastSocket was set to 640KB, but the OS only allocated 229.38KB. This might lead to performance problems. Please set your max send buffer in the OS correctly (e.g. net.core.wmem_max on Linux)     
[java] WARN  [main] [UDP] [JGRP00014] the receive buffer of socket MulticastSocket was set to 25MB, but the OS only allocated 229.38KB. This might lead to performance problems. Please set your max receive buffer in the OS correctly (e.g. net.core.rmem_max on Linux)
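
The requested sizes in these warnings correspond to the buffer attributes on the UDP element, with DatagramSocket being the unicast socket and MulticastSocket the multicast one. The fragment below is a sketch that mirrors the values reported in the log above; it is an assumption that the configuration producing those warnings looked like this. Either the OS limits (net.core.wmem_max and net.core.rmem_max) are raised to at least the largest requested send and receive size, or these attributes are lowered to what the OS allows.

```xml
<!-- Values taken from the warnings above: 640KB send, 20MB/25MB receive. -->
<UDP
    ucast_send_buf_size="640K"
    ucast_recv_buf_size="20M"
    mcast_send_buf_size="640K"
    mcast_recv_buf_size="25M" />
```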



JGroups with large cluster:
JGroups 2.2.9 and later can use TCP NIO for TCP clusters: http://www.jgroups.org/Perftest.html . In small clusters NIO brings no benefit and actually appears to be slower, but it is expected to yield better performance in larger clusters.
http://sourceforge.net/p/javagroups/mailman/message/31439524/


Error Analysis:

WARN  [INT-2,hybris-broadcast,hybrisnode-1] [UDP] JGRP000012: discarded message from different cluster EH_CACHE (our cluster is xxx). Sender was 7d979f10-cadd-6813-d5c4-21e4cca405c5

Check the mcast_port setting in the JGroups configuration file; it probably conflicts with another cluster.
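
One way to avoid the clash is to give each cluster its own multicast address and port pair. In the sketch below both values are placeholders (they reuse the address and port seen in the verification output earlier); choose a pair that no other cluster on the same network uses.

```xml
<!-- Placeholder values; every cluster on the network needs a distinct pair. -->
<UDP
    mcast_addr="230.0.0.1"
    mcast_port="9997" />
```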


WARN  [TransferQueueBundler,hybris-broadcast,hybrisnode-2] [UDP] JGRP000032: %s: no physical address for %s, dropping message

Add the system properties -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Addresses=true

Check bind_addr in the JGroups configuration file, for example bind_addr="match-interface:eth0"