Introduction
The goal of this document is to describe Hibernate's second-level caching strategy: how to use it and how to gain performance benefits from it.
Caching strategy and scopes
1. Three kinds of caching scopes
Transaction scope cache
Transaction scope cache: Attached to the current unit of work, which is always associated with a database transaction/conversation. It is valid and used only as long as the unit of work runs; every unit of work has its own cache, so data in this cache is never accessed concurrently. (When the Hibernate transaction commits, this cache is cleared.)
Process scope cache
Process scope cache: Shared between many (possibly concurrent) units of work or transactions. This means that data in the process scope cache is accessed by concurrently running threads, obviously with implications for transaction isolation. This cache works only within a single JVM.
Cluster scope cache
Cluster scope cache: Shared between multiple processes on the same machine or between multiple machines in a cluster. Here, network communication is an important point worth considering. This works across multiple JVMs; cached information must be replicated to all nodes in the cluster.
2. The Hibernate cache architecture
First level cache
The first-level cache is the persistence context cache. A Hibernate Session's lifespan usually corresponds to a single request, implemented with one database transaction/conversation. This first-level cache is mandatory, and it also guarantees the scope of object and database identity (the exception being the StatelessSession, which doesn't have a persistence context).
First-level cache is associated with the Session Object.
Second level cache
The second-level cache in Hibernate is pluggable and may be scoped to the process or cluster. This is a cache of state (returned by value), not of actual persistent instances. Use of the second-level cache is optional and can be configured on a per-class and per-collection basis—each such cache utilizes its own physical cache region.
Second-level cache is associated with the SessionFactory object.
To reduce database traffic, the second-level cache keeps loaded objects at the SessionFactory level between transactions. These objects are available to the whole application.
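To illustrate the two levels, consider a minimal sketch; `sessionFactory` and the `Category` entity are placeholders, and `Category` is assumed to be mapped with a second-level cache region:

```java
// Sketch only: sessionFactory and Category are assumed to exist, and
// Category is assumed to be mapped with <cache usage="..."/>.
Session s1 = sessionFactory.openSession();
Category c1 = (Category) s1.get(Category.class, new Long(123)); // SELECT issued; state also put into the 2nd-level cache
s1.close(); // the first-level cache dies with the Session

Session s2 = sessionFactory.openSession();
Category c2 = (Category) s2.get(Category.class, new Long(123)); // no SELECT: assembled from the 2nd-level cache
s2.close();
```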
3. Hibernate Second Level Cache
3.1 Persistent instances in the Hibernate 2nd-level cache
Persistent instances are stored in the second-level cache in a disassembled form. Think of disassembly as a process a bit like serialization (the algorithm is much, much faster than Java serialization, however). - This is a cache of state (returned by value), not of actual persistent instances.
Hibernate actually stores these persistent objects in their "dehydrated" form, that is, something like the property values below:
{
30 => [cn,China,30],
214 => [us,United States,214],
158 => [de,Germany,158],
31 => [by,Belarus,31],
95 => [in,India,95]
...
}
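As a toy illustration of this "cache of state" idea (not Hibernate's actual implementation), a cache region can be pictured as a map from identifier to a flat array of property values, handed back by value:

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Toy model only: NOT Hibernate's implementation. A cache region maps an
// identifier to a flat array of property values (the "disassembled" state),
// and hands copies back, mimicking "returned by value".
class DehydratedCacheSketch {
    // id -> disassembled state (property values only, no live instance)
    private static final Map<Serializable, Object[]> REGION = new HashMap<Serializable, Object[]>();

    static void put(Serializable id, Object[] state) {
        REGION.put(id, state.clone()); // store a copy of the state
    }

    static Object[] get(Serializable id) {
        Object[] state = REGION.get(id);
        // "assembly": return a fresh copy, never the cached array itself
        return state == null ? null : state.clone();
    }
}
```

Note how each `get` assembles a fresh copy: two lookups of the same identifier never share an object, which is exactly why the second-level cache cannot guarantee object identity.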
3.2 Good candidate classes for caching
Good candidate classes for caching are classes that represent
■ Data that changes rarely
■ Content-management data
■ Data that is local to the application and not shared with other legacy applications
Bad candidates for second-level caching are
■ Data that is updated often
■ Financial data
■ Data that is shared with a legacy application
As an example in PRT, the CONF_CTRY records are good candidates, but the PRICE_CN records are bad candidates because they are updated often.
To repeat, the cache is usually useful only for read-mostly classes. If you have data that is updated much more often than it’s read, don’t enable the second-level cache, even if all other conditions for caching are true! The price of maintaining the cache during updates can possibly outweigh the performance benefit of faster reads.
PRT is effectively a legacy application because its data can also be modified directly through JDBC, and Hibernate is never aware of changes made that way. See Section 2.3.9 for a way to resolve this issue.
3.3 Built-in concurrency strategies - transaction isolation
The four built-in concurrency strategies represent decreasing levels of strictness
in terms of transaction isolation:
■ Transactional—Available in a managed environment only, it guarantees full
transactional isolation up to repeatable read, if required. Use this strategy for read-mostly data where it’s critical to prevent stale data in concurrent transactions, in the rare case of an update.
■ Read-write—This strategy maintains read committed isolation, using a timestamping
mechanism and is available only in non-clustered environments. Again, use this strategy for read-mostly data where it’s critical to prevent stale data in concurrent transactions, in the rare case of an update.
■ Nonstrict-read-write—Makes no guarantee of consistency between the cache and the database. If there is a possibility of concurrent access to the same entity, you should configure a sufficiently short expiry timeout. Otherwise, you may read stale data from the cache. Use this strategy if data hardly ever changes (many hours, days, or even a week) and a small likelihood of stale data isn’t of critical concern.
■ Read-only—A concurrency strategy suitable for data which never changes. Use it for reference data only.
3.4 Cluster Cache vs. Process Cache
A cluster cache replicates changed data across the cluster: when one node changes data, all the other nodes are synchronized immediately so that dirty reads are avoided. For example, if both Node1 and Node2 have cached element E and Node1 updates E, then without cluster caching Node2 never learns of the change made by Node1 (except when element E times out) and keeps fetching the stale data from its own 2nd-level cache. So we have to use cluster caching between nodes, except in the scenario where only read-only data (data that is never updated) needs to be cached. In general, a cluster cache supports only the READ-ONLY and REPEATABLE-READ isolation levels. The JBoss Cache documentation strongly suggests using the optimistic node locking strategy to improve performance during replication among the cluster nodes.
A process cache works only within a single JVM; it supports all cache concurrency strategies except Transactional.
Any application that is designed to scale must support clustered operation. A process scope cache doesn't maintain consistency between the different caches on different machines in the cluster, so in this case a cluster scope (distributed) second-level cache should be used instead of the process scope cache. Our WWPRT system contains the ATS and two WAS applications, so we have to choose the cluster scope second-level cache.
3.5 Caching Providers
When picking a cache provider, consider the concurrency strategies you'll use for your cache candidate classes. The provider is a plug-in: the physical implementation of a cache system.
Hibernate forces you to choose a single cache provider for the whole application.
Providers for the following open source products are built into Hibernate:
■ EHCache is a cache provider intended for a simple process scope cache in a single JVM. It can cache in memory or on disk, and it supports the optional Hibernate query result cache. (The latest version of EHCache now supports clustered caching, but we haven’t tested this yet.)
■ OpenSymphony OSCache is a service that supports caching to memory and
disk in a single JVM, with a rich set of expiration policies and query cache support.
■ SwarmCache is a cluster cache based on JGroups. It uses clustered invalidation
but doesn’t support the Hibernate query cache.
■ JBoss Cache is a fully transactional replicated clustered cache also based on the JGroups multicast library. It supports replication or invalidation, synchronous or asynchronous communication, and optimistic and pessimistic locking. The Hibernate query cache is supported, assuming that clocks are synchronized in the cluster.
The JBoss caching provider supports the Read-Only and Transactional concurrency strategies across the cluster.
If a record is cached as read-committed, it behaves the same way the database does: while a transaction is updating the record, other transactions cannot access it.
3.6 JBoss Caching Strategy
We know that JBoss Cache replicates changed data among the cluster nodes it manages to avoid dirty reads between nodes. So how does it work?
JBossCache has five different cache modes, i.e., LOCAL , REPL_SYNC , REPL_ASYNC , INVALIDATION_SYNC and INVALIDATION_ASYNC.
If you want to run JBoss Cache as a single instance, then you should set the cache mode to
LOCAL so that it won't attempt to replicate anything.
If you want synchronous replication among different JBoss Cache instances, you can set it to REPL_SYNC. For asynchronous replication, use REPL_ASYNC.
If you do not wish to replicate cached data but simply want to inform the other caches in a cluster that data under specific addresses is now stale and should be evicted from memory, use INVALIDATION_SYNC or INVALIDATION_ASYNC. Synchronous and asynchronous behavior applies to invalidation as well as replication.
INVALIDATION_SYNC is usually the best choice for us: it is faster during replication because it doesn't need to copy the changed data into each node's cache memory; it simply tells the other nodes to evict the stale data and reload it from the database.
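As an illustration, an invalidation-based setup might look like the following fragment of a JBoss Cache (TreeCache) service configuration; the attribute names follow the treecache.xml format, and the cluster name is a placeholder:

```xml
<mbean code="org.jboss.cache.TreeCache" name="jboss.cache:service=TreeCache">
  <!-- all nodes sharing this name form one replication group -->
  <attribute name="ClusterName">WWPRT-Cluster</attribute>
  <!-- send synchronous invalidation messages instead of shipping state -->
  <attribute name="CacheMode">INVALIDATION_SYNC</attribute>
  <!-- optimistic node locking, as suggested above -->
  <attribute name="NodeLockingScheme">OPTIMISTIC</attribute>
</mbean>
```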
When does JBoss Cache replicate stale data between nodes?
If the updates are made inside a transaction, replication happens only when the transaction is about to commit (actually during the internal prepare stage). However, if the operations are not in a transaction context, each update triggers a replication. Note that this has performance implications if network transport is heavy (it usually is).
So a good practice when using the second-level cache is to always keep your system under proper transactional management.
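Following that advice, a batched update can be wrapped in one transaction so that replication fires once at commit rather than once per update; `sessionFactory`, `Item`, and `itemsToUpdate` below are placeholder names:

```java
// Sketch only: sessionFactory, Item and itemsToUpdate are placeholders.
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
try {
    for (Item item : itemsToUpdate) {
        session.update(item);   // no replication happens here yet
    }
    tx.commit();                // one replication, during the prepare stage
} catch (RuntimeException e) {
    tx.rollback();
    throw e;
} finally {
    session.close();
}
```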
How does the replication mechanism work?
JBoss Cache leverages JGroups as its replication layer. A user configures a cluster of JBoss Cache instances by sharing the same cluster name. In the WWPRT environment, we have the ATS and WAS clusters; we can put them into one JGroups group with a unique cluster name.
Note that once all instances join the same replication group, every change is propagated to all participating members. There is no mechanism for sub-partitioning, where some replication would be done within only a subset of members. See the JBoss Cache FAQ for more frequently asked questions.
3.7 Example of configuring second-level cache instances
Read Only
<class name="auction.model.Category" table="CATEGORY">
<cache usage="read-only"/>
<id ...
</class>
Read-write (read committed)
<class name="auction.model.Category" table="CATEGORY">
<cache usage="read-write"/>
<id ...
</class>
Enable Collection cache instances
<class name="Item" table="ITEM">
<cache usage="read-write"/>
<id ...
<set name="bids">
<cache usage="read-write"/>
<key ...
</set>
</class>
<class name="Bid" table="BID" mutable="false">
<cache usage="read-write"/>
<id ...
</class>
3.8 Setting up the cache provider
Hibernate supports only one cache provider for the whole application.
EHCache Example
The EHCache provider is a process cache; it works within a single JVM.
Step 1: Set the configuration property that selects a cache provider as below:
hibernate.cache.provider_class = org.hibernate.cache.EhCacheProvider
Step 2: Configure ehcache.xml on the classpath, for example for the Bid class:
<cache name="auction.model.Bid"
maxElementsInMemory="50000"
eternal="false"
timeToIdleSeconds="1800"
timeToLiveSeconds="100000"
overflowToDisk="false"
memoryStoreEvictionPolicy="LRU"
/>
maxElementsInMemory: if the number of records cached in memory exceeds this limit, the least recently accessed records are evicted.
eternal: if true, the records are never removed from the cache (eviction by timeout is disabled).
timeToIdleSeconds: the expiry time, in seconds, since an element was last accessed in the cache.
timeToLiveSeconds: the expiry time, in seconds, since the element was first added to the cache.
overflowToDisk: if the cached records exceed maxElementsInMemory, the overflow is serialized to a file on disk. We set it to false here.
memoryStoreEvictionPolicy: the policy used to evict cached data:
LRU – least recently used
LFU – least frequently used
FIFO – first in, first out (the oldest element by creation time)
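The LRU policy above can be pictured in plain Java: a LinkedHashMap in access order evicts the least recently used entry once a cap (playing the role of maxElementsInMemory) is exceeded. This is an illustration of the policy only, not EHCache's implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of the LRU policy only (not EHCache's implementation):
// a LinkedHashMap in access order drops the least recently used entry
// once the cap (playing the role of maxElementsInMemory) is exceeded.
class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxElements;

    LruSketch(int maxElements) {
        super(16, 0.75f, true); // true = iterate in access order (LRU)
        this.maxElements = maxElements;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // called after each put: evict the least recently used entry
        return size() > maxElements;
    }
}
```

Reading an entry counts as an access, so recently read records survive eviction longer, which is exactly the behavior we want for read-mostly reference data.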
JBoss Cache Example
see http://comedsh.iteye.com/admin/blogs/729150
3.9 Managing records changed by other applications
Many Java applications share access to their database with other applications, and there is no way for a cache system to know when a legacy application (one not under Hibernate's context management) updates the shared data. For example, if we change the database directly with SQL, our cache system never learns of it. So we have to implement application-level functionality to trigger an invalidation of the process (or cluster) scope cache when changes are made to the database.
Fortunately, we already have such application-level functionality in our system, com.ibm.finance.tools.wwprt.util.ExpirationListener: if we touch the file wwprt-was.properties, all the cached data is reloaded immediately. (Currently it is not implemented for the 2nd-level cache strategy, but it can be added.) However, this only helps for plain SQL executed manually, after which we can touch the file (with a shell command) to reload the cached data.
It still cannot handle programmatic JDBC access. The only solution is to evict the related cached data from the SessionFactory (where the 2nd-level cached data is stored) after each JDBC execution; the 2nd-level cache will then reload the invalidated records automatically. This is the only way for us, because Hibernate never knows about changes made through the JDBC Spring template.
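A sketch of that pattern is shown below; `jdbcTemplate`, the `ConfCtry` mapped class, and the injected `sessionFactory` are placeholders for whatever the real names are:

```java
// Sketch only: jdbcTemplate, ConfCtry and sessionFactory are placeholders.
jdbcTemplate.update("UPDATE CONF_CTRY SET NAME = ? WHERE ID = ?", name, id);
// Hibernate never sees the statement above, so evict the stale entry;
// the 2nd-level cache will reload it from the database on the next read.
sessionFactory.evict(ConfCtry.class, id);
// or evict all cached instances of the class:
// sessionFactory.evict(ConfCtry.class);
```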
3.10 Benchmark testing
Times | Without 2nd-level caching | With 2nd-level caching | Improvement |
1 | 25,250 ms | 24,265 ms | ~ |
2 | 19,203 ms | 678 ms | 96.4 % |
3 | 19,344 ms | 719 ms | 96.2 % |
4 | 19,203 ms | 818 ms | 95.7 % |
5 | 18,891 ms | 660 ms | 96.5 % |
3.11 Controlling the 2nd-level cache
Hibernate and the cache provider give us the ways below to control the 2nd-level cache.
Control on SessionFactory level
Remove an element/collection from the second-level cache:
sessionFactory.evict( Category.class, new Long(123) );
Remove all elements/collections of a certain class:
sessionFactory.evict("auction.model.Category");
sessionFactory.evictCollection("auction.model.Category.items");
Control on Session level
Hibernate offers CacheMode options that can be activated for a particular Session. Imagine that you want to batch-insert some records into the database in one Session without pushing those otherwise-cacheable objects into the 2nd-level cache; you can do the following:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
session.setCacheMode(CacheMode.IGNORE);
for ( int i=0; i<100000; i++ ) {
Item item = new Item(...);
session.save(item);
if ( i % 100 == 0 ) {
session.flush();
session.clear();
}
}
tx.commit();
session.close();
CacheMode.IGNORE: Hibernate never interacts with the second-level cache except to
invalidate cached items when updates occur.