Use your cache carefully...
http://www.codeproject.com/KB/web-cache/cachingmistakes.aspx
Ten caching mistakes that break your app
Introduction
Caching frequently used objects that are expensive to fetch from the source makes an application perform faster under high load and helps it scale under concurrent requests. But some hard-to-notice mistakes can make the application suffer under high load instead of performing better, especially when you are using distributed caching, where a separate cache server or cache application stores the items. Moreover, code that works fine with an in-memory cache can fail when the cache is made out-of-process. Here I will show you some common distributed caching mistakes so that you can make better decisions about when to cache and when not to.
Here are the top 10 mistakes I have seen:
- Relying on .NET’s default serializer.
- Storing large objects in a single cache item.
- Using cache to share objects between threads.
- Assuming items will be in cache immediately after storing them.
- Storing entire collection with nested objects.
- Storing parent-child objects together and also separately.
- Caching Configuration settings.
- Caching live objects that have an open handle to a stream, file, registry, or network.
- Storing same item using multiple keys.
- Not updating or deleting items in cache after updating or deleting them on persistent storage.
Let’s see what they are and how to avoid them.
I am assuming you have been using ASP.NET Cache or Enterprise Library Cache for a while and are satisfied with it, but now you need more scalability and have moved to an out-of-process or distributed cache like Velocity or memcached. After that, things have started to fall apart, and so the common mistakes listed below apply to you.
Relying on .NET’s default serializer
When you use an out-of-process caching solution like Velocity or memcached, items in cache are stored in a separate process from the one your application runs in. Every time you add an item to the cache, the client library serializes the item into a byte array and sends that byte array to the cache server to store it. Similarly, when you get an item from the cache, the cache server sends the byte array back to your application and the client library deserializes it into the target object. Now, .NET’s default serializer is not optimal, since it relies on Reflection, which is CPU intensive. As a result, storing items in cache and getting them back adds high serialization and deserialization overhead that results in high CPU usage, especially if you are caching complex types. This high CPU usage happens on your application, not on the cache server. So, you should always use one of the better approaches shown in this article so that the CPU consumed in serialization and deserialization is minimized. I personally prefer the approach where you serialize and deserialize the properties yourself by implementing the ISerializable interface and the deserialization constructor.
using System;
using System.Runtime.Serialization;

[Serializable]
public class Customer : ISerializable
{
    public string FirstName;
    public string LastName;
    public int Salary;
    public DateTime DateOfBirth;

    public Customer()
    {
    }

    // Deserialization constructor: reads each field back by name.
    public Customer(SerializationInfo info, StreamingContext context)
    {
        FirstName = info.GetString("FirstName");
        LastName = info.GetString("LastName");
        Salary = info.GetInt32("Salary");
        DateOfBirth = info.GetDateTime("DateOfBirth");
    }

    #region ISerializable Members
    // Writes each field explicitly so the formatter does not have to discover them.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("FirstName", FirstName);
        info.AddValue("LastName", LastName);
        info.AddValue("Salary", Salary);
        info.AddValue("DateOfBirth", DateOfBirth);
    }
    #endregion
}
This prevents the formatter from using reflection. With large objects, the performance improvement from this approach is sometimes as much as 100 times over the default implementation. So I strongly recommend that, at least for the objects you cache, you always implement your own serialization and deserialization code and not let .NET use reflection to figure out what to serialize.
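The same idea can be taken one step further when the cache client accepts raw byte arrays. The following is only a minimal sketch of that variation, not the approach shown above: it writes the fields by hand with BinaryWriter and reads them back with BinaryReader. The class name CompactCustomer and the methods ToBytes and FromBytes are hypothetical names chosen for illustration, and whether your cache client can store a byte[] directly is an assumption.

```csharp
using System;
using System.IO;

// Hypothetical type for illustration; mirrors the fields of Customer above.
public class CompactCustomer
{
    public string FirstName;
    public string LastName;
    public int Salary;
    public DateTime DateOfBirth;

    // Writes each field explicitly, in a fixed order. No reflection is involved.
    // Null strings are written as empty strings.
    public byte[] ToBytes()
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(FirstName ?? string.Empty);
            writer.Write(LastName ?? string.Empty);
            writer.Write(Salary);
            writer.Write(DateOfBirth.ToBinary());
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads the fields back in exactly the order ToBytes wrote them.
    public static CompactCustomer FromBytes(byte[] data)
    {
        using (var stream = new MemoryStream(data))
        using (var reader = new BinaryReader(stream))
        {
            var customer = new CompactCustomer();
            customer.FirstName = reader.ReadString();
            customer.LastName = reader.ReadString();
            customer.Salary = reader.ReadInt32();
            customer.DateOfBirth = DateTime.FromBinary(reader.ReadInt64());
            return customer;
        }
    }
}
```

The trade-off is that the field order becomes part of the stored format, so adding or reordering fields requires versioning the payload or flushing the old cache entries.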
Storing large objects in a single cache item
Sometimes we think large objects should be cached because they are too expensive to fetch from the source. For example, you might think caching an object graph of 1 MB will give you better performance than loading that object graph from a file or database. You would be surprised how poorly that scales. It will certainly work a lot faster than loading the same thing from the database when you have only one request at a time. But under concurrent load, frequent access to that large object graph will blow up the server’s CPU. This is because caching has high serialization and deserialization overhead. Every time you try to get a 1 MB object graph from an out-of-process cache, it consumes significant CPU to rebuild that object graph in memory.
var largeObjectGraph = myCache.Get("LargeObjectGraph");
var anItem = largeObjectGraph.FirstLevel.SecondLevel.ThirdLevel.FourthLevel.TheItemWeNeed;
The solution is not to cache the large object graph as a single item under a single key. Instead, you should break that large object graph into smaller items and cache those smaller items individually. You should retrieve from cache only the smallest item you need.
// store smaller parts in cache as individual item
var largeObjectGraph = new VeryLargeObjectGraph();
myCache.Add("LargeObjectGraph.FirstLevel.SecondLevel.ThirdLevel",
largeObjectGraph.FirstLevel.SecondLevel.ThirdLevel);
...
...
// get the smaller parts from cache
var thirdLevel = myCache.Get("LargeObjectGraph.FirstLevel.SecondLevel.ThirdLevel");
var anItem = thirdLevel.FourthLevel.TheItemWeNeed;
The idea is to look at the items that you need most frequently from the large object (say, the connection strings from a configuration object graph) and store those items separately in the cache. Always make sure the items you retrieve from cache are small, say 8 KB at most.
Using cache to share objects between multiple threads
Since you can access the cache from multiple threads, sometimes you use it to conveniently pass data between threads. But a cache, like static variables, can suffer from race conditions. This is even more common when the cache is distributed, since storing and reading an item requires out-of-process communication and your threads get more chances to overlap each other than with an in-memory cache. The following example shows how an in-memory cache rarely demonstrates the race condition but an out-of-process cache almost always shows it:
myCache["SomeItem"] = 0;
var thread1 = new Thread(new ThreadStart(() =>
{
    var item = (int)myCache["SomeItem"]; // Most likely 0
    item++;
    myCache["SomeItem"] = item;
}));
var thread2 = new Thread(new ThreadStart(() =>
{
    var item = (int)myCache["SomeItem"]; // Most likely 1
    item++;
    myCache["SomeItem"] = item;
}));
var thread3 = new Thread(new ThreadStart(() =>
{
    var item = (int)myCache["SomeItem"]; // Most likely 2
    item++;
    myCache["SomeItem"] = item;
}));
thread1.Start();
thread2.Start();
thread3.Start();
.
.
.
With an in-memory cache, the above code produces the "most likely" values noted in the comments most of the time. But when you go out-of-process or distributed, it will almost always fail to produce them. You need to implement some kind of locking here. Some caching providers allow you to lock an item. For example, Velocity has a locking feature, but memcached does not. In Velocity, you can lock an item:
// get an item and lock it
DataCacheLockHandle handle;
SomeClass someItem = _defaultCache.GetAndLock("SomeItem",
TimeSpan.FromSeconds(1), out handle, true) as SomeClass;
// update an item
someItem.FirstName = "Version2";
// put it back and get the new version
DataCacheItemVersion version2 = _defaultCache.PutAndUnlock("SomeItem",
someItem, handle);
You can use locking to reliably read and write to cache items that get changed by multiple threads.
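Where a provider has no lock, an optimistic "compare and swap" retry loop is a common alternative. The sketch below illustrates that swapped-in technique only; it uses an in-process ConcurrentDictionary as a stand-in for the cache client, and the class SafeCounter and its methods are hypothetical names. A real distributed cache would need its own version-check or compare-and-swap primitive in place of TryUpdate.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper for illustration only.
public static class SafeCounter
{
    // In-process stand-in for a cache client (an assumption of this sketch).
    public static readonly ConcurrentDictionary<string, int> Cache =
        new ConcurrentDictionary<string, int>();

    // Reads the value, computes the new one, and writes it back only if
    // nobody changed the entry in the meantime; otherwise retries.
    public static int Increment(string key)
    {
        while (true)
        {
            int current = Cache.GetOrAdd(key, 0);
            int updated = current + 1;
            if (Cache.TryUpdate(key, updated, current))
                return updated;
        }
    }

    // Starts several threads that all increment the same entry and
    // returns the final value once every thread has finished.
    public static int RunThreads(string key, int threadCount, int incrementsPerThread)
    {
        Cache[key] = 0;
        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int n = 0; n < incrementsPerThread; n++)
                    Increment(key);
            });
            threads[i].Start();
        }
        foreach (var thread in threads)
            thread.Join();
        return Cache[key];
    }
}
```

Because a write succeeds only when the entry still holds the value that was read, no increment is lost, whereas the plain read, increment, write sequence can overwrite another thread's update.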
Assuming items will be in cache immediately after storing it
Sometimes you store an item in cache on a submit button click and assume that upon the page postback the item can be read from cache because it was just stored in cache. You are wrong.
private void SomeButton_Clicked(object sender, EventArgs e)
{
myCache["SomeItem"] = someItem;
}
private void OnPreRender()
{
var someItem = myCache["SomeItem"]; // It's gone dude!
Render(someItem);
}
You can never assume an item will be in cache for sure, even if you store the item in one statement and read it back only a couple of lines later. When your application is under pressure and physical memory is scarce, the cache will flush out items that aren’t frequently used. So, by the time the code reaches the read, the item could already have been flushed. Never assume you can always get an item back from cache. Always have a null check and fall back to retrieving the item from persistent storage.
var someItem = myCache["SomeItem"] as SomeClass ?? GetFromSource();
You should always use this format when reading an item from cache.
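The null check and fallback can be wrapped in a small helper so that every read follows the same pattern. This is a minimal sketch under assumptions: CacheReader and GetOrLoad are hypothetical names, and an IDictionary stands in for whatever cache client you actually use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class CacheReader
{
    // Returns the cached value when it is present and of the expected type;
    // otherwise loads it from the source and stores it for the next read.
    public static T GetOrLoad<T>(IDictionary<string, object> cache, string key,
                                 Func<T> loadFromSource) where T : class
    {
        object cached;
        if (cache.TryGetValue(key, out cached))
        {
            var typed = cached as T;
            if (typed != null)
                return typed;
        }

        T fresh = loadFromSource();
        if (fresh != null)
            cache[key] = fresh; // repopulate so later reads can hit the cache
        return fresh;
    }
}
```

A call would look like `CacheReader.GetOrLoad(cache, "SomeItem", () => GetFromSource())`, where GetFromSource is the same loader used in the one-line form above.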
Storing entire collection with nested objects
Sometimes you store an entire collection in a single cache item because you need to access the items in the collection frequently. Thus every time you try to read an item from the collection, you have to load the collection first and then read that particular item. Something like this:
var products = myCache.Get("Products");
var product = products[1];
This is inefficient. You are unnecessarily loading an entire collection just to read a certain item. You will have absolutely no problem when the cache is in-memory, as the cache will just store a reference to the collection. But in a distributed cache, where the entire collection is deserialized every time you access it, it will result in poor performance. Instead of caching a whole collection, you should cache individual items separately.
// store individual items in cache
foreach (Product product in products)
myCache.Add("Product." + product.Index, product);
...
...
// read the individual item from cache
var product = myCache.Get("Product.0");
The idea is simple: you store each item in the collection individually, using a key that can be guessed easily, for example by using the index as a suffix.
Storing parent-child objects together and also separately
Sometimes you store an object in cache that has a child object, which you also store separately in another cache item. For example, say you have a customer object that has an order collection. When you cache the customer, the order collection gets cached as well. But then you separately cache the individual orders. So, when an individual order is updated in cache, the copy of that order inside the customer’s order collection is not updated, and you get inconsistent results. Again, this works fine when you have an in-memory cache but fails when your cache is made out-of-process or distributed.
var customer = SomeCustomer();
var recentOrders = SomeOrders();
customer.Orders = GetCustomerOrders();
myCache.Add("RecentOrders", recentOrders);
myCache.Add("Customer", customer);
...
...
var recentOrders = myCache.Get("RecentOrders");
var order = recentOrders["ORDER10001"];
order.Status = CANCELLED;
...
...
...
var customer = myCache.Get("Customer");
var order = customer.Orders["ORDER10001"];
order.Status = PROCESSING; // Inconsistent. The order has already been cancelled
This is a hard problem to solve. It requires a careful design so that you never end up having the same object stored twice in the cache. One common approach is not to store child objects in cache with the parent, but to store the keys of the child objects so that they can be retrieved from cache individually. So, in the above scenario, you would not store the customer’s order collection in cache. Instead, you would store the collection of OrderIDs with the Customer, and when you need to see the orders of a customer, you load the individual order objects using those OrderIDs.
var recentOrders = SomeOrders();
foreach (Order order in recentOrders)
myCache.Add("Order." + order.ID, order);
...
var customer = SomeCustomer();
customer.OrderKeys = GetCustomerOrders(); // Store keys only
myCache.Add("Customer", customer);
...
...
var order = myCache.Get("Order.10001");
order.Status = CANCELLED;
...
...
...
var customer = myCache.Get("Customer");
// Look up each order by its key (ToDictionary needs System.Linq)
var customerOrders = customer.OrderKeys.ToDictionary(
key => key, key => myCache.Get("Order." + key) as Order);
var order = customerOrders["10001"]; // Correct object from cache
This approach ensures that a certain instance of an entity is stored in the cache only once, no matter how many times it appears in collections or parent objects.
Caching configuration settings
Sometimes you cache configuration settings. You use some cache expiration logic to ensure the configuration is refreshed periodically, or refreshed when the configuration file or database table changes. Since configuration settings are accessed very frequently, reading them from cache adds significant CPU overhead. Instead, you should just use static variables to store configuration.
var connectionString = myCache.Get("Configuration.ConnectionString");
You should not follow such an approach. Getting an item from cache is not cheap. It may not be as expensive as reading from a file or the registry, but it’s not very cheap either, especially if the item is a custom class that adds some serialization overhead. So, you should instead store the configuration settings in static variables. But you might ask: how do we refresh the configuration without restarting the appdomain when it’s stored in a static variable? You can use some expiration logic, like a file listener that reloads the configuration when the configuration file changes, or some database polling to check for updates.
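One way to combine a static variable with a refresh check is sketched below. It is only an illustration under stated assumptions: the class AppSettings, the method ReloadIfChanged, and the idea that the file contains nothing but the connection string are all hypothetical. The reload check compares the file's last-write time, so it can be called from a timer or from a file watcher callback.

```csharp
using System;
using System.IO;

// Hypothetical settings holder for illustration only.
public static class AppSettings
{
    private static readonly object Sync = new object();
    private static DateTime _loadedStamp = DateTime.MinValue;
    private static volatile string _connectionString;

    // Reads are a plain static field access: no serialization, no network hop.
    public static string ConnectionString
    {
        get { return _connectionString; }
    }

    // Reloads only when the file's last-write time differs from the one
    // seen at the previous load. Returns true when a reload happened.
    public static bool ReloadIfChanged(string path)
    {
        DateTime stamp = File.GetLastWriteTimeUtc(path);
        lock (Sync)
        {
            if (stamp == _loadedStamp)
                return false;

            // Assumption: the file holds only the connection string.
            _connectionString = File.ReadAllText(path).Trim();
            _loadedStamp = stamp;
            return true;
        }
    }
}
```

Readers never take the lock; only the occasional reload does, so the frequent path stays as cheap as reading a static field.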
Caching live objects that have an open file, registry or network handle
I have seen developers cache instances of classes that hold an open connection to a file, the registry or an external network resource. This is dangerous. When items are removed from cache, they aren’t disposed automatically. Unless you dispose such an instance, you leak system resources. Every time such an instance is removed from cache, due to expiration or some other reason, without being disposed, it leaks the resources it was holding onto.
You should never cache objects that hold open streams, file handles, registry handles or network connections just because you want to avoid opening the resource every time you need it. Instead, you should use a static variable or an in-memory cache that is guaranteed to give you an expiration callback so that you can dispose them properly. Out-of-process caches or session stores do not give you expiration callbacks consistently. So, never store live objects there.
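To make the "dispose on removal" requirement concrete, here is a minimal sketch of an in-memory holder that disposes an item whenever it is removed or replaced. DisposingCache is a hypothetical class written for this illustration, not part of any caching library, and it has no expiration logic of its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory holder for illustration only.
public sealed class DisposingCache
{
    private readonly object _sync = new object();
    private readonly Dictionary<string, IDisposable> _items =
        new Dictionary<string, IDisposable>();

    // Stores an item; if another instance was stored under the key, disposes it.
    public void Set(string key, IDisposable item)
    {
        IDisposable previous;
        lock (_sync)
        {
            _items.TryGetValue(key, out previous);
            _items[key] = item;
        }
        if (previous != null && !ReferenceEquals(previous, item))
            previous.Dispose();
    }

    // Returns the live instance, or null when the key is not present.
    public IDisposable Get(string key)
    {
        lock (_sync)
        {
            IDisposable item;
            return _items.TryGetValue(key, out item) ? item : null;
        }
    }

    // Removes the item and disposes it so its handle is released.
    public bool Remove(string key)
    {
        IDisposable removed;
        lock (_sync)
        {
            if (!_items.TryGetValue(key, out removed))
                return false;
            _items.Remove(key);
        }
        removed.Dispose();
        return true;
    }
}
```

This sketch does not protect a caller that is still using an instance at the moment another thread removes it; a production version would need reference counting or a similar ownership rule.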
Storing same item using multiple keys
Sometimes you store objects in cache both by key and by index, because you not only need to retrieve items by key but also need to iterate through them by index. For example,
var someItem = new SomeClass();
myCache["SomeKey"] = someItem;
.
.
myCache["SomeItem." + index] = someItem;
.
.
If you are using in-memory cache, the following code will work fine:
var someItem = myCache["SomeKey"];
someItem.SomeProperty = "Hello";
.
.
.
var someItem = myCache["SomeItem." + index];
var hello = someItem.SomeProperty; // Returns Hello, fine, when In-memory cache
/* But fails when out of process cache */
The above code works when you have an in-memory cache. Both entries in the cache refer to the same instance of the object. So, no matter how you get the item from cache, it always returns the same instance. But in an out-of-process cache, especially in a distributed cache, items are stored after serializing them. Items aren’t stored by reference. You store copies of items in cache; you never store the item itself. So, if you retrieve an item using a key, you get a freshly made copy of that item, because the item is deserialized and created anew every time you get it from cache. As a result, changes made to the object are never reflected back in the cache unless you overwrite the item in the cache after making the changes. So, in a distributed cache, you will have to do the following:
var someItem = myCache["SomeKey"];
someItem.SomeProperty = "Hello";
myCache["SomeKey"] = someItem; // Update cache
myCache["SomeItem." + index] = someItem; // Update all other entries
.
.
.
var someItem = myCache["SomeItem." + index];
var hello = someItem.SomeProperty; // Now it works in out-of-process cache
Once you update the cache entry using the modified item, it works as the items in the cache receive a new copy of the item.
Not updating or deleting objects from cache when items are updated or deleted from data source
This again works with an in-memory cache, but fails when you go to an out-of-process or distributed cache. Here’s an example:
var someItem = myCache["SomeItem"];
someItem.SomeProperty = "Hello Changed";
database.Update(someItem);
.
.
.
var someItem = myCache["SomeItem"];
Console.WriteLine(someItem.SomeProperty); // "Hello Changed"? Nope.
This works fine with an in-memory cache, but fails when the cache is out-of-process or distributed. The reason is that you changed the object but never updated the cache with the latest version. Items in cache are stored as a copy, not as the original instance.
Another mistake is not deleting items from cache when the item is deleted from database.
var someItem = myCache["SomeItem"];
database.Delete(someItem);
.
.
.
var someItem = myCache["SomeItem"];
Console.WriteLine(someItem.SomeProperty); // Works fine. Oops!
Don’t forget to delete an item from cache, under every key it has been stored with, when you delete it from the database, a file or some other persistent store.
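One way to make this hard to forget is to route every update and delete through a single class that touches both the persistent store and the cache. The sketch below is a simplified illustration: ItemRepository is a hypothetical class, the two dictionaries stand in for a real database and a real cache client, and the "SomeItem." key prefix is just an example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical repository for illustration only.
public class ItemRepository
{
    private readonly IDictionary<string, string> _database; // stand-in for persistent storage
    private readonly IDictionary<string, string> _cache;    // stand-in for the cache client

    public ItemRepository(IDictionary<string, string> database,
                          IDictionary<string, string> cache)
    {
        _database = database;
        _cache = cache;
    }

    // Writes to the store, then overwrites the cached copy.
    public void Update(string id, string value)
    {
        _database[id] = value;
        _cache[KeyFor(id)] = value;
    }

    // Deletes from the store, then removes the cached copy.
    public void Delete(string id)
    {
        _database.Remove(id);
        _cache.Remove(KeyFor(id));
    }

    // Reads from cache first and falls back to the store on a miss.
    public string Get(string id)
    {
        string value;
        if (_cache.TryGetValue(KeyFor(id), out value))
            return value;
        if (_database.TryGetValue(id, out value))
        {
            _cache[KeyFor(id)] = value;
            return value;
        }
        return null;
    }

    private static string KeyFor(string id)
    {
        return "SomeItem." + id;
    }
}
```

If the same item is cached under more than one key, Update and Delete would have to touch every one of those keys, which is another reason to keep a single key per item.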
Conclusion
Caching requires careful planning and a clear understanding of the data being cached. Otherwise, when the cache is made distributed, it not only performs worse but can also break the code. Keeping these common mistakes in mind while caching will help you cash out from your code.