Ten caching mistakes that break your app

Use your cache carefully...

http://www.codeproject.com/KB/web-cache/cachingmistakes.aspx

 

Caching large objects, duplicate objects, whole collections and live objects, along with thread-unsafe caching and other common mistakes, can break your app instead of making it fly. Learn the ten common caching mistakes developers make.

 

Introduction

Caching frequently used objects that are expensive to fetch from the source makes an application perform faster under high load, and it helps the application scale under concurrent requests. However, some hard-to-notice mistakes can make the application suffer under high load instead of performing better. This is especially true with distributed caching, where a separate cache server or cache application stores the items. Moreover, code that works fine with an in-memory cache can fail when the cache is moved out of process. Here I will show you some common distributed caching mistakes, which will help you make better decisions about when to cache and when not to.

Here are the top 10 mistakes I have seen: 

  1. Relying on .NET’s default serializer. 
  2. Storing large objects in a single cache item.
  3. Using cache to share objects between threads.
  4. Assuming items will be in cache immediately after storing them.
  5. Storing entire collection with nested objects.
  6. Storing parent-child objects together and also separately.
  7. Caching configuration settings.
  8. Caching live objects that have open handles to streams, files, the registry, or the network.
  9. Storing same item using multiple keys.
  10. Not updating or deleting items in cache after updating or deleting them on persistent storage.

Let’s see what they are and how to avoid them.

I am assuming you have been using ASP.NET Cache or Enterprise Library Cache for a while and were satisfied with it, but now you need more scalability and have therefore moved to an out-of-process or distributed cache like Velocity or memcached. After that, things started to fall apart, so the common mistakes listed below apply to you.

Relying on .NET’s default serializer

When you use an out-of-process caching solution like Velocity or memcached, the cached items live in a separate process from your application. Every time you add an item to the cache, the client library serializes the item into a byte array and sends that array to the cache server to store. Similarly, when you get an item from the cache, the cache server sends back the byte array and the client library deserializes it into the target object. .NET's default serializer is not optimal, because it relies on reflection, which is CPU-intensive. As a result, storing items in and getting items from the cache adds high serialization and deserialization overhead and drives up CPU usage, especially when you cache complex types. This CPU cost is paid by your application, not by the cache server. So you should always use one of the better serialization approaches, so that the CPU spent on serialization and deserialization is minimized. I personally prefer the approach where you serialize and deserialize the properties yourself by implementing the ISerializable interface and its deserialization constructor:

using System;
using System.Runtime.Serialization;

[Serializable]
public class Customer : ISerializable
{
    public string FirstName;
    public string LastName;
    public int Salary;
    public DateTime DateOfBirth;

    public Customer()
    {
    }

    // Deserialization constructor: the formatter calls this to rebuild the
    // object, reading each value back by the name it was stored under.
    public Customer(SerializationInfo info, StreamingContext context)
    {
        FirstName = info.GetString("FirstName");
        LastName = info.GetString("LastName");
        Salary = info.GetInt32("Salary");
        DateOfBirth = info.GetDateTime("DateOfBirth");
    }

    #region ISerializable Members

    // Writes each field explicitly, so the formatter does not have to
    // discover the fields through reflection.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("FirstName", FirstName);
        info.AddValue("LastName", LastName);
        info.AddValue("Salary", Salary);
        info.AddValue("DateOfBirth", DateOfBirth);
    }

    #endregion
}

This keeps the formatter from using reflection to discover what to serialize. With large objects, this approach can sometimes be as much as 100 times faster than the default implementation. So I strongly recommend that, at least for the objects you cache, you always implement your own serialization and deserialization code instead of letting .NET use reflection to figure out what to serialize.
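You can push the same principle further by writing each field yourself with `BinaryWriter` and reading it back with `BinaryReader`. This is a different technique from the `ISerializable` approach above and is not the author's code. It involves no reflection at all, and it also avoids `BinaryFormatter`, which recent .NET versions mark as obsolete. The `CustomerData` and `CustomerCodec` names are hypothetical, and how you plug such a codec into a cache client depends on that client.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical plain data class with the same fields as Customer above.
public class CustomerData
{
    public string FirstName;
    public string LastName;
    public int Salary;
    public DateTime DateOfBirth;
}

// Hypothetical codec: fields are written and read in a fixed order,
// so neither reflection nor BinaryFormatter is involved.
public static class CustomerCodec
{
    public static byte[] Serialize(CustomerData customer)
    {
        using var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            // BinaryWriter.Write(string) rejects null, so null names become "".
            writer.Write(customer.FirstName ?? string.Empty);
            writer.Write(customer.LastName ?? string.Empty);
            writer.Write(customer.Salary);
            // ToBinary keeps both the ticks and the DateTimeKind.
            writer.Write(customer.DateOfBirth.ToBinary());
        }
        return stream.ToArray();
    }

    public static CustomerData Deserialize(byte[] bytes)
    {
        using var reader = new BinaryReader(new MemoryStream(bytes), Encoding.UTF8);
        // Object initializers run in source order, matching the write order.
        return new CustomerData
        {
            FirstName = reader.ReadString(),
            LastName = reader.ReadString(),
            Salary = reader.ReadInt32(),
            DateOfBirth = DateTime.FromBinary(reader.ReadInt64())
        };
    }
}
```

The write order in `Serialize` and the read order in `Deserialize` must match exactly. Adding a field later means versioning the format, which is the price of skipping reflection.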

Storing large objects in a single cache item

Sometimes we think large objects should be cached because they are too expensive to fetch from the source. For example, you might think caching a 1 MB object graph gives better performance than loading that object graph from a file or database. You would be surprised how poorly that scales. It will certainly be a lot faster than loading the same thing from the database when you have only one request at a time. But under concurrent load, frequent access to that large object graph will drive up the server's CPU, because an out-of-process cache has high serialization and deserialization overhead. Every time you get a 1 MB object graph from an out-of-process cache, significant CPU is spent rebuilding that object graph in memory.

var largeObjectGraph = myCache.Get("LargeObjectGraph");
var anItem = largeObjectGraph.FirstLevel.SecondLevel.ThirdLevel.FourthLevel.TheItemWeNeed;

The solution is not to cache the large object graph as a single item under a single key. Instead, break the large object graph into smaller items and cache those smaller items individually. Then retrieve from the cache only the smallest item you need.

// store smaller parts in cache as individual item
var largeObjectGraph = new VeryLargeObjectGraph();
myCache.Add("LargeObjectGraph.FirstLevel.SecondLevel.ThirdLevel", 
  largeObjectGraph.FirstLevel.SecondLevel.ThirdLevel);
...
...
// get the smaller parts from cache
var thirdLevel = myCache.Get("LargeObjectGraph.FirstLevel.SecondLevel.ThirdLevel");
var anItem = thirdLevel.FourthLevel.TheItemWeNeed;
   

The idea is to look at the items you need most frequently from the large object (say, the connection strings from a configuration object graph) and store those items separately in the cache. Always keep the items you retrieve from the cache small, say 8 KB at most.
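The sketch below makes the cost visible. It uses a hypothetical `SerializingCache` that keeps only serialized bytes, as an out-of-process cache does. JSON via `System.Text.Json` stands in for whatever wire format a real client uses. The configuration graph is split into separately cached parts, so reading the connection string deserializes a few bytes instead of the whole graph. `AppSettings`, `ConnectionSettings` and `SettingsCache` are illustrative names.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in for an out-of-process cache: it keeps only
// serialized bytes, so every Get deserializes the whole stored value.
public class SerializingCache
{
    private readonly Dictionary<string, byte[]> _store = new Dictionary<string, byte[]>();

    // Size of the last value read, to make the deserialization cost visible.
    public int LastReadSize { get; private set; }

    public void Set<T>(string key, T value) =>
        _store[key] = JsonSerializer.SerializeToUtf8Bytes(value);

    public T Get<T>(string key) where T : class
    {
        if (!_store.TryGetValue(key, out var bytes))
            return null;
        LastReadSize = bytes.Length;
        return JsonSerializer.Deserialize<T>(bytes);
    }
}

public class ConnectionSettings
{
    public string Main { get; set; }
}

public class AppSettings
{
    public ConnectionSettings Connections { get; set; }
    // Bulky data that most requests never touch.
    public List<string> Translations { get; set; }
}

public static class SettingsCache
{
    // Cache each part of the graph under its own key instead of the whole graph.
    public static void Store(SerializingCache cache, AppSettings settings)
    {
        cache.Set("AppSettings.Connections", settings.Connections);
        cache.Set("AppSettings.Translations", settings.Translations);
    }

    // Read back only the small part this request needs.
    public static string GetMainConnectionString(SerializingCache cache) =>
        cache.Get<ConnectionSettings>("AppSettings.Connections")?.Main;
}
```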

Using cache to share objects between multiple threads

Since you can access the cache from multiple threads, it is tempting to use it as a convenient way to pass data between threads. But the cache, like static variables, can suffer from race conditions. Races are even more common when the cache is distributed: storing and reading an item requires out-of-process communication, so your threads have a much better chance of overlapping than with an in-memory cache. In the following example, an in-memory cache rarely shows the race condition, but an out-of-process cache almost always does:

// myCache is assumed to return object from its indexer, hence the (int) casts.
myCache["SomeItem"] = 0;

var thread1 = new Thread(new ThreadStart(() =>
{
    var item = (int)myCache["SomeItem"]; // Most likely 0
    item++;
    myCache["SomeItem"] = item;
}));
var thread2 = new Thread(new ThreadStart(() =>
{
    var item = (int)myCache["SomeItem"]; // Most likely 1
    item++;
    myCache["SomeItem"] = item;
}));
var thread3 = new Thread(new ThreadStart(() =>
{
    var item = (int)myCache["SomeItem"]; // Most likely 2
    item++;
    myCache["SomeItem"] = item;
}));

thread1.Start();
thread2.Start();
thread3.Start();
.
.
.

With an in-memory cache, the code above usually behaves as the comments predict. But once you go out-of-process or distributed, it will almost always deviate from that expected behavior. You need some kind of locking here. Some caching providers let you lock an item: Velocity, for example, has a locking feature, but memcached does not. In Velocity, you can lock an item like this:

// get an item and lock it
DataCacheLockHandle handle;
SomeClass someItem = _defaultCache.GetAndLock("SomeItem", 
   TimeSpan.FromSeconds(1), out handle, true) as SomeClass;
// update an item
someItem.FirstName = "Version2";
// put it back and get the new version
DataCacheItemVersion version2 = _defaultCache.PutAndUnlock("SomeItem", 
    someItem, handle);

You can use locking to reliably read and write to cache items that get changed by multiple threads.
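When a cache offers no locks, the usual alternative is optimistic concurrency. You read the value, compute the new one, write it only if nobody changed it in between, and retry otherwise. memcached supports this through its `gets` and `cas` commands. The sketch below shows the retry loop against an in-process `ConcurrentDictionary`, whose `TryUpdate(key, newValue, comparisonValue)` performs the compare-and-swap. `OptimisticCounter` is a hypothetical name, and a real client library would expose its own CAS call instead.

```csharp
using System.Collections.Concurrent;
using System.Threading;

public static class OptimisticCounter
{
    // Increments the value under 'key' without locks: if another thread
    // changed the value between our read and our write, re-read and retry.
    public static void Increment(ConcurrentDictionary<string, int> cache, string key)
    {
        while (true)
        {
            int current = cache.GetOrAdd(key, 0);
            if (cache.TryUpdate(key, current + 1, current))
                return; // nobody else wrote in between
            // Lost the race: loop and try again with the fresh value.
        }
    }

    // The three-thread scenario from the article, repeated many times per thread.
    public static int RunDemo(int threads, int incrementsPerThread)
    {
        var cache = new ConcurrentDictionary<string, int>();
        cache["SomeItem"] = 0;
        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(() =>
            {
                for (int i = 0; i < incrementsPerThread; i++)
                    Increment(cache, "SomeItem");
            });
            workers[t].Start();
        }
        foreach (var worker in workers)
            worker.Join();
        return cache["SomeItem"];
    }
}
```

No increment is lost, because a write only succeeds against the exact value it was computed from.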

Assuming items will be in cache immediately after storing it

Sometimes you store an item in cache on a submit button click and assume that upon the page postback the item can be read from cache because it was just stored in cache. You are wrong.

private void SomeButton_Clicked(object sender, EventArgs e)
{
  myCache["SomeItem"] = someItem;
}

protected override void OnPreRender(EventArgs e)
{
  base.OnPreRender(e);
  var someItem = myCache["SomeItem"]; // It's gone dude!
  Render(someItem);
}

You can never assume an item will be in the cache for sure, even if you store it on one line and read it back two lines later. When your application is under pressure and physical memory is scarce, the cache evicts items that aren't frequently used, so by the time the reading line runs, the item may already be gone. Always check for null and fall back to the persistent store:

var someItem = myCache["SomeItem"] as SomeClass ?? GetFromSource();

You should always use this pattern when reading an item from the cache.
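You can wrap the null-check-and-reload pattern in a small helper so it is applied the same way everywhere. This is a sketch against a hypothetical `ISimpleCache` interface with a dictionary-backed implementation. Real cache clients use their own method names, but the shape of `GetOrLoad` carries over.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal cache abstraction.
public interface ISimpleCache
{
    object Get(string key);          // returns null when the item is missing
    void Set(string key, object value);
    void Remove(string key);
}

public class DictionaryCache : ISimpleCache
{
    private readonly Dictionary<string, object> _items = new Dictionary<string, object>();
    public object Get(string key) => _items.TryGetValue(key, out var value) ? value : null;
    public void Set(string key, object value) => _items[key] = value;
    public void Remove(string key) => _items.Remove(key);
}

public static class CacheExtensions
{
    // Returns the cached item if present; otherwise loads it from the
    // source, puts it back in the cache, and returns it.
    public static T GetOrLoad<T>(this ISimpleCache cache, string key, Func<T> loadFromSource)
        where T : class
    {
        if (cache.Get(key) is T cached)
            return cached;
        T fresh = loadFromSource();
        if (fresh != null)
            cache.Set(key, fresh);
        return fresh;
    }
}
```

Putting the freshly loaded value back into the cache on a miss means the next reader will usually get a cache hit again.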

Storing entire collection with nested objects

Sometimes you store an entire collection in a single cache item because you need to access the items in the collection frequently. Thus every time you try to read an item from the collection, you have to load the collection first and then read that particular item. Something like this:

var products = myCache.Get("Products");
var product = products[1];

This is inefficient. You are unnecessarily loading an entire collection just to read a certain item. You will have absolutely no problem when the cache is in-memory, as the cache will just store a reference to the collection. But in a distributed cache, where the entire collection is deserialized every time you access it, it will result in poor performance. Instead of caching a whole collection, you should cache individual items separately.

// store individual items in cache
foreach (Product product in products)
  myCache.Add("Product." + product.Index, product);
...
...
// read the individual item from cache
var product = myCache.Get("Product.0");

The idea is simple: store each item in the collection individually, under a key that is easy to guess, for example one that uses the item's index as a suffix.
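Here is one way to apply the per-item scheme. It assumes a hypothetical `Product` type with an `Id`, and a dictionary of JSON strings stands in for the remote cache. It keys each product by its ID rather than by its position in the list. That is a small deviation from the index-based keys above, chosen so the keys stay valid if the collection is reordered.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ProductCache
{
    // Predictable key: "Product." followed by the product's ID.
    public static string KeyFor(int id) => "Product." + id;

    // Stores every product as its own entry instead of one big list.
    public static void StoreAll(IDictionary<string, string> cache, IEnumerable<Product> products)
    {
        foreach (var product in products)
            cache[KeyFor(product.Id)] = JsonSerializer.Serialize(product);
    }

    // Deserializes only the product asked for; null if it is not cached.
    public static Product Get(IDictionary<string, string> cache, int id) =>
        cache.TryGetValue(KeyFor(id), out var json)
            ? JsonSerializer.Deserialize<Product>(json)
            : null;
}
```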

Storing parent-child objects together and also separately

Sometimes you store an object in the cache that has a child object, and you also store that child object separately in another cache item. For example, say you have a customer object that has an order collection. When you cache the customer, the order collection gets cached as well. But you also cache the individual orders separately. So when an individual order is updated in the cache, the copy of that order inside the customer object's order collection is not updated, which gives you inconsistent results. Again, this works fine with an in-memory cache but fails when your cache is made out-of-process or distributed.

var customer = SomeCustomer();
var recentOrders = SomeOrders();
customer.Orders = GetCustomerOrders();
myCache.Add("RecentOrders", recentOrders);
myCache.Add("Customer", customer);
...
...
var recentOrders = myCache.Get("RecentOrders");
var order = recentOrders["ORDER10001"];
order.Status = CANCELLED; 
...
...
...
var customer = myCache.Get("Customer");
var order = customer.Orders["ORDER10001"];
order.Status = PROCESSING; // Inconsistent. The order has already been cancelled

This is a hard problem to solve. It requires careful design so that you never end up storing the same object twice in the cache. One common approach is to not store child objects in the cache along with the parent, but to store the keys of the child objects instead, so the children can be retrieved from the cache individually. So, in the above scenario, you would not store the customer's order collection in the cache. Instead, you store the collection of OrderIDs with the Customer, and when you need to see a customer's orders, you load the individual order objects using those OrderIDs.

var recentOrders = SomeOrders();
foreach (Order order in recentOrders)
   myCache.Add("Order." + order.ID, order);
...
var customer = SomeCustomer();
customer.OrderKeys = GetCustomerOrders(); // Store keys only
myCache.Add("Customer", customer);
...
...
var order = myCache.Get("Order.10001");
order.Status = CANCELLED; 
...
...
...
var customer = myCache.Get("Customer");
// Build an ID -> Order lookup from the stored keys (ToDictionary needs System.Linq)
var customerOrders = customer.OrderKeys.ToDictionary(
   key => key, key => myCache.Get("Order." + key));
var order = customerOrders["10001"]; // Correct object from cache

This approach ensures that a certain instance of an entity is stored in the cache only once, no matter how many times it appears in collections or parent objects.

Caching configuration settings

Sometimes you cache configuration settings, with some expiration logic to make sure the configuration is refreshed periodically or whenever the configuration file or database table changes. Since configuration settings are accessed very frequently, reading them from the cache adds significant CPU overhead. Instead, you should just use static variables to store the configuration.

var connectionString = myCache.Get("Configuration.ConnectionString");

You should not follow such an approach. Getting an item from the cache is not cheap. It may not be as expensive as reading from a file or the registry, but it is not very cheap either, especially if the item is a custom class that adds serialization overhead. So you should store the configuration settings in static variables instead. But, you might ask, how do you refresh the configuration without restarting the AppDomain when it is stored in a static variable? You can use change-detection logic, such as a file watcher that reloads the configuration when the configuration file changes, or database polling that checks for updates.
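Here is a minimal sketch of that idea, assuming a plain `Name=Value` text file. The settings live in a static field, and a `FileSystemWatcher` reloads them when the file is written. `AppConfig` is a hypothetical name and the file format is an assumption. A database-backed configuration would replace the watcher with a polling timer.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical holder for configuration kept in a static field.
public static class AppConfig
{
    // Replaced as a whole on reload and never mutated afterwards,
    // so concurrent readers need no lock.
    private static Dictionary<string, string> _settings = new Dictionary<string, string>();
    private static FileSystemWatcher _watcher;

    public static string Get(string name) =>
        Volatile.Read(ref _settings).TryGetValue(name, out var value) ? value : null;

    // Parses "Name=Value" lines (an assumed file format) into a fresh snapshot.
    public static void Reload(string path)
    {
        var fresh = new Dictionary<string, string>();
        foreach (var line in File.ReadAllLines(path))
        {
            int eq = line.IndexOf('=');
            if (eq > 0)
                fresh[line.Substring(0, eq).Trim()] = line.Substring(eq + 1).Trim();
        }
        Volatile.Write(ref _settings, fresh);
    }

    // Loads once, then reloads whenever the file is written.
    public static void Watch(string path)
    {
        string fullPath = Path.GetFullPath(path);
        Reload(fullPath);
        _watcher = new FileSystemWatcher(Path.GetDirectoryName(fullPath), Path.GetFileName(fullPath));
        _watcher.NotifyFilter = NotifyFilters.LastWrite;
        _watcher.Changed += (sender, e) =>
        {
            try { Reload(fullPath); }
            catch (IOException) { /* file still being written: keep the old snapshot */ }
        };
        _watcher.EnableRaisingEvents = true;
    }
}
```

Readers pay only for a dictionary lookup. Each reload builds a new dictionary and swaps it in whole, so a request never sees a half-loaded configuration.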

Caching live objects that have open file, registry or network handles

I have seen developers cache instances of classes that hold an open connection to a file, the registry, or an external network resource. This is dangerous. Items are not disposed automatically when they are removed from the cache, and unless you dispose such an instance, you leak system resources. Every time such an instance is removed from the cache, through expiration or for some other reason, without being disposed, it leaks the resources it was holding onto.

You should never cache objects that hold open streams, file handles, registry handles, or network connections just to save reopening the resource every time you need it. Instead, use a static variable or an in-memory cache that is guaranteed to give you an expiration callback, so that you can dispose of them properly. Out-of-process caches and session stores do not give you expiration callbacks consistently, so never store live objects there.
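If you do keep such objects in memory, the cache has to take responsibility for disposing them. The hypothetical `DisposingCache` below disposes any `IDisposable` value when its entry is replaced or removed, or when the cache itself is disposed. It models only explicit removal. A real in-memory cache would run the same disposal code from its expiration callback.

```csharp
using System;
using System.Collections.Generic;

// In-process cache that owns IDisposable values: whenever an entry is
// replaced, removed, or the cache is disposed, the old value is disposed.
public sealed class DisposingCache : IDisposable
{
    private readonly Dictionary<string, object> _items = new Dictionary<string, object>();
    private readonly object _gate = new object();

    public void Set(string key, object value)
    {
        object old;
        lock (_gate)
        {
            _items.TryGetValue(key, out old);
            _items[key] = value;
        }
        // Dispose outside the lock so slow cleanup does not block readers.
        if (!ReferenceEquals(old, value))
            (old as IDisposable)?.Dispose();
    }

    public object Get(string key)
    {
        lock (_gate)
        {
            return _items.TryGetValue(key, out var value) ? value : null;
        }
    }

    public void Remove(string key)
    {
        object old;
        lock (_gate)
        {
            if (!_items.TryGetValue(key, out old))
                return;
            _items.Remove(key);
        }
        (old as IDisposable)?.Dispose();
    }

    public void Dispose()
    {
        List<object> all;
        lock (_gate)
        {
            all = new List<object>(_items.Values);
            _items.Clear();
        }
        foreach (var item in all)
            (item as IDisposable)?.Dispose();
    }
}
```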

Storing same item using multiple keys

Sometimes you store objects in the cache both by key and by index, because you not only need to retrieve items by key but also need to iterate through items by index. For example:

var someItem = new SomeClass();
myCache["SomeKey"] = someItem;
.
.
myCache["SomeItem." + index] = someItem;
.
.

If you are using in-memory cache, the following code will work fine:

var someItem = myCache["SomeKey"];
someItem.SomeProperty = "Hello";
.
.
.
var someItem = myCache["SomeItem." + index];
var hello = someItem.SomeProperty; // Returns Hello, fine, when In-memory cache
/* But fails when out of process cache */

The above code works with an in-memory cache because both cache entries refer to the same instance of the object, so no matter which key you use, you always get back that same instance. But an out-of-process cache, especially a distributed cache, stores items after serializing them. Items are not stored by reference: you always store a copy of the item, never the item itself. So when you retrieve an item by key, you get a freshly made copy, because the item is deserialized and created anew every time you get it from the cache. As a result, changes made to the object are never reflected in the cache unless you overwrite the item in the cache after making the changes. So, in a distributed cache, you have to do the following:

var someItem = myCache["SomeKey"];
someItem.SomeProperty = "Hello";
myCache["SomeKey"] = someItem; // Update cache
myCache["SomeItem." + index] = someItem; // Update all other entries
.
.
.
var someItem = myCache["SomeItem." + index];
var hello = someItem.SomeProperty; // Now it works in out-of-process cache

Once you overwrite every cache entry with the modified item, it works, because each entry now holds a new copy of the updated item.
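The copy semantics are easy to reproduce in-process. The sketch below uses a hypothetical `CopyingCache` that stores JSON strings via `System.Text.Json`, so every `Get` returns a new instance, just as a distributed cache does. `SomeClass` is a tiny stand-in for the class used in the snippets above. Writing back under one key while leaving the other key untouched produces exactly the stale entry described above.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public class SomeClass
{
    public string SomeProperty { get; set; }
}

// Hypothetical stand-in for an out-of-process cache: values are kept as
// serialized JSON, so each Get hands back a new copy, never a shared instance.
public class CopyingCache
{
    private readonly Dictionary<string, string> _store = new Dictionary<string, string>();

    public T Get<T>(string key) where T : class =>
        _store.TryGetValue(key, out var json) ? JsonSerializer.Deserialize<T>(json) : null;

    public void Set<T>(string key, T value) => _store[key] = JsonSerializer.Serialize(value);
}
```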

Not updating or deleting objects from cache when items are updated or deleted from data source

This again works in in-memory cache, but fails when you go to out-of-process/distributed cache. Here’s an example:

var someItem = myCache["SomeItem"];
someItem.SomeProperty = "Hello Changed";
database.Update(someItem);
.
.
.
var someItem = myCache["SomeItem"];
Console.WriteLine(someItem.SomeProperty); // "Hello Changed"? Nope.

This works fine with an in-memory cache but fails with an out-of-process or distributed cache. The reason is that you changed the object but never updated the cache with the latest version. Items in the cache are stored as copies, not as the original instance.

Another mistake is not deleting items from cache when the item is deleted from database.

var someItem = myCache["SomeItem"];
database.Delete(someItem);
.
.
.
var someItem = myCache["SomeItem"];
Console.WriteLine(someItem.SomeProperty); // Works fine. Oops!

When you delete an item from the database, a file, or some other persistent store, don't forget to delete it from the cache as well, under every key it has been stored with.
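One way to make this hard to forget is to route every write and delete through a single place that knows all the keys an item is cached under. The hypothetical `ItemRepository` below uses dictionaries in place of a real database and cache. It caches each item by ID and by SKU, with the SKU assumed never to change, and after each database operation it refreshes or removes every one of those keys.

```csharp
using System.Collections.Generic;

public class Item
{
    public int Id { get; set; }
    public string Sku { get; set; }   // assumed immutable, so its key never moves
    public string Name { get; set; }
}

public class ItemRepository
{
    private readonly Dictionary<int, Item> _database = new Dictionary<int, Item>(); // stands in for real storage
    public Dictionary<string, Item> Cache { get; } = new Dictionary<string, Item>(); // stands in for the cache

    // Every key this item is cached under.
    private static string[] KeysFor(Item item) =>
        new[] { "Item." + item.Id, "ItemBySku." + item.Sku };

    // Insert or update: persist first, then refresh every cached copy.
    public void Save(Item item)
    {
        _database[item.Id] = item;
        foreach (var key in KeysFor(item))
            Cache[key] = item;
    }

    // Delete from storage, then remove the item under every key.
    public void Delete(int id)
    {
        if (!_database.TryGetValue(id, out var existing))
            return;
        _database.Remove(id);
        foreach (var key in KeysFor(existing))
            Cache.Remove(key);
    }
}
```

A common variation is to remove the cached keys on update too and let the next read reload from the database. That avoids writing a stale copy back into the cache if two updates race.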

Conclusion

Caching requires careful planning and a clear understanding of the data being cached. Otherwise, once the cache is made distributed, your code may not only perform worse but also fail outright. Keeping these common mistakes in mind while caching will help you cash out from your code.

 
