Java Reference Objects or How I Learned to Stop Worrying and Love OutOfMemoryError

最新推荐文章于 2023-11-23 11:07:37 发布

bithe


http://kdgregory.com/index.php?page=java.refobj

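The four kinds of object references

1, strong: the ordinary kind of reference.

2, soft: the JVM holds on to the object as long as it can and reclaims it before throwing OutOfMemoryError. If there still isn't enough space after reclaiming, OutOfMemoryError is thrown.

3, weak: once the object is no longer strongly or softly reachable, it is reclaimed as soon as the garbage collector gets to it.

4, phantom: you can't obtain the object through a phantom reference. All you get is a notification when the object is collected, so that you can do any related cleanup.

The Java heap and object life cycle

For C++ programmers new to Java, the relationship between the heap and the stack can be hard to understand. In C++, an object is created on the heap when you use the new operator, or allocated automatically on the stack. For example, the following statement creates an Integer object on the stack in C++, but is a syntax error in Java: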

Integer foo = Integer(1);

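Unlike C++, Java stores all objects on the heap, and you must use the new operator to create them. Local variables are still stored on the stack, but they hold a pointer to the object rather than the object itself (these pointers are called "references"). The following Java method has an Integer variable that refers to a value parsed from a String: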

public static void foo(String bar)
{
    Integer baz = new Integer(bar);
}

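The relationship between the heap and the stack for this method is as follows. The stack is divided into "frames", each holding the local variables and parameters of one method in the call chain. All of the objects that those variables point to live in the heap.

The first line of foo() creates a new Integer object. Behind the scenes, the JVM first tries to find enough space for the object in the heap. If it can allocate the space, it calls the constructor to build the Integer object. Finally, the JVM stores a pointer to the newly allocated object in the variable baz. However, the JVM can't always find enough space for new objects. Before it throws OutOfMemoryError, it runs the garbage collector to reclaim space occupied by garbage.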

Garbage Collector

The garbage collector goes to work when the program tries to create a new object and there isn't enough space for it in the heap. The requesting thread is suspended while the collector looks through the heap, trying to find objects that are no longer actively used by the program, and reclaiming their space. If the collector is unable to free up enough space, and the JVM is unable to expand the heap, the new operator fails with an OutOfMemoryError. This is normally followed by your application shutting down.

Mark-Sweep: the idea behind mark-sweep garbage collection is simple: starting from the roots, every object that the program can't reach is garbage, and is eligible for collection. So what are the "roots"? In a simple Java application, they're method arguments and local variables (stored on the stack), the operands of the currently executing expression (also stored on the stack), and static class member variables.

In programs that use their own classloaders, such as app-servers, the picture gets muddy: only classes loaded by the system classloader (the loader used by the JVM when it starts) contain root references. Any classloaders that the application creates are themselves subject to collection, once there are no more references to them. This is what allows app-servers to hot-deploy: they create a separate classloader for each deployed application, and let go of the classloader reference when the application is undeployed or redeployed.

Finalizers

C++ allows objects to define a destructor method: when the object goes out of scope or is explicitly deleted, its destructor is called to clean up the resources it used. For most objects, this means explicitly releasing any memory the object allocated with new or malloc. In Java, the garbage collector handles memory cleanup for you, so there's no need for an explicit destructor to do this.

However, memory isn't the only resource that might need to be cleaned up. Consider FileOutputStream: when you create an instance of this object, it allocates a file handle from the operating system. If you let all references to the stream go out of scope before closing it, what happens to that file handle? The answer is that the stream has a finalizer method: a method that's called by the JVM just before the garbage collector reclaims the object. In the case of FileOutputStream, the finalizer closes the stream, which releases the file handle back to the operating system -- and also flushes any buffers, ensuring that all data is properly written to disk.

While finalizers seem like an easy way to clean up after yourself, they do have some serious limitations. First, you should never rely on them for anything important, since an object's finalizer may never be called -- the application might exit before the object is eligible for garbage collection. There are some other, more subtle problems with finalizers, but I'll hold off on these until we get to phantom references.
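Because a finalizer may never run, the usual alternative is to release the resource explicitly. The following is only a minimal sketch of that idea: TrackedResource and ExplicitCloseDemo are hypothetical names invented for illustration (they are not from the original article), and the sketch assumes Java 7 or later for try-with-resources.

```java
// Hypothetical resource used only for illustration; it stands in for
// something like a stream that owns a file handle.
class TrackedResource implements AutoCloseable {
    private boolean closed = false;

    void write(String data) {
        if (closed) {
            throw new IllegalStateException("resource is closed");
        }
        // a real resource would write to its file handle here
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;   // deterministic cleanup, no garbage collector involved
    }
}

class ExplicitCloseDemo {
    // Uses a resource inside try-with-resources and returns it afterwards,
    // so the caller can observe that it has been closed.
    static TrackedResource useAndClose() {
        TrackedResource res = new TrackedResource();
        try (TrackedResource r = res) {
            r.write("hello");
        }   // close() runs here, even if write() throws
        return res;
    }

    public static void main(String[] args) {
        System.out.println("closed: " + useAndClose().isClosed());
    }
}
```

With this pattern the cleanup happens at a known point in the program, rather than whenever (or if) the collector gets around to the object.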

Softly reachable (referent of a SoftReference): the JVM tries to preserve the object as long as possible, but will collect it before throwing OutOfMemoryError.

Weakly reachable (referent of a WeakReference): the garbage collector is free to collect the object at any time, with no attempt to preserve it. In practice, the object will be collected during a major collection, but may survive a minor collection.

Phantom reachable (referent of a PhantomReference): the object has already been selected for collection and its finalizer (if any) has run. The term "reachable" is really a misnomer in this case, as there's no way for you to access the actual object, but it's the terminology that the API docs use.
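To make the three reference types concrete, here is a small sketch (the class name ReferenceKindsDemo is made up for this example). It only shows what get() returns while the referent is still strongly reachable; it does not try to demonstrate collection, since that depends on when the collector runs.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKindsDemo {
    // Returns three flags: soft.get() is the referent, weak.get() is the
    // referent, and phantom.get() is null.
    static boolean[] probe() {
        Object referent = new Object();   // ordinary strong reference
        ReferenceQueue<Object> queue = new ReferenceQueue<>();

        SoftReference<Object> soft = new SoftReference<>(referent);
        WeakReference<Object> weak = new WeakReference<>(referent);
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);

        return new boolean[] {
            soft.get() == referent,   // still strongly reachable, so not cleared
            weak.get() == referent,   // same for the weak reference
            phantom.get() == null     // a phantom reference never hands out its referent
        };
    }

    public static void main(String[] args) {
        for (boolean flag : probe()) {
            System.out.println(flag);
        }
    }
}
```

Once the strong reference is gone, soft.get() and weak.get() may start returning null at some later point, so code that uses them has to check the result every time.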

Phantom references

The trouble with finalizers
1, a finalizer might never be invoked. If your program never runs out of available memory, then the garbage collector won't run, and neither will your finalizers. This usually isn't an issue with long-running (eg, server) applications, but short-running programs may finish without ever running garbage collection. And while there is a way to tell the JVM to run finalizers before the program exits, this is unreliable and may conflict with other shutdown hooks.

2, finalizers can create another strong reference to an object, for example by adding the object to a collection. This essentially resurrects the object, but, as with Stephen King's Pet Sematary, the returned object "isn't quite right". In particular, its finalizer won't run when the object is again eligible for collection. Perhaps there's a reason that you would use this resurrection trick, but I can't imagine it, and would look very dimly on code that did.

Now that those are out of the way, I believe the real problem with finalizers is that they introduce a gap between the time that the garbage collector first identifies an object for collection and the time that its memory is actually reclaimed, because finalization happens on its own thread, independent of the garbage collector's thread. The JVM is guaranteed to perform a full collection before it throws OutOfMemoryError, but if all objects eligible for collection have finalizers, then the collection will have no effect: those objects remain in memory awaiting finalization. Throw in the fact that a standard JVM only has a single thread to handle finalization for all objects, and some long-running finalizers, and you can see where issues might arise.

The phantom knows

Phantom references allow the application to learn when an object is no longer used, so that the application can clean up the object's non-memory resources. Unlike finalizers, however, the object itself has already been collected by the time the application learns this.

Also unlike finalizers, cleanup is scheduled by the application, not the garbage collector. You might dedicate one or more threads to cleanup, perhaps increasing the number if the number of objects demands it. An alternative -- and often simpler -- approach is to use an object factory, and clean up after any collected instances before creating a new one.

The key point to understand about phantom references is that you can't use the reference to access the object: get() always returns null, even if the object is still strongly reachable. This means that the referent can't hold the sole reference to the resources to be cleaned up. Instead, you must maintain at least one other strong reference to those resources, and use a reference queue to signal that the referent has been collected. As with other reference types, your program must also hold a strong reference to the reference object itself, or it will be collected and the resources leaked.
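The pattern described above (a strong reference to the resource, a strong reference to the reference object, and a reference queue as the signal) can be sketched roughly as follows. PhantomCleaner and its methods are hypothetical names made up for this illustration, not part of the original article or of the JDK.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical helper: runs a cleanup action after an owner object is collected.
class PhantomCleaner {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    // Holds strong references to the reference objects and to the cleanup
    // actions. Without it, the phantom references could themselves be
    // collected and would never be enqueued.
    private final Map<Reference<?>, Runnable> pending = new IdentityHashMap<>();

    // The cleanup action must not refer to 'owner', or the owner would
    // never become unreachable.
    synchronized Reference<?> register(Object owner, Runnable cleanup) {
        PhantomReference<Object> ref = new PhantomReference<>(owner, queue);
        pending.put(ref, cleanup);
        return ref;
    }

    // Runs the cleanup for every reference currently on the queue and
    // returns how many cleanups ran. The application decides when to call this.
    synchronized int drain() {
        int count = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Runnable cleanup = pending.remove(ref);
            if (cleanup != null) {
                cleanup.run();
                count++;
            }
        }
        return count;
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

In a test, calling enqueue() on the returned reference simulates what the collector would do, which makes the bookkeeping checkable without depending on when garbage collection actually happens.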

Implementing a connection pool with phantom references

Database connections are one of the most precious resources in any application: they take time to establish, and database servers place strict limits on the number of simultaneous open connections that they'll accept. For all that, programmers are remarkably careless with them, sometimes opening a new connection for every query and either forgetting to close it or not closing it in a finally block.

Rather than allow the application to open direct connections to the database, most application server deployments use a connection pool: the pool maintains a (normally fixed) set of open connections, and hands them to the program as needed. Production-quality pools provide several ways to prevent connection leaks, including timeouts (to identify queries that run excessively long) and recovery of connections that are left for the garbage collector.

This latter feature serves as a great example of phantom references. To make it work, the Connection objects that the pool provides are just wrappers around an actual database connection. They can be collected without losing the database connection because the pool maintains its own strong reference to the actual connection. The pool associates a phantom reference with the "wrapper" connection, and returns the actual connection to the pool if and when that reference ends up on a reference queue.

The least interesting part of the pool is the PooledConnection, shown below. As I said, it's a wrapper that delegates calls to the actual connection. One twist is that I used a reflection proxy for the implementation. The JDBC interface has evolved with each version of Java, in ways that are neither forward nor backward compatible; if I had used a concrete implementation, you wouldn't be able to compile the demo unless you used the same JDK version that I did. The reflection proxy solves this problem, and also makes the code quite a bit shorter.

public class PooledConnection implements InvocationHandler
{
	private ConnectionPool _pool;
	private Connection _cxt;

	public PooledConnection(ConnectionPool pool, Connection cxt)
	{
		_pool = pool;
		_cxt = cxt;
	}

	private Connection getConnection()
	{
		try
		{
			if( (_cxt == null) || _cxt.isClosed() )
				throw new RuntimeException("Connection is closed");
		}catch(SQLException ex)
		{
			throw new RuntimeException("unable to determine if underlying Connection is open", ex);
		}
		return _cxt;
	}

	public static Connection newInstance(ConnectionPool pool, Connection cxt)
	{
		return (Connection)Proxy.newProxyInstance(
				PooledConnection.class.getClassLoader(),
				new Class[] {Connection.class},
				new PooledConnection(pool, cxt)
				);
	}

	@Override
	public Object invoke(Object proxy, Method method, Object[] args)
		throws Throwable
		{
			//if calling close() or isClosed(), invoke our implementation
			//otherwise, invoke the passed method on the delegate
		}

	private void close() throws SQLException
	{
		if( _cxt != null )
		{
			_pool.releaseConnection(_cxt);
			_cxt = null;
		}
	}

	private boolean isClosed() throws SQLException
	{
		return (_cxt ==null) || (_cxt.isClosed());
	}
}
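The body of invoke() is elided in the listing above. As a rough illustration of how such dispatch might look, here is a self-contained sketch that uses a made-up Handle interface in place of java.sql.Connection so that it runs without a database. Handle, HandleWrapper, and the exact dispatch rules are assumptions for this example, not the author's code.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for java.sql.Connection.
interface Handle {
    String describe();
    void close();
    boolean isClosed();
}

class HandleWrapper implements InvocationHandler {
    private Handle delegate;   // cleared by close(), like _cxt in PooledConnection

    private HandleWrapper(Handle delegate) {
        this.delegate = delegate;
    }

    static Handle wrap(Handle delegate) {
        return (Handle) Proxy.newProxyInstance(
                HandleWrapper.class.getClassLoader(),
                new Class<?>[] { Handle.class },
                new HandleWrapper(delegate));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        String name = method.getName();
        if (name.equals("close") && method.getParameterCount() == 0) {
            delegate = null;   // detach from the delegate without closing it
            return null;
        }
        if (name.equals("isClosed") && method.getParameterCount() == 0) {
            return delegate == null;
        }
        // Everything else goes to the delegate, but only while we still hold it.
        // Note that toString(), hashCode(), and equals() also arrive here.
        if (delegate == null) {
            throw new IllegalStateException("Handle is closed");
        }
        try {
            return method.invoke(delegate, args);
        } catch (InvocationTargetException ex) {
            throw ex.getCause();   // rethrow what the delegate actually threw
        }
    }
}
```

A real PooledConnection would hand the delegate back to the pool in close() instead of just dropping it, and would also ask the underlying connection whether it is closed.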

The most important thing to note is that PooledConnection has a reference to both the underlying database connection and the pool. The latter is used for applications that do remember to close the connection: we want to tell the pool right away, so that the underlying connection can be immediately reused.

The getConnection() method also deserves some mention: it exists to catch applications that attempt to use a connection after they've explicitly closed it. This could be a very bad thing if the connection has already been handed to another consumer. So close() explicitly clears the reference, and getConnection() checks this and throws if the connection is no longer valid. The invocation handler uses this method for all delegated calls.

So now let's turn our attention to the pool itself, starting with the objects it uses to manage connections.

private Queue<Connection> _pool = new LinkedList<Connection>();

private ReferenceQueue<Object> _refQueue = new ReferenceQueue<Object>();

private IdentityHashMap<Object, Connection> _ref2Cxt = new IdentityHashMap<Object, Connection>();
private IdentityHashMap<Connection, Object> _cxt2Ref = new IdentityHashMap<Connection, Object>();

Available connections are initialized when the pool is constructed and stored in _pool. We use a reference queue, _refQueue, to identify connections that have been collected. And finally, we have a bidirectional mapping between connections and references, used when returning connections to the pool.

As I've said before, the actual database connection will be wrapped in a PooledConnection before it is handed to application code. This happens in the wrapConnection() function, which is also where we create the phantom reference and the connection-reference mappings.

private synchronized Connection wrapConnection(Connection cxt)
{
	Connection wrapped = PooledConnection.newInstance(this, cxt);
	PhantomReference<Connection> ref = new PhantomReference<Connection>(wrapped, _refQueue);
	_cxt2Ref.put(cxt, ref);
	_ref2Cxt.put(ref, cxt);
	System.err.println("Acquired connection " + cxt);
	return wrapped;
}

The counterpart of wrapConnection() is releaseConnection(), and there are two variants of this function. The first is called by PooledConnection when the application code explicitly closes the connection. This is the "happy path", and it puts the connection back into the pool for later use. It also clears the mappings between connection and reference, as they're no longer needed. Note that this method has default (package) access: it's called by PooledConnection so can't be private, but is not generally accessible.

synchronized void releaseConnection(Connection cxt)
{
	Object ref = _cxt2Ref.remove(cxt);
	_ref2Cxt.remove(ref);
	_pool.offer(cxt);
	System.err.println("released connection " + cxt);
}

The other variant is called using the phantom reference; it's the "sad path", followed when the application doesn't remember to close the connection. In this case, all we've got is the phantom reference, and we need to use the mapping to retrieve the actual connection (which is then returned to the pool using the first variant).

private synchronized void releaseConnection(Reference<?> ref)
{
	Connection cxt = _ref2Cxt.remove(ref);
	if( cxt != null )
		releaseConnection(cxt);
}

There is one edge case: what happens if the reference gets enqueued after the application has called close()? This case is unlikely: when we cleared the mapping, the phantom reference should have become eligible for collection, so it wouldn't be enqueued. However, we have to consider this case, which results in the null check above: if the mapping has already been removed, then the connection has been explicitly returned and we don't need to do anything.
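The two release paths and the null check can be exercised in isolation. The sketch below is a simplified, hypothetical stand-in for the pool's bookkeeping (LeaseBook and its method names are invented here, and the type parameter R plays the role of the real database connection); it is meant to show the idea, not to reproduce the article's class.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayDeque;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical, simplified version of the pool's bookkeeping.
class LeaseBook<R> {
    private final Queue<R> available = new ArrayDeque<>();
    private final ReferenceQueue<Object> refQueue = new ReferenceQueue<>();
    private final Map<Reference<?>, R> ref2Res = new IdentityHashMap<>();
    private final Map<R, Reference<?>> res2Ref = new IdentityHashMap<>();

    // Records that 'wrapper' is the application-visible handle for 'resource'.
    synchronized Reference<?> lease(Object wrapper, R resource) {
        Reference<?> ref = new PhantomReference<Object>(wrapper, refQueue);
        res2Ref.put(resource, ref);
        ref2Res.put(ref, resource);
        return ref;
    }

    // Happy path: the application closed the wrapper explicitly.
    synchronized void releaseResource(R resource) {
        Reference<?> ref = res2Ref.remove(resource);
        if (ref != null) {
            ref2Res.remove(ref);
        }
        available.offer(resource);
    }

    // Sad path: only the reference is known. Returns false if the mapping is
    // already gone, meaning the resource was returned explicitly beforehand.
    synchronized boolean releaseReference(Reference<?> ref) {
        R resource = ref2Res.remove(ref);
        if (resource == null) {
            return false;
        }
        releaseResource(resource);
        return true;
    }

    synchronized int availableCount() {
        return available.size();
    }
}
```

Releasing by reference after an explicit release is a no-op here, which is the behavior the null check in the article's code is there to guarantee: the same connection never ends up in the pool twice.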

OK, you've seen the low-level code, now it's time for the only method that the application will call:

public Connection getConnection() throws SQLException
{
	while(true)
	{
		synchronized(this)
		{
			if( _pool.size() > 0)
				return wrapConnection(_pool.remove());
		}
		tryWaitingForGarbageCollector();
	}
}

The happy path for getConnection() is that there are connections available in _pool. In this case one is removed, wrapped, and returned to the caller. The sad path is that there aren't any connections, in which case the caller expects us to block until one becomes available. This can happen two ways: either the application closes a connection and it goes back in _pool, or the garbage collector finds one that's been abandoned, and enqueues its associated phantom reference.

Before following that path, I want to talk about synchronization. Clearly, all access to the internal data structures must be synchronized, because multiple threads may attempt to get or return connections concurrently. As long as there are connections in _pool, the synchronized code executes quickly and the chance of contention is low. However, if we have to loop until connections become available, we want to minimize the amount of time that we're synchronized: we don't want to cause a deadlock between a caller requesting a connection and another caller returning one. Thus the explicit synchronized block while checking for connections.

So, what happens if we call getConnection() and the pool is empty? This is when we examine the reference queue to find an abandoned connection.

private void tryWaitingForGarbageCollector()
{
	try
	{
		Reference<?> ref = _refQueue.remove(100);
		if( ref != null )
			releaseConnection(ref);
	}
	catch(InterruptedException ignored)
	{
		//we have to catch this exception, but it provides no information here
		// a production-quality pool might use it as part of an orderly shutdown
	}
}

This function highlights another set of conflicting goals: we don't want to waste time if there aren't any enqueued references, but we also don't want to spin in a tight loop in which we repeatedly check _pool and _refQueue. So I use a short timeout when polling the queue; if there's nothing there, we'll give another thread the chance to return a connection. This does, of course, introduce a fairness problem: while one thread is waiting on the reference queue, another might return a connection that's immediately grabbed by a third. In theory, the waiting thread could be waiting forever. In the real world, with infrequent need for database connections, this situation is unlikely to happen.
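The timed poll is easy to try on its own. In this sketch, TimedPollDemo and pollOnce() are hypothetical names, and enqueue() is called by hand to simulate the collector; ReferenceQueue.remove(long) itself is the same call that the pool uses.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class TimedPollDemo {
    // Returns true if a reference arrived within timeoutMillis, false on
    // timeout or interruption.
    static boolean pollOnce(ReferenceQueue<Object> queue, long timeoutMillis) {
        try {
            // Blocks for at most timeoutMillis; returns null if nothing was enqueued.
            Reference<?> ref = queue.remove(timeoutMillis);
            return ref != null;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        PhantomReference<Object> ref = new PhantomReference<>(referent, queue);

        System.out.println(pollOnce(queue, 100));   // nothing has been enqueued yet
        ref.enqueue();                               // simulate the garbage collector
        System.out.println(pollOnce(queue, 100));   // the reference is now available
        System.out.println(referent != null);       // keeps the referent strongly reachable
    }
}
```

The timeout bounds how long a caller sits on the queue before it goes back to check the pool, which is the trade-off described above between wasted time and a tight loop.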

The Trouble with Phantom references
Several pages back, I noted that finalizers are not guaranteed to be called. Neither are phantom references, and for the same reasons: if the collector doesn't run, unreachable objects aren't collected, and references to those objects won't be enqueued. Consider a program that did nothing but call getConnection() in a loop and let the returned connections go out of scope. If it did nothing else to make the garbage collector run, then it would quickly exhaust the pool and block, waiting for a connection that will never be recovered.

There are, of course, ways to resolve this problem. One of the simplest is to call System.gc() in tryWaitingForGarbageCollector(). While there is a lot of myth and dogma surrounding this method -- the use of the word "suggest" in its documentation has been grist for amateur language lawyers as long as I can remember -- it's an effective way to nudge the JVM back toward a desired state. And it's a technique that works for finalizers as well as phantom references.
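One way the nudge might look is sketched below. GcNudge and awaitEnqueued() are hypothetical names, and the attempt count and wait time are arbitrary values chosen for illustration. System.gc() is only a request, so the sketch bounds the number of attempts and reports failure by returning null instead of assuming that a reference will show up.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class GcNudge {
    // Requests a collection up to 'attempts' times, waiting 'waitMillis' on
    // the queue after each request. Returns the first enqueued reference,
    // or null if none appeared (or the thread was interrupted).
    static Reference<?> awaitEnqueued(ReferenceQueue<?> queue, int attempts, long waitMillis) {
        try {
            for (int i = 0; i < attempts; i++) {
                System.gc();   // a request to the JVM, not a guarantee
                Reference<?> ref = queue.remove(waitMillis);
                if (ref != null) {
                    return ref;
                }
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();   // let the caller see the interrupt
        }
        return null;
    }
}
```

A pool could call something like awaitEnqueued(_refQueue, 3, 100) only when it is already out of connections, so the cost of the explicit collection is paid just in the case where the caller would otherwise block.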

That doesn't mean that you should ignore phantom references and just use a finalizer. In the case of a connection pool, for example, you might want to explicitly shut down the pool and close all of the underlying connections. You could do that with finalizers, but would need just as much bookkeeping as with phantom references. In that case, the additional control that you get with references (versus an arbitrary finalization thread) makes them a better choice.
