Paging in J2EE: Manage Large Result Sets Efficiently

Pagination is the simplest and most common way to break up large amounts of data into more manageable chunks. It is a key part of Web site design, both from the UI perspective (to manage limited screen real estate) and from the server perspective (to process large result sets efficiently, without causing resource spikes or servicing delays). In J2EE, efficient pagination (and good UI design) is essential for handling large query result sets in a resource-pooled environment.

Specifically, the following conditions make pagination necessary:
  • Screen real estate is limited. Your JSP/client UI may have insufficient space to display the entire result set at once, and your user may not be able to find what he or she wants quickly in a very large data set.
  • Your resources are pooled or constrained. Result sets are too large to be managed reasonably within your connection and memory limitations.

Paging Strategies That Work
Basic querying and paging strategies can start out simple but quickly lead to disaster if you fail to consider query evolution and data growth. Open-ended or unbounded queries that perform well over small test data sets may degrade sharply against real data. If you don't manage your precious DBMS and memory resources, your problems will only compound under load.

Efficient data retrieval and paging involves a trade-off between memory usage and response-time requirements. Broadly speaking, most roll-your-own J2EE paging mechanisms fall into two basic categories:

  • Cache based: Results are cached for fast access in subsequent pages.
  • Query based: Results are fetched from the DBMS on demand as the user pages.

Cache-based approaches cache the query results over and above what the database provides, in an intermediate tier (on the HTTP session, in the server using a Stateful Session Bean, or in a home-grown cache). Subsequent page requests hit the cache rather than the database. Such cache-based mechanisms are memory intensive and work well for low-volume, recurring queries and small to medium-sized result sets. The cache-hit rates are high and the I/O overheads are low, yielding good response times.
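For illustration, here is a minimal sketch of the session-cache flavour of this idea: the first request runs the (bounded) query and stores the full result list in the HttpSession, and subsequent page requests are sliced out of that cached list. The CustomerDao interface, the findCustomers method, the "customerSearchResults" attribute key and the page size are illustrative assumptions, not part of any standard API.

```java
import java.util.Collections;
import java.util.List;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

public class CachedPagingHelper {

    /** Illustrative DAO contract; any data-access object returning a List will do. */
    public interface CustomerDao {
        List findCustomers(String criteria) throws Exception;
    }

    private static final String CACHE_KEY = "customerSearchResults";
    private static final int PAGE_SIZE = 20;

    /**
     * Returns one page of results, running the query only on the first request
     * and serving subsequent pages from the HTTP session cache.
     */
    public List getPage(HttpServletRequest request, CustomerDao dao,
                        String criteria, int pageNumber) throws Exception {
        HttpSession session = request.getSession();
        List allResults = (List) session.getAttribute(CACHE_KEY);

        if (allResults == null) {
            // First page request: run the (constrained!) query and cache the results.
            allResults = dao.findCustomers(criteria);
            session.setAttribute(CACHE_KEY, allResults);
        }

        // Serve the requested page straight from the cached list.
        int from = pageNumber * PAGE_SIZE;
        int to = Math.min(from + PAGE_SIZE, allResults.size());
        return (from < allResults.size())
                ? allResults.subList(from, to)
                : Collections.EMPTY_LIST;
    }
}
```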

Query-based approaches don't use caches. They deliver pages on demand, direct from the DBMS to the client. They service additional pages by issuing more queries to the DBMS rather than pulling data from a cache. These approaches discard or skip unwanted rows from previous or future pages using low-level JDBC functions and context information from the last fetched page. Query-centric paging is memory efficient but requires additional round trips to the database, leading to slightly slower response times overall.
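As a rough sketch of the query-based style, the following DAO fetches one page at a time with plain JDBC, using the last key of the previous page as its paging context and Statement.setMaxRows to bound each fetch. The CUSTOMER table, its CUSTOMER_ID/NAME columns and the surrounding class are assumptions for illustration only.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

public class QueryPagingDao {

    private static final int PAGE_SIZE = 20;
    private final DataSource dataSource;

    public QueryPagingDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /**
     * Fetches the page that follows the given key. The caller keeps the last
     * CUSTOMER_ID of the previous page as its paging context and passes it back
     * on the next request; pass 0 for the first page.
     */
    public List fetchNextPage(long lastCustomerId) throws Exception {
        Connection con = dataSource.getConnection();
        try {
            PreparedStatement ps = con.prepareStatement(
                "SELECT CUSTOMER_ID, NAME FROM CUSTOMER " +
                "WHERE CUSTOMER_ID > ? ORDER BY CUSTOMER_ID");
            ps.setLong(1, lastCustomerId);
            ps.setMaxRows(PAGE_SIZE);   // bound the fetch to a single page of rows
            ResultSet rs = ps.executeQuery();

            List page = new ArrayList();
            while (rs.next()) {
                page.add(new Object[] { new Long(rs.getLong(1)), rs.getString(2) });
            }
            rs.close();
            ps.close();
            return page;
        } finally {
            con.close();                // return the connection to the pool promptly
        }
    }
}
```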

More sophisticated mechanisms adopt a hybrid approach. They constrain cache sizes by caching a fixed number of pages ahead, maintaining bounded cache windows, and using efficient query mechanisms and offline threads to backfill the results transparently to the client.
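One way to sketch such a bounded cache window is a small least-recently-used map that keeps only the last few pages and falls back to a query-based fetch on a miss; the offline backfill thread is omitted here. The class name, the window size and the fetchPageFromDatabase hook are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PageWindowCache {

    private static final int MAX_CACHED_PAGES = 3;

    // Access-ordered map that evicts the least-recently-used page
    // once more than MAX_CACHED_PAGES pages are cached.
    private final Map pageCache = new LinkedHashMap(16, 0.75f, true) {
        protected boolean removeEldestEntry(Map.Entry eldest) {
            return size() > MAX_CACHED_PAGES;
        }
    };

    public synchronized List getPage(int pageNumber) throws Exception {
        Integer key = new Integer(pageNumber);
        List page = (List) pageCache.get(key);
        if (page == null) {
            page = fetchPageFromDatabase(pageNumber);  // query-based fallback on a miss
            pageCache.put(key, page);
        }
        return page;
    }

    // Assumed query-based fetch, e.g. along the lines of the earlier JDBC sketch.
    protected List fetchPageFromDatabase(int pageNumber) throws Exception {
        throw new UnsupportedOperationException("plug in a query-based DAO here");
    }
}
```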

Caching vs. Query Trade-offs
Cache-based approaches are fast, but they have a few pitfalls. For example, unbounded or open-ended queries for large result sets can lead to uncontrolled memory growth as the results are serialised and cached. Large result sets also take a long time to service and tie up valuable connection and session resources. Whenever a connection is held for a long period, the number of connections available in your J2EE shared connection pool is effectively reduced. To cope with high load, your system should aim to service each request as quickly as possible; if requests take a long time to process, your overall throughput drops. You don't want to occupy your resources with long-lived queries, because you won't have any connections or threads left to do anything else.

Long-lived queries can also lead to connection timeouts if your query takes more time to execute than your preset JDBC connection time limit allows. Avoid this in cache-based strategies by narrowing or constraining your search criteria so that your DBMS queries (and results) are bounded.
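For example, a cache-oriented query can be bounded with the standard JDBC setMaxRows and setQueryTimeout calls, roughly as follows; the table, column names and limit values are illustrative.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class BoundedQueryExample {

    private static final int MAX_ROWS = 500;             // hard cap on rows to cache
    private static final int QUERY_TIMEOUT_SECONDS = 10; // fail fast instead of hanging

    public List runBoundedQuery(Connection con, String namePrefix) throws Exception {
        PreparedStatement ps = con.prepareStatement(
            "SELECT CUSTOMER_ID, NAME FROM CUSTOMER WHERE NAME LIKE ?");
        try {
            ps.setString(1, namePrefix + "%");   // constrained criteria, not an open-ended scan
            ps.setMaxRows(MAX_ROWS);             // the driver stops returning rows beyond this cap
            ps.setQueryTimeout(QUERY_TIMEOUT_SECONDS); // abort before the connection time limit fires
            ResultSet rs = ps.executeQuery();

            List rows = new ArrayList();
            while (rs.next()) {
                rows.add(rs.getString("NAME"));
            }
            rs.close();
            return rows;
        } finally {
            ps.close();
        }
    }
}
```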

Caching is most beneficial when the cached data is hit repeatedly. However, if your user searches are consistently yielding distinct results, your caches can't be reused across requests. Worse, if the user rarely pages beyond the first page, your caching within the session is for naught. You should carefully manage caches so that they don't grow too large and they expire promptly when they're no longer required.
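A simple way to keep session caches in check is to drop the cached attribute as soon as it is no longer needed and to keep session lifetimes short. The sketch below assumes the same "customerSearchResults" key as the earlier session-cache example; the timeout value is illustrative.

```java
import javax.servlet.http.HttpSession;

public class SearchCacheManager {

    private static final String CACHE_KEY = "customerSearchResults";

    /** Call this whenever new search criteria arrive or the search flow ends. */
    public void expireCachedResults(HttpSession session) {
        session.removeAttribute(CACHE_KEY);       // free the cached results immediately
    }

    /** Keep the session itself short-lived so abandoned caches are reclaimed. */
    public void limitSessionLifetime(HttpSession session) {
        session.setMaxInactiveInterval(10 * 60);  // ten minutes, an illustrative value
    }
}
```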

Hybrid or query-based approaches are more reliable than cache-based alternatives. They have slightly slower response times but can scale transparently for large result sets and high request volumes. Even an unbounded query won't exceed your memory constraints or hold a connection for long, because the DBMS connection is relinquished once the required data (usually the first one or two pages) has been fetched. This means that the connection is held for the bare minimum before being returned to the connection pool. The only downside is that additional context information must be passed back and forth on your page requests.

Table 1 lists the issues you need to consider when deciding on a paging approach.

  • Is your result data bounded or unbounded?
    Cache-based: applicable for fixed or bounded result sizes, where the search criteria are fixed or constrained.
    Hybrid/query-based: applicable for unbounded result sets (such as searches with loose criteria), since the implementation inherently limits the result size.
  • How big are your result sets?
    Cache-based: good for small to medium-sized result sets.
    Hybrid/query-based: good for medium-sized to large result sets.
  • What are your response-time requirements?
    Cache-based: use when response time is critical.
    Hybrid/query-based: use when response time is less critical.
  • How often is the query issued? Do you fetch the same data repeatedly, or is your result data different for each request?
    Cache-based: applicable when the same data is requested repeatedly.
    Hybrid/query-based: applicable when the search criteria and results vary from request to request.
  • How volatile is your data (static vs. dynamic)? What are your data-freshness requirements?
    Cache-based: unchanging or static data can be serviced reliably from caches.
    Hybrid/query-based: if data is dynamic, query every time to return the most up-to-date data.

Table 1. Cache-based vs. Hybrid/Query-based Paging Strategies
