The Top 10 Challenges in Extreme-Scale Visual Analytics

Pak Chung Wong,

Pacific Northwest National Laboratory

Han-Wei Shen,

Ohio State University

Christopher R. Johnson,

University of Utah

Chaomei Chen, and

Drexel University

Robert B. Ross

Argonne National Laboratory


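In this issue of CG&A, researchers share their R&D findings and results on applying visual analytics (VA) to extreme-scale data. Having surveyed the literature and other R&D in the field, we have identified what we consider the top challenges in extreme-scale VA. To cater to the magazine's diverse readership, our discussion evaluates challenges across the whole field, including algorithmic, hardware, software, engineering, and social issues.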


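The September/October 2004 issue of CG&A introduced the concept of visual analytics to computer science. In 2005, an international, multidisciplinary panel agreed on and collectively defined the newly established field as "the science of analytical reasoning facilitated by interactive visual interfaces." The meaning and goals of VA have since evolved and expanded to cover scientific and nonscientific data of many types, shapes, sizes, domains, and applications. As extreme-scale datasets began to revolutionize our daily working lives, researchers looked to VA as a solution to their big-data problems.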





The Top 10 Challenges



1. In Situ Analysis




2. Interaction and User Interfaces



Top Interaction and UI Challenges in Extreme-Scale Visual Analytics


Human-computer interaction has long been a substantial barrier in many computer

science development areas. Visual analytics (VA) is no exception. Significantly

increasing data size in traditional VA tasks inevitably compounds the existing problems.

We identified 10 major challenges regarding interaction and user interfaces in extreme-scale

VA. The following is a brief revisit of our study of the topic.1


1. In Situ Interactive Analysis


2. User-Driven Data Reduction


3. Scalability and Multilevel Hierarchy



4. Representing Evidence and Uncertainty



5. Heterogeneous-Data Fusion



6. Data Summarization and Triage for Interactive Query


7. Analytics of Temporally Evolved Features



8. The Human Bottleneck



9. Design and Engineering Development



10. The Renaissance of Conventional Wisdom

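Perhaps the most significant challenge is to spark a renaissance of conventional wisdom in VA as applied to extreme-scale data. (For example, Ben Shneiderman's information visualization mantra of overview, zoom and filter, and details on demand falls short when applied to the preceding challenges.) Successfully returning to the solid principles found in conventional wisdom, and discovering how to apply them to extreme-scale data, will most likely foster solutions to many of the problems we have described.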



2. Ashby, S., et al. The Opportunities and Challenges of Exascale Computing—Summary Report of

the Advanced Scientific Computing Advisory Committee (ASCAC) Subcommittee. US Dept. of

Energy Office of Science; 2010.





3. Large-Data Visualization




4. Databases and Storage

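The emergence of cloud services and applications has profoundly affected the extreme-scale database and storage field. The prevailing Apache Hadoop framework supports applications that work with exabytes of data stored in a public cloud. Many online vendors, including Facebook, Google, eBay, and Yahoo, have developed Hadoop-based extreme-scale-data applications.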


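Finally, not all cloud-based systems support the ACID (atomicity, consistency, isolation, and durability) requirements of distributed storage. With Hadoop, these requirements must be addressed in the application software layer. Extreme-scale databases are thus both a hardware and a software problem.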
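To make "addressed in the application software layer" concrete, here is a minimal sketch of one such guarantee, atomic publication of a result, built by the application itself rather than supplied by the storage system. It is a local-filesystem analogy only and does not use any Hadoop API; the function name `atomic_write_json` and the JSON record format are hypothetical choices for illustration.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, record: dict) -> None:
    """Publish `record` at `path` so readers see the old file or the new one,
    never a half-written mix (hypothetical helper, local filesystem only)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Stage the data in a temporary file in the same directory, so the final
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(record, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # durability: force bytes to disk before commit
        os.replace(tmp_path, path)  # atomicity: one rename publishes the result
    except BaseException:
        # On any failure, discard the staged file and leave `path` untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The sketch covers only atomicity and a basic form of durability for a single file. Consistency and isolation across many concurrent writers on distributed storage need further application-level machinery that is outside its scope.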


5. Algorithms





6. Data Movement, Data Transport, and Network Infrastructure





7. Uncertainty Quantification







8. Parallelism

To cope with the sheer size of data, parallel processing can effectively reduce the turnaround

time for visual computing and hence enable interactive data analytics. Future computer

architectures will likely have significantly more cores per processor. In the meantime, the

amount of memory per core will shrink, and the cost of moving data within the system will

increase. Large-scale parallelism will likely be available even on desktop and laptop computers.


To fully exploit the upcoming pervasive parallelism, many VA algorithms must be

completely redesigned. Not only is a larger degree of parallelism likely, but also new data

models will be needed, given the per-core memory constraint. The distinction between task

and data parallelism will be blurred; a hybrid model will likely prevail. Finally, many

parallel algorithms might need to perform out-of-core if their data footprint overwhelms the

total memory available to all computing cores and they can’t be divided into a series of

smaller computations.
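As a rough illustration of the out-of-core idea, the sketch below streams a dataset in fixed-size chunks, summarizes a bounded batch of chunks in parallel, and merges the partial results, so that at most `workers * chunk_size` values are held in memory at once. Everything named here (`Partial`, `chunks`, `summarize`, `merge`, `out_of_core_summary`) is hypothetical, and the count/sum/min/max summary stands in for whatever mergeable quantity a real VA algorithm would compute.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, List, Optional


@dataclass
class Partial:
    """Mergeable summary of one chunk of data."""
    count: int
    total: float
    lo: float
    hi: float


def chunks(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Lazily yield lists of at most `size` values from a stream."""
    it = iter(values)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def summarize(chunk: List[float]) -> Partial:
    """Per-chunk work: the same operation applied to every chunk."""
    return Partial(len(chunk), sum(chunk), min(chunk), max(chunk))


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial summaries into one."""
    return Partial(a.count + b.count, a.total + b.total,
                   min(a.lo, b.lo), max(a.hi, b.hi))


def out_of_core_summary(values: Iterable[float],
                        chunk_size: int = 1000,
                        workers: int = 4) -> Partial:
    """Summarize a stream while holding only `workers` chunks at a time."""
    chunk_iter = chunks(values, chunk_size)
    result: Optional[Partial] = None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # Pull a bounded batch so the whole dataset is never materialized.
            batch = list(islice(chunk_iter, workers))
            if not batch:
                break
            for part in pool.map(summarize, batch):
                result = part if result is None else merge(result, part)
    if result is None:
        raise ValueError("empty input")
    return result
```

Threads are used only to keep the sketch short. For CPU-bound work in Python, a process pool or native kernels would be needed to obtain real speedup. The sketch also assumes the computation decomposes into mergeable pieces, which is exactly the case the text notes does not always hold.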





9. Domain and Development Libraries, Frameworks, and Tools

The lack of affordable resource libraries, frameworks, and tools hinders the rapid R&D of

HPC-based VA applications. These problems are common in many application areas,

including UIs, databases, and visualization, which all are critical to VA system

development. Even software development basics such as post-C languages or debugging

tools are lacking on most, if not all, HPC platforms. Unsurprisingly, many HPC developers

are still using printf() as a debugging tool.

The lack of resources is especially frustrating for scientific-domain users. Many popular

visualization and analytics software tools for desktop computers are too costly or

unavailable on HPC platforms. Developing customized software is always an option but

remains costly and time-consuming. This is a community-wide challenge that can be

addressed only by the community itself, which brings us to the final challenge.


10. Social, Community, and Government Engagements

Two major communities in the civilian world are investing in R&D for extreme-scale VA.

The first is government, which supports the solution of scientific-discovery problems

through HPC. The second is online-commerce vendors, who are trying to use HPC to tackle

their increasingly difficult online data management problems.

The final challenge is for these two communities to jointly provide leadership to disseminate

their extreme-scale-data technologies to society at large. For example, they could influence

hardware vendors to develop system designs that meet the community’s technical needs—an

issue we mentioned earlier. They must engage the academic community to foster future

development talent in parallel computation, visualization, and analytics technologies. They

must not just develop problem-solving technologies but also provide opportunities for

society to access these technologies through, for example, the Web. These goals require the

active engagement of the technical community, society, and the government.



The preceding top-10 list echoes challenges described in a 2007 DOE workshop

report.6 Compared to that report, our list concentrates particularly on human cognition and

user interaction issues raised by the VA community. It also pays increased attention to

database issues found in both public clouds and private clusters. In addition, it focuses on

both scientific and nonscientific applications with data sizes reaching exabytes and beyond.

Although we face daunting challenges, we’re not without hope. The arrival of multithreaded

desktop computing is imminent. Significantly more powerful HPC systems that

require less cooling and consume less electricity are on the horizon. Although we accept the

reality that there is no Moore’s law for human cognitive abilities, opportunities exist for

significant progress and improvement in areas such as visualization, algorithms, and

databases. Both industry and government have recognized the urgency and invested

significant R&D in various extreme-scale-data areas. Universities are expanding their

teaching curricula in parallel computation to educate a new generation of college graduates

to embrace the world of exabyte data and beyond.

The technology world is evolving, and new challenges are emerging even as we write. But

we proceed with confidence that many of these challenges can be either solved or alleviated

significantly in the near future.



The article benefited from a discussion with Pat Hanrahan. We thank John Feo, Theresa-Marie Rhyne, and the

anonymous reviewers for their comments. This research has been supported partly by the US Department of Energy

(DOE) Office of Science Advanced Scientific Computing Research under award 59172, program manager Lucy

Nowell; DOE award DOE-SC0005036, Battelle Contract 137365; DOE SciDAC grant DE-FC02-06ER25770; the

DOE SciDAC Visualization and Analytics Center for Enabling Technologies; DOE SciDAC grant DE-AC02-06CH11357;

US National Science Foundation grant IIS-1017635; and the Pfizer Corporation. Battelle

Memorial Institute manages the Pacific Northwest National Laboratory for the DOE under contract DEAC06-




Pak Chung Wong is a project manager and chief scientist in the Pacific Northwest National

Laboratory’s Computational and Statistical Analytics Division. Contact him at

Han-Wei Shen is an associate professor in Ohio State University’s Computer Science and

Engineering Department. Contact him at

Christopher R. Johnson is a Distinguished Professor of Computer Science and the director

of the Scientific Computing and Imaging Institute at the University of Utah. Contact him at

Chaomei Chen is an associate professor in Drexel University’s College of Information

Science and Technology. Contact him at

Robert B. Ross is a senior fellow of the Computation Institute at the University of Chicago

and Argonne National Laboratory. Contact him at


