Abstract
- the story of Vectorwise,
- a high-performance analytical database system
- its history from academic to commercial
- the evolution of its technical architecture
- customer reactions to the product
- its future research and development roadmap.
- novelty in Vectorwise
- much more than column-storage:
- many query processing innovations in its vectorized execution model,
- an adaptive mixed row/column data storage model
- with indexing support tailored
- to analytical workloads.
- there is a long road from research prototype to commercial product,
- though database research
- continues to exert a strong innovative influence on product development.
1 Introduction
- Vectorwise dates back to 2003
- researchers from CWI in Amsterdam,
- MonetDB project [5],
- invented a new query processing model.
- This vectorized query processing approach
- the foundation of the X100 project [6].
- the project served as a platform
- for further improvements in query processing [23, 26] and storage [24, 25].
In other words, for the following years the X100 project served as a research platform, with no practical product yet!
- Initial results of the project
- impressive performance improvements both in decision support [6]
- and information retrieval [7].
- commercial potential of X100
- CWI spun out this project,
- founding Vectorwise BV as a company in 2008.
- Vectorwise BV decided to
- combine the X100 processing
- and
- storage components with the mature higher-layer database components and APIs of the Ingres DBMS;
- a product of Actian Corp.
- After two years of cooperation
- and delivery of the first versions of the integrated product aimed at the analytical database market,
- Vectorwise was acquired and became a part of Actian Corp.
2 Vectorwise Architecture
- The upper layers of the Vectorwise architecture consist of Ingres,
- database administration tools,
- connectivity APIs,
- SQL parsing
- a cost-based query optimizer using histogram statistics [13].
So the upper layers of Vectorwise are all reused from Ingres?
- lower layers come from the X100 project,
- delivering cutting-edge query execution
- data storage [21], outlined in Figure 1.
- details of combining these two platforms are described in [11].
- how the most important feature of Vectorwise,
- dazzling query execution speed,
- was preserved and improved
- from its inception as an academic prototype
- into
- a full-fledged database product.
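As a rough illustration of the vectorized execution model mentioned in the abstract, here is a minimal Python sketch (my own illustration, not Vectorwise code; all names are assumptions): operators consume and produce small batches ("vectors") of column values per call instead of one tuple at a time, amortizing interpretation overhead across the whole batch.

```python
# Hypothetical sketch of vectorized (batch-at-a-time) query execution.
# Not actual Vectorwise code: function names and sizes are illustrative.

VECTOR_SIZE = 4  # Vectorwise-style engines typically use ~1024; small here for clarity


def scan(column, vector_size=VECTOR_SIZE):
    """Yield the column in fixed-size vectors (batches)."""
    for i in range(0, len(column), vector_size):
        yield column[i:i + vector_size]


def select_gt(vectors, threshold):
    """Filter each vector with one tight loop per batch,
    instead of one interpreted call per tuple."""
    for vec in vectors:
        yield [v for v in vec if v > threshold]


def sum_agg(vectors):
    """Aggregate across all incoming vectors."""
    total = 0
    for vec in vectors:
        total += sum(vec)
    return total


prices = [3, 8, 15, 1, 9, 20, 7, 12]
result = sum_agg(select_gt(scan(prices), 8))
print(result)  # 15 + 9 + 20 + 12 = 56
```

The key point is that per-operator overhead (function dispatch, interpretation) is paid once per vector rather than once per tuple, which is what makes this model fast in an interpreted query plan.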
Data Storage.
- Vectorwise provides
- great performance for
- memory-resident data sets,
- and, when deployed on a high-bandwidth IO subsystem (locally attached),
- also allows efficient analysis of much larger datasets,
- often allowing processing of disk-resident data
- with performance close to that of buffered data.
- To achieve that, a number of techniques are applied.
Second paragraph
- Vectorwise stores data using a generalized row/column storage based on PAX [2].
- A table is stored in multiple PAX partitions,
- each of which contains a group of columns.
This allows both “DSM/PAX” (each column in a separate PAX group) and “NSM/PAX” (all columns in one
PAX group), as well as every option in between. The naming is from the IO perspective: disk blocks
containing data from only one column are called DSM/PAX, and blocks containing all columns are called
NSM/PAX (simply called PAX in [2]).
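To make the DSM/PAX vs NSM/PAX distinction concrete, here is a minimal Python sketch (my own illustration under assumed names; not the Vectorwise implementation): a table is split into partitions, each holding a declared group of columns, and within a partition each column's values are stored contiguously.

```python
# Hypothetical sketch of generalized row/column (PAX-style) storage.
# Not the actual Vectorwise on-disk format: names and structure are assumptions.

def pax_partition(rows, column_group):
    """Store the chosen columns of `rows` column-wise in one partition
    (each column becomes a contiguous mini-column)."""
    return {col: [row[col] for row in rows] for col in column_group}


def store_table(rows, column_groups):
    """Split a table into PAX partitions, one per column group.
    One column per group approximates DSM/PAX; all columns in a
    single group approximates NSM/PAX."""
    return [pax_partition(rows, group) for group in column_groups]


rows = [
    {"id": 1, "price": 10, "qty": 5},
    {"id": 2, "price": 20, "qty": 3},
]

# DSM/PAX layout: every column in its own partition, so a scan
# reads only the blocks of the columns the query touches.
dsm = store_table(rows, [["id"], ["price"], ["qty"]])

# NSM/PAX layout: all columns in one partition, so a single block
# read fetches complete rows.
nsm = store_table(rows, [["id", "price", "qty"]])

print(dsm[1])          # {'price': [10, 20]}
print(nsm[0]["qty"])   # [5, 3]
```

Intermediate groupings (e.g. `[["id"], ["price", "qty"]]`) give the "options in between" the text mentions: columns that are usually read together can share a partition.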