A Technical Overview of Cloudera Altus Analytic DB

lliushanmei

Original article

https://blog.cloudera.com/blog/tag/microsoft-azure/

February 8, 2018

By Greg Rahn

Categories: Altus Analytic Database Cloud

A few weeks back, we announced the upcoming beta of Cloudera Altus Analytic DB for cloud-based data warehousing. As promised, the beta is now available and we wanted to spend some time describing the unique architecture.

Architecture of Cloudera Altus Analytic DB

Altus Analytic DB is built on the Cloudera Altus platform-as-a-service foundation, which also supports the Altus Data Engineering service. The architecture of Cloudera Altus is based on a few simple but important premises: customers operating in the cloud want to control their data and keep it secure, while still being able to run analytic services on that data easily. This is why Cloudera brings the data warehouse to the data with Altus Analytic DB, powered by Apache Impala.

A Single Shared Repository of Data in Open File Formats

Cloud object storage, such as Amazon S3 and Azure ADLS, is becoming an increasingly popular way to collect and store large amounts of data in a single location that is both scalable and cost-effective. Altus Analytic DB is modular by design and takes advantage of existing object storage. It operates directly on data in Amazon S3 to support a number of analytic use cases. These include exploratory analytics over newly acquired or yet-to-be-modeled data, as well as data mart or data warehouse use cases on file formats optimized for analytics, such as Apache Parquet.

Unlike legacy analytic databases and many other cloud-based data warehouse services where the first step involves copying data into the database, there is no need to copy data into Altus Analytic DB. Users can instead immediately begin operating on the full breadth of data in the object store. Another advantage of storing data in the customer’s account and using open file formats is that it avoids lock-in and keeps data accessible to other applications, processing engines, and services that customers may want to use (or already be using) to operate on their data.
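
To make the "no copy" idea concrete, here is a minimal sketch of how a table can be declared directly over Parquet files that already sit in S3. It only builds the Impala DDL string; it does not connect to a cluster. The bucket, path, table, and column names are hypothetical and not taken from the article.

```python
def external_parquet_table_ddl(table: str, columns: dict[str, str], s3_location: str) -> str:
    """Build an Impala CREATE EXTERNAL TABLE statement over Parquet files in S3.

    The table is EXTERNAL, so the files stay where they are in the
    object store and are not copied into the database.
    """
    if not s3_location.startswith("s3a://"):
        raise ValueError("expected an s3a:// location")
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {table} ({column_list}) "
        f"STORED AS PARQUET LOCATION '{s3_location}'"
    )


# Hypothetical table and bucket, for illustration only.
ddl = external_parquet_table_ddl(
    "web_logs",
    {"event_time": "TIMESTAMP", "url": "STRING", "status": "INT"},
    "s3a://example-bucket/warehouse/web_logs/",
)
print(ddl)
```

Because the files remain in the customer's bucket in an open format, other engines can keep reading the same location while the table is being queried.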

Multiple Clusters Over Shared Data

Altus Analytic DB also leverages the cloud’s ability to provide separate but scalable storage and compute resources. This means organizations can now scale compute resources independently of data size. In fact, with Altus Analytic DB, organizations can easily provision multiple compute clusters over the shared data, both to isolate key analytical SQL workloads from one another and to add compute capacity whenever it is needed. Customers can choose from a list of optimized instance types and set the number of nodes, allowing them to pick the configuration that best meets the needs of their specific workloads.
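
The relationship between several compute clusters and one shared data location can be sketched as a small data model. This is an illustration only, not the Altus API: the `ClusterSpec` class, the cluster names, the instance types, and the bucket path are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical description of one compute cluster."""
    name: str
    instance_type: str   # illustrative AWS instance type
    node_count: int
    data_location: str   # object-store path the cluster reads from


# One shared repository; the path is a placeholder.
SHARED_DATA = "s3a://example-bucket/warehouse/"

clusters = [
    ClusterSpec("bi-reporting", "r4.4xlarge", 10, SHARED_DATA),
    ClusterSpec("data-science-adhoc", "c4.8xlarge", 4, SHARED_DATA),
]


def share_same_data(specs: list[ClusterSpec]) -> bool:
    """True when every cluster points at the same data location."""
    return len({spec.data_location for spec in specs}) == 1


def total_nodes(specs: list[ClusterSpec]) -> int:
    """Compute capacity is the sum over clusters, independent of data size."""
    return sum(spec.node_count for spec in specs)


print(share_same_data(clusters), total_nodes(clusters))
```

Each workload gets its own cluster sized for its needs, while the data itself is stored once.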

Through the Altus console, administrators get a holistic view across all these clusters and can terminate them on demand to control resource costs. For those also using Altus Data Engineering, this same console provides visibility and monitoring of data processing jobs. This makes it easy to support a common use case: running ETL jobs with Altus Data Engineering to prepare data, then running analytic reporting on that data with Altus Analytic DB.
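
The cost effect of terminating clusters on demand is simple arithmetic, sketched below. The hourly rate, node count, and usage hours are made-up figures for illustration, not published pricing.

```python
def cluster_cost(node_count: int, hourly_rate_per_node: float, hours: float) -> float:
    """Cost of running a cluster: nodes x rate per node-hour x hours running."""
    if node_count < 0 or hourly_rate_per_node < 0 or hours < 0:
        raise ValueError("inputs must be non-negative")
    return node_count * hourly_rate_per_node * hours


# Assumed figures: 10 nodes at a hypothetical 1.00 per node-hour.
NODES = 10
RATE = 1.00

always_on = cluster_cost(NODES, RATE, 24 * 30)   # left running for a 30-day month
on_demand = cluster_cost(NODES, RATE, 8 * 22)    # 8 hours on each of 22 working days
savings = always_on - on_demand

print(always_on, on_demand, savings)
```

Under these assumed figures, the cluster that is terminated outside working hours costs roughly a quarter of the one left running all month.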

Figure 1: Cloudera Altus Analytic DB Architecture in AWS

Data Security

The Altus Analytic DB service deploys nodes running Apache Impala into an Amazon Virtual Private Cloud (VPC) running in the customer’s own AWS account, giving the customer full control over their data and computing resources. Deploying into VPCs allows customers to isolate the network used by their Altus deployments from the rest of the networks in their AWS account, control and limit access to it via Security Groups, and change or revoke permissions at any time. Data never has to leave the customer’s cloud infrastructure.

Access to data in S3 can be controlled using AWS Identity and Access Management (IAM) and instance profiles. This means that the services running on the data never need to be provided long-lived text-based credentials, so administrators need not worry about S3 credentials being leaked.
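
As an illustration of the IAM approach, the sketch below builds a read-only S3 policy document of the kind that could be attached to the role behind an instance profile. The policy structure and action names follow the standard AWS IAM policy format; the bucket name, prefix, and the choice of a read-only policy are assumptions for the example, not the permissions Altus actually requires.

```python
import json


def read_only_s3_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document granting list and read access to one bucket prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading applies to objects under the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }


# Hypothetical bucket and prefix.
policy = read_only_s3_policy("example-bucket", "warehouse/")
print(json.dumps(policy, indent=2))
```

Since the permissions come from the role attached to the instances, no access key or secret has to be written into the cluster configuration.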

Easy Provisioning

Cloudera Altus makes it easy to provision a cluster. In just four steps and a matter of minutes, one can have an Altus Analytic DB cluster up and running.

1. Name the cluster, then select the software version and environment (e.g., dev, stage, production)
2. Choose the instance type and number of nodes
3. Enter the security credentials
4. Click “Create Cluster”

Once the cluster is up and running, users can connect to it via JDBC or ODBC.
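
For JDBC clients, the connection is typically described by a URL. The helper below is a minimal sketch that assembles one, assuming the `jdbc:impala://` scheme used by the Cloudera JDBC driver for Impala and Impala's usual HiveServer2 port, 21050. The hostname is a placeholder, and any authentication or TLS properties a real deployment needs are omitted.

```python
def impala_jdbc_url(host: str, port: int = 21050, database: str = "default") -> str:
    """Assemble a JDBC URL for an Impala endpoint.

    Assumes the Cloudera Impala JDBC driver's URL scheme; security
    properties are intentionally left out of this sketch.
    """
    if not host:
        raise ValueError("host must not be empty")
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    return f"jdbc:impala://{host}:{port}/{database}"


# Placeholder hostname for illustration.
url = impala_jdbc_url("coordinator.example.internal")
print(url)
```

The resulting string is what would be entered in a BI tool or SQL client configured with the Impala JDBC driver.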

Figure 2: Altus Analytic DB Provisioning Page

Benefits for Both Knowledge Workers and IT

This architecture provides a number of benefits for both knowledge workers (data analysts, data engineers, data scientists, etc.) and IT professionals.

For knowledge workers, it means:

Quick and easy access to all of the data, including the raw data not typically loaded into a data warehouse
Teams can provision their own clusters with just a few clicks and work on the datasets they need, without impacting critical production reporting
Direct data access outside of Altus Analytic DB, which allows non-SQL analysis and access from other applications

For the IT organization, it means:

Data teams get resources quickly and securely on-demand (and for however long needed) without any upfront sizing or planning
All data can reside in a single shared repository, thus eliminating the need for data movement and data silos
IT can empower knowledge workers through a self-service workflow and speed up the time to delivery for the business

Conclusion

We’re excited about the advancements that Altus Analytic DB provides to the world of cloud-based data warehousing, bringing the warehouse to the data. If you’re interested in learning more or trying out the beta, visit https://www.cloudera.com/products/altus/altus-analytic-db.html to sign up for the waitlist.
