A Technical Overview of Cloudera Altus Analytic DB


Original article

https://blog.cloudera.com/blog/tag/microsoft-azure/


A few weeks back, we announced the upcoming beta of Cloudera Altus Analytic DB for cloud-based data warehousing. As promised, the beta is now available, and we want to spend some time describing its unique architecture.

Architecture of Cloudera Altus Analytic DB

Altus Analytic DB is built on the Cloudera Altus platform-as-a-service foundation, which also supports the Altus Data Engineering service. The architecture of Cloudera Altus rests on a few simple but important premises: customers operating in the cloud want to control their data and keep it secure, while still being able to run analytic services on that data easily. This is why Cloudera brings the data warehouse to the data with Altus Analytic DB, powered by Apache Impala.

A Single Shared Repository of Data in Open File Formats

Cloud object storage, such as Amazon S3 and Azure ADLS, is becoming an increasingly popular way to collect and store large amounts of data in a single location that is both scalable and cost-effective. Altus Analytic DB is modular by design and takes advantage of existing object storage. It operates directly on data in Amazon S3 to support a number of analytic use cases, including exploratory analytics over newly acquired or yet-to-be-modeled data, as well as data mart and data warehouse use cases on file formats optimized for analytics, such as Apache Parquet.

Unlike legacy analytic databases and many other cloud-based data warehouse services, where the first step involves copying data into the database, there is no need to copy data into Altus Analytic DB. Users can instead immediately begin operating on the full breadth of data in the object store. Another advantage of storing data in the customer’s account and using open file formats is that it avoids lock-in and keeps data accessible to other applications, processing engines, and services that customers may want to use (or may already be using) to operate on their data.
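
As a rough illustration of querying data in place, the sketch below builds an Impala `CREATE EXTERNAL TABLE` statement that points at Parquet files already sitting in S3. The article does not show any SQL, so the table name, columns, and bucket path are hypothetical, and how such a statement would be submitted to an Altus cluster is left out.

```python
def external_parquet_table_ddl(table, columns, s3_location):
    """Build an Impala CREATE EXTERNAL TABLE statement over Parquet files in S3.

    EXTERNAL means the engine only records metadata; the files stay where
    they are in the object store and are not copied into the database.
    """
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({column_list}) "
        f"STORED AS PARQUET "
        f"LOCATION '{s3_location}'"
    )


# Hypothetical table, columns, and bucket path, for illustration only.
ddl = external_parquet_table_ddl(
    "web_events",
    [("event_time", "TIMESTAMP"), ("user_id", "BIGINT"), ("url", "STRING")],
    "s3a://example-bucket/warehouse/web_events/",
)
print(ddl)
```

Because the table is external, dropping it would remove only the metadata, which fits the point that the data remains available to other engines and services.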

Multiple Clusters Over Shared Data

Altus Analytic DB also leverages the cloud’s ability to provide separate but scalable storage and compute resources. This means organizations can now scale compute resources independently of data size. In fact, with Altus Analytic DB, organizations can easily provision multiple compute clusters over the shared data, both to isolate key analytical SQL workloads from one another and to add compute capacity whenever it is needed. Customers can choose from a list of optimized instance types as well as the number of nodes, allowing them to pick the configuration that best meets the needs of their specific workloads.
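
One way to picture several clusters over shared data is the small model below. It is only a sketch: `ClusterSpec` is a hypothetical type, not part of any Altus API, and the cluster names, instance types, node counts, and bucket path are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical description of one compute cluster (not an Altus API type)."""
    name: str
    instance_type: str
    node_count: int
    data_location: str  # shared object-store path that the cluster reads


SHARED_DATA = "s3a://example-bucket/warehouse/"  # hypothetical bucket

# Two workloads sized independently of each other and of the data volume.
clusters = [
    ClusterSpec("prod-reporting", "r4.4xlarge", 10, SHARED_DATA),
    ClusterSpec("adhoc-exploration", "r4.2xlarge", 4, SHARED_DATA),
]


def total_nodes(specs):
    """Sum the compute nodes across all clusters."""
    return sum(spec.node_count for spec in specs)


def shares_single_repository(specs):
    """True when every cluster points at the same data location."""
    return len({spec.data_location for spec in specs}) == 1


print(total_nodes(clusters), shares_single_repository(clusters))
```

The point of the model is that compute sizing varies per cluster while the data location stays constant, so terminating one cluster leaves the data and the other cluster untouched.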

Through the Altus console, administrators get a holistic view across all of these clusters and can terminate them on demand to control resource costs. For those also using Altus Data Engineering, the same console provides visibility into and monitoring of data processing jobs. This makes it easy to support the common use case of running ETL jobs to prepare data for analytic reporting, leveraging both Altus Data Engineering and Altus Analytic DB.

Figure 1: Cloudera Altus Analytic DB Architecture in AWS


Data Security

The Altus Analytic DB service deploys nodes running Apache Impala into an Amazon Virtual Private Cloud (VPC) in the customer’s own AWS account, giving the customer full control over their data and computing resources. Deploying into a VPC allows customers to isolate the network used by their Altus deployments from the other networks in their AWS account, control and limit access to it via Security Groups, and change or revoke permissions at any time. Data never has to leave the customer’s cloud infrastructure.

Access to data in S3 can be controlled using AWS Identity and Access Management (IAM) and instance profiles. This means that the services running on the data never need to be provided long-lived text-based credentials, so administrators need not worry about S3 credentials being leaked.
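
To make the IAM point concrete, here is a minimal sketch of a read-only S3 policy document of the kind that could be attached to the role behind an instance profile. The bucket name is hypothetical, and the exact permissions an Altus deployment needs are not stated in this article, so treat the action list as an assumption.

```python
import json


def s3_read_policy(bucket):
    """Return an IAM policy document allowing read-only access to one bucket.

    The action list is an assumption for illustration; the article does not
    say which S3 permissions an Altus cluster actually requires.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


policy = s3_read_policy("example-bucket")  # hypothetical bucket name
print(json.dumps(policy, indent=2))
```

With a policy like this bound to the instances' role, the nodes obtain temporary credentials automatically, which is why no long-lived keys need to be stored on them.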

Easy Provisioning

Cloudera Altus makes it easy to provision a cluster. In just four steps and a matter of minutes, one can have an Altus Analytic DB cluster up and running.

  1. Name the cluster, then select the software version and environment (e.g., dev, stage, production)
  2. Choose the instance type and number of nodes
  3. Enter the security credentials
  4. Click “Create Cluster”

Once the cluster is up and running, users can connect to it via JDBC or ODBC.
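
For the JDBC route, a client needs a connection URL. The sketch below assembles one in the `jdbc:impala://host:port/database` form used by the Cloudera JDBC driver for Impala, with 21050 as the usual Impala HiveServer2 port. The host name is hypothetical, and whether an Altus cluster exposes this port and URL form unchanged is an assumption, as are any authentication properties a real deployment would add.

```python
def impala_jdbc_url(host, port=21050, database="default"):
    """Assemble a JDBC URL for Impala (assumed form; check the driver docs)."""
    return f"jdbc:impala://{host}:{port}/{database}"


# Hypothetical cluster endpoint, for illustration only.
url = impala_jdbc_url("analytic-db.example.internal")
print(url)
```

BI tools and SQL clients that accept a JDBC URL could then be pointed at the cluster with a string of this shape.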

Figure 2: Altus Analytic DB Provisioning Page


Benefits for Both Knowledge Workers and IT

This architecture provides a number of benefits for both knowledge workers (data analysts, data engineers, data scientists, etc.) and IT professionals.

For knowledge workers, it means:

  • Access to all of the data quickly and easily, including the raw data not typically loaded into a data warehouse
  • Teams can provision their own clusters with just a few clicks and work on the datasets they need, without impacting critical production reporting
  • Direct data access outside of Altus Analytic DB, so non-SQL analysis and access from other applications can be done

For the IT organization, it means:

  • Data teams get resources quickly and securely on demand (and for however long needed) without any upfront sizing or planning
  • All data can reside in a single shared repository, thus eliminating the need for data movement and data silos
  • IT can empower knowledge workers through a self-service workflow and speed up the time to delivery for the business

Conclusion

We’re excited about the advancements that Altus Analytic DB provides to the world of cloud-based data warehousing, bringing the warehouse to the data. If you’re interested in learning more or trying out the beta, visit https://www.cloudera.com/products/altus/altus-analytic-db.html to sign up for the waitlist.


 

