The Snowflake IPO: Investor Insights from a Cloud Data Architect

Snowflake, a big data warehousing and processing system used by both startups and enterprises, is undergoing an initial public offering at $120 a share. There’s a lot of hype about cloud and big data companies. Is it a good purchase?

In this article, I provide my insights about Snowflake as a first-hand user. I’m a big data and cloud architect based out of New York City. I have used most of the larger cloud-based big data warehousing systems for many years, and have also consulted on the implementation of these systems with large enterprises. At the end of this article, I will give you my breakdown of how Snowflake stacks up against its competitors and whether I would personally buy shares of Snowflake as a specialist in cloud technologies.

What Is Snowflake

Snowflake is a scalable big data warehousing system. Specifically, this means it can store and process petabytes of data to answer critical questions for your business, such as: what is my daily revenue, what is my customers’ lifetime value, or what are my most profitable products?

Its core features include the ability to separate storage costs from compute costs. Specifically, this means that Snowflake doesn’t charge you compute for data you aren’t using (or querying); data at rest is billed only as storage. It also handles semi-structured data (such as JSON) and allows you to query data using SQL. It’s comparatively easy to use and maintain. Finally, it sits on top of, and uses resources from, the major cloud platforms: AWS, Google Cloud, or Azure.
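
To make the storage/compute split concrete, here is a minimal sketch of a bill with two independent line items. The rates are hypothetical placeholders chosen for easy arithmetic, not Snowflake’s actual prices, and the `Rates` and `monthly_bill` names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    # Both rates are hypothetical placeholders, not real price-list values.
    storage_per_tb_month: float  # cost of keeping 1 TB stored for a month
    compute_per_hour: float      # cost of one hour of query compute


def monthly_bill(stored_tb: float, compute_hours: float, rates: Rates) -> dict:
    """Bill storage and compute as two independent line items."""
    storage = stored_tb * rates.storage_per_tb_month
    compute = compute_hours * rates.compute_per_hour
    return {"storage": storage, "compute": compute, "total": storage + compute}


rates = Rates(storage_per_tb_month=25.0, compute_per_hour=4.0)

# Same 100 TB stored in both months; only the query activity differs.
idle_month = monthly_bill(stored_tb=100, compute_hours=0, rates=rates)
busy_month = monthly_bill(stored_tb=100, compute_hours=200, rates=rates)

print(idle_month)
print(busy_month)
```

In the idle month the compute line is zero while storage is unchanged, which is the behavior the paragraph above describes.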

Competitors

Google: BigQuery

Google BigQuery is a petabyte-scale big data warehouse that requires zero maintenance and no infrastructure setup, and is super easy to use. It’s fantastic. You write a query, and it returns results in seconds. Like Snowflake, storage is charged separately from compute. Although Snowflake’s infrastructure maintenance and setup are minimal, it still requires a bit of manual touch. In Snowflake, users need to choose the right size of warehouse for the size of their data and optimize their queries and resources. Snowflake also requires a fair amount of configuration to match BigQuery’s performance (the time it takes to return results) on petabyte-scale data. For these reasons, BigQuery is typically easier to set up and use than Snowflake.
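
The warehouse-sizing decision mentioned above can be sketched as a small search. Snowflake’s standard warehouse sizes double in credits per hour at each step (X-Small = 1, Small = 2, and so on). The assumption that runtime halves at each step is an idealization made only for this sketch; real queries rarely scale that cleanly. The `pick_warehouse` helper is hypothetical.

```python
# Credits per hour for standard warehouse sizes (each step doubles).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}


def pick_warehouse(xs_runtime_hours: float, target_hours: float):
    """Return (size, runtime_hours, credits) for the smallest size that
    meets the target runtime, or None if no listed size is fast enough.

    Assumption: runtime halves with each size step (perfect scaling).
    """
    for size, credits_per_hour in CREDITS_PER_HOUR.items():
        runtime = xs_runtime_hours / credits_per_hour
        if runtime <= target_hours:
            return size, runtime, runtime * credits_per_hour
    return None


# A job that takes 4 hours on X-Small and must finish within 1 hour.
choice = pick_warehouse(xs_runtime_hours=4.0, target_hours=1.0)
print(choice)
```

Under the perfect-scaling assumption, the credits consumed are the same at every size, so a larger warehouse buys speed rather than costing more. When a query scales worse than that, the larger size is simply more expensive, which is why sizing takes some manual tuning.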

If It’s So Good, What’s the Catch?

BigQuery charges per query, based on the amount of data each query scans. If you’re a startup with a reasonable volume of data and a small data team, the cost might be manageable. However, at enterprises with many data teams with varied skill sets, this can be super expensive. Since users are charged for the amount of data they query, a single inefficient SQL query written by a junior analyst can cost the company a lot of money. It’s nearly impossible to forecast data querying costs ahead of time because businesses typically don’t know months in advance what specific questions will need to be asked. The lack of cost transparency is a significant downside of BigQuery. Google recently released a reserved-capacity option to help control cost; however, it’s not as effective in this regard as fixed infrastructure.
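
As a rough illustration of why scan-based pricing is hard to budget, the sketch below compares the same daily report written two ways. The price per terabyte and the table sizes are hypothetical placeholders, not published BigQuery rates, and `monthly_query_cost` is an invented helper.

```python
PRICE_PER_TB_SCANNED = 5.0  # hypothetical placeholder, not a published rate


def monthly_query_cost(tb_scanned_per_run: float, runs_per_month: int,
                       price_per_tb: float = PRICE_PER_TB_SCANNED) -> float:
    """Cost of a recurring query when billing is based on data scanned."""
    return tb_scanned_per_run * runs_per_month * price_per_tb


# The same daily report written two ways:
# - inefficient: scans every column and every date in a 50 TB table
# - careful: reads only the needed columns and one date range, about 1 TB
inefficient = monthly_query_cost(tb_scanned_per_run=50, runs_per_month=30)
careful = monthly_query_cost(tb_scanned_per_run=1, runs_per_month=30)

print(f"inefficient query: ${inefficient:,.2f} per month")
print(f"careful query:     ${careful:,.2f} per month")
```

The answer returned to the analyst is identical in both cases; only the bill differs, and the difference depends on how each person happens to write SQL.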

With Snowflake, the cost is exceptionally transparent. You know precisely which resources you’re paying for, and you can forecast the bill more reliably ahead of time.

Amazon Web Services: Redshift

For the longest time, Snowflake was light years ahead of Amazon. Compared to Redshift, Snowflake required less maintenance, separated storage costs from computing, and could scale instantly.

Seeing its loss of market share, Amazon has caught up. Last year, it announced at re:Invent that Redshift now scales dynamically for data reads, albeit more slowly than Snowflake, and that all of its maintenance has been automated. Also, Redshift Spectrum (released in 2017) and Athena (released in 2016) allow data to be stored in and queried directly from Amazon S3 (Amazon’s storage solution), thus separating compute and storage.

Interestingly, for the same server size, Snowflake is more expensive than Redshift. It makes up for this difference in cost by using servers only on demand, while Redshift is typically run in an always-on fashion. In my opinion, Snowflake is still more straightforward to use and less convoluted than the three products Amazon offers to enable the same capabilities.
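
The trade-off between a higher hourly rate and on-demand usage comes down to a break-even calculation. The sketch below uses made-up hourly rates purely for illustration; they are not actual Snowflake or Redshift prices, and the function names are invented.

```python
HOURS_PER_MONTH = 24 * 30  # simplification: a 30-day month


def always_on_cost(rate_per_hour: float) -> float:
    """A cluster that runs around the clock is billed for every hour."""
    return rate_per_hour * HOURS_PER_MONTH


def on_demand_cost(rate_per_hour: float, active_hours: float) -> float:
    """A warehouse that suspends when idle is billed only for active hours."""
    return rate_per_hour * active_hours


def break_even_hours(on_demand_rate: float, always_on_rate: float) -> float:
    """Active hours per month at which both options cost the same."""
    return always_on_cost(always_on_rate) / on_demand_rate


# Hypothetical rates: the on-demand option costs 50% more per hour.
ALWAYS_ON_RATE = 2.0
ON_DEMAND_RATE = 3.0

active_hours = 8 * 22  # roughly business hours on working days
fixed = always_on_cost(ALWAYS_ON_RATE)
elastic = on_demand_cost(ON_DEMAND_RATE, active_hours)
threshold = break_even_hours(ON_DEMAND_RATE, ALWAYS_ON_RATE)

print(f"always-on: ${fixed:,.2f}, on-demand: ${elastic:,.2f}")
print(f"on-demand stays cheaper below {threshold:.0f} active hours per month")
```

With these placeholder numbers, a team that queries only during business hours pays far less on demand despite the higher rate, while a workload that runs most of the month would favor the always-on cluster.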

From 2012 until about 2019, the advantage was in Snowflake’s favor. However, Amazon is aggressively playing catch-up with its product offerings and capabilities, and the two are now very evenly matched.

Azure: Synapse

Microsoft released Azure Synapse, a highly scalable data warehousing solution, in late 2019. Feature-wise, it’s very similar to Amazon Redshift, with a different design. To be honest, I haven’t used or evaluated it.

Some might disagree; however, my personal opinion is that Microsoft and Azure are popular with large enterprises that are committed to Microsoft contracts and have lots of legacy Microsoft software. Microsoft makes it extremely hard to use anything but Microsoft, so most major tech companies and startups do not use Microsoft. Microsoft has been ramping up its cloud offerings, but it is always “catching up” in capabilities and offerings compared to Google and Amazon. Microsoft’s main selling point is its superior enterprise data security and reliability. In this space, it likely outcompetes, or at least does a better job of branding than, AWS and Google Cloud.

Snowflake Disadvantages

Security Compliance

When you use Snowflake, some level of data (mainly metadata) is transferred to its system. The Snowflake user interface (web portal), which connects to your instance, is outside of the customer’s cloud infrastructure. Furthermore, the cloud infrastructure Snowflake uses is managed and maintained by Snowflake, not the customer’s enterprise IT team. For many conservative industries (finance, insurance, etc.), this can be a huge problem. Enterprises typically like to control all aspects of their infrastructure security. Because Snowflake’s security procedures conflict with that preference, it can be challenging to pass enterprise security reviews.

Ease of Adoption

If you’re an enterprise on AWS or Google Cloud, it’s much easier to procure Redshift or BigQuery than it is to bring in a second vendor (Snowflake). This is because adding a data warehouse under the same cloud umbrella is just an adjustment to an existing bill, whereas a new vendor has to go through an entire vendor selection and security review. This process can take months and requires buy-in from many internal people. This barrier to adoption can make it difficult for Snowflake to enter larger organizations.

Heavy Competition

Major cloud competitors understand their weaknesses against Snowflake and are investing heavily to upgrade their products. They also have the financial means to do so. In the last year specifically, they have been adding features to offer capabilities very comparable to Snowflake’s.

Snowflake’s Advantages

Niche Offering

Snowflake provides a great middle ground between BigQuery and Redshift. It’s an easy-to-use system that allows you to scale your data. Snowflake’s features are still slightly better than Redshift’s, and it is easier to use. Unlike BigQuery, it also allows you to estimate and control costs more effectively.

Cloud Agnostic

Snowflake is cloud-agnostic. It can live on AWS, Google Cloud, and even Azure. If you change cloud providers or use multi-cloud, you don’t have to change your data warehouse. Many enterprises migrating to the cloud are not quite committed to one provider or want to future-proof their infrastructure, which favors Snowflake’s cloud flexibility.

Skillful Sales

Snowflake has a dedicated sales team. This statement is purely anecdotal, but I’ve seen the pitches from the AWS, Google Cloud, and Snowflake sales teams. The major cloud providers’ sales teams are much more passive. Amazon and Google simply have too many products to cover, and thus their calls are more surface-level, often touching on multiple products in their ecosystems. Since all Snowflake offers is data warehousing, its team is more aggressive and informative in selling it. For the same reason, Snowflake typically offers much better support than AWS and Google.

Would I Buy Shares in Snowflake?

Would I personally buy shares in Snowflake? I’m holding off on buying shares for now. Snowflake is up against a lot of competition. It might have a superior sales team focused on data warehousing and a decent product with a nice niche, but the major cloud providers have a lot more money, and thus more engineers to add features, as well as a lower barrier to adoption. Furthermore, it’s very likely that another competitor or cloud provider will create a next-generation warehousing system, perhaps with predictive machine learning capabilities, that completely changes the game.

I’m not convinced that Snowflake’s current advantages make it a definite long-term hold. It’s also not nearly as popular as Redshift or BigQuery. Out of the 20 companies (Series B startups and enterprises) that I have had the chance to work with, only 3 use Snowflake. If Snowflake can gain and maintain a larger market share in the enterprise and startup data warehousing space, it has a chance for explosive growth. Unfortunately, this means an increase in sales expenditure that takes away from the R&D budget. To compete effectively, Snowflake will need to create game-changing features, such as the ability to automate repetitive work done by data engineers and analysts, to make it a no-brainer option for most companies.

From a forward-looking perspective, the amount of money raised by Snowflake in its IPO is likely significantly larger than the budget Google, Amazon, or Microsoft allocates to developing its data warehousing capabilities. The new funding, along with Snowflake’s advantage of being able to focus on one niche rather than spread itself across more than 100 products like the major cloud providers, may allow it to outcompete them in the long term. However, the major cloud providers also have existing AI capabilities and scientists that they can tap into and combine with their existing products (storage, machine learning, data governance/catalog, etc.). I am definitely excited to see what the future holds for Snowflake!

As a disclaimer, the opinions in this article are solely mine, not those of anyone else or of the companies I work for or have worked for, and they do not constitute investment advice.

Translated from: https://medium.com/swlh/the-snowflake-ipo-investor-insights-from-a-cloud-data-architect-172dc6da161d
