Migrating to Snowflake Like a Boss

Tell me if this rings a bell: your boss is convinced that Snowflake is the future of data and informs your team that you need to migrate from your data warehouse to this sparkling new solution more than anything you’ve ever needed in your lives. Isolating storage from compute will save your company so much money, and on top of that, your VP can generate fancy new dashboards for your CEO to track.

Snowflake, a cloud data warehousing platform, makes it easy for data teams to store and use data. Unlike traditional storage solutions, Snowflake supports a plethora of data types and business intelligence tools and makes it easy for internal and external teams to collaborate throughout the ETL pipeline. A relational database, Snowflake can also support most structured and unstructured data types.

Like your VP, many of my customers are excited at the prospect of migrating to a cloud storage and compute solution like Snowflake, but they don’t know where to start. And for good reason: I was able to find several articles about migrating from Redshift to Snowflake, but very little about taking the polar plunge from other solutions.

Like a snowflake, no two data stacks are alike, each with its own assets, complexities, and requirements. Snowflake makes it easy to manage and collaborate across a wide array of databases and data types. (Image courtesy of Aaron Burden on Unsplash.)

After talking to several migrators in the field, I broke down some lesser-discussed considerations for teams moving to Snowflake, regardless of where you’re starting from:

1. Say goodbye to partitions and indexes.

Unlike other data warehouses, Snowflake does not support partitions or indexes. Instead, Snowflake automatically divides large tables into micro-partitions and keeps statistics about the range of values each column contains in each one. The query engine then uses these statistics to determine which parts of your data set it actually needs to scan to run your query.
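
To make the idea concrete, here is a minimal sketch of how per-partition value ranges let a query skip data. It is a toy model, not Snowflake's actual implementation: the `MicroPartition` class, the `order_date` column, and the partition names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MicroPartition:
    """Toy stand-in for a micro-partition: just the min/max of one column."""
    name: str
    min_order_date: date
    max_order_date: date


def prune(partitions, lo, hi):
    """Keep only partitions whose [min, max] range overlaps the filter [lo, hi]."""
    return [
        p for p in partitions
        if p.max_order_date >= lo and p.min_order_date <= hi
    ]


# Hypothetical table split into three partitions, one per month.
partitions = [
    MicroPartition("mp_1", date(2020, 1, 1), date(2020, 1, 31)),
    MicroPartition("mp_2", date(2020, 2, 1), date(2020, 2, 29)),
    MicroPartition("mp_3", date(2020, 3, 1), date(2020, 3, 31)),
]

# Similar in spirit to: WHERE order_date BETWEEN '2020-02-10' AND '2020-02-20'
to_scan = prune(partitions, date(2020, 2, 10), date(2020, 2, 20))
print([p.name for p in to_scan])  # ['mp_2']
```

Only one of the three partitions overlaps the filter, so the other two never need to be read. The better your data is ordered on the columns you filter by, the more partitions a check like this can skip.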

For most practitioners, this paradigm shift from indexes to micro-partitions really shouldn’t be an issue (in fact, many people choose to migrate to Snowflake because this approach reduces query latency). Still, if you have partitions and indexes in your current ecosystem and are migrating to “clustering” models, you need a sound approach. A few tips for a safe migration:

  1. Document current data schema and lineage. This will be important when you have to cross-reference your old data ecosystem with your new one.

  2. Analyze your current schema and lineage. Next, determine if this structure and its corresponding upstream sources and downstream consumers make sense for how you’ll be using the data once it’s migrated to Snowflake.

  3. Select appropriate cluster keys. This will ensure the best query performance for your team’s access patterns.
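
As a rough illustration of the third tip, the sketch below ranks columns by how often your queries filter on them and turns the top candidates into a Snowflake `ALTER TABLE ... CLUSTER BY` statement. The `orders` table, the column names, and the `query_filters` list are hypothetical, and the ranking is deliberately naive: it ignores column cardinality, which also matters when choosing a key.

```python
from collections import Counter


def rank_filter_columns(query_filters):
    """Count how often each column appears in the filters of the sampled queries."""
    counts = Counter()
    for columns in query_filters:
        counts.update(set(columns))  # count each column once per query
    return counts.most_common()


def cluster_by_statement(table, columns):
    """Build an ALTER TABLE ... CLUSTER BY statement as a string."""
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(columns)});"


# Hypothetical access patterns: the columns each frequent query filters on.
query_filters = [
    ["order_date", "region"],
    ["order_date"],
    ["order_date", "customer_id"],
    ["region"],
]

ranked = rank_filter_columns(query_filters)
top_columns = [col for col, _ in ranked[:2]]
print(cluster_by_statement("orders", top_columns))
# ALTER TABLE orders CLUSTER BY (order_date, region);
```

Treat the output as a starting point for discussion with the people who actually query the table, not as a final answer.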

Bidding adieu to partitions and indexes is nothing to lose sleep over as long as you have visibility into your data.

2. Expect (and embrace) syntax issues.

Several data teams I spoke with repeatedly called out syntax issues as an inevitable component of any cloud warehouse migration, and a migration to Snowflake is no exception.

One data analyst specifically called out the difficulty of converting the SSIS packages that handled her ETL from SQL Server to Snowflake; Snowflake itself admits that SSIS packages are not easily integrated with its solution. Such errors were not only frustrating but also substantially slowed down her migration, leading to unforeseen costs and resource constraints.

While modeling tools like dbt help with validating data sets, the output formats of functions for hashing, timestamps, and dates are often inconsistent between the old and new versions of the data.
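
One way to work around these formatting differences when validating a migration is to normalize values yourself before hashing, rather than comparing hashes computed inside each warehouse. The sketch below assumes rows have already been fetched into Python tuples; the sample rows, the choice of MD5, and the decision to treat naive timestamps as UTC are illustrative assumptions, not a prescribed method.

```python
import hashlib
from datetime import datetime, timezone


def normalize(value):
    """Render a value the same way regardless of which warehouse produced it."""
    if isinstance(value, datetime):
        # Assumption: timestamps without a time zone are already in UTC.
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
        return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if value is None:
        return ""
    return str(value).strip()


def row_fingerprint(row):
    """MD5 over the normalized, pipe-delimited column values."""
    joined = "|".join(normalize(v) for v in row)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Hypothetical rows: same record, different padding and timestamp representation.
legacy_row = (42, "Acme ", datetime(2020, 9, 1, 12, 0, 0))
snowflake_row = (42, "Acme", datetime(2020, 9, 1, 12, 0, 0, tzinfo=timezone.utc))

print(row_fingerprint(legacy_row) == row_fingerprint(snowflake_row))  # True
```

Whether trailing whitespace or time zones should be treated as equal is a business decision; the point is to make that decision once, in one place, instead of chasing mismatched hashes.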

Additionally, Snowflake is case sensitive: string comparisons distinguish case by default, so it’s important that you check for comparison issues in queries. As a result of these issues, some companies can expect to inspect and refactor every line of SQL being migrated.
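
A quick way to see whether a given filter is at risk is to look for values that match a literal only when case is ignored. The sketch below is a minimal example; the `status` column and its values are hypothetical.

```python
def case_only_mismatches(values, literal):
    """Values that equal `literal` when case is ignored, but not exactly."""
    return sorted(
        {v for v in values if v != literal and v.lower() == literal.lower()}
    )


# Hypothetical distinct values of a `status` column filtered with status = 'active'.
status_values = ["active", "Active", "ACTIVE", "inactive"]

print(case_only_mismatches(status_values, "active"))  # ['ACTIVE', 'Active']
```

If your legacy warehouse compared strings case-insensitively, rows with these values would silently drop out of the result after migration. Typical fixes are to normalize both sides with `UPPER()` or to use a case-insensitive operator such as `ILIKE`.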

Syntax errors become a bigger pain point for companies in traditional industries, such as financial services or healthcare (ICD10 codes, I’m looking at you) that have long relied on legacy solutions and manual, error-prone data input. Simply moving to the cloud won’t fix these issues. As one data analyst at a public sector consulting firm told me: “Even if you hire amazing people and put the best data dictionary in front of them, they probably can’t tell you what it all means.”

The sooner you accept syntax errors as a part of the process, the easier it is to identify trends and patterns in these inconsistencies that can expedite their resolution.

3. Monitor your data, always and often.

Similar to syntax errors, data issues can cause even the smoothest Snowflake migrations to fail, generating false or misleading analysis once you hook up your business intelligence tools. Oftentimes these will result in silent errors that will go unnoticed until a consumer downstream catches an issue in a report or dashboard. If you’re lucky, it’s an internal user — and if you’re not, it might just be that new important customer you onboarded only last week and are trying to impress.
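
Even a very simple check can surface some of these silent errors before a consumer does. The sketch below flags a table whose daily row count drifts far from its recent history. The row counts and the three-standard-deviation threshold are hypothetical, and real monitoring would also cover freshness, schema changes, and distribution shifts.

```python
from statistics import mean, stdev


def volume_anomaly(history, today, threshold=3.0):
    """True when today's row count is more than `threshold` standard deviations from the recent mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold


# Hypothetical daily row counts for one table over the past week.
history = [10_120, 9_980, 10_050, 10_200, 9_900, 10_075, 10_010]

print(volume_anomaly(history, 10_090))  # False
print(volume_anomaly(history, 4_300))   # True
```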

Another analyst we spoke with at a digital marketing consultancy noted that it can be hard to keep data definitions consistent between your old and new data warehouses. After a few data errors popped up in her company’s new Snowflake warehouse, she decided to test the reliability of their data by running two parallel data analytics layers, one on her legacy warehouse and one on Snowflake. Using Looker to generate metrics for both stacks, her team quickly determined that there were, in fact, inconsistencies between the two warehouses, with each set of metrics reporting different data volumes.
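
The same kind of parallel comparison can be sketched in a few lines once you have the metrics from both stacks in hand. The metric names and values below are hypothetical, and how you export them (from Looker or anywhere else) is left out.

```python
def compare_metrics(legacy, snowflake, tolerance=0.0):
    """Return {metric: (legacy_value, snowflake_value)} for every metric that disagrees."""
    mismatches = {}
    for name in sorted(set(legacy) | set(snowflake)):
        old, new = legacy.get(name), snowflake.get(name)
        # A metric missing on either side counts as a mismatch.
        if old is None or new is None or abs(old - new) > tolerance:
            mismatches[name] = (old, new)
    return mismatches


# Hypothetical metrics computed on each stack for the same date range.
legacy_metrics = {"orders_row_count": 1_204_331, "daily_active_users": 58_210}
snowflake_metrics = {"orders_row_count": 1_198_007, "daily_active_users": 58_210}

print(compare_metrics(legacy_metrics, snowflake_metrics))
# {'orders_row_count': (1204331, 1198007)}
```

Running a comparison like this on a schedule during the cutover period turns "the numbers look off" into a specific list of metrics to investigate.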

When upgrading your data warehouse, make sure you’re also upgrading the way your team operates, from small things like syntax consistency all the way to data quality and reliability.

You’ve invested so much in this migration (rightfully so!) that it’d be silly to let it all go to waste because the data itself can’t be trusted.

Master your migration

If you move on from indexes and partitions, expect syntax issues, and prioritize data quality, you’ll achieve a more seamless Snowflake migration, facilitating easier collaboration and delivering true business value for your organization.

Moving to Snowflake means more flexibility and scalability for your team, as well as quicker, more reliable insights for your customers — and if you do it right, it can be a force multiplier for your entire organization, too.

Don’t worry: you’ll also impress your boss. I guarantee it.

If you want to learn more, reach out to Barr Moses.

This article was co-written by Barr Moses, CEO of Monte Carlo, & Lior Gavish, CTO of Monte Carlo.

Translated from: https://towardsdatascience.com/migrating-to-snowflake-like-a-boss-6163293f0bcb
