未名企鹅 Geek | A New Technology for Flow-Data Processing

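Lee, system architect at 未名企鹅, shares how the distributed computing framework Flink is applied to flow-data processing. Flink uses distributed computing to overcome the limits of processing large volumes of data on a single machine, enabling fast data cleansing and real-time display, and it is particularly suited to unified processing of bounded and unbounded data streams. Integrated with K8s, Flink can adjust resources dynamically, which keeps hardware use efficient and makes task scheduling flexible.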

Technological innovation is always the inner driving force behind a company's progress and an industry's development!

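Through continual thinking and practice, 未名企鹅 strives to use the power of technology to support the development of traditional industries. 未名企鹅 has decided to launch a new Geek column, and we are glad to have the chance to share our views on technology with you.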

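Today we have invited Lee, system architect at 未名企鹅, to talk about how Flink, a relatively new distributed stream processing architecture, is applied to flow-data processing.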


Applying the Distributed Computing Framework Flink to Flow-Data Processing

01
What Is Distributed Computing

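Like every distributed system, a distributed computing framework exists to overcome the limits of a single machine. It splits a large computation or data-processing task across multiple computers for execution, then aggregates their partial results into the final result.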
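The split, compute, and aggregate pattern described above can be sketched on a single machine. This is only an illustration under stated assumptions: threads stand in for separate computers, the `(product, quantity)` record shape is hypothetical, and none of the function names come from Flink or from 未名企鹅's actual service.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Deal the records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def total_by_product(partition):
    """Work done by one 'worker': sum the quantity per product in its partition."""
    totals = Counter()
    for product, quantity in partition:
        totals[product] += quantity
    return totals


def distributed_totals(records, num_workers=4):
    """Scatter the records, process the partitions in parallel, then merge."""
    partitions = split_into_partitions(records, num_workers)
    # Threads are a stand-in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(total_by_product, partitions))
    merged = Counter()
    for partial in partial_results:
        merged.update(partial)  # Counter.update adds counts together
    return merged


if __name__ == "__main__":
    # Hypothetical flow records: (product, quantity).
    sample = [("A", 1), ("B", 2), ("A", 3)] * 1000
    print(distributed_totals(sample))
```

The merge step works here because per-product sums can be combined in any order. A computation without that property would need a different aggregation strategy.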

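When the flow-data cleansing service provided by 未名企鹅 processes flow data, a single run involves at least several hundred thousand records. If the time span grows, for example to a full year of data, the volume can reach several million or even tens of millions of records. Running all of that on one computer is possible, but it may take 30 minutes or longer, and a single point of failure can cause the whole job to fail.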

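With a distributed computing framework, the same job can finish in 5 minutes, in 1 minute, or even in near real time, and a failure on an individual machine during the computation does not affect the overall result.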
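One generic way a framework can tolerate the failure of an individual worker is to rerun only the affected partition. The sketch below illustrates that idea with a simulated crash. It is an assumption-laden toy, not a description of how Flink itself recovers, and every name in it is hypothetical.

```python
def run_with_retries(task, partition, max_attempts=3):
    """Run task(partition, attempt); on failure, rerun it as if rescheduled elsewhere."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(partition, attempt)
        except RuntimeError as error:
            last_error = error  # remember the failure and try again
    raise RuntimeError(
        f"partition failed after {max_attempts} attempts"
    ) from last_error


def make_flaky_sum(fail_on_first_attempt):
    """Build a hypothetical task that crashes on attempt 1 for chosen partitions."""
    def flaky_sum(partition, attempt):
        if attempt == 1 and partition[0] in fail_on_first_attempt:
            raise RuntimeError("simulated worker crash")
        return sum(partition)
    return flaky_sum


def fault_tolerant_total(partitions, task):
    """Sum the per-partition results, retrying any partition whose task failed."""
    return sum(run_with_retries(task, partition) for partition in partitions)


if __name__ == "__main__":
    partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    # The partition starting with 4 crashes once, then succeeds on retry.
    print(fault_tolerant_total(partitions, make_flaky_sum({4})))
```

Only the failed partition is recomputed, so the final total matches a failure-free run.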

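Once processing is faster, cleansed flow data reaches customers sooner. The data module in 未名企鹅's 终端通 product can show each customer's latest data closer to real time, and reports and data analysis results can be delivered more quickly.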

Flink, a distributed computing framework that has risen rapidly in recent years, naturally became the first one we considered
