Reposted from: https://spark-summit.org/2015/events/recipes-for-running-spark-streaming-applications-in-production/
Tathagata Das (Databricks)
Spark Streaming extends core Apache Spark to perform large-scale stream processing. It is being rapidly adopted by companies across various business verticals: ad monitoring, real-time analysis of machine data, anomaly detection, etc. This interest is due to its simple, high-level programming model and its seamless integration with SQL querying (Spark SQL), machine learning algorithms (MLlib), etc. However, for building a real-time streaming analytics pipeline, it's not sufficient to be able to easily express your business logic. Running the platform with high uptime and continuously monitoring it pose many operational challenges. Fortunately, Spark Streaming makes all that easy as well. In this talk, I am going to elaborate on various operational aspects of a Spark Streaming application at different stages of deployment: prototyping, testing, monitoring continuous operation, and upgrading. In short, all the recipes that take you from “hello-world” to large-scale production in no time.