[Spark Version Update] -- Spark 2.0.2

The Apache Spark 2.0.2 release covers sub-tasks, bug fixes, dependency upgrades, documentation improvements, and new features. It concentrates on fixes to Structured Streaming, the Kafka source, SQL query correctness, and execution performance, and adds finer-grained control options. Overall, this release improves Spark's stability and functionality.

Original release notes: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12338301

Sub-task

  • [SPARK-16963] - Change Source API so that sources do not need to keep unbounded state
  • [SPARK-17346] - Kafka 0.10 support in Structured Streaming (see the sketch after this list)
  • [SPARK-17731] - Metrics for Structured Streaming
  • [SPARK-17790] - Support for parallelizing R data.frame larger than 2GB
  • [SPARK-17812] - More granular control of starting offsets (assign)
  • [SPARK-17813] - Maximum data per trigger
  • [SPARK-17834] - Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer
  • [SPARK-17926] - Add methods to convert StreamingQueryStatus to json
  • [SPARK-18143] - History Server is broken because of the refactoring work in Structured Streaming
  • [SPARK-18154] - CLONE - Change Source API so that sources do not need to keep unbounded state
  • [SPARK-18164] - ForeachSink should fail the Spark job if `process` throws exception
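
Several of the Kafka-related sub-tasks above fit together: SPARK-17346 adds the Kafka 0.10 source for Structured Streaming, SPARK-17812 gives finer-grained control over starting offsets, and SPARK-17813 caps how much data a single trigger reads. The sketch below is not part of the release notes; it is a minimal Scala example assuming the spark-sql-kafka-0-10 artifact is on the classpath, and the broker address and topic name are hypothetical.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kafka-2.0.2-sketch").getOrCreate()

    // Kafka 0.10 source for Structured Streaming (SPARK-17346)
    val kafkaDf = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // hypothetical broker
      .option("subscribe", "events")                     // hypothetical topic
      .option("startingOffsets", "earliest")             // SPARK-17812: also accepts a JSON map of per-partition offsets
      .option("maxOffsetsPerTrigger", "10000")           // SPARK-17813: bound the data read per micro-batch
      .load()

    // Kafka key/value columns arrive as binary; cast them before use
    val query = kafkaDf
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()

    query.awaitTermination()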

Bug

  • [SPARK-13747] - Concurrent execution in SQL doesn't work with Scala ForkJoinPool
  • [SPARK-16304] - LinkageError should not crash Spark executor
  • [SPARK-16804] - Correlated subqueries containing non-deterministic operators return incorrect results
  • [SPARK-16988] - spark history server log needs to be fixed to show https url when ssl is enabled
  • [SPARK-17112] - "select if(true, null, null)" via JDBC triggers IllegalArgumentException in Thriftserver
  • [SPARK-17123] - Performing set operations that combine string and date / timestamp columns may result in generated projection code which doesn't compile
  • [SPARK-17153] - [Structured streams] readStream ignores partition columns
  • [SPARK-17337] - Incomplete algorithm for name resolution in Catalyst parser may lead to incorrect result
  • [SPARK-17417] - Fix sorting of part files while reconstructing RDD/partition from checkpointed files.
  • [SPARK-17549] - InMemoryRelation doesn't scale to large tables
  • [SPARK-17559] - PeriodicGraphCheckpointer did not persist edges as expected in some cases
  • [SPARK-17587] - SparseVector __getitem__ should follow __getitem__ contract
  • [SPARK-17612] - Support `DESCRIBE table PARTITION` SQL syntax
  • [SPARK-17643] - Remove comparable requirement from Offset
  • [SPARK-17697] - BinaryLogisticRegressionSummary, GLM Summary should handle non-Double numeric types
  • [SPARK-17698] - Join predicates should not contain filter clauses
  • [SPARK-17707] - Web UI prevents spark-submit application to be finished
  • [SPARK-17712] - Incorrect result due to invalid pushdown of data-independent filter beneath aggregate
  • [SPARK-17721] - Erroneous computation in multiplication of transposed SparseMatrix with SparseVector
  • [SPARK-17733] - InferFiltersFromConstraints rule never terminates for query
  • [SPARK-17750] - Cannot create view which includes interval arithmetic
  • [SPARK-17753] - Simple case in spark sql throws ParseException
  • [SPARK-17758] - Spark Aggregate function LAST returns null on an empty partition
  • [SPARK-17782] - Kafka 010 test is flaky
  • [SPARK-17792] - L-BFGS solver for linear regression does not accept general numeric label column types
  • [SPARK-17798] - Remove redundant Experimental annotations in sql.streaming package
  • [SPARK-17805] - sqlContext.read.text() does not work with a list of paths
  • [SPARK-17806] - Incorrect result when working with data from parquet
  • [SPARK-17808] - BinaryType fails in Python 3 due to outdated Pyrolite
  • [SPARK-17810] - Default spark.sql.warehouse.dir is relative to local FS but can resolve as HDFS path
  • [SPARK-17811] - SparkR cannot parallelize data.frame with NA or NULL in Date columns
  • [SPARK-17816] - JSON serialization of accumulators is failing with ConcurrentModificationException
  • [SPARK-17818] - Cannot SELECT NULL
  • [SPARK-17819] - Specified database in JDBC URL is ignored when connecting to thriftserver
  • [SPARK-17832] - TableIdentifier.quotedString creates un-parseable names when name contains a backtick
  • [SPARK-17841] - Kafka 0.10 commitQueue needs to be drained
  • [SPARK-17853] - Kafka OffsetOutOfRangeException on DStreams union from separate Kafka clusters with identical topic names.
  • [SPARK-17859] - persist should not impede spark's ability to perform a broadcast join.
  • [SPARK-17863] - SELECT distinct does not work if there is an order by clause
  • [SPARK-17876] - Write StructuredStreaming WAL to a stream instead of materializing all at once
  • [SPARK-17880] - The url linking to `AccumulatorV2` in the document is incorrect.
  • [SPARK-17882] - RBackendHandler swallowing errors
  • [SPARK-17884] - In the cast expression, casting from empty string to interval type throws NullPointerException
  • [SPARK-17892] - Query in CTAS is Optimized Twice (branch-2.0)
  • [SPARK-17929] - Deadlock when AM restart and send RemoveExecutor on reset
  • [SPARK-17986] - SQLTransformer leaks temporary tables
  • [SPARK-17989] - Check ascendingOrder type in sort_array function ahead
  • [SPARK-18001] - Broken link to R DataFrame in sql-programming-guide
  • [SPARK-18003] - RDD zipWithIndex generates wrong results when one partition contains more than 2147483647 records.
  • [SPARK-18009] - Spark 2.0.1 SQL Thrift Error
  • [SPARK-18022] - java.lang.NullPointerException instead of real exception when saving DF to MySQL
  • [SPARK-18030] - Flaky test: org.apache.spark.sql.streaming.FileStreamSourceSuite
  • [SPARK-18034] - Upgrade to MiMa 0.1.11
  • [SPARK-18058] - AnalysisException may be thrown when union two DFs whose struct fields have different nullability
  • [SPARK-18063] - Failed to infer constraints over multiple aliases
  • [SPARK-18070] - binary operator should not consider nullability when comparing input types
  • [SPARK-18093] - Fix default value test in SQLConfSuite to work regardless of warehouse dir's existence
  • [SPARK-18114] - MesosClusterScheduler generate bad command options
  • [SPARK-18132] - spark 2.0 branch's spark-release-publish failed because style check failed.
  • [SPARK-18148] - Misleading Error Message for Aggregation Without Window/GroupBy
  • [SPARK-18189] - task not serializable with groupByKey() + mapGroups() + map (see the sketch after this list)
  • [SPARK-18342] - HDFSBackedStateStore can fail to rename files causing snapshotting and recovery to fail
  • [SPARK-18358] - Multiple Aggregation Using 'countDistinct' and 'first' result in error
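
Among the fixes above, SPARK-18189 concerns the typed Dataset chain of groupByKey() followed by mapGroups() and map(), which could previously abort with a spurious "task not serializable" error. A minimal Scala sketch of the affected pattern, assuming an existing SparkSession named spark and a hypothetical record type:

    import spark.implicits._

    case class Event(user: Int, bytes: Long) // hypothetical record type

    val events = Seq(Event(1, 10L), Event(1, 20L), Event(2, 5L)).toDS()

    // groupByKey() + mapGroups() + map(): the chain covered by SPARK-18189
    val report = events
      .groupByKey(_.user)
      .mapGroups((user, rows) => (user, rows.map(_.bytes).sum))
      .map { case (user, total) => s"user $user sent $total bytes" }

    report.show(truncate = false)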

Dependency upgrade

  • [SPARK-17803] - Docker integration tests don't run with "Docker for Mac"

Documentation

Improvement

  • [SPARK-16343] - Improve the PushDownPredicate rule to push down predicates correctly under non-deterministic conditions (see the sketch after this list)
  • [SPARK-17751] - Remove spark.sql.eagerAnalysis
  • [SPARK-17780] - Report NoClassDefFoundError in StreamExecution
  • [SPARK-17999] - Add getPreferredLocations for KafkaSourceRDD
  • [SPARK-18044] - FileStreamSource should not infer partitions in every batch
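
SPARK-16343 above concerns predicate pushdown when a filter mixes deterministic and non-deterministic conjuncts; after the change, only deterministic predicates that come before the first non-deterministic one are pushed down, so evaluation order does not alter results. A minimal Scala sketch to observe the behaviour, assuming an existing SparkSession named spark and illustrative column names:

    import org.apache.spark.sql.functions.{col, rand}

    val base = spark.range(0, 1000).toDF("id").select(col("id") % 10 as "bucket")

    // Mixed condition: bucket > 2 is deterministic, rand() < 0.5 is not (SPARK-16343)
    val filtered = base.filter(col("bucket") > 2 && rand() < 0.5)

    // Inspect the optimized plan to see where each predicate ends up
    filtered.explain(true)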

New Feature

Test

  • [SPARK-17624] - Flaky test? StateStoreSuite maintenance
  • [SPARK-17738] - Flaky test: org.apache.spark.sql.execution.columnar.ColumnTypeSuite MAP append/extract
  • [SPARK-17778] - Mock SparkContext to reduce memory usage of BlockManagerSuite