Original link: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12338301
Sub-task
- [SPARK-16963] - Change Source API so that sources do not need to keep unbounded state
- [SPARK-17346] - Kafka 0.10 support in Structured Streaming
- [SPARK-17731] - Metrics for Structured Streaming
- [SPARK-17790] - Support for parallelizing R data.frame larger than 2GB
- [SPARK-17812] - More granular control of starting offsets (assign)
- [SPARK-17813] - Maximum data per trigger
- [SPARK-17834] - Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer
- [SPARK-17926] - Add methods to convert StreamingQueryStatus to json
- [SPARK-18143] - History Server is broken because of the refactoring work in Structured Streaming
- [SPARK-18154] - CLONE - Change Source API so that sources do not need to keep unbounded state
- [SPARK-18164] - ForeachSink should fail the Spark job if `process` throws exception
Bug
- [SPARK-13747] - Concurrent execution in SQL doesn't work with Scala ForkJoinPool
- [SPARK-16304] - LinkageError should not crash Spark executor
- [SPARK-16804] - Correlated subqueries containing non-deterministic operators return incorrect results
- [SPARK-16988] - spark history server log needs to be fixed to show https url when ssl is enabled
- [SPARK-17112] - "select if(true, null, null)" via JDBC triggers IllegalArgumentException in Thriftserver
- [SPARK-17123] - Performing set operations that combine string and date / timestamp columns may result in generated projection code which doesn't compile
- [SPARK-17153] - [Structured streams] readStream ignores partition columns
- [SPARK-17337] - Incomplete algorithm for name resolution in Catalyst parser may lead to incorrect result
- [SPARK-17417] - Fix sorting of part files while reconstructing RDD/partition from checkpointed files.
- [SPARK-17549] - InMemoryRelation doesn't scale to large tables
- [SPARK-17559] - PeriodicGraphCheckpointer did not persist edges as expected in some cases
- [SPARK-17587] - SparseVector __getitem__ should follow __getitem__ contract
- [SPARK-17612] - Support `DESCRIBE table PARTITION` SQL syntax
- [SPARK-17643] - Remove comparable requirement from Offset
- [SPARK-17697] - BinaryLogisticRegressionSummary, GLM Summary should handle non-Double numeric types
- [SPARK-17698] - Join predicates should not contain filter clauses
- [SPARK-17707] - Web UI prevents spark-submit application to be finished
- [SPARK-17712] - Incorrect result due to invalid pushdown of data-independent filter beneath aggregate
- [SPARK-17721] - Erroneous computation in multiplication of transposed SparseMatrix with SparseVector
- [SPARK-17733] - InferFiltersFromConstraints rule never terminates for query
- [SPARK-17750] - Cannot create view which includes interval arithmetic
- [SPARK-17753] - Simple case in spark sql throws ParseException
- [SPARK-17758] - Spark Aggregate function LAST returns null on an empty partition
- [SPARK-17782] - Kafka 010 test is flaky
- [SPARK-17792] - L-BFGS solver for linear regression does not accept general numeric label column types
- [SPARK-17798] - Remove redundant Experimental annotations in sql.streaming package
- [SPARK-17805] - sqlContext.read.text() does not work with a list of paths
- [SPARK-17806] - Incorrect result when work with data from parquet
- [SPARK-17808] - BinaryType fails in Python 3 due to outdated Pyrolite
- [SPARK-17810] - Default spark.sql.warehouse.dir is relative to local FS but can resolve as HDFS path
- [SPARK-17811] - SparkR cannot parallelize data.frame with NA or NULL in Date columns
- [SPARK-17816] - Json serialization of accumulators are failing with ConcurrentModificationException
- [SPARK-17818] - Cannot SELECT NULL
- [SPARK-17819] - Specified database in JDBC URL is ignored when connecting to thriftserver
- [SPARK-17832] - TableIdentifier.quotedString creates un-parseable names when name contains a backtick
- [SPARK-17841] - Kafka 0.10 commitQueue needs to be drained
- [SPARK-17853] - Kafka OffsetOutOfRangeException on DStreams union from separate Kafka clusters with identical topic names.
- [SPARK-17859] - persist should not impede with spark's ability to perform a broadcast join.
- [SPARK-17863] - SELECT distinct does not work if there is a order by clause
- [SPARK-17876] - Write StructuredStreaming WAL to a stream instead of materializing all at once
- [SPARK-17880] - The url linking to `AccumulatorV2` in the document is incorrect.
- [SPARK-17882] - RBackendHandler swallowing errors
- [SPARK-17884] - In the cast expression, casting from empty string to interval type throws NullPointerException
- [SPARK-17892] - Query in CTAS is Optimized Twice (branch-2.0)
- [SPARK-17929] - Deadlock when AM restart and send RemoveExecutor on reset
- [SPARK-17986] - SQLTransformer leaks temporary tables
- [SPARK-17989] - Check ascendingOrder type in sort_array function ahead
- [SPARK-18001] - Broken link to R DataFrame in sql-programming-guide
- [SPARK-18003] - RDD zipWithIndex generate wrong result when one partition contains more than 2147483647 records.
- [SPARK-18009] - Spark 2.0.1 SQL Thrift Error
- [SPARK-18022] - java.lang.NullPointerException instead of real exception when saving DF to MySQL
- [SPARK-18030] - Flaky test: org.apache.spark.sql.streaming.FileStreamSourceSuite
- [SPARK-18034] - Upgrade to MiMa 0.1.11
- [SPARK-18058] - AnalysisException may be thrown when union two DFs whose struct fields have different nullability
- [SPARK-18063] - Failed to infer constraints over multiple aliases
- [SPARK-18070] - binary operator should not consider nullability when comparing input types
- [SPARK-18093] - Fix default value test in SQLConfSuite to work regardless of warehouse dir's existence
- [SPARK-18114] - MesosClusterScheduler generate bad command options
- [SPARK-18132] - spark 2.0 branch's spark-release-publish failed because style check failed.
- [SPARK-18148] - Misleading Error Message for Aggregation Without Window/GroupBy
- [SPARK-18189] - task not serializable with groupByKey() + mapGroups() + map
- [SPARK-18342] - HDFSBackedStateStore can fail to rename files causing snapshotting and recovery to fail
- [SPARK-18358] - Multiple Aggregation Using 'countDistinct' and 'first' result in error
Dependency upgrade
- [SPARK-17803] - Docker integration tests don't run with "Docker for Mac"
Documentation
- [SPARK-17736] - Update R README for rmarkdown, pandoc
- [SPARK-17883] - Possible typo in comments of Row.scala
- [SPARK-17953] - Fix typo in SparkSession scaladoc
- [SPARK-18104] - Don't build KafkaSource doc
Improvement
- [SPARK-16343] - Improve the PushDownPredicate rule to pushdown predicates correctly in non-deterministic condition
- [SPARK-17751] - Remove spark.sql.eagerAnalysis
- [SPARK-17780] - Report NoClassDefFoundError in StreamExecution
- [SPARK-17999] - Add getPreferredLocations for KafkaSourceRDD
- [SPARK-18044] - FileStreamSource should not infer partitions in every batch
New Feature
- [SPARK-17711] - Compress rolled executor logs
Test
- [SPARK-17624] - Flaky test? StateStoreSuite maintenance
- [SPARK-17738] - Flaky test: org.apache.spark.sql.execution.columnar.ColumnTypeSuite MAP append/extract
- [SPARK-17778] - Mock SparkContext to reduce memory usage of BlockManagerSuite