Sub-task
- [SPARK-24535] - Fix java version parsing in SparkR on Windows
- [SPARK-24976] - Allow None for Decimal type conversion (specific to PyArrow 0.9.0)
Bug
- [SPARK-22809] - pyspark is sensitive to imports with dots
- [SPARK-23243] - Shuffle+Repartition on an RDD could lead to incorrect answers
- [SPARK-23618] - docker-image-tool.sh Fails While Building Image
- [SPARK-23731] - FileSourceScanExec throws NullPointerException in subexpression elimination
- [SPARK-23732] - Broken link to scala source code in Spark Scala api Scaladoc
- [SPARK-24018] - Spark-without-hadoop package fails to create or read parquet files with snappy compression
- [SPARK-24216] - Spark TypedAggregateExpression uses getSimpleName that is not safe in scala
- [SPARK-24369] - A bug when having multiple distinct aggregations
- [SPARK-24385] - Trivially-true EqualNullSafe should be handled like EqualTo in Dataset.join
- [SPARK-24415] - Stage page aggregated executor metrics wrong when failures
- [SPARK-24452] - long = int*int or long = int+int may cause overflow.
- [SPARK-24468] - DecimalType `adjustPrecisionScale` might fail when scale is negative
- [SPARK-24495] - SortMergeJoin with duplicate keys wrong results
- [SPARK-24506] - Spark.ui.filters not applied to /sqlserver/ url
- [SPARK-24530] - Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) and pyspark.ml docs are broken
- [SPARK-24531] - HiveExternalCatalogVersionsSuite failing due to missing 2.2.0 version
- [SPARK-24536] - Query with nonsensical LIMIT hits AssertionError
- [SPARK-24552] - Task attempt numbers are reused when stages are retried
- [SPARK-24578] - Reading remote cache block behavior changes and causes timeout issue
- [SPARK-24583] - Wrong schema type in InsertIntoDataSourceCommand
- [SPARK-24588] - StreamingSymmetricHashJoinExec should require HashClusteredPartitioning from children
- [SPARK-24589] - OutputCommitCoordinator may allow duplicate commits
- [SPARK-24603] - Typo in comments
- [SPARK-24613] - Cache with UDF could not be matched with subsequent dependent caches
- [SPARK-24704] - The order of stages in the DAG graph is incorrect
- [SPARK-24739] - PySpark does not work with Python 3.7.0
- [SPARK-24781] - Using a reference from Dataset in Filter/Sort might not work.
- [SPARK-24809] - Serializing LongHashedRelation in executor may result in data error
- [SPARK-24813] - HiveExternalCatalogVersionsSuite still flaky; fall back to Apache archive
- [SPARK-24867] - Add AnalysisBarrier to DataFrameWriter
- [SPARK-24879] - NPE in Hive partition filter pushdown for `partCol IN (NULL, ....)`
- [SPARK-24889] - dataset.unpersist() doesn't update storage memory stats
- [SPARK-24891] - Fix HandleNullInputsForUDF rule
- [SPARK-24908] - [R] remove spaces to make lintr happy
- [SPARK-24909] - Spark scheduler can hang when fetch failures, executor lost, task running on lost executor, and multiple stage attempts
- [SPARK-24927] - The hadoop-provided profile doesn't play well with Snappy-compressed Parquet files
- [SPARK-24934] - Complex type and binary type in in-memory partition pruning does not work due to missing upper/lower bounds cases
- [SPARK-24948] - SHS filters wrongly some applications due to permission check
- [SPARK-24950] - scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13
- [SPARK-24957] - Decimal arithmetic can lead to wrong values using codegen
- [SPARK-24987] - Kafka Cached Consumer Leaking File Descriptors
- [SPARK-25028] - AnalyzePartitionCommand failed with NPE if value is null
- [SPARK-25051] - where clause on dataset gives AnalysisException
- [SPARK-25076] - SQLConf should not be retrieved from a stopped SparkSession
- [SPARK-25084] - "distribute by" on multiple columns may lead to codegen issue
- [SPARK-25114] - RecordBinaryComparator may return wrong result when subtraction between two words is divisible by Integer.MAX_VALUE
- [SPARK-25124] - VectorSizeHint.size is buggy, breaking streaming pipeline
- [SPARK-25144] - distinct on Dataset leads to exception due to Managed memory leak detected
- [SPARK-25164] - Parquet reader builds entire list of columns once for each column
- [SPARK-25205] - typo in spark.network.crypto.keyFactoryIteration
- [SPARK-25231] - Running a Large Job with Speculation On Causes Executor Heartbeats to Time Out on Driver
- [SPARK-25313] - Fix regression in FileFormatWriter output schema
- [SPARK-25330] - Permission issue after upgrade hadoop version to 2.7.7
- [SPARK-25357] - Add metadata to SparkPlanInfo to dump more information like file path to event log
- [SPARK-25368] - Incorrect constraint inference returns wrong result
- [SPARK-25371] - Vector Assembler with no input columns leads to opaque error
- [SPARK-25402] - Null handling in BooleanSimplification
- [SPARK-26802] - CVE-2018-11760: Apache Spark local privilege escalation vulnerability
New Feature
- [SPARK-24542] - Hive UDF series UDFXPathXXXX allow users to pass carefully crafted XML to access arbitrary files
Story
- [SPARK-25234] - SparkR:::parallelize doesn't handle integer overflow properly
Improvement
- [SPARK-24455] - fix typo in TaskSchedulerImpl's comments
- [SPARK-24696] - ColumnPruning rule fails to remove extra Project
- [SPARK-25400] - Increase timeouts in schedulerIntegrationSuite
Test
- [SPARK-24502] - flaky test: UnsafeRowSerializerSuite
- [SPARK-24521] - Fix ineffective test in CachedTableSuite
- [SPARK-24564] - Add test suite for RecordBinaryComparator
Documentation
- [SPARK-24507] - Description in "Level of Parallelism in Data Receiving" section of Spark Streaming Programming Guide is not relevant for the recent Kafka direct approach
- [SPARK-25273] - How to install testthat v1.0.2