Release Notes - Spark - Version 2.2.3

Sub-task

  • [SPARK-26327] - Metrics in FileSourceScanExec not updated correctly while relation.partitionSchema is set

Bug

  • [SPARK-21402] - Fix java array of structs deserialization
  • [SPARK-22951] - count() after dropDuplicates() on emptyDataFrame returns incorrect value
  • [SPARK-23207] - Shuffle+Repartition on a DataFrame could lead to incorrect answers
  • [SPARK-23243] - Shuffle+Repartition on an RDD could lead to incorrect answers
  • [SPARK-24603] - Typo in comments
  • [SPARK-24677] - TaskSetManager not updating successfulTaskDurations for old stage attempts
  • [SPARK-24809] - Serializing LongHashedRelation in executor may result in data error
  • [SPARK-24813] - HiveExternalCatalogVersionsSuite still flaky; fall back to Apache archive
  • [SPARK-24927] - The hadoop-provided profile doesn't play well with Snappy-compressed Parquet files
  • [SPARK-24948] - SHS wrongly filters some applications due to permission check
  • [SPARK-24950] - scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13
  • [SPARK-24957] - Decimal arithmetic can lead to wrong values using codegen
  • [SPARK-25081] - Nested spill in ShuffleExternalSorter may access a released memory page
  • [SPARK-25114] - RecordBinaryComparator may return wrong result when subtraction between two words is divisible by Integer.MAX_VALUE
  • [SPARK-25144] - distinct on Dataset leads to exception due to Managed memory leak detected
  • [SPARK-25164] - Parquet reader builds entire list of columns once for each column
  • [SPARK-25402] - Null handling in BooleanSimplification
  • [SPARK-25568] - Continue to update the remaining accumulators when failing to update one accumulator
  • [SPARK-25591] - PySpark Accumulators with multiple PythonUDFs
  • [SPARK-25714] - Null Handling in the Optimizer rule BooleanSimplification
  • [SPARK-25726] - Flaky test: SaveIntoDataSourceCommandSuite.`simpleString is redacted`
  • [SPARK-25797] - Views created via 2.1 cannot be read via 2.2+
  • [SPARK-25854] - mvn helper script always exits w/1, causing mvn builds to fail
  • [SPARK-26233] - Incorrect decimal value with java beans and first/last/max... functions
  • [SPARK-26537] - update the release scripts to point to gitbox
  • [SPARK-26545] - Fix typo in EqualNullSafe's truth table comment
  • [SPARK-26553] - NameError: global name '_exception_message' is not defined
  • [SPARK-26802] - CVE-2018-11760: Apache Spark local privilege escalation vulnerability

New Feature

  • [SPARK-26118] - Make Jetty's requestHeaderSize configurable in Spark

Improvement

  • [SPARK-20715] - MapStatuses shouldn't be redundantly stored in both ShuffleMapStage and MapOutputTracker
  • [SPARK-25253] - Refactor pyspark connection & authentication
  • [SPARK-25576] - Fix lint failure in 2.2

Test

  • [SPARK-24564] - Add test suite for RecordBinaryComparator
