Release Notes - Spark - Version 3.2.3

Sub-task

  • [SPARK-38697] - Extend SparkSessionExtensions to inject rules into AQE Optimizer
  • [SPARK-39200] - Stream is corrupted Exception while fetching the blocks from fallback storage system
  • [SPARK-39965] - Skip PVC cleanup when driver doesn't own PVCs
  • [SPARK-40459] - recoverDiskStore should not stop by existing recomputed files
  • [SPARK-40636] - Fix wrong remained shuffles log in BlockManagerDecommissioner

Bug

  • [SPARK-8731] - Beeline doesn't work with -e option when started in background
  • [SPARK-32380] - sparksql cannot access hive table while data in hbase
  • [SPARK-35542] - Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it.
  • [SPARK-39184] - ArrayIndexOutOfBoundsException for some date/time sequences in some time-zones
  • [SPARK-39647] - Block push fails with java.lang.IllegalArgumentException: Active local dirs list has not been updated by any executor registration even when the NodeManager hasn't been restarted
  • [SPARK-39775] - Regression due to AVRO-2035
  • [SPARK-39833] - Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true
  • [SPARK-39835] - Fix EliminateSorts remove global sort below the local sort
  • [SPARK-39839] - Handle special case of null variable-length Decimal with non-zero offsetAndSize in UnsafeRow structural integrity check
  • [SPARK-39847] - Race condition related to interruption of task threads while they are in RocksDBLoader.loadLibrary()
  • [SPARK-39867] - Global limit should not inherit OrderPreservingUnaryNode
  • [SPARK-39887] - Expression transform error
  • [SPARK-39900] - Issue with querying dataframe produced by 'binaryFile' format using 'not' operator
  • [SPARK-39932] - WindowExec should clear the final partition buffer
  • [SPARK-39952] - SaveIntoDataSourceCommand should recache result relation
  • [SPARK-39962] - Global aggregation against pandas aggregate UDF does not take the column order into account
  • [SPARK-39972] - Revert the test case of SPARK-39962 in branch-3.2 and branch-3.1
  • [SPARK-40002] - Limit improperly pushed down through window using ntile function
  • [SPARK-40065] - Executor ConfigMap is not mounted if profile is not default
  • [SPARK-40079] - Add Imputer inputCols validation for empty input case
  • [SPARK-40089] - Sorting of at least Decimal(20, 2) fails for some values near the max.
  • [SPARK-40117] - Convert condition to java in DataFrameWriterV2.overwrite
  • [SPARK-40121] - Initialize projection used for Python UDF
  • [SPARK-40124] - Update TPCDS v1.4 q32 for Plan Stability tests
  • [SPARK-40149] - Star expansion after outer join asymmetrically includes joining key
  • [SPARK-40169] - Fix the issue with Parquet column index and predicate pushdown in Data source V1
  • [SPARK-40212] - SparkSQL castPartValue does not properly handle byte & short
  • [SPARK-40218] - GROUPING SETS should preserve the grouping columns
  • [SPARK-40270] - Make compute.max_rows as None working in DataFrame.style
  • [SPARK-40280] - Failure to create parquet predicate push down for ints and longs on some valid files
  • [SPARK-40315] - Non-deterministic hashCode() calculations for ArrayBasedMapData on equal objects
  • [SPARK-40407] - Repartition of DataFrame can result in severe data skew in some special case
  • [SPARK-40470] - arrays_zip output unexpected alias column names when using GetMapValue and GetArrayStructFields
  • [SPARK-40493] - Revert "[SPARK-33861][SQL] Simplify conditional in predicate"
  • [SPARK-40562] - Add spark.sql.legacy.groupingIdWithAppendedUserGroupBy
  • [SPARK-40583] - Documentation error in "Integration with Cloud Infrastructures"
  • [SPARK-40588] - Sorting issue with partitioned-writing and AQE turned on
  • [SPARK-40612] - On Kubernetes for long running app Spark using an invalid principal to renew the delegation token
  • [SPARK-40660] - Switch to XORShiftRandom to distribute elements
  • [SPARK-40829] - STORED AS serde in CREATE TABLE LIKE view does not work
  • [SPARK-40851] - TimestampFormatter behavior changed when using the latest Java 8/11/17
  • [SPARK-40869] - KubernetesConf.getResourceNamePrefix creates invalid name prefixes
  • [SPARK-40874] - Fix broadcasts in Python UDFs when encryption is enabled
  • [SPARK-40902] - Quick submission of drivers in tests to mesos scheduler results in dropping drivers
  • [SPARK-40963] - ExtractGenerator sets incorrect nullability in new Project
  • [SPARK-40987] - Avoid creating a directory when deleting a block, causing DAGScheduler to not work
  • [SPARK-41035] - Incorrect results or NPE when a literal is reused across distinct aggregations
  • [SPARK-41091] - Fix Docker release tool for branch-3.2
  • [SPARK-41188] - Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes
  • [SPARK-41327] - Fix SparkStatusTracker.getExecutorInfos by switch On/OffHeapStorageMemory info
  • [SPARK-41395] - InterpretedMutableProjection can corrupt unsafe buffer when used with decimal data
  • [SPARK-41448] - Make consistent MR job IDs in FileBatchWriter and FileFormatWriter
  • [SPARK-41522] - GA dependencies test failed
  • [SPARK-41535] - InterpretedUnsafeProjection and InterpretedMutableProjection can corrupt unsafe buffer when used with calendar interval data
  • [SPARK-41668] - DECODE function returns wrong results when passed NULL

Improvement

  • [SPARK-38034] - Optimize time complexity and extend applicable cases for TransposeWindow
  • [SPARK-39831] - R dependencies installation starts to fail after devtools_2.4.4 was released
  • [SPARK-39879] - Reduce local-cluster memory configuration in BroadcastJoinSuite* and HiveSparkSubmitSuite
  • [SPARK-40022] - YarnClusterSuite should not be ABORTED when there is no Python3 environment
  • [SPARK-40241] - Correct the link of GenericUDTF
  • [SPARK-40490] - `YarnShuffleIntegrationSuite` no longer verifies `registeredExecFile` reload after SPARK-17321
  • [SPARK-40574] - Add PURGE to DROP TABLE doc
  • [SPARK-41541] - Fix wrong child call in SQLShuffleWriteMetricsReporter.decRecordsWritten()

Test

  • [SPARK-40172] - Temporarily disable flaky test cases in ImageFileFormatSuite
  • [SPARK-40461] - Set upperbound for pyzmq 24.0.0 for Python linter

Task

  • [SPARK-40213] - Incorrect ASCII value for Latin-1 Supplement characters
  • [SPARK-40292] - arrays_zip output unexpected alias column names

Documentation

  • [SPARK-40043] - Document DataStreamWriter.toTable and DataStreamReader.table
  • [SPARK-40983] - Remove Hadoop requirements for zstd mention in Parquet compression codec
