Release Notes - Spark - Version 3.4.3

Sub-task

  • [SPARK-44495] - Use the latest minikube in K8s IT
  • [SPARK-45445] - Upgrade snappy to 1.1.10.5
  • [SPARK-46369] - Remove `kill` link from RELAUNCHING drivers in MasterPage
  • [SPARK-46400] - When there are corrupted files in the local Maven repo, retry, skipping this cache
  • [SPARK-46411] - Change to use bcprov/bcpkix-jdk18on for test
  • [SPARK-46704] - Fix `MasterPage` to sort `Running Drivers` table by `Duration` column correctly
  • [SPARK-46747] - Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
  • [SPARK-46817] - Fix `spark-daemon.sh` usage by adding `decommission` command
  • [SPARK-46888] - Fix `Master` to reject worker kill request if decommission is disabled
  • [SPARK-47021] - Fix `kvstore` module to have explicit `commons-lang3` test dependency
  • [SPARK-47111] - Upgrade `PostgreSQL` JDBC driver to 42.7.2 and docker image to 16.2
  • [SPARK-47368] - Remove inferTimestampNTZ config check in ParquetRowConverter
  • [SPARK-47370] - Add migration doc: TimestampNTZ type inference on Parquet files
  • [SPARK-47494] - Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3
  • [SPARK-47537] - Use MySQL Connector/J for MySQL DB instead of MariaDB Connector/J
  • [SPARK-47666] - Fix NPE when reading MySQL bit array as LongType
  • [SPARK-47770] - Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing
  • [SPARK-47774] - Remove redundant rules from `MimaExcludes`

Bug

  • [SPARK-45580] - Subquery changes the output schema of the outer query
  • [SPARK-46092] - Overflow in Parquet row group filter creation causes incorrect results
  • [SPARK-46189] - Various Pandas functions fail in interpreted mode
  • [SPARK-46239] - Hide Jetty info
  • [SPARK-46275] - Protobuf: Permissive mode should return null rather than struct with null fields
  • [SPARK-46330] - Loading of Spark UI blocks for a long time when HybridStore enabled
  • [SPARK-46339] - Directory with number name should not be treated as metadata log
  • [SPARK-46466] - vectorized parquet reader should never do rebase for timestamp ntz
  • [SPARK-46514] - Fix HiveMetastoreLazyInitializationSuite
  • [SPARK-46577] - HiveMetastoreLazyInitializationSuite leaks hive's SessionState
  • [SPARK-46598] - OrcColumnarBatchReader should respect the memory mode when creating column vectors for the missing column
  • [SPARK-46700] - Count the last spilling for the shuffle disk spilling bytes metric
  • [SPARK-46763] - ReplaceDeduplicateWithAggregate fails when non-grouping keys have duplicate attributes
  • [SPARK-46779] - Grouping by subquery with a cached relation can fail
  • [SPARK-46786] - Fix MountVolumesFeatureStep to use ReadWriteOncePod instead of ReadWriteOnce
  • [SPARK-46794] - Incorrect results due to inferred predicate from checkpoint with subquery
  • [SPARK-46855] - Add `sketch` to the dependencies of the `catalyst` module in `module.py`
  • [SPARK-46861] - Avoid Deadlock in DAGScheduler
  • [SPARK-46862] - Incorrect count() of a dataframe loaded from CSV datasource
  • [SPARK-46893] - Remove inline scripts from UI descriptions
  • [SPARK-46945] - Add `spark.kubernetes.legacy.useReadWriteOnceAccessMode` for old K8s clusters
  • [SPARK-47063] - CAST long to timestamp has different behavior for codegen vs interpreted
  • [SPARK-47068] - Recover -1 and 0 case for spark.sql.execution.arrow.maxRecordsPerBatch
  • [SPARK-47072] - Wrong error message for incorrect ANSI intervals
  • [SPARK-47085] - Performance issue on thrift API
  • [SPARK-47125] - Return null if Univocity never triggers parsing
  • [SPARK-47146] - Possible thread leak when doing sort merge join
  • [SPARK-47177] - Cached SQL plan does not display final AQE plan in explain string
  • [SPARK-47196] - Fix `core` module to succeed SBT tests
  • [SPARK-47236] - Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file
  • [SPARK-47305] - PruneFilters incorrectly tags isStreaming flag when replacing child of Filter with LocalRelation
  • [SPARK-47318] - AuthEngine key exchange needs additional KDF round
  • [SPARK-47385] - Tuple encoder produces wrong results with Option inputs
  • [SPARK-47434] - Streaming Statistics link redirect causing 302 error
  • [SPARK-47455] - Fix Resource Handling of `scalaStyleOnCompileConfig` in SparkBuild.scala
  • [SPARK-47503] - Spark history server fails to display query for cached JDBC relation named in quotes
  • [SPARK-47521] - Use `Utils.tryWithResource` during reading shuffle data from external storage
  • [SPARK-47646] - try_to_number fails with NPE for malformed input
  • [SPARK-47676] - Clean up the removed `VersionsSuite` references
  • [SPARK-47824] - Nondeterminism in pyspark.pandas.series.asof
  • [SPARK-47844] - Upgrade ORC to 1.8.7

Improvement

  • [SPARK-45587] - Skip UNIDOC and MIMA in build GitHub Action job
  • [SPARK-46286] - Document spark.io.compression.zstd.bufferPool.enabled
  • [SPARK-46425] - Pin the bundler version in CI
  • [SPARK-47505] - Fix `pyspark-errors` test jobs for branch-3.4
  • [SPARK-47734] - Fix flaky pyspark.sql.dataframe.DataFrame.writeStream doctest by stopping streaming query

Test

  • [SPARK-45141] - Pin `pyarrow==12.0.1` in CI
  • [SPARK-46801] - Do not treat exit 5 as a test failure in Python testing script
  • [SPARK-47472] - Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile`

Task

  • [SPARK-46182] - Shuffle data lost on decommissioned executor caused by race condition between lastTaskRunningTime and lastShuffleMigrationTime
  • [SPARK-46628] - Use SPDX short identifier in `licenses` name
  • [SPARK-47187] - Fix Hive compress output config not working
  • [SPARK-47432] - Add `pyarrow` upper bound requirement, `<13.0.0`
  • [SPARK-47433] - Update PySpark package dependency version ranges
  • [SPARK-47481] - Fix Python linter

Dependency upgrade
