Sub-task
- [SPARK-44495] - Use the latest minikube in K8s IT
- [SPARK-45445] - Upgrade snappy to 1.1.10.5
- [SPARK-46369] - Remove `kill` link from RELAUNCHING drivers in MasterPage
- [SPARK-46400] - When the local Maven repo contains corrupted files, retry and skip this cache
- [SPARK-46411] - Change to use bcprov/bcpkix-jdk18on for test
- [SPARK-46704] - Fix `MasterPage` to sort `Running Drivers` table by `Duration` column correctly
- [SPARK-46747] - Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
- [SPARK-46817] - Fix `spark-daemon.sh` usage by adding `decommission` command
- [SPARK-46888] - Fix `Master` to reject worker kill request if decommission is disabled
- [SPARK-47021] - Fix `kvstore` module to have explicit `commons-lang3` test dependency
- [SPARK-47111] - Upgrade `PostgreSQL` JDBC driver to 42.7.2 and docker image to 16.2
- [SPARK-47368] - Remove inferTimestampNTZ config check in ParquetRowConverter
- [SPARK-47370] - Add migration doc: TimestampNTZ type inference on Parquet files
- [SPARK-47494] - Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3
- [SPARK-47537] - Use MySQL Connector/J for MySQL DB instead of MariaDB Connector/J
- [SPARK-47666] - Fix NPE when reading mysql bit array as LongType
- [SPARK-47770] - Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing
- [SPARK-47774] - Remove redundant rules from `MimaExcludes`
Bug
- [SPARK-45580] - Subquery changes the output schema of the outer query
- [SPARK-46092] - Overflow in Parquet row group filter creation causes incorrect results
- [SPARK-46189] - Various Pandas functions fail in interpreted mode
- [SPARK-46239] - Hide Jetty info
- [SPARK-46275] - Protobuf: Permissive mode should return null rather than struct with null fields
- [SPARK-46330] - Loading of Spark UI blocks for a long time when HybridStore enabled
- [SPARK-46339] - Directory with number name should not be treated as metadata log
- [SPARK-46466] - Vectorized Parquet reader should never rebase timestamp NTZ values
- [SPARK-46514] - Fix HiveMetastoreLazyInitializationSuite
- [SPARK-46577] - HiveMetastoreLazyInitializationSuite leaks hive's SessionState
- [SPARK-46598] - OrcColumnarBatchReader should respect the memory mode when creating column vectors for the missing column
- [SPARK-46700] - Count the last spill in the shuffle disk spill bytes metric
- [SPARK-46763] - ReplaceDeduplicateWithAggregate fails when non-grouping keys have duplicate attributes
- [SPARK-46779] - Grouping by subquery with a cached relation can fail
- [SPARK-46786] - Fix MountVolumesFeatureStep to use ReadWriteOncePod instead of ReadWriteOnce
- [SPARK-46794] - Incorrect results due to inferred predicate from checkpoint with subquery
- [SPARK-46855] - Add `sketch` to the dependencies of the `catalyst` module in `module.py`
- [SPARK-46861] - Avoid Deadlock in DAGScheduler
- [SPARK-46862] - Incorrect count() of a dataframe loaded from CSV datasource
- [SPARK-46893] - Remove inline scripts from UI descriptions
- [SPARK-46945] - Add `spark.kubernetes.legacy.useReadWriteOnceAccessMode` for old K8s clusters
- [SPARK-47063] - CAST long to timestamp has different behavior for codegen vs interpreted
- [SPARK-47068] - Recover -1 and 0 case for spark.sql.execution.arrow.maxRecordsPerBatch
- [SPARK-47072] - Wrong error message for incorrect ANSI intervals
- [SPARK-47085] - Performance issue on Thrift API
- [SPARK-47125] - Return null if Univocity never triggers parsing
- [SPARK-47146] - Possible thread leak when doing sort merge join
- [SPARK-47177] - Cached SQL plan does not display final AQE plan in explain string
- [SPARK-47196] - Fix `core` module to succeed SBT tests
- [SPARK-47236] - Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file
- [SPARK-47305] - PruneFilters incorrectly tags isStreaming flag when replacing child of Filter with LocalRelation
- [SPARK-47318] - AuthEngine key exchange needs additional KDF round
- [SPARK-47385] - Tuple encoder produces wrong results with Option inputs
- [SPARK-47434] - Streaming Statistics link redirect causing 302 error
- [SPARK-47455] - Fix Resource Handling of `scalaStyleOnCompileConfig` in SparkBuild.scala
- [SPARK-47503] - Spark history server fails to display query for cached JDBC relation named in quotes
- [SPARK-47521] - Use `Utils.tryWithResource` during reading shuffle data from external storage
- [SPARK-47646] - try_to_number fails with NPE for malformed input
- [SPARK-47676] - Clean up the removed `VersionsSuite` references
- [SPARK-47824] - Nondeterminism in pyspark.pandas.series.asof
- [SPARK-47844] - Upgrade ORC to 1.8.7
Improvement
- [SPARK-45587] - Skip UNIDOC and MIMA in build GitHub Action job
- [SPARK-46286] - Document spark.io.compression.zstd.bufferPool.enabled
- [SPARK-46425] - Pin the bundler version in CI
- [SPARK-47505] - Fix `pyspark-errors` test jobs for branch-3.4
- [SPARK-47734] - Fix flaky pyspark.sql.dataframe.DataFrame.writeStream doctest by stopping streaming query
Test
- [SPARK-45141] - Pin `pyarrow==12.0.1` in CI
- [SPARK-46801] - Do not treat exit 5 as a test failure in Python testing script
- [SPARK-47472] - Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile`
Task
- [SPARK-46182] - Shuffle data lost on decommissioned executor caused by race condition between lastTaskRunningTime and lastShuffleMigrationTime
- [SPARK-46628] - Use SPDX short identifier in `licenses` name
- [SPARK-47187] - Fix Hive compress output config not taking effect
- [SPARK-47432] - Add `pyarrow` upper bound requirement, `<13.0.0`
- [SPARK-47433] - Update PySpark package dependency version ranges
- [SPARK-47481] - Fix Python linter
Dependency upgrade
- [SPARK-44393] - Upgrade H2 from 2.1.214 to 2.2.220