Sub-task
- [SPARK-38697] - Extend SparkSessionExtensions to inject rules into AQE Optimizer
- [SPARK-40872] - Fallback to original shuffle block when a push-merged shuffle chunk is zero-size
- [SPARK-41185] - Remove ARM limitation for YuniKorn from docs
- [SPARK-41388] - getReusablePVCs should ignore recently created PVCs in the previous batch
- [SPARK-42071] - Register scala.math.Ordering$Reverse to KryoSerializer
Bug
- [SPARK-32380] - Spark SQL cannot access Hive table when data is in HBase
- [SPARK-39404] - Unable to query _metadata in streaming if getBatch returns multiple logical nodes in the DataFrame
- [SPARK-40493] - Revert "[SPARK-33861][SQL] Simplify conditional in predicate"
- [SPARK-40588] - Sorting issue with partitioned-writing and AQE turned on
- [SPARK-40817] - Remote spark.jars URIs ignored for Spark on Kubernetes in cluster mode
- [SPARK-40819] - Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type instead of automatically converting to LongType
- [SPARK-40829] - STORED AS serde in CREATE TABLE LIKE view does not work
- [SPARK-40851] - TimestampFormatter behavior changed when using the latest Java 8/11/17
- [SPARK-40869] - KubernetesConf.getResourceNamePrefix creates invalid name prefixes
- [SPARK-40874] - Fix broadcasts in Python UDFs when encryption is enabled
- [SPARK-40902] - Quick submission of drivers in tests to Mesos scheduler results in dropping drivers
- [SPARK-40918] - Mismatch between ParquetFileFormat and FileSourceScanExec in # columns for WSCG.isTooManyFields when using _metadata
- [SPARK-40924] - Unhex function works incorrectly when input has uneven number of symbols
- [SPARK-40932] - Barrier: messages for allGather will be overridden by the following barrier APIs
- [SPARK-40963] - ExtractGenerator sets incorrect nullability in new Project
- [SPARK-40987] - Avoid creating a directory when deleting a block, causing DAGScheduler to not work
- [SPARK-41035] - Incorrect results or NPE when a literal is reused across distinct aggregations
- [SPARK-41118] - to_number/try_to_number throws NullPointerException when format is null
- [SPARK-41144] - UnresolvedHint should not cause query failure
- [SPARK-41151] - Keep built-in file _metadata column nullable value consistent
- [SPARK-41154] - Incorrect relation caching for queries with time travel spec
- [SPARK-41162] - Anti-join must not be pushed below aggregation with ambiguous predicates
- [SPARK-41187] - [Core] LiveExecutor memory leak in AppStatusListener when ExecutorLost happens
- [SPARK-41188] - Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes
- [SPARK-41202] - Update ORC to 1.7.7
- [SPARK-41254] - YarnAllocator.rpIdToYarnResource map is not properly updated
- [SPARK-41327] - Fix SparkStatusTracker.getExecutorInfos by switching On/OffHeapStorageMemory info
- [SPARK-41339] - RocksDB state store WriteBatch doesn't clean up native memory
- [SPARK-41350] - allow simple name access of using join hidden columns after subquery alias
- [SPARK-41365] - Stages UI page fails to load for proxy in some yarn versions
- [SPARK-41375] - Avoid empty latest KafkaSourceOffset
- [SPARK-41376] - Executor netty direct memory check should respect spark.shuffle.io.preferDirectBufs
- [SPARK-41379] - Inconsistency of spark session in DataFrame in user function for foreachBatch sink in PySpark
- [SPARK-41385] - Replace deprecated `.newInstance()` in K8s module
- [SPARK-41395] - InterpretedMutableProjection can corrupt unsafe buffer when used with decimal data
- [SPARK-41448] - Make consistent MR job IDs in FileBatchWriter and FileFormatWriter
- [SPARK-41458] - Correctly transform the SPI services for Yarn Shuffle Service
- [SPARK-41468] - Fix PlanExpression handling in EquivalentExpressions
- [SPARK-41522] - GA dependencies test failed
- [SPARK-41535] - InterpretedUnsafeProjection and InterpretedMutableProjection can corrupt unsafe buffer when used with calendar interval data
- [SPARK-41554] - Decimal.changePrecision produces ArrayIndexOutOfBoundsException
- [SPARK-41668] - DECODE function returns wrong results when passed NULL
- [SPARK-41732] - Session window: analysis rule "SessionWindowing" does not apply tree-pattern based pruning
- [SPARK-41989] - PYARROW_IGNORE_TIMEZONE warning can break application logging setup
- [SPARK-42084] - Avoid leaking the qualified-access-only restriction
- [SPARK-42090] - Introduce sasl retry count in RetryingBlockTransferor
- [SPARK-42134] - Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes
- [SPARK-42157] - `spark.scheduler.mode=FAIR` should provide FAIR scheduler
- [SPARK-42176] - Cast boolean to timestamp fails with ClassCastException
- [SPARK-42179] - Upgrade ORC to 1.7.8
- [SPARK-42188] - Force SBT protobuf version to match Maven on branch 3.2 and 3.3
- [SPARK-42201] - `build/sbt` should allow SBT_OPTS to override JVM memory setting
- [SPARK-42222] - Spark 3.3 Backport: SPARK-41344 Reading V2 datasource masks underlying error
- [SPARK-42259] - ResolveGroupingAnalytics should take care of Python UDAF
- [SPARK-42344] - The default size of the CONFIG_MAP_MAXSIZE should not be greater than 1048576
- [SPARK-42346] - distinct(count colname) with UNION ALL causes query analyzer bug
- [SPARK-42747] - Fix incorrect internal status of LoR and AFT
New Feature
- [SPARK-47717] - Support Hive tables as a streaming source and sink
Improvement
- [SPARK-38277] - Clear write batch after RocksDB state store's commit
- [SPARK-40886] - Bump Jackson Databind 2.13.4.2
- [SPARK-40913] - Pin `pytest==7.1.3`
- [SPARK-41031] - Upgrade `org.tukaani:xz` to 1.9
- [SPARK-41089] - Relocate Netty native arm64 libs
- [SPARK-41360] - Avoid BlockManager re-registration if the executor has been lost
- [SPARK-41476] - Prevent `README.md` from triggering CIs
- [SPARK-41541] - Fix wrong child call in SQLShuffleWriteMetricsReporter.decRecordsWritten()
- [SPARK-41962] - Update the import order of scala package in class SpecificParquetRecordReaderBase
- [SPARK-42230] - Improve `lint` job by skipping PySpark and SparkR docs if unchanged
Test
- [SPARK-41863] - Skip `flake8` tests if the command is not available
- [SPARK-41864] - Fix mypy linter errors
- [SPARK-42110] - Reduce the number of repetitions in ParquetDeltaEncodingSuite.`random data test`
Task
- [SPARK-41415] - SASL Request Retries
- [SPARK-41538] - Metadata column should be appended at the end of project list
Dependency upgrade
- [SPARK-40801] - Upgrade Apache Commons Text to 1.10
- [SPARK-41030] - Upgrade Apache Ivy to 2.5.1
- [SPARK-41686] - Upgrade Apache Ivy to 2.5.1
Question
- [SPARK-42977] - Spark SQL: disabling vectorized failed
Documentation
- [SPARK-40983] - Remove Hadoop requirements for zstd mention in Parquet compression codec