Release Notes - ASF JIRA

Release Notes - Spark - Version 3.4.2 - HTML format

Sub-task

[SPARK-42730] - Update Spark Standalone Mode - Starting a Cluster Manually
[SPARK-44641] - SPJ: Results duplicated when SPJ partial-cluster and pushdown enabled but conditions unmet
[SPARK-44729] - Add canonical links to the PySpark docs page
[SPARK-44857] - Fix getBaseURI error in Spark Worker LogPage UI buttons
[SPARK-45187] - Fix WorkerPage to use the same pattern for `logPage` urls
[SPARK-45652] - SPJ: Handle empty input partitions after dynamic filtering
[SPARK-45749] - Fix Spark History Server to sort `Duration` column properly
[SPARK-45961] - Document `spark.master.*` configurations
[SPARK-46012] - EventLogFileReader should not read rolling logs if appStatus is missing
[SPARK-46095] - Document REST API for Spark Standalone Cluster

Bug

[SPARK-40154] - PySpark: DataFrame.cache docstring gives wrong storage level
[SPARK-42784] - Fix the problem of incomplete creation of subdirectories in push merged localDir
[SPARK-43203] - Fix DROP table behavior in session catalog
[SPARK-43393] - Sequence expression can overflow
[SPARK-44074] - `Logging plan changes for execution` test failed
[SPARK-44079] - JSON reader crashes when a different schema is present
[SPARK-44134] - Can't set resources (GPU/FPGA) to 0 when they are set to a positive value in spark-defaults.conf
[SPARK-44158] - Remove unused `spark.kubernetes.executor.lostCheck.maxAttempts`
[SPARK-44180] - DistributionAndOrderingUtils should apply ResolveTimeZone
[SPARK-44184] - Remove a wrong doc about ARROW_PRE_0_15_IPC_FORMAT
[SPARK-44215] - Client receives zero number of chunks in merge meta response which doesn't trigger fallback to unmerged blocks
[SPARK-44241] - Setting io.connectionTimeout/connectionCreationTimeout to zero or a negative value causes incessant executor construction/destruction
[SPARK-44251] - Potential for incorrect results or NPE when full outer USING join has null key value
[SPARK-44313] - Generated column expression validation fails if there is a char/varchar column anywhere in the schema
[SPARK-44391] - `url_decode` can fail with an internal error
[SPARK-44464] - Fix applyInPandasWithStatePythonRunner to output rows that have Null as first column value
[SPARK-44494] - K8s-it test failed
[SPARK-44513] - Upgrade snappy-java to 1.1.10.3
[SPARK-44547] - BlockManagerDecommissioner throws exceptions when migrating RDD cached blocks to fallback storage
[SPARK-44581] - ShutdownHookManager gets wrong Hadoop user group information
[SPARK-44585] - Fix warning condition in MLlib RankingMetrics ndcgAt
[SPARK-44588] - Migrated shuffle blocks are encrypted multiple times when io.encryption is enabled
[SPARK-44630] - Revert SPARK-43043 Improve the performance of MapOutputTracker.updateMapOutput
[SPARK-44634] - Encoders.bean no longer supports nested beans with type arguments
[SPARK-44653] - non-trivial DataFrame unions should not break caching
[SPARK-44657] - Incorrect limit handling and config parsing in Arrow collect
[SPARK-44670] - Fix the `test_to_excel` tests for python3.7
[SPARK-44805] - Data lost after union using spark.sql.parquet.enableNestedColumnVectorizedReader=true
[SPARK-44813] - The JIRA Python library misses the assignee when it searches for the user again
[SPARK-44840] - array_insert() gives wrong results for negative index
[SPARK-44843] - flaky test: RocksDBStateStoreStreamingAggregationSuite
[SPARK-44846] - PushFoldableIntoBranches in complex grouping expressions may cause bindReference error
[SPARK-44854] - Python timedelta to DayTimeIntervalType edge cases bug
[SPARK-44871] - Fix PERCENTILE_DISC behaviour
[SPARK-44910] - Encoders.bean does not support superclasses with generic type arguments
[SPARK-44922] - Disable o.a.p.h.InternalParquetRecordWriter logs for tests to reduce the log volume
[SPARK-44925] - K8s default service token file should not be materialized into token
[SPARK-44935] - Fix `RELEASE` file to have the correct information in Docker images
[SPARK-44940] - Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled
[SPARK-44973] - Fix ArrayIndexOutOfBoundsException in conv()
[SPARK-44990] - CSV conversion performance severely degraded for null fields
[SPARK-45054] - HiveExternalCatalog.listPartitions should restore Spark SQL stats
[SPARK-45057] - Deadlock caused by rdd replication level of 2
[SPARK-45075] - Alter table with invalid default value will not report error
[SPARK-45078] - The ArrayInsert function should cast explicitly when the element type does not equal the derived component type
[SPARK-45079] - percentile_approx() fails with an internal error on NULL accuracy
[SPARK-45081] - Encoders.bean no longer works with read-only properties
[SPARK-45100] - reflect() fails with an internal error on NULL class and method
[SPARK-45103] - Update ORC to 1.8.5
[SPARK-45109] - Fix aes_decrypt and ln in Connect
[SPARK-45210] - Switch languages consistently across docs for all code snippets (Spark 3.4 and below)
[SPARK-45227] - Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an executor process randomly gets stuck
[SPARK-45237] - Correct the default value of `spark.history.store.hybridStore.diskBackend` in `monitoring.md`
[SPARK-45282] - Join loses records for cached datasets
[SPARK-45311] - Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"
[SPARK-45430] - FramelessOffsetWindowFunctionFrame fails when ignore nulls and offset > # of rows
[SPARK-45433] - CSV/JSON schema inference reports an error when timestamps do not match the specified timestampFormat and each partition has only one row
[SPARK-45473] - Incorrect error message for RoundBase
[SPARK-45508] - Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 9+
[SPARK-45592] - AQE and InMemoryTableScanExec correctness bug
[SPARK-45604] - Converting timestamp_ntz to array<timestamp_ntz> can cause NPE or SEGFAULT on parquet vectorized reader
[SPARK-45670] - SparkSubmit does not support --total-executor-cores when deploying on K8s
[SPARK-45678] - Cover BufferReleasingInputStream.available under tryOrFetchFailedException
[SPARK-45786] - Inaccurate Decimal multiplication and division results
[SPARK-45814] - ArrowConverters.createEmptyArrowBatch may cause memory leak
[SPARK-45847] - CliSuite flakiness due to non-sequential guarantee for stdout&stderr
[SPARK-45878] - ConcurrentModificationException in CliSuite
[SPARK-45884] - Upgrade ORC to 1.8.6
[SPARK-45896] - Expression encoding fails for Seq/Map of Option[Seq/Date/Timestamp/BigDecimal]
[SPARK-45920] - group by ordinal should be idempotent
[SPARK-45935] - Fix RST files link substitutions error
[SPARK-45963] - Restore documentation for DSv2 API
[SPARK-46006] - YarnAllocator fails to clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend calls stop
[SPARK-46016] - Fix pandas API support list properly
[SPARK-46019] - Fix HiveThriftServer2ListenerSuite and ThriftServerPageSuite to create java.io.tmpdir if it doesn't exist
[SPARK-46033] - Fix flaky ArithmeticExpressionSuite
[SPARK-46062] - CTE reference node does not inherit the flag `isStreaming` from CTE definition node
[SPARK-46064] - EliminateEventTimeWatermark does not consider the fact that isStreaming flag can change for current child during resolution

New Feature

[SPARK-45735] - Reenable CatalogTests without Spark Connect

Improvement

[SPARK-44206] - Dataset.selectExpr scope Session.active
[SPARK-44415] - Upgrade snappy-java to 1.1.10.2
[SPARK-44875] - commentor to commenter in merge script
[SPARK-44920] - Use await() instead of awaitUninterruptibly() in TransportClientFactory.createClient()
[SPARK-44929] - Standardize log output for console appender in tests
[SPARK-45071] - Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data
[SPARK-45127] - Exclude README.md from document build
[SPARK-45286] - Add back Matomo analytics to release docs
[SPARK-45588] - Minor scaladoc improvement in StreamingForeachBatchHelper
[SPARK-45640] - Fix flaky ProtobufCatalystDataConversionSuite
[SPARK-45751] - The default value of `spark.executor.logs.rolling.maxRetainedFiles` on the official website is incorrect
[SPARK-45829] - The default value of `spark.executor.logs.rolling.maxSize` on the official website is incorrect
[SPARK-45882] - BroadcastHashJoinExec propagate partitioning should respect CoalescedHashPartitioning

Test

[SPARK-44544] - Deduplicate run_python_packaging_tests
[SPARK-44553] - Ignoring `connect-check-protos` logic in GA testing
[SPARK-44661] - getMapOutputLocation should not throw NPE
[SPARK-45568] - WholeStageCodegenSparkSubmitSuite flakiness

Task

[SPARK-44557] - Flaky PIP packaging test

Documentation

[SPARK-44725] - Document spark.network.timeoutInterval
[SPARK-44745] - Document shuffle data recovery from the remounted K8s PVCs
[SPARK-44859] - Fix incorrect property name `asyncProgressCheckpointingInterval` in structured streaming doc
