Release Notes - Spark - Version 3.3.4

Sub-task

  • [SPARK-44857] - Fix getBaseURI error in Spark Worker LogPage UI buttons
  • [SPARK-45187] - Fix WorkerPage to use the same pattern for `logPage` urls
  • [SPARK-45749] - Fix Spark History Server to sort `Duration` column properly
  • [SPARK-46012] - EventLogFileReader should not read rolling logs if appStatus is missing
  • [SPARK-46095] - Document REST API for Spark Standalone Cluster

Bug

  • [SPARK-43327] - Trigger `committer.setupJob` before plan execute in `FileFormatWriter`
  • [SPARK-43393] - Sequence expression can overflow
  • [SPARK-44074] - `Logging plan changes for execution` test failed
  • [SPARK-44547] - BlockManagerDecommissioner throws exceptions when migrating RDD cached blocks to fallback storage
  • [SPARK-44581] - ShutdownHookManager gets wrong Hadoop user group information
  • [SPARK-44805] - Data lost after union using spark.sql.parquet.enableNestedColumnVectorizedReader=true
  • [SPARK-44813] - The JIRA Python misses our assignee when it searches user again
  • [SPARK-44843] - flaky test: RocksDBStateStoreStreamingAggregationSuite
  • [SPARK-44871] - Fix PERCENTILE_DISC behaviour
  • [SPARK-44925] - K8s default service token file should not be materialized into token
  • [SPARK-44935] - Fix `RELEASE` file to have the correct information in Docker images
  • [SPARK-44973] - Fix ArrayIndexOutOfBoundsException in conv()
  • [SPARK-44990] - CSV conversion performance severely degraded for null fields
  • [SPARK-45057] - Deadlock caused by rdd replication level of 2
  • [SPARK-45079] - percentile_approx() fails with an internal error on NULL accuracy
  • [SPARK-45100] - reflect() fails with an internal error on NULL class and method
  • [SPARK-45210] - Switch languages consistently across docs for all code snippets (Spark 3.4 and below)
  • [SPARK-45227] - Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an executor process randomly gets stuck
  • [SPARK-45430] - FramelessOffsetWindowFunctionFrame fails when ignore nulls and offset > # of rows
  • [SPARK-45508] - Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 9+
  • [SPARK-45580] - Subquery changes the output schema of the outer query
  • [SPARK-45670] - SparkSubmit does not support --total-executor-cores when deploying on K8s
  • [SPARK-45885] - Upgrade ORC to 1.7.10
  • [SPARK-45920] - group by ordinal should be idempotent
  • [SPARK-45935] - Fix RST files link substitutions error
  • [SPARK-46006] - YarnAllocator misses cleaning targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend calls stop
  • [SPARK-46019] - Fix HiveThriftServer2ListenerSuite and ThriftServerPageSuite to create java.io.tmpdir if it doesn't exist
  • [SPARK-46092] - Overflow in Parquet row group filter creation causes incorrect results
  • [SPARK-46239] - Hide Jetty info
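
Several of the fixes above involve runtime configuration. For example, the JVM option from SPARK-45508 can be supplied through Spark's standard `extraJavaOptions` properties; the sketch below shows one way to do so in `spark-defaults.conf`, assuming a Java 9+ runtime (exact placement may differ for your deployment).

```properties
# Illustrative spark-defaults.conf entries passing the --add-opens flag
# from SPARK-45508 to both driver and executor JVMs.
spark.driver.extraJavaOptions   --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
spark.executor.extraJavaOptions --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
```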

Improvement

  • [SPARK-44920] - Use await() instead of awaitUninterruptibly() in TransportClientFactory.createClient()
  • [SPARK-45127] - Exclude README.md from document build
  • [SPARK-45286] - Add back Matomo analytics to release docs
  • [SPARK-45751] - The default value of `spark.executor.logs.rolling.maxRetainedFiles` on the official website is incorrect
  • [SPARK-45829] - The default value of `spark.executor.logs.rolling.maxSize` on the official website is incorrect
  • [SPARK-46286] - Document spark.io.compression.zstd.bufferPool.enabled
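
The settings referenced above are ordinary Spark configuration properties. A minimal sketch of how they might appear in `spark-defaults.conf` follows; the values shown are illustrative examples only, not the corrected defaults (see the linked JIRA issues and the official configuration docs for those).

```properties
# Example (not default) values for the properties touched in this section.
spark.io.compression.zstd.bufferPool.enabled  true
spark.executor.logs.rolling.maxRetainedFiles  10
spark.executor.logs.rolling.maxSize           1048576
```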

Test

  • [SPARK-45568] - WholeStageCodegenSparkSubmitSuite flakiness

Documentation

  • [SPARK-44725] - Document spark.network.timeoutInterval
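
The newly documented property above can be set like any other network timeout knob; a hedged example in `spark-defaults.conf`, with an assumed illustrative value rather than the documented default:

```properties
# Example only: interval at which the driver checks for network timeouts.
spark.network.timeoutInterval  60s
```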
