Release Notes - ASF JIRA

Release Notes - Spark - Version 3.2.4 - HTML format

Sub-task

[SPARK-41388] - getReusablePVCs should ignore recently created PVCs in the previous batch
[SPARK-42071] - Register scala.math.Ordering$Reverse to KryoSerializer

Bug

[SPARK-38173] - Quoted column cannot be recognized correctly when quotedRegexColumnNames is true
[SPARK-39399] - proxy-user not working for Spark on k8s in cluster deploy mode
[SPARK-39596] - Run of `Linters, licenses, dependencies and documentation generation` GitHub Actions job failed
[SPARK-40817] - Remote spark.jars URIs ignored for Spark on Kubernetes in cluster mode
[SPARK-40819] - Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type instead of automatically converting to LongType
[SPARK-41162] - Anti-join must not be pushed below aggregation with ambiguous predicates
[SPARK-41254] - YarnAllocator.rpIdToYarnResource map is not properly updated
[SPARK-41376] - Executor netty direct memory check should respect spark.shuffle.io.preferDirectBufs
[SPARK-41554] - Decimal.changePrecision produces ArrayIndexOutOfBoundsException
[SPARK-41732] - Session window: analysis rule "SessionWindowing" does not apply tree-pattern based pruning
[SPARK-41952] - Upgrade Parquet to fix off-heap memory leaks in Zstd codec
[SPARK-41989] - PYARROW_IGNORE_TIMEZONE warning can break application logging setup
[SPARK-42090] - Introduce SASL retry count in RetryingBlockTransferor
[SPARK-42157] - `spark.scheduler.mode=FAIR` should provide FAIR scheduler
[SPARK-42168] - CoGroup with window function returns incorrect result when partition keys differ in order
[SPARK-42188] - Force SBT protobuf version to match Maven on branch 3.2 and 3.3
[SPARK-42201] - `build/sbt` should allow SBT_OPTS to override JVM memory setting
[SPARK-42259] - ResolveGroupingAnalytics should take care of Python UDAF
[SPARK-42462] - Prevent `docker-image-tool.sh` from publishing OCI manifests
[SPARK-42478] - Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory
[SPARK-42596] - [YARN] OMP_NUM_THREADS not set to number of executor cores by default
[SPARK-42649] - Remove the standard Apache License header from the top of third-party source files
[SPARK-42673] - Make build/mvn build Spark only with the verified Maven version
[SPARK-42697] - /api/v1/applications returns 0 for duration
[SPARK-42747] - Fix incorrect internal status of LoR and AFT
[SPARK-42785] - [K8S][Core] spark-submit without --deploy-mode fails with NPE on Kubernetes
[SPARK-42799] - Update SBT build `xercesImpl` version to match with pom.xml
[SPARK-42906] - Replace a starting digit with `x` in resource name prefix
[SPARK-42967] - Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is cancelled
[SPARK-43004] - vendor==vendor typo in ResourceRequest.equals()
[SPARK-43005] - `v is v >= 0` typo in pyspark/pandas/config.py
[SPARK-43069] - Use `sbt-eclipse` instead of `sbteclipse-plugin`
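
Two of the bug titles above describe behavior that is easy to illustrate in isolation. The following is a minimal Python sketch, not the code from the Spark patches: the function names are hypothetical and the regular expression is an assumption based only on the wording of SPARK-42906. The second part shows how Python reads the expression quoted in SPARK-43005: as a chained comparison, `v is v >= 0` means `(v is v) and (v >= 0)`, so the identity check is always true and the `v is` part is redundant.

```python
import re


def sanitize_resource_name_prefix(prefix: str) -> str:
    """Hypothetical helper in the spirit of SPARK-42906.

    Replaces a leading digit with 'x' and leaves every other
    character untouched.
    """
    return re.sub(r"^[0-9]", "x", prefix)


def is_non_negative_chained(v):
    # The expression quoted in SPARK-43005. Python treats it as a
    # chained comparison: (v is v) and (v >= 0).
    return v is v >= 0


def is_non_negative(v):
    # The plain comparison, without the redundant identity check.
    return v >= 0


if __name__ == "__main__":
    print(sanitize_resource_name_prefix("3d-render-job"))  # xd-render-job
    print(sanitize_resource_name_prefix("etl-job"))        # etl-job
    for value in (-2, 0, 7):
        # Both forms agree, because `v is v` is always True.
        print(value, is_non_negative_chained(value), is_non_negative(value))
```

Because both comparison forms return the same result for every input, the sketch only shows why the quoted expression reads as a typo; it says nothing about how the Spark code was actually changed.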

Improvement

[SPARK-41360] - Avoid BlockManager re-registration if the executor has been lost
[SPARK-42934] - Testing OrcEncryptionSuite using Maven is always skipped
[SPARK-43395] - Exclude macOS tar extended metadata in make-distribution.sh

Test

[SPARK-36883] - Upgrade R version to 4.1.1 in CI images
[SPARK-41863] - Skip `flake8` tests if the command is not available
[SPARK-41865] - Upgrade pycodestyle to 2.7.0 to fix pycodestyle errors

Task

[SPARK-41415] - SASL Request Retries

Dependency upgrade

[SPARK-41030] - Upgrade Apache Ivy to 2.5.1

Github Integration

[SPARK-38261] - Sync missing R packages with CI
