Release Notes - ASF JIRA

Release Notes - Spark - Version 3.2.2 - HTML format

Sub-task

[SPARK-37675] - Prevent overwriting of push shuffle merged files once the shuffle is finalized
[SPARK-37735] - Add appId interface to KubernetesConf
[SPARK-37866] - Set file.encoding to UTF-8 for SBT tests
[SPARK-37995] - TPCDS 1TB q72 fails when spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly is false
[SPARK-37998] - Use `rbac.authorization.k8s.io/v1` instead of `v1beta1`
[SPARK-38013] - AQE can change bhj to smj if no extra shuffle is introduced
[SPARK-38019] - ExecutorMonitor.timedOutExecutors should be deterministic
[SPARK-38023] - ExecutorMonitor.onExecutorRemoved should handle ExecutorDecommission as finished
[SPARK-38029] - Support docker-desktop K8S integration test in SBT
[SPARK-38030] - Query with cast containing non-nullable columns fails with AQE on Spark 3.1.1
[SPARK-38048] - Add IntegrationTestBackend.describePods to support all K8s test backends
[SPARK-38071] - Support K8s namespace parameter in SBT K8s IT
[SPARK-38072] - Support K8s imageTag parameter in SBT K8s IT
[SPARK-38081] - Support cloud-backend in K8s IT with SBT
[SPARK-38180] - Allow safe up-cast expressions in correlated equality predicates
[SPARK-38272] - Use docker-desktop instead of docker-for-desktop for Docker K8S IT deployMode and context name
[SPARK-38325] - ANSI mode: avoid potential runtime error in HashJoin.extractKeyExprAt()
[SPARK-38363] - Avoid runtime error in Dataset.summary() when ANSI mode is on
[SPARK-38392] - Add `spark-` prefix to namespaces and `-driver` suffix to drivers during IT
[SPARK-38398] - Add `priorityClassName` integration test case
[SPARK-38407] - ANSI Cast: loosen the limitation of casting non-null complex types
[SPARK-38430] - Add SBT commands to K8s IT readme
[SPARK-38538] - Fix driver environment verification in BasicDriverFeatureStepSuite
[SPARK-38787] - Possible correctness issue on stream-stream join when handling edge case
[SPARK-38809] - Implement option to skip null values in symmetric hash impl of stream-stream joins
[SPARK-39553] - Failed to remove shuffle ${shuffleId} - null when using Scala 2.13
[SPARK-39611] - PySpark support numpy 1.23.X

Bug

[SPARK-30062] - Add IMMEDIATE statement to the DB2 dialect truncate implementation
[SPARK-33206] - Spark Shuffle Index Cache calculates memory usage wrong
[SPARK-36553] - KMeans fails with NegativeArraySizeException for K = 50000 after issue #27758 was introduced
[SPARK-37290] - Exponential planning time in case of non-deterministic function
[SPARK-37498] - test_reuse_worker_of_parallelize_range is flaky
[SPARK-37544] - sequence over dates with month interval is producing incorrect results
[SPARK-37554] - Add PyArrow, pandas and plotly to release Docker image dependencies
[SPARK-37643] - when charVarcharAsString is true, char datatype partition table query incorrect
[SPARK-37690] - Recursive view `df` detected (cycle: `df` -> `df`)
[SPARK-37730] - plot.hist throws AttributeError on pandas=1.3.5
[SPARK-37793] - Invalid LocalMergedBlockData cause task hang
[SPARK-37865] - Spark should not dedup the groupingExpressions when the first child of Union has duplicate columns
[SPARK-37932] - Analyzer can fail when join left side and right side are the same view
[SPARK-37963] - Need to update Partition URI after renaming table in InMemoryCatalog
[SPARK-37977] - Upgrade ORC to 1.6.13
[SPARK-38017] - Fix the API doc for window to say it supports TimestampNTZType too as timeColumn
[SPARK-38018] - Fix ColumnVectorUtils.populate to handle CalendarIntervalType correctly
[SPARK-38042] - Encoder cannot be found when a tuple component is a type alias for an Array
[SPARK-38056] - Structured streaming not working in history server when using LevelDB
[SPARK-38073] - Update atexit function to avoid issues with late binding
[SPARK-38075] - Hive script transform with order by and limit will return fake rows
[SPARK-38120] - HiveExternalCatalog.listPartitions is failing when partition column name is upper case and dot in partition value
[SPARK-38151] - Handle `Pacific/Kanton` in DateTimeUtilsSuite
[SPARK-38178] - Correct the logic to measure the memory usage of RocksDB
[SPARK-38185] - Fix data incorrect if aggregate function is empty
[SPARK-38198] - Fix `QueryExecution.debug#toFile` use the passed in `maxFields` when `explainMode` is `CodegenMode`
[SPARK-38204] - All state operators are at a risk of inconsistency between state partitioning and operator partitioning
[SPARK-38221] - Group by a stream of complex expressions fails
[SPARK-38236] - Absolute file paths specified in create/alter table are treated as relative
[SPARK-38271] - PoissonSampler may output more rows than MaxRows
[SPARK-38273] - decodeUnsafeRows's iterators should close underlying input streams
[SPARK-38285] - ClassCastException: GenericArrayData cannot be cast to InternalRow
[SPARK-38286] - Union's maxRows and maxRowsPerPartition may overflow
[SPARK-38304] - Elt() should return null if index is null under ANSI mode
[SPARK-38309] - SHS has incorrect percentiles for shuffle read bytes and shuffle total blocks metrics
[SPARK-38320] - (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch
[SPARK-38333] - DPP cause DataSourceScanExec java.lang.NullPointerException
[SPARK-38347] - Nullability propagation in transformUpWithNewOutput
[SPARK-38379] - Fix Kubernetes Client mode when mounting persistent volume with storage class
[SPARK-38411] - Use UTF-8 when doMergeApplicationListingInternal reads event logs
[SPARK-38412] - `from` and `to` is swapped in the StateSchemaCompatibilityChecker
[SPARK-38416] - Change day to month
[SPARK-38436] - Fix `test_ceil` to test `ceil`
[SPARK-38446] - Deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j
[SPARK-38517] - Fix PySpark documentation generation (missing ipython_genutils)
[SPARK-38528] - NullPointerException when selecting a generator in a Stream of aggregate expressions
[SPARK-38542] - UnsafeHashedRelation should serialize numKeys out
[SPARK-38563] - Upgrade to Py4J 0.10.9.5
[SPARK-38579] - Requesting Restful API can cause NullPointerException
[SPARK-38587] - Validating new location for rename command should use formatted names
[SPARK-38614] - Don't push down limit through window that's using percent_rank
[SPARK-38631] - Arbitrary shell command injection via Utils.unpack()
[SPARK-38652] - uploadFileUri should preserve file scheme
[SPARK-38655] - OffsetWindowFunctionFrameBase cannot find the offset row whose input is not null
[SPARK-38677] - pyspark hangs in local mode running rdd map operation
[SPARK-38684] - Stream-stream outer join has a possible correctness issue due to weakly read consistent on outer iterators
[SPARK-38807] - Error when starting spark shell on Windows system
[SPARK-38830] - Warn on corrupted block messages
[SPARK-38868] - `assert_true` fails unconditionally after `left_outer` joins
[SPARK-38905] - Upgrade ORC to 1.6.14
[SPARK-38922] - TaskLocation.apply throw NullPointerException
[SPARK-38931] - RocksDB File manager would not create initial dfs directory with unknown number of keys on 1st empty checkpoint
[SPARK-38955] - Disable lineSep option in 'from_csv' and 'schema_of_csv'
[SPARK-38977] - Fix schema pruning with correlated subqueries
[SPARK-38990] - date_trunc and trunc both fail with format from column in inline table
[SPARK-38992] - Avoid using bash -c in ShellBasedGroupsMappingProvider
[SPARK-39060] - Typo in error messages of decimal overflow
[SPARK-39061] - Incorrect results or NPE when using Inline function against an array of dynamically created structs
[SPARK-39083] - Fix FsHistoryProvider race condition between update and clean app data
[SPARK-39084] - df.rdd.isEmpty() results in unexpected executor failure and JVM crash
[SPARK-39104] - Null Pointer Exception on unpersist call
[SPARK-39107] - Silent change in regexp_replace's handling of empty strings
[SPARK-39258] - Fix `Hide credentials in show create table` after SPARK-35378
[SPARK-39259] - Timestamps returned by now() and equivalent functions are not consistent in subqueries
[SPARK-39283] - Spark tasks stuck forever due to deadlock between TaskMemoryManager and UnsafeExternalSorter
[SPARK-39293] - The accumulator of ArrayAggregate should copy the intermediate result if string, struct, array, or map
[SPARK-39340] - DS v2 agg pushdown should allow dots in the name of top-level columns
[SPARK-39355] - Single column uses quoted to construct UnresolvedAttribute
[SPARK-39376] - Do not output duplicated columns in star expansion of subquery alias of NATURAL/USING JOIN
[SPARK-39393] - Parquet data source only supports push-down predicate filters for non-repeated primitive types
[SPARK-39419] - When the comparator of ArraySort returns null, it should fail.
[SPARK-39421] - Sphinx build fails with "node class 'meta' is already registered, its visitors will be overridden"
[SPARK-39422] - SHOW CREATE TABLE should suggest 'AS SERDE' for Hive tables with unsupported serde configurations
[SPARK-39437] - normalize plan id separately in PlanStabilitySuite
[SPARK-39447] - Only non-broadcast query stage can propagate empty relation
[SPARK-39476] - Disable Unwrap cast optimize when casting from Long to Float/Double or from Integer to Float
[SPARK-39496] - Inline eval path cannot handle null structs
[SPARK-39505] - Escape log content rendered in UI
[SPARK-39543] - The option of DataFrameWriterV2 should be passed to storage properties if fallback to v1
[SPARK-39548] - CreateView Command with a window clause query hit a wrong window definition not found issue
[SPARK-39551] - Add AQE invalid plan check
[SPARK-39570] - inline table should allow expressions with alias
[SPARK-39575] - ByteBuffer forget to rewind after get in AvroDeserializer
[SPARK-39621] - Make run-tests.py robust by avoiding `rmtree` usage
[SPARK-39650] - Streaming Deduplication should not check the schema of "value"
[SPARK-39672] - NotExists subquery failed with conflicting attributes
[SPARK-39758] - NPE on invalid patterns from the regexp functions
[SPARK-40804] - Missing handling a catalog name in destination tables in `RenameTableExec`
[SPARK-41336] - BroadcastExchange does not support the execute() code path when AQE is enabled

Improvement

[SPARK-36808] - Upgrade Kafka to 2.8.1
[SPARK-37670] - Support predicate pushdown and column pruning for de-duped CTEs
[SPARK-37891] - Add scalastyle check to disable scala.concurrent.ExecutionContext.Implicits.global
[SPARK-37934] - Upgrade Jetty version to 9.4.44
[SPARK-38007] - Update K8s doc to recommend K8s 1.20+
[SPARK-38046] - Fix KafkaSource/KafkaMicroBatch flaky test due to non-deterministic timing
[SPARK-38100] - Remove unused method in `Decimal`
[SPARK-38184] - Fix malformatted ExpressionDescription of `decode`
[SPARK-38211] - Add SQL migration guide on restoring loose upcast from string
[SPARK-38279] - Pin markupsafe to 2.0.1 fix linter failure
[SPARK-38305] - Check existence of file before untarring/zipping
[SPARK-38353] - Instrument __enter__ and __exit__ magic methods for pandas API on Spark
[SPARK-38487] - Fix docstrings of nlargest/nsmallest of DataFrame
[SPARK-38570] - Incorrect DynamicPartitionPruning caused by Literal
[SPARK-38816] - Wrong comment in random matrix generator in spark-als algorithm
[SPARK-38892] - Fix the UT of schema equal assert
[SPARK-38936] - Script transform feed thread should have name
[SPARK-39030] - Rename sum to avoid shading the builtin Python function
[SPARK-39154] - Remove outdated statements on distributed-sequence default index
[SPARK-39174] - Catalogs loading swallows missing classname for ClassNotFoundException
[SPARK-39240] - Source and binary releases using different tool to generates hashes for integrity

Test

[SPARK-38045] - More strict validation on plan check for stream-stream join unit test
[SPARK-38080] - Flaky test: StreamingQueryManagerSuite: 'awaitAnyTermination with timeout and resetTerminated'
[SPARK-38084] - Support `SKIP_PYTHON` and `SKIP_R` in `run-tests.py`
[SPARK-38142] - Move ArrowColumnVectorSuite to org.apache.spark.sql.vectorized
[SPARK-38297] - Fix mypy failure on DataFrame.to_numpy in pandas API on Spark
[SPARK-38786] - Test Bug in StatisticsSuite "change stats after add/drop partition command"
[SPARK-38927] - Skip NumPy/Pandas tests in `test_rdd.py` if not available
[SPARK-39252] - Flaky Test: pyspark.sql.tests.test_dataframe.DataFrameTests test_df_is_empty
[SPARK-39273] - Make PandasOnSparkTestCase inherit ReusedSQLTestCase
[SPARK-39373] - Recover branch-3.2 build broken by SPARK-39273 and SPARK-39252

Task

[SPARK-38122] - Update App Key of DocSearch
[SPARK-38144] - Remove unused `spark.storage.safetyFraction` config
[SPARK-38189] - Add priority scheduling doc for Spark on K8S
[SPARK-38318] - regression when replacing a dataset view
[SPARK-39367] - Review and fix issues in Scala/Java API docs of SQL module

Dependency upgrade

[SPARK-38303] - Upgrade ansi-regex from 5.0.0 to 5.0.1 in /dev
[SPARK-39099] - Add dependencies to Dockerfile for building Spark releases
[SPARK-39183] - Upgrade Apache Xerces Java to 2.12.2

Documentation

[SPARK-38606] - Update document to make a good guide of multiple versions of the Spark Shuffle Service
[SPARK-38629] - Two links beneath Spark SQL Guide/Data Sources do not work properly
[SPARK-39032] - Incorrectly formatted examples in pyspark.sql.functions.when
[SPARK-39219] - Promote Structured Streaming over Spark Streaming
[SPARK-39677] - Wrong args item formatting of the regexp functions

Edit/Copy Release Notes

The text area below allows the project release notes to be edited and copied to another document.

Release Notes - Spark - Version 3.2.2
    
<h2>        Sub-task
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37675'>SPARK-37675</a>] -         Prevent overwriting of push shuffle merged files once the shuffle is finalized
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37735'>SPARK-37735</a>] -         Add appId interface to KubernetesConf
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37866'>SPARK-37866</a>] -         Set file.encoding to UTF-8 for SBT tests
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37995'>SPARK-37995</a>] -         TPCDS 1TB q72 fails when spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly is false
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37998'>SPARK-37998</a>] -         Use `rbac.authorization.k8s.io/v1` instead of `v1beta1`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38013'>SPARK-38013</a>] -         AQE can change bhj to smj if no extra shuffle is introduced
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38019'>SPARK-38019</a>] -         ExecutorMonitor.timedOutExecutors should be deterministic
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38023'>SPARK-38023</a>] -         ExecutorMonitor.onExecutorRemoved should handle ExecutorDecommission as finished
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38029'>SPARK-38029</a>] -         Support docker-desktop K8S integration test in SBT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38030'>SPARK-38030</a>] -         Query with cast containing non-nullable columns fails with AQE on Spark 3.1.1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38048'>SPARK-38048</a>] -         Add IntegrationTestBackend.describePods to support all K8s test backends
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38071'>SPARK-38071</a>] -         Support K8s namespace parameter in SBT K8s IT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38072'>SPARK-38072</a>] -         Support K8s imageTag parameter in SBT K8s IT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38081'>SPARK-38081</a>] -         Support cloud-backend in K8s IT with SBT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38180'>SPARK-38180</a>] -         Allow safe up-cast expressions in correlated equality predicates
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38272'>SPARK-38272</a>] -         Use docker-desktop instead of docker-for-desktop for Docker K8S IT deployMode and context name 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38325'>SPARK-38325</a>] -         ANSI mode: avoid potential runtime error in HashJoin.extractKeyExprAt() 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38363'>SPARK-38363</a>] -         Avoid runtime error in Dataset.summary() when ANSI mode is on
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38392'>SPARK-38392</a>] -         Add `spark-` prefix to namespaces and `-driver` suffix to drivers during IT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38398'>SPARK-38398</a>] -         Add `priorityClassName` integration test case
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38407'>SPARK-38407</a>] -         ANSI Cast: loosen the limitation of casting non-null complex types
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38430'>SPARK-38430</a>] -         Add SBT commands to K8s IT readme
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38538'>SPARK-38538</a>] -         Fix driver environment verification in BasicDriverFeatureStepSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38787'>SPARK-38787</a>] -         Possible correctness issue on stream-stream join when handling edge case
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38809'>SPARK-38809</a>] -         Implement option to skip null values in symmetric hash impl of stream-stream joins
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39553'>SPARK-39553</a>] -         Failed to remove shuffle ${shuffleId} - null when using Scala 2.13
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39611'>SPARK-39611</a>] -         PySpark support numpy 1.23.X
</li>
</ul>
            
<h2>        Bug
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-30062'>SPARK-30062</a>] -         Add IMMEDIATE statement to the DB2 dialect truncate implementation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-33206'>SPARK-33206</a>] -         Spark Shuffle Index Cache calculates memory usage wrong
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-36553'>SPARK-36553</a>] -         KMeans fails with NegativeArraySizeException for K = 50000 after issue #27758 was introduced
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37290'>SPARK-37290</a>] -         Exponential planning time in case of non-deterministic function
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37498'>SPARK-37498</a>] -          test_reuse_worker_of_parallelize_range is flaky
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37544'>SPARK-37544</a>] -         sequence over dates with month interval is producing incorrect results
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37554'>SPARK-37554</a>] -         Add PyArrow, pandas and plotly to release Docker image dependencies
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37643'>SPARK-37643</a>] -         when charVarcharAsString is true, char datatype partition table query incorrect
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37690'>SPARK-37690</a>] -         Recursive view `df` detected (cycle: `df` -&gt; `df`)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37730'>SPARK-37730</a>] -         plot.hist throws AttributeError on pandas=1.3.5
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37793'>SPARK-37793</a>] -         Invalid LocalMergedBlockData cause task hang
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37865'>SPARK-37865</a>] -         Spark should not dedup the groupingExpressions when the first child of Union has duplicate columns
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37932'>SPARK-37932</a>] -         Analyzer can fail when join left side and right side are the same view
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37963'>SPARK-37963</a>] -         Need to update Partition URI after renaming table in InMemoryCatalog
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37977'>SPARK-37977</a>] -         Upgrade ORC to 1.6.13
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38017'>SPARK-38017</a>] -         Fix the API doc for window to say it supports TimestampNTZType too as timeColumn
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38018'>SPARK-38018</a>] -         Fix ColumnVectorUtils.populate to handle CalendarIntervalType correctly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38042'>SPARK-38042</a>] -         Encoder cannot be found when a tuple component is a type alias for an Array
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38056'>SPARK-38056</a>] -         Structured streaming not working in history server when using LevelDB
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38073'>SPARK-38073</a>] -         Update atexit function to avoid issues with late binding
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38075'>SPARK-38075</a>] -         Hive script transform with order by and limit will return fake rows
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38120'>SPARK-38120</a>] -         HiveExternalCatalog.listPartitions is failing when partition column name is upper case and dot in partition value
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38151'>SPARK-38151</a>] -         Handle `Pacific/Kanton` in DateTimeUtilsSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38178'>SPARK-38178</a>] -         Correct the logic to measure the memory usage of RocksDB
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38185'>SPARK-38185</a>] -         Fix data incorrect if aggregate function is empty
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38198'>SPARK-38198</a>] -         Fix `QueryExecution.debug#toFile` use the passed in `maxFields` when `explainMode` is `CodegenMode`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38204'>SPARK-38204</a>] -         All state operators are at a risk of inconsistency between state partitioning and operator partitioning
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38221'>SPARK-38221</a>] -         Group by a stream of complex expressions fails
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38236'>SPARK-38236</a>] -         Absolute file paths specified in create/alter table are treated as relative
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38271'>SPARK-38271</a>] -         PoissonSampler may output more rows than MaxRows
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38273'>SPARK-38273</a>] -         decodeUnsafeRows&#39;s iterators should close underlying input streams
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38285'>SPARK-38285</a>] -         ClassCastException: GenericArrayData cannot be cast to InternalRow
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38286'>SPARK-38286</a>] -         Union&#39;s maxRows and maxRowsPerPartition may overflow
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38304'>SPARK-38304</a>] -         Elt() should return null if index is null under ANSI mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38309'>SPARK-38309</a>] -         SHS has incorrect percentiles for shuffle read bytes and shuffle total blocks metrics
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38320'>SPARK-38320</a>] -         (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38333'>SPARK-38333</a>] -         DPP cause DataSourceScanExec java.lang.NullPointerException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38347'>SPARK-38347</a>] -         Nullability propagation in transformUpWithNewOutput
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38379'>SPARK-38379</a>] -         Fix Kubernetes Client mode when mounting persistent volume with storage class
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38411'>SPARK-38411</a>] -         Use UTF-8 when doMergeApplicationListingInternal reads event logs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38412'>SPARK-38412</a>] -         `from` and `to` is swapped in the StateSchemaCompatibilityChecker
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38416'>SPARK-38416</a>] -         Change day to month 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38436'>SPARK-38436</a>] -         Fix `test_ceil` to test `ceil`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38446'>SPARK-38446</a>] -         Deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38517'>SPARK-38517</a>] -         Fix PySpark documentation generation (missing ipython_genutils)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38528'>SPARK-38528</a>] -         NullPointerException when selecting a generator in a Stream of aggregate expressions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38542'>SPARK-38542</a>] -         UnsafeHashedRelation should serialize numKeys out
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38563'>SPARK-38563</a>] -         Upgrade to Py4J 0.10.9.5
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38579'>SPARK-38579</a>] -         Requesting Restful API can cause NullPointerException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38587'>SPARK-38587</a>] -         Validating new location for rename command should use formatted names
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38614'>SPARK-38614</a>] -         Don&#39;t push down limit through window that&#39;s using percent_rank
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38631'>SPARK-38631</a>] -         Arbitrary shell command injection via Utils.unpack()
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38652'>SPARK-38652</a>] -         uploadFileUri should preserve file scheme
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38655'>SPARK-38655</a>] -         OffsetWindowFunctionFrameBase cannot find the offset row whose input is not null
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38677'>SPARK-38677</a>] -         pyspark hangs in local mode running rdd map operation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38684'>SPARK-38684</a>] -         Stream-stream outer join has a possible correctness issue due to weakly read consistent on outer iterators
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38807'>SPARK-38807</a>] -         Error when starting spark shell on Windows system
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38830'>SPARK-38830</a>] -         Warn on corrupted block messages
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38868'>SPARK-38868</a>] -         `assert_true` fails unconditionally after `left_outer` joins
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38905'>SPARK-38905</a>] -         Upgrade ORC to 1.6.14
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38922'>SPARK-38922</a>] -         TaskLocation.apply throw NullPointerException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38931'>SPARK-38931</a>] -         RocksDB File manager would not create initial dfs directory with unknown number of keys on 1st empty checkpoint
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38955'>SPARK-38955</a>] -         Disable lineSep option in &#39;from_csv&#39; and &#39;schema_of_csv&#39;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38977'>SPARK-38977</a>] -         Fix schema pruning with correlated subqueries
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38990'>SPARK-38990</a>] -         date_trunc and trunc both fail with format from column in inline table
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38992'>SPARK-38992</a>] -         Avoid using bash -c in ShellBasedGroupsMappingProvider
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39060'>SPARK-39060</a>] -         Typo in error messages of decimal overflow
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39061'>SPARK-39061</a>] -         Incorrect results or NPE when using Inline function against an array of dynamically created structs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39083'>SPARK-39083</a>] -         Fix FsHistoryProvider race condition between update and clean app data
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39084'>SPARK-39084</a>] -         df.rdd.isEmpty() results in unexpected executor failure and JVM crash
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39104'>SPARK-39104</a>] -         Null Pointer Exception on unpersist call
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39107'>SPARK-39107</a>] -         Silent change in regexp_replace&#39;s handling of empty strings
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39258'>SPARK-39258</a>] -         Fix `Hide credentials in show create table` after SPARK-35378
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39259'>SPARK-39259</a>] -         Timestamps returned by now() and equivalent functions are not consistent in subqueries
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39283'>SPARK-39283</a>] -         Spark tasks stuck forever due to deadlock between TaskMemoryManager and UnsafeExternalSorter
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39293'>SPARK-39293</a>] -         The accumulator of ArrayAggregate should copy the intermediate result if string, struct, array, or map
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39340'>SPARK-39340</a>] -         DS v2 agg pushdown should allow dots in the name of top-level columns
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39355'>SPARK-39355</a>] -         Single column uses quoted to construct UnresolvedAttribute
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39376'>SPARK-39376</a>] -         Do not output duplicated columns in star expansion of subquery alias of NATURAL/USING JOIN
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39393'>SPARK-39393</a>] -         Parquet data source only supports push-down predicate filters for non-repeated primitive types
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39419'>SPARK-39419</a>] -         When the comparator of ArraySort returns null, it should fail.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39421'>SPARK-39421</a>] -         Sphinx build fails with &quot;node class &#39;meta&#39; is already registered, its visitors will be overridden&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39422'>SPARK-39422</a>] -         SHOW CREATE TABLE should suggest &#39;AS SERDE&#39; for Hive tables with unsupported serde configurations
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39437'>SPARK-39437</a>] -         normalize plan id separately in PlanStabilitySuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39447'>SPARK-39447</a>] -         Only non-broadcast query stage can propagate empty relation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39476'>SPARK-39476</a>] -         Disable Unwrap cast optimization when casting from Long to Float/Double or from Integer to Float
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39496'>SPARK-39496</a>] -         Inline eval path cannot handle null structs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39505'>SPARK-39505</a>] -         Escape log content rendered in UI
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39543'>SPARK-39543</a>] -         The option of DataFrameWriterV2 should be passed to storage properties when falling back to v1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39548'>SPARK-39548</a>] -         CreateView command with a window clause query hits a wrong &quot;window definition not found&quot; issue
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39551'>SPARK-39551</a>] -         Add AQE invalid plan check
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39570'>SPARK-39570</a>] -         inline table should allow expressions with alias
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39575'>SPARK-39575</a>] -         ByteBuffer forgets to rewind after get in AvroDeserializer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39621'>SPARK-39621</a>] -         Make run-tests.py robust by avoiding `rmtree` usage
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39650'>SPARK-39650</a>] -         Streaming Deduplication should not check the schema of &quot;value&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39672'>SPARK-39672</a>] -         NotExists subquery failed with conflicting attributes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39758'>SPARK-39758</a>] -         NPE on invalid patterns from the regexp functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-40804'>SPARK-40804</a>] -         Missing handling a catalog name in destination tables in `RenameTableExec`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-41336'>SPARK-41336</a>] -         BroadcastExchange does not support the execute() code path when AQE is enabled
</li>
</ul>
                
<h2>        Improvement
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-36808'>SPARK-36808</a>] -         Upgrade Kafka to 2.8.1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37670'>SPARK-37670</a>] -         Support predicate pushdown and column pruning for de-duped CTEs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37891'>SPARK-37891</a>] -         Add scalastyle check to disable scala.concurrent.ExecutionContext.Implicits.global
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-37934'>SPARK-37934</a>] -         Upgrade Jetty version to 9.4.44
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38007'>SPARK-38007</a>] -         Update K8s doc to recommend K8s 1.20+
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38046'>SPARK-38046</a>] -         Fix KafkaSource/KafkaMicroBatch flaky test due to non-deterministic timing
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38100'>SPARK-38100</a>] -         Remove unused method in `Decimal`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38184'>SPARK-38184</a>] -         Fix malformatted ExpressionDescription of `decode`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38211'>SPARK-38211</a>] -         Add SQL migration guide on restoring loose upcast from string
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38279'>SPARK-38279</a>] -         Pin markupsafe to 2.0.1 to fix linter failure
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38305'>SPARK-38305</a>] -         Check existence of file before untarring/zipping
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38353'>SPARK-38353</a>] -         Instrument __enter__ and __exit__ magic methods for pandas API on Spark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38487'>SPARK-38487</a>] -         Fix docstrings of nlargest/nsmallest of DataFrame
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38570'>SPARK-38570</a>] -         Incorrect DynamicPartitionPruning caused by Literal
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38816'>SPARK-38816</a>] -         Wrong comment in random matrix generator in spark-als algorithm 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38892'>SPARK-38892</a>] -         Fix the UT of schema equal assert
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38936'>SPARK-38936</a>] -         Script transform feed thread should have name
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39030'>SPARK-39030</a>] -         Rename sum to avoid shading the builtin Python function
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39154'>SPARK-39154</a>] -         Remove outdated statements on distributed-sequence default index 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39174'>SPARK-39174</a>] -         Catalogs loading swallows missing classname for ClassNotFoundException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39240'>SPARK-39240</a>] -         Source and binary releases use different tools to generate hashes for integrity
</li>
</ul>
    
<h2>        Test
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38045'>SPARK-38045</a>] -         More strict validation on plan check for stream-stream join unit test
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38080'>SPARK-38080</a>] -         Flaky test: StreamingQueryManagerSuite: &#39;awaitAnyTermination with timeout and resetTerminated&#39;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38084'>SPARK-38084</a>] -         Support `SKIP_PYTHON` and `SKIP_R` in `run-tests.py`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38142'>SPARK-38142</a>] -         Move ArrowColumnVectorSuite to org.apache.spark.sql.vectorized
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38297'>SPARK-38297</a>] -         Fix mypy failure on DataFrame.to_numpy in pandas API on Spark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38786'>SPARK-38786</a>] -         Test Bug in StatisticsSuite &quot;change stats after add/drop partition command&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38927'>SPARK-38927</a>] -         Skip NumPy/Pandas tests in `test_rdd.py` if not available
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39252'>SPARK-39252</a>] -         Flaky Test: pyspark.sql.tests.test_dataframe.DataFrameTests test_df_is_empty
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39273'>SPARK-39273</a>] -         Make PandasOnSparkTestCase inherit ReusedSQLTestCase
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39373'>SPARK-39373</a>] -         Recover branch-3.2 build broken by SPARK-39273 and SPARK-39252
</li>
</ul>
        
<h2>        Task
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38122'>SPARK-38122</a>] -         Update App Key of DocSearch
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38144'>SPARK-38144</a>] -         Remove unused `spark.storage.safetyFraction` config
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38189'>SPARK-38189</a>] -         Add priority scheduling doc for Spark on K8S
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38318'>SPARK-38318</a>] -         regression when replacing a dataset view
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39367'>SPARK-39367</a>] -         Review and fix issues in Scala/Java API docs of SQL module
</li>
</ul>
                                                    
<h2>        Dependency upgrade
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38303'>SPARK-38303</a>] -         Upgrade ansi-regex from 5.0.0 to 5.0.1 in /dev
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39099'>SPARK-39099</a>] -         Add dependencies to Dockerfile for building Spark releases
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39183'>SPARK-39183</a>] -         Upgrade Apache Xerces Java to 2.12.2
</li>
</ul>
                                                                                    
<h2>        Documentation
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38606'>SPARK-38606</a>] -         Update documentation to better guide running multiple versions of the Spark Shuffle Service
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-38629'>SPARK-38629</a>] -         Two links beneath Spark SQL Guide/Data Sources do not work properly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39032'>SPARK-39032</a>] -         Incorrectly formatted examples in pyspark.sql.functions.when
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39219'>SPARK-39219</a>] -         Promote Structured Streaming over Spark Streaming
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-39677'>SPARK-39677</a>] -         Wrong args item formatting of the regexp functions
</li>
</ul>