Release Notes - Spark - Version 3.2.1

Sub-task

  • [SPARK-30789] - Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE
  • [SPARK-36632] - DivideYMInterval and DivideDTInterval should throw the same exception when dividing by zero
  • [SPARK-36754] - array_intersect should handle Double.NaN and Float.NaN
  • [SPARK-36785] - Fix ps.DataFrame.isin
  • [SPARK-36900] - "SPARK-36464: size returns correct positive number even with over 2GB data" will OOM with JDK 17
  • [SPARK-37023] - Avoid fetching merge status when shuffleMergeEnabled is false for a shuffleDependency during retry
  • [SPARK-37317] - Reduce weights in GaussianMixtureSuite
  • [SPARK-37389] - Check unclosed bracketed comments
  • [SPARK-37442] - In AQE, wrong InMemoryRelation size estimation causes "Cannot broadcast the table that is larger than 8GB: 8 GB" failure
  • [SPARK-37522] - Fix MultilayerPerceptronClassifierTest.test_raw_and_probability_prediction
  • [SPARK-37695] - Skip diagnosis of merged blocks from push-based shuffle
  • [SPARK-37957] - Deterministic flag is not handled for V2 functions
  • [SPARK-38129] - Adaptively enable timeout for BroadcastQueryStageExec

Bug

  • [SPARK-23626] - DAGScheduler blocked due to JobSubmitted event
  • [SPARK-33277] - Python/Pandas UDF right after off-heap vectorized reader could cause executor crash.
  • [SPARK-36464] - Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream for Writing Over 2GB Data
  • [SPARK-36717] - Wrong order of variable initialization may lead to incorrect behavior
  • [SPARK-36795] - Explain Formatted has Duplicated Node IDs with InMemoryRelation Present
  • [SPARK-36865] - Add PySpark API document of session_window
  • [SPARK-36905] - Reading Hive view without explicit column names fails in Spark
  • [SPARK-36979] - Add RewriteLateralSubquery rule into nonExcludableRules
  • [SPARK-36993] - Fix json_tuple throwing NPE when fields contain a non-foldable null value
  • [SPARK-37004] - Job cancellation causes py4j errors on Jupyter due to pinned thread mode
  • [SPARK-37046] - Alter view does not preserve column case
  • [SPARK-37049] - executorIdleTimeout is not working for pending pods on K8s
  • [SPARK-37052] - Fix Spark 3.2 so that --verbose can be used with spark-shell
  • [SPARK-37057] - Fix wrong DocSearch facet filter in release-tag.sh
  • [SPARK-37060] - Report driver status does not handle response from backup masters
  • [SPARK-37061] - Custom V2 Metrics uses wrong classname for lookup
  • [SPARK-37064] - Fix outer join returning the wrong max rows when the other side is empty
  • [SPARK-37069] - HiveClientImpl throws NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.getWithoutRegisterFns
  • [SPARK-37078] - Support old 3-parameter Sink constructors
  • [SPARK-37079] - Fix DataFrameWriterV2.partitionedBy to send the arguments to JVM properly
  • [SPARK-37088] - Python UDF after off-heap vectorized reader can cause crash due to use-after-free in writer thread
  • [SPARK-37089] - ParquetFileFormat registers task completion listeners lazily, causing Python writer thread to segfault when off-heap vectorized reader is enabled
  • [SPARK-37098] - Alter table properties should invalidate cache
  • [SPARK-37117] - Can't read files in one of Parquet encryption modes (external keymaterial)
  • [SPARK-37121] - TestUtils.isPythonVersionAtLeast38 returns incorrect results
  • [SPARK-37147] - MetricsReporter producing NullPointerException when element 'triggerExecution' not present in Map[]
  • [SPARK-37170] - Pin PySpark version installed in the Binder environment for tagged commit
  • [SPARK-37196] - NPE in org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:106)
  • [SPARK-37202] - Temp views don't collect temp functions registered via the catalog API
  • [SPARK-37203] - Fix NotSerializableException when observe with TypedImperativeAggregate
  • [SPARK-37209] - YarnShuffleIntegrationSuite and two other similar cases in `resource-managers` tests fail
  • [SPARK-37217] - The number of dynamic partitions should be checked early when writing to external tables
  • [SPARK-37238] - Upgrade ORC to 1.6.12
  • [SPARK-37252] - Ignore test_memory_limit on non-Linux environment
  • [SPARK-37253] - try_simplify_traceback should not fail when tb_frame.f_lineno is None
  • [SPARK-37260] - PySpark Arrow 3.2.0 docs link invalid
  • [SPARK-37270] - Incorrect result of filter using isNull condition
  • [SPARK-37288] - Backport update pyspark.since annotation to 3.1 and 3.2
  • [SPARK-37302] - Explicitly download the dependencies of guava and jetty-io in test-dependencies.sh
  • [SPARK-37318] - Make FallbackStorageSuite robust in terms of DNS
  • [SPARK-37320] - Delete py_container_checks.zip after the test in DepsTestsSuite finishes
  • [SPARK-37388] - WidthBucket throws NullPointerException in WholeStageCodegenExec
  • [SPARK-37390] - Buggy method retrieval in pyspark.docs.conf.setup
  • [SPARK-37391] - Significant bottleneck introduced by the fix for SPARK-32001
  • [SPARK-37392] - Catalyst optimizer very time-consuming and memory-intensive with some "explode(array)"
  • [SPARK-37451] - Performance improvement regressed String to Decimal cast
  • [SPARK-37452] - Char and Varchar break backward compatibility between v3 and v2
  • [SPARK-37480] - Configurations in docs/running-on-kubernetes.md are not up to date
  • [SPARK-37481] - Disappearance of skipped stages misleads bug hunting
  • [SPARK-37524] - We should drop all tables after testing dynamic partition pruning
  • [SPARK-37534] - Bump dev.ludovic.netlib to 2.2.1
  • [SPARK-37556] - Deserializing the void class fails with Java serialization
  • [SPARK-37577] - ClassCastException: ArrayType cannot be cast to StructType
  • [SPARK-37615] - Upgrade SBT to 1.5.6
  • [SPARK-37633] - Unwrap cast should skip if downcast failed with ansi enabled
  • [SPARK-37654] - Regression - NullPointerException in Row.getSeq when field is null
  • [SPARK-37656] - Upgrade SBT to 1.5.7
  • [SPARK-37659] - Fix FsHistoryProvider race condition between listing and deleting log info
  • [SPARK-37678] - Incorrect annotations in SeriesGroupBy._cleanup_and_return
  • [SPARK-37728] - Reading nested columns with the ORC vectorized reader can cause ArrayIndexOutOfBoundsException
  • [SPARK-37779] - Make ColumnarToRowExec plan canonicalizable after (de)serialization
  • [SPARK-37800] - TreeNode.argString incorrectly formats arguments of type Set[_]
  • [SPARK-37802] - Composite field name like `field name` doesn't work with Aggregate push down
  • [SPARK-37807] - Fix a typo in HttpAuthenticationException message
  • [SPARK-37855] - IllegalStateException when transforming an array inside a nested struct
  • [SPARK-37859] - SQL tables created with JDBC with Spark 3.1 are not readable with 3.2
  • [SPARK-37860] - [BUG] Revert: Fix taskid in the stage page task event timeline
  • [SPARK-37874] - Link to Pandas UDF documentation is broken

New Feature

Improvement

  • [SPARK-34399] - Add file commit time to metrics and show it in the SQL Tab UI
  • [SPARK-35714] - Bug fix for deadlock during executor shutdown
  • [SPARK-36659] - Promote spark.sql.execution.topKSortFallbackThreshold to a user-facing config
  • [SPARK-37001] - Disable two-level map for final hash aggregation by default
  • [SPARK-37032] - Remove unusable link in the Spark 3.2.0 docs
  • [SPARK-37058] - Add spark-shell command line unit test
  • [SPARK-37113] - Upgrade Parquet to 1.12.2
  • [SPARK-37134] - Documentation - "Using PySpark Native Features" section is unclear
  • [SPARK-37208] - Support mapping Spark gpu/fpga resource types to custom YARN resource type
  • [SPARK-37214] - Fail query analysis earlier with invalid identifiers
  • [SPARK-37307] - Don't obtain JDBC connection for empty partition
  • [SPARK-37346] - Link the migration guide for Structured Streaming
  • [SPARK-37460] - ALTER (DATABASE|SCHEMA|NAMESPACE) ... SET LOCATION command not documented
  • [SPARK-37505] - Mesos module is missing a log4j.properties file for UTs
  • [SPARK-37513] - date +/- interval with only day-time fields returns a different data type between Spark 3.2 and Spark 3.1
  • [SPARK-37594] - Make UT test("SPARK-34399: Add job commit duration metrics for DataWritingCommand") more stable
  • [SPARK-37705] - Write session time zone in the Parquet file metadata so that rebase can use it instead of JVM timezone
  • [SPARK-37784] - CodeGenerator.addBufferedState() does not properly handle UDTs
  • [SPARK-37959] - Fix the UT of checking norm in KMeans & BiKMeans
  • [SPARK-38639] - Support ignoreCorruptRecord flag so that broken sequence file tables can be queried smoothly

Test

  • [SPARK-37218] - Parameterize `spark.sql.shuffle.partitions` in TPCDSQueryBenchmark
  • [SPARK-37322] - `run_scala_tests` should respect test module order
  • [SPARK-37871] - Use python3 instead of python in BaseScriptTransformation tests
  • [SPARK-37987] - Flaky Test: StreamingAggregationSuite.changing schema of state when restarting query - state format version 1

Task

  • [SPARK-37050] - Update conda installation instructions
  • [SPARK-37067] - DateTimeUtils.stringToTimestamp() incorrectly rejects timezone without colon
  • [SPARK-37446] - Use the invoke method for hive-2.3.9 related APIs
  • [SPARK-37471] - Support nested bracketed comments in spark-sql
  • [SPARK-37497] - Promote ExecutorPods[PollingSnapshot|WatchSnapshot]Source to DeveloperApi

Documentation

  • [SPARK-36791] - Fix spelling mistake in running-on-yarn.md where JHS_POST should be JHS_HOST
  • [SPARK-36939] - Add orphan migration page into list in PySpark documentation
  • [SPARK-37624] - Suppress warnings for live pandas-on-Spark quickstart notebooks
  • [SPARK-37692] - Fix wrong description in sql-migration-guide
  • [SPARK-37818] - Add option for show create table command
