Release Notes - ASF JIRA

Release Notes - Spark - Version 2.4.5 - HTML format

Sub-task

[SPARK-28683] - Upgrade Scala to 2.12.10
[SPARK-29203] - Reduce shuffle partitions in SQLQueryTestSuite
[SPARK-29708] - Different answers in aggregates of duplicate grouping sets
[SPARK-30269] - Should use old partition stats to decide whether to update stats when analyzing partition

Bug

[SPARK-17398] - Failed to query on external JSON partitioned table
[SPARK-21287] - Cannot use Int.MIN_VALUE as Spark SQL fetchsize
[SPARK-21492] - Memory leak in SortMergeJoin
[SPARK-22955] - Error generating jobs when Stopping JobGenerator gracefully
[SPARK-23435] - R tests should support latest testthat
[SPARK-23519] - Create View Commands Fails with The view output (col1,col1) contains duplicate column name
[SPARK-24152] - SparkR CRAN feasibility check server problem
[SPARK-24663] - Flaky test: StreamingContextSuite "stop slow receiver gracefully"
[SPARK-24666] - Word2Vec generates infinity vectors when numIterations is large
[SPARK-25277] - YARN applicationMaster metrics should not register static and JVM metrics
[SPARK-25753] - binaryFiles broken for small files
[SPARK-25903] - Flaky test: BarrierTaskContextSuite.throw exception on barrier() call timeout
[SPARK-26499] - JdbcUtils.makeGetter does not handle ByteType
[SPARK-26560] - Repeating select on udf function throws analysis exception - function not registered
[SPARK-26713] - PipedRDD may hold stdin writer and stdout reader threads even if the task is finished
[SPARK-26985] - Test "access only some column of the all of columns " fails on big endian
[SPARK-26989] - Flaky test: DAGSchedulerSuite.Barrier task failures from the same stage attempt don't trigger multiple stage retries
[SPARK-27558] - NPE in TaskCompletionListener due to Spark OOM in UnsafeExternalSorter causing tasks to hang
[SPARK-27812] - Kubernetes client imports non-daemon threads which block JVM exit
[SPARK-28599] - Fix `Execution Time` and `Duration` column sorting for ThriftServerSessionPage
[SPARK-28709] - Memory leaks after stopping of StreamingContext
[SPARK-28749] - Fix PySpark tests not to require kafka-0-8 in branch-2.4
[SPARK-28778] - Shuffle jobs fail due to incorrect advertised address when running in virtual network
[SPARK-28903] - Fix AWS JDK version conflict that breaks PySpark Kinesis tests
[SPARK-28906] - `bin/spark-submit --version` shows incorrect info
[SPARK-28912] - MatchError exception in CheckpointWriteHandler
[SPARK-28917] - Jobs can hang because of race of RDD.dependencies
[SPARK-28921] - Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10, 1.12.10, 1.11.10)
[SPARK-28939] - SQL configurations are not always propagated
[SPARK-28963] - Fall back to archive.apache.org to download Maven if mirrors don't have requested version
[SPARK-29042] - Sampling-based RDD with unordered input should be INDETERMINATE
[SPARK-29045] - Test failed due to table already exists in SQLMetricsSuite
[SPARK-29046] - Possible NPE on SQLConf.get when SparkContext is stopping in another thread
[SPARK-29053] - Sort does not work on some columns
[SPARK-29055] - Spark UI storage memory increasing over time
[SPARK-29101] - CSV datasource returns incorrect .count() from file with malformed records
[SPARK-29177] - Zombie tasks prevent executor from releasing when task exceeds maxResultSize
[SPARK-29186] - SubqueryAlias name value is null in Spark 2.4.3 Logical plan.
[SPARK-29213] - Make it consistent when get notnull output and generate null checks in FilterExec
[SPARK-29229] - Change the additional remote repository in IsolatedClientLoader to Google mirror
[SPARK-29240] - PySpark 2.4 about sql function 'element_at' param 'extraction'
[SPARK-29244] - ArrayIndexOutOfBoundsException on TaskCompletionListener during releasing of memory blocks
[SPARK-29450] - [SS] In streaming aggregation, metric for output rows is not measured in append mode
[SPARK-29494] - ArrayOutOfBoundsException when converting from string to timestamp
[SPARK-29498] - CatalogTable to HiveTable should not change the table's ownership
[SPARK-29556] - Avoid including path in error response from REST submission server
[SPARK-29560] - Add typesafe bintray repo for sbt-mima-plugin
[SPARK-29578] - JDK 1.8.0_232 timezone updates cause "Kwajalein" test failures again
[SPARK-29604] - SessionState is initialized with isolated classloader for Hive if spark.sql.hive.metastore.jars is being set
[SPARK-29637] - SHS Endpoint /applications/<app_id>/jobs/ doesn't include description
[SPARK-29647] - Use Python 3.7 in GitHub Action to recover lint-python
[SPARK-29651] - Incorrect parsing of interval seconds fraction
[SPARK-29666] - Release script fails to publish release under dry run mode
[SPARK-29682] - Failure when resolving conflicting references in Join:
[SPARK-29743] - sample should set needCopyResult to true if its child is
[SPARK-29758] - json_tuple truncates fields
[SPARK-29781] - Override SBT Jackson-databind dependency like Maven
[SPARK-29796] - `HiveExternalCatalogVersionsSuite` should ignore preview release
[SPARK-29850] - sort-merge-join an empty table should not memory leak
[SPARK-29875] - Avoid using deprecated pyarrow.open_stream API in Spark 2.4.x
[SPARK-29890] - Unable to fill na with 0 with duplicate columns
[SPARK-29904] - Parse timestamps in microsecond precision by JSON/CSV datasources
[SPARK-29918] - RecordBinaryComparator should check endianness when compared by long
[SPARK-29932] - lint-r should do non-zero exit in case of errors
[SPARK-29949] - JSON/CSV formats timestamps incorrectly
[SPARK-29970] - open/close state is not preserved for Timelineview
[SPARK-29971] - Multiple possible buffer leaks in TransportFrameDecoder and TransportCipher
[SPARK-30030] - Use RegexChecker instead of TokenChecker to check `org.apache.commons.lang.`
[SPARK-30050] - analyze table and rename table should not erase the bucketing metadata at hive side
[SPARK-30065] - Unable to drop na with duplicate columns
[SPARK-30082] - Zeros are being treated as NaNs
[SPARK-30129] - New auth engine does not keep client ID in TransportClient after auth
[SPARK-30198] - BytesToBytesMap does not grow internal long array as expected
[SPARK-30225] - "Stream is corrupted at" exception on reading disk-spilled data of a shuffle operation
[SPARK-30238] - hive partition pruning can only support string and integral types
[SPARK-30246] - Spark on Yarn External Shuffle Service Memory Leak
[SPARK-30263] - Don't log values of ignored non-Spark properties
[SPARK-30274] - Avoid BytesToBytesMap lookup hang forever when holding keys reaching max capacity
[SPARK-30285] - Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError
[SPARK-30310] - SparkUncaughtExceptionHandler halts running process unexpectedly
[SPARK-30312] - Preserve path permission when truncate table
[SPARK-30325] - markPartitionCompleted causes inconsistent task status
[SPARK-30333] - Bump jackson-databind to 2.6.7.3
[SPARK-30447] - Constant propagation nullability issue
[SPARK-30450] - Exclude .git folder for python linter
[SPARK-30458] - The Executor Computing Time in Time Line of Stage Page is Wrong
[SPARK-30489] - Make build delete pyspark.zip file properly
[SPARK-30512] - Use a dedicated boss event group loop in the netty pipeline for external shuffle service
[SPARK-30553] - Fix structured-streaming java example error
[SPARK-30556] - Copy sparkContext.localproperties to child thread in SubqueryExec.executionContext
[SPARK-30572] - Add a fallback Maven repository
[SPARK-30633] - Codegen fails when xxHash seed is not an integer
[SPARK-30645] - collect() support Unicode characters test fails on Windows
[SPARK-30704] - Use jekyll-redirect-from 0.15.0 instead of the latest

Improvement

[SPARK-19147] - Gracefully handle error in task after executor is stopped
[SPARK-25392] - [Spark Job History] Inconsistent behaviour for pool details in Spark web UI and history server page
[SPARK-26003] - Improve performance in SQLAppStatusListener
[SPARK-27122] - YARN test failures in Java 9+
[SPARK-27460] - Running slowest test suites in their own forked JVMs for higher parallelism
[SPARK-28678] - Specify that start index is 1-based in docstring of pyspark.sql.functions.slice
[SPARK-28938] - Move to supported OpenJDK docker image for Kubernetes
[SPARK-29011] - Upgrade netty-all to 4.1.39-Final
[SPARK-29075] - Add enforcer rule to ban duplicated pom dependency
[SPARK-29087] - Use DelegatingServletContextHandler to avoid CCE
[SPARK-29159] - Increase ReservedCodeCacheSize to 1G
[SPARK-29165] - Set log level of log generated code as ERROR in case of compile error on generated code in UT
[SPARK-29247] - HiveClientImpl may log sensitive information
[SPARK-29410] - Update Commons BeanUtils to 1.9.4
[SPARK-29677] - Upgrade Kinesis Client
[SPARK-29820] - Use GitHub Action Cache for `./.m2/repository`
[SPARK-29964] - lintr github action failed due to buggy GnuPG
[SPARK-30318] - Bump jetty to 9.3.27.v20190418
[SPARK-30339] - Avoid failing twice in function lookup
[SPARK-30410] - Calculating size of table having large number of partitions causes flooding logs
[SPARK-30601] - Add a Google Maven Central as a primary repository
[SPARK-30630] - Deprecate numTrees in GBT at 2.4.5 and remove it at 3.0.0
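
SPARK-28678 above clarifies that the start index of pyspark.sql.functions.slice is 1-based. The sketch below is a pure-Python model of that indexing convention, not Spark code: the helper name slice_1_based is hypothetical, and the handling of negative start values (counting from the end) is an assumption based on the function's documented description.

```python
def slice_1_based(xs, start, length):
    """Model of 1-based slicing (hypothetical helper, not part of Spark)."""
    if start == 0:
        # Index 0 has no meaning when the first element is index 1.
        raise ValueError("start is 1-based; 0 is not a valid index")
    if length < 0:
        raise ValueError("length must be non-negative")
    # Positive start counts from the front (1 = first element);
    # negative start counts from the end (-1 = last element).
    begin = start - 1 if start > 0 else len(xs) + start
    if begin < 0 or begin >= len(xs):
        return []
    return xs[begin:begin + length]


print(slice_1_based([1, 2, 3], 2, 2))
print(slice_1_based([4, 5], 2, 2))
```

The point of the model is only that start=1 selects the first element, which differs from Python's own 0-based list slicing.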

Test

[SPARK-23197] - Flaky test: spark.streaming.ReceiverSuite."receiver_life_cycle"
[SPARK-29104] - Fix Flaky Test - PipedRDDSuite.stdin_writer_thread_should_be_exited_when_task_is_finished
[SPARK-29286] - UnicodeDecodeError raised when running python tests on arm instance
[SPARK-30637] - upgrade testthat on jenkins workers to 2.0.0

Task

[SPARK-28951] - Add release announce template
[SPARK-29073] - Add GitHub Action to branch-2.4 for `Scala-2.11 / Scala-2.12` build
[SPARK-29201] - Add Hadoop 2.6 combination to GitHub Action
[SPARK-29445] - Bump netty-all from 4.1.39.Final to 4.1.42.Final

Documentation

[SPARK-28650] - Fix the guarantee of ForeachWriter
[SPARK-28977] - JDBC Dataframe Reader Doc Doesn't Match JDBC Data Source Page
[SPARK-29367] - pandas udf not working with latest pyarrow release (0.15.0)
[SPARK-29790] - Add notes about port being required for Kubernetes API URL when set as master
[SPARK-30236] - Clarify date and time patterns supported by date_format
[SPARK-30478] - Fix memory package doc
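
SPARK-29790 above documents that the Kubernetes API URL needs an explicit port when it is used as the master. A minimal sketch of a pre-submit check follows, assuming the master string has the form k8s://<api-server-url>; the helper name has_explicit_port and the example host are hypothetical and not part of Spark.

```python
from urllib.parse import urlsplit


def has_explicit_port(master):
    """Return True if a k8s:// master URL names a port (hypothetical helper)."""
    prefix = "k8s://"
    if not master.startswith(prefix):
        raise ValueError("not a Kubernetes master URL: " + master)
    api_url = master[len(prefix):]
    # Assumption: when no scheme is given, treat the URL as https
    # so that urlsplit can locate the host and port.
    if "://" not in api_url:
        api_url = "https://" + api_url
    return urlsplit(api_url).port is not None


print(has_explicit_port("k8s://https://example.invalid:443"))
print(has_explicit_port("k8s://https://example.invalid"))
```

A check like this could run before calling spark-submit so that a missing port is reported early rather than as a connection failure.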
