Release Notes - Spark - Version 2.4.1 - HTML format

Sub-task

[SPARK-25883] - Override method `prettyName` in `from_avro`/`to_avro`
[SPARK-26010] - SparkR vignette fails on CRAN on Java 11
[SPARK-26327] - Metrics in FileSourceScanExec not updated correctly while relation.partitionSchema is set
[SPARK-26402] - Accessing nested fields with different cases in case insensitive mode

Bug

[SPARK-22148] - TaskSetManager.abortIfCompletelyBlacklisted should not abort when all current executors are blacklisted but dynamic allocation is enabled
[SPARK-23458] - Flaky test: OrcQuerySuite
[SPARK-24553] - Job UI redirect causing http 302 error
[SPARK-24669] - Managed table was not cleared of path after drop database cascade
[SPARK-24687] - NoClassDefError thrown during task serialization causes the job to hang
[SPARK-25451] - Stages page doesn't show the right number of the total tasks
[SPARK-25767] - Error reported in Spark logs when using the org.apache.spark:spark-sql_2.11:2.3.2 Java library
[SPARK-25786] - If ByteBuffer.hasArray is false, it will throw UnsupportedOperationException for Kryo
[SPARK-25827] - Replicating a block > 2gb with encryption fails
[SPARK-25837] - Web UI does not respect spark.ui.retainedJobs in some instances
[SPARK-25863] - java.lang.UnsupportedOperationException: empty.max at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala:1475)
[SPARK-25866] - Update KMeans formatVersion
[SPARK-25906] - spark-shell cannot handle `-i` option correctly
[SPARK-25909] - Error in documentation: number of cluster managers
[SPARK-25918] - LOAD DATA LOCAL INPATH should handle a relative path
[SPARK-25921] - Python worker reuse causes Barrier tasks to run without BarrierTaskContext
[SPARK-25922] - [K8] Spark Driver/Executor "spark-app-selector" label mismatch
[SPARK-25930] - Fix scala version string detection when maven-help-plugin is not pre-installed
[SPARK-25934] - Mesos: SPARK_CONF_DIR should not be propagated by spark submit
[SPARK-25979] - Window function: allow parentheses around window reference
[SPARK-25988] - Keep names unchanged when deduplicating the column names in Analyzer
[SPARK-25992] - Accumulators giving KeyError in pyspark
[SPARK-26011] - pyspark app with "spark.jars.packages" config does not work
[SPARK-26019] - pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()
[SPARK-26048] - Flume connector for Spark 2.4 does not exist in Maven repository
[SPARK-26057] - Table joining is broken in Spark 2.4
[SPARK-26078] - WHERE .. IN fails to filter rows when used in combination with UNION
[SPARK-26079] - Flaky test: StreamingQueryListenersConfSuite
[SPARK-26080] - Unable to run worker.py on Windows
[SPARK-26082] - Misnaming of spark.mesos.fetch(er)Cache.enable in MesosClusterScheduler
[SPARK-26084] - AggregateExpression.references fails on unresolved expression trees
[SPARK-26092] - Use CheckpointFileManager to write the streaming metadata file
[SPARK-26100] - [History server] Jobs table and Aggregate metrics table show fewer tasks than expected
[SPARK-26109] - Duration in the task summary metrics table and the task table are different
[SPARK-26114] - Memory leak of PartitionedPairBuffer when coalescing after repartitionAndSortWithinPartitions
[SPARK-26119] - Task metrics summary in the stage page should contain only successful tasks metrics
[SPARK-26137] - Linux file separator is hard coded in DependencyUtils used in deploy process
[SPARK-26147] - Python UDFs in join condition fail even when using columns from only one side of join
[SPARK-26181] - the `hasMinMaxStats` method of `ColumnStatsMap` is not correct
[SPARK-26184] - Last updated time is not getting updated in the History Server UI
[SPARK-26186] - In-progress applications whose last updated time is less than the cleaning interval are removed during log cleaning
[SPARK-26188] - Spark 2.4.0 Partitioning behavior breaks backwards compatibility
[SPARK-26198] - Metadata serialization of null values throws NPE
[SPARK-26201] - python broadcast.value on driver fails with disk encryption enabled
[SPARK-26211] - Fix InSet for binary, and struct and array with null.
[SPARK-26219] - Executor summary is not getting updated for failure jobs in history server UI
[SPARK-26228] - OOM issue encountered when computing Gramian matrix
[SPARK-26233] - Incorrect decimal value with java beans and first/last/max... functions
[SPARK-26256] - Add proper labels when deleting pods
[SPARK-26265] - deadlock between TaskMemoryManager and BytesToBytesMap$MapIterator
[SPARK-26267] - Kafka source may reprocess data
[SPARK-26269] - YarnAllocator should have the same blacklist behaviour as YARN to maximize use of cluster resources
[SPARK-26307] - Fix CTAS when INSERT a partitioned table using Hive serde
[SPARK-26315] - auto cast threshold from Integer to Float in approxSimilarityJoin of BucketedRandomProjectionLSHModel
[SPARK-26351] - Documented formula of precision at k does not match the actual code
[SPARK-26352] - Join reordering should not change the order of output attributes
[SPARK-26355] - Add a workaround for PyArrow 0.11.
[SPARK-26366] - Except with transform regression
[SPARK-26370] - Fix resolution of higher-order function for the same identifier.
[SPARK-26379] - Use dummy TimeZoneId for CurrentTimestamp to avoid UnresolvedException in CurrentBatchTimestamp
[SPARK-26382] - prefix sorter should handle -0.0
[SPARK-26394] - Annotation error for Utils.timeStringAsMs
[SPARK-26422] - Unable to disable Hive support in SparkR when Hadoop version is unsupported
[SPARK-26426] - ExpressionInfo related unit tests fail in Windows
[SPARK-26427] - Upgrade Apache ORC to 1.5.4
[SPARK-26444] - Stage color doesn't change with its status
[SPARK-26496] - Avoid using Random.nextString in StreamingInnerJoinSuite
[SPARK-26501] - Unexpected override of exitFn in SparkSubmitSuite
[SPARK-26537] - update the release scripts to point to gitbox
[SPARK-26538] - Postgres numeric array support
[SPARK-26545] - Fix typo in EqualNullSafe's truth table comment
[SPARK-26551] - Selecting one complex field and having is null predicate on another complex field can cause error
[SPARK-26554] - Update `release-util.sh` to avoid GitBox fake 200 headers
[SPARK-26559] - ML image can't work with numpy versions prior to 1.9
[SPARK-26571] - Update Hive Serde mapping with canonical name of Parquet and Orc FileFormat
[SPARK-26572] - Join on distinct column with monotonically_increasing_id produces wrong output
[SPARK-26576] - Broadcast hint not applied to partitioned table
[SPARK-26583] - Add `paranamer` dependency to `core` module
[SPARK-26586] - Streaming queries should have isolated SparkSessions and confs
[SPARK-26606] - parameters passed in extraJavaOptions are not being picked up
[SPARK-26615] - Fixing transport server/client resource leaks in the core unittests
[SPARK-26629] - Error with multiple file stream in a query + restart on a batch that has no data for one file stream
[SPARK-26638] - Pyspark vector classes always return error for unary negation
[SPARK-26665] - BlockTransferService.fetchBlockSync may hang forever
[SPARK-26677] - Incorrect results of not(eqNullSafe) when data read from Parquet file
[SPARK-26680] - StackOverflowError if Stream passed to groupBy
[SPARK-26682] - Task attempt ID collision causes lost data
[SPARK-26706] - Fix Cast$mayTruncate for bytes
[SPARK-26708] - Incorrect result caused by inconsistency between a SQL cache's cached RDD and its physical plan
[SPARK-26709] - OptimizeMetadataOnlyQuery does not correctly handle files with zero records
[SPARK-26718] - Fixed integer overflow in SS kafka rateLimit calculation
[SPARK-26726] - Synchronize the amount of memory used by the broadcast variable to the UI display
[SPARK-26732] - Flaky test: SparkContextInfoSuite.getRDDStorageInfo only reports on RDDs that actually persist data
[SPARK-26734] - StackOverflowError on WAL serialization caused by large receivedBlockQueue
[SPARK-26740] - Statistics for date and timestamp columns depend on system time zone
[SPARK-26745] - Non-parsing Dataset.count() optimization causes inconsistent results for JSON inputs with empty lines
[SPARK-26751] - HiveSessionImpl might leak memory because Operations are not closed properly
[SPARK-26757] - GraphX EdgeRDDImpl and VertexRDDImpl `count` method cannot handle empty RDDs
[SPARK-26758] - Idle Executors are not getting killed after spark.dynamicAllocation.executorIdleTimeout value
[SPARK-26806] - EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly
[SPARK-26859] - Fix field writer index bug in non-vectorized ORC deserializer
[SPARK-26864] - Query may return incorrect result when python udf is used as a join condition and the udf uses attributes from both legs of left semi join.
[SPARK-26873] - FileFormatWriter creates inconsistent MR job IDs
[SPARK-26927] - Race condition may cause dynamic allocation to stop working
[SPARK-26950] - Make RandomDataGenerator use Float.NaN or Double.NaN for all NaN values
[SPARK-26990] - Difference in handling of mixed-case partition column names after SPARK-26188
[SPARK-27019] - Spark UI's SQL tab shows inconsistent values
[SPARK-27065] - Avoid more than one active task set manager for a stage
[SPARK-27078] - Reading a Hive materialized view throws MatchError
[SPARK-27080] - Read parquet file with merging metastore schema should compare schema field in uniform case.
[SPARK-27094] - Thread interrupt being swallowed while launching executors in YarnAllocator
[SPARK-27097] - Avoid embedding platform-dependent offsets literally in whole-stage generated code
[SPARK-27107] - Spark SQL Job failing because of Kryo buffer overflow with ORC
[SPARK-27111] - A continuous query may fail with InterruptedException when the Kafka consumer temporarily has 0 partitions
[SPARK-27112] - Spark Scheduler encounters two independent Deadlocks when trying to kill executors either due to dynamic allocation or blacklisting
[SPARK-27134] - array_distinct function does not work correctly with columns containing array of array
[SPARK-27160] - Incorrect Literal Casting of DecimalType in OrcFilters
[SPARK-27165] - Upgrade Apache ORC to 1.5.5
[SPARK-27178] - k8s test failing due to missing nss library in dockerfile
[SPARK-27198] - Heartbeat interval mismatch in driver and executor

New Feature

[SPARK-25635] - Support selective direct encoding in native ORC write
[SPARK-26118] - Make Jetty's requestHeaderSize configurable in Spark
[SPARK-26605] - New executors failing with expired tokens in client mode
[SPARK-26910] - Re-release SparkR to CRAN

Improvement

[SPARK-25023] - Clarify Spark security documentation
[SPARK-25778] - WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due to lack of access to tmpDir from $PWD to HDFS
[SPARK-25904] - Avoid allocating arrays too large for JVMs
[SPARK-26266] - Update to Scala 2.12.8
[SPARK-26316] - Partially revert SPARK-21052 (Add hash map metrics to join) because of the perf degradation in TPC-DS
[SPARK-26392] - Cancel pending allocate requests by taking locality preference into account
[SPARK-26409] - SQLConf should be serializable in test sessions
[SPARK-26604] - Register channel for stream request
[SPARK-26633] - Add ExecutorClassLoader.getResourceAsStream
[SPARK-27046] - Remove SPARK-19185 related references from documentation since it's resolved

Test

[SPARK-25899] - Flaky test: CoarseGrainedSchedulerBackendSuite.compute max number of concurrent tasks can be launched
[SPARK-26029] - Bump previousSparkVersion in MimaBuild.scala to be 2.3.0
[SPARK-26042] - KafkaContinuousSourceTopicDeletionSuite may hang forever
[SPARK-26069] - Flaky test: RpcIntegrationSuite.sendRpcWithStreamFailures
[SPARK-26120] - Fix a streaming query leak in Structured Streaming R tests

Task

[SPARK-26607] - Remove Spark 2.2.x testing from HiveExternalCatalogVersionsSuite
[SPARK-26897] - Update Spark 2.3.x testing from HiveExternalCatalogVersionsSuite
[SPARK-27274] - Refer to Scala 2.12 in docs; deprecate Scala 2.11 support in 2.4.1

Dependency upgrade

[SPARK-26742] - Bump Kubernetes Client Version to 4.1.2

Documentation

[SPARK-25933] - Fix pstats reference for spark.python.profile.dump in configuration.md
[SPARK-26207] - add PowerIterationClustering (PIC) doc in 2.4 branch
[SPARK-26932] - Add a warning for Hive 2.1.1 ORC reader issue

Edit/Copy Release Notes

The text area below allows the project release notes to be edited and copied to another document.

Release Notes - Spark - Version 2.4.1
    
<h2>        Sub-task
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25883'>SPARK-25883</a>] -         Override method `prettyName` in `from_avro`/`to_avro`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26010'>SPARK-26010</a>] -         SparkR vignette fails on CRAN on Java 11
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26327'>SPARK-26327</a>] -         Metrics in FileSourceScanExec not update correctly while relation.partitionSchema is set
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26402'>SPARK-26402</a>] -         Accessing nested fields with different cases in case insensitive mode
</li>
</ul>
            
<h2>        Bug
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22148'>SPARK-22148</a>] -         TaskSetManager.abortIfCompletelyBlacklisted should not abort when all current executors are blacklisted but dynamic allocation is enabled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23458'>SPARK-23458</a>] -          Flaky test: OrcQuerySuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24553'>SPARK-24553</a>] -         Job UI redirect causing http 302 error
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24669'>SPARK-24669</a>] -         Managed table was not cleared of path after drop database cascade
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24687'>SPARK-24687</a>] -         When NoClassDefError thrown during task serialization will cause job hang
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25451'>SPARK-25451</a>] -         Stages page doesn&#39;t show the right number of the total tasks
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25767'>SPARK-25767</a>] -         Error reported in Spark logs when using the org.apache.spark:spark-sql_2.11:2.3.2 Java library
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25786'>SPARK-25786</a>] -         If the ByteBuffer.hasArray is false , it will throw UnsupportedOperationException for Kryo
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25827'>SPARK-25827</a>] -         Replicating a block &gt; 2gb with encryption fails
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25837'>SPARK-25837</a>] -         Web UI does not respect spark.ui.retainedJobs in some instances
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25863'>SPARK-25863</a>] -         java.lang.UnsupportedOperationException: empty.max at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala:1475)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25866'>SPARK-25866</a>] -         Update KMeans formatVersion
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25906'>SPARK-25906</a>] -         spark-shell cannot handle `-i` option correctly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25909'>SPARK-25909</a>] -         Error in documentation: number of cluster managers
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25918'>SPARK-25918</a>] -         LOAD DATA LOCAL INPATH should handle a relative path
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25921'>SPARK-25921</a>] -         Python worker reuse causes Barrier tasks to run without BarrierTaskContext
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25922'>SPARK-25922</a>] -         [K8] Spark Driver/Executor &quot;spark-app-selector&quot; label mismatch
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25930'>SPARK-25930</a>] -         Fix scala version string detection when maven-help-plugin is not pre-installed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25934'>SPARK-25934</a>] -         Mesos: SPARK_CONF_DIR should not be propogated by spark submit
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25979'>SPARK-25979</a>] -         Window function: allow parentheses around window reference
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25988'>SPARK-25988</a>] -         Keep names unchanged when deduplicating the column names in Analyzer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25992'>SPARK-25992</a>] -         Accumulators giving KeyError in pyspark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26011'>SPARK-26011</a>] -         pyspark app with &quot;spark.jars.packages&quot; config does not work
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26019'>SPARK-26019</a>] -         pyspark/accumulators.py: &quot;TypeError: object of type &#39;NoneType&#39; has no len()&quot; in authenticate_and_accum_updates()
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26048'>SPARK-26048</a>] -         Flume connector for Spark 2.4 does not exist in Maven repository
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26057'>SPARK-26057</a>] -         Table joining is broken in Spark 2.4
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26078'>SPARK-26078</a>] -         WHERE .. IN fails to filter rows when used in combination with UNION
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26079'>SPARK-26079</a>] -         Flaky test: StreamingQueryListenersConfSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26080'>SPARK-26080</a>] -         Unable to run worker.py on Windows
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26082'>SPARK-26082</a>] -         Misnaming of spark.mesos.fetch(er)Cache.enable in MesosClusterScheduler
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26084'>SPARK-26084</a>] -         AggregateExpression.references fails on unresolved expression trees
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26092'>SPARK-26092</a>] -         Use CheckpointFileManager to write the streaming metadata file
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26100'>SPARK-26100</a>] -         [History server ]Jobs table and Aggregate metrics table are showing lesser number of tasks 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26109'>SPARK-26109</a>] -         Duration in the task summary metrics table and the task table are different
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26114'>SPARK-26114</a>] -         Memory leak of PartitionedPairBuffer when coalescing after repartitionAndSortWithinPartitions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26119'>SPARK-26119</a>] -         Task metrics summary in the stage page should contain only successful tasks metrics
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26137'>SPARK-26137</a>] -         Linux file separator is hard coded in DependencyUtils used in deploy process
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26147'>SPARK-26147</a>] -         Python UDFs in join condition fail even when using columns from only one side of join
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26181'>SPARK-26181</a>] -         the `hasMinMaxStats` method of `ColumnStatsMap` is not correct
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26184'>SPARK-26184</a>] -         Last updated time is not getting updated in the History Server UI
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26186'>SPARK-26186</a>] -         In progress applications with last updated time is lesser than the cleaning interval are getting removed during cleaning logs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26188'>SPARK-26188</a>] -         Spark 2.4.0 Partitioning behavior breaks backwards compatibility
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26198'>SPARK-26198</a>] -         Metadata serialize null values throw NPE
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26201'>SPARK-26201</a>] -         python broadcast.value on driver fails with disk encryption enabled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26211'>SPARK-26211</a>] -         Fix InSet for binary, and struct and array with null.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26219'>SPARK-26219</a>] -         Executor summary is not getting updated for failure jobs in history server UI
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26228'>SPARK-26228</a>] -         OOM issue encountered when computing Gramian matrix 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26233'>SPARK-26233</a>] -         Incorrect decimal value with java beans and first/last/max... functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26256'>SPARK-26256</a>] -         Add proper labels when deleting pods
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26265'>SPARK-26265</a>] -         deadlock between TaskMemoryManager and BytesToBytesMap$MapIterator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26267'>SPARK-26267</a>] -         Kafka source may reprocess data
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26269'>SPARK-26269</a>] -         YarnAllocator should have same blacklist behaviour with YARN to maxmize use of cluster resource
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26307'>SPARK-26307</a>] -         Fix CTAS when INSERT a partitioned table using Hive serde
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26315'>SPARK-26315</a>] -         auto cast threshold from Integer to Float in approxSimilarityJoin of BucketedRandomProjectionLSHModel
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26351'>SPARK-26351</a>] -         Documented formula of precision at k does not match the actual code
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26352'>SPARK-26352</a>] -         Join reordering should not change the order of output attributes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26355'>SPARK-26355</a>] -         Add a workaround for PyArrow 0.11.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26366'>SPARK-26366</a>] -         Except with transform regression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26370'>SPARK-26370</a>] -         Fix resolution of higher-order function for the same identifier.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26379'>SPARK-26379</a>] -         Use dummy TimeZoneId for CurrentTimestamp to avoid UnresolvedException in CurrentBatchTimestamp
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26382'>SPARK-26382</a>] -         prefix sorter should handle -0.0
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26394'>SPARK-26394</a>] -         Annotation error for Utils.timeStringAsMs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26422'>SPARK-26422</a>] -         Unable to disable Hive support in SparkR when Hadoop version is unsupported
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26426'>SPARK-26426</a>] -         ExpressionInfo related unit tests fail in Windows
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26427'>SPARK-26427</a>] -         Upgrade Apache ORC to 1.5.4
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26444'>SPARK-26444</a>] -         Stage color doesn&#39;t change with it&#39;s status
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26496'>SPARK-26496</a>] -         Avoid to use Random.nextString in StreamingInnerJoinSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26501'>SPARK-26501</a>] -         Unexpected overriden of exitFn in SparkSubmitSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26537'>SPARK-26537</a>] -         update the release scripts to point to gitbox
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26538'>SPARK-26538</a>] -         Postgres numeric array support
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26545'>SPARK-26545</a>] -         Fix typo in EqualNullSafe&#39;s truth table comment
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26551'>SPARK-26551</a>] -         Selecting one complex field and having is null predicate on another complex field can cause error
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26554'>SPARK-26554</a>] -         Update `release-util.sh` to avoid GitBox fake 200 headers
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26559'>SPARK-26559</a>] -         ML image can&#39;t work with numpy versions prior to 1.9
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26571'>SPARK-26571</a>] -         Update Hive Serde mapping with canonical name of Parquet and Orc FileFormat
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26572'>SPARK-26572</a>] -         Join on distinct column with monotonically_increasing_id produces wrong output
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26576'>SPARK-26576</a>] -         Broadcast hint not applied to partitioned table
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26583'>SPARK-26583</a>] -         Add `paranamer` dependency to `core` module
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26586'>SPARK-26586</a>] -         Streaming queries should have isolated SparkSessions and confs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26606'>SPARK-26606</a>] -         parameters passed in extraJavaOptions are not being picked up 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26615'>SPARK-26615</a>] -         Fixing transport server/client resource leaks in the core unittests 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26629'>SPARK-26629</a>] -         Error with multiple file stream in a query + restart on a batch that has no data for one file stream
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26638'>SPARK-26638</a>] -         Pyspark vector classes always return error for unary negation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26665'>SPARK-26665</a>] -         BlockTransferService.fetchBlockSync may hang forever
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26677'>SPARK-26677</a>] -         Incorrect results of not(eqNullSafe) when data read from Parquet file 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26680'>SPARK-26680</a>] -         StackOverflowError if Stream passed to groupBy
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26682'>SPARK-26682</a>] -         Task attempt ID collision causes lost data
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26706'>SPARK-26706</a>] -         Fix Cast$mayTruncate for bytes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26708'>SPARK-26708</a>] -         Incorrect result caused by inconsistency between a SQL cache&#39;s cached RDD and its physical plan
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26709'>SPARK-26709</a>] -         OptimizeMetadataOnlyQuery does not correctly handle the files with zero record
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26718'>SPARK-26718</a>] -         Fixed integer overflow in SS kafka rateLimit calculation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26726'>SPARK-26726</a>] -           Synchronize the amount of memory used by the broadcast variable to the UI display
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26732'>SPARK-26732</a>] -         Flaky test: SparkContextInfoSuite.getRDDStorageInfo only reports on RDDs that actually persist data
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26734'>SPARK-26734</a>] -         StackOverflowError on WAL serialization caused by large receivedBlockQueue
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26740'>SPARK-26740</a>] -         Statistics for date and timestamp columns depend on system time zone
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26745'>SPARK-26745</a>] -         Non-parsing Dataset.count() optimization causes inconsistent results for JSON inputs with empty lines
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26751'>SPARK-26751</a>] -         HiveSessionImpl might have memory leak since Operation do not close properly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26757'>SPARK-26757</a>] -         GraphX EdgeRDDImpl and VertexRDDImpl `count` method cannot handle empty RDDs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26758'>SPARK-26758</a>] -         Idle Executors are not getting killed after spark.dynamicAllocation.executorIdleTimeout value
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26806'>SPARK-26806</a>] -         EventTimeStats.merge doesn&#39;t handle &quot;zero.merge(zero)&quot; correctly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26859'>SPARK-26859</a>] -         Fix field writer index bug in non-vectorized ORC deserializer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26864'>SPARK-26864</a>] -         Query may return incorrect result when python udf is used as a join condition and the udf uses attributes from both legs of left semi join.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26873'>SPARK-26873</a>] -         FileFormatWriter creates inconsistent MR job IDs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26927'>SPARK-26927</a>] -         Race condition may cause dynamic allocation not working
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26950'>SPARK-26950</a>] -         Make RandomDataGenerator use Float.NaN or Double.NaN for all NaN values
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26990'>SPARK-26990</a>] -         Difference in handling of mixed-case partition column names after SPARK-26188
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27019'>SPARK-27019</a>] -         Spark UI&#39;s SQL tab shows inconsistent values
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27065'>SPARK-27065</a>] -         avoid more than one active task set managers for a stage
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27078'>SPARK-27078</a>] -         Read Hive materialized view throw MatchError
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27080'>SPARK-27080</a>] -         Read parquet file with merging metastore schema should compare schema field in uniform case.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27094'>SPARK-27094</a>] -         Thread interrupt being swallowed while launching executors in YarnAllocator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27097'>SPARK-27097</a>] -         Avoid embedding platform-dependent offsets literally in whole-stage generated code
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27107'>SPARK-27107</a>] -         Spark SQL Job failing because of Kryo buffer overflow with ORC
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27111'>SPARK-27111</a>] -         A continuous query may fail with InterruptedException when the Kafka consumer temporarily has 0 partitions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27112'>SPARK-27112</a>] -         Spark Scheduler encounters two independent Deadlocks when trying to kill executors either due to dynamic allocation or blacklisting 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27134'>SPARK-27134</a>] -         array_distinct function does not work correctly with columns containing arrays of arrays
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27160'>SPARK-27160</a>] -         Incorrect Literal Casting of DecimalType in OrcFilters
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27165'>SPARK-27165</a>] -         Upgrade Apache ORC to 1.5.5
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27178'>SPARK-27178</a>] -         k8s test failing due to missing nss library in dockerfile
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27198'>SPARK-27198</a>] -         Heartbeat interval mismatch in driver and executor
</li>
</ul>
            
<h2>        New Feature
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25635'>SPARK-25635</a>] -         Support selective direct encoding in native ORC write
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26118'>SPARK-26118</a>] -         Make Jetty&#39;s requestHeaderSize configurable in Spark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26605'>SPARK-26605</a>] -         New executors failing with expired tokens in client mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26910'>SPARK-26910</a>] -         Re-release SparkR to CRAN
</li>
</ul>
    
<h2>        Improvement
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25023'>SPARK-25023</a>] -         Clarify Spark security documentation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25778'>SPARK-25778</a>] -         WriteAheadLogBackedBlockRDD in YARN cluster mode fails due to lack of access to tmpDir from $PWD to HDFS
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25904'>SPARK-25904</a>] -         Avoid allocating arrays too large for JVMs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26266'>SPARK-26266</a>] -         Update to Scala 2.12.8
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26316'>SPARK-26316</a>] -         Partially revert SPARK-21052 (Add hash map metrics to join) because of the performance degradation in TPC-DS
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26392'>SPARK-26392</a>] -         Cancel pending allocate requests by taking locality preference into account
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26409'>SPARK-26409</a>] -         SQLConf should be serializable in test sessions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26604'>SPARK-26604</a>] -         Register channel for stream request
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26633'>SPARK-26633</a>] -         Add ExecutorClassLoader.getResourceAsStream
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27046'>SPARK-27046</a>] -         Remove SPARK-19185 related references from documentation since it is resolved
</li>
</ul>
    
<h2>        Test
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25899'>SPARK-25899</a>] -         Flaky test: CoarseGrainedSchedulerBackendSuite.compute max number of concurrent tasks can be launched
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26029'>SPARK-26029</a>] -         Bump previousSparkVersion in MimaBuild.scala to be 2.3.0
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26042'>SPARK-26042</a>] -         KafkaContinuousSourceTopicDeletionSuite may hang forever
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26069'>SPARK-26069</a>] -         Flaky test: RpcIntegrationSuite.sendRpcWithStreamFailures
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26120'>SPARK-26120</a>] -         Fix a streaming query leak in Structured Streaming R tests
</li>
</ul>
        
<h2>        Task
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26607'>SPARK-26607</a>] -         Remove Spark 2.2.x testing from HiveExternalCatalogVersionsSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26897'>SPARK-26897</a>] -         Update Spark 2.3.x testing from HiveExternalCatalogVersionsSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27274'>SPARK-27274</a>] -         Refer to Scala 2.12 in docs; deprecate Scala 2.11 support in 2.4.1
</li>
</ul>
                                                    
<h2>        Dependency upgrade
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26742'>SPARK-26742</a>] -         Bump Kubernetes Client Version to 4.1.2
</li>
</ul>
                                                                                    
<h2>        Documentation
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25933'>SPARK-25933</a>] -         Fix pstats reference for spark.python.profile.dump in configuration.md
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26207'>SPARK-26207</a>] -         Add PowerIterationClustering (PIC) doc in 2.4 branch
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26932'>SPARK-26932</a>] -         Add a warning for Hive 2.1.1 ORC reader issue
</li>
</ul>