Release Notes - Spark - Version 2.4.4

Sub-task

  • [SPARK-27441] - Add read/write tests to Hive serde tables

Bug

  • [SPARK-21882] - OutputMetrics doesn't count written bytes correctly in the saveAsHadoopDataset function
  • [SPARK-24285] - Flaky test: ContinuousSuite.query without test harness
  • [SPARK-25139] - PythonRunner#WriterThread released block after TaskRunner finally block which invokes BlockManager#releaseAllLocksForTask
  • [SPARK-26038] - Decimal toScalaBigInt/toJavaBigInteger not work for decimals not fitting in long
  • [SPARK-26045] - Error in the Spark 2.4 release package with the spark-avro_2.11 dependency
  • [SPARK-26152] - Synchronize Worker Cleanup with Worker Shutdown
  • [SPARK-26555] - Thread safety issue causes createDataset to fail with misleading errors
  • [SPARK-26812] - PushProjectionThroughUnion nullability issue
  • [SPARK-26895] - When running spark 2.3 as a proxy user (--proxy-user), SparkSubmit fails to resolve globs owned by target user
  • [SPARK-26995] - Running Spark in Docker image with Alpine Linux 3.9.0 throws errors when using snappy
  • [SPARK-27018] - Checkpointed RDD deleted prematurely when using GBTClassifier
  • [SPARK-27100] - Use `Array` instead of `Seq` in `FilePartition` to prevent StackOverflowError
  • [SPARK-27159] - Update MsSqlServer dialect handling of BLOB type
  • [SPARK-27234] - Continuous Streaming does not support python UDFs
  • [SPARK-27298] - Dataset except operation gives different results (dataset count) on Spark 2.3.0 Windows and Linux environments
  • [SPARK-27330] - ForeachWriter is not being closed once a batch is aborted
  • [SPARK-27347] - Fix supervised driver retry logic when agent crashes/restarts
  • [SPARK-27416] - UnsafeMapData & UnsafeArrayData Kryo serialization breaks when two machines have different Oops size
  • [SPARK-27485] - EnsureRequirements.reorder should handle duplicate expressions gracefully
  • [SPARK-27577] - Wrong thresholds selected by BinaryClassificationMetrics when downsampling
  • [SPARK-27596] - The JDBC 'query' option doesn't work for Oracle database
  • [SPARK-27621] - Calling transform() method on a LinearRegressionModel throws NoSuchElementException
  • [SPARK-27624] - Fix CalendarInterval to show an empty interval correctly
  • [SPARK-27626] - Fix `docker-image-tool.sh` to be robust in non-bash shell env
  • [SPARK-27657] - ml.util.Instrumentation.logFailure doesn't log error message
  • [SPARK-27671] - Fix error when casting from a nested null in a struct
  • [SPARK-27711] - InputFileBlockHolder should be unset at the end of tasks
  • [SPARK-27735] - Interval string in upper case is not supported in Trigger
  • [SPARK-27781] - Tried to access method org.apache.avro.specific.SpecificData.<init>()V
  • [SPARK-27798] - ConvertToLocalRelation should tolerate expression reusing output object
  • [SPARK-27858] - Fix for avro deserialization on union types with multiple non-null types
  • [SPARK-27863] - Metadata files and temporary files should not be counted as data files
  • [SPARK-27869] - Redact sensitive information in System Properties from UI
  • [SPARK-27873] - Csv reader, adding a corrupt record column causes error if enforceSchema=false
  • [SPARK-27907] - HiveUDAF should return NULL in case of 0 rows
  • [SPARK-27917] - Semantic equals of CaseWhen is failing with case sensitivity of column Names
  • [SPARK-27992] - PySpark socket server should sync with JVM connection thread future
  • [SPARK-28015] - Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
  • [SPARK-28025] - HDFSBackedStateStoreProvider should not leak .crc files
  • [SPARK-28058] - Reading csv with DROPMALFORMED sometimes doesn't drop malformed records
  • [SPARK-28081] - word2vec 'large' count value too low for very large corpora
  • [SPARK-28153] - Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
  • [SPARK-28156] - Join plan sometimes does not use cached query
  • [SPARK-28157] - Make SHS clear KVStore LogInfo for the blacklisted entries
  • [SPARK-28160] - TransportClient.sendRpcSync may hang forever
  • [SPARK-28164] - Usage description does not match shell scripts
  • [SPARK-28302] - SparkLauncher: The process cannot access the file because it is being used by another process
  • [SPARK-28308] - CalendarInterval sub-second part should be padded before parsing
  • [SPARK-28371] - Parquet "starts with" filter is not null-safe
  • [SPARK-28404] - Fix negative timeout value in RateStreamContinuousPartitionReader
  • [SPARK-28430] - Some stage table rows render wrong number of columns if tasks are missing metrics
  • [SPARK-28468] - Upgrade pip to fix `sphinx` install error
  • [SPARK-28489] - KafkaOffsetRangeCalculator.getRanges may drop offsets
  • [SPARK-28582] - PySpark daemon fails to exit when receiving SIGTERM on Python 3.7
  • [SPARK-28606] - Update CRAN key to recover docker image generation
  • [SPARK-28638] - Task summary metrics are wrong when there are running tasks
  • [SPARK-28642] - Hide credentials in show create table
  • [SPARK-28647] - Recover additional metric feature and remove additional-metrics.js
  • [SPARK-28699] - Cache an indeterminate RDD could lead to incorrect result while stage rerun
  • [SPARK-28766] - Fix CRAN incoming feasibility warning on invalid URL
  • [SPARK-28775] - DateTimeUtilsSuite fails for JDKs using the tzdata2018i or newer timezone database
  • [SPARK-28780] - Delete the incorrect setWeightCol method in LinearSVCModel
  • [SPARK-28844] - Fix typo in SQLConf FILE_COMRESSION_FACTOR
  • [SPARK-28868] - Specify Jekyll version to 3.8.6 in release docker image
  • [SPARK-29414] - HasOutputCol param isSet() property is not preserved after persistence
  • [SPARK-29773] - Unable to process empty ORC files in Hive Table using Spark SQL
  • [SPARK-31604] - java.lang.IllegalArgumentException: Frame length should be positive
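Several of the parser fixes above share a theme of lenient parsing silently accepting partial input; SPARK-28015, for example, makes `stringToDate()` reject strings with trailing characters instead of parsing only a prefix. The sketch below is an illustration of that strict-parsing behaviour in plain Python, not Spark's actual implementation:

```python
from datetime import date

def strict_to_date(s: str) -> date:
    """Parse 'yyyy', 'yyyy-[m]m', or 'yyyy-[m]m-[d]d', consuming the
    entire input; any trailing garbage raises instead of being ignored."""
    parts = s.strip().split("-")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        # e.g. '2019-01-01xyz' must fail rather than parse as 2019-01-01
        raise ValueError(f"invalid date string: {s!r}")
    year = int(parts[0])
    month = int(parts[1]) if len(parts) > 1 else 1
    day = int(parts[2]) if len(parts) > 2 else 1
    return date(year, month, day)
```

For example, `strict_to_date("2019-7")` yields `date(2019, 7, 1)`, while `strict_to_date("2019-01-01xyz")` raises `ValueError`.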

New Feature

  • [SPARK-35197] - Accumulators Explore Page on Spark UI on History Server

Improvement

  • [SPARK-24898] - Adding spark.checkpoint.compress to the docs
  • [SPARK-26192] - MesosClusterScheduler reads options from dispatcher conf instead of submission conf
  • [SPARK-27672] - Add since info to string expressions
  • [SPARK-27673] - Add since info to random, regex, null expressions
  • [SPARK-27771] - Add SQL description for grouping functions (cube, rollup, grouping and grouping_id)
  • [SPARK-27794] - Use secure URLs for downloading CRAN artifacts
  • [SPARK-27973] - Streaming sample DirectKafkaWordCount should mention GroupId in usage
  • [SPARK-28154] - Fix double caching in GMM
  • [SPARK-28170] - DenseVector .toArray() and .values documentation do not specify they are aliases
  • [SPARK-28378] - Remove usage of cgi.escape
  • [SPARK-28421] - SparseVector.apply performance optimization
  • [SPARK-28496] - Use branch name instead of tag during dry-run
  • [SPARK-28545] - Add the hash map size to the directional log of ObjectAggregationIterator
  • [SPARK-28564] - Access history application defaults to the last attempt id
  • [SPARK-28649] - Git Ignore does not ignore python/.eggs
  • [SPARK-28713] - Bump checkstyle from 8.14 to 8.23
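To illustrate the kind of change SPARK-28421 describes: MLlib's `SparseVector` stores a sorted `indices` array with a parallel `values` array, so element lookup can use binary search rather than a linear scan. A minimal Python sketch of that idea, not MLlib's actual code:

```python
import bisect

def sparse_apply(size, indices, values, i):
    """Return element i of a sparse vector stored as a sorted `indices`
    array with matching `values`; positions not listed are 0.0."""
    if not 0 <= i < size:
        raise IndexError(i)
    # O(log nnz) binary search over the sorted index array
    pos = bisect.bisect_left(indices, i)
    if pos < len(indices) and indices[pos] == i:
        return values[pos]
    return 0.0
```

For example, `sparse_apply(5, [1, 3], [2.0, 4.0], 3)` returns `4.0`, and lookups at unlisted positions return `0.0` without scanning the whole index array.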

Test

  • [SPARK-24352] - Flaky test: StandaloneDynamicAllocationSuite
  • [SPARK-27168] - Add docker integration test for MsSql Server
  • [SPARK-28031] - Improve or remove doctest on over function of Column
  • [SPARK-28247] - Flaky test: "query without test harness" in ContinuousSuite
  • [SPARK-28261] - Flaky test: org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable
  • [SPARK-28335] - Flaky test: org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite.offset recovery from kafka
  • [SPARK-28357] - Fix Flaky Test - FileAppenderSuite.rolling file appender - size-based rolling compressed
  • [SPARK-28361] - Test equality of generated code with id in class name
  • [SPARK-28418] - Flaky Test: pyspark.sql.tests.test_dataframe: test_query_execution_listener_on_collect
  • [SPARK-28535] - Flaky test: JobCancellationSuite."interruptible iterator of shuffle reader"
  • [SPARK-28881] - toPandas with Arrow should not return a DataFrame when the result size exceeds `spark.driver.maxResultSize`
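SPARK-28361 concerns comparing generated code whose class names embed a compiler-assigned numeric id, which makes textually different outputs semantically equal. A hedged sketch of the normalization idea in Python; the `GeneratedClass$NNN` pattern here is an assumption for illustration, not Spark's exact naming scheme:

```python
import re

def normalize_generated(code: str) -> str:
    """Replace compiler-assigned numeric ids (e.g. 'GeneratedClass$1234')
    with a placeholder so two codegen outputs can be compared for
    semantic equality regardless of the id they received."""
    return re.sub(r"(GeneratedClass\$)\d+", r"\1#", code)
```

With this, `normalize_generated("GeneratedClass$12.apply()")` and `normalize_generated("GeneratedClass$99.apply()")` compare equal.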

Umbrella

  • [SPARK-27726] - Performance of InMemoryStore suffers under load

Documentation

  • [SPARK-27800] - Example for xor function has a wrong answer
  • [SPARK-28464] - Document kafka minPartitions option in "Structured Streaming + Kafka Integration Guide"
  • [SPARK-28609] - Fix broken styles/links and make up-to-date
  • [SPARK-28777] - PySpark SQL function "format_string" has the wrong parameters in its docstring
  • [SPARK-28871] - Some code snippets in 'Policy for handling multiple watermarks' do not render correctly
