Release Notes - Spark - Version 2.3.1 - HTML format

Sub-task

[SPARK-23706] - spark.conf.get(value, default=None) should produce None in PySpark
[SPARK-23748] - Support select from temp tables
[SPARK-23942] - PySpark's collect doesn't trigger QueryExecutionListener
[SPARK-24334] - Race condition in ArrowPythonRunner causes unclean shutdown of Arrow memory allocator

Bug

[SPARK-10878] - Race condition when resolving Maven coordinates via Ivy
[SPARK-19181] - SparkListenerSuite.local metrics fails when average executorDeserializeTime is too short.
[SPARK-19613] - Flaky test: StateStoreRDDSuite
[SPARK-21945] - pyspark --py-files doesn't work in yarn client mode
[SPARK-22371] - dag-scheduler-event-loop thread stopped with error Attempted to access garbage collected accumulator 5605982
[SPARK-23004] - Structured Streaming raises "IllegalStateException: Cannot remove after already committed or aborted"
[SPARK-23020] - Re-enable Flaky Test: org.apache.spark.launcher.SparkLauncherSuite.testInProcessLauncher
[SPARK-23173] - from_json can produce nulls for fields which are marked as non-nullable
[SPARK-23288] - Incorrect number of written records in structured streaming
[SPARK-23291] - SparkR: the start and end position arguments of "substr" on a SparkR DataFrame give wrong results when the position is greater than 1
[SPARK-23340] - Upgrade Apache ORC to 1.4.3
[SPARK-23365] - DynamicAllocation with failure in straggler task can lead to a hung spark job
[SPARK-23406] - Stream-stream self joins do not work
[SPARK-23433] - java.lang.IllegalStateException: more than one active taskSet for stage
[SPARK-23434] - Spark should not warn about `metadata directory` for an HDFS file path
[SPARK-23436] - Incorrect Date column Inference in partition discovery
[SPARK-23438] - DStreams could lose blocks with WAL enabled when driver crashes
[SPARK-23448] - DataFrame returns wrong result when columns don't respect datatype
[SPARK-23449] - Extra java options lose order in Docker context
[SPARK-23457] - Register task completion listeners first for ParquetFileFormat
[SPARK-23462] - Improve the error message in `StructType`
[SPARK-23489] - Flaky Test: HiveExternalCatalogVersionsSuite
[SPARK-23490] - Check storage.locationUri with existing table in CreateTable
[SPARK-23508] - blockManagerIdCache in BlockManagerId may cause OOM
[SPARK-23517] - Make pyspark.util._exception_message produce the trace from Java side for Py4JJavaError
[SPARK-23523] - Incorrect result caused by the rule OptimizeMetadataOnlyQuery
[SPARK-23524] - Big local shuffle blocks should not be checked for corruption.
[SPARK-23525] - ALTER TABLE CHANGE COLUMN COMMENT doesn't work for external hive table
[SPARK-23551] - Exclude `hadoop-mapreduce-client-core` dependency from `orc-mapreduce`
[SPARK-23569] - pandas_udf does not work with type-annotated python functions
[SPARK-23570] - Add Spark-2.3 in HiveExternalCatalogVersionsSuite
[SPARK-23598] - WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec
[SPARK-23599] - The UUID() expression is too non-deterministic
[SPARK-23608] - SHS needs synchronization between attachSparkUI and detachSparkUI functions
[SPARK-23614] - Union produces incorrect results when caching is used
[SPARK-23623] - Avoid concurrent use of cached KafkaConsumer in CachedKafkaConsumer (kafka-0-10-sql)
[SPARK-23630] - Spark-on-YARN missing user customizations of hadoop config
[SPARK-23637] - YARN might allocate more resources if the same executor is killed multiple times.
[SPARK-23639] - SparkSQL CLI fails to talk to Kerberized metastore when using a proxy user
[SPARK-23649] - CSV schema inferring fails on some UTF-8 chars
[SPARK-23658] - InProcessAppHandle uses the wrong class in getLogger
[SPARK-23660] - Yarn throws exception in cluster mode when the application is small
[SPARK-23670] - Memory leak of SparkPlanGraphWrapper in sparkUI
[SPARK-23671] - SHS is ignoring number of replay threads
[SPARK-23697] - Accumulators of Spark 1.x no longer work with Spark 2.x
[SPARK-23728] - ML test with expected exceptions testing streaming fails on 2.3
[SPARK-23729] - Glob resolution breaks remote naming of files/archives
[SPARK-23734] - InvalidSchemaException While Saving ALSModel
[SPARK-23754] - StopIteration exception in Python UDF results in partial result
[SPARK-23759] - Unable to bind Spark UI to specific host name / IP
[SPARK-23760] - CodegenContext.withSubExprEliminationExprs should save/restore CSE state correctly
[SPARK-23775] - Flaky test: DataFrameRangeSuite
[SPARK-23780] - Failed to use googleVis library with new SparkR
[SPARK-23788] - Race condition in StreamingQuerySuite
[SPARK-23802] - PropagateEmptyRelation can leave query plan in unresolved state
[SPARK-23806] - Broadcast.unpersist can cause fatal exception when used with dynamic allocation
[SPARK-23808] - Test spark sessions should set default session
[SPARK-23809] - Active SparkSession should be set by getOrCreate
[SPARK-23815] - Spark writer dynamic partition overwrite mode fails to write output on multi level partition
[SPARK-23816] - FetchFailedException when killing speculative task
[SPARK-23823] - ResolveReferences loses correct origin
[SPARK-23827] - StreamingJoinExec should ensure that input data is partitioned into specific number of partitions
[SPARK-23835] - When Dataset.as converts column from nullable to non-nullable type, null Doubles are converted silently to -1
[SPARK-23850] - We should not redact username|user|url from UI by default
[SPARK-23852] - Parquet MR bug can lead to incorrect SQL results
[SPARK-23853] - Skip doctests which require hive support built in PySpark
[SPARK-23868] - Fix scala.MatchError in literals.sql.out
[SPARK-23941] - Mesos task failed on specific spark app name
[SPARK-23971] - Should not leak Spark sessions across test suites
[SPARK-23986] - CompileException when using too many avg aggregations after joining
[SPARK-23989] - When using `SortShuffleWriter`, the data will be overwritten
[SPARK-23991] - Data loss in allocateBlocksToBatch
[SPARK-24002] - Task not serializable caused by org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytes
[SPARK-24007] - EqualNullSafe for FloatType and DoubleType might generate a wrong result by codegen.
[SPARK-24021] - Fix bug in BlacklistTracker's updateBlacklistForFetchFailure
[SPARK-24022] - Flaky test: SparkContextSuite
[SPARK-24033] - LAG Window function broken in Spark 2.3
[SPARK-24062] - SASL encryption does not work in ThriftServer
[SPARK-24067] - Backport SPARK-17147 to 2.3 (Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction))
[SPARK-24068] - CSV schema inferring doesn't work for compressed files
[SPARK-24085] - Scalar subquery error
[SPARK-24104] - SQLAppStatusListener overwrites metrics onDriverAccumUpdates instead of updating them
[SPARK-24107] - ChunkedByteBuffer.writeFully method has not reset the limit value
[SPARK-24133] - Reading Parquet files containing large strings can fail with java.lang.ArrayIndexOutOfBoundsException
[SPARK-24166] - InMemoryTableScanExec should not access SQLConf at executor side
[SPARK-24168] - WindowExec should not access SQLConf at executor side
[SPARK-24169] - JsonToStructs should not access SQLConf at executor side
[SPARK-24214] - StreamingRelationV2/StreamingExecutionRelation/ContinuousExecutionRelation.toJSON should not fail
[SPARK-24230] - Parquet 1.10 upgrade causes errors in the vectorized reader
[SPARK-24255] - Require Java 8 in SparkR description
[SPARK-24257] - LongToUnsafeRowMap may calculate the new size incorrectly
[SPARK-24259] - ArrayWriter for Arrow produces wrong output
[SPARK-24263] - SparkR java check breaks on openjdk
[SPARK-24309] - AsyncEventQueue should handle an interrupt from a Listener
[SPARK-24313] - Collection functions interpreted execution doesn't work with complex types
[SPARK-24322] - Upgrade Apache ORC to 1.4.4
[SPARK-24364] - Files deletion after globbing may fail StructuredStreaming jobs
[SPARK-24373] - "df.cache() df.count()" no longer eagerly caches data when the analyzed plans are different after re-analyzing the plans
[SPARK-24384] - spark-submit --py-files with .py files doesn't work in client mode before context initialization
[SPARK-24399] - Reused Exchange is used where it should not be
[SPARK-24414] - Stages page doesn't show all task attempts when there are failures
[SPARK-26612] - Speculation kill causes finished stage to be recomputed
[SPARK-26614] - Speculation kill might cause job failure
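
SPARK-23754 in the list above concerns a StopIteration raised inside a Python UDF leading to a partial result. The plain-Python sketch below involves no Spark and does not reproduce Spark's internals. It only illustrates the general mechanism by which a StopIteration escaping a function called from an iterator pipeline can silently cut the output short; the function names and sample data are hypothetical.

```python
def first_item(values):
    # next() raises StopIteration when `values` is empty.
    return next(iter(values))


rows = [[1], [2], [], [4]]

# The StopIteration escapes from map's own __next__, so list() treats it
# as normal exhaustion and stops early without reporting any error.
truncated = list(map(first_item, rows))


def safe_first_item(values):
    # Convert StopIteration into an ordinary exception so it surfaces.
    try:
        return next(iter(values))
    except StopIteration:
        raise ValueError("empty input") from None
```

Here truncated holds only the results for the rows before the empty one, whereas mapping safe_first_item over the same rows fails loudly with a ValueError instead of dropping data.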

New Feature

[SPARK-23948] - Trigger mapstage's job listener in submitMissingTasks
[SPARK-24465] - LSHModel should support Structured Streaming for transform

Improvement

[SPARK-23040] - BlockStoreShuffleReader's return Iterator isn't interruptible if aggregator or ordering is specified
[SPARK-23553] - Tests should not assume the default value of `spark.sql.sources.default`
[SPARK-23624] - Revise doc of method pushFilters
[SPARK-23628] - WholeStageCodegen can generate methods with too many params
[SPARK-23644] - SHS with proxy doesn't show applications
[SPARK-23645] - pandas_udf cannot be called with keyword arguments
[SPARK-23691] - Use sql_conf util in PySpark tests where possible
[SPARK-23695] - Confusing error message for PySpark's Kinesis tests when its jar is missing but enabled
[SPARK-23769] - Remove unnecessary scalastyle check disabling
[SPARK-23822] - Improve error message for Parquet schema mismatches
[SPARK-23838] - SparkUI: Running SQL query displayed as "completed" in SQL tab
[SPARK-23867] - com.codahale.metrics.Counter output in log message has no toString method
[SPARK-23962] - Flaky tests from SQLMetricsTestUtils.currentExecutionIds
[SPARK-23963] - Queries on text-based Hive tables grow disproportionately slower as the number of columns increases
[SPARK-24014] - Add onStreamingStarted method to StreamingListener
[SPARK-24128] - Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg
[SPARK-24188] - /api/v1/version not working
[SPARK-24246] - Improve AnalysisException by setting the cause when it's available
[SPARK-24262] - Fix typo in UDF error message

Test

[SPARK-22882] - ML test for StructuredStreaming: spark.ml.classification
[SPARK-22883] - ML test for StructuredStreaming: spark.ml.feature, A-M
[SPARK-22915] - ML test for StructuredStreaming: spark.ml.feature, N-Z
[SPARK-23881] - Flaky test: JobCancellationSuite."interruptible iterator of shuffle reader"

Task

[SPARK-23601] - Remove .md5 files from release
[SPARK-24392] - Mark pandas_udf as Experimental

Request

[SPARK-28295] - Is there a way of getting feature names from pyspark.ml.regression GeneralizedLinearRegression?

Documentation

[SPARK-23329] - Update the function descriptions with the arguments and returned values of the trigonometric functions
[SPARK-23642] - isZero scaladoc for LongAccumulator describes wrong method
[SPARK-24378] - Incorrect examples for date_trunc function in Spark 2.3.0
[SPARK-24444] - Improve pandas_udf GROUPED_MAP docs to explain column assignment
