Sub-task
- [SPARK-23706] - spark.conf.get(value, default=None) should produce None in PySpark
- [SPARK-23748] - Support select from temp tables
- [SPARK-23942] - PySpark's collect doesn't trigger QueryExecutionListener
- [SPARK-24334] - Race condition in ArrowPythonRunner causes unclean shutdown of Arrow memory allocator
Bug
- [SPARK-10878] - Race condition when resolving Maven coordinates via Ivy
- [SPARK-19181] - SparkListenerSuite.local metrics fails when average executorDeserializeTime is too short.
- [SPARK-19613] - Flaky test: StateStoreRDDSuite
- [SPARK-21945] - pyspark --py-files doesn't work in yarn client mode
- [SPARK-22371] - dag-scheduler-event-loop thread stopped with error Attempted to access garbage collected accumulator 5605982
- [SPARK-23004] - Structured Streaming raises "IllegalStateException: Cannot remove after already committed or aborted"
- [SPARK-23020] - Re-enable Flaky Test: org.apache.spark.launcher.SparkLauncherSuite.testInProcessLauncher
- [SPARK-23173] - from_json can produce nulls for fields which are marked as non-nullable
- [SPARK-23288] - Incorrect number of written records in structured streaming
- [SPARK-23291] - SparkR: substr: In a SparkR DataFrame, the starting and ending position arguments in "substr" give the wrong result when the position is greater than 1
- [SPARK-23340] - Upgrade Apache ORC to 1.4.3
- [SPARK-23365] - DynamicAllocation with failure in straggler task can lead to a hung spark job
- [SPARK-23406] - Stream-stream self joins do not work
- [SPARK-23433] - java.lang.IllegalStateException: more than one active taskSet for stage
- [SPARK-23434] - Spark should not warn `metadata directory` for a HDFS file path
- [SPARK-23436] - Incorrect Date column Inference in partition discovery
- [SPARK-23438] - DStreams could lose blocks with WAL enabled when driver crashes
- [SPARK-23448] - DataFrame returns wrong result when column doesn't respect datatype
- [SPARK-23449] - Extra java options lose order in Docker context
- [SPARK-23457] - Register task completion listeners first for ParquetFileFormat
- [SPARK-23462] - Improve the error message in `StructType`
- [SPARK-23489] - Flaky Test: HiveExternalCatalogVersionsSuite
- [SPARK-23490] - Check storage.locationUri with existing table in CreateTable
- [SPARK-23508] - blockManagerIdCache in BlockManagerId may cause oom
- [SPARK-23517] - Make pyspark.util._exception_message produce the trace from Java side for Py4JJavaError
- [SPARK-23523] - Incorrect result caused by the rule OptimizeMetadataOnlyQuery
- [SPARK-23524] - Big local shuffle blocks should not be checked for corruption.
- [SPARK-23525] - ALTER TABLE CHANGE COLUMN COMMENT doesn't work for external hive table
- [SPARK-23551] - Exclude `hadoop-mapreduce-client-core` dependency from `orc-mapreduce`
- [SPARK-23569] - pandas_udf does not work with type-annotated python functions
- [SPARK-23570] - Add Spark-2.3 in HiveExternalCatalogVersionsSuite
- [SPARK-23598] - WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec
- [SPARK-23599] - The UUID() expression is too non-deterministic
- [SPARK-23608] - SHS needs synchronization between attachSparkUI and detachSparkUI functions
- [SPARK-23614] - Union produces incorrect results when caching is used
- [SPARK-23623] - Avoid concurrent use of cached KafkaConsumer in CachedKafkaConsumer (kafka-0-10-sql)
- [SPARK-23630] - Spark-on-YARN missing user customizations of hadoop config
- [SPARK-23637] - YARN might allocate more resources if the same executor is killed multiple times.
- [SPARK-23639] - SparkSQL CLI fails to talk to Kerberized metastore when using proxy user
- [SPARK-23649] - CSV schema inferring fails on some UTF-8 chars
- [SPARK-23658] - InProcessAppHandle uses the wrong class in getLogger
- [SPARK-23660] - Yarn throws exception in cluster mode when the application is small
- [SPARK-23670] - Memory leak of SparkPlanGraphWrapper in sparkUI
- [SPARK-23671] - SHS is ignoring number of replay threads
- [SPARK-23697] - Accumulators of Spark 1.x no longer work with Spark 2.x
- [SPARK-23728] - ML test with expected exceptions testing streaming fails on 2.3
- [SPARK-23729] - Glob resolution breaks remote naming of files/archives
- [SPARK-23734] - InvalidSchemaException While Saving ALSModel
- [SPARK-23754] - StopIteration exception in Python UDF results in partial result
- [SPARK-23759] - Unable to bind Spark UI to specific host name / IP
- [SPARK-23760] - CodegenContext.withSubExprEliminationExprs should save/restore CSE state correctly
- [SPARK-23775] - Flaky test: DataFrameRangeSuite
- [SPARK-23780] - Failed to use googleVis library with new SparkR
- [SPARK-23788] - Race condition in StreamingQuerySuite
- [SPARK-23802] - PropagateEmptyRelation can leave query plan in unresolved state
- [SPARK-23806] - Broadcast.unpersist can cause fatal exception when used with dynamic allocation
- [SPARK-23808] - Test spark sessions should set default session
- [SPARK-23809] - Active SparkSession should be set by getOrCreate
- [SPARK-23815] - Spark writer dynamic partition overwrite mode fails to write output on multi level partition
- [SPARK-23816] - FetchFailedException when killing speculative task
- [SPARK-23823] - ResolveReferences loses correct origin
- [SPARK-23827] - StreamingJoinExec should ensure that input data is partitioned into specific number of partitions
- [SPARK-23835] - When Dataset.as converts column from nullable to non-nullable type, null Doubles are converted silently to -1
- [SPARK-23850] - We should not redact username|user|url from UI by default
- [SPARK-23852] - Parquet MR bug can lead to incorrect SQL results
- [SPARK-23853] - Skip doctests which require hive support built in PySpark
- [SPARK-23868] - Fix scala.MatchError in literals.sql.out
- [SPARK-23941] - Mesos task failed on specific spark app name
- [SPARK-23971] - Should not leak Spark sessions across test suites
- [SPARK-23986] - CompileException when using too many avg aggregations after joining
- [SPARK-23989] - When using `SortShuffleWriter`, the data will be overwritten
- [SPARK-23991] - data loss when allocateBlocksToBatch
- [SPARK-24002] - Task not serializable caused by org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytes
- [SPARK-24007] - EqualNullSafe for FloatType and DoubleType might generate a wrong result by codegen.
- [SPARK-24021] - Fix bug in BlacklistTracker's updateBlacklistForFetchFailure
- [SPARK-24022] - Flaky test: SparkContextSuite
- [SPARK-24033] - LAG Window function broken in Spark 2.3
- [SPARK-24062] - SASL encryption does not work in ThriftServer
- [SPARK-24067] - Backport SPARK-17147 to 2.3 (Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction))
- [SPARK-24068] - CSV schema inferring doesn't work for compressed files
- [SPARK-24085] - Scalar subquery error
- [SPARK-24104] - SQLAppStatusListener overwrites metrics onDriverAccumUpdates instead of updating them
- [SPARK-24107] - ChunkedByteBuffer.writeFully method has not reset the limit value
- [SPARK-24133] - Reading Parquet files containing large strings can fail with java.lang.ArrayIndexOutOfBoundsException
- [SPARK-24166] - InMemoryTableScanExec should not access SQLConf at executor side
- [SPARK-24168] - WindowExec should not access SQLConf at executor side
- [SPARK-24169] - JsonToStructs should not access SQLConf at executor side
- [SPARK-24214] - StreamingRelationV2/StreamingExecutionRelation/ContinuousExecutionRelation.toJSON should not fail
- [SPARK-24230] - Parquet 1.10 upgrade has errors in the vectorized reader
- [SPARK-24255] - Require Java 8 in SparkR description
- [SPARK-24257] - LongToUnsafeRowMap may calculate the new size incorrectly
- [SPARK-24259] - ArrayWriter for Arrow produces wrong output
- [SPARK-24263] - SparkR java check breaks on openjdk
- [SPARK-24309] - AsyncEventQueue should handle an interrupt from a Listener
- [SPARK-24313] - Collection functions interpreted execution doesn't work with complex types
- [SPARK-24322] - Upgrade Apache ORC to 1.4.4
- [SPARK-24364] - Files deletion after globbing may fail StructuredStreaming jobs
- [SPARK-24373] - "df.cache() df.count()" no longer eagerly caches data when the analyzed plans are different after re-analyzing the plans
- [SPARK-24384] - spark-submit --py-files with .py files doesn't work in client mode before context initialization
- [SPARK-24399] - Reused Exchange is used where it should not be
- [SPARK-24414] - Stages page doesn't show all task attempts when there are failures
- [SPARK-26612] - Speculation kill causes finished stage to be recomputed
- [SPARK-26614] - Speculation kill might cause job failure
New Feature
- [SPARK-23948] - Trigger mapstage's job listener in submitMissingTasks
- [SPARK-24465] - LSHModel should support Structured Streaming for transform
Improvement
- [SPARK-23040] - BlockStoreShuffleReader's return Iterator isn't interruptible if aggregator or ordering is specified
- [SPARK-23553] - Tests should not assume the default value of `spark.sql.sources.default`
- [SPARK-23624] - Revise doc of method pushFilters
- [SPARK-23628] - WholeStageCodegen can generate methods with too many params
- [SPARK-23644] - SHS with proxy doesn't show applications
- [SPARK-23645] - pandas_udf cannot be called with keyword arguments
- [SPARK-23691] - Use sql_conf util in PySpark tests where possible
- [SPARK-23695] - Confusing error message for PySpark's Kinesis tests when its jar is missing but enabled
- [SPARK-23769] - Remove unnecessary scalastyle check disabling
- [SPARK-23822] - Improve error message for Parquet schema mismatches
- [SPARK-23838] - SparkUI: Running SQL query displayed as "completed" in SQL tab
- [SPARK-23867] - com.codahale.metrics.Counter output in log message has no toString method
- [SPARK-23962] - Flaky tests from SQLMetricsTestUtils.currentExecutionIds
- [SPARK-23963] - Queries on text-based Hive tables grow disproportionately slower as the number of columns increases
- [SPARK-24014] - Add onStreamingStarted method to StreamingListener
- [SPARK-24128] - Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg
- [SPARK-24188] - /api/v1/version not working
- [SPARK-24246] - Improve AnalysisException by setting the cause when it's available
- [SPARK-24262] - Fix typo in UDF error message
Test
- [SPARK-22882] - ML test for StructuredStreaming: spark.ml.classification
- [SPARK-22883] - ML test for StructuredStreaming: spark.ml.feature, A-M
- [SPARK-22915] - ML test for StructuredStreaming: spark.ml.feature, N-Z
- [SPARK-23881] - Flaky test: JobCancellationSuite."interruptible iterator of shuffle reader"
Task
- [SPARK-23601] - Remove .md5 files from release
- [SPARK-24392] - Mark pandas_udf as Experimental
Request
- [SPARK-28295] - Is there a way of getting feature names from pyspark.ml.regression GeneralizedLinearRegression?
Documentation
- [SPARK-23329] - Update the function descriptions with the arguments and returned values of the trigonometric functions
- [SPARK-23642] - isZero scaladoc for LongAccumulator describes wrong method
- [SPARK-24378] - Incorrect examples for date_trunc function in Spark 2.3.0
- [SPARK-24444] - Improve pandas_udf GROUPED_MAP docs to explain column assignment