Release Notes - Spark - Version 2.4.0 - HTML format

Sub-task

[SPARK-6236] - Support caching blocks larger than 2G
[SPARK-6237] - Support uploading blocks > 2GB as a stream
[SPARK-10884] - Support prediction on single instance for regression and classification related models
[SPARK-11239] - PMML export for ML linear regression
[SPARK-12850] - Support bucket pruning (predicate pushdown for bucketed tables)
[SPARK-14376] - spark.ml parity for trees
[SPARK-14540] - Support Scala 2.12 closures and Java 8 lambdas in ClosureCleaner
[SPARK-17091] - Convert IN predicate to equivalent Parquet filter
[SPARK-19826] - spark.ml Python API for PIC
[SPARK-20114] - spark.ml parity for sequential pattern mining - PrefixSpan
[SPARK-21088] - CrossValidator, TrainValidationSplit should collect all models when fitting: Python API
[SPARK-21898] - Feature parity for KolmogorovSmirnovTest in MLlib
[SPARK-22187] - Update unsaferow format for saved state such that we can set timeouts when state is null
[SPARK-22239] - User-defined window functions with pandas udf (unbounded window)
[SPARK-22274] - User-defined aggregation functions with pandas udf
[SPARK-22362] - Add unit test for Window Aggregate Functions
[SPARK-22624] - Expose range partitioning shuffle introduced by SPARK-22614
[SPARK-23011] - Support alternative function form with group aggregate pandas UDF
[SPARK-23030] - Decrease memory consumption with toPandas() collection using Arrow
[SPARK-23046] - Have RFormula include VectorSizeHint in pipeline
[SPARK-23096] - Migrate rate source to v2
[SPARK-23097] - Migrate text socket source to v2
[SPARK-23099] - Migrate foreach sink
[SPARK-23120] - Add PMML pipeline export support to PySpark
[SPARK-23203] - DataSourceV2 should use immutable trees.
[SPARK-23323] - DataSourceV2 should use the output commit coordinator.
[SPARK-23325] - DataSourceV2 readers should always produce InternalRow.
[SPARK-23341] - DataSourceOptions should handle path and table names to avoid confusion.
[SPARK-23344] - Add KMeans distanceMeasure param to PySpark
[SPARK-23352] - Explicitly specify supported types in Pandas UDFs
[SPARK-23362] - Migrate Kafka microbatch source to v2
[SPARK-23380] - Adds a conf for Arrow fallback in toPandas/createDataFrame with Pandas DataFrame
[SPARK-23401] - Improve test cases for all supported types and unsupported types
[SPARK-23418] - DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema
[SPARK-23491] - continuous symptom
[SPARK-23503] - continuous execution should sequence committed epochs
[SPARK-23555] - Add BinaryType support for Arrow in PySpark
[SPARK-23559] - add epoch ID to data writer factory
[SPARK-23577] - Supports line separator for text datasource
[SPARK-23581] - Add an interpreted version of GenerateUnsafeProjection
[SPARK-23582] - Add interpreted execution to StaticInvoke expression
[SPARK-23583] - Add interpreted execution to Invoke expression
[SPARK-23584] - Add interpreted execution to NewInstance expression
[SPARK-23585] - Add interpreted execution for UnwrapOption expression
[SPARK-23586] - Add interpreted execution for WrapOption expression
[SPARK-23587] - Add interpreted execution for MapObjects expression
[SPARK-23588] - Add interpreted execution for CatalystToExternalMap expression
[SPARK-23589] - Add interpreted execution for ExternalMapToCatalyst expression
[SPARK-23590] - Add interpreted execution for CreateExternalRow expression
[SPARK-23591] - Add interpreted execution for EncodeUsingSerializer expression
[SPARK-23592] - Add interpreted execution for DecodeUsingSerializer expression
[SPARK-23593] - Add interpreted execution for InitializeJavaBean expression
[SPARK-23594] - Add interpreted execution for GetExternalRowField expression
[SPARK-23595] - Add interpreted execution for ValidateExternalType expression
[SPARK-23596] - Modify Dataset test harness to include interpreted execution
[SPARK-23597] - Audit Spark SQL code base for non-interpreted expressions
[SPARK-23611] - Extend ExpressionEvalHelper harness to also test failures
[SPARK-23615] - Add maxDF Parameter to Python CountVectorizer
[SPARK-23633] - Update Pandas UDFs section in sql-programming-guide
[SPARK-23687] - Add MemoryStream
[SPARK-23688] - Refactor tests away from rate source
[SPARK-23690] - VectorAssembler should have handleInvalid to handle columns with null values
[SPARK-23706] - spark.conf.get(value, default=None) should produce None in PySpark
[SPARK-23711] - Add fallback to interpreted execution logic
[SPARK-23713] - Clean-up UnsafeWriter classes
[SPARK-23723] - New encoding option for json datasource
[SPARK-23724] - Custom record separator for jsons in charsets different from UTF-8
[SPARK-23727] - Support DATE predicate push down in parquet
[SPARK-23736] - High-order function: concat(array1, array2, ..., arrayN) → array
[SPARK-23747] - Add EpochCoordinator unit tests
[SPARK-23748] - Support select from temp tables
[SPARK-23762] - UTF8StringBuilder uses MemoryBlock
[SPARK-23765] - Supports line separator for json datasource
[SPARK-23783] - Add new generic export trait for ML pipelines
[SPARK-23807] - Add Hadoop 3 profile with relevant POM fix ups
[SPARK-23821] - High-order function: flatten(x) → array
[SPARK-23826] - TestHiveSparkSession should set default session
[SPARK-23847] - Add asc_nulls_first, asc_nulls_last to PySpark
[SPARK-23859] - Initial PR for Instrumentation improvements: UUID and logging levels
[SPARK-23864] - Add Unsafe* copy methods to UnsafeWriter
[SPARK-23870] - Forward RFormula handleInvalid Param to VectorAssembler
[SPARK-23871] - add python api for VectorAssembler handleInvalid
[SPARK-23900] - format_number udf should take user specified format as argument
[SPARK-23902] - Provide an option in months_between UDF to disable rounding-off
[SPARK-23903] - Add support for date extract
[SPARK-23905] - Add UDF weekday
[SPARK-23908] - High-order function: transform(array<T>, function<T, U>) → array<U>
[SPARK-23909] - High-order function: filter(array<T>, function<T, boolean>) → array<T>
[SPARK-23911] - High-order function: aggregate(array<T>, initialState S, inputFunction<S, T, S>, outputFunction<S, R>) → R
[SPARK-23912] - High-order function: array_distinct(x) → array
[SPARK-23913] - High-order function: array_intersect(x, y) → array
[SPARK-23914] - High-order function: array_union(x, y) → array
[SPARK-23915] - High-order function: array_except(x, y) → array
[SPARK-23916] - High-order function: array_join(x, delimiter, null_replacement) → varchar
[SPARK-23917] - High-order function: array_max(x) → x
[SPARK-23918] - High-order function: array_min(x) → x
[SPARK-23919] - High-order function: array_position(x, element) → bigint
[SPARK-23920] - High-order function: array_remove(x, element) → array
[SPARK-23921] - High-order function: array_sort(x) → array
[SPARK-23922] - High-order function: arrays_overlap(x, y) → boolean
[SPARK-23923] - High-order function: cardinality(x) → bigint
[SPARK-23924] - High-order function: element_at
[SPARK-23925] - High-order function: repeat(element, count) → array
[SPARK-23926] - High-order function: reverse(x) → array
[SPARK-23927] - High-order function: sequence
[SPARK-23928] - High-order function: shuffle(x) → array
[SPARK-23930] - High-order function: slice(x, start, length) → array
[SPARK-23931] - High-order function: array_zip(array1, array2[, ...]) → array<row>
[SPARK-23932] - High-order function: zip_with(array<T>, array<U>, function<T, U, R>) → array<R>
[SPARK-23933] - High-order function: map(array<K>, array<V>) → map<K,V>
[SPARK-23934] - High-order function: map_from_entries(array<row<K, V>>) → map<K,V>
[SPARK-23936] - High-order function: map_concat(map1<K, V>, map2<K, V>, ..., mapN<K, V>) → map<K,V>
[SPARK-23942] - PySpark's collect doesn't trigger QueryExecutionListener
[SPARK-23990] - Instruments logging improvements - ML regression package
[SPARK-24026] - spark.ml Scala/Java API for PIC
[SPARK-24038] - refactor continuous write exec to its own class
[SPARK-24039] - remove restarting iterators hack
[SPARK-24040] - support single partition aggregates
[SPARK-24054] - Add array_position function / element_at functions
[SPARK-24069] - Add array_max / array_min functions
[SPARK-24070] - TPC-DS Performance Tests for Parquet 1.10.0 Upgrade
[SPARK-24071] - Micro-benchmark of Parquet Filter Pushdown
[SPARK-24073] - DataSourceV2: Rename DataReaderFactory to InputPartition.
[SPARK-24115] - improve instrumentation for spark.ml.tuning
[SPARK-24119] - Add interpreted execution to SortPrefix expression
[SPARK-24132] - Instrumentation improvement for classification
[SPARK-24146] - spark.ml parity for sequential pattern mining - PrefixSpan: Python API
[SPARK-24155] - Instrumentation improvement for clustering
[SPARK-24157] - Enable no-data micro batches for streaming aggregation and deduplication
[SPARK-24158] - Enable no-data micro batches for streaming joins
[SPARK-24159] - Enable no-data micro batches for streaming mapGroupsWithState
[SPARK-24185] - add flatten function
[SPARK-24186] - add array_reverse and concat
[SPARK-24187] - add array_join
[SPARK-24197] - add array_sort function
[SPARK-24198] - add slice function
[SPARK-24234] - create the bottom-of-task RDD with row buffer
[SPARK-24235] - create the top-of-task RDD sending rows to the remote buffer
[SPARK-24251] - DataSourceV2: Add AppendData logical operation
[SPARK-24290] - Instrumentation Improvement: add logNamedValue taking Array types
[SPARK-24296] - Support replicating blocks larger than 2 GB
[SPARK-24297] - Change default value for spark.maxRemoteBlockSizeFetchToMem to be < 2GB
[SPARK-24307] - Support sending messages over 2GB from memory
[SPARK-24310] - Instrumentation for frequent pattern mining
[SPARK-24324] - Pandas Grouped Map UserDefinedFunction mixes column labels
[SPARK-24325] - Tests for Hadoop's LinesReader
[SPARK-24331] - Add arrays_overlap / array_repeat / map_entries
[SPARK-24334] - Race condition in ArrowPythonRunner causes unclean shutdown of Arrow memory allocator
[SPARK-24386] - implement continuous processing coalesce(1)
[SPARK-24418] - Upgrade to Scala 2.11.12
[SPARK-24419] - Upgrade SBT to 0.13.17 with Scala 2.10.7
[SPARK-24420] - Upgrade ASM to 6.x to support JDK9+
[SPARK-24439] - Add distanceMeasure to BisectingKMeans in PySpark
[SPARK-24478] - DataSourceV2 should push filters and projection at physical plan conversion
[SPARK-24535] - Fix java version parsing in SparkR on Windows
[SPARK-24537] - Add array_remove / array_zip / map_from_arrays / array_distinct
[SPARK-24549] - Support DecimalType push down to the parquet data sources
[SPARK-24624] - Can not mix vectorized and non-vectorized UDFs
[SPARK-24638] - StringStartsWith support push down
[SPARK-24706] - Support ByteType and ShortType pushdown to parquet
[SPARK-24716] - Refactor ParquetFilters
[SPARK-24718] - Timestamp support pushdown to parquet data source
[SPARK-24771] - Upgrade AVRO version from 1.7.7 to 1.8.2
[SPARK-24772] - support reading AVRO logical types - Date
[SPARK-24773] - support reading AVRO logical types - Timestamp with different precisions
[SPARK-24774] - support reading AVRO logical types - Decimal
[SPARK-24776] - AVRO unit test: use SQLTestUtils and Replace deprecated methods
[SPARK-24777] - Add write benchmark for AVRO
[SPARK-24800] - Refactor Avro Serializer and Deserializer
[SPARK-24805] - Don't ignore files without .avro extension by default
[SPARK-24810] - Fix paths to resource files in AvroSuite
[SPARK-24811] - Add function `from_avro` and `to_avro`
[SPARK-24836] - New option - ignoreExtension
[SPARK-24854] - Gather all options into AvroOptions
[SPARK-24876] - Simplify schema serialization
[SPARK-24881] - New options - compression and compressionLevel
[SPARK-24883] - Remove implicit class AvroDataFrameWriter/AvroDataFrameReader
[SPARK-24887] - Use SerializableConfiguration in Spark util
[SPARK-24924] - Add mapping for built-in Avro data source
[SPARK-24967] - Use internal.Logging instead for logging
[SPARK-24971] - remove SupportsDeprecatedScanRow
[SPARK-24976] - Allow None for Decimal type conversion (specific to PyArrow 0.9.0)
[SPARK-24990] - merge ReadSupport and ReadSupportWithSchema
[SPARK-24991] - use InternalRow in DataSourceWriter
[SPARK-25002] - Avro: revise the output record namespace
[SPARK-25007] - Add array_intersect / array_except / array_union / array_shuffle to SparkR
[SPARK-25029] - Scala 2.12 issues: TaskNotSerializable and Janino "Two non-abstract methods ..." errors
[SPARK-25044] - Address translation of LMF closure primitive args to Object in Scala 2.12
[SPARK-25047] - Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel
[SPARK-25068] - High-order function: exists(array<T>, function<T, boolean>) → boolean
[SPARK-25099] - Generate Avro Binary files in test suite
[SPARK-25104] - Validate user specified output schema
[SPARK-25127] - DataSourceV2: Remove SupportsPushDownCatalystFilters
[SPARK-25133] - Documentation: AVRO data source guide
[SPARK-25160] - Remove sql configuration spark.sql.avro.outputTimestampType
[SPARK-25179] - Document the features that require Pyarrow 0.10
[SPARK-25207] - Case-insensitive field resolution for filter pushdown when reading Parquet
[SPARK-25256] - Plan mismatch errors in Hive tests in 2.12
[SPARK-25298] - spark-tools build failure for Scala 2.12
[SPARK-25304] - enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12
[SPARK-25320] - ML, Graph 2.4 QA: API: Binary incompatible changes
[SPARK-25321] - ML, Graph 2.4 QA: API: New Scala APIs, docs
[SPARK-25324] - ML 2.4 QA: API: Java compatibility, docs
[SPARK-25328] - Add an example for having two columns as the grouping key in group aggregate pandas UDF
[SPARK-25337] - HiveExternalCatalogVersionsSuite + Scala 2.12 = NoSuchMethodError: org.apache.spark.sql.execution.datasources.FileFormat.$init$(Lorg/apache/spark/sql/execution/datasources/FileFormat;)
[SPARK-25460] - DataSourceV2: Structured Streaming does not respect SessionConfigSupport
[SPARK-25572] - SparkR tests failed on CRAN on Java 10
[SPARK-25601] - Register Grouped aggregate UDF Vectorized UDFs for SQL Statement
[SPARK-25690] - Analyzer rule "HandleNullInputsForUDF" does not stabilize and can be applied infinitely
[SPARK-25718] - Detect recursive reference in Avro schema and throw exception
[SPARK-25842] - Deprecate APIs introduced in SPARK-21608
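
The higher-order function signatures listed above (transform, filter, aggregate, zip_with, exists) can be read as ordinary list operations that take a function argument. The following is a minimal plain-Python sketch of the semantics those signatures suggest. It is not Spark's implementation: the helper names simply mirror the issue titles (`filter_` is a hypothetical name chosen to avoid shadowing the Python builtin), SQL null handling is not modeled, and padding the shorter input of `zip_with` with `None` is an assumption made for illustration.

```python
from functools import reduce
from itertools import zip_longest


def transform(xs, f):
    """transform(array<T>, function<T, U>) -> array<U>: apply f to every element."""
    return [f(x) for x in xs]


def filter_(xs, predicate):
    """filter(array<T>, function<T, boolean>) -> array<T>: keep matching elements."""
    return [x for x in xs if predicate(x)]


def aggregate(xs, initial_state, input_function, output_function=lambda s: s):
    """aggregate(array<T>, S, function<S, T, S>, function<S, R>) -> R.

    Folds the array into a state, then maps the final state to the result.
    """
    return output_function(reduce(input_function, xs, initial_state))


def zip_with(xs, ys, f):
    """zip_with(array<T>, array<U>, function<T, U, R>) -> array<R>.

    Assumption: the shorter array is padded with None before f is applied.
    """
    return [f(x, y) for x, y in zip_longest(xs, ys, fillvalue=None)]


def exists(xs, predicate):
    """exists(array<T>, function<T, boolean>) -> boolean: true if any element matches."""
    return any(predicate(x) for x in xs)
```

For example, `aggregate([1, 2, 3], 0, lambda s, x: s + x, lambda s: s * 10)` first folds the list to the state 6 and then applies the output function to it.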

Bug

[SPARK-6951] - History server slow startup if the event log directory is large
[SPARK-10878] - Race condition when resolving Maven coordinates via Ivy
[SPARK-15125] - CSV data source recognizes empty quoted strings in the input as null.
[SPARK-15750] - Constructing FPGrowth fails when no numPartitions specified in pyspark
[SPARK-16451] - Spark-shell / pyspark should finish gracefully when "SaslException: GSS initiate failed" is hit
[SPARK-17088] - IsolatedClientLoader fails to load Hive client when sharesHadoopClasses is false
[SPARK-17147] - Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction)
[SPARK-17166] - CTAS lost table properties after conversion to data source tables.
[SPARK-17756] - java.lang.ClassCastException when using cartesian with DStream.transform
[SPARK-17916] - CSV data source treats empty string as null no matter what nullValue option is
[SPARK-18371] - Spark Streaming backpressure bug - generates a batch with large number of records
[SPARK-18630] - PySpark ML memory leak
[SPARK-19181] - SparkListenerSuite.local metrics fails when average executorDeserializeTime is too short.
[SPARK-19185] - ConcurrentModificationExceptions with CachedKafkaConsumers when Windowing
[SPARK-19613] - Flaky test: StateStoreRDDSuite
[SPARK-20947] - Encoding/decoding issue in PySpark pipe implementation
[SPARK-21168] - KafkaRDD should always set kafka clientId.
[SPARK-21402] - Fix java array of structs deserialization
[SPARK-21479] - Outer join filter pushdown in null supplying table when condition is on one of the joined columns
[SPARK-21525] - ReceiverSupervisorImpl seems to ignore the error code when writing to the WAL
[SPARK-21673] - Spark local directory is not set correctly
[SPARK-21685] - Params isSet in scala Transformer triggered by _setDefault in pyspark
[SPARK-21743] - top-most limit should not cause memory leak
[SPARK-21811] - Inconsistency when finding the widest common type of a combination of DateType, StringType, and NumericType
[SPARK-21896] - Stack Overflow when window function nested inside aggregate function
[SPARK-21945] - pyspark --py-files doesn't work in yarn client mode
[SPARK-22151] - PYTHONPATH not picked up from the spark.yarn.appMasterEnv properly
[SPARK-22279] - Turn on spark.sql.hive.convertMetastoreOrc by default
[SPARK-22297] - Flaky test: BlockManagerSuite "Shuffle registration timeout and maxAttempts conf"
[SPARK-22357] - SparkContext.binaryFiles ignore minPartitions parameter
[SPARK-22371] - dag-scheduler-event-loop thread stopped with error Attempted to access garbage collected accumulator 5605982
[SPARK-22384] - Refine partition pruning when attribute is wrapped in Cast
[SPARK-22430] - Unknown tag warnings when building R docs with Roxygen 6.0.1
[SPARK-22577] - executor page blacklist status should update with TaskSet level blacklisting
[SPARK-22606] - There may be two or more tasks in one executor will use the same kafka consumer at the same time, then it will throw an exception: "KafkaConsumer is not safe for multi-threaded access"
[SPARK-22676] - Avoid iterating all partition paths when spark.sql.hive.verifyPartitionPath=true
[SPARK-22713] - OOM caused by the memory contention and memory leak in TaskMemoryManager
[SPARK-22809] - pyspark is sensitive to imports with dots
[SPARK-22949] - Reduce memory requirement for TrainValidationSplit
[SPARK-22968] - java.lang.IllegalStateException: No current assignment for partition kssh-2
[SPARK-22974] - CountVectorModel does not attach attributes to output column
[SPARK-23004] - Structured Streaming raises "IllegalStateException: Cannot remove after already committed or aborted"
[SPARK-23007] - Add schema evolution test suite for file-based data sources
[SPARK-23020] - Re-enable Flaky Test: org.apache.spark.launcher.SparkLauncherSuite.testInProcessLauncher
[SPARK-23028] - Bump master branch version to 2.4.0-SNAPSHOT
[SPARK-23038] - Update docker/spark-test (JDK/OS)
[SPARK-23042] - Use OneHotEncoderModel to encode labels in MultilayerPerceptronClassifier
[SPARK-23044] - merge script has bug when assigning jiras to non-contributors
[SPARK-23059] - Correct some improper with view related method usage
[SPARK-23088] - History server not showing incomplete/running applications
[SPARK-23094] - Json Readers choose wrong encoding when bad records are present and fail
[SPARK-23152] - Invalid guard condition in org.apache.spark.ml.classification.Classifier
[SPARK-23173] - from_json can produce nulls for fields which are marked as non-nullable
[SPARK-23189] - reflect stage level blacklisting on executor tab
[SPARK-23200] - Reset configuration when restarting from checkpoints
[SPARK-23240] - PythonWorkerFactory issues unhelpful message when pyspark.daemon produces bogus stdout
[SPARK-23243] - Shuffle+Repartition on an RDD could lead to incorrect answers
[SPARK-23271] - Parquet output contains only "_SUCCESS" file after empty DataFrame saving
[SPARK-23288] - Incorrect number of written records in structured streaming
[SPARK-23291] - SparkR: substr: in a SparkR DataFrame, the starting and ending position arguments in "substr" give wrong results when the position is greater than 1
[SPARK-23306] - Race condition in TaskMemoryManager
[SPARK-23340] - Upgrade Apache ORC to 1.4.3
[SPARK-23355] - convertMetastore should not ignore table properties
[SPARK-23361] - Driver restart fails if it happens after 7 days from app submission
[SPARK-23365] - DynamicAllocation with failure in straggler task can lead to a hung spark job
[SPARK-23377] - Bucketizer with multiple columns persistence bug
[SPARK-23394] - Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)
[SPARK-23405] - The task will hang up when a small table left semi join a big table
[SPARK-23406] - Stream-stream self joins does not work
[SPARK-23408] - Flaky test: StreamingOuterJoinSuite.left outer early state exclusion on right
[SPARK-23415] - BufferHolderSparkSubmitSuite is flaky
[SPARK-23416] - Flaky test: KafkaSourceStressForDontFailOnDataLossSuite.stress test for failOnDataLoss=false
[SPARK-23417] - pyspark tests give wrong sbt instructions
[SPARK-23425] - load data for hdfs file path with wild card usage is not working properly
[SPARK-23433] - java.lang.IllegalStateException: more than one active taskSet for stage
[SPARK-23434] - Spark should not warn `metadata directory` for a HDFS file path
[SPARK-23436] - Incorrect Date column Inference in partition discovery
[SPARK-23438] - DStreams could lose blocks with WAL enabled when driver crashes
[SPARK-23449] - Extra java options lose order in Docker context
[SPARK-23457] - Register task completion listeners first for ParquetFileFormat
[SPARK-23459] - Improve the error message when unknown column is specified in partition columns
[SPARK-23461] - vignettes should include model predictions for some ML models
[SPARK-23462] - Improve the error message in `StructType`
[SPARK-23476] - Spark will not start in local mode with authentication on
[SPARK-23486] - LookupFunctions should not check the same function name more than once
[SPARK-23489] - Flaky Test: HiveExternalCatalogVersionsSuite
[SPARK-23490] - Check storage.locationUri with existing table in CreateTable
[SPARK-23496] - Locality of coalesced partitions can be severely skewed by the order of input partitions
[SPARK-23508] - blockManagerIdCache in BlockManagerId may cause oom
[SPARK-23514] - Replace spark.sparkContext.hadoopConfiguration by spark.sessionState.newHadoopConf()
[SPARK-23522] - pyspark should always use sys.exit rather than exit
[SPARK-23523] - Incorrect result caused by the rule OptimizeMetadataOnlyQuery
[SPARK-23524] - Big local shuffle blocks should not be checked for corruption.
[SPARK-23525] - ALTER TABLE CHANGE COLUMN COMMENT doesn't work for external hive table
[SPARK-23547] - Cleanup the .pipeout file when the Hive Session closed
[SPARK-23549] - Spark SQL unexpected behavior when comparing timestamp to date
[SPARK-23551] - Exclude `hadoop-mapreduce-client-core` dependency from `orc-mapreduce`
[SPARK-23569] - pandas_udf does not work with type-annotated python functions
[SPARK-23570] - Add Spark-2.3 in HiveExternalCatalogVersionsSuite
[SPARK-23574] - SinglePartition in data source V2 scan
[SPARK-23598] - WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec
[SPARK-23599] - The UUID() expression is too non-deterministic
[SPARK-23602] - PrintToStderr should behave the same in interpreted mode
[SPARK-23608] - SHS needs synchronization between attachSparkUI and detachSparkUI functions
[SPARK-23614] - Union produces incorrect results when caching is used
[SPARK-23618] - docker-image-tool.sh Fails While Building Image
[SPARK-23620] - Split thread dump lines by using the br tag
[SPARK-23623] - Avoid concurrent use of cached KafkaConsumer in CachedKafkaConsumer (kafka-0-10-sql)
[SPARK-23630] - Spark-on-YARN missing user customizations of hadoop config
[SPARK-23635] - Spark executor env variable is overwritten by same name AM env variable
[SPARK-23636] - [SPARK 2.2] | Kafka Consumer | KafkaUtils.createRDD throws Exception - java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
[SPARK-23637] - Yarn might allocate more resource if a same executor is killed multiple times.
[SPARK-23639] - SparkSQL CLI fails to talk to Kerberized metastore when using proxy user
[SPARK-23640] - Hadoop config may override spark config
[SPARK-23649] - CSV schema inferring fails on some UTF-8 chars
[SPARK-23658] - InProcessAppHandle uses the wrong class in getLogger
[SPARK-23660] - Yarn throws exception in cluster mode when the application is small
[SPARK-23663] - Spark Streaming Kafka 010 , fails with "java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access"
[SPARK-23666] - Nondeterministic column name with UDFs
[SPARK-23670] - Memory leak of SparkPlanGraphWrapper in sparkUI
[SPARK-23671] - SHS is ignoring number of replay threads
[SPARK-23679] - uiWebUrl shows improper URL when running on YARN
[SPARK-23680] - entrypoint.sh does not accept arbitrary UIDs, returning as an error
[SPARK-23682] - Memory issue with Spark structured streaming
[SPARK-23697] - Accumulators of Spark 1.x no longer work with Spark 2.x
[SPARK-23698] - Spark code contains numerous undefined names in Python 3
[SPARK-23729] - Glob resolution breaks remote naming of files/archives
[SPARK-23731] - FileSourceScanExec throws NullPointerException in subexpression elimination
[SPARK-23732] - Broken link to scala source code in Spark Scala api Scaladoc
[SPARK-23743] - IsolatedClientLoader.isSharedClass returns an unintended result against `slf4j` keyword
[SPARK-23754] - StopIterator exception in Python UDF results in partial result
[SPARK-23759] - Unable to bind Spark UI to specific host name / IP
[SPARK-23760] - CodegenContext.withSubExprEliminationExprs should save/restore CSE state correctly
[SPARK-23775] - Flaky test: DataFrameRangeSuite
[SPARK-23778] - SparkContext.emptyRDD confuses SparkContext.union
[SPARK-23780] - Failed to use googleVis library with new SparkR
[SPARK-23785] - LauncherBackend doesn't check state of connection before setting state
[SPARK-23786] - CSV schema validation - column names are not checked
[SPARK-23787] - SparkSubmitSuite::"download remote resource if it is not supported by yarn" fails on Hadoop 2.9
[SPARK-23788] - Race condition in StreamingQuerySuite
[SPARK-23794] - UUID() should be stateful
[SPARK-23799] - [CBO] FilterEstimation.evaluateInSet produces division by zero in a case of empty table with analyzed statistics
[SPARK-23802] - PropagateEmptyRelation can leave query plan in unresolved state
[SPARK-23806] - Broadcast.unpersist can cause fatal exception when used with dynamic allocation
[SPARK-23808] - Test spark sessions should set default session
[SPARK-23809] - Active SparkSession should be set by getOrCreate
[SPARK-23815] - Spark writer dynamic partition overwrite mode fails to write output on multi level partition
[SPARK-23816] - FetchFailedException when killing speculative task
[SPARK-23823] - ResolveReferences loses correct origin
[SPARK-23825] - [K8s] Spark pods should request memory + memoryOverhead as resources
[SPARK-23827] - StreamingJoinExec should ensure that input data is partitioned into specific number of partitions
[SPARK-23829] - spark-sql-kafka source in spark 2.3 causes reading stream failure frequently
[SPARK-23834] - Flaky test: LauncherServerSuite.testAppHandleDisconnect
[SPARK-23835] - When Dataset.as converts column from nullable to non-nullable type, null Doubles are converted silently to -1
[SPARK-23850] - We should not redact username|user|url from UI by default
[SPARK-23852] - Parquet MR bug can lead to incorrect SQL results
[SPARK-23853] - Skip doctests which require hive support built in PySpark
[SPARK-23857] - In mesos cluster mode spark submit requires the keytab to be available on the local file system.
[SPARK-23868] - Fix scala.MatchError in literals.sql.out
[SPARK-23882] - Is UTF8StringSuite.writeToOutputStreamUnderflow() supported?
[SPARK-23888] - speculative task should not run on a given host where another attempt is already running on
[SPARK-23893] - Possible overflow in long = int * int
[SPARK-23941] - Mesos task failed on specific spark app name
[SPARK-23951] - Use java classes in ExprValue and simplify a bunch of stuff
[SPARK-23971] - Should not leak Spark sessions across test suites
[SPARK-23975] - Allow Clustering to take Arrays of Double as input features
[SPARK-23976] - UTF8String.concat() or ByteArray.concat() may allocate shorter structure.
[SPARK-23986] - CompileException when using too many avg aggregation after joining
[SPARK-23989] - When using `SortShuffleWriter`, the data will be overwritten
[SPARK-23991] - data loss when allocateBlocksToBatch
[SPARK-23997] - Configurable max number of buckets
[SPARK-24002] - Task not serializable caused by org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytes
[SPARK-24007] - EqualNullSafe for FloatType and DoubleType might generate a wrong result by codegen.
[SPARK-24012] - Union of map and other compatible column
[SPARK-24013] - ApproximatePercentile grinds to a halt on sorted input.
[SPARK-24021] - Fix bug in BlacklistTracker's updateBlacklistForFetchFailure
[SPARK-24022] - Flaky test: SparkContextSuite
[SPARK-24033] - LAG Window function broken in Spark 2.3
[SPARK-24043] - InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions
[SPARK-24050] - StreamingQuery does not calculate input / processing rates in some cases
[SPARK-24056] - Make consumer creation lazy in Kafka source for Structured streaming
[SPARK-24061] - [SS]TypedFilter is not supported in Continuous Processing
[SPARK-24062] - SASL encryption does not work in ThriftServer
[SPARK-24068] - CSV schema inferring doesn't work for compressed files
[SPARK-24076] - very bad performance when shuffle.partition = 8192
[SPARK-24085] - Scalar subquery error
[SPARK-24104] - SQLAppStatusListener overwrites metrics onDriverAccumUpdates instead of updating them
[SPARK-24107] - ChunkedByteBuffer.writeFully method has not reset the limit value
[SPARK-24108] - ChunkedByteBuffer.writeFully method has not reset the limit value
[SPARK-24110] - Avoid calling UGI loginUserFromKeytab in ThriftServer
[SPARK-24123] - Fix a flaky test `DateTimeUtilsSuite.monthsBetween`
[SPARK-24133] - Reading Parquet files containing large strings can fail with java.lang.ArrayIndexOutOfBoundsException
[SPARK-24137] - [K8s] Mount temporary directories in emptydir volumes
[SPARK-24141] - Fix bug in CoarseGrainedSchedulerBackend.killExecutors
[SPARK-24143] - filter empty blocks when convert mapstatus to (blockId, size) pair
[SPARK-24151] - CURRENT_DATE, CURRENT_TIMESTAMP incorrectly resolved as column names when caseSensitive is enabled
[SPARK-24165] - UDF within when().otherwise() raises NullPointerException
[SPARK-24166] - InMemoryTableScanExec should not access SQLConf at executor side
[SPARK-24167] - ParquetFilters should not access SQLConf at executor side
[SPARK-24168] - WindowExec should not access SQLConf at executor side
[SPARK-24169] - JsonToStructs should not access SQLConf at executor side
[SPARK-24190] - lineSep shouldn't be required in JSON write
[SPARK-24195] - sc.addFile for local:/ path is broken
[SPARK-24214] - StreamingRelationV2/StreamingExecutionRelation/ContinuousExecutionRelation.toJSON should not fail
[SPARK-24216] - Spark TypedAggregateExpression uses getSimpleName that is not safe in scala
[SPARK-24228] - Fix the lint error
[SPARK-24230] - With Parquet 1.10 upgrade has errors in the vectorized reader
[SPARK-24241] - Do not fail fast when dynamic resource allocation enabled with 0 executor
[SPARK-24255] - Require Java 8 in SparkR description
[SPARK-24257] - LongToUnsafeRowMap calculate the new size may be wrong
[SPARK-24259] - ArrayWriter for Arrow produces wrong output
[SPARK-24263] - SparkR java check breaks on openjdk
[SPARK-24276] - semanticHash() returns different values for semantically the same IS IN
[SPARK-24294] - Throw SparkException when OOM in BroadcastExchangeExec
[SPARK-24300] - generateLDAData in ml.cluster.LDASuite didn't set seed correctly
[SPARK-24309] - AsyncEventQueue should handle an interrupt from a Listener
[SPARK-24313] - Collection functions interpreted execution doesn't work with complex types
[SPARK-24319] - run-example can not print usage
[SPARK-24322] - Upgrade Apache ORC to 1.4.4
[SPARK-24341] - Codegen compile error from predicate subquery
[SPARK-24348] - scala.MatchError in the "element_at" expression
[SPARK-24350] - ClassCastException in "array_position" function
[SPARK-24351] - offsetLog/commitLog purge thresholdBatchId should be computed with current committed epoch but not currentBatchId in CP mode
[SPARK-24364] - Files deletion after globbing may fail StructuredStreaming jobs
[SPARK-24368] - Flaky tests: org.apache.spark.sql.execution.datasources.csv.UnivocityParserSuite
[SPARK-24369] - A bug when having multiple distinct aggregations
[SPARK-24373] - "df.cache() df.count()" no longer eagerly caches data when the analyzed plans are different after re-analyzing the plans
[SPARK-24377] - Make --py-files work in non pyspark application
[SPARK-24380] - argument quoting/escaping broken in mesos cluster scheduler
[SPARK-24384] - spark-submit --py-files with .py files doesn't work in client mode before context initialization
[SPARK-24385] - Trivially-true EqualNullSafe should be handled like EqualTo in Dataset.join
[SPARK-24391] - from_json should support arrays of primitives, and more generally all JSON
[SPARK-24414] - Stages page doesn't show all task attempts when there are failures
[SPARK-24415] - Stage page aggregated executor metrics are wrong when there are failures
[SPARK-24416] - Update configuration definition for spark.blacklist.killBlacklistedExecutors
[SPARK-24446] - Library path with special characters breaks Spark on YARN
[SPARK-24452] - long = int*int or long = int+int may cause overflow.
[SPARK-24453] - Fix error recovering from the failure in a no-data batch
[SPARK-24466] - TextSocketMicroBatchReader no longer works with nc utility
[SPARK-24468] - DecimalType `adjustPrecisionScale` might fail when scale is negative
[SPARK-24488] - Analyzer throws when generator is aliased multiple times
[SPARK-24495] - SortMergeJoin with duplicate keys wrong results
[SPARK-24500] - UnsupportedOperationException when trying to execute Union plan with Stream of children
[SPARK-24506] - Spark.ui.filters not applied to /sqlserver/ url
[SPARK-24520] - Double braces in link
[SPARK-24526] - Spaces in the build dir causes failures in the build/mvn script
[SPARK-24530] - Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) and pyspark.ml docs are broken
[SPARK-24531] - HiveExternalCatalogVersionsSuite failing due to missing 2.2.0 version
[SPARK-24536] - Query with nonsensical LIMIT hits AssertionError
[SPARK-24548] - JavaPairRDD to Dataset<Row> in SPARK generates ambiguous results
[SPARK-24552] - Task attempt numbers are reused when stages are retried
[SPARK-24553] - Job UI redirect causing http 302 error
[SPARK-24556] - ReusedExchange should rewrite output partitioning also when child's partitioning is RangePartitioning
[SPARK-24563] - Allow running PySpark shell without Hive
[SPARK-24569] - Spark Aggregator with output type Option[Boolean] creates column of type Row
[SPARK-24573] - SBT Java checkstyle affecting the build
[SPARK-24578] - Reading remote cache block behavior changes and causes timeout issue
[SPARK-24583] - Wrong schema type in InsertIntoDataSourceCommand
[SPARK-24588] - StreamingSymmetricHashJoinExec should require HashClusteredPartitioning from children
[SPARK-24589] - OutputCommitCoordinator may allow duplicate commits
[SPARK-24594] - Introduce metrics for YARN executor allocation problems
[SPARK-24598] - Spark SQL: Datatype overflow conditions give incorrect results
[SPARK-24603] - Typo in comments
[SPARK-24610] - wholeTextFiles broken for small files
[SPARK-24613] - Cache with UDF could not be matched with subsequent dependent caches
[SPARK-24633] - arrays_zip function's code generator splits input processing incorrectly
[SPARK-24645] - Skip parsing when csvColumnPruning enabled and partitions scanned only
[SPARK-24648] - SQLMetrics counters are not thread safe
[SPARK-24653] - Flaky test "JoinSuite.test SortMergeJoin (with spill)"
[SPARK-24659] - GenericArrayData.equals should respect element type differences
[SPARK-24660] - SHS does not properly show errors when downloading logs
[SPARK-24676] - Project required data from parsed data when csvColumnPruning disabled
[SPARK-24677] - TaskSetManager not updating successfulTaskDurations for old stage attempts
[SPARK-24681] - Cannot create a view from a table when a nested column name contains ':'
[SPARK-24694] - Integration tests pass only one app argument
[SPARK-24698] - In Pyspark's ML, an Identifiable's UID has 20 random characters rather than the 12 mentioned in the documentation.
[SPARK-24699] - Watermark / Append mode should work with Trigger.Once
[SPARK-24704] - The order of stages in the DAG graph is incorrect
[SPARK-24705] - Spark.sql.adaptive.enabled=true is enabled and self-join query
[SPARK-24711] - Integration tests will not work with exclude/include tags
[SPARK-24713] - AppMaster of spark streaming kafka OOM if there are hundreds of topics consumed
[SPARK-24715] - sbt build brings in a wrong jline version
[SPARK-24717] - Split out min retain version of state for memory in HDFSBackedStateStoreProvider
[SPARK-24721] - Failed to use PythonUDF with literal inputs in filter with data sources
[SPARK-24734] - Fix containsNull of Concat for array type.
[SPARK-24739] - PySpark does not work with Python 3.7.0
[SPARK-24742] - Field Metadata raises NullPointerException in hashCode method
[SPARK-24743] - Update the JavaDirectKafkaWordCount example to support the new API of Kafka
[SPARK-24749] - Cannot filter array<struct> with named_struct
[SPARK-24754] - Minhash integer overflow
[SPARK-24755] - Executor loss can cause task to not be resubmitted
[SPARK-24781] - Using a reference from Dataset in Filter/Sort might not work.
[SPARK-24787] - Events being dropped at an alarming rate due to hsync being slow for eventLogging
[SPARK-24788] - RelationalGroupedDataset.toString throws errors when grouping by UnresolvedAttribute
[SPARK-24804] - There are duplicate words in the title in the DatasetSuite
[SPARK-24809] - Serializing LongHashedRelation in executor may result in data error
[SPARK-24812] - Last Access Time in the table description is not valid
[SPARK-24813] - HiveExternalCatalogVersionsSuite still flaky; fall back to Apache archive
[SPARK-24829] - In Spark Thrift Server, CAST AS FLOAT inconsistent with spark-shell or spark-sql
[SPARK-24846] - Stabilize expression canonicalization
[SPARK-24850] - Query plan string representation grows exponentially on queries with recursive cached datasets
[SPARK-24870] - Cache can't work normally if there are case letters in SQL
[SPARK-24873] - Add a switch to suppress frequent interaction reports with YARN
[SPARK-24878] - Fix reverse function for array type of primitive type containing null.
[SPARK-24879] - NPE in Hive partition filter pushdown for `partCol IN (NULL, ....)`
[SPARK-24880] - Fix the group id for spark-kubernetes-integration-tests
[SPARK-24889] - dataset.unpersist() doesn't update storage memory stats
[SPARK-24891] - Fix HandleNullInputsForUDF rule
[SPARK-24895] - Spark 2.4.0 Snapshot artifacts have broken metadata due to mismatched filenames
[SPARK-24896] - Uuid expression should produce different values in each execution under streaming query
[SPARK-24908] - [R] remove spaces to make lintr happy
[SPARK-24909] - Spark scheduler can hang when fetch failures, executor lost, task running on lost executor, and multiple stage attempts
[SPARK-24911] - SHOW CREATE TABLE drops escaping of nested column names
[SPARK-24919] - Scala linter rule for sparkContext.hadoopConfiguration
[SPARK-24927] - The hadoop-provided profile doesn't play well with Snappy-compressed Parquet files
[SPARK-24934] - Complex type and binary type in in-memory partition pruning does not work due to missing upper/lower bounds cases
[SPARK-24937] - Datasource partition table should load empty static partitions
[SPARK-24948] - SHS wrongly filters some applications due to permission check
[SPARK-24950] - scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13
[SPARK-24957] - Decimal arithmetic can lead to wrong values using codegen
[SPARK-24963] - Integration tests will fail if they run in a namespace not being the default
[SPARK-24966] - Fix the precedence rule for set operations.
[SPARK-24972] - PivotFirst could not handle pivot columns of complex types
[SPARK-24981] - ShutdownHook timeout causes a succeeded job to fail when SparkContext stop() is not called by user program
[SPARK-24987] - Kafka Cached Consumer Leaking File Descriptors
[SPARK-24997] - Support MINUS ALL
[SPARK-25004] - Add spark.executor.pyspark.memory config to set resource.RLIMIT_AS
[SPARK-25005] - Structured streaming doesn't support kafka transaction (creating empty offset with abort & markers)
[SPARK-25009] - Standalone Cluster mode application submit is not working
[SPARK-25010] - Rand/Randn should produce different values for each execution in streaming query
[SPARK-25011] - Add PrefixSpan to __all__ in fpm.py
[SPARK-25019] - The published spark sql pom does not exclude the normal version of orc-core
[SPARK-25021] - Add spark.executor.pyspark.memory support to Kubernetes
[SPARK-25028] - AnalyzePartitionCommand failed with NPE if value is null
[SPARK-25031] - The schema of MapType can not be printed correctly
[SPARK-25033] - Bump Apache commons.{httpclient, httpcore}
[SPARK-25036] - Scala 2.12 issues: Compilation error with sbt
[SPARK-25041] - genjavadoc-plugin_0.10 is not found with sbt in scala-2.12
[SPARK-25046] - Alter View can execute SQL like "ALTER VIEW ... AS INSERT INTO"
[SPARK-25058] - Use Block.isEmpty/nonEmpty to check whether the code is empty or not.
[SPARK-25072] - PySpark custom Row class can be given extra parameters
[SPARK-25076] - SQLConf should not be retrieved from a stopped SparkSession
[SPARK-25081] - Nested spill in ShuffleExternalSorter may access a released memory page
[SPARK-25084] - "distribute by" on multiple columns may lead to codegen issue
[SPARK-25090] - java.lang.ClassCastException when using a CrossValidator
[SPARK-25092] - Add RewriteExceptAll, RewriteIntersectAll and RewriteCorrelatedScalarSubquery in the list of nonExcludableRules
[SPARK-25096] - Loosen nullability if the cast is force-nullable.
[SPARK-25114] - RecordBinaryComparator may return wrong result when subtraction between two words is divisible by Integer.MAX_VALUE
[SPARK-25116] - Fix the "exit code 1" error when terminating Kafka tests
[SPARK-25124] - VectorSizeHint.size is buggy, breaking streaming pipeline
[SPARK-25126] - avoid creating OrcFile.Reader for all orc files
[SPARK-25132] - Case-insensitive field resolution when reading from Parquet
[SPARK-25134] - Csv column pruning with checking of headers throws incorrect error
[SPARK-25137] - `NumberFormatException` when starting spark-shell from Mac terminal
[SPARK-25149] - Personalized PageRank raises an error if vertexIDs are > MaxInt
[SPARK-25159] - json schema inference should only trigger one job
[SPARK-25161] - Fix several bugs in failure handling of barrier execution mode
[SPARK-25163] - Flaky test: o.a.s.util.collection.ExternalAppendOnlyMapSuite.spilling with compression
[SPARK-25164] - Parquet reader builds entire list of columns once for each column
[SPARK-25167] - Minor fixes for R sql tests (tests that fail in development environment)
[SPARK-25174] - ApplicationMaster suspends when unregistering itself from RM with extreme large diagnostic message
[SPARK-25175] - Field resolution should fail if there's ambiguity for ORC native reader
[SPARK-25176] - Kryo fails to serialize a parametrised type hierarchy
[SPARK-25181] - Block Manager master and slave thread pools are unbounded
[SPARK-25183] - Spark HiveServer2 registers shutdown hook with JVM, not ShutdownHookManager; race conditions can arise
[SPARK-25204] - rate source test is flaky
[SPARK-25205] - typo in spark.network.crypto.keyFactoryIteration
[SPARK-25206] - wrong records are returned when Hive metastore schema and parquet schema are in different letter cases
[SPARK-25214] - Kafka v2 source may return duplicated records when `failOnDataLoss` is `false`
[SPARK-25218] - Potential resource leaks in TransportServer and SocketAuthHelper
[SPARK-25221] - [DEPLOY] Consistent trailing whitespace treatment of conf values
[SPARK-25231] - Running a Large Job with Speculation On Causes Executor Heartbeats to Time Out on Driver
[SPARK-25237] - FileScanRdd's inputMetrics is wrong when selecting from a datasource table with limit
[SPARK-25240] - A deadlock in ALTER TABLE RECOVER PARTITIONS
[SPARK-25264] - Fix comma-delineated arguments passed into PythonRunner and RRunner
[SPARK-25266] - Fix memory leak in Barrier Execution Mode
[SPARK-25268] - runParallelPersonalizedPageRank throws serialization Exception
[SPARK-25278] - Number of output rows metric of union of views is multiplied by their occurrences
[SPARK-25283] - A deadlock in UnionRDD
[SPARK-25288] - Kafka transaction tests are flaky
[SPARK-25289] - ChiSqSelector max on empty collection
[SPARK-25291] - Flakiness of tests in terms of executor memory (SecretsTestSuite)
[SPARK-25295] - Pod names conflicts in client mode, if previous submission was not a clean shutdown.
[SPARK-25306] - Avoid skewed filter trees to speed up `createFilter` in ORC
[SPARK-25307] - ArraySort function may return an error in the code generation phase.
[SPARK-25308] - ArrayContains function may return an error in the code generation phase.
[SPARK-25310] - ArraysOverlap may throw a CompileException
[SPARK-25313] - Fix regression in FileFormatWriter output schema
[SPARK-25314] - Invalid PythonUDF - requires attributes from more than one child - in "on" join condition
[SPARK-25317] - MemoryBlock performance regression
[SPARK-25330] - Permission issue after upgrading Hadoop version to 2.7.7
[SPARK-25352] - Perform ordered global limit when limit number is bigger than topKSortFallbackThreshold
[SPARK-25357] - Add metadata to SparkPlanInfo to dump more information like file path to event log
[SPARK-25363] - Schema pruning doesn't work if nested column is used in where clause
[SPARK-25368] - Incorrect constraint inference returns wrong result
[SPARK-25371] - Vector Assembler with no input columns leads to opaque error
[SPARK-25387] - Malformed CSV causes NPE
[SPARK-25389] - INSERT OVERWRITE DIRECTORY STORED AS should prevent duplicate fields
[SPARK-25398] - Minor bugs from comparing unrelated types
[SPARK-25399] - Reusing execution threads from continuous processing for microbatch streaming can result in correctness issues
[SPARK-25402] - Null handling in BooleanSimplification
[SPARK-25406] - Incorrect usage of withSQLConf method in Parquet schema pruning test suite masks failing tests
[SPARK-25416] - ArrayPosition function may return incorrect result when right expression is implicitly downcasted.
[SPARK-25417] - ArrayContains function may return incorrect result when right expression is implicitly downcasted.
[SPARK-25425] - Extra options must overwrite sessions options
[SPARK-25427] - Add BloomFilter creation test cases
[SPARK-25431] - Fix function examples and unify the format of the example results.
[SPARK-25438] - Fix FilterPushdownBenchmark to use the same memory assumption
[SPARK-25439] - TPCHQuerySuite customer.c_nationkey should be bigint instead of string
[SPARK-25443] - fix issues when building docs with release scripts in docker
[SPARK-25450] - PushProjectThroughUnion rule uses the same exprId for project expressions in each Union child, causing mistakes in constant propagation
[SPARK-25471] - Fix tests for Python 3.6 with Pandas 0.23+
[SPARK-25495] - FetchedData.reset doesn't reset _nextOffsetInFetchedData and _offsetAfterPoll
[SPARK-25502] - [Spark Job History] Empty Page when page number exceeds the retainedTask size
[SPARK-25503] - [Spark Job History] Total task message in stage page is ambiguous
[SPARK-25505] - The output order of grouping columns in Pivot is different from the input order
[SPARK-25509] - SHS V2 cannot be enabled on Windows because POSIX permissions are not supported.
[SPARK-25519] - ArrayRemove function may return incorrect result when right expression is implicitly downcasted.
[SPARK-25521] - Job id shows null when an insert into command job is finished.
[SPARK-25522] - Improve type promotion for input arguments of elementAt function
[SPARK-25533] - Inconsistent message for Completed Jobs in the JobUI, when there are failed jobs, compared to spark2.2
[SPARK-25536] - executorSource.METRIC read wrong record in Executor.scala Line444
[SPARK-25538] - incorrect row counts after distinct()
[SPARK-25542] - Flaky test: OpenHashMapSuite
[SPARK-25543] - Confusing log messages at DEBUG level, in K8s mode.
[SPARK-25546] - RDDInfo uses SparkEnv before it may have been initialized
[SPARK-25568] - Continue to update the remaining accumulators when failing to update one accumulator
[SPARK-25570] - Replace 2.3.1 with 2.3.2 in HiveExternalCatalogVersionsSuite
[SPARK-25578] - Update to Scala 2.12.7
[SPARK-25579] - Use quoted attribute names if needed in pushed ORC predicates
[SPARK-25591] - PySpark Accumulators with multiple PythonUDFs
[SPARK-25602] - SparkPlan.getByteArrayRdd should not consume the input when not necessary
[SPARK-25636] - spark-submit swallows the failure reason when there is an error connecting to master
[SPARK-25644] - Fix java foreachBatch API
[SPARK-25646] - docker-image-tool.sh doesn't work on developer build
[SPARK-25660] - Impossible to use the backward slash as the CSV fields delimiter
[SPARK-25669] - Check CSV header only when it exists
[SPARK-25671] - Build external/spark-ganglia-lgpl in Jenkins Test
[SPARK-25674] - If the records are incremented by more than 1 at a time, the number of bytes might rarely ever get updated
[SPARK-25677] - Configuring zstd compression in JDBC throws IllegalArgumentException
[SPARK-25697] - In-progress application throws an error in the UI when zstd compression is enabled
[SPARK-25704] - Replication of > 2GB block fails due to bad config default
[SPARK-25708] - HAVING without GROUP BY means global aggregate
[SPARK-25714] - Null Handling in the Optimizer rule BooleanSimplification
[SPARK-25726] - Flaky test: SaveIntoDataSourceCommandSuite.`simpleString is redacted`
[SPARK-25727] - makeCopy failed in InMemoryRelation
[SPARK-25738] - LOAD DATA INPATH doesn't work if hdfs conf includes port
[SPARK-25741] - Long URLs are not rendered properly in web UI
[SPARK-25768] - Hive UDAFs expecting constant arguments don't work
[SPARK-25793] - Loading model bug in BisectingKMeans
[SPARK-25795] - Fix CSV SparkR SQL Example
[SPARK-25797] - Views created via 2.1 cannot be read via 2.2+
[SPARK-25799] - DataSourceApiV2 scan reuse does not respect options
[SPARK-25801] - pandas_udf grouped_map fails with input dataframe with more than 255 columns
[SPARK-25803] - The -n option to docker-image-tool.sh causes other options to be ignored
[SPARK-25816] - Functions does not resolve Columns correctly
[SPARK-25822] - Fix a race condition when releasing a Python worker
[SPARK-25832] - remove newly added map related functions
[SPARK-25835] - Propagate scala 2.12 profile in k8s integration tests
[SPARK-25840] - `make-distribution.sh` should not fail due to missing LICENSE-binary
[SPARK-25854] - mvn helper script always exits w/1, causing mvn builds to fail
[SPARK-26612] - Speculation kill causing finished stage recomputed
[SPARK-26614] - Speculation kill might cause job failure
[SPARK-26802] - CVE-2018-11760: Apache Spark local privilege escalation vulnerability
[SPARK-28626] - Spark leaves unencrypted data on local disk, even with encryption turned on (CVE-2019-10099)
[SPARK-34381] - c

Epic

[SPARK-24374] - SPIP: Support Barrier Execution Mode in Apache Spark

Story

[SPARK-24124] - Spark history server should create spark.history.store.path and set permissions properly
[SPARK-24852] - Have spark.ml training use updated `Instrumentation` APIs.
[SPARK-25234] - SparkR:::parallelize doesn't handle integer overflow properly
[SPARK-25248] - Audit barrier APIs for Spark 2.4
[SPARK-25345] - Deprecate readImages APIs from ImageSchema
[SPARK-25347] - Document image data source in doc site

New Feature

[SPARK-10697] - Lift Calculation in Association Rule mining
[SPARK-14682] - Provide evaluateEachIteration method or equivalent for spark.ml GBTs
[SPARK-15064] - Locale support in StopWordsRemover
[SPARK-15784] - Add Power Iteration Clustering to spark.ml
[SPARK-19480] - Higher order functions in SQL
[SPARK-21274] - Implement EXCEPT ALL and INTERSECT ALL
[SPARK-22119] - Add cosine distance to KMeans
[SPARK-22880] - Add option to cascade jdbc truncate if database supports this (PostgreSQL and Oracle)
[SPARK-23010] - Add integration testing for Kubernetes backend into the apache/spark repository
[SPARK-23146] - Support client mode for Kubernetes cluster backend
[SPARK-23235] - Add executor Threaddump to api
[SPARK-23541] - Allow Kafka source to read data with greater parallelism than the number of topic-partitions
[SPARK-23751] - Kolmogorov-Smirnov test Python API in pyspark.ml
[SPARK-23846] - samplingRatio for schema inferring of CSV datasource
[SPARK-23856] - Spark jdbc setQueryTimeout option
[SPARK-23948] - Trigger mapstage's job listener in submitMissingTasks
[SPARK-23984] - PySpark Bindings for K8S
[SPARK-24027] - Support MapType(StringType, DataType) as root type by from_json
[SPARK-24193] - Sort by disk when number of limit is big in TakeOrderedAndProjectExec
[SPARK-24231] - Python API: Provide evaluateEachIteration method or equivalent for spark.ml GBTs
[SPARK-24232] - Allow referring to kubernetes secrets as env variable
[SPARK-24288] - Enable preventing predicate pushdown
[SPARK-24371] - Added isInCollection in DataFrame API for Scala and Java.
[SPARK-24372] - Create script for preparing RCs
[SPARK-24396] - Add Structured Streaming ForeachWriter for python
[SPARK-24397] - Add TaskContext.getLocalProperties in Python
[SPARK-24411] - Adding native Java tests for `isInCollection`
[SPARK-24412] - Adding docs about automagical type casting in `isin` and `isInCollection` APIs
[SPARK-24433] - R Bindings for K8S
[SPARK-24435] - Support user-supplied YAML that can be merged with k8s pod descriptions
[SPARK-24465] - LSHModel should support Structured Streaming for transform
[SPARK-24479] - Register StreamingQueryListener in Spark Conf
[SPARK-24499] - Split the page of sql-programming-guide.html to multiple separate pages
[SPARK-24542] - Hive UDF series UDFXPathXXXX allow users to pass carefully crafted XML to access arbitrary files
[SPARK-24662] - Structured Streaming should support LIMIT
[SPARK-24730] - Add policy to choose max as global watermark when streaming query has multiple watermarks
[SPARK-24768] - Have a built-in AVRO data source implementation
[SPARK-24795] - Implement barrier execution mode
[SPARK-24802] - Optimization Rule Exclusion
[SPARK-24817] - Implement BarrierTaskContext.barrier()
[SPARK-24819] - Fail fast when there are not enough slots to launch the barrier stage on job submission
[SPARK-24820] - Fail fast when submitted job contains PartitionPruningRDD in a barrier stage
[SPARK-24821] - Fail fast when submitted job computes on a subset of all the partitions for a barrier stage
[SPARK-24822] - Python support for barrier execution mode
[SPARK-24918] - Executor Plugin API
[SPARK-25468] - Highlight current page index in the history server
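
SPARK-21274 above adds EXCEPT ALL and INTERSECT ALL. Unlike plain EXCEPT and INTERSECT, these keep duplicate rows, following the multiset (bag) semantics of standard SQL. The sketch below illustrates those semantics in plain Python. It is not Spark code, and the helper names except_all and intersect_all are hypothetical, chosen only for this illustration.

```python
from collections import Counter


def except_all(left, right):
    """Multiset difference: each row in `right` cancels one matching row in `left`."""
    # Counter subtraction keeps only positive counts.
    return sorted((Counter(left) - Counter(right)).elements())


def intersect_all(left, right):
    """Multiset intersection: a row appears min(count_left, count_right) times."""
    return sorted((Counter(left) & Counter(right)).elements())


left = [1, 1, 1, 2, 3]
right = [1, 2, 2]

print(except_all(left, right))     # [1, 1, 3]
print(intersect_all(left, right))  # [1, 2]
```

For comparison, a distinct EXCEPT over the same inputs would yield only the single row 3, since duplicates are removed and every value present in the right input is dropped.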

Improvement

[SPARK-3159] - Check for reducible DecisionTree
[SPARK-4502] - Spark SQL reads unnecessary nested fields from Parquet
[SPARK-7132] - Add fit with validation set to spark.ml GBT
[SPARK-9312] - The OneVsRest model does not provide rawPrediction
[SPARK-11630] - ClosureCleaner incorrectly warns for class based closures
[SPARK-13343] - speculative tasks that didn't commit shouldn't be marked as success
[SPARK-14712] - spark.ml LogisticRegressionModel.toString should summarize model
[SPARK-15009] - PySpark CountVectorizerModel should be able to construct from vocabulary list
[SPARK-16406] - Reference resolution for large number of columns should be faster
[SPARK-16501] - spark.mesos.secret exposed on UI and command line
[SPARK-16617] - Upgrade to Avro 1.8.x
[SPARK-16630] - Blacklist a node if executors won't launch on it.
[SPARK-18057] - Update structured streaming kafka from 0.10.0.1 to 2.0.0
[SPARK-18230] - MatrixFactorizationModel.recommendProducts throws NoSuchElement exception when the user does not exist
[SPARK-19018] - spark csv writer charset support
[SPARK-19602] - Unable to query using the fully qualified column name of the form ( <DBNAME>.<TABLENAME>.<COLUMNNAME>)
[SPARK-19724] - Creating a managed table with an existing default location should throw an exception
[SPARK-19947] - RFormulaModel always throws Exception on transforming data with NULL or Unseen labels
[SPARK-20087] - Include accumulators / taskMetrics when sending TaskKilled to onTaskEnd listeners
[SPARK-20168] - Enable kinesis to start stream from Initial position specified by a timestamp
[SPARK-20538] - Dataset.reduce operator should use withNewExecutionId (as foreach or foreachPartition)
[SPARK-20659] - Remove StorageStatus, or make it private.
[SPARK-20937] - Describe spark.sql.parquet.writeLegacyFormat property in Spark SQL, DataFrames and Datasets Guide
[SPARK-21318] - The exception message thrown by `lookupFunction` is ambiguous.
[SPARK-21590] - Structured Streaming window start time should support negative values to adjust time zone
[SPARK-21687] - Spark SQL should set createTime for Hive partition
[SPARK-21741] - Python API for DataFrame-based multivariate summarizer
[SPARK-21783] - Turn on ORC filter push-down by default
[SPARK-21860] - Improve memory reuse for heap memory in `HeapMemoryAllocator`
[SPARK-21960] - Spark Streaming Dynamic Allocation should respect spark.executor.instances
[SPARK-22068] - Reduce the duplicate code between putIteratorAsValues and putIteratorAsBytes
[SPARK-22144] - ExchangeCoordinator will not combine the partitions of an 0 sized pre-shuffle
[SPARK-22210] - Online LDA variationalTopicInference should use random seed to have stable behavior
[SPARK-22219] - Refactor "spark.sql.codegen.comments"
[SPARK-22269] - Java style checks should be run in Jenkins
[SPARK-22666] - Spark datasource for image format
[SPARK-22683] - DynamicAllocation wastes resources by allocating containers that will barely be used
[SPARK-22751] - Improve ML RandomForest shuffle performance
[SPARK-22814] - JDBC support date/timestamp type as partitionColumn
[SPARK-22839] - Refactor Kubernetes code for configuring driver/executor pods to use consistent and cleaner abstraction
[SPARK-22856] - Add wrapper for codegen output and nullability
[SPARK-22941] - Allow SparkSubmit to throw exceptions instead of exiting / printing errors.
[SPARK-22959] - Configuration to select the modules for daemon and worker in PySpark
[SPARK-23012] - Support for predicate pushdown and partition pruning when left joining large Hive tables
[SPARK-23024] - Spark UI tables should have hide and show features when they contain many records.
[SPARK-23031] - Merge script should allow arbitrary assignees
[SPARK-23034] - Display tablename for `HiveTableScan` node in UI
[SPARK-23040] - BlockStoreShuffleReader's return Iterator isn't interruptible if aggregator or ordering is specified
[SPARK-23043] - Upgrade json4s-jackson to 3.5.3
[SPARK-23085] - API parity for mllib.linalg.Vectors.sparse
[SPARK-23159] - Update Cloudpickle to match version 0.4.3
[SPARK-23161] - Add missing APIs to Python GBTClassifier
[SPARK-23162] - PySpark ML LinearRegressionSummary missing r2adj
[SPARK-23166] - Add maxDF Parameter to CountVectorizer
[SPARK-23167] - Update TPCDS queries from v1.4 to v2.7 (latest)
[SPARK-23174] - Fix pep8 to latest official version
[SPARK-23188] - Make vectorized columnar reader batch size configurable
[SPARK-23202] - Add new API in DataSourceWriter: onDataWriterCommit
[SPARK-23217] - Add cosine distance measure to ClusteringEvaluator
[SPARK-23228] - Able to track Python create SparkSession in JVM
[SPARK-23247] - combines Unsafe operations and statistics operations in Scan Data Source
[SPARK-23253] - Only write shuffle temporary index file when there is not an existing one
[SPARK-23259] - Clean up legacy code around hive external catalog
[SPARK-23285] - Allow spark.executor.cores to be fractional
[SPARK-23295] - Exclude Warning message when generating versions in make-distribution.sh
[SPARK-23303] - improve the explain result for data source v2 relations
[SPARK-23318] - FP-growth: WARN FPGrowth: Input data is not cached
[SPARK-23336] - Upgrade snappy-java to 1.1.7.1
[SPARK-23359] - Adds an alias 'names' of 'fieldNames' in Scala's StructType
[SPARK-23366] - Improve hot reading path in ReadAheadInputStream
[SPARK-23372] - Writing empty struct in parquet fails during execution. It should fail earlier during analysis.
[SPARK-23375] - Optimizer should remove unneeded Sort
[SPARK-23378] - move setCurrentDatabase from HiveExternalCatalog to HiveClientImpl
[SPARK-23379] - remove redundant metastore access if the current database name is the same
[SPARK-23382] - Spark Streaming UI tables should have hide and show features when they contain many records.
[SPARK-23383] - Make a distribution should exit with usage while detecting wrong options
[SPARK-23389] - When the shuffle dependency specifies aggregation, and `dependency.mapSideCombine=false`, we should be able to use serialized sorting.
[SPARK-23412] - Add cosine distance measure to BisectingKMeans
[SPARK-23424] - Add codegenStageId in comment
[SPARK-23445] - ColumnStat refactoring
[SPARK-23447] - Cleanup codegen template for Literal
[SPARK-23455] - Default Params in ML should be saved separately
[SPARK-23456] - Turn on `native` ORC implementation by default
[SPARK-23466] - Remove redundant null checks in generated Java code by GenerateUnsafeProjection
[SPARK-23500] - Filters on named_structs could be pushed into scans
[SPARK-23510] - Support read data from Hive 2.2 and Hive 2.3 metastore
[SPARK-23518] - Avoid metastore access when users only want to read and store data frames
[SPARK-23528] - Add numIter to ClusteringSummary
[SPARK-23529] - Specify hostpath volume and mount the volume in Spark driver and executor pods in Kubernetes
[SPARK-23538] - Simplify SSL configuration for https client
[SPARK-23550] - Cleanup unused / redundant methods in Utils object
[SPARK-23553] - Tests should not assume the default value of `spark.sql.sources.default`
[SPARK-23562] - RFormula handleInvalid should handle invalid values in non-string columns.
[SPARK-23564] - The optimized logical plan for left anti join should be further optimized
[SPARK-23565] - Improved error message for when the number of sources for a query changes
[SPARK-23568] - Silhouette should get number of features from metadata if available
[SPARK-23572] - Update security.md to cover new features
[SPARK-23573] - Create linter rule to prevent misuse of SparkContext.hadoopConfiguration in SQL modules
[SPARK-23604] - ParquetInteroperabilityTest timestamp test should use Statistics.hasNonNullValue
[SPARK-23624] - Revise doc of method pushFilters
[SPARK-23627] - Provide isEmpty() function in DataSet
[SPARK-23628] - WholeStageCodegen can generate methods with too many params
[SPARK-23644] - SHS with proxy doesn't show applications
[SPARK-23645] - pandas_udf can not be called with keyword arguments
[SPARK-23654] - Cut jets3t as a dependency of spark-core
[SPARK-23656] - Assertion in XXH64Suite.testKnownByteArrayInputs() is not performed on big endian platform
[SPARK-23672] - Document Support returning lists in Arrow UDFs
[SPARK-23675] - Title add spark logo, use spark logo image
[SPARK-23683] - FileCommitProtocol.instantiate to require 3-arg constructor for dynamic partition overwrite
[SPARK-23691] - Use sql_conf util in PySpark tests where possible
[SPARK-23695] - Confusing error message for PySpark's Kinesis tests when its jar is missing but enabled
[SPARK-23699] - PySpark should raise same Error when Arrow fallback is disabled
[SPARK-23700] - Cleanup unused imports
[SPARK-23708] - Comment of ShutdownHookManager.addShutdownHook is incorrect
[SPARK-23769] - Remove unnecessary scalastyle check disabling
[SPARK-23770] - Expose repartitionByRange in SparkR
[SPARK-23772] - Provide an option to ignore column of all null values or empty map/array during JSON schema inference
[SPARK-23776] - pyspark-sql tests should display build instructions when components are missing
[SPARK-23803] - Support bucket pruning to optimize filtering on a bucketed column
[SPARK-23820] - Allow the long form of call sites to be recorded in the log
[SPARK-23822] - Improve error message for Parquet schema mismatches
[SPARK-23828] - PySpark StringIndexerModel should have constructor from labels
[SPARK-23830] - Spark on YARN in cluster deploy mode fails with NullPointerException when a Spark application is a Scala class, not an object
[SPARK-23838] - SparkUI: Running SQL query displayed as "completed" in SQL tab
[SPARK-23841] - NodeIdCache should unpersist the last cached nodeIdsForInstances
[SPARK-23861] - Clarify behavior of default window frame boundaries with and without orderBy clause
[SPARK-23867] - com.codahale.metrics.Counter output in log message has no toString method
[SPARK-23873] - Use accessors in interpreted LambdaVariable
[SPARK-23874] - Upgrade apache/arrow to 0.10.0
[SPARK-23875] - Create IndexedSeq wrapper for ArrayData
[SPARK-23877] - Metadata-only queries do not push down filter conditions
[SPARK-23880] - Table cache should be lazy and not trigger any job
[SPARK-23892] - Improve coverage and fix lint error in UTF8String-related Suite
[SPARK-23896] - Improve PartitioningAwareFileIndex
[SPARK-23944] - Add Param set functions to LSHModel types
[SPARK-23947] - Add hashUTF8String convenience method to hasher classes
[SPARK-23956] - Use effective RPC port in AM registration
[SPARK-23957] - Sorts in subqueries are redundant and can be removed
[SPARK-23960] - Mark HashAggregateExec.bufVars as transient
[SPARK-23962] - Flaky tests from SQLMetricsTestUtils.currentExecutionIds
[SPARK-23963] - Queries on text-based Hive tables grow disproportionately slower as the number of columns increases
[SPARK-23966] - Refactoring all checkpoint file writing logic in a common interface
[SPARK-23972] - Upgrade to Parquet 1.10
[SPARK-23973] - Remove consecutive sorts
[SPARK-23979] - MultiAlias should not be a CodegenFallback
[SPARK-24003] - Add support to provide spark.executor.extraJavaOptions in terms of App Id and/or Executor Id's
[SPARK-24005] - Remove usage of Scala’s parallel collection
[SPARK-24014] - Add onStreamingStarted method to StreamingListener
[SPARK-24017] - Refactor ExternalCatalog to be an interface
[SPARK-24024] - Fix deviance calculations in GLM to handle corner cases
[SPARK-24029] - Set "reuse address" flag on listen sockets
[SPARK-24035] - SQL syntax for Pivot
[SPARK-24057] - put the real data type in the AssertionError message
[SPARK-24058] - Default Params in ML should be saved separately: Python API
[SPARK-24072] - clearly define pushed filters
[SPARK-24083] - Diagnostics message for uncaught exceptions should include the stacktrace
[SPARK-24094] - Change description strings of v2 streaming sources to reflect the change
[SPARK-24111] - Add TPCDS v2.7 (latest) queries in TPCDSQueryBenchmark
[SPARK-24117] - Unified the getSizePerRow
[SPARK-24121] - The API for handling expression code generation in expression codegen
[SPARK-24126] - PySpark tests leave a lot of garbage in /tmp
[SPARK-24127] - Support text socket source in continuous mode
[SPARK-24128] - Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg
[SPARK-24129] - Add option to pass --build-arg's to docker-image-tool.sh
[SPARK-24131] - Add majorMinorVersion API to PySpark for determining Spark versions
[SPARK-24136] - MemoryStreamDataReader.next should skip sleeping if record is available
[SPARK-24149] - Automatic namespaces discovery in HDFS federation
[SPARK-24156] - Enable no-data micro batches for more eager streaming state clean up
[SPARK-24160] - ShuffleBlockFetcherIterator should fail if it receives zero-size blocks
[SPARK-24161] - Enable debug package feature on structured streaming
[SPARK-24172] - we should not apply operator pushdown to data source v2 many times
[SPARK-24181] - Better error message for writing sorted data
[SPARK-24182] - Improve error message for client mode when AM fails
[SPARK-24188] - /api/v1/version not working
[SPARK-24204] - Verify a write schema in Json/Orc/ParquetFileFormat
[SPARK-24206] - Improve DataSource benchmark code for read and pushdown
[SPARK-24209] - 0 configuration Knox gateway support in SHS
[SPARK-24215] - Implement eager evaluation for DataFrame APIs
[SPARK-24242] - RangeExec should have correct outputOrdering
[SPARK-24244] - Parse only required columns of CSV file
[SPARK-24246] - Improve AnalysisException by setting the cause when it's available
[SPARK-24248] - [K8S] Use the Kubernetes cluster as the backing store for the state of pods
[SPARK-24250] - support accessing SQLConf inside tasks
[SPARK-24262] - Fix typo in UDF error message
[SPARK-24268] - DataType in error messages are not coherent
[SPARK-24275] - Revise doc comments in InputPartition
[SPARK-24277] - Code clean up in SQL module: HadoopMapReduceCommitProtocol/FileFormatWriter
[SPARK-24303] - Update cloudpickle to v0.4.4
[SPARK-24305] - Avoid serialization of private fields in new collection expressions
[SPARK-24308] - Handle DataReaderFactory to InputPartition renames in left over classes
[SPARK-24312] - Upgrade to 2.3.3 for Hive Metastore Client 2.3
[SPARK-24321] - Extract common code from Divide/Remainder to a base trait
[SPARK-24326] - Add local:// scheme support for the app jar in mesos cluster mode
[SPARK-24327] - Verify and normalize a partition column name based on the JDBC resolved schema
[SPARK-24329] - Remove comments filtering before parsing of CSV files
[SPARK-24330] - Refactor ExecuteWriteTask in FileFormatWriter with DataWriter(V2)
[SPARK-24332] - Fix places reading 'spark.network.timeout' as milliseconds
[SPARK-24337] - Improve the error message for invalid SQL conf value
[SPARK-24339] - spark sql can not prune column in transform/map/reduce query
[SPARK-24356] - Duplicate strings in File.path managed by FileSegmentManagedBuffer
[SPARK-24361] - Polish code block manipulation API
[SPARK-24365] - Add data source write benchmark
[SPARK-24366] - Improve error message for Catalyst type converters
[SPARK-24367] - Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag ENABLE_JOB_SUMMARY
[SPARK-24381] - Improve Unit Test Coverage of NOT IN subqueries
[SPARK-24408] - Move abs function to math_funcs group
[SPARK-24423] - Add a new option `query` for JDBC sources
[SPARK-24424] - Support ANSI-SQL compliant syntax for GROUPING SET
[SPARK-24428] - Remove unused code and fix any related doc in K8s module
[SPARK-24441] - Expose total estimated size of states in HDFSBackedStateStoreProvider
[SPARK-24454] - ml.image doesn't have __all__ explicitly defined
[SPARK-24455] - fix typo in TaskSchedulerImpl's comments
[SPARK-24470] - RestSubmissionClient to be robust against 404 & non json responses
[SPARK-24477] - Import submodules under pyspark.ml by default
[SPARK-24485] - Measure and log elapsed time for filesystem operations in HDFSBackedStateStoreProvider
[SPARK-24490] - Use WebUI.addStaticHandler in web UIs
[SPARK-24505] - Convert strings in codegen to blocks: Cast and BoundAttribute
[SPARK-24518] - Using Hadoop credential provider API to store password
[SPARK-24519] - MapStatus has 2000 hardcoded
[SPARK-24525] - Provide an option to limit MemorySink memory usage
[SPARK-24534] - Add a way to bypass entrypoint.sh script if no spark cmd is passed
[SPARK-24543] - Support any DataType as DDL string for from_json's schema
[SPARK-24547] - Spark on K8s docker-image-tool.sh improvements
[SPARK-24551] - Add Integration tests for Secrets
[SPARK-24555] - logNumExamples in KMeans/BiKM/GMM/AFT/NB
[SPARK-24557] - ClusteringEvaluator support array input
[SPARK-24558] - Driver prints the wrong info in the log when the executor which holds cacheBlock is IDLE. Time-out value displayed is not as per configuration value.
[SPARK-24565] - Add API in Structured Streaming for exposing output rows of each microbatch as a DataFrame
[SPARK-24566] - Fix spark.storage.blockManagerSlaveTimeoutMs default config
[SPARK-24571] - Support literals with values of the Char type
[SPARK-24574] - improve array_contains function of the sql component to deal with Column type
[SPARK-24575] - Prohibit window expressions inside WHERE and HAVING clauses
[SPARK-24576] - Upgrade Apache ORC to 1.5.2
[SPARK-24596] - Non-cascading Cache Invalidation
[SPARK-24605] - size(null) should return null
[SPARK-24609] - PySpark/SparkR doc doesn't explain RandomForestClassifier.featureSubsetStrategy well
[SPARK-24614] - PySpark - Fix SyntaxWarning on tests.py
[SPARK-24626] - Parallelize size calculation in Analyze Table command
[SPARK-24635] - Remove Blocks class
[SPARK-24636] - Type Coercion of Arrays for array_join Function
[SPARK-24637] - Add metrics regarding state and watermark to dropwizard metrics
[SPARK-24646] - Support wildcard '*' for spark.yarn.dist.forceDownloadSchemes
[SPARK-24658] - Remove workaround for ANTLR bug
[SPARK-24665] - Add SQLConf in PySpark to manage all sql configs
[SPARK-24673] - scala sql function from_utc_timestamp second argument could be Column instead of String
[SPARK-24675] - Rename table: validate existence of new location
[SPARK-24678] - We should use 'PROCESS_LOCAL' first for Spark-Streaming
[SPARK-24683] - SparkLauncher.NO_RESOURCE doesn't work with Java applications
[SPARK-24685] - Adjust release scripts to build all versions for older releases
[SPARK-24688] - Clarify comments about LabeledPoint as (label, feature) pair rather than (feature, label)
[SPARK-24691] - Add new API `supportDataType` in FileFormat
[SPARK-24692] - Improve FilterPushdownBenchmark
[SPARK-24696] - ColumnPruning rule fails to remove extra Project
[SPARK-24697] - Fix the reported start offsets in streaming query progress
[SPARK-24709] - Inferring schema from JSON string literal
[SPARK-24722] - Column-based API for pivoting
[SPARK-24727] - The cache 100 in CodeGenerator is too small for streaming
[SPARK-24732] - Type coercion between MapTypes.
[SPARK-24737] - Type coercion between StructTypes.
[SPARK-24747] - Make spark.ml.util.Instrumentation class more flexible
[SPARK-24757] - Improve error message for broadcast timeouts
[SPARK-24759] - No reordering keys for broadcast hash join
[SPARK-24761] - Check modifiability of config parameters
[SPARK-24763] - Remove redundant key data from value in streaming aggregation
[SPARK-24782] - Simplify conf access in expressions
[SPARK-24785] - Making sure REPL prints Spark UI info and then Welcome message
[SPARK-24790] - Allow complex aggregate expressions in Pivot
[SPARK-24801] - Empty byte[] arrays in spark.network.sasl.SaslEncryption$EncryptedMessage can waste a lot of memory
[SPARK-24807] - Adding files/jars twice: output a warning and add a note
[SPARK-24849] - Convert StructType to DDL string
[SPARK-24858] - Avoid unnecessary parquet footer reads
[SPARK-24860] - Expose dynamic partition overwrite per write operation
[SPARK-24865] - Remove AnalysisBarrier
[SPARK-24868] - add sequence function in Python
[SPARK-24871] - Refactor Concat and MapConcat to avoid creating concatenator object for each row.
[SPARK-24890] - Short circuiting the `if` condition when `trueValue` and `falseValue` are the same
[SPARK-24893] - Remove the entire CaseWhen if all the outputs are semantically equivalent
[SPARK-24926] - Ensure numCores is used consistently in all netty configuration (driver and executors)
[SPARK-24929] - Merge script swallows KeyboardInterrupt
[SPARK-24940] - Coalesce and Repartition Hint for SQL Queries
[SPARK-24943] - Convert a SQL Struct to StructType
[SPARK-24945] - Switch to uniVocity >= 2.7.2
[SPARK-24951] - Table valued functions should throw AnalysisException instead of IllegalArgumentException
[SPARK-24952] - Support LZMA2 compression by Avro datasource
[SPARK-24954] - Fail fast on job submit if run a barrier stage with dynamic resource allocation enabled
[SPARK-24956] - Upgrade maven from 3.3.9 to 3.5.4
[SPARK-24960] - k8s: explicitly expose ports on driver container
[SPARK-24962] - refactor CodeGenerator.createUnsafeArray
[SPARK-24978] - Add spark.sql.fast.hash.aggregate.row.max.capacity to configure the capacity of fast aggregation.
[SPARK-24979] - add AnalysisHelper#resolveOperatorsUp
[SPARK-24982] - UDAF resolution should not throw java.lang.AssertionError
[SPARK-24992] - spark should randomize yarn local dir selection
[SPARK-24993] - Make Avro fast again
[SPARK-24996] - Use DSL to simplify DeclarativeAggregate
[SPARK-24999] - Reduce unnecessary 'new' memory operations
[SPARK-25001] - Fix build miscellaneous warnings
[SPARK-25018] - Use `Co-Authored-By` git trailer in `merge_spark_pr.py`
[SPARK-25025] - Remove the default value of isAll in INTERSECT/EXCEPT
[SPARK-25043] - spark-sql should print the appId and master on startup
[SPARK-25045] - Make `RDDBarrier.mapPartitions` similar to `RDD.mapPartitions`
[SPARK-25069] - Use UnsafeAlignedOffset to keep the entire record of 8-byte items aligned, as is done in UnsafeExternalSorter
[SPARK-25073] - Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an error request to adjust yarn.scheduler.maximum-allocation-mb
[SPARK-25077] - Delete unused variable in WindowExec
[SPARK-25088] - Rest Server default & doc updates
[SPARK-25093] - CodeFormatter could avoid creating regex object again and again
[SPARK-25105] - Importing all of pyspark.sql.functions should bring PandasUDFType in as well
[SPARK-25108] - Dataset.show() generates incorrect padding for Unicode Character
[SPARK-25111] - increment kinesis client/producer lib versions & aws-sdk to match
[SPARK-25113] - Add logging to CodeGenerator when any generated method's bytecode size goes above HugeMethodLimit
[SPARK-25115] - Eliminate extra memory copy done when a ByteBuf is used that is backed by > 1 ByteBuffer.
[SPARK-25117] - Add EXCEPT ALL and INTERSECT ALL support in R.
[SPARK-25122] - Deduplication of supports equals code
[SPARK-25140] - Add optional logging to UnsafeProjection.create when it falls back to interpreted mode
[SPARK-25142] - Add error messages when Python worker could not open socket in `_load_from_socket`.
[SPARK-25170] - Add Task Metrics description to the documentation
[SPARK-25178] - Directly ship the StructType objects of the keySchema / valueSchema for xxxHashMapGenerator
[SPARK-25208] - Loosen Cast.forceNullable for DecimalType.
[SPARK-25209] - Optimization in Dataset.apply for DataFrames
[SPARK-25212] - Support Filter in ConvertToLocalRelation
[SPARK-25228] - Add executor CPU Time metric
[SPARK-25233] - Give the user the option of specifying a fixed minimum message per partition per batch when using kafka direct API with backpressure
[SPARK-25235] - Merge the REPL code in Scala 2.11 and 2.12 branches
[SPARK-25241] - Configurable empty values when reading/writing CSV files
[SPARK-25252] - Support arrays of any types in to_json
[SPARK-25253] - Refactor pyspark connection & authentication
[SPARK-25260] - Fix namespace handling in SchemaConverters.toAvroType
[SPARK-25261] - Standardize the default units of spark.driver|executor.memory
[SPARK-25275] - require membership in wheel to run 'su' (in dockerfiles)
[SPARK-25286] - Remove dangerous parmap
[SPARK-25287] - Check for JIRA_USERNAME and JIRA_PASSWORD up front in merge_spark_pr.py
[SPARK-25300] - Unified the configuration parameter `spark.shuffle.service.enabled`
[SPARK-25318] - Add exception handling when wrapping the input stream during the fetch or stage retry in response to a corrupted block
[SPARK-25335] - Skip Zinc downloading if it's installed in the system
[SPARK-25375] - Reenable qualified perm. function checks in UDFSuite
[SPARK-25384] - Clarify fromJsonForceNullableSchema will be removed in Spark 3.0
[SPARK-25400] - Increase timeouts in schedulerIntegrationSuite
[SPARK-25445] - publish a scala 2.12 build with Spark 2.4
[SPARK-25469] - Eval methods of Concat, Reverse and ElementAt should use pattern matching only once
[SPARK-25639] - Add documentation on foreachBatch, and multiple watermark policy
[SPARK-25754] - Change CDN for MathJax
[SPARK-25859] - add scala/java/python example and doc for PrefixSpan

Test

[SPARK-16139] - Audit tests for leaked threads
[SPARK-22882] - ML test for StructuredStreaming: spark.ml.classification
[SPARK-22883] - ML test for StructuredStreaming: spark.ml.feature, A-M
[SPARK-22884] - ML test for StructuredStreaming: spark.ml.clustering
[SPARK-22885] - ML test for StructuredStreaming: spark.ml.tuning
[SPARK-22886] - ML test for StructuredStreaming: spark.ml.recommendation
[SPARK-22915] - ML test for StructuredStreaming: spark.ml.feature, N-Z
[SPARK-23169] - Run lintr on the changes of lint-r script and .lintr configuration
[SPARK-23392] - Add some test case for images feature
[SPARK-23849] - Tests for the samplingRatio option of json schema inferring
[SPARK-23881] - Flaky test: JobCancellationSuite."interruptible iterator of shuffle reader"
[SPARK-24044] - Explicitly print out skipped tests from unittest module
[SPARK-24502] - flaky test: UnsafeRowSerializerSuite
[SPARK-24521] - Fix ineffective test in CachedTableSuite
[SPARK-24562] - Allow running same tests with multiple configs in SQLQueryTestSuite
[SPARK-24564] - Add test suite for RecordBinaryComparator
[SPARK-24740] - PySpark tests do not pass with NumPy 0.14.x+
[SPARK-24840] - do not use dummy filter to switch codegen on/off
[SPARK-24861] - create corrected temp directories in RateSourceSuite
[SPARK-24886] - Increase Jenkins build time
[SPARK-25141] - Modify tests for higher-order functions to check bind method.
[SPARK-25184] - Flaky test: FlatMapGroupsWithState "streaming with processing time timeout"
[SPARK-25238] - Lint-Python: Upgrading to the current version of pycodestyle fails
[SPARK-25249] - Add a unit test for OpenHashMap
[SPARK-25267] - Disable ConvertToLocalRelation in the test cases of sql/core and sql/hive
[SPARK-25290] - BytesToBytesMapOnHeapSuite randomizedStressTest can cause OutOfMemoryError
[SPARK-25296] - Create ExplainSuite
[SPARK-25422] - flaky test: org.apache.spark.DistributedSuite.caching on disk, replicated (encryption = on) (with replication as stream)
[SPARK-25453] - OracleIntegrationSuite IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
[SPARK-25456] - PythonForeachWriterSuite failing
[SPARK-25673] - Remove Travis CI which enables Java lint check
[SPARK-25736] - add tests to verify the behavior of multi-column count
[SPARK-25805] - Flaky test: DataFrameSuite.SPARK-25159 unittest failure

Wish

[SPARK-23131] - Kryo raises StackOverflow during serializing GLR model
[SPARK-25258] - Upgrade kryo package to version 4.0.2

Task

[SPARK-20220] - Add thrift scheduling pool config in scheduling docs
[SPARK-23092] - Migrate MemoryStream to DataSource V2
[SPARK-23451] - Deprecate KMeans computeCost
[SPARK-23501] - Refactor AllStagesPage in order to avoid redundant code
[SPARK-23533] - Add support for changing ContinuousDataReader's startOffset
[SPARK-23601] - Remove .md5 files from release
[SPARK-24392] - Mark pandas_udf as Experimental
[SPARK-24533] - typesafe has rebranded to lightbend. change the build/mvn endpoint from downloads.typesafe.com to downloads.lightbend.com
[SPARK-24654] - Update, fix LICENSE and NOTICE, and specialize for source vs binary
[SPARK-25063] - Rename class KnowNotNull to KnownNotNull
[SPARK-25095] - Python support for BarrierTaskContext
[SPARK-25213] - DataSourceV2 doesn't seem to produce unsafe rows
[SPARK-25336] - Revert SPARK-24863 and SPARK-24748
[SPARK-25836] - (Temporarily) disable automatic build/test of kubernetes-integration-tests

Dependency upgrade

[SPARK-20395] - Update Scala to 2.11.11 and zinc to 0.3.15
[SPARK-23509] - Upgrade commons-net from 2.2 to 3.1

Request

[SPARK-21607] - Can dropTempView function add a param like dropTempView(viewName: String, dropSelfOnly: Boolean)

Umbrella

[SPARK-6235] - Address various 2G limits
[SPARK-14220] - Build and test Spark against Scala 2.12
[SPARK-23899] - Built-in SQL Function Improvement
[SPARK-24090] - Kubernetes Backend Hotlist for Spark 2.4
[SPARK-25319] - Spark MLlib, GraphX 2.4 QA umbrella
[SPARK-25419] - Parquet predicate pushdown improvement

Documentation

[SPARK-21261] - SparkSQL regexpExpressions example
[SPARK-23231] - Add doc for string indexer ordering to user guide (also to RFormula guide)
[SPARK-23254] - Add user guide entry for DataFrame multivariate summary
[SPARK-23256] - Add columnSchema method to PySpark image reader
[SPARK-23329] - Update the function descriptions with the arguments and returned values of the trigonometric functions
[SPARK-23566] - Argument name fix
[SPARK-23642] - isZero scaladoc for LongAccumulator describes wrong method
[SPARK-23792] - Documentation improvements for datetime functions
[SPARK-24134] - A missing full-stop in doc "Tuning Spark"
[SPARK-24171] - Update comments for non-deterministic functions
[SPARK-24191] - Scala example code for Power Iteration Clustering in Spark ML examples
[SPARK-24224] - Java example code for Power Iteration Clustering in spark.ml
[SPARK-24378] - Incorrect examples for date_trunc function in spark 2.3.0
[SPARK-24444] - Improve pandas_udf GROUPED_MAP docs to explain column assignment
[SPARK-24507] - Description in "Level of Parallelism in Data Receiving" section of Spark Streaming Programming Guide is not relevant for the recent Kafka direct approach
[SPARK-24628] - Typos of the example code in docs/mllib-data-types.md
[SPARK-25082] - Documentation for Spark Function expm1 is incomplete
[SPARK-25273] - How to install testthat v1.0.2
[SPARK-25583] - Add newly added History server related configurations in the documentation
[SPARK-25656] - Add an example section about how to use Parquet/ORC library options

Edit/Copy Release Notes

Release Notes - Spark - Version 2.4.0
    
<h2>        Sub-task
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-6236'>SPARK-6236</a>] -         Support caching blocks larger than 2G
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-6237'>SPARK-6237</a>] -         Support uploading blocks &gt; 2GB as a stream
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-10884'>SPARK-10884</a>] -         Support prediction on single instance for regression and classification related models
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-11239'>SPARK-11239</a>] -         PMML export for ML linear regression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-12850'>SPARK-12850</a>] -         Support bucket pruning (predicate pushdown for bucketed tables)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14376'>SPARK-14376</a>] -         spark.ml parity for trees
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14540'>SPARK-14540</a>] -         Support Scala 2.12 closures and Java 8 lambdas in ClosureCleaner
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17091'>SPARK-17091</a>] -         Convert IN predicate to equivalent Parquet filter
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19826'>SPARK-19826</a>] -         spark.ml Python API for PIC
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20114'>SPARK-20114</a>] -         spark.ml parity for sequential pattern mining - PrefixSpan
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21088'>SPARK-21088</a>] -         CrossValidator, TrainValidationSplit should collect all models when fitting: Python API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21898'>SPARK-21898</a>] -         Feature parity for KolmogorovSmirnovTest in MLlib
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22187'>SPARK-22187</a>] -         Update unsaferow format for saved state such that we can set timeouts when state is null
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22239'>SPARK-22239</a>] -         User-defined window functions with pandas udf (unbounded window)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22274'>SPARK-22274</a>] -         User-defined aggregation functions with pandas udf
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22362'>SPARK-22362</a>] -         Add unit test for Window Aggregate Functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22624'>SPARK-22624</a>] -         Expose range partitioning shuffle introduced by SPARK-22614
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23011'>SPARK-23011</a>] -         Support alternative function form with group aggregate pandas UDF
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23030'>SPARK-23030</a>] -         Decrease memory consumption with toPandas() collection using Arrow
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23046'>SPARK-23046</a>] -         Have RFormula include VectorSizeHint in pipeline
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23096'>SPARK-23096</a>] -         Migrate rate source to v2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23097'>SPARK-23097</a>] -         Migrate text socket source to v2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23099'>SPARK-23099</a>] -         Migrate foreach sink
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23120'>SPARK-23120</a>] -         Add PMML pipeline export support to PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23203'>SPARK-23203</a>] -         DataSourceV2 should use immutable trees.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23323'>SPARK-23323</a>] -         DataSourceV2 should use the output commit coordinator.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23325'>SPARK-23325</a>] -         DataSourceV2 readers should always produce InternalRow.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23341'>SPARK-23341</a>] -         DataSourceOptions should handle path and table names to avoid confusion.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23344'>SPARK-23344</a>] -         Add KMeans distanceMeasure param to PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23352'>SPARK-23352</a>] -         Explicitly specify supported types in Pandas UDFs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23362'>SPARK-23362</a>] -         Migrate Kafka microbatch source to v2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23380'>SPARK-23380</a>] -         Adds a conf for Arrow fallback in toPandas/createDataFrame with Pandas DataFrame
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23401'>SPARK-23401</a>] -         Improve test cases for all supported types and unsupported types
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23418'>SPARK-23418</a>] -         DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23491'>SPARK-23491</a>] -         continuous symptom
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23503'>SPARK-23503</a>] -         continuous execution should sequence committed epochs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23555'>SPARK-23555</a>] -         Add BinaryType support for Arrow in PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23559'>SPARK-23559</a>] -         add epoch ID to data writer factory
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23577'>SPARK-23577</a>] -         Supports line separator for text datasource
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23581'>SPARK-23581</a>] -         Add an interpreted version of GenerateUnsafeProjection
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23582'>SPARK-23582</a>] -         Add interpreted execution to StaticInvoke expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23583'>SPARK-23583</a>] -         Add interpreted execution to Invoke expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23584'>SPARK-23584</a>] -         Add interpreted execution to NewInstance expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23585'>SPARK-23585</a>] -         Add interpreted execution for UnwrapOption expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23586'>SPARK-23586</a>] -         Add interpreted execution for WrapOption expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23587'>SPARK-23587</a>] -         Add interpreted execution for MapObjects expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23588'>SPARK-23588</a>] -         Add interpreted execution for CatalystToExternalMap expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23589'>SPARK-23589</a>] -         Add interpreted execution for ExternalMapToCatalyst expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23590'>SPARK-23590</a>] -         Add interpreted execution for CreateExternalRow expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23591'>SPARK-23591</a>] -         Add interpreted execution for EncodeUsingSerializer expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23592'>SPARK-23592</a>] -         Add interpreted execution for DecodeUsingSerializer expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23593'>SPARK-23593</a>] -         Add interpreted execution for InitializeJavaBean expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23594'>SPARK-23594</a>] -         Add interpreted execution for GetExternalRowField expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23595'>SPARK-23595</a>] -         Add interpreted execution for ValidateExternalType expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23596'>SPARK-23596</a>] -         Modify Dataset test harness to include interpreted execution
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23597'>SPARK-23597</a>] -         Audit Spark SQL code base for non-interpreted expressions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23611'>SPARK-23611</a>] -         Extend ExpressionEvalHelper harness to also test failures
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23615'>SPARK-23615</a>] -         Add maxDF Parameter to Python CountVectorizer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23633'>SPARK-23633</a>] -          Update Pandas UDFs section in sql-programming-guide
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23687'>SPARK-23687</a>] -         Add MemoryStream
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23688'>SPARK-23688</a>] -         Refactor tests away from rate source
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23690'>SPARK-23690</a>] -         VectorAssembler should have handleInvalid to handle columns with null values
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23706'>SPARK-23706</a>] -         spark.conf.get(value, default=None) should produce None in PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23711'>SPARK-23711</a>] -         Add fallback to interpreted execution logic
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23713'>SPARK-23713</a>] -         Clean-up UnsafeWriter classes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23723'>SPARK-23723</a>] -         New encoding option for json datasource
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23724'>SPARK-23724</a>] -         Custom record separator for jsons in charsets different from UTF-8
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23727'>SPARK-23727</a>] -         Support DATE predicate push down in parquet
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23736'>SPARK-23736</a>] -         High-order function: concat(array1, array2, ..., arrayN) → array
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23747'>SPARK-23747</a>] -         Add EpochCoordinator unit tests
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23748'>SPARK-23748</a>] -         Support select from temp tables
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23762'>SPARK-23762</a>] -         UTF8StringBuilder uses MemoryBlock
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23765'>SPARK-23765</a>] -         Supports line separator for json datasource
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23783'>SPARK-23783</a>] -         Add new generic export trait for ML pipelines
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23807'>SPARK-23807</a>] -         Add Hadoop 3 profile with relevant POM fix ups
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23821'>SPARK-23821</a>] -         High-order function: flatten(x) → array
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23826'>SPARK-23826</a>] -         TestHiveSparkSession should set default session
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23847'>SPARK-23847</a>] -         Add asc_nulls_first, asc_nulls_last to PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23859'>SPARK-23859</a>] -         Initial PR for Instrumentation improvements: UUID and logging levels
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23864'>SPARK-23864</a>] -         Add Unsafe* copy methods to UnsafeWriter
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23870'>SPARK-23870</a>] -          Forward RFormula handleInvalid Param to VectorAssembler
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23871'>SPARK-23871</a>] -         add python api for VectorAssembler handleInvalid
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23900'>SPARK-23900</a>] -         format_number udf should take user specified format as argument
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23902'>SPARK-23902</a>] -         Provide an option in months_between UDF to disable rounding-off
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23903'>SPARK-23903</a>] -         Add support for date extract
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23905'>SPARK-23905</a>] -         Add UDF weekday
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23908'>SPARK-23908</a>] -         High-order function: transform(array&lt;T&gt;, function&lt;T, U&gt;) → array&lt;U&gt;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23909'>SPARK-23909</a>] -         High-order function: filter(array&lt;T&gt;, function&lt;T, boolean&gt;) → array&lt;T&gt;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23911'>SPARK-23911</a>] -         High-order function: aggregate(array&lt;T&gt;, initialState S, inputFunction&lt;S, T, S&gt;, outputFunction&lt;S, R&gt;) → R
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23912'>SPARK-23912</a>] -         High-order function: array_distinct(x) → array
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23913'>SPARK-23913</a>] -         High-order function: array_intersect(x, y) → array
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23914'>SPARK-23914</a>] -         High-order function: array_union(x, y) → array
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23915'>SPARK-23915</a>] -         High-order function: array_except(x, y) → array
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23916'>SPARK-23916</a>] -         High-order function: array_join(x, delimiter, null_replacement) → varchar
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23917'>SPARK-23917</a>] -         High-order function: array_max(x) → x
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23918'>SPARK-23918</a>] -         High-order function: array_min(x) → x
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23919'>SPARK-23919</a>] -         High-order function: array_position(x, element) → bigint
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23920'>SPARK-23920</a>] -         High-order function: array_remove(x, element) → array
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23921'>SPARK-23921</a>] -         High-order function: array_sort(x) → array
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23922'>SPARK-23922</a>] -         High-order function: arrays_overlap(x, y) → boolean
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23923'>SPARK-23923</a>] -         High-order function: cardinality(x) → bigint
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23924'>SPARK-23924</a>] -         High-order function: element_at
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23925'>SPARK-23925</a>] -         High-order function: repeat(element, count) → array
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23926'>SPARK-23926</a>] -         High-order function: reverse(x) → array
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23927'>SPARK-23927</a>] -         High-order function: sequence
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23928'>SPARK-23928</a>] -         High-order function: shuffle(x) → array
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23930'>SPARK-23930</a>] -         High-order function: slice(x, start, length) → array
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23931'>SPARK-23931</a>] -         High-order function: array_zip(array1, array2[, ...]) → array&lt;row&gt;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23932'>SPARK-23932</a>] -         High-order function: zip_with(array&lt;T&gt;, array&lt;U&gt;, function&lt;T, U, R&gt;) → array&lt;R&gt;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23933'>SPARK-23933</a>] -         High-order function: map(array&lt;K&gt;, array&lt;V&gt;) → map&lt;K,V&gt;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23934'>SPARK-23934</a>] -         High-order function: map_from_entries(array&lt;row&lt;K, V&gt;&gt;) → map&lt;K,V&gt;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23936'>SPARK-23936</a>] -         High-order function: map_concat(map1&lt;K, V&gt;, map2&lt;K, V&gt;, ..., mapN&lt;K, V&gt;) → map&lt;K,V&gt;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23942'>SPARK-23942</a>] -         PySpark&#39;s collect doesn&#39;t trigger QueryExecutionListener
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23990'>SPARK-23990</a>] -         Instruments logging improvements - ML regression package
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24026'>SPARK-24026</a>] -         spark.ml Scala/Java API for PIC
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24038'>SPARK-24038</a>] -         refactor continuous write exec to its own class
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24039'>SPARK-24039</a>] -         remove restarting iterators hack
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24040'>SPARK-24040</a>] -         support single partition aggregates
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24054'>SPARK-24054</a>] -         Add array_position / element_at functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24069'>SPARK-24069</a>] -         Add array_max / array_min functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24070'>SPARK-24070</a>] -         TPC-DS Performance Tests for Parquet 1.10.0 Upgrade
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24071'>SPARK-24071</a>] -         Micro-benchmark of Parquet Filter Pushdown
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24073'>SPARK-24073</a>] -         DataSourceV2: Rename DataReaderFactory to InputPartition.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24115'>SPARK-24115</a>] -         improve instrumentation for spark.ml.tuning
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24119'>SPARK-24119</a>] -         Add interpreted execution to SortPrefix expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24132'>SPARK-24132</a>] -         Instrumentation improvement for classification
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24146'>SPARK-24146</a>] -         spark.ml parity for sequential pattern mining - PrefixSpan: Python API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24155'>SPARK-24155</a>] -         Instrumentation improvement for clustering
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24157'>SPARK-24157</a>] -         Enable no-data micro batches for streaming aggregation and deduplication
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24158'>SPARK-24158</a>] -         Enable no-data micro batches for streaming joins
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24159'>SPARK-24159</a>] -         Enable no-data micro batches for streaming mapGroupsWithState
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24185'>SPARK-24185</a>] -         add  flatten function
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24186'>SPARK-24186</a>] -         add array_reverse and concat 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24187'>SPARK-24187</a>] -         add array_join
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24197'>SPARK-24197</a>] -         add array_sort function
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24198'>SPARK-24198</a>] -         add slice function
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24234'>SPARK-24234</a>] -         create the bottom-of-task RDD with row buffer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24235'>SPARK-24235</a>] -         create the top-of-task RDD sending rows to the remote buffer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24251'>SPARK-24251</a>] -         DataSourceV2: Add AppendData logical operation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24290'>SPARK-24290</a>] -         Instrumentation Improvement: add logNamedValue taking Array types
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24296'>SPARK-24296</a>] -         Support replicating blocks larger than 2 GB
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24297'>SPARK-24297</a>] -         Change default value for spark.maxRemoteBlockSizeFetchToMem to be &lt; 2GB
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24307'>SPARK-24307</a>] -         Support sending messages over 2GB from memory
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24310'>SPARK-24310</a>] -         Instrumentation for frequent pattern mining
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24324'>SPARK-24324</a>] -         Pandas Grouped Map UserDefinedFunction mixes column labels
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24325'>SPARK-24325</a>] -         Tests for Hadoop&#39;s LinesReader
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24331'>SPARK-24331</a>] -         Add arrays_overlap / array_repeat / map_entries  
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24334'>SPARK-24334</a>] -         Race condition in ArrowPythonRunner causes unclean shutdown of Arrow memory allocator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24386'>SPARK-24386</a>] -         implement continuous processing coalesce(1)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24418'>SPARK-24418</a>] -         Upgrade to Scala 2.11.12
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24419'>SPARK-24419</a>] -         Upgrade SBT to 0.13.17 with Scala 2.10.7
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24420'>SPARK-24420</a>] -         Upgrade ASM to 6.x to support JDK9+
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24439'>SPARK-24439</a>] -         Add distanceMeasure to BisectingKMeans in PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24478'>SPARK-24478</a>] -         DataSourceV2 should push filters and projection at physical plan conversion
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24535'>SPARK-24535</a>] -         Fix java version parsing in SparkR on Windows
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24537'>SPARK-24537</a>] -         Add array_remove / array_zip / map_from_arrays / array_distinct
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24549'>SPARK-24549</a>] -         Support DecimalType push down to the parquet data sources
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24624'>SPARK-24624</a>] -         Cannot mix vectorized and non-vectorized UDFs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24638'>SPARK-24638</a>] -         StringStartsWith support push down
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24706'>SPARK-24706</a>] -         Support ByteType and ShortType pushdown to parquet
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24716'>SPARK-24716</a>] -         Refactor ParquetFilters
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24718'>SPARK-24718</a>] -         Timestamp support pushdown to parquet data source
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24771'>SPARK-24771</a>] -         Upgrade AVRO version from 1.7.7 to 1.8.2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24772'>SPARK-24772</a>] -         support reading AVRO logical types - Date
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24773'>SPARK-24773</a>] -         support reading AVRO logical types - Timestamp with different precisions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24774'>SPARK-24774</a>] -         support reading AVRO logical types - Decimal
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24776'>SPARK-24776</a>] -         AVRO unit test: use SQLTestUtils and Replace deprecated methods
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24777'>SPARK-24777</a>] -         Add write benchmark for AVRO
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24800'>SPARK-24800</a>] -         Refactor Avro Serializer and Deserializer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24805'>SPARK-24805</a>] -         Don&#39;t ignore files without .avro extension by default
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24810'>SPARK-24810</a>] -         Fix paths to resource files in AvroSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24811'>SPARK-24811</a>] -         Add function `from_avro` and `to_avro`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24836'>SPARK-24836</a>] -         New option - ignoreExtension
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24854'>SPARK-24854</a>] -         Gather all options into AvroOptions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24876'>SPARK-24876</a>] -         Simplify schema serialization
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24881'>SPARK-24881</a>] -         New options - compression and compressionLevel
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24883'>SPARK-24883</a>] -         Remove implicit class AvroDataFrameWriter/AvroDataFrameReader
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24887'>SPARK-24887</a>] -         Use SerializableConfiguration in Spark util
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24924'>SPARK-24924</a>] -         Add mapping for built-in Avro data source
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24967'>SPARK-24967</a>] -         Use internal.Logging instead for logging
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24971'>SPARK-24971</a>] -         remove SupportsDeprecatedScanRow
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24976'>SPARK-24976</a>] -         Allow None for Decimal type conversion (specific to PyArrow 0.9.0)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24990'>SPARK-24990</a>] -         merge ReadSupport and ReadSupportWithSchema
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24991'>SPARK-24991</a>] -         use InternalRow in DataSourceWriter
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25002'>SPARK-25002</a>] -         Avro: revise the output record namespace
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25007'>SPARK-25007</a>] -         Add array_intersect / array_except / array_union / array_shuffle to SparkR
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25029'>SPARK-25029</a>] -         Scala 2.12 issues: TaskNotSerializable and Janino &quot;Two non-abstract methods ...&quot; errors
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25044'>SPARK-25044</a>] -         Address translation of LMF closure primitive args to Object in Scala 2.12
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25047'>SPARK-25047</a>] -         Can&#39;t assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25068'>SPARK-25068</a>] -         High-order function: exists(array&lt;T&gt;, function&lt;T, boolean&gt;) → boolean
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25099'>SPARK-25099</a>] -         Generate Avro Binary files in test suite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25104'>SPARK-25104</a>] -         Validate user specified output schema
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25127'>SPARK-25127</a>] -         DataSourceV2: Remove SupportsPushDownCatalystFilters
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25133'>SPARK-25133</a>] -         Documentation: AVRO data source guide
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25160'>SPARK-25160</a>] -         Remove sql configuration spark.sql.avro.outputTimestampType
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25179'>SPARK-25179</a>] -         Document the features that require Pyarrow 0.10
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25207'>SPARK-25207</a>] -         Case-insensitive field resolution for filter pushdown when reading Parquet
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25256'>SPARK-25256</a>] -         Plan mismatch errors in Hive tests in 2.12
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25298'>SPARK-25298</a>] -         spark-tools build failure for Scala 2.12
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25304'>SPARK-25304</a>] -         enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25320'>SPARK-25320</a>] -         ML, Graph 2.4 QA: API: Binary incompatible changes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25321'>SPARK-25321</a>] -         ML, Graph 2.4 QA: API: New Scala APIs, docs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25324'>SPARK-25324</a>] -         ML 2.4 QA: API: Java compatibility, docs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25328'>SPARK-25328</a>] -         Add an example for having two columns as the grouping key in group aggregate pandas UDF
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25337'>SPARK-25337</a>] -         HiveExternalCatalogVersionsSuite + Scala 2.12 = NoSuchMethodError: org.apache.spark.sql.execution.datasources.FileFormat.$init$(Lorg/apache/spark/sql/execution/datasources/FileFormat;)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25460'>SPARK-25460</a>] -         DataSourceV2: Structured Streaming does not respect SessionConfigSupport
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25572'>SPARK-25572</a>] -         SparkR tests failed on CRAN on Java 10
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25601'>SPARK-25601</a>] -         Register Grouped aggregate UDF Vectorized UDFs for SQL Statement
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25690'>SPARK-25690</a>] -         Analyzer rule &quot;HandleNullInputsForUDF&quot; does not stabilize and can be applied infinitely
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25718'>SPARK-25718</a>] -         Detect recursive reference in Avro schema and throw exception
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25842'>SPARK-25842</a>] -         Deprecate APIs introduced in SPARK-21608
</li>
</ul>
            
<h2>        Bug
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-6951'>SPARK-6951</a>] -         History server slow startup if the event log directory is large
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-10878'>SPARK-10878</a>] -         Race condition when resolving Maven coordinates via Ivy
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-15125'>SPARK-15125</a>] -         CSV data source recognizes empty quoted strings in the input as null. 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-15750'>SPARK-15750</a>] -         Constructing FPGrowth fails when no numPartitions specified in pyspark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16451'>SPARK-16451</a>] -         Spark-shell / pyspark should finish gracefully when &quot;SaslException: GSS initiate failed&quot; is hit
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17088'>SPARK-17088</a>] -         IsolatedClientLoader fails to load Hive client when sharesHadoopClasses is false
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17147'>SPARK-17147</a>] -         Spark Streaming Kafka 0.10 Consumer Can&#39;t Handle Non-consecutive Offsets (i.e. Log Compaction)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17166'>SPARK-17166</a>] -         CTAS lost table properties after conversion to data source tables.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17756'>SPARK-17756</a>] -         java.lang.ClassCastException when using cartesian with DStream.transform
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17916'>SPARK-17916</a>] -         CSV data source treats empty string as null no matter what nullValue option is
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18371'>SPARK-18371</a>] -         Spark Streaming backpressure bug - generates a batch with large number of records
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18630'>SPARK-18630</a>] -         PySpark ML memory leak
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19181'>SPARK-19181</a>] -         SparkListenerSuite.local metrics fails when average executorDeserializeTime is too short.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19185'>SPARK-19185</a>] -         ConcurrentModificationExceptions with CachedKafkaConsumers when Windowing
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19613'>SPARK-19613</a>] -         Flaky test: StateStoreRDDSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20947'>SPARK-20947</a>] -         Encoding/decoding issue in PySpark pipe implementation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21168'>SPARK-21168</a>] -         KafkaRDD should always set kafka clientId.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21402'>SPARK-21402</a>] -         Fix java array of structs deserialization
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21479'>SPARK-21479</a>] -         Outer join filter pushdown in null supplying table when condition is on one of the joined columns
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21525'>SPARK-21525</a>] -         ReceiverSupervisorImpl seems to ignore the error code when writing to the WAL
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21673'>SPARK-21673</a>] -         Spark local directory is not set correctly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21685'>SPARK-21685</a>] -         Params isSet in scala Transformer triggered by _setDefault in pyspark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21743'>SPARK-21743</a>] -         top-most limit should not cause memory leak
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21811'>SPARK-21811</a>] -         Inconsistency when finding the widest common type of a combination of DateType, StringType, and NumericType
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21896'>SPARK-21896</a>] -         Stack Overflow when window function nested inside aggregate function
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21945'>SPARK-21945</a>] -         pyspark --py-files doesn&#39;t work in yarn client mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22151'>SPARK-22151</a>] -         PYTHONPATH not picked up from the spark.yarn.appMasterEnv properly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22279'>SPARK-22279</a>] -         Turn on spark.sql.hive.convertMetastoreOrc by default
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22297'>SPARK-22297</a>] -         Flaky test: BlockManagerSuite &quot;Shuffle registration timeout and maxAttempts conf&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22357'>SPARK-22357</a>] -         SparkContext.binaryFiles ignores minPartitions parameter
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22371'>SPARK-22371</a>] -         dag-scheduler-event-loop thread stopped with error  Attempted to access garbage collected accumulator 5605982
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22384'>SPARK-22384</a>] -         Refine partition pruning when attribute is wrapped in Cast
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22430'>SPARK-22430</a>] -         Unknown tag warnings when building R docs with Roxygen 6.0.1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22577'>SPARK-22577</a>] -         executor page blacklist status should update with TaskSet level blacklisting
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22606'>SPARK-22606</a>] -         Two or more tasks in one executor may use the same Kafka consumer at the same time, which throws an exception: &quot;KafkaConsumer is not safe for multi-threaded access&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22676'>SPARK-22676</a>] -         Avoid iterating all partition paths when spark.sql.hive.verifyPartitionPath=true
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22713'>SPARK-22713</a>] -         OOM caused by the memory contention and memory leak in TaskMemoryManager
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22809'>SPARK-22809</a>] -         pyspark is sensitive to imports with dots
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22949'>SPARK-22949</a>] -         Reduce memory requirement for TrainValidationSplit
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22968'>SPARK-22968</a>] -         java.lang.IllegalStateException: No current assignment for partition kssh-2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22974'>SPARK-22974</a>] -         CountVectorizerModel does not attach attributes to output column
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23004'>SPARK-23004</a>] -         Structured Streaming raises &quot;IllegalStateException: Cannot remove after already committed or aborted&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23007'>SPARK-23007</a>] -         Add schema evolution test suite for file-based data sources
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23020'>SPARK-23020</a>] -         Re-enable Flaky Test: org.apache.spark.launcher.SparkLauncherSuite.testInProcessLauncher
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23028'>SPARK-23028</a>] -         Bump master branch version to 2.4.0-SNAPSHOT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23038'>SPARK-23038</a>] -         Update docker/spark-test (JDK/OS)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23042'>SPARK-23042</a>] -         Use OneHotEncoderModel to encode labels in MultilayerPerceptronClassifier
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23044'>SPARK-23044</a>] -         merge script has bug when assigning jiras to non-contributors
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23059'>SPARK-23059</a>] -         Correct some improper with view related method usage
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23088'>SPARK-23088</a>] -         History server not showing incomplete/running applications
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23094'>SPARK-23094</a>] -         Json Readers choose wrong encoding when bad records are present and fail
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23152'>SPARK-23152</a>] -         Invalid guard condition in org.apache.spark.ml.classification.Classifier
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23173'>SPARK-23173</a>] -         from_json can produce nulls for fields which are marked as non-nullable
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23189'>SPARK-23189</a>] -         reflect stage level blacklisting on executor tab 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23200'>SPARK-23200</a>] -         Reset configuration when restarting from checkpoints
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23240'>SPARK-23240</a>] -         PythonWorkerFactory issues unhelpful message when pyspark.daemon produces bogus stdout
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23243'>SPARK-23243</a>] -         Shuffle+Repartition on an RDD could lead to incorrect answers
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23271'>SPARK-23271</a>] -         Parquet output contains only &quot;_SUCCESS&quot; file after empty DataFrame saving 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23288'>SPARK-23288</a>] -         Incorrect number of written records in structured streaming
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23291'>SPARK-23291</a>] -         SparkR: substr: In a SparkR DataFrame, the starting and ending position arguments in &quot;substr&quot; give wrong results when the position is greater than 1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23306'>SPARK-23306</a>] -         Race condition in TaskMemoryManager
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23340'>SPARK-23340</a>] -         Upgrade Apache ORC to 1.4.3
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23355'>SPARK-23355</a>] -         convertMetastore should not ignore table properties
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23361'>SPARK-23361</a>] -         Driver restart fails if it happens after 7 days from app submission
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23365'>SPARK-23365</a>] -         DynamicAllocation with failure in straggler task can lead to a hung spark job
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23377'>SPARK-23377</a>] -         Bucketizer with multiple columns persistence bug
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23394'>SPARK-23394</a>] -         Storage info&#39;s Cached Partitions doesn&#39;t consider the replications (but sc.getRDDStorageInfo does)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23405'>SPARK-23405</a>] -         The task hangs when a small table is left semi joined with a big table
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23406'>SPARK-23406</a>] -         Stream-stream self joins do not work
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23408'>SPARK-23408</a>] -         Flaky test: StreamingOuterJoinSuite.left outer early state exclusion on right
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23415'>SPARK-23415</a>] -         BufferHolderSparkSubmitSuite is flaky
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23416'>SPARK-23416</a>] -         Flaky test: KafkaSourceStressForDontFailOnDataLossSuite.stress test for failOnDataLoss=false
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23417'>SPARK-23417</a>] -         pyspark tests give wrong sbt instructions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23425'>SPARK-23425</a>] -         Load data for HDFS file path with wildcard usage is not working properly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23433'>SPARK-23433</a>] -         java.lang.IllegalStateException: more than one active taskSet for stage
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23434'>SPARK-23434</a>] -         Spark should not warn `metadata directory` for a HDFS file path
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23436'>SPARK-23436</a>] -         Incorrect Date column Inference in partition discovery
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23438'>SPARK-23438</a>] -         DStreams could lose blocks with WAL enabled when driver crashes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23449'>SPARK-23449</a>] -         Extra java options lose order in Docker context
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23457'>SPARK-23457</a>] -         Register task completion listeners first for ParquetFileFormat
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23459'>SPARK-23459</a>] -         Improve the error message when unknown column is specified in partition columns
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23461'>SPARK-23461</a>] -         vignettes should include model predictions for some ML models
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23462'>SPARK-23462</a>] -         Improve the error message in `StructType`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23476'>SPARK-23476</a>] -         Spark will not start in local mode with authentication on
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23486'>SPARK-23486</a>] -         LookupFunctions should not check the same function name more than once
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23489'>SPARK-23489</a>] -         Flaky Test: HiveExternalCatalogVersionsSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23490'>SPARK-23490</a>] -         Check storage.locationUri with existing table in CreateTable 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23496'>SPARK-23496</a>] -         Locality of coalesced partitions can be severely skewed by the order of input partitions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23508'>SPARK-23508</a>] -         blockManagerIdCache in BlockManagerId may cause oom
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23514'>SPARK-23514</a>] -         Replace spark.sparkContext.hadoopConfiguration by spark.sessionState.newHadoopConf()
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23522'>SPARK-23522</a>] -         pyspark should always use sys.exit rather than exit
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23523'>SPARK-23523</a>] -         Incorrect result caused by the rule OptimizeMetadataOnlyQuery
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23524'>SPARK-23524</a>] -         Big local shuffle blocks should not be checked for corruption.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23525'>SPARK-23525</a>] -         ALTER TABLE CHANGE COLUMN COMMENT doesn&#39;t work for external hive table
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23547'>SPARK-23547</a>] -         Cleanup the .pipeout file when the Hive Session closed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23549'>SPARK-23549</a>] -         Spark SQL unexpected behavior when comparing timestamp to date
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23551'>SPARK-23551</a>] -         Exclude `hadoop-mapreduce-client-core` dependency from `orc-mapreduce`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23569'>SPARK-23569</a>] -         pandas_udf does not work with type-annotated python functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23570'>SPARK-23570</a>] -         Add Spark-2.3 in HiveExternalCatalogVersionsSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23574'>SPARK-23574</a>] -         SinglePartition in data source V2 scan
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23598'>SPARK-23598</a>] -         WholeStageCodegen can lead to IllegalAccessError  calling append for HashAggregateExec
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23599'>SPARK-23599</a>] -         The UUID() expression is too non-deterministic
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23602'>SPARK-23602</a>] -         PrintToStderr should behave the same in interpreted mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23608'>SPARK-23608</a>] -         SHS needs synchronization between attachSparkUI and detachSparkUI functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23614'>SPARK-23614</a>] -         Union produces incorrect results when caching is used
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23618'>SPARK-23618</a>] -         docker-image-tool.sh Fails While Building Image
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23620'>SPARK-23620</a>] -         Split thread dump lines by using the br tag
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23623'>SPARK-23623</a>] -         Avoid concurrent use of cached KafkaConsumer in CachedKafkaConsumer (kafka-0-10-sql)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23630'>SPARK-23630</a>] -         Spark-on-YARN missing user customizations of hadoop config
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23635'>SPARK-23635</a>] -         Spark executor env variable is overwritten by same name AM env variable
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23636'>SPARK-23636</a>] -         [SPARK 2.2] | Kafka Consumer | KafkaUtils.createRDD throws Exception - java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23637'>SPARK-23637</a>] -         YARN might allocate more resources if the same executor is killed multiple times.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23639'>SPARK-23639</a>] -         SparkSQL CLI fails to talk to Kerberized metastore when using a proxy user
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23640'>SPARK-23640</a>] -         Hadoop config may override spark config
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23649'>SPARK-23649</a>] -         CSV schema inferring fails on some UTF-8 chars
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23658'>SPARK-23658</a>] -         InProcessAppHandle uses the wrong class in getLogger
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23660'>SPARK-23660</a>] -         Yarn throws exception in cluster mode when the application is small
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23663'>SPARK-23663</a>] -         Spark Streaming Kafka 010 fails with &quot;java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23666'>SPARK-23666</a>] -         Non-deterministic column name with UDFs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23670'>SPARK-23670</a>] -         Memory leak of SparkPlanGraphWrapper in sparkUI
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23671'>SPARK-23671</a>] -         SHS is ignoring number of replay threads
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23679'>SPARK-23679</a>] -         uiWebUrl shows improper URL when running on YARN
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23680'>SPARK-23680</a>] -         entrypoint.sh does not accept arbitrary UIDs, returning as an error
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23682'>SPARK-23682</a>] -         Memory issue with Spark structured streaming
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23697'>SPARK-23697</a>] -         Accumulators of Spark 1.x no longer work with Spark 2.x
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23698'>SPARK-23698</a>] -         Spark code contains numerous undefined names in Python 3
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23729'>SPARK-23729</a>] -         Glob resolution breaks remote naming of files/archives
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23731'>SPARK-23731</a>] -         FileSourceScanExec throws NullPointerException in subexpression elimination
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23732'>SPARK-23732</a>] -         Broken link to scala source code in Spark Scala api Scaladoc
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23743'>SPARK-23743</a>] -         IsolatedClientLoader.isSharedClass returns an unintended result against `slf4j` keyword
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23754'>SPARK-23754</a>] -         StopIteration exception in Python UDF results in partial result
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23759'>SPARK-23759</a>] -         Unable to bind Spark UI to specific host name / IP
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23760'>SPARK-23760</a>] -         CodegenContext.withSubExprEliminationExprs should save/restore CSE state correctly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23775'>SPARK-23775</a>] -         Flaky test: DataFrameRangeSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23778'>SPARK-23778</a>] -         SparkContext.emptyRDD confuses SparkContext.union
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23780'>SPARK-23780</a>] -         Failed to use googleVis library with new SparkR
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23785'>SPARK-23785</a>] -         LauncherBackend doesn&#39;t check state of connection before setting state
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23786'>SPARK-23786</a>] -         CSV schema validation - column names are not checked
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23787'>SPARK-23787</a>] -         SparkSubmitSuite::&quot;download remote resource if it is not supported by yarn&quot; fails on Hadoop 2.9
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23788'>SPARK-23788</a>] -         Race condition in StreamingQuerySuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23794'>SPARK-23794</a>] -         UUID() should be stateful
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23799'>SPARK-23799</a>] -         [CBO] FilterEstimation.evaluateInSet produces division by zero in the case of an empty table with analyzed statistics
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23802'>SPARK-23802</a>] -         PropagateEmptyRelation can leave query plan in unresolved state
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23806'>SPARK-23806</a>] -         Broadcast.unpersist can cause fatal exception when used with dynamic allocation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23808'>SPARK-23808</a>] -         Test spark sessions should set default session
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23809'>SPARK-23809</a>] -         Active SparkSession should be set by getOrCreate
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23815'>SPARK-23815</a>] -         Spark writer dynamic partition overwrite mode fails to write output on multi level partition
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23816'>SPARK-23816</a>] -         FetchFailedException when killing speculative task
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23823'>SPARK-23823</a>] -         ResolveReferences loses correct origin
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23825'>SPARK-23825</a>] -         [K8s] Spark pods should request memory + memoryOverhead as resources
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23827'>SPARK-23827</a>] -         StreamingJoinExec should ensure that input data is partitioned into specific number of partitions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23829'>SPARK-23829</a>] -         spark-sql-kafka source in spark 2.3 causes reading stream failure frequently
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23834'>SPARK-23834</a>] -         Flaky test: LauncherServerSuite.testAppHandleDisconnect
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23835'>SPARK-23835</a>] -         When Dataset.as converts column from nullable to non-nullable type, null Doubles are converted silently to -1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23850'>SPARK-23850</a>] -         We should not redact username|user|url from UI by default
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23852'>SPARK-23852</a>] -         Parquet MR bug can lead to incorrect SQL results
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23853'>SPARK-23853</a>] -         Skip doctests which require hive support built in PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23857'>SPARK-23857</a>] -         In mesos cluster mode spark submit requires the keytab to be available on the local file system.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23868'>SPARK-23868</a>] -         Fix scala.MatchError in literals.sql.out 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23882'>SPARK-23882</a>] -         Is UTF8StringSuite.writeToOutputStreamUnderflow() supported?
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23888'>SPARK-23888</a>] -         Speculative task should not run on a host where another attempt is already running
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23893'>SPARK-23893</a>] -         Possible overflow in long = int * int
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23941'>SPARK-23941</a>] -         Mesos task failed on specific spark app name
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23951'>SPARK-23951</a>] -         Use Java classes in ExprValue and simplify a bunch of stuff
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23971'>SPARK-23971</a>] -         Should not leak Spark sessions across test suites
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23975'>SPARK-23975</a>] -         Allow Clustering to take Arrays of Double as input features
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23976'>SPARK-23976</a>] -         UTF8String.concat() or ByteArray.concat() may allocate shorter structure.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23986'>SPARK-23986</a>] -         CompileException when using too many avg aggregation after joining
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23989'>SPARK-23989</a>] -         When using `SortShuffleWriter`, the data will be overwritten
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23991'>SPARK-23991</a>] -         data loss when allocateBlocksToBatch
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23997'>SPARK-23997</a>] -         Configurable max number of buckets
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24002'>SPARK-24002</a>] -         Task not serializable caused by org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24007'>SPARK-24007</a>] -         EqualNullSafe for FloatType and DoubleType might generate a wrong result by codegen.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24012'>SPARK-24012</a>] -         Union of map and other compatible column
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24013'>SPARK-24013</a>] -         ApproximatePercentile grinds to a halt on sorted input.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24021'>SPARK-24021</a>] -         Fix bug in BlacklistTracker&#39;s updateBlacklistForFetchFailure
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24022'>SPARK-24022</a>] -         Flaky test: SparkContextSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24033'>SPARK-24033</a>] -         LAG Window function broken in Spark 2.3
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24043'>SPARK-24043</a>] -         InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24050'>SPARK-24050</a>] -         StreamingQuery does not calculate input / processing rates in some cases
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24056'>SPARK-24056</a>] -         Make consumer creation lazy in Kafka source for Structured streaming
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24061'>SPARK-24061</a>] -         [SS]TypedFilter is not supported in Continuous Processing
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24062'>SPARK-24062</a>] -         SASL encryption does not work in ThriftServer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24068'>SPARK-24068</a>] -         CSV schema inferring doesn&#39;t work for compressed files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24076'>SPARK-24076</a>] -         very bad performance when shuffle.partition = 8192
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24085'>SPARK-24085</a>] -         Scalar subquery error
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24104'>SPARK-24104</a>] -         SQLAppStatusListener overwrites metrics onDriverAccumUpdates instead of updating them
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24107'>SPARK-24107</a>] -         ChunkedByteBuffer.writeFully method has not reset the limit value
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24108'>SPARK-24108</a>] -         ChunkedByteBuffer.writeFully method has not reset the limit value
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24110'>SPARK-24110</a>] -         Avoid calling UGI loginUserFromKeytab in ThriftServer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24123'>SPARK-24123</a>] -         Fix a flaky test `DateTimeUtilsSuite.monthsBetween`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24133'>SPARK-24133</a>] -         Reading Parquet files containing large strings can fail with java.lang.ArrayIndexOutOfBoundsException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24137'>SPARK-24137</a>] -         [K8s] Mount temporary directories in emptydir volumes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24141'>SPARK-24141</a>] -         Fix bug in CoarseGrainedSchedulerBackend.killExecutors
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24143'>SPARK-24143</a>] -         Filter empty blocks when converting MapStatus to (blockId, size) pairs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24151'>SPARK-24151</a>] -         CURRENT_DATE, CURRENT_TIMESTAMP incorrectly resolved as column names when caseSensitive is enabled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24165'>SPARK-24165</a>] -         UDF within when().otherwise() raises NullPointerException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24166'>SPARK-24166</a>] -         InMemoryTableScanExec should not access SQLConf at executor side
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24167'>SPARK-24167</a>] -         ParquetFilters should not access SQLConf at executor side
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24168'>SPARK-24168</a>] -         WindowExec should not access SQLConf at executor side
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24169'>SPARK-24169</a>] -         JsonToStructs should not access SQLConf at executor side
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24190'>SPARK-24190</a>] -         lineSep shouldn&#39;t be required in JSON write
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24195'>SPARK-24195</a>] -         sc.addFile for local:/ path is broken
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24214'>SPARK-24214</a>] -         StreamingRelationV2/StreamingExecutionRelation/ContinuousExecutionRelation.toJSON should not fail
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24216'>SPARK-24216</a>] -         Spark TypedAggregateExpression uses getSimpleName that is not safe in scala
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24228'>SPARK-24228</a>] -         Fix the lint error
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24230'>SPARK-24230</a>] -         Parquet 1.10 upgrade has errors in the vectorized reader
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24241'>SPARK-24241</a>] -         Do not fail fast when dynamic resource allocation enabled with 0 executor
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24255'>SPARK-24255</a>] -         Require Java 8 in SparkR description
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24257'>SPARK-24257</a>] -         LongToUnsafeRowMap may calculate the new size incorrectly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24259'>SPARK-24259</a>] -         ArrayWriter for Arrow produces wrong output
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24263'>SPARK-24263</a>] -         SparkR java check breaks on openjdk
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24276'>SPARK-24276</a>] -         semanticHash() returns different values for semantically the same IS IN
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24294'>SPARK-24294</a>] -         Throw SparkException when OOM in BroadcastExchangeExec
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24300'>SPARK-24300</a>] -         generateLDAData in ml.cluster.LDASuite didn&#39;t set seed correctly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24309'>SPARK-24309</a>] -         AsyncEventQueue should handle an interrupt from a Listener
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24313'>SPARK-24313</a>] -         Collection functions interpreted execution doesn&#39;t work with complex types
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24319'>SPARK-24319</a>] -         run-example cannot print usage
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24322'>SPARK-24322</a>] -         Upgrade Apache ORC to 1.4.4
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24341'>SPARK-24341</a>] -         Codegen compile error from predicate subquery
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24348'>SPARK-24348</a>] -         scala.MatchError in the &quot;element_at&quot; expression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24350'>SPARK-24350</a>] -         ClassCastException in &quot;array_position&quot; function
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24351'>SPARK-24351</a>] -         offsetLog/commitLog purge thresholdBatchId should be computed with current committed epoch but not currentBatchId in CP mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24364'>SPARK-24364</a>] -         Files deletion after globbing may fail StructuredStreaming jobs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24368'>SPARK-24368</a>] -         Flaky tests: org.apache.spark.sql.execution.datasources.csv.UnivocityParserSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24369'>SPARK-24369</a>] -         A bug when having multiple distinct aggregations
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24373'>SPARK-24373</a>] -         &quot;df.cache() df.count()&quot; no longer eagerly caches data when the analyzed plans are different after re-analyzing the plans
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24377'>SPARK-24377</a>] -         Make --py-files work in non pyspark application
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24380'>SPARK-24380</a>] -         argument quoting/escaping broken in mesos cluster scheduler
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24384'>SPARK-24384</a>] -         spark-submit --py-files with .py files doesn&#39;t work in client mode before context initialization
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24385'>SPARK-24385</a>] -         Trivially-true EqualNullSafe should be handled like EqualTo in Dataset.join
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24391'>SPARK-24391</a>] -         from_json should support arrays of primitives, and more generally all JSON 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24414'>SPARK-24414</a>] -         Stages page doesn&#39;t show all task attempts when there are failures
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24415'>SPARK-24415</a>] -         Stage page aggregated executor metrics are wrong when there are failures
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24416'>SPARK-24416</a>] -         Update configuration definition for spark.blacklist.killBlacklistedExecutors
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24446'>SPARK-24446</a>] -         Library path with special characters breaks Spark on YARN
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24452'>SPARK-24452</a>] -         long = int*int or long = int+int may cause overflow.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24453'>SPARK-24453</a>] -         Fix error recovering from the failure in a no-data batch
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24466'>SPARK-24466</a>] -         TextSocketMicroBatchReader no longer works with nc utility
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24468'>SPARK-24468</a>] -         DecimalType `adjustPrecisionScale` might fail when scale is negative
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24488'>SPARK-24488</a>] -         Analyzer throws when generator is aliased multiple times
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24495'>SPARK-24495</a>] -         SortMergeJoin with duplicate keys wrong results
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24500'>SPARK-24500</a>] -         UnsupportedOperationException when trying to execute Union plan with Stream of children
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24506'>SPARK-24506</a>] -         Spark.ui.filters not applied to /sqlserver/ url
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24520'>SPARK-24520</a>] -         Double braces in link
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24526'>SPARK-24526</a>] -         Spaces in the build dir cause failures in the build/mvn script
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24530'>SPARK-24530</a>] -         Sphinx doesn&#39;t render autodoc_docstring_signature correctly (with Python 2?) and pyspark.ml docs are broken
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24531'>SPARK-24531</a>] -         HiveExternalCatalogVersionsSuite failing due to missing 2.2.0 version
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24536'>SPARK-24536</a>] -         Query with nonsensical LIMIT hits AssertionError
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24548'>SPARK-24548</a>] -         JavaPairRDD to Dataset&lt;Row&gt; in SPARK generates ambiguous results
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24552'>SPARK-24552</a>] -         Task attempt numbers are reused when stages are retried
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24553'>SPARK-24553</a>] -         Job UI redirect causing http 302 error
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24556'>SPARK-24556</a>] -         ReusedExchange should rewrite output partitioning also when child&#39;s partitioning is RangePartitioning
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24563'>SPARK-24563</a>] -         Allow running PySpark shell without Hive
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24569'>SPARK-24569</a>] -         Spark Aggregator with output type Option[Boolean] creates column of type Row
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24573'>SPARK-24573</a>] -         SBT Java checkstyle affecting the build
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24578'>SPARK-24578</a>] -         Reading remote cache block behavior changes and causes timeout issue
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24583'>SPARK-24583</a>] -         Wrong schema type in InsertIntoDataSourceCommand
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24588'>SPARK-24588</a>] -         StreamingSymmetricHashJoinExec should require HashClusteredPartitioning from children
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24589'>SPARK-24589</a>] -         OutputCommitCoordinator may allow duplicate commits
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24594'>SPARK-24594</a>] -         Introduce metrics for YARN executor allocation problems 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24598'>SPARK-24598</a>] -         Spark SQL: Datatype overflow conditions give incorrect result
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24603'>SPARK-24603</a>] -         Typo in comments
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24610'>SPARK-24610</a>] -         wholeTextFiles broken for small files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24613'>SPARK-24613</a>] -         Cache with UDF could not be matched with subsequent dependent caches
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24633'>SPARK-24633</a>] -         arrays_zip function&#39;s code generator splits input processing incorrectly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24645'>SPARK-24645</a>] -         Skip parsing when csvColumnPruning enabled and partitions scanned only
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24648'>SPARK-24648</a>] -         SQLMetrics counters are not thread safe
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24653'>SPARK-24653</a>] -         Flaky test &quot;JoinSuite.test SortMergeJoin (with spill)&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24659'>SPARK-24659</a>] -         GenericArrayData.equals should respect element type differences
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24660'>SPARK-24660</a>] -         SHS is not properly showing errors when downloading logs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24676'>SPARK-24676</a>] -         Project required data from parsed data when csvColumnPruning disabled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24677'>SPARK-24677</a>] -         TaskSetManager not updating successfulTaskDurations for old stage attempts
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24681'>SPARK-24681</a>] -         Cannot create a view from a table when a nested column name contains &#39;:&#39;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24694'>SPARK-24694</a>] -         Integration tests pass only one app argument
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24698'>SPARK-24698</a>] -         In Pyspark&#39;s ML, an Identifiable&#39;s UID has 20 random characters rather than the 12 mentioned in the documentation.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24699'>SPARK-24699</a>] -         Watermark / Append mode should work with Trigger.Once
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24704'>SPARK-24704</a>] -         The order of stages in the DAG graph is incorrect
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24705'>SPARK-24705</a>] -         Spark.sql.adaptive.enabled=true is enabled and self-join query
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24711'>SPARK-24711</a>] -         Integration tests will not work with exclude/include tags
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24713'>SPARK-24713</a>] -         AppMaster of Spark Streaming Kafka OOM if there are hundreds of topics consumed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24715'>SPARK-24715</a>] -         sbt build brings in the wrong jline version
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24717'>SPARK-24717</a>] -         Split out min retain version of state for memory in HDFSBackedStateStoreProvider
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24721'>SPARK-24721</a>] -         Failed to use PythonUDF with literal inputs in filter with data sources
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24734'>SPARK-24734</a>] -         Fix containsNull of Concat for array type.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24739'>SPARK-24739</a>] -         PySpark does not work with Python 3.7.0
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24742'>SPARK-24742</a>] -         Field Metadata raises NullPointerException in hashCode method
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24743'>SPARK-24743</a>] -         Update the JavaDirectKafkaWordCount example to support the new API of Kafka
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24749'>SPARK-24749</a>] -         Cannot filter array&lt;struct&gt; with named_struct
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24754'>SPARK-24754</a>] -         Minhash integer overflow
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24755'>SPARK-24755</a>] -         Executor loss can cause task to not be resubmitted
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24781'>SPARK-24781</a>] -         Using a reference from Dataset in Filter/Sort might not work.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24787'>SPARK-24787</a>] -         Events being dropped at an alarming rate due to hsync being slow for eventLogging
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24788'>SPARK-24788</a>] -         RelationalGroupedDataset.toString throws errors when grouping by UnresolvedAttribute
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24804'>SPARK-24804</a>] -         There are duplicate words in the title in the DatasetSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24809'>SPARK-24809</a>] -         Serializing LongHashedRelation in executor may result in data error
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24812'>SPARK-24812</a>] -         Last Access Time in the table description is not valid
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24813'>SPARK-24813</a>] -         HiveExternalCatalogVersionsSuite still flaky; fall back to Apache archive
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24829'>SPARK-24829</a>] -         In Spark Thrift Server, CAST AS FLOAT inconsistent with spark-shell or spark-sql 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24846'>SPARK-24846</a>] -         Stabilize expression canonicalization
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24850'>SPARK-24850</a>] -         Query plan string representation grows exponentially on queries with recursive cached datasets
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24870'>SPARK-24870</a>] -         Cache can&#39;t work normally if there are case letters in SQL
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24873'>SPARK-24873</a>] -         increase switch to shielding frequent interaction reports with yarn
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24878'>SPARK-24878</a>] -         Fix reverse function for array type of primitive type containing null.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24879'>SPARK-24879</a>] -         NPE in Hive partition filter pushdown for `partCol IN (NULL, ....)`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24880'>SPARK-24880</a>] -         Fix the group id for spark-kubernetes-integration-tests
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24889'>SPARK-24889</a>] -         dataset.unpersist() doesn&#39;t update storage memory stats
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24891'>SPARK-24891</a>] -         Fix HandleNullInputsForUDF rule
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24895'>SPARK-24895</a>] -         Spark 2.4.0 Snapshot artifacts have broken metadata due to mismatched filenames
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24896'>SPARK-24896</a>] -         Uuid expression should produce different values in each execution under streaming query
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24908'>SPARK-24908</a>] -         [R] remove spaces to make lintr happy
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24909'>SPARK-24909</a>] -         Spark scheduler can hang when fetch failures, executor lost, task running on lost executor, and multiple stage attempts
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24911'>SPARK-24911</a>] -         SHOW CREATE TABLE drops escaping of nested column names
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24919'>SPARK-24919</a>] -         Scala linter rule for sparkContext.hadoopConfiguration
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24927'>SPARK-24927</a>] -         The hadoop-provided profile doesn&#39;t play well with Snappy-compressed Parquet files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24934'>SPARK-24934</a>] -         Complex type and binary type in in-memory partition pruning does not work due to missing upper/lower bounds cases
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24937'>SPARK-24937</a>] -         Datasource partition table should load empty static partitions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24948'>SPARK-24948</a>] -         SHS wrongly filters some applications due to permission check
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24950'>SPARK-24950</a>] -         scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24957'>SPARK-24957</a>] -         Decimal arithmetic can lead to wrong values using codegen
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24963'>SPARK-24963</a>] -         Integration tests will fail if they run in a namespace not being the default
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24966'>SPARK-24966</a>] -         Fix the precedence rule for set operations.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24972'>SPARK-24972</a>] -         PivotFirst could not handle pivot columns of complex types
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24981'>SPARK-24981</a>] -         ShutdownHook timeout causes job to fail when succeeded when SparkContext stop() not called by user program
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24987'>SPARK-24987</a>] -         Kafka Cached Consumer Leaking File Descriptors
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24997'>SPARK-24997</a>] -         Support MINUS ALL
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25004'>SPARK-25004</a>] -         Add spark.executor.pyspark.memory config to set resource.RLIMIT_AS
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25005'>SPARK-25005</a>] -         Structured streaming doesn&#39;t support kafka transaction (creating empty offset with abort &amp; markers)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25009'>SPARK-25009</a>] -         Standalone Cluster mode application submit is not working
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25010'>SPARK-25010</a>] -         Rand/Randn should produce different values for each execution in streaming query
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25011'>SPARK-25011</a>] -         Add PrefixSpan to __all__ in fpm.py
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25019'>SPARK-25019</a>] -         The published spark sql pom does not exclude the normal version of orc-core 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25021'>SPARK-25021</a>] -         Add spark.executor.pyspark.memory support to Kubernetes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25028'>SPARK-25028</a>] -         AnalyzePartitionCommand failed with NPE if value is null
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25031'>SPARK-25031</a>] -         The schema of MapType can not be printed correctly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25033'>SPARK-25033</a>] -         Bump Apache commons.{httpclient, httpcore}
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25036'>SPARK-25036</a>] -         Scala 2.12 issues: Compilation error with sbt
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25041'>SPARK-25041</a>] -         genjavadoc-plugin_0.10 is not found with sbt in scala-2.12
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25046'>SPARK-25046</a>] -         ALTER VIEW can execute SQL like &quot;ALTER VIEW ... AS INSERT INTO&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25058'>SPARK-25058</a>] -         Use Block.isEmpty/nonEmpty to check whether the code is empty or not.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25072'>SPARK-25072</a>] -         PySpark custom Row class can be given extra parameters
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25076'>SPARK-25076</a>] -         SQLConf should not be retrieved from a stopped SparkSession
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25081'>SPARK-25081</a>] -         Nested spill in ShuffleExternalSorter may access a released memory page 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25084'>SPARK-25084</a>] -         &quot;distribute by&quot; on multiple columns may lead to codegen issue
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25090'>SPARK-25090</a>] -         java.lang.ClassCastException when using a CrossValidator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25092'>SPARK-25092</a>] -         Add RewriteExceptAll, RewriteIntersectAll and RewriteCorrelatedScalarSubquery in the list of nonExcludableRules
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25096'>SPARK-25096</a>] -         Loosen nullability if the cast is force-nullable.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25114'>SPARK-25114</a>] -         RecordBinaryComparator may return wrong result when subtraction between two words is divisible by Integer.MAX_VALUE
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25116'>SPARK-25116</a>] -         Fix the &quot;exit code 1&quot; error when terminating Kafka tests
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25124'>SPARK-25124</a>] -         VectorSizeHint.size is buggy, breaking streaming pipeline
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25126'>SPARK-25126</a>] -         avoid creating OrcFile.Reader for all orc files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25132'>SPARK-25132</a>] -         Case-insensitive field resolution when reading from Parquet
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25134'>SPARK-25134</a>] -         Csv column pruning with checking of headers throws incorrect error
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25137'>SPARK-25137</a>] -         `NumberFormatException` when starting spark-shell from Mac terminal
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25149'>SPARK-25149</a>] -         Personalized PageRank raises an error if vertexIDs are &gt; MaxInt
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25159'>SPARK-25159</a>] -         json schema inference should only trigger one job
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25161'>SPARK-25161</a>] -         Fix several bugs in failure handling of barrier execution mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25163'>SPARK-25163</a>] -         Flaky test: o.a.s.util.collection.ExternalAppendOnlyMapSuite.spilling with compression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25164'>SPARK-25164</a>] -         Parquet reader builds entire list of columns once for each column
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25167'>SPARK-25167</a>] -         Minor fixes for R sql tests (tests that fail in development environment)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25174'>SPARK-25174</a>] -         ApplicationMaster suspends when unregistering itself from RM with extreme large diagnostic message
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25175'>SPARK-25175</a>] -         Field resolution should fail if there&#39;s ambiguity for ORC native reader
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25176'>SPARK-25176</a>] -         Kryo fails to serialize a parametrised type hierarchy
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25181'>SPARK-25181</a>] -         Block Manager master and slave thread pools are unbounded
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25183'>SPARK-25183</a>] -         Spark HiveServer2 registers shutdown hook with JVM, not ShutdownHookManager; race conditions can arise
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25204'>SPARK-25204</a>] -         rate source test is flaky
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25205'>SPARK-25205</a>] -         typo in spark.network.crypto.keyFactoryIteration
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25206'>SPARK-25206</a>] -         wrong records are returned when Hive metastore schema and parquet schema are in different letter cases
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25214'>SPARK-25214</a>] -         Kafka v2 source may return duplicated records when `failOnDataLoss` is `false`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25218'>SPARK-25218</a>] -         Potential resource leaks in TransportServer and SocketAuthHelper
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25221'>SPARK-25221</a>] -         [DEPLOY] Consistent trailing whitespace treatment of conf values
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25231'>SPARK-25231</a>] -         Running a Large Job with Speculation On Causes Executor Heartbeats to Time Out on Driver
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25237'>SPARK-25237</a>] -         FileScanRdd&#39;s inputMetrics is wrong  when select the datasource table with limit
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25240'>SPARK-25240</a>] -         A deadlock in ALTER TABLE RECOVER PARTITIONS
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25264'>SPARK-25264</a>] -         Fix comma-delineated arguments passed into PythonRunner and RRunner
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25266'>SPARK-25266</a>] -         Fix memory leak in Barrier Execution Mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25268'>SPARK-25268</a>] -         runParallelPersonalizedPageRank throws serialization Exception
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25278'>SPARK-25278</a>] -         Number of output rows metric of union of views is multiplied by their occurrences
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25283'>SPARK-25283</a>] -         A deadlock in UnionRDD
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25288'>SPARK-25288</a>] -         Kafka transaction tests are flaky
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25289'>SPARK-25289</a>] -         ChiSqSelector max on empty collection
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25291'>SPARK-25291</a>] -         Flakiness of tests in terms of executor memory (SecretsTestSuite)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25295'>SPARK-25295</a>] -         Pod names conflicts in client mode, if previous submission was not a clean shutdown.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25306'>SPARK-25306</a>] -         Avoid skewed filter trees to speed up `createFilter` in ORC
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25307'>SPARK-25307</a>] -         ArraySort function may return an error in the code generation phase.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25308'>SPARK-25308</a>] -         ArrayContains function may return an error in the code generation phase.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25310'>SPARK-25310</a>] -         ArraysOverlap may throw a CompileException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25313'>SPARK-25313</a>] -         Fix regression in FileFormatWriter output schema
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25314'>SPARK-25314</a>] -         Invalid PythonUDF - requires attributes from more than one child - in &quot;on&quot; join condition
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25317'>SPARK-25317</a>] -         MemoryBlock performance regression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25330'>SPARK-25330</a>] -         Permission issue after upgrade hadoop version to 2.7.7
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25352'>SPARK-25352</a>] -         Perform ordered global limit when limit number is bigger than topKSortFallbackThreshold
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25357'>SPARK-25357</a>] -         Add metadata to SparkPlanInfo to dump more information like file path to event log
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25363'>SPARK-25363</a>] -         Schema pruning doesn&#39;t work if nested column is used in where clause
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25368'>SPARK-25368</a>] -         Incorrect constraint inference returns wrong result
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25371'>SPARK-25371</a>] -         Vector Assembler with no input columns leads to opaque error
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25387'>SPARK-25387</a>] -         Malformed CSV causes NPE
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25389'>SPARK-25389</a>] -         INSERT OVERWRITE DIRECTORY STORED AS should prevent duplicate fields
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25398'>SPARK-25398</a>] -         Minor bugs from comparing unrelated types
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25399'>SPARK-25399</a>] -         Reusing execution threads from continuous processing for microbatch streaming can result in correctness issues
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25402'>SPARK-25402</a>] -         Null handling in BooleanSimplification
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25406'>SPARK-25406</a>] -         Incorrect usage of withSQLConf method in Parquet schema pruning test suite masks failing tests
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25416'>SPARK-25416</a>] -         ArrayPosition function may return incorrect result when right expression is implicitly downcasted.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25417'>SPARK-25417</a>] -         ArrayContains function may return incorrect result when right expression is implicitly down casted
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25425'>SPARK-25425</a>] -         Extra options must overwrite sessions options
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25427'>SPARK-25427</a>] -         Add BloomFilter creation test cases
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25431'>SPARK-25431</a>] -         Fix function examples and unify the format of the example results.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25438'>SPARK-25438</a>] -         Fix FilterPushdownBenchmark to use the same memory assumption
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25439'>SPARK-25439</a>] -         TPCHQuerySuite customer.c_nationkey should be bigint instead of string
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25443'>SPARK-25443</a>] -         fix issues when building docs with release scripts in docker
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25450'>SPARK-25450</a>] -         PushProjectThroughUnion rule uses the same exprId for project expressions in each Union child, causing mistakes in constant propagation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25471'>SPARK-25471</a>] -         Fix tests for Python 3.6 with Pandas 0.23+
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25495'>SPARK-25495</a>] -         FetchedData.reset doesn&#39;t reset _nextOffsetInFetchedData and _offsetAfterPoll
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25502'>SPARK-25502</a>] -         [Spark Job History] Empty Page when page number exceeds the retainedTask size
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25503'>SPARK-25503</a>] -         [Spark Job History] Total task message in stage page is ambiguous
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25505'>SPARK-25505</a>] -         The output order of grouping columns in Pivot is different from the input order
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25509'>SPARK-25509</a>] -         SHS V2 cannot be enabled on Windows because POSIX permissions are not supported.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25519'>SPARK-25519</a>] -         ArrayRemove function may return incorrect result when right expression is implicitly downcasted.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25521'>SPARK-25521</a>] -         Job id showing null when insert into command Job is finished.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25522'>SPARK-25522</a>] -          Improve type promotion for input arguments of elementAt function
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25533'>SPARK-25533</a>] -         Inconsistent message for Completed Jobs in the  JobUI, when there are failed jobs, compared to spark2.2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25536'>SPARK-25536</a>] -         executorSource.METRIC read wrong record in Executor.scala Line444
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25538'>SPARK-25538</a>] -         incorrect row counts after distinct()
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25542'>SPARK-25542</a>] -         Flaky test: OpenHashMapSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25543'>SPARK-25543</a>] -         Confusing log messages at DEBUG level, in K8s mode.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25546'>SPARK-25546</a>] -         RDDInfo uses SparkEnv before it may have been initialized
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25568'>SPARK-25568</a>] -         Continue to update the remaining accumulators when failing to update one accumulator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25570'>SPARK-25570</a>] -         Replace 2.3.1 with 2.3.2 in HiveExternalCatalogVersionsSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25578'>SPARK-25578</a>] -         Update to Scala 2.12.7
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25579'>SPARK-25579</a>] -         Use quoted attribute names if needed in pushed ORC predicates
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25591'>SPARK-25591</a>] -         PySpark Accumulators with multiple PythonUDFs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25602'>SPARK-25602</a>] -         SparkPlan.getByteArrayRdd should not consume the input when not necessary
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25636'>SPARK-25636</a>] -         spark-submit swallows the failure reason when there is an error connecting to master
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25644'>SPARK-25644</a>] -         Fix java foreachBatch API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25646'>SPARK-25646</a>] -         docker-image-tool.sh doesn&#39;t work on developer build
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25660'>SPARK-25660</a>] -         Impossible to use the backward slash as the CSV fields delimiter 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25669'>SPARK-25669</a>] -         Check CSV header only when it exists
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25671'>SPARK-25671</a>] -         Build external/spark-ganglia-lgpl in Jenkins Test 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25674'>SPARK-25674</a>] -         If the records are incremented by more than 1 at a time, the number of bytes might rarely ever get updated
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25677'>SPARK-25677</a>] -         Configuring zstd compression in JDBC throws IllegalArgumentException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25697'>SPARK-25697</a>] -         With zstd compression enabled, an in-progress application throws an error in the UI
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25704'>SPARK-25704</a>] -         Replication of &gt; 2GB block fails due to bad config default
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25708'>SPARK-25708</a>] -         HAVING without GROUP BY means global aggregate
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25714'>SPARK-25714</a>] -         Null Handling in the Optimizer rule BooleanSimplification
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25726'>SPARK-25726</a>] -         Flaky test: SaveIntoDataSourceCommandSuite.`simpleString is redacted`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25727'>SPARK-25727</a>] -         makeCopy failed in InMemoryRelation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25738'>SPARK-25738</a>] -         LOAD DATA INPATH doesn&#39;t work if hdfs conf includes port
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25741'>SPARK-25741</a>] -         Long URLs are not rendered properly in web UI
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25768'>SPARK-25768</a>] -         Constant argument expecting Hive UDAFs doesn&#39;t work
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25793'>SPARK-25793</a>] -         Loading model bug in BisectingKMeans
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25795'>SPARK-25795</a>] -         Fix CSV SparkR SQL Example
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25797'>SPARK-25797</a>] -         Views created via 2.1 cannot be read via 2.2+
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25799'>SPARK-25799</a>] -         DataSourceApiV2 scan reuse does not respect options
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25801'>SPARK-25801</a>] -         pandas_udf grouped_map fails with input dataframe with more than 255 columns
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25803'>SPARK-25803</a>] -         The -n option to docker-image-tool.sh causes other options to be ignored
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25816'>SPARK-25816</a>] -         Functions does not resolve Columns correctly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25822'>SPARK-25822</a>] -         Fix a race condition when releasing a Python worker
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25832'>SPARK-25832</a>] -         remove newly added map related functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25835'>SPARK-25835</a>] -         Propagate scala 2.12 profile in k8s integration tests
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25840'>SPARK-25840</a>] -         `make-distribution.sh` should not fail due to missing LICENSE-binary
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25854'>SPARK-25854</a>] -         mvn helper script always exits w/1, causing mvn builds to fail
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26612'>SPARK-26612</a>] -         Speculation kill causing finished stage recomputed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26614'>SPARK-26614</a>] -         Speculation kill might cause job failure
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26802'>SPARK-26802</a>] -         CVE-2018-11760: Apache Spark local privilege escalation vulnerability
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-28626'>SPARK-28626</a>] -         Spark leaves unencrypted data on local disk, even with encryption turned on (CVE-2019-10099)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-34381'>SPARK-34381</a>] -         c
</li>
</ul>
    
<h2>        Epic
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24374'>SPARK-24374</a>] -         SPIP: Support Barrier Execution Mode in Apache Spark
</li>
</ul>
    
<h2>        Story
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24124'>SPARK-24124</a>] -         Spark history server should create spark.history.store.path and set permissions properly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24852'>SPARK-24852</a>] -         Have spark.ml training use updated `Instrumentation` APIs.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25234'>SPARK-25234</a>] -         SparkR:::parallelize doesn&#39;t handle integer overflow properly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25248'>SPARK-25248</a>] -         Audit barrier APIs for Spark 2.4
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25345'>SPARK-25345</a>] -         Deprecate readImages APIs from ImageSchema
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25347'>SPARK-25347</a>] -         Document image data source in doc site
</li>
</ul>
    
<h2>        New Feature
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-10697'>SPARK-10697</a>] -         Lift Calculation in Association Rule mining
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14682'>SPARK-14682</a>] -         Provide evaluateEachIteration method or equivalent for spark.ml GBTs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-15064'>SPARK-15064</a>] -         Locale support in StopWordsRemover
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-15784'>SPARK-15784</a>] -         Add Power Iteration Clustering to spark.ml
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19480'>SPARK-19480</a>] -         Higher order functions in SQL
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21274'>SPARK-21274</a>] -         Implement EXCEPT ALL and INTERSECT ALL
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22119'>SPARK-22119</a>] -         Add cosine distance to KMeans
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22880'>SPARK-22880</a>] -         Add option to cascade jdbc truncate if database supports this (PostgreSQL and Oracle)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23010'>SPARK-23010</a>] -         Add integration testing for Kubernetes backend into the apache/spark repository
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23146'>SPARK-23146</a>] -         Support client mode for Kubernetes cluster backend
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23235'>SPARK-23235</a>] -         Add executor Threaddump to api
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23541'>SPARK-23541</a>] -         Allow Kafka source to read data with greater parallelism than the number of topic-partitions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23751'>SPARK-23751</a>] -         Kolmogorov-Smirnov test Python API in pyspark.ml
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23846'>SPARK-23846</a>] -         samplingRatio for schema inferring of CSV datasource
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23856'>SPARK-23856</a>] -         Spark jdbc setQueryTimeout option
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23948'>SPARK-23948</a>] -         Trigger mapstage&#39;s job listener in submitMissingTasks
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23984'>SPARK-23984</a>] -         PySpark Bindings for K8S
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24027'>SPARK-24027</a>] -         Support MapType(StringType, DataType) as root type by from_json
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24193'>SPARK-24193</a>] -         Sort by disk when the limit is large in TakeOrderedAndProjectExec
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24231'>SPARK-24231</a>] -         Python API: Provide evaluateEachIteration method or equivalent for spark.ml GBTs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24232'>SPARK-24232</a>] -         Allow referring to kubernetes secrets as env variable
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24288'>SPARK-24288</a>] -         Enable preventing predicate pushdown
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24371'>SPARK-24371</a>] -         Added isInCollection in DataFrame API for Scala and Java.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24372'>SPARK-24372</a>] -         Create script for preparing RCs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24396'>SPARK-24396</a>] -         Add Structured Streaming ForeachWriter for python
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24397'>SPARK-24397</a>] -         Add TaskContext.getLocalProperties in Python
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24411'>SPARK-24411</a>] -         Adding native Java tests for `isInCollection`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24412'>SPARK-24412</a>] -         Adding docs about automagical type casting in `isin` and `isInCollection` APIs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24433'>SPARK-24433</a>] -         R Bindings for K8S
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24435'>SPARK-24435</a>] -         Support user-supplied YAML that can be merged with k8s pod descriptions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24465'>SPARK-24465</a>] -         LSHModel should support Structured Streaming for transform
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24479'>SPARK-24479</a>] -         Register StreamingQueryListener in Spark Conf 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24499'>SPARK-24499</a>] -         Split the page of sql-programming-guide.html to multiple separate pages
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24542'>SPARK-24542</a>] -         Hive UDF series UDFXPathXXXX allow users to pass carefully crafted XML to access arbitrary files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24662'>SPARK-24662</a>] -         Structured Streaming should support LIMIT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24730'>SPARK-24730</a>] -         Add policy to choose max as global watermark when streaming query has multiple watermarks
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24768'>SPARK-24768</a>] -         Have a built-in AVRO data source implementation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24795'>SPARK-24795</a>] -         Implement barrier execution mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24802'>SPARK-24802</a>] -         Optimization Rule Exclusion
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24817'>SPARK-24817</a>] -         Implement BarrierTaskContext.barrier()
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24819'>SPARK-24819</a>] -         Fail fast when there are not enough slots to launch the barrier stage on job submission
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24820'>SPARK-24820</a>] -         Fail fast when submitted job contains PartitionPruningRDD in a barrier stage
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24821'>SPARK-24821</a>] -         Fail fast when submitted job computes on a subset of all the partitions for a barrier stage
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24822'>SPARK-24822</a>] -         Python support for barrier execution mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24918'>SPARK-24918</a>] -         Executor Plugin API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25468'>SPARK-25468</a>] -         Highlight current page index in the history server
</li>
</ul>
    
<h2>        Improvement
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-3159'>SPARK-3159</a>] -         Check for reducible DecisionTree
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-4502'>SPARK-4502</a>] -         Spark SQL reads unnecessary nested fields from Parquet
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-7132'>SPARK-7132</a>] -         Add fit with validation set to spark.ml GBT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-9312'>SPARK-9312</a>] -         The OneVsRest model does not provide rawPrediction
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-11630'>SPARK-11630</a>] -         ClosureCleaner incorrectly warns for class based closures
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-13343'>SPARK-13343</a>] -         speculative tasks that didn&#39;t commit shouldn&#39;t be marked as success
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14712'>SPARK-14712</a>] -         spark.ml LogisticRegressionModel.toString should summarize model
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-15009'>SPARK-15009</a>] -         PySpark CountVectorizerModel should be able to construct from vocabulary list
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16406'>SPARK-16406</a>] -         Reference resolution for large number of columns should be faster
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16501'>SPARK-16501</a>] -         spark.mesos.secret exposed on UI and command line
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16617'>SPARK-16617</a>] -         Upgrade to Avro 1.8.x
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16630'>SPARK-16630</a>] -         Blacklist a node if executors won&#39;t launch on it.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18057'>SPARK-18057</a>] -         Update structured streaming kafka from 0.10.0.1 to 2.0.0
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18230'>SPARK-18230</a>] -         MatrixFactorizationModel.recommendProducts throws NoSuchElement exception when the user does not exist
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19018'>SPARK-19018</a>] -         spark csv writer charset support
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19602'>SPARK-19602</a>] -         Unable to query using the fully qualified column name of the form ( &lt;DBNAME&gt;.&lt;TABLENAME&gt;.&lt;COLUMNNAME&gt;)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19724'>SPARK-19724</a>] -         Creating a managed table with an existing default location should throw an exception
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19947'>SPARK-19947</a>] -         RFormulaModel always throws Exception on transforming data with NULL or Unseen labels
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20087'>SPARK-20087</a>] -         Include accumulators / taskMetrics when sending TaskKilled to onTaskEnd listeners
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20168'>SPARK-20168</a>] -         Enable kinesis to start stream from Initial position specified by a timestamp
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20538'>SPARK-20538</a>] -         Dataset.reduce operator should use withNewExecutionId (as foreach or foreachPartition)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20659'>SPARK-20659</a>] -         Remove StorageStatus, or make it private.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20937'>SPARK-20937</a>] -         Describe spark.sql.parquet.writeLegacyFormat property in Spark SQL, DataFrames and Datasets Guide
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21318'>SPARK-21318</a>] -         The exception message thrown by `lookupFunction` is ambiguous.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21590'>SPARK-21590</a>] -         Structured Streaming window start time should support negative values to adjust time zone
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21687'>SPARK-21687</a>] -         Spark SQL should set createTime for Hive partition
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21741'>SPARK-21741</a>] -         Python API for DataFrame-based multivariate summarizer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21783'>SPARK-21783</a>] -         Turn on ORC filter push-down by default
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21860'>SPARK-21860</a>] -         Improve memory reuse for heap memory in `HeapMemoryAllocator`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21960'>SPARK-21960</a>] -         Spark Streaming Dynamic Allocation should respect spark.executor.instances
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22068'>SPARK-22068</a>] -         Reduce the duplicate code between putIteratorAsValues and putIteratorAsBytes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22144'>SPARK-22144</a>] -         ExchangeCoordinator will not combine the partitions of a 0-sized pre-shuffle
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22210'>SPARK-22210</a>] -         Online LDA variationalTopicInference  should use random seed to have stable behavior
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22219'>SPARK-22219</a>] -         Refactor &quot;spark.sql.codegen.comments&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22269'>SPARK-22269</a>] -         Java style checks should be run in Jenkins
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22666'>SPARK-22666</a>] -         Spark datasource for image format
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22683'>SPARK-22683</a>] -         DynamicAllocation wastes resources by allocating containers that will barely be used
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22751'>SPARK-22751</a>] -         Improve ML RandomForest shuffle performance
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22814'>SPARK-22814</a>] -         JDBC support date/timestamp type as partitionColumn
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22839'>SPARK-22839</a>] -         Refactor Kubernetes code for configuring driver/executor pods to use consistent and cleaner abstraction
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22856'>SPARK-22856</a>] -         Add wrapper for codegen output and nullability
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22941'>SPARK-22941</a>] -         Allow SparkSubmit to throw exceptions instead of exiting / printing errors.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22959'>SPARK-22959</a>] -         Configuration to select the modules for daemon and worker in PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23012'>SPARK-23012</a>] -         Support for predicate pushdown and partition pruning when left joining large Hive tables
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23024'>SPARK-23024</a>] -         Spark UI tables should have hide and show features when they contain many records
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23031'>SPARK-23031</a>] -         Merge script should allow arbitrary assignees
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23034'>SPARK-23034</a>] -         Display tablename for `HiveTableScan` node in UI
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23040'>SPARK-23040</a>] -         BlockStoreShuffleReader&#39;s return Iterator isn&#39;t interruptible if aggregator or ordering is specified
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23043'>SPARK-23043</a>] -         Upgrade json4s-jackson to 3.5.3
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23085'>SPARK-23085</a>] -         API parity for mllib.linalg.Vectors.sparse 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23159'>SPARK-23159</a>] -         Update Cloudpickle to match version 0.4.3
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23161'>SPARK-23161</a>] -         Add missing APIs to Python GBTClassifier
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23162'>SPARK-23162</a>] -         PySpark ML LinearRegressionSummary missing r2adj
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23166'>SPARK-23166</a>] -         Add maxDF Parameter to CountVectorizer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23167'>SPARK-23167</a>] -         Update TPCDS queries from v1.4 to v2.7 (latest)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23174'>SPARK-23174</a>] -         Fix pep8 to latest official version
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23188'>SPARK-23188</a>] -         Make vectorized columnar reader batch size configurable
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23202'>SPARK-23202</a>] -         Add new API in DataSourceWriter: onDataWriterCommit
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23217'>SPARK-23217</a>] -         Add cosine distance measure to ClusteringEvaluator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23228'>SPARK-23228</a>] -         Able to track Python create SparkSession in JVM
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23247'>SPARK-23247</a>] -         combines Unsafe operations and statistics operations in Scan Data Source
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23253'>SPARK-23253</a>] -         Only write shuffle temporary index file when there is not an existing one
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23259'>SPARK-23259</a>] -         Clean up legacy code around hive external catalog
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23285'>SPARK-23285</a>] -         Allow spark.executor.cores to be fractional
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23295'>SPARK-23295</a>] -         Exclude Warning message when generating versions in make-distribution.sh
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23303'>SPARK-23303</a>] -         improve the explain result for data source v2 relations
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23318'>SPARK-23318</a>] -         FP-growth: WARN FPGrowth: Input data is not cached
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23336'>SPARK-23336</a>] -         Upgrade snappy-java to 1.1.7.1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23359'>SPARK-23359</a>] -         Adds an alias &#39;names&#39; of &#39;fieldNames&#39; in Scala&#39;s StructType
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23366'>SPARK-23366</a>] -         Improve hot reading path in ReadAheadInputStream
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23372'>SPARK-23372</a>] -         Writing empty struct in parquet fails during execution. It should fail earlier during analysis.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23375'>SPARK-23375</a>] -         Optimizer should remove unneeded Sort
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23378'>SPARK-23378</a>] -         move setCurrentDatabase from HiveExternalCatalog to HiveClientImpl
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23379'>SPARK-23379</a>] -         remove redundant metastore access if the current database name is the same
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23382'>SPARK-23382</a>] -         Spark Streaming UI tables should have hide and show features when they contain many records
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23383'>SPARK-23383</a>] -         Make a distribution should exit with usage while detecting wrong options
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23389'>SPARK-23389</a>] -         When the shuffle dependency specifies aggregation and `dependency.mapSideCombine=false`, we should be able to use serialized sorting.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23412'>SPARK-23412</a>] -         Add cosine distance measure to BisectingKMeans
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23424'>SPARK-23424</a>] -         Add codegenStageId in comment
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23445'>SPARK-23445</a>] -         ColumnStat refactoring
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23447'>SPARK-23447</a>] -         Cleanup codegen template for Literal
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23455'>SPARK-23455</a>] -         Default Params in ML should be saved separately
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23456'>SPARK-23456</a>] -         Turn on `native` ORC implementation by default
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23466'>SPARK-23466</a>] -         Remove redundant null checks in generated Java code by GenerateUnsafeProjection
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23500'>SPARK-23500</a>] -         Filters on named_structs could be pushed into scans
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23510'>SPARK-23510</a>] -         Support reading data from Hive 2.2 and Hive 2.3 metastore
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23518'>SPARK-23518</a>] -         Avoid metastore access when users only want to read and store data frames
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23528'>SPARK-23528</a>] -         Add numIter to ClusteringSummary
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23529'>SPARK-23529</a>] -         Specify hostpath volume and mount the volume in Spark driver and executor pods in Kubernetes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23538'>SPARK-23538</a>] -         Simplify SSL configuration for https client
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23550'>SPARK-23550</a>] -         Cleanup unused / redundant methods in Utils object
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23553'>SPARK-23553</a>] -         Tests should not assume the default value of `spark.sql.sources.default`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23562'>SPARK-23562</a>] -         RFormula handleInvalid should handle invalid values in non-string columns.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23564'>SPARK-23564</a>] -         The optimized logical plan for left anti join should be further optimized
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23565'>SPARK-23565</a>] -         Improved error message for when the number of sources for a query changes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23568'>SPARK-23568</a>] -         Silhouette should get number of features from metadata if available
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23572'>SPARK-23572</a>] -         Update security.md to cover new features
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23573'>SPARK-23573</a>] -         Create linter rule to prevent misuse of SparkContext.hadoopConfiguration in SQL modules
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23604'>SPARK-23604</a>] -         ParquetInteroperabilityTest timestamp test should use Statistics.hasNonNullValue
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23624'>SPARK-23624</a>] -         Revise doc of method pushFilters
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23627'>SPARK-23627</a>] -         Provide isEmpty() function in DataSet
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23628'>SPARK-23628</a>] -         WholeStageCodegen can generate methods with too many params
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23644'>SPARK-23644</a>] -         SHS with proxy doesn&#39;t show applications
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23645'>SPARK-23645</a>] -         pandas_udf can not be called with keyword arguments
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23654'>SPARK-23654</a>] -         Cut jets3t as a dependency of spark-core
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23656'>SPARK-23656</a>] -         Assertion in XXH64Suite.testKnownByteArrayInputs() is not performed on big endian platform
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23672'>SPARK-23672</a>] -         Document Support returning lists in Arrow UDFs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23675'>SPARK-23675</a>] -         Title add spark logo, use spark logo image
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23683'>SPARK-23683</a>] -         FileCommitProtocol.instantiate to require 3-arg constructor for dynamic partition overwrite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23691'>SPARK-23691</a>] -         Use sql_conf util in PySpark tests where possible
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23695'>SPARK-23695</a>] -         Confusing error message for PySpark&#39;s Kinesis tests when its jar is missing but enabled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23699'>SPARK-23699</a>] -         PySpark should raise same Error when Arrow fallback is disabled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23700'>SPARK-23700</a>] -         Cleanup unused imports
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23708'>SPARK-23708</a>] -         Comment of ShutdownHookManager.addShutdownHook is incorrect
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23769'>SPARK-23769</a>] -         Remove unnecessary scalastyle check disabling
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23770'>SPARK-23770</a>] -         Expose repartitionByRange in SparkR
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23772'>SPARK-23772</a>] -         Provide an option to ignore column of all null values or empty map/array during JSON schema inference
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23776'>SPARK-23776</a>] -         pyspark-sql tests should display build instructions when components are missing
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23803'>SPARK-23803</a>] -         Support bucket pruning to optimize filtering on a bucketed column
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23820'>SPARK-23820</a>] -         Allow the long form of call sites to be recorded in the log
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23822'>SPARK-23822</a>] -         Improve error message for Parquet schema mismatches
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23828'>SPARK-23828</a>] -         PySpark StringIndexerModel should have constructor from labels
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23830'>SPARK-23830</a>] -         Spark on YARN in cluster deploy mode fails with NullPointerException when a Spark application is a Scala class, not an object
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23838'>SPARK-23838</a>] -         SparkUI: Running SQL query displayed as &quot;completed&quot; in SQL tab
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23841'>SPARK-23841</a>] -         NodeIdCache should unpersist the last cached nodeIdsForInstances
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23861'>SPARK-23861</a>] -         Clarify behavior of default window frame boundaries with and without orderBy clause
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23867'>SPARK-23867</a>] -         com.codahale.metrics.Counter output in log message has no toString method
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23873'>SPARK-23873</a>] -         Use accessors in interpreted LambdaVariable
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23874'>SPARK-23874</a>] -         Upgrade apache/arrow to 0.10.0
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23875'>SPARK-23875</a>] -         Create IndexedSeq wrapper for ArrayData
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23877'>SPARK-23877</a>] -         Metadata-only queries do not push down filter conditions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23880'>SPARK-23880</a>] -         table cache should be lazy and not trigger any job
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23892'>SPARK-23892</a>] -         Improve coverage and fix lint error in UTF8String-related Suite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23896'>SPARK-23896</a>] -         Improve PartitioningAwareFileIndex
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23944'>SPARK-23944</a>] -         Add Param set functions to LSHModel types
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23947'>SPARK-23947</a>] -         Add hashUTF8String convenience method to hasher classes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23956'>SPARK-23956</a>] -         Use effective RPC port in AM registration 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23957'>SPARK-23957</a>] -         Sorts in subqueries are redundant and can be removed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23960'>SPARK-23960</a>] -         Mark HashAggregateExec.bufVars as transient
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23962'>SPARK-23962</a>] -         Flaky tests from SQLMetricsTestUtils.currentExecutionIds
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23963'>SPARK-23963</a>] -         Queries on text-based Hive tables grow disproportionately slower as the number of columns increases
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23966'>SPARK-23966</a>] -         Refactoring all checkpoint file writing logic in a common interface
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23972'>SPARK-23972</a>] -         Upgrade to Parquet 1.10
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23973'>SPARK-23973</a>] -         Remove consecutive sorts
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23979'>SPARK-23979</a>] -         MultiAlias should not be a CodegenFallback
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24003'>SPARK-24003</a>] -         Add support to provide spark.executor.extraJavaOptions in terms of App Id and/or Executor Id&#39;s
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24005'>SPARK-24005</a>] -         Remove usage of Scala’s parallel collection
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24014'>SPARK-24014</a>] -         Add onStreamingStarted method to StreamingListener
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24017'>SPARK-24017</a>] -         Refactor ExternalCatalog to be an interface
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24024'>SPARK-24024</a>] -         Fix deviance calculations in GLM to handle corner cases
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24029'>SPARK-24029</a>] -         Set &quot;reuse address&quot; flag on listen sockets
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24035'>SPARK-24035</a>] -         SQL syntax for Pivot
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24057'>SPARK-24057</a>] -         put the real data type in the AssertionError message
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24058'>SPARK-24058</a>] -         Default Params in ML should be saved separately: Python API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24072'>SPARK-24072</a>] -         clearly define pushed filters
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24083'>SPARK-24083</a>] -         Diagnostics message for uncaught exceptions should include the stacktrace
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24094'>SPARK-24094</a>] -         Change description strings of v2 streaming sources to reflect the change
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24111'>SPARK-24111</a>] -         Add TPCDS v2.7 (latest) queries in TPCDSQueryBenchmark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24117'>SPARK-24117</a>] -         Unified the getSizePerRow
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24121'>SPARK-24121</a>] -         The API for handling expression code generation in expression codegen
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24126'>SPARK-24126</a>] -         PySpark tests leave a lot of garbage in /tmp
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24127'>SPARK-24127</a>] -         Support text socket source in continuous mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24128'>SPARK-24128</a>] -         Mention spark.sql.crossJoin.enabled in implicit cartesian product error msg
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24129'>SPARK-24129</a>] -         Add option to pass --build-arg&#39;s to docker-image-tool.sh
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24131'>SPARK-24131</a>] -         Add majorMinorVersion API to PySpark for determining Spark versions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24136'>SPARK-24136</a>] -         MemoryStreamDataReader.next should skip sleeping if record is available
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24149'>SPARK-24149</a>] -         Automatic namespaces discovery in HDFS federation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24156'>SPARK-24156</a>] -         Enable no-data micro batches for more eager streaming state clean up 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24160'>SPARK-24160</a>] -         ShuffleBlockFetcherIterator should fail if it receives zero-size blocks
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24161'>SPARK-24161</a>] -         Enable debug package feature on structured streaming
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24172'>SPARK-24172</a>] -         we should not apply operator pushdown to data source v2 many times
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24181'>SPARK-24181</a>] -         Better error message for writing sorted data
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24182'>SPARK-24182</a>] -         Improve error message for client mode when AM fails
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24188'>SPARK-24188</a>] -         /api/v1/version not working
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24204'>SPARK-24204</a>] -         Verify a write schema in Json/Orc/ParquetFileFormat
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24206'>SPARK-24206</a>] -         Improve DataSource benchmark code for read and pushdown
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24209'>SPARK-24209</a>] -         0 configuration Knox gateway support in SHS
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24215'>SPARK-24215</a>] -         Implement eager evaluation for DataFrame APIs 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24242'>SPARK-24242</a>] -         RangeExec should have correct outputOrdering
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24244'>SPARK-24244</a>] -         Parse only required columns of CSV file
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24246'>SPARK-24246</a>] -         Improve AnalysisException by setting the cause when it&#39;s available
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24248'>SPARK-24248</a>] -         [K8S] Use the Kubernetes cluster as the backing store for the state of pods
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24250'>SPARK-24250</a>] -         support accessing SQLConf inside tasks
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24262'>SPARK-24262</a>] -         Fix typo in UDF error message
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24268'>SPARK-24268</a>] -         DataType in error messages are not coherent
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24275'>SPARK-24275</a>] -         Revise doc comments in InputPartition
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24277'>SPARK-24277</a>] -         Code clean up in SQL module: HadoopMapReduceCommitProtocol/FileFormatWriter
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24303'>SPARK-24303</a>] -         Update cloudpickle to v0.4.4
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24305'>SPARK-24305</a>] -         Avoid serialization of private fields in new collection expressions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24308'>SPARK-24308</a>] -         Handle DataReaderFactory to InputPartition renames in left over classes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24312'>SPARK-24312</a>] -         Upgrade to 2.3.3 for Hive Metastore Client 2.3
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24321'>SPARK-24321</a>] -         Extract common code from Divide/Remainder to a base trait
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24326'>SPARK-24326</a>] -          Add local:// scheme support for the app jar in mesos cluster mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24327'>SPARK-24327</a>] -         Verify and normalize a partition column name based on the JDBC resolved schema
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24329'>SPARK-24329</a>] -         Remove comments filtering before parsing of CSV files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24330'>SPARK-24330</a>] -         Refactor ExecuteWriteTask in FileFormatWriter with DataWriter(V2)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24332'>SPARK-24332</a>] -         Fix places reading &#39;spark.network.timeout&#39; as milliseconds
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24337'>SPARK-24337</a>] -         Improve the error message for invalid SQL conf value
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24339'>SPARK-24339</a>] -         spark sql can not prune column in transform/map/reduce query
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24356'>SPARK-24356</a>] -         Duplicate strings in File.path managed by FileSegmentManagedBuffer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24361'>SPARK-24361</a>] -         Polish code block manipulation API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24365'>SPARK-24365</a>] -         Add data source write benchmark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24366'>SPARK-24366</a>] -         Improve error message for Catalyst type converters
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24367'>SPARK-24367</a>] -         Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag ENABLE_JOB_SUMMARY
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24381'>SPARK-24381</a>] -         Improve Unit Test Coverage of NOT IN subqueries
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24408'>SPARK-24408</a>] -         Move abs function to math_funcs group
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24423'>SPARK-24423</a>] -         Add a new option `query` for JDBC sources
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24424'>SPARK-24424</a>] -         Support ANSI-SQL compliant syntax for  GROUPING SET
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24428'>SPARK-24428</a>] -         Remove unused code and fix any related doc in K8s module
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24441'>SPARK-24441</a>] -         Expose total estimated size of states in HDFSBackedStateStoreProvider
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24454'>SPARK-24454</a>] -         ml.image doesn&#39;t have __all__ explicitly defined
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24455'>SPARK-24455</a>] -         fix typo in TaskSchedulerImpl&#39;s comments
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24470'>SPARK-24470</a>] -         RestSubmissionClient to be robust against 404 &amp; non json responses
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24477'>SPARK-24477</a>] -         Import submodules under pyspark.ml by default
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24485'>SPARK-24485</a>] -         Measure and log elapsed time for filesystem operations in HDFSBackedStateStoreProvider
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24490'>SPARK-24490</a>] -         Use WebUI.addStaticHandler in web UIs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24505'>SPARK-24505</a>] -         Convert strings in codegen to blocks: Cast and BoundAttribute
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24518'>SPARK-24518</a>] -         Using Hadoop credential provider API to store password
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24519'>SPARK-24519</a>] -         MapStatus has 2000 hardcoded
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24525'>SPARK-24525</a>] -         Provide an option to limit MemorySink memory usage
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24534'>SPARK-24534</a>] -         Add a way to bypass entrypoint.sh script if no spark cmd is passed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24543'>SPARK-24543</a>] -         Support any DataType as DDL string for from_json&#39;s schema
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24547'>SPARK-24547</a>] -         Spark on K8s docker-image-tool.sh improvements
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24551'>SPARK-24551</a>] -         Add Integration tests for Secrets
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24555'>SPARK-24555</a>] -         logNumExamples in KMeans/BiKM/GMM/AFT/NB
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24557'>SPARK-24557</a>] -         ClusteringEvaluator support array input
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24558'>SPARK-24558</a>] -         Driver prints the wrong info in the log when the executor which holds cacheBlock is IDLE. Time-out value displayed is not as per configuration value.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24565'>SPARK-24565</a>] -         Add API in Structured Streaming for exposing output rows of each microbatch as a DataFrame
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24566'>SPARK-24566</a>] -         Fix spark.storage.blockManagerSlaveTimeoutMs default config
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24571'>SPARK-24571</a>] -         Support literals with values of the Char type
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24574'>SPARK-24574</a>] -         improve array_contains function of the sql component to deal with Column type
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24575'>SPARK-24575</a>] -         Prohibit window expressions inside WHERE and HAVING clauses
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24576'>SPARK-24576</a>] -          Upgrade Apache ORC to 1.5.2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24596'>SPARK-24596</a>] -         Non-cascading Cache Invalidation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24605'>SPARK-24605</a>] -         size(null) should return null
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24609'>SPARK-24609</a>] -         PySpark/SparkR doc doesn&#39;t explain RandomForestClassifier.featureSubsetStrategy well
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24614'>SPARK-24614</a>] -         PySpark - Fix SyntaxWarning on tests.py
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24626'>SPARK-24626</a>] -         Parallelize size calculation in Analyze Table command
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24635'>SPARK-24635</a>] -         Remove Blocks class
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24636'>SPARK-24636</a>] -         Type Coercion of Arrays for array_join Function
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24637'>SPARK-24637</a>] -         Add metrics regarding state and watermark to dropwizard metrics
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24646'>SPARK-24646</a>] -         Support wildcard &#39;*&#39; for spark.yarn.dist.forceDownloadSchemes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24658'>SPARK-24658</a>] -         Remove workaround for ANTLR bug
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24665'>SPARK-24665</a>] -         Add SQLConf in PySpark to manage all sql configs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24673'>SPARK-24673</a>] -         scala sql function from_utc_timestamp second argument could be Column instead of String
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24675'>SPARK-24675</a>] -         Rename table: validate existence of new location
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24678'>SPARK-24678</a>] -         We should use &#39;PROCESS_LOCAL&#39; first for Spark-Streaming
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24683'>SPARK-24683</a>] -         SparkLauncher.NO_RESOURCE doesn&#39;t work with Java applications
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24685'>SPARK-24685</a>] -         Adjust release scripts to build all versions for older releases
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24688'>SPARK-24688</a>] -         Clarify comments about LabeledPoint as (label, feature) pair rather than (feature, label)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24691'>SPARK-24691</a>] -         Add new API `supportDataType` in FileFormat
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24692'>SPARK-24692</a>] -         Improve FilterPushdownBenchmark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24696'>SPARK-24696</a>] -         ColumnPruning rule fails to remove extra Project
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24697'>SPARK-24697</a>] -         Fix the reported start offsets in streaming query progress
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24709'>SPARK-24709</a>] -         Inferring schema from JSON string literal
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24722'>SPARK-24722</a>] -         Column-based API for pivoting
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24727'>SPARK-24727</a>] -         The cache 100 in CodeGenerator is too small for streaming
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24732'>SPARK-24732</a>] -         Type coercion between MapTypes.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24737'>SPARK-24737</a>] -         Type coercion between StructTypes.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24747'>SPARK-24747</a>] -         Make spark.ml.util.Instrumentation class more flexible
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24757'>SPARK-24757</a>] -         Improve error message for broadcast timeouts
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24759'>SPARK-24759</a>] -         No reordering keys for broadcast hash join
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24761'>SPARK-24761</a>] -         Check modifiability of config parameters
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24763'>SPARK-24763</a>] -         Remove redundant key data from value in streaming aggregation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24782'>SPARK-24782</a>] -         Simplify conf access in expressions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24785'>SPARK-24785</a>] -         Making sure REPL prints Spark UI info and then Welcome message
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24790'>SPARK-24790</a>] -         Allow complex aggregate expressions in Pivot
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24801'>SPARK-24801</a>] -         Empty byte[] arrays in spark.network.sasl.SaslEncryption$EncryptedMessage can waste a lot of memory
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24807'>SPARK-24807</a>] -         Adding files/jars twice: output a warning and add a note
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24849'>SPARK-24849</a>] -         Convert StructType to DDL string
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24858'>SPARK-24858</a>] -         Avoid unnecessary parquet footer reads
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24860'>SPARK-24860</a>] -         Expose dynamic partition overwrite per write operation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24865'>SPARK-24865</a>] -         Remove AnalysisBarrier
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24868'>SPARK-24868</a>] -         add sequence function in Python
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24871'>SPARK-24871</a>] -         Refactor Concat and MapConcat to avoid creating concatenator object for each row.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24890'>SPARK-24890</a>] -         Short circuiting the `if` condition when `trueValue` and `falseValue` are the same
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24893'>SPARK-24893</a>] -         Remove the entire CaseWhen if all the outputs are semantically equivalent
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24926'>SPARK-24926</a>] -         Ensure numCores is used consistently in all netty configuration (driver and executors)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24929'>SPARK-24929</a>] -         Merge script swallows KeyboardInterrupt
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24940'>SPARK-24940</a>] -         Coalesce and Repartition Hint for SQL Queries
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24943'>SPARK-24943</a>] -         Convert a SQL Struct to StructType
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24945'>SPARK-24945</a>] -         Switch to uniVocity &gt;= 2.7.2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24951'>SPARK-24951</a>] -         Table valued functions should throw AnalysisException instead of IllegalArgumentException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24952'>SPARK-24952</a>] -         Support LZMA2 compression by Avro datasource
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24954'>SPARK-24954</a>] -         Fail fast on job submit if running a barrier stage with dynamic resource allocation enabled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24956'>SPARK-24956</a>] -         Upgrade maven from 3.3.9 to 3.5.4
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24960'>SPARK-24960</a>] -         k8s: explicitly expose ports on driver container
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24962'>SPARK-24962</a>] -         refactor CodeGenerator.createUnsafeArray
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24978'>SPARK-24978</a>] -         Add spark.sql.fast.hash.aggregate.row.max.capacity to configure the capacity of fast aggregation.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24979'>SPARK-24979</a>] -         add AnalysisHelper#resolveOperatorsUp
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24982'>SPARK-24982</a>] -         UDAF resolution should not throw java.lang.AssertionError
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24992'>SPARK-24992</a>] -         spark should randomize yarn local dir selection
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24993'>SPARK-24993</a>] -         Make Avro fast again
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24996'>SPARK-24996</a>] -         Use DSL to simplify DeclarativeAggregate
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24999'>SPARK-24999</a>] -         Reduce unnecessary &#39;new&#39; memory operations
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25001'>SPARK-25001</a>] -         Fix build miscellaneous warnings
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25018'>SPARK-25018</a>] -         Use `Co-Authored-By` git trailer in `merge_spark_pr.py`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25025'>SPARK-25025</a>] -         Remove the default value of isAll in INTERSECT/EXCEPT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25043'>SPARK-25043</a>] -         spark-sql should print the appId and master on startup
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25045'>SPARK-25045</a>] -         Make `RDDBarrier.mapPartitions` similar to `RDD.mapPartitions`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25069'>SPARK-25069</a>] -         Use UnsafeAlignedOffset to keep the entire record 8-byte aligned, as is done in UnsafeExternalSorter
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25073'>SPARK-25073</a>] -         Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an error request to adjust yarn.scheduler.maximum-allocation-mb
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25077'>SPARK-25077</a>] -         Delete unused variable in WindowExec
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25088'>SPARK-25088</a>] -         Rest Server default &amp; doc updates
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25093'>SPARK-25093</a>] -         CodeFormatter could avoid creating regex object again and again
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25105'>SPARK-25105</a>] -         Importing all of pyspark.sql.functions should bring PandasUDFType in as well
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25108'>SPARK-25108</a>] -         Dataset.show() generates incorrect padding for Unicode Character
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25111'>SPARK-25111</a>] -         increment kinesis client/producer lib versions &amp; aws-sdk to match
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25113'>SPARK-25113</a>] -         Add logging to CodeGenerator when any generated method&#39;s bytecode size goes above HugeMethodLimit
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25115'>SPARK-25115</a>] -             Eliminate extra memory copy done when a ByteBuf is used that is backed by &gt; 1 ByteBuffer.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25117'>SPARK-25117</a>] -         Add EXCEPT ALL and INTERSECT ALL support in R.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25122'>SPARK-25122</a>] -         Deduplication of supports equals code
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25140'>SPARK-25140</a>] -         Add optional logging to UnsafeProjection.create when it falls back to interpreted mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25142'>SPARK-25142</a>] -         Add error messages when Python worker could not open socket in `_load_from_socket`.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25170'>SPARK-25170</a>] -         Add Task Metrics description to the documentation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25178'>SPARK-25178</a>] -         Directly ship the StructType objects of the keySchema / valueSchema for xxxHashMapGenerator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25208'>SPARK-25208</a>] -         Loosen Cast.forceNullable for DecimalType.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25209'>SPARK-25209</a>] -         Optimization in Dataset.apply for DataFrames
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25212'>SPARK-25212</a>] -         Support Filter in ConvertToLocalRelation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25228'>SPARK-25228</a>] -         Add executor CPU Time metric 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25233'>SPARK-25233</a>] -         Give the user the option of specifying a fixed minimum message per partition per batch when using kafka direct API with backpressure
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25235'>SPARK-25235</a>] -         Merge the REPL code in Scala 2.11 and 2.12 branches
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25241'>SPARK-25241</a>] -         Configurable empty values when reading/writing CSV files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25252'>SPARK-25252</a>] -         Support arrays of any types in to_json
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25253'>SPARK-25253</a>] -         Refactor pyspark connection &amp; authentication
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25260'>SPARK-25260</a>] -         Fix namespace handling in SchemaConverters.toAvroType
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25261'>SPARK-25261</a>] -         Standardize the default units of spark.driver|executor.memory
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25275'>SPARK-25275</a>] -         require membership in wheel to run &#39;su&#39; (in dockerfiles)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25286'>SPARK-25286</a>] -         Remove dangerous parmap
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25287'>SPARK-25287</a>] -         Check for JIRA_USERNAME and JIRA_PASSWORD up front in merge_spark_pr.py
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25300'>SPARK-25300</a>] -         Unified the configuration parameter `spark.shuffle.service.enabled`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25318'>SPARK-25318</a>] -         Add exception handling when wrapping the input stream during the fetch or stage retry in response to a corrupted block
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25335'>SPARK-25335</a>] -         Skip Zinc downloading if it&#39;s installed in the system
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25375'>SPARK-25375</a>] -         Reenable qualified perm. function checks in UDFSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25384'>SPARK-25384</a>] -         Clarify fromJsonForceNullableSchema will be removed in Spark 3.0
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25400'>SPARK-25400</a>] -         Increase timeouts in schedulerIntegrationSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25445'>SPARK-25445</a>] -         publish a scala 2.12 build with Spark 2.4
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25469'>SPARK-25469</a>] -         Eval methods of Concat, Reverse and ElementAt should use pattern matching only once
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25639'>SPARK-25639</a>] -         Add documentation on foreachBatch, and multiple watermark policy
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25754'>SPARK-25754</a>] -         Change CDN for MathJax 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25859'>SPARK-25859</a>] -         add scala/java/python example and doc for PrefixSpan
</li>
</ul>
    
<h2>        Test
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16139'>SPARK-16139</a>] -         Audit tests for leaked threads
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22882'>SPARK-22882</a>] -         ML test for StructuredStreaming: spark.ml.classification
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22883'>SPARK-22883</a>] -         ML test for StructuredStreaming: spark.ml.feature, A-M
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22884'>SPARK-22884</a>] -         ML test for StructuredStreaming: spark.ml.clustering
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22885'>SPARK-22885</a>] -         ML test for StructuredStreaming: spark.ml.tuning
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22886'>SPARK-22886</a>] -         ML test for StructuredStreaming: spark.ml.recommendation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22915'>SPARK-22915</a>] -         ML test for StructuredStreaming: spark.ml.feature, N-Z
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23169'>SPARK-23169</a>] -         Run lintr on the changes of lint-r script and .lintr configuration
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23392'>SPARK-23392</a>] -         Add some test case for images feature
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23849'>SPARK-23849</a>] -         Tests for the samplingRatio option of json schema inferring
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23881'>SPARK-23881</a>] -         Flaky test: JobCancellationSuite.&quot;interruptible iterator of shuffle reader&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24044'>SPARK-24044</a>] -         Explicitly print out skipped tests from unittest module
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24502'>SPARK-24502</a>] -         flaky test: UnsafeRowSerializerSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24521'>SPARK-24521</a>] -         Fix ineffective test in CachedTableSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24562'>SPARK-24562</a>] -         Allow running same tests with multiple configs in SQLQueryTestSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24564'>SPARK-24564</a>] -         Add test suite for RecordBinaryComparator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24740'>SPARK-24740</a>] -         PySpark tests do not pass with NumPy 0.14.x+
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24840'>SPARK-24840</a>] -         do not use dummy filter to switch codegen on/off
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24861'>SPARK-24861</a>] -         create corrected temp directories in RateSourceSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24886'>SPARK-24886</a>] -         Increase Jenkins build time
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25141'>SPARK-25141</a>] -         Modify tests for higher-order functions to check bind method.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25184'>SPARK-25184</a>] -         Flaky test: FlatMapGroupsWithState &quot;streaming with processing time timeout&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25238'>SPARK-25238</a>] -         Lint-Python: Upgrading to the current version of pycodestyle fails
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25249'>SPARK-25249</a>] -         Add a unit test for OpenHashMap
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25267'>SPARK-25267</a>] -         Disable ConvertToLocalRelation in the test cases of sql/core and sql/hive
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25290'>SPARK-25290</a>] -         BytesToBytesMapOnHeapSuite randomizedStressTest can cause OutOfMemoryError
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25296'>SPARK-25296</a>] -         Create ExplainSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25422'>SPARK-25422</a>] -         flaky test: org.apache.spark.DistributedSuite.caching on disk, replicated (encryption = on) (with replication as stream)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25453'>SPARK-25453</a>] -         OracleIntegrationSuite IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25456'>SPARK-25456</a>] -         PythonForeachWriterSuite failing
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25673'>SPARK-25673</a>] -         Remove Travis CI which enables Java lint check
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25736'>SPARK-25736</a>] -         add tests to verify the behavior of multi-column count
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25805'>SPARK-25805</a>] -         Flaky test: DataFrameSuite.SPARK-25159 unittest failure
</li>
</ul>
    
<h2>        Wish
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23131'>SPARK-23131</a>] -         Kryo raises StackOverflow during serializing GLR model
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25258'>SPARK-25258</a>] -         Upgrade kryo package to version 4.0.2
</li>
</ul>
    
<h2>        Task
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20220'>SPARK-20220</a>] -         Add thrift scheduling pool config in scheduling docs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23092'>SPARK-23092</a>] -         Migrate MemoryStream to DataSource V2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23451'>SPARK-23451</a>] -         Deprecate KMeans computeCost
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23501'>SPARK-23501</a>] -         Refactor AllStagesPage in order to avoid redundant code
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23533'>SPARK-23533</a>] -         Add support for changing ContinuousDataReader&#39;s startOffset
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23601'>SPARK-23601</a>] -         Remove .md5 files from release
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24392'>SPARK-24392</a>] -         Mark pandas_udf as Experimental
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24533'>SPARK-24533</a>] -         typesafe has rebranded to lightbend. change the build/mvn endpoint from downloads.typesafe.com to downloads.lightbend.com
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24654'>SPARK-24654</a>] -         Update, fix LICENSE and NOTICE, and specialize for source vs binary
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25063'>SPARK-25063</a>] -         Rename class KnowNotNull to KnownNotNull
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25095'>SPARK-25095</a>] -         Python support for BarrierTaskContext
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25213'>SPARK-25213</a>] -         DataSourceV2 doesn&#39;t seem to produce unsafe rows 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25336'>SPARK-25336</a>] -         Revert SPARK-24863 and SPARK-24748
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25836'>SPARK-25836</a>] -         (Temporarily) disable automatic build/test of kubernetes-integration-tests
</li>
</ul>
                                                    
<h2>        Dependency upgrade
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20395'>SPARK-20395</a>] -         Update Scala to 2.11.11 and zinc to 0.3.15
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23509'>SPARK-23509</a>] -         Upgrade commons-net from 2.2 to 3.1
</li>
</ul>
    
<h2>        Request
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21607'>SPARK-21607</a>] -         Can dropTempView function add a param like dropTempView(viewName: String, dropSelfOnly: Boolean)
</li>
</ul>
                
<h2>        Umbrella
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-6235'>SPARK-6235</a>] -         Address various 2G limits
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14220'>SPARK-14220</a>] -         Build and test Spark against Scala 2.12
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23899'>SPARK-23899</a>] -         Built-in SQL Function Improvement
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24090'>SPARK-24090</a>] -         Kubernetes Backend Hotlist for Spark 2.4
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25319'>SPARK-25319</a>] -         Spark MLlib, GraphX 2.4 QA umbrella
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25419'>SPARK-25419</a>] -         Parquet predicate pushdown improvement
</li>
</ul>
                                                                
<h2>        Documentation
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21261'>SPARK-21261</a>] -         SparkSQL regexpExpressions example 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23231'>SPARK-23231</a>] -         Add doc for string indexer ordering to user guide (also to RFormula guide)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23254'>SPARK-23254</a>] -         Add user guide entry for DataFrame multivariate summary
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23256'>SPARK-23256</a>] -         Add columnSchema method to PySpark image reader
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23329'>SPARK-23329</a>] -         Update the function descriptions with the arguments and returned values of the trigonometric functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23566'>SPARK-23566</a>] -         Arguement name fix
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23642'>SPARK-23642</a>] -         isZero scaladoc for LongAccumulator describes wrong method
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23792'>SPARK-23792</a>] -         Documentation improvements for datetime functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24134'>SPARK-24134</a>] -         A missing full-stop in doc &quot;Tuning Spark&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24171'>SPARK-24171</a>] -         Update comments for non-deterministic functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24191'>SPARK-24191</a>] -         Scala example code for Power Iteration Clustering in Spark ML examples
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24224'>SPARK-24224</a>] -         Java example code for Power Iteration Clustering in spark.ml
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24378'>SPARK-24378</a>] -         Incorrect examples for date_trunc function in spark 2.3.0
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24444'>SPARK-24444</a>] -         Improve pandas_udf GROUPED_MAP docs to explain column assignment
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24507'>SPARK-24507</a>] -         Description in &quot;Level of Parallelism in Data Receiving&quot; section of Spark Streaming Programming Guide in is not relevan for the recent Kafka direct apprach 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24628'>SPARK-24628</a>] -         Typos of the example code in docs/mllib-data-types.md
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25082'>SPARK-25082</a>] -         Documentation for Spark Function expm1 is incomplete
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25273'>SPARK-25273</a>] -         How to install testthat v1.0.2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25583'>SPARK-25583</a>] -         Add newly added History server related configurations in the documentation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25656'>SPARK-25656</a>] -         Add an example section about how to use Parquet/ORC library options
</li>
</ul>