Release Notes - ASF JIRA

Release Notes - Spark - Version 2.3.0 - HTML format

Sub-task

[SPARK-9104] - expose network layer memory usage
[SPARK-10365] - Support Parquet logical type TIMESTAMP_MICROS
[SPARK-11034] - Launcher: add support for monitoring Mesos apps
[SPARK-11035] - Launcher: allow apps to be launched in-process
[SPARK-12375] - VectorIndexer: allow unknown categories
[SPARK-13534] - Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas
[SPARK-13969] - Extend input format that feature hashing can handle
[SPARK-14280] - Update change-version.sh and pom.xml to add Scala 2.12 profiles
[SPARK-14650] - Compile Spark REPL for Scala 2.12
[SPARK-14878] - Support Trim characters in the string trim function
[SPARK-17074] - generate equi-height histogram for column
[SPARK-17139] - Add model summary for MultinomialLogisticRegression
[SPARK-17642] - Support DESC FORMATTED TABLE COLUMN command to show column-level statistics
[SPARK-17729] - Enable creating hive bucketed tables
[SPARK-18016] - Code Generation: Constant Pool Past Limit for Wide/Nested Dataset
[SPARK-18294] - Implement commit protocol to support `mapred` package's committer
[SPARK-19165] - UserDefinedFunction should verify call arguments and provide readable exception in case of mismatch
[SPARK-19357] - Parallel Model Evaluation for ML Tuning: Scala
[SPARK-19634] - Feature parity for descriptive statistics in MLlib
[SPARK-19762] - Implement aggregator/loss function hierarchy and apply to linear regression
[SPARK-19791] - Add doc and example for fpgrowth
[SPARK-20396] - groupBy().apply() with pandas udf in pyspark
[SPARK-20417] - Move error reporting for subquery from Analyzer to CheckAnalysis
[SPARK-20585] - R generic hint support
[SPARK-20641] - Key-value store abstraction and implementation for storing application data
[SPARK-20642] - Use key-value store to keep History Server application listing
[SPARK-20643] - Implement listener for saving application status data in key-value store
[SPARK-20644] - Hook up Spark UI to the new key-value store backend
[SPARK-20645] - Make Environment page use new app state store
[SPARK-20646] - Make Executors page use new app state store
[SPARK-20647] - Make the Storage page use new app state store
[SPARK-20648] - Make Jobs and Stages pages use the new app state store
[SPARK-20649] - Simplify REST API class hierarchy
[SPARK-20650] - Remove JobProgressListener (and other unneeded classes)
[SPARK-20652] - Make SQL UI use new app state store
[SPARK-20653] - Add auto-cleanup of old elements to the new app state store
[SPARK-20654] - Add controls for how much disk the SHS can use
[SPARK-20655] - In-memory key-value store implementation
[SPARK-20657] - Speed up Stage page
[SPARK-20664] - Remove stale applications from SHS listing
[SPARK-20727] - Skip SparkR tests when missing Hadoop winutils on CRAN windows machines
[SPARK-20748] - Built-in SQL Function Support - CH[A]R
[SPARK-20749] - Built-in SQL Function Support - all variants of LEN[GTH]
[SPARK-20750] - Built-in SQL Function Support - REPLACE
[SPARK-20751] - Built-in SQL Function Support - COT
[SPARK-20754] - Add Function Alias For MOD/TRUNCT/POSITION
[SPARK-20770] - Improve ColumnStats
[SPARK-20783] - Enhance ColumnVector to support compressed representation
[SPARK-20791] - Use Apache Arrow to Improve Spark createDataFrame from Pandas.DataFrame
[SPARK-20822] - Generate code to get value from CachedBatchColumnVector in ColumnarBatch
[SPARK-20881] - Clearly document the mechanism to choose between two sources of statistics
[SPARK-20909] - Built-in SQL Function Support - DAYOFWEEK
[SPARK-20910] - Built-in SQL Function Support - UUID
[SPARK-20931] - Built-in SQL Function ABS support string type
[SPARK-20948] - Built-in SQL Function UnaryMinus/UnaryPositive support string type
[SPARK-20961] - generalize the dictionary in ColumnVector
[SPARK-20962] - Support subquery column aliases in FROM clause
[SPARK-20963] - Support column aliases for aliased relation in FROM clause
[SPARK-20988] - Convert logistic regression to new aggregator framework
[SPARK-21007] - Add SQL function - RIGHT && LEFT
[SPARK-21031] - Add `alterTableStats` to store spark's stats and let `alterTable` keep existing stats
[SPARK-21046] - simplify the array offset and length in ColumnVector
[SPARK-21047] - Add test suites for complicated cases in ColumnarBatchSuite
[SPARK-21051] - Add hash map metrics to aggregate
[SPARK-21052] - Add hash map metrics to join
[SPARK-21083] - Store zero size and row count after analyzing empty table
[SPARK-21087] - CrossValidator, TrainValidationSplit should collect all models when fitting: Scala API
[SPARK-21127] - Update statistics after data changing commands
[SPARK-21180] - Remove conf from stats functions since now we have conf in LogicalPlan
[SPARK-21190] - SPIP: Vectorized UDFs in Python
[SPARK-21205] - pmod(number, 0) should be null
[SPARK-21213] - Support collecting partition-level statistics: rowCount and sizeInBytes
[SPARK-21237] - Invalidate stats once table data is changed
[SPARK-21322] - support histogram in filter cardinality estimation
[SPARK-21324] - Improve statistics test suites
[SPARK-21375] - Add date and timestamp support to ArrowConverters for toPandas() collection
[SPARK-21440] - Refactor ArrowConverters and add ArrayType and StructType support.
[SPARK-21456] - Make the driver failover_timeout configurable (Mesos cluster mode)
[SPARK-21552] - Add decimal type support to ArrowWriter.
[SPARK-21625] - Add incompatible Hive UDF describe to DOC
[SPARK-21654] - Complement predicates expression description
[SPARK-21671] - Move kvstore package to util.kvstore, add annotations
[SPARK-21720] - Filter predicate with many conditions throw stackoverflow error
[SPARK-21778] - Simpler Dataset.sample API in Scala / Java
[SPARK-21779] - Simpler Dataset.sample API in Python
[SPARK-21780] - Simpler Dataset.sample API in R
[SPARK-21805] - disable R vignettes code on Windows
[SPARK-21893] - Put Kafka 0.8 behind a profile
[SPARK-21895] - Support changing database in HiveClient
[SPARK-21934] - Expose Netty memory usage via Metrics System
[SPARK-21984] - Use histogram stats in join estimation
[SPARK-22026] - data source v2 write path
[SPARK-22032] - Speed up StructType.fromInternal
[SPARK-22053] - Implement stream-stream inner join in Append mode
[SPARK-22078] - clarify exception behaviors for all data source v2 interfaces
[SPARK-22086] - Add expression description for CASE WHEN
[SPARK-22087] - Clear remaining compile errors for 2.12; resolve most warnings
[SPARK-22100] - Make percentile_approx support date/timestamp type and change the output type to be the same as input type
[SPARK-22128] - Update paranamer to 2.8 to avoid BytecodeReadingParanamer ArrayIndexOutOfBoundsException with Scala 2.12 + Java 8 lambda
[SPARK-22136] - Implement stream-stream outer joins in append mode
[SPARK-22197] - push down operators to data source before planning
[SPARK-22221] - Add User Documentation for Working with Arrow in Spark
[SPARK-22226] - splitExpression can create too many method calls (generating a Constant Pool limit error)
[SPARK-22278] - Expose current event time watermark and current processing time in GroupState
[SPARK-22285] - Change implementation of ApproxCountDistinctForIntervals to TypedImperativeAggregate
[SPARK-22310] - Refactor join estimation to incorporate estimation logic for different kinds of statistics
[SPARK-22322] - Update FutureAction for compatibility with Scala 2.12 future
[SPARK-22324] - Upgrade Arrow to version 0.8.0 and upgrade Netty to 4.1.17
[SPARK-22344] - Prevent R CMD check from using /tmp
[SPARK-22361] - Add unit test for Window Frames
[SPARK-22363] - Add unit test for Window spilling
[SPARK-22387] - propagate session configs to data source read/write options
[SPARK-22389] - partitioning reporting
[SPARK-22392] - columnar reader interface
[SPARK-22400] - rename some APIs and classes to make their meaning clearer
[SPARK-22409] - Add function type argument to pandas_udf
[SPARK-22452] - DataSourceV2Options should have getInt, getBoolean, etc.
[SPARK-22475] - show histogram in DESC COLUMN command
[SPARK-22483] - Exposing java.nio bufferedPool memory metrics to metrics system
[SPARK-22494] - Coalesce and AtLeastNNonNulls can cause 64KB JVM bytecode limit exception
[SPARK-22498] - 64KB JVM bytecode limit problem with concat
[SPARK-22499] - 64KB JVM bytecode limit problem with least and greatest
[SPARK-22500] - 64KB JVM bytecode limit problem with cast
[SPARK-22501] - 64KB JVM bytecode limit problem with in
[SPARK-22508] - 64KB JVM bytecode limit problem with GenerateUnsafeRowJoiner.create()
[SPARK-22514] - move ColumnVector.Array and ColumnarBatch.Row to individual files
[SPARK-22515] - Estimation relation size based on numRows * rowSize
[SPARK-22529] - Relation stats should be consistent with other plans based on cbo config
[SPARK-22530] - Add ArrayType Support for working with Pandas and Arrow
[SPARK-22542] - remove unused features in ColumnarBatch
[SPARK-22543] - fix java 64kb compile error for deeply nested expressions
[SPARK-22549] - 64KB JVM bytecode limit problem with concat_ws
[SPARK-22550] - 64KB JVM bytecode limit problem with elt
[SPARK-22570] - Create a lot of global variables to reuse an object in generated code
[SPARK-22602] - remove ColumnVector#loadBytes
[SPARK-22603] - 64KB JVM bytecode limit problem with FormatString
[SPARK-22604] - remove the get address methods from ColumnVector
[SPARK-22626] - Wrong Hive table statistics may trigger OOM if CBO is enabled
[SPARK-22643] - ColumnarArray should be an immutable view
[SPARK-22646] - Spark on Kubernetes - basic submission client
[SPARK-22648] - Documentation for Kubernetes Scheduler Backend
[SPARK-22652] - remove set methods in ColumnarRow
[SPARK-22669] - Avoid unnecessary function calls in code generation
[SPARK-22693] - Avoid the generation of useless mutable states in complexTypeCreator and predicates
[SPARK-22695] - Avoid the generation of useless mutable states by scalaUDF
[SPARK-22696] - Avoid the generation of useless mutable states by objects functions
[SPARK-22699] - Avoid the generation of useless mutable states by GenerateSafeProjection
[SPARK-22703] - ColumnarRow should be an immutable view
[SPARK-22716] - Avoid the creation of mutable states in addReferenceObj
[SPARK-22732] - Add DataSourceV2 streaming APIs
[SPARK-22733] - refactor StreamExecution for extensibility
[SPARK-22745] - read partition stats from Hive
[SPARK-22746] - Avoid the generation of useless mutable states by SortMergeJoin
[SPARK-22750] - Introduce reusable mutable states
[SPARK-22757] - Init-container in the driver/executor pods for downloading remote dependencies
[SPARK-22762] - Basic tests for IfCoercion and CaseWhenCoercion
[SPARK-22772] - elt should use splitExpressionsWithCurrentInputs to split expression codes
[SPARK-22775] - move dictionary related APIs from ColumnVector to WritableColumnVector
[SPARK-22785] - rename ColumnVector.anyNullsSet to hasNull
[SPARK-22789] - Add ContinuousExecution for continuous processing queries
[SPARK-22807] - Change configuration options to use "container" instead of "docker"
[SPARK-22816] - Basic tests for PromoteStrings and InConversion
[SPARK-22821] - Basic tests for WidenSetOperationTypes, BooleanEquality, StackCoercion and Division
[SPARK-22822] - Basic tests for WindowFrameCoercion and DecimalPrecision
[SPARK-22829] - Add new built-in function date_trunc()
[SPARK-22845] - Modify spark.kubernetes.allocation.batch.delay to take time instead of int
[SPARK-22848] - Avoid the generation of useless mutable states by Stack function
[SPARK-22890] - Basic tests for DateTimeOperations
[SPARK-22892] - Simplify some estimation logic by using double instead of decimal
[SPARK-22904] - Basic tests for decimal operations and string cast
[SPARK-22908] - add basic continuous kafka source
[SPARK-22909] - Move Structured Streaming v2 APIs to streaming package
[SPARK-22912] - Support v2 streaming sources and sinks in MicroBatchExecution
[SPARK-22917] - Should not try to generate histogram for empty/null columns
[SPARK-22930] - Improve the description of Vectorized UDFs for non-deterministic cases
[SPARK-22978] - Register Scalar Vectorized UDFs for SQL Statement
[SPARK-22980] - Using pandas_udf when inputs are not Pandas's Series or DataFrame
[SPARK-23033] - disable task-level retry for continuous execution
[SPARK-23045] - Have RFormula use OneHotEncoderEstimator
[SPARK-23046] - Have RFormula include VectorSizeHint in pipeline
[SPARK-23047] - Change MapVector to NullableMapVector in ArrowColumnVector
[SPARK-23052] - Migrate Microbatch ConsoleSink to v2
[SPARK-23063] - Changes to publish the spark-kubernetes package
[SPARK-23064] - Add documentation for stream-stream joins
[SPARK-23093] - don't modify run id
[SPARK-23107] - ML, Graph 2.3 QA: API: New Scala APIs, docs
[SPARK-23108] - ML, Graph 2.3 QA: API: Experimental, DeveloperApi, final, sealed audit
[SPARK-23110] - ML 2.3 QA: API: Java compatibility, docs
[SPARK-23111] - ML, Graph 2.3 QA: Update user guide for new features & APIs
[SPARK-23112] - ML, Graph 2.3 QA: Programming guide update and migration guide
[SPARK-23116] - SparkR 2.3 QA: Update user guide for new features & APIs
[SPARK-23118] - SparkR 2.3 QA: Programming guide, migration guide, vignettes updates
[SPARK-23137] - spark.kubernetes.executor.podNamePrefix is ignored
[SPARK-23196] - Unify continuous and microbatch V2 sinks
[SPARK-23218] - simplify ColumnVector.getArray
[SPARK-23219] - Rename ReadTask to DataReaderFactory
[SPARK-23260] - remove V2 from the class name of data source reader/writer
[SPARK-23261] - Rename Pandas UDFs
[SPARK-23262] - mix-in interface should extend the interface it aimed to mix in
[SPARK-23268] - Reorganize packages in data source V2
[SPARK-23272] - add calendar interval type support to ColumnVector
[SPARK-23280] - add map type support to ColumnVector
[SPARK-23314] - Pandas grouped udf on dataset with timestamp column error
[SPARK-23334] - Fix pandas_udf with return type StringType() to handle str type properly in Python 2.
[SPARK-23352] - Explicitly specify supported types in Pandas UDFs
[SPARK-23446] - Explicitly check supported types in toPandas
[SPARK-24077] - Issue a better error message for `CREATE TEMPORARY FUNCTION IF NOT EXISTS`

Bug

[SPARK-3151] - DiskStore attempts to map any size BlockId without checking MappedByteBuffer limit
[SPARK-3577] - Add task metric to report spill time
[SPARK-3685] - Spark's local dir should accept only local paths
[SPARK-5484] - Pregel should checkpoint periodically to avoid StackOverflowError
[SPARK-9825] - Spark overwrites remote cluster "final" properties with local config
[SPARK-10719] - SQLImplicits.rddToDataFrameHolder is not thread safe when using Scala 2.10
[SPARK-11334] - numRunningTasks can't be less than 0, or it will affect executor allocation
[SPARK-12552] - Recovered driver's resource is not counted in the Master
[SPARK-12559] - Cluster mode doesn't work with --packages
[SPARK-12717] - pyspark broadcast fails when using multiple threads
[SPARK-13669] - Job will always fail in the external shuffle service unavailable situation
[SPARK-13757] - support quoted column names in schema string at types.py#_parse_datatype_string
[SPARK-13933] - hadoop-2.7 profile's curator version should be 2.7.1
[SPARK-13983] - HiveThriftServer2 can not get "--hiveconf" or "--hivevar" variables since 1.6 version (both multi-session and single session)
[SPARK-14034] - Converting to Dataset causes wrong order and values in nested array of documents
[SPARK-14228] - Lost executor of RPC disassociated, and occurs exception: Could not find CoarseGrainedScheduler or it has been stopped
[SPARK-14387] - Enable Hive-1.x ORC compatibility with spark.sql.hive.convertMetastoreOrc
[SPARK-14408] - Update RDD.treeAggregate not to use reduce
[SPARK-14657] - RFormula output wrong features when formula w/o intercept
[SPARK-15243] - Binarizer.explainParam(u"...") raises ValueError
[SPARK-15474] - ORC data source fails to write and read back empty dataframe
[SPARK-16167] - RowEncoder should preserve array/map type nullability.
[SPARK-16542] - bugs about types that result an array of null when creating dataframe using python
[SPARK-16548] - java.io.CharConversionException: Invalid UTF-32 character prevents me from querying my data
[SPARK-16605] - Spark2.0 cannot "select" data from a table stored as an orc file which has been created by hive while hive or spark1.6 supports
[SPARK-16628] - OrcConversions should not convert an ORC table represented by MetastoreRelation to HadoopFsRelation if metastore schema does not match schema stored in ORC files
[SPARK-16986] - "Started" time, "Completed" time and "Last Updated" time in history server UI are not user local time
[SPARK-17029] - Dataset toJSON goes through RDD form instead of transforming dataset itself
[SPARK-17047] - Spark 2 cannot create table when CLUSTERED.
[SPARK-17284] - Remove statistics-related table properties from SHOW CREATE TABLE
[SPARK-17321] - YARN shuffle service should use good disk from yarn.nodemanager.local-dirs
[SPARK-17410] - Move Hive-generated Stats Info to HiveClientImpl
[SPARK-17528] - data should be copied properly before saving into InternalRow
[SPARK-17742] - Spark Launcher does not get failed state in Listener
[SPARK-17788] - RangePartitioner results in few very large tasks and many small to empty tasks
[SPARK-17851] - Make sure all test sqls in catalyst pass checkAnalysis
[SPARK-17902] - collect() ignores stringsAsFactors
[SPARK-17914] - Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp
[SPARK-17920] - HiveWriterContainer passes null configuration to serde.initialize, causing NullPointerException in AvroSerde when using avro.schema.url
[SPARK-18004] - DataFrame filter Predicate push-down fails for Oracle Timestamp type columns
[SPARK-18061] - Spark Thriftserver needs to create SPNego principal
[SPARK-18355] - Spark SQL fails to read data from a ORC hive table that has a new column added to it
[SPARK-18394] - Executing the same query twice in a row results in CodeGenerator cache misses
[SPARK-18608] - Spark ML algorithms that check RDD cache level for internal caching double-cache data
[SPARK-18646] - ExecutorClassLoader for spark-shell does not honor spark.executor.userClassPathFirst
[SPARK-18935] - Use Mesos "Dynamic Reservation" resource for Spark
[SPARK-18950] - Report conflicting fields when merging two StructTypes.
[SPARK-19109] - ORC metadata section can sometimes exceed protobuf message size limit
[SPARK-19122] - Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order
[SPARK-19326] - Speculated task attempts do not get launched in few scenarios
[SPARK-19372] - Code generation for Filter predicate including many OR conditions exceeds JVM method size limit
[SPARK-19451] - rangeBetween method should accept Long value as boundary
[SPARK-19471] - A confusing NullPointerException when creating table
[SPARK-19531] - History server doesn't refresh jobs for long-life apps like thriftserver
[SPARK-19580] - Support for avro.schema.url while writing to hive table
[SPARK-19644] - Memory leak in Spark Streaming (Encoder/Scala Reflection)
[SPARK-19688] - Spark on Yarn Credentials File set to different application directory
[SPARK-19726] - Failed to insert null timestamp value to mysql using spark jdbc
[SPARK-19753] - Remove all shuffle files on a host in case of slave lost or fetch failure
[SPARK-19809] - NullPointerException on zero-size ORC file
[SPARK-19812] - YARN shuffle service fails to relocate recovery DB across NFS directories
[SPARK-19824] - Standalone master JSON not showing cores for running applications
[SPARK-19900] - [Standalone] Master registers application again when driver relaunched
[SPARK-19910] - `stack` should not reject NULL values due to type mismatch
[SPARK-20025] - Driver fail over will not work, if SPARK_LOCAL* env is set.
[SPARK-20065] - Empty output files created for aggregation query in append mode
[SPARK-20079] - Re registration of AM hangs spark cluster in yarn-client mode
[SPARK-20098] - DataType's typeName method returns with 'StructF' in case of StructField
[SPARK-20140] - Remove hardcoded kinesis retry wait and max retries
[SPARK-20205] - DAGScheduler posts SparkListenerStageSubmitted before updating stage
[SPARK-20213] - DataFrameWriter operations do not show up in SQL tab
[SPARK-20256] - Fail to start SparkContext/SparkSession with Hive support enabled when user does not have read/write privilege to Hive metastore warehouse dir
[SPARK-20288] - Improve BasicSchedulerIntegrationSuite "multi-stage job"
[SPARK-20311] - SQL "range(N) as alias" or "range(N) alias" doesn't work
[SPARK-20312] - query optimizer calls udf with null values when it doesn't expect them
[SPARK-20329] - Resolution error when HAVING clause uses GROUP BY expression that involves implicit type coercion
[SPARK-20333] - Fix HashPartitioner in DAGSchedulerSuite
[SPARK-20338] - Spaces in spark.eventLog.dir are not correctly handled
[SPARK-20341] - Support BigInteger values > 19 precision
[SPARK-20342] - DAGScheduler sends SparkListenerTaskEnd before updating task's accumulators
[SPARK-20345] - Fix STS error handling logic on HiveSQLException
[SPARK-20356] - Spark sql group by returns incorrect results after join + distinct transformations
[SPARK-20359] - Catalyst EliminateOuterJoin optimization can cause NPE
[SPARK-20365] - Not so accurate classpath format for AM and Containers
[SPARK-20367] - Spark silently escapes partition column names
[SPARK-20380] - describe table not showing updated table comment after alter operation
[SPARK-20412] - NullPointerException in places expecting non-optional partitionSpec.
[SPARK-20427] - Issue with Spark interpreting Oracle datatype NUMBER
[SPARK-20439] - Catalog.listTables() depends on all libraries used to create tables
[SPARK-20451] - Filter out nested mapType datatypes from sort order in randomSplit
[SPARK-20453] - Bump master branch version to 2.3.0-SNAPSHOT
[SPARK-20466] - HadoopRDD#addLocalConfiguration throws NPE
[SPARK-20541] - SparkR SS should support awaitTermination without timeout
[SPARK-20543] - R should skip long running or non-essential tests when running on CRAN
[SPARK-20565] - Improve the error message for unsupported JDBC types
[SPARK-20569] - RuntimeReplaceable functions accept invalid third parameter
[SPARK-20586] - Add deterministic to ScalaUDF
[SPARK-20591] - Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
[SPARK-20605] - Deprecate not used AM and executor port configuration
[SPARK-20609] - Running the SortShuffleSuite unit tests leaves residual spark_* system directories
[SPARK-20613] - Double quotes in Windows batch script
[SPARK-20626] - Fix SparkR test warning on Windows with timestamp time zone
[SPARK-20633] - FileFormatWriter wrap the FetchFailedException which breaks job's failover
[SPARK-20640] - Make rpc timeout and retry for shuffle registration configurable
[SPARK-20689] - python doctest leaking bucketed table
[SPARK-20690] - Subqueries in FROM should have alias names
[SPARK-20704] - CRAN test should run single threaded
[SPARK-20706] - Spark-shell not overriding method/variable definition
[SPARK-20708] - Make `addExclusionRules` up-to-date
[SPARK-20713] - Speculative task that got CommitDenied exception shows up as failed
[SPARK-20719] - Support LIMIT ALL
[SPARK-20756] - yarn-shuffle jar has references to unshaded guava and contains scala classes
[SPARK-20786] - Improve ceil and floor handle the value which is not expected
[SPARK-20815] - NullPointerException in RPackageUtils#checkManifestForR
[SPARK-20832] - Standalone master should explicitly inform drivers of worker deaths and invalidate external shuffle service outputs
[SPARK-20865] - caching dataset throws "Queries with streaming sources must be executed with writeStream.start()"
[SPARK-20873] - Improve the error message for unsupported Column Type
[SPARK-20876] - If the input parameter for ceil or floor is float type, the result is not what we expected
[SPARK-20898] - spark.blacklist.killBlacklistedExecutors doesn't work in YARN
[SPARK-20904] - Task failures during shutdown cause problems with preempted executors
[SPARK-20906] - Constrained Logistic Regression for SparkR
[SPARK-20914] - Javadoc contains code that is invalid
[SPARK-20916] - Improve error message for unaliased subqueries in FROM clause
[SPARK-20918] - Use FunctionIdentifier as function identifiers in FunctionRegistry
[SPARK-20922] - Unsafe deserialization in Spark LauncherConnection
[SPARK-20923] - TaskMetrics._updatedBlockStatuses uses a lot of memory
[SPARK-20926] - Exposure to Guava libraries by directly accessing tableRelationCache in SessionCatalog caused failures
[SPARK-20935] - A daemon thread, "BatchedWriteAheadLog Writer", left behind after terminating StreamingContext.
[SPARK-20945] - NoSuchElementException key not found in TaskSchedulerImpl
[SPARK-20976] - Unify Error Messages for FAILFAST mode.
[SPARK-20978] - CSV emits NPE when the number of tokens is less than given schema and corrupt column is given
[SPARK-20989] - Fail to start multiple workers on one host if external shuffle service is enabled in standalone mode
[SPARK-20991] - BROADCAST_TIMEOUT conf should be a timeoutConf
[SPARK-20997] - spark-submit's --driver-cores marked as "YARN-only" but listed under "Spark standalone with cluster deploy mode only"
[SPARK-21033] - fix the potential OOM in UnsafeExternalSorter
[SPARK-21041] - With whole-stage codegen, SparkSession.range()'s behavior is inconsistent with SparkContext.range()
[SPARK-21050] - ml word2vec write has overflow issue in calculating numPartitions
[SPARK-21055] - Support grouping__id
[SPARK-21057] - Do not use a PascalDistribution in countApprox
[SPARK-21064] - Fix the default value bug in NettyBlockTransferServiceSuite
[SPARK-21066] - LibSVM load just one input file
[SPARK-21093] - Multiple gapply execution occasionally failed in SparkR
[SPARK-21101] - Error running Hive temporary UDTF on latest Spark 2.2
[SPARK-21102] - Refresh command is too aggressive in parsing
[SPARK-21112] - ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT
[SPARK-21119] - unset table properties should keep the table comment
[SPARK-21124] - Wrong user shown in UI when using kerberos
[SPARK-21138] - Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different
[SPARK-21145] - Restarted queries reuse same StateStoreProvider, causing multiple concurrent tasks to update same StateStore
[SPARK-21147] - the schema of socket/rate source can not be set.
[SPARK-21163] - DataFrame.toPandas should respect the data type
[SPARK-21165] - Fail to write into partitioned hive table due to attribute reference not working with cast on partition column
[SPARK-21167] - Path is not decoded correctly when reading output of FileSink
[SPARK-21170] - Utils.tryWithSafeFinallyAndFailureCallbacks throws IllegalArgumentException: Self-suppression not permitted
[SPARK-21181] - Suppress memory leak errors reported by netty
[SPARK-21188] - releaseAllLocksForTask should synchronize the whole method
[SPARK-21204] - RuntimeException with Set and Case Class in Spark 2.1.1
[SPARK-21216] - Streaming DataFrames fail to join with Hive tables
[SPARK-21219] - Task retry occurs on same executor due to race condition with blacklisting
[SPARK-21223] - Thread-safety issue in FsHistoryProvider
[SPARK-21225] - decrease the Mem using for variable 'tasks' in function resourceOffers
[SPARK-21228] - InSet incorrect handling of structs
[SPARK-21248] - Flaky test: o.a.s.sql.kafka010.KafkaSourceSuite.assign from specific offsets (failOnDataLoss: true)
[SPARK-21254] - History UI: Taking over 1 minute for initial page display
[SPARK-21255] - NPE when creating encoder for enum
[SPARK-21263] - NumberFormatException is not thrown while converting an invalid string to float/double
[SPARK-21264] - Omitting columns with 'how' specified in join in PySpark throws NPE
[SPARK-21271] - UnsafeRow.hashCode assertion when sizeInBytes not multiple of 8
[SPARK-21272] - SortMergeJoin LeftAnti does not update numOutputRows
[SPARK-21278] - Upgrade to Py4J 0.10.6
[SPARK-21281] - cannot create empty typed array column
[SPARK-21283] - FileOutputStream should be created as append mode
[SPARK-21284] - rename SessionCatalog.registerFunction parameter name
[SPARK-21300] - ExternalMapToCatalyst should null-check map key prior to converting to internal value.
[SPARK-21306] - OneVsRest Conceals Columns That May Be Relevant To Underlying Classifier
[SPARK-21312] - UnsafeRow writeToStream has incorrect offsetInByteArray calculation for non-zero offset
[SPARK-21319] - UnsafeExternalRowSorter.RowComparator memory leak
[SPARK-21327] - ArrayConstructor should handle an array of typecode 'l' as long rather than int in Python 2.
[SPARK-21330] - Bad partitioning does not allow to read a JDBC table with extreme values on the partition column
[SPARK-21332] - Incorrect result type inferred for some decimal expressions
[SPARK-21333] - joinWith documents and analysis allow invalid join types
[SPARK-21335] - support un-aliased subquery
[SPARK-21338] - AggregatedDialect doesn't override isCascadingTruncateTable() method
[SPARK-21339] - spark-shell --packages option does not add jars to classpath on windows
[SPARK-21342] - Fix DownloadCallback to work well with RetryingBlockFetcher
[SPARK-21343] - Refine the document for spark.reducer.maxReqSizeShuffleToMem
[SPARK-21345] - SparkSessionBuilderSuite should clean up stopped sessions
[SPARK-21350] - Fix the error message when the number of arguments is wrong when invoking a UDF
[SPARK-21354] - INPUT FILE related functions do not support more than one sources
[SPARK-21357] - FileInputDStream does not remove out-of-date RDDs
[SPARK-21369] - Don't use Scala classes in external shuffle service
[SPARK-21374] - Reading globbed paths from S3 into DF doesn't work if filesystem caching is disabled
[SPARK-21376] - Token is not renewed in yarn client process in cluster mode
[SPARK-21377] - Jars specified with --jars or --packages are not added into AM's system classpath
[SPARK-21383] - YARN can allocate too many executors
[SPARK-21384] - Spark 2.2 + YARN without spark.yarn.jars / spark.yarn.archive fails
[SPARK-21394] - Reviving broken callable objects in UDF in PySpark
[SPARK-21400] - Spark shouldn't ignore user defined output committer in append mode
[SPARK-21403] - Cluster mode doesn't work with --packages [Mesos]
[SPARK-21411] - Failed to get new HDFS delegation tokens in AMCredentialRenewer
[SPARK-21414] - Buffer in SlidingWindowFunctionFrame could be big though window is small
[SPARK-21418] - NoSuchElementException: None.get in DataSourceScanExec with sun.io.serialization.extendedDebugInfo=true
[SPARK-21422] - Depend on Apache ORC 1.4.0
[SPARK-21428] - CliSessionState is never recognized because of IsolatedClientLoader
[SPARK-21432] - Reviving broken partial functions in UDF in PySpark
[SPARK-21439] - Cannot use Spark with Python ABCmeta (exception from cloudpickle)
[SPARK-21441] - Incorrect Codegen in SortMergeJoinExec results failures in some cases
[SPARK-21444] - Fetch failure due to node reboot causes job failure
[SPARK-21445] - NotSerializableException thrown by UTF8String.IntWrapper
[SPARK-21446] - [SQL] JDBC Postgres fetchsize parameter ignored again
[SPARK-21447] - Spark history server fails to render compressed inprogress history file in some cases.
[SPARK-21451] - HiveConf in SparkSQLCLIDriver doesn't respect spark.hadoop.some.hive.variables
[SPARK-21457] - ExternalCatalog.listPartitions should correctly handle partition values with dot
[SPARK-21462] - Add batchId to the json of StreamingQueryProgress
[SPARK-21463] - Output of StructuredStreaming tables don't respect user specified schema when reading back the table
[SPARK-21490] - SparkLauncher may fail to redirect streams
[SPARK-21494] - Spark 2.2.0 AES encryption not working with External shuffle
[SPARK-21498] - Quick start: one Python demo has a bug in its code
[SPARK-21501] - Spark shuffle index cache size should be memory based
[SPARK-21502] - --supervise causing frameworkId conflicts in mesos cluster mode
[SPARK-21503] - Spark UI shows incorrect task status for a killed Executor Process
[SPARK-21508] - Documentation on 'Spark Streaming Custom Receivers' has error in example code
[SPARK-21512] - DatasetCacheSuite needs to execute unpersist after executing persist
[SPARK-21516] - overriding afterEach() in DatasetCacheSuite must call super.afterEach()
[SPARK-21522] - Flaky test: LauncherServerSuite.testStreamFiltering
[SPARK-21523] - Fix bug where the strong Wolfe line search `init` parameter loses effectiveness
[SPARK-21534] - PickleException when creating dataframe from python row with empty bytearray
[SPARK-21541] - Spark Logs show incorrect job status for a job that does not create SparkContext
[SPARK-21546] - dropDuplicates with watermark yields RuntimeException due to binding failure
[SPARK-21549] - Spark fails to complete job correctly in case of an OutputFormat which does not write into HDFS
[SPARK-21551] - pyspark's collect fails when getaddrinfo is too slow
[SPARK-21555] - GROUP BY doesn't work with expressions with NVL and nested objects
[SPARK-21563] - Race condition when serializing TaskDescriptions and adding jars
[SPARK-21565] - aggregate query fails with watermark on eventTime but works with watermark on timestamp column generated by current_timestamp
[SPARK-21567] - Dataset with Tuple of type alias throws error
[SPARK-21568] - ConsoleProgressBar should only be enabled in shells
[SPARK-21571] - Spark history server leaves incomplete or unreadable history files around forever.
[SPARK-21580] - A bug with `Group by ordinal`
[SPARK-21585] - Application Master marking application status as Failed for Client Mode
[SPARK-21587] - Filter pushdown for EventTime Watermark Operator
[SPARK-21588] - SQLContext.getConf(key, null) should return null, but it throws NPE
[SPARK-21593] - Fix broken configuration page
[SPARK-21595] - introduction of spark.sql.windowExec.buffer.spill.threshold in spark 2.2 breaks existing workflow
[SPARK-21596] - Audit the places calling HDFSMetadataLog.get
[SPARK-21597] - Avg event time calculated in progress may be wrong
[SPARK-21599] - Collecting column statistics for datasource tables may fail with java.util.NoSuchElementException
[SPARK-21605] - Let IntelliJ IDEA correctly detect Language level and Target byte code version
[SPARK-21610] - Corrupt records are not handled properly when creating a dataframe from a file
[SPARK-21615] - Fix broken redirect in collaborative filtering docs to databricks training repo
[SPARK-21617] - ALTER TABLE...ADD COLUMNS broken in Hive 2.1 for DS tables
[SPARK-21621] - Reset numRecordsWritten after DiskBlockObjectWriter.commitAndGet called
[SPARK-21637] - `hive.metastore.warehouse` in --hiveconf is not respected
[SPARK-21638] - Warning message of RF is not accurate
[SPARK-21642] - Use FQDN for DRIVER_HOST_ADDRESS instead of ip address
[SPARK-21644] - LocalLimit.maxRows is defined incorrectly
[SPARK-21647] - SortMergeJoin failed when using CROSS
[SPARK-21648] - Confusing assert failure in JDBC source when users misspell the option `partitionColumn`
[SPARK-21652] - Optimizer cannot reach a fixed point on certain queries
[SPARK-21656] - spark dynamic allocation should not idle timeout executors when there are enough tasks to run on them
[SPARK-21657] - Spark has exponential time complexity to explode(array of structs)
[SPARK-21677] - json_tuple throws NullPointerException when column is null as string type.
[SPARK-21681] - MLOR does not work correctly when featureStd contains zero
[SPARK-21696] - State Store can't handle corrupted snapshots
[SPARK-21714] - SparkSubmit in Yarn Client mode downloads remote files and then reuploads them again
[SPARK-21721] - Memory leak in org.apache.spark.sql.hive.execution.InsertIntoHiveTable
[SPARK-21723] - Can't write LibSVM - key not found: numFeatures
[SPARK-21727] - Operating on an ArrayType in a SparkR DataFrame throws error
[SPARK-21738] - Thriftserver doesn't cancel jobs when session is closed
[SPARK-21739] - timestamp partition would fail in v2.2.0
[SPARK-21753] - running pi example with pypy on spark fails to serialize
[SPARK-21759] - In.checkInputDataTypes should not wrongly report unresolved plans for IN correlated subquery
[SPARK-21762] - FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible
[SPARK-21766] - DataFrame toPandas() raises ValueError with nullable int columns
[SPARK-21767] - Add Decimal Test For Avro in VersionSuite
[SPARK-21782] - Repartition creates skews when numPartitions is a power of 2
[SPARK-21786] - The 'spark.sql.parquet.compression.codec' configuration doesn't take effect on tables with partition field(s)
[SPARK-21788] - Handle more exceptions when stopping a streaming query
[SPARK-21791] - ORC should support column names with dot
[SPARK-21793] - Correct validateAndTransformSchema in GaussianMixture and AFTSurvivalRegression
[SPARK-21798] - No config to replace deprecated SPARK_CLASSPATH config for launching daemons like History Server
[SPARK-21801] - SparkR unit tests randomly fail on trees
[SPARK-21804] - json_tuple returns null values within repeated columns except the first one
[SPARK-21818] - MultivariateOnlineSummarizer.variance generate negative result
[SPARK-21826] - outer broadcast hash join should not throw NPE
[SPARK-21830] - Bump the dependency of ANTLR to version 4.7
[SPARK-21831] - Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite
[SPARK-21832] - Merge SQLBuilderTest into ExpressionSQLBuilderSuite
[SPARK-21834] - Incorrect executor request in case of dynamic allocation
[SPARK-21835] - RewritePredicateSubquery should not produce unresolved query plans
[SPARK-21837] - UserDefinedTypeSuite local UDFs not actually testing what it intends
[SPARK-21845] - Make codegen fallback of expressions configurable
[SPARK-21877] - Windows command script can not handle quotes in parameter
[SPARK-21880] - [Spark UI] In the SQL table page, modify jobs trace information
[SPARK-21890] - ObtainCredentials does not pass creds to addDelegationTokens
[SPARK-21904] - Rename tempTables to tempViews in SessionCatalog
[SPARK-21907] - NullPointerException in UnsafeExternalSorter.spill()
[SPARK-21912] - ORC/Parquet table should not create invalid column names
[SPARK-21913] - `withDatabase` should drop database with CASCADE
[SPARK-21917] - Remote http(s) resources are not supported in YARN mode
[SPARK-21922] - When an executor has failed and task metrics have not been sent to the driver, the status will always be 'RUNNING' and the duration will be 'CurrentTime - launchTime'
[SPARK-21924] - Bug in Structured Streaming Documentation
[SPARK-21928] - ClassNotFoundException for custom Kryo registrator class during serde in netty threads
[SPARK-21929] - Support `ALTER TABLE table_name ADD COLUMNS(..)` for ORC data source
[SPARK-21941] - Stop storing unused attemptId in SQLTaskMetrics
[SPARK-21946] - Flaky test: InMemoryCatalogedDDLSuite.`alter table: rename cached table`
[SPARK-21947] - monotonically_increasing_id doesn't work in Structured Streaming
[SPARK-21950] - pyspark.sql.tests.SQLTests2 should stop SparkContext.
[SPARK-21953] - Show both memory and disk bytes spilled if either is present
[SPARK-21954] - JacksonUtils should verify MapType's value type instead of key type
[SPARK-21958] - Attempting to save large Word2Vec model hangs driver in constant GC.
[SPARK-21969] - CommandUtils.updateTableStats should call refreshTable
[SPARK-21977] - SinglePartition optimizations break certain Streaming Stateful Aggregation requirements
[SPARK-21979] - Improve QueryPlanConstraints framework
[SPARK-21980] - References in grouping functions should be indexed with resolver
[SPARK-21985] - PySpark PairDeserializer is broken for double-zipped RDDs
[SPARK-21987] - Spark 2.3 cannot read 2.2 event logs
[SPARK-21991] - [LAUNCHER] LauncherServer acceptConnections thread sometimes dies if machine has very high load
[SPARK-21996] - Streaming ignores files with spaces in the file names
[SPARK-21998] - SortMergeJoinExec did not calculate its outputOrdering correctly during physical planning
[SPARK-22017] - watermark evaluation with multi-input stream operators is unspecified
[SPARK-22018] - Catalyst Optimizer does not preserve top-level metadata while collapsing projects
[SPARK-22030] - GraphiteSink fails to re-connect to Graphite instances behind an ELB or any other auto-scaled LB
[SPARK-22033] - BufferHolder, other size checks should account for the specific VM array size limitations
[SPARK-22036] - BigDecimal multiplication sometimes returns null
[SPARK-22042] - ReorderJoinPredicates can break when child's partitioning is not decided
[SPARK-22047] - HiveExternalCatalogVersionsSuite is Flaky on Jenkins
[SPARK-22052] - Incorrect Metric assigned in MetricsReporter.scala
[SPARK-22060] - CrossValidator/TrainValidationSplit parallelism param persist/load bug
[SPARK-22062] - BlockManager does not account for memory consumed by remote fetches
[SPARK-22067] - ArrowWriter StringWriter not using position of ByteBuffer holding data
[SPARK-22071] - Improve release build scripts to check correct JAVA version is being used for build
[SPARK-22074] - Task killed by other attempt task should not be resubmitted
[SPARK-22076] - Expand.projections should not be a Stream
[SPARK-22083] - When dropping multiple blocks to disk, Spark should release all locks on a failure
[SPARK-22088] - Incorrect scalastyle comment causes wrong styles in stringExpressions
[SPARK-22092] - Reallocation in OffHeapColumnVector.reserveInternal corrupts array data
[SPARK-22093] - UtilsSuite "resolveURIs with multiple paths" test always cancelled
[SPARK-22094] - processAllAvailable should not block forever when a query is stopped
[SPARK-22097] - Request an accurate memory after we unrolled the block
[SPARK-22107] - "as" should be "alias" in python quick start documentation
[SPARK-22109] - Reading tables partitioned by columns that look like timestamps has inconsistent schema inference
[SPARK-22129] - Spark release scripts ignore the GPG_KEY and always sign with your default key
[SPARK-22135] - metrics in spark-dispatcher not being registered properly
[SPARK-22141] - Propagate empty relation before checking Cartesian products
[SPARK-22143] - OffHeapColumnVector may leak memory
[SPARK-22145] - Issues with driver re-starting on mesos (supervise)
[SPARK-22146] - FileNotFoundException while reading ORC files containing '%'
[SPARK-22158] - convertMetastore should not ignore storage properties
[SPARK-22159] - spark.sql.execution.arrow.enable and spark.sql.codegen.aggregate.map.twolevel.enable -> enabled
[SPARK-22162] - Executors and the driver use inconsistent Job IDs during the new RDD commit protocol
[SPARK-22165] - Type conflicts between dates, timestamps and date in partition column
[SPARK-22167] - Spark Packaging w/R distro issues
[SPARK-22169] - support byte length literal as identifier
[SPARK-22171] - Describe Table Extended Failed when Table Owner is Empty
[SPARK-22172] - Worker hangs when the external shuffle service port is already in use
[SPARK-22176] - Dataset.show(Int.MaxValue) hits integer overflows
[SPARK-22178] - Refresh Table does not refresh the underlying tables of the persistent view
[SPARK-22206] - gapply in R can't work on empty grouping columns
[SPARK-22209] - PySpark does not recognize imports from submodules
[SPARK-22211] - LimitPushDown optimization for FullOuterJoin generates wrong results
[SPARK-22218] - spark shuffle services fails to update secret on application re-attempts
[SPARK-22222] - Fix the ARRAY_MAX in BufferHolder and add a test
[SPARK-22223] - ObjectHashAggregate introduces unnecessary shuffle
[SPARK-22224] - Override toString of KeyValueGroupedDataset & RelationalGroupedDataset
[SPARK-22227] - DiskBlockManager.getAllBlocks could fail if called during shuffle
[SPARK-22230] - agg(last('attr)) gives weird results for streaming
[SPARK-22238] - EnsureStatefulOpPartitioning shouldn't ask for the child RDD before planning is completed
[SPARK-22243] - streaming job failed to restart from checkpoint
[SPARK-22249] - UnsupportedOperationException: empty.reduceLeft when caching a dataframe
[SPARK-22251] - Metric "aggregate time" is incorrect when codegen is off
[SPARK-22252] - FileFormatWriter should respect the input query schema
[SPARK-22254] - clean up the implementation of `growToSize` in CompactBuffer
[SPARK-22257] - Reserve all non-deterministic expressions in ExpressionSet.
[SPARK-22267] - Spark SQL incorrectly reads ORC file when column order is different
[SPARK-22271] - Describe results in "null" for the value of "mean" of a numeric variable
[SPARK-22273] - Fix key/value schema field names in HashMapGenerators.
[SPARK-22280] - Improve StatisticsSuite to test `convertMetastore` properly
[SPARK-22281] - Handle R method breaking signature changes
[SPARK-22284] - Code of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
[SPARK-22287] - SPARK_DAEMON_MEMORY not honored by MesosClusterDispatcher
[SPARK-22289] - Cannot save LogisticRegressionModel with bounds on coefficients
[SPARK-22290] - Starting second context in same JVM fails to get new Hive delegation token
[SPARK-22291] - Postgresql UUID[] to Cassandra: Conversion Error
[SPARK-22300] - Update ORC to 1.4.1
[SPARK-22303] - Getting java.sql.SQLException: Unsupported type 101 for BINARY_DOUBLE
[SPARK-22305] - HDFSBackedStateStoreProvider fails with StackOverflowException when attempting to recover state
[SPARK-22306] - INFER_AND_SAVE overwrites important metadata in Parquet Metastore table
[SPARK-22319] - SparkSubmit calls getFileStatus before calling loginUserFromKeytab
[SPARK-22326] - Remove unnecessary hashCode and equals methods
[SPARK-22327] - R CRAN check fails on non-latest branches
[SPARK-22328] - ClosureCleaner misses referenced superclass fields, gives them null values
[SPARK-22330] - Linear containsKey operation for serialized maps.
[SPARK-22332] - NaiveBayes unit test occasionally fails
[SPARK-22333] - ColumnReference should get higher priority than timeFunctionCall(CURRENT_DATE, CURRENT_TIMESTAMP)
[SPARK-22349] - In on-heap mode, when allocating memory from pool, we should fill memory with `MEMORY_DEBUG_FILL_CLEAN_VALUE`
[SPARK-22355] - Dataset.collect is not threadsafe
[SPARK-22356] - data source table should support overlapped columns between data and partition schema
[SPARK-22370] - Config values should be captured in Driver.
[SPARK-22373] - Intermittent NullPointerException in org.codehaus.janino.IClass.isAssignableFrom
[SPARK-22375] - Test script can fail if eggs are installed by setup.py during test process
[SPARK-22376] - run-tests.py fails at exec-sbt if run with Python 3
[SPARK-22377] - Maven nightly snapshot jenkins jobs are broken on multiple workers due to lsof
[SPARK-22393] - spark-shell can't find imported types in class constructors, extends clause
[SPARK-22395] - Fix the behavior of timestamp values for Pandas to respect session timezone
[SPARK-22396] - Unresolved operator InsertIntoDir for Hive format when Hive Support is not enabled
[SPARK-22403] - StructuredKafkaWordCount example fails in YARN cluster mode
[SPARK-22410] - Excessive spill for Pyspark UDF when a row has shrunk
[SPARK-22417] - createDataFrame from a pandas.DataFrame reads datetime64 values as longs
[SPARK-22429] - Streaming checkpointing code does not retry after failure due to NullPointerException
[SPARK-22431] - Creating Permanent view with illegal type
[SPARK-22437] - jdbc write fails to set default mode
[SPARK-22442] - Schema generated by Product Encoder doesn't match case class field name when using non-standard characters
[SPARK-22443] - AggregatedDialect doesn't override quoteIdentifier and other methods in JdbcDialects
[SPARK-22446] - Optimizer causing StringIndexerModel's indexer UDF to throw "Unseen label" exception incorrectly for filtered data.
[SPARK-22454] - ExternalShuffleClient.close() should check null
[SPARK-22462] - SQL metrics missing after foreach operation on dataframe
[SPARK-22463] - Missing hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distributed archive
[SPARK-22464] - <=> is not supported by Hive metastore partition predicate pushdown
[SPARK-22465] - Cogroup of two disproportionate RDDs could lead into 2G limit BUG
[SPARK-22466] - SPARK_CONF_DIR is not set by Spark's launch scripts with default value
[SPARK-22469] - Accuracy problem in comparison with string and numeric
[SPARK-22472] - Datasets generate random values for null primitive types
[SPARK-22479] - SaveIntoDataSourceCommand logs jdbc credentials
[SPARK-22484] - PySpark DataFrame.write.csv(quote="") uses nullchar as quote
[SPARK-22487] - No usages of HIVE_EXECUTION_VERSION found in whole spark project
[SPARK-22488] - The view resolution in the SparkSession internal table() API
[SPARK-22489] - Shouldn't change broadcast join buildSide if user clearly specified
[SPARK-22495] - Fix setup of SPARK_HOME variable on Windows
[SPARK-22511] - Update maven central repo address
[SPARK-22516] - CSV Read breaks: When "multiLine" = "true", if "comment" option is set as last line's first character
[SPARK-22525] - Spark download page doesn't update package name based on package type
[SPARK-22533] - SparkConfigProvider does not handle deprecated config keys
[SPARK-22535] - PythonRunner.MonitorThread should give the task a little time to finish before killing the python worker
[SPARK-22538] - SQLTransformer.transform(inputDataFrame) uncaches inputDataFrame
[SPARK-22540] - HighlyCompressedMapStatus's avgSize is incorrect
[SPARK-22544] - FileStreamSource should use its own hadoop conf to call globPathIfNecessary
[SPARK-22548] - Incorrect nested AND expression pushed down to JDBC data source
[SPARK-22557] - Use ThreadSignaler explicitly
[SPARK-22559] - history server: handle exception on opening corrupted listing.ldb
[SPARK-22572] - spark-shell does not re-initialize on :replay
[SPARK-22574] - Wrong request causing Spark Dispatcher going inactive
[SPARK-22583] - First delegation token renewal time is not 75% of renewal time in Mesos
[SPARK-22585] - Url encoding of jar path expected?
[SPARK-22587] - Spark job fails if fs.defaultFS and application jar are different url
[SPARK-22591] - GenerateOrdering shouldn't change ctx.INPUT_ROW
[SPARK-22605] - OutputMetrics empty for DataFrame writes
[SPARK-22607] - Set large stack size consistently for tests to avoid StackOverflowError
[SPARK-22615] - Handle more cases in PropagateEmptyRelation
[SPARK-22618] - RDD.unpersist can cause fatal exception when used with dynamic allocation
[SPARK-22635] - FileNotFoundException again while reading ORC files containing special characters
[SPARK-22637] - CatalogImpl.refresh() has quadratic complexity for a view
[SPARK-22642] - the createdTempDir will not be deleted if an exception occurs
[SPARK-22651] - Calling ImageSchema.readImages initiate multiple Hive clients
[SPARK-22653] - executorAddress registered in CoarseGrainedSchedulerBackend.executorDataMap is null
[SPARK-22654] - Retry download of Spark from ASF mirror in HiveExternalCatalogVersionsSuite
[SPARK-22655] - Fail task instead of complete task silently in PythonRunner during shutdown
[SPARK-22662] - Failed to prune columns after rewriting predicate subquery
[SPARK-22668] - CodegenContext.splitExpressions() creates incorrect results with global variable arguments
[SPARK-22681] - Accumulator should only be updated once for each task in result stage
[SPARK-22686] - DROP TABLE IF EXISTS should not show AnalysisException
[SPARK-22700] - Bucketizer.transform incorrectly drops row containing NaN
[SPARK-22710] - ConfigBuilder.fallbackConf doesn't trigger onCreate function
[SPARK-22712] - Use `buildReaderWithPartitionValues` in native OrcFileFormat
[SPARK-22721] - BytesToBytesMap peak memory usage not accurate after reset()
[SPARK-22759] - Filters can be combined iff both are deterministic
[SPARK-22764] - Flaky test: SparkContextSuite "Cancelling stages/jobs with custom reasons"
[SPARK-22777] - Docker container built for Kubernetes doesn't allow running entrypoint.sh
[SPARK-22778] - Kubernetes scheduler at master failing to run applications successfully
[SPARK-22779] - ConfigEntry's default value should actually be a value
[SPARK-22788] - HdfsUtils.getOutputStream uses non-existent Hadoop conf "hdfs.append.support"
[SPARK-22791] - Redact Output of Explain
[SPARK-22793] - Memory leak in Spark Thrift Server
[SPARK-22811] - pyspark.ml.tests is missing a py4j import.
[SPARK-22813] - run-tests.py fails when /usr/sbin/lsof does not exist
[SPARK-22815] - Keep PromotePrecision in Optimized Plans
[SPARK-22817] - Use fixed testthat version for SparkR tests in AppVeyor
[SPARK-22818] - csv escape of quote escape
[SPARK-22819] - Download page - updating package type does nothing
[SPARK-22824] - Spark Structured Streaming Source trait breaking change
[SPARK-22825] - Incorrect results of Casting Array to String
[SPARK-22827] - Avoid throwing OutOfMemoryError in case of exception in spill
[SPARK-22834] - Make insert commands have real children to fix UI issues
[SPARK-22836] - Executors page is not showing driver logs links
[SPARK-22837] - Session timeout checker does not work in SessionManager
[SPARK-22843] - R localCheckpoint API
[SPARK-22846] - table's owner property in hive metastore is null
[SPARK-22849] - ivy.retrieve pattern should also consider `classifier`
[SPARK-22850] - Executor page in SHS does not show driver
[SPARK-22852] - sbt publishLocal fails due to -Xlint:unchecked flag passed to javadoc
[SPARK-22854] - AppStatusListener should get Spark version by SparkListenerLogStart
[SPARK-22855] - Sbt publishLocal under scala 2.12 fails due to invalid javadoc comments in tags package
[SPARK-22861] - SQLAppStatusListener should track all stages in multi-job executions
[SPARK-22862] - Docs on lazy elimination of columns missing from an encoder.
[SPARK-22864] - Flaky test: ExecutorAllocationManagerSuite "cancel pending executors when no longer needed"
[SPARK-22866] - Kubernetes dockerfile path needs update
[SPARK-22875] - Assembly build fails for a high user id
[SPARK-22889] - CRAN checks can fail if older Spark install exists
[SPARK-22891] - NullPointerException when use udf
[SPARK-22899] - OneVsRestModel transform on streaming data failed.
[SPARK-22901] - Add non-deterministic to Python UDF
[SPARK-22905] - Fix ChiSqSelectorModel, GaussianMixtureModel save implementation for Row order issues
[SPARK-22916] - shouldn't bias towards build right if user does not specify
[SPARK-22920] - R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with trimString
[SPARK-22924] - R DataFrame API for sortWithinPartitions
[SPARK-22932] - Refactor AnalysisContext
[SPARK-22933] - R Structured Streaming API for withWatermark, trigger, partitionBy
[SPARK-22934] - Make optional clauses order insensitive for CREATE TABLE SQL statement
[SPARK-22940] - Test suite HiveExternalCatalogVersionsSuite fails on platforms that don't have wget installed
[SPARK-22946] - Recursive withColumn calls cause "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
[SPARK-22948] - "SparkPodInitContainer" shouldn't be in "rest" package
[SPARK-22949] - Reduce memory requirement for TrainValidationSplit
[SPARK-22950] - user classpath first causes no class found exception
[SPARK-22951] - count() after dropDuplicates() on emptyDataFrame returns incorrect value
[SPARK-22953] - Duplicated secret volumes in Spark pods when init-containers are used
[SPARK-22956] - Union Stream Failover Cause `IllegalStateException`
[SPARK-22957] - ApproxQuantile breaks if the number of rows exceeds MaxInt
[SPARK-22961] - Constant columns no longer picked as constraints in 2.3
[SPARK-22962] - Kubernetes app fails if local files are used
[SPARK-22967] - VersionSuite failed on Windows caused by Windows format path
[SPARK-22972] - Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc.
[SPARK-22973] - Incorrect results of casting Map to String
[SPARK-22975] - MetricsReporter producing NullPointerException when there was no progress reported
[SPARK-22976] - Worker cleanup can remove running driver directories
[SPARK-22977] - DataFrameWriter operations do not show details in SQL tab
[SPARK-22981] - Incorrect results of casting Struct to String
[SPARK-22982] - Remove unsafe asynchronous close() call from FileDownloadChannel
[SPARK-22983] - Don't push filters beneath aggregates with empty grouping expressions
[SPARK-22984] - Fix incorrect bitmap copying and offset shifting in GenerateUnsafeRowJoiner
[SPARK-22985] - Fix argument escaping bug in from_utc_timestamp / to_utc_timestamp codegen
[SPARK-22986] - Avoid instantiating multiple instances of broadcast variables
[SPARK-22990] - Fix method isFairScheduler in JobsTab and StagesTab
[SPARK-22992] - Remove assumption of cluster domain in Kubernetes mode
[SPARK-22998] - Value for SPARK_MOUNTED_CLASSPATH in executor pods is not set
[SPARK-23000] - Flaky test suite DataSourceWithHiveMetastoreCatalogSuite in Spark 2.3
[SPARK-23001] - NullPointerException when running desc database
[SPARK-23009] - PySpark should not assume Pandas cols are a basestring type
[SPARK-23018] - PySpark createDataFrame causes Pandas warning of assignment to a copy of a reference
[SPARK-23019] - Flaky Test: org.apache.spark.JavaJdbcRDDSuite.testJavaJdbcRDD
[SPARK-23021] - AnalysisBarrier should not cut off the explain output for Parsed Logical Plan
[SPARK-23023] - Incorrect results of printing Array/Map/Struct in showString
[SPARK-23025] - DataSet with scala.Null causes Exception
[SPARK-23035] - Fix improper information of TempTableAlreadyExistsException
[SPARK-23037] - RFormula should not use deprecated OneHotEncoder and should include VectorSizeHint in pipeline
[SPARK-23038] - Update docker/spark-test (JDK/OS)
[SPARK-23049] - `spark.sql.files.ignoreCorruptFiles` should work for ORC files
[SPARK-23051] - job description in Spark UI is broken
[SPARK-23053] - taskBinarySerialization and task partitions calculate in DagScheduler.submitMissingTasks should keep the same RDD checkpoint status
[SPARK-23054] - Incorrect results of casting UserDefinedType to String
[SPARK-23055] - KafkaContinuousSourceSuite Kafka column types test failing
[SPARK-23065] - R API doc empty in Spark 2.3.0 RC1
[SPARK-23079] - Fix query constraints propagation with aliases
[SPARK-23080] - Improve error message for built-in functions
[SPARK-23087] - CheckCartesianProduct too restrictive when condition is constant folded to false/null
[SPARK-23089] - "Unable to create operation log session directory" when parent directory not present
[SPARK-23095] - Decorrelation of scalar subquery fails with java.util.NoSuchElementException.
[SPARK-23103] - LevelDB store not iterating correctly when indexed value has negative value
[SPARK-23119] - Fix API annotation in DataSource V2 for streaming
[SPARK-23121] - When a Spark Streaming app has been running for a period of time, the page is incorrectly reported when accessing '/jobs/' or '/jobs/job/?id=13' and the UI cannot be accessed.
[SPARK-23133] - Spark options are not passed to the Executor in Docker context
[SPARK-23135] - Accumulators don't show up properly in the Stages page anymore
[SPARK-23140] - DataSourceV2Strategy is missing in HiveSessionStateBuilder
[SPARK-23147] - Stage page will throw exception when there's no complete tasks
[SPARK-23148] - spark.read.csv with multiline=true gives FileNotFoundException if path contains spaces
[SPARK-23157] - withColumn fails for a column that is a result of mapped DataSet
[SPARK-23177] - PySpark parameter-less UDFs raise exception if applied after distinct
[SPARK-23184] - All jobs page is broken when some stage is missing
[SPARK-23186] - Initialize DriverManager first before loading Drivers
[SPARK-23192] - Hint is lost after using cached data
[SPARK-23198] - Fix KafkaContinuousSourceStressForDontFailOnDataLossSuite to test ContinuousExecution
[SPARK-23205] - ImageSchema.readImages incorrectly sets alpha channel to 255 for four-channel images
[SPARK-23207] - Shuffle+Repartition on a DataFrame could lead to incorrect answers
[SPARK-23208] - GenArrayData produces illegal code
[SPARK-23209] - HiveDelegationTokenProvider throws an exception if Hive jars are not on the classpath
[SPARK-23214] - cached data should not carry extra hint info
[SPARK-23220] - broadcast hint not applied in a streaming left anti join
[SPARK-23222] - Flaky test: DataFrameRangeSuite
[SPARK-23223] - Stacking dataset transforms performs poorly
[SPARK-23230] - When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error
[SPARK-23233] - asNondeterministic in Python UDF not being set when the UDF is called at least once
[SPARK-23242] - Don't run tests in KafkaSourceSuiteBase twice
[SPARK-23245] - KafkaContinuousSourceSuite may hang forever
[SPARK-23250] - Typo in JavaDoc/ScalaDoc for DataFrameWriter
[SPARK-23267] - Increase spark.sql.codegen.hugeMethodLimit to 65535
[SPARK-23274] - ReplaceExceptWithFilter fails on dataframes filtered on same column
[SPARK-23275] - hive/tests have been failing when run locally on the laptop (Mac) with OOM
[SPARK-23281] - Query produces results in incorrect order when a composite order by clause refers to both original columns and aliases
[SPARK-23289] - OneForOneBlockFetcher.DownloadCallback.onData may write just a part of data
[SPARK-23290] - inadvertent change in handling of DateType when converting to pandas dataframe
[SPARK-23293] - data source v2 self join fails
[SPARK-23301] - data source v2 column pruning with arbitrary expressions is broken
[SPARK-23307] - Spark UI should sort jobs/stages with the completed timestamp before cleaning them up
[SPARK-23310] - Perf regression introduced by SPARK-21113
[SPARK-23315] - failed to get output from canonicalized data source v2 related plans
[SPARK-23316] - AnalysisException after max iteration reached for IN query
[SPARK-23326] - "Scheduler Delay" of a task is confusing
[SPARK-23330] - Spark UI SQL executions page throws NPE
[SPARK-23345] - Flaky test: FileBasedDataSourceSuite
[SPARK-23348] - append data using saveAsTable should adjust the data types
[SPARK-23358] - When the number of partitions is greater than 2^28, it will result in an error result
[SPARK-23360] - SparkSession.createDataFrame timestamps can be incorrect with non-Arrow codepath
[SPARK-23376] - creating UnsafeKVExternalSorter with BytesToBytesMap may fail
[SPARK-23377] - Bucketizer with multiple columns persistence bug
[SPARK-23384] - When no incomplete (or completed) applications are found, the last updated time is not formatted and the client local time zone is not shown in the history server web UI.
[SPARK-23387] - Backport assertPandasEqual to branch-2.3.
[SPARK-23388] - Support for Parquet Binary DecimalType in VectorizedColumnReader
[SPARK-23391] - It may lead to overflow for some integer multiplication
[SPARK-23394] - Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)
[SPARK-23399] - Register a task completion listener first for OrcColumnarBatchReader
[SPARK-23400] - Add the extra constructors for ScalaUDF
[SPARK-23413] - Sorting tasks by Host / Executor ID on the Stage page does not work
[SPARK-23419] - data source v2 write path should re-throw interruption exceptions directly
[SPARK-23421] - Document the behavior change in SPARK-22356
[SPARK-23422] - YarnShuffleIntegrationSuite failure when SPARK_PREPEND_CLASSES set to 1
[SPARK-23468] - Failure to authenticate with old shuffle service
[SPARK-23470] - org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription is too slow
[SPARK-23475] - The "stages" page doesn't show any completed stages
[SPARK-23481] - The job page shows wrong stages when some of stages are evicted
[SPARK-23484] - Fix possible race condition in KafkaContinuousReader
[SPARK-24401] - Aggregate on Decimal Types does not work
[SPARK-25523] - Multi thread execute sparkSession.read().jdbc(url, table, properties) problem
[SPARK-27191] - union of dataframes depends on order of the columns in 2.4.0

New Feature

[SPARK-3181] - Add Robust Regression Algorithm with Huber Estimator
[SPARK-4131] - Support "Writing data into the filesystem from queries"
[SPARK-12139] - REGEX Column Specification for Hive Queries
[SPARK-14516] - Clustering evaluator
[SPARK-15689] - Data source API v2
[SPARK-15767] - Decision Tree Regression wrapper in SparkR
[SPARK-16026] - Cost-based Optimizer Framework
[SPARK-16060] - Vectorized ORC reader
[SPARK-16742] - Kerberos support for Spark on Mesos
[SPARK-17025] - Cannot persist PySpark ML Pipeline model that includes custom Transformer
[SPARK-18710] - Add offset to GeneralizedLinearRegression models
[SPARK-18791] - Stream-Stream Joins
[SPARK-19489] - Stable serialization format for external & native code integration
[SPARK-19507] - pyspark.sql.types._verify_type() exceptions too broad to debug collections or nested data
[SPARK-19606] - Support constraints in spark-dispatcher
[SPARK-20090] - Add StructType.fieldNames to Python API
[SPARK-20542] - Add an API into Bucketizer that can bin a lot of columns all at once
[SPARK-20601] - Python API Changes for Constrained Logistic Regression Params
[SPARK-20703] - Add an operator for writing data out
[SPARK-20812] - Add Mesos Secrets support to the spark dispatcher
[SPARK-20863] - Add metrics/instrumentation to LiveListenerBus
[SPARK-20892] - Add SQL trunc function to SparkR
[SPARK-20899] - PySpark supports stringIndexerOrderType in RFormula
[SPARK-20917] - SparkR supports string encoding consistent with R
[SPARK-20953] - Add hash map metrics to aggregate and join
[SPARK-20960] - make ColumnVector public
[SPARK-20979] - Add a rate source to generate values for tests and benchmark
[SPARK-21000] - Add Mesos labels support to the Spark Dispatcher
[SPARK-21027] - Parallel One vs. Rest Classifier
[SPARK-21043] - Add unionByName API to Dataset
[SPARK-21092] - Wire SQLConf in logical plan and expressions
[SPARK-21208] - Ability to "setLocalProperty" from sc, in sparkR
[SPARK-21221] - CrossValidator and TrainValidationSplit Persist Nested Estimators such as OneVsRest
[SPARK-21310] - Add offset to PySpark GLM
[SPARK-21421] - Add the query id as a local property to allow source and sink using it
[SPARK-21468] - FeatureHasher Python API
[SPARK-21499] - Support creating persistent function for Spark UDAF(UserDefinedAggregateFunction)
[SPARK-21519] - Add an option to the JDBC data source to initialize the environment of the remote database session
[SPARK-21542] - Helper functions for custom Python Persistence
[SPARK-21633] - Unary Transformer in Python
[SPARK-21726] - Check for structural integrity of the plan in QO in test mode
[SPARK-21777] - Simpler Dataset.sample API
[SPARK-21840] - Allow multiple SparkSubmit invocations in same JVM without polluting system properties
[SPARK-21842] - Support Kerberos ticket renewal and creation in Mesos
[SPARK-21854] - Python interface for MLOR summary
[SPARK-21856] - Update Python API for MultilayerPerceptronClassifierModel
[SPARK-21911] - Parallel Model Evaluation for ML Tuning: PySpark
[SPARK-22131] - Add Mesos Secrets Support to the Mesos Driver
[SPARK-22160] - Allow changing sample points per partition in range shuffle exchange
[SPARK-22181] - ReplaceExceptWithFilter if one or both of the datasets are fully derived out of Filters from a same parent
[SPARK-22456] - Add new function dayofweek
[SPARK-22521] - VectorIndexerModel support handle unseen categories via handleInvalid: Python API
[SPARK-22734] - VectorSizeHint Python API
[SPARK-22781] - Support creating streaming dataset with ORC files
[SPARK-23008] - OneHotEncoderEstimator Python API

Improvement

[SPARK-7481] - Add spark-hadoop-cloud module to pull in object store support
[SPARK-9221] - Support IntervalType in Range Frame
[SPARK-10216] - Avoid creating empty files during overwrite into Hive table with group by query
[SPARK-10655] - Enhance DB2 dialect to handle XML, DECIMAL, and DECFLOAT
[SPARK-10931] - PySpark ML Models should contain Param values
[SPARK-11574] - Spark should support StatsD sink out of box
[SPARK-12664] - Expose probability, rawPrediction in MultilayerPerceptronClassificationModel
[SPARK-13030] - Change OneHotEncoder to Estimator
[SPARK-13041] - Add a driver history ui link and a mesos sandbox link on the dispatcher's ui page for each driver
[SPARK-13656] - Delete spark.sql.parquet.cacheMetadata
[SPARK-13846] - VectorIndexer output on unknown feature should be more descriptive
[SPARK-13947] - The error message from using an invalid table reference is not clear
[SPARK-14371] - OnlineLDAOptimizer should not collect stats for each doc in mini-batch to driver
[SPARK-14659] - OneHotEncoder support drop first category alphabetically in the encoded vector
[SPARK-14932] - Allow DataFrame.replace() to replace values with None
[SPARK-15648] - add TeradataDialect
[SPARK-16019] - Eliminate unexpected delay during spark on yarn job launch
[SPARK-16496] - Add wholetext as option for reading text in SQL.
[SPARK-16931] - PySpark access to data-frame bucketing api
[SPARK-16957] - Use weighted midpoints for split values.
[SPARK-17006] - WithColumn Performance Degrades with Number of Invocations
[SPARK-17310] - Disable Parquet's record-by-record filter in normal parquet reader and do it in Spark-side
[SPARK-17414] - Set type is not supported for creating data frames
[SPARK-17701] - Refactor DataSourceScanExec so its sameResult call does not compare strings
[SPARK-17924] - Consolidate streaming and batch write path
[SPARK-18136] - Make PySpark pip install work on Windows
[SPARK-18540] - Wholestage code-gen for ORC Hive tables
[SPARK-18619] - Make QuantileDiscretizer/Bucketizer/StringIndexer inherit from HasHandleInvalid
[SPARK-18623] - Add `returnNullable` to `StaticInvoke` and modify it to handle properly.
[SPARK-18838] - High latency of event processing for large jobs
[SPARK-18891] - Support for specific collection types
[SPARK-19112] - add codec for ZStandard
[SPARK-19159] - PySpark UDF API improvements
[SPARK-19236] - Add createOrReplaceGlobalTempView
[SPARK-19270] - Add summary table to GLM summary
[SPARK-19285] - Java - Provide user-defined function of 0 arguments (UDF0)
[SPARK-19358] - LiveListenerBus shall log the event name when dropping them due to a fully filled queue
[SPARK-19439] - PySpark's registerJavaFunction Should Support UDAFs
[SPARK-19552] - Upgrade Netty version to 4.1.x final
[SPARK-19558] - Provide a config option to attach QueryExecutionListener to SparkSession
[SPARK-19732] - DataFrame.fillna() does not work for bools in PySpark
[SPARK-19759] - ALSModel.predict on Dataframes : potential optimization by not using blas
[SPARK-19852] - StringIndexer.setHandleInvalid should have another option 'new': Python API and docs
[SPARK-19866] - Add local version of Word2Vec findSynonyms for spark.ml: Python API
[SPARK-19878] - Add hive configuration when initialize hive serde in InsertIntoHiveTable.scala
[SPARK-19937] - Collect metrics of block sizes when shuffling.
[SPARK-19951] - Add string concatenate operator || to Spark SQL
[SPARK-19975] - Add map_keys and map_values functions to Python
[SPARK-20014] - Optimize mergeSpillsWithFileStream method
[SPARK-20055] - Documentation for CSV datasets in SQL programming guide
[SPARK-20073] - Unexpected Cartesian product when using eqNullSafe in join with a derived table
[SPARK-20101] - Use OffHeapColumnVector when "spark.sql.columnVector.offheap.enable" is set to "true"
[SPARK-20109] - Need a way to convert from IndexedRowMatrix to Dense Block Matrices
[SPARK-20199] - GradientBoostedTreesModel doesn't have featureSubsetStrategy parameter
[SPARK-20236] - Overwrite a partitioned data source table should only overwrite related partitions
[SPARK-20290] - PySpark Column should provide eqNullSafe
[SPARK-20307] - SparkR: pass on setHandleInvalid to spark.mllib functions that use StringIndexer
[SPARK-20331] - Broaden support for Hive partition pruning predicate pushdown
[SPARK-20350] - Apply Complementation Laws during boolean expression simplification
[SPARK-20355] - Display Spark version on history page
[SPARK-20371] - R wrappers for collect_list and collect_set
[SPARK-20375] - R wrappers for array and map
[SPARK-20376] - Make StateStoreProvider pluggable
[SPARK-20379] - Allow setting SSL-related passwords through env variables
[SPARK-20383] - Spark SQL does not support creating functions with the keywords 'OR REPLACE' and 'IF NOT EXISTS'
[SPARK-20392] - Slow performance when calling fit on ML pipeline for dataset with many columns but few rows
[SPARK-20416] - Column names inconsistent for UDFs in SQL vs Dataset
[SPARK-20425] - Support an extended display mode to print a column data per line
[SPARK-20431] - Support a DDL-formatted string in DataFrameReader.schema
[SPARK-20433] - Update jackson-databind to 2.6.7.1
[SPARK-20437] - R wrappers for rollup and cube
[SPARK-20438] - R wrappers for split and repeat
[SPARK-20460] - Make it more consistent to handle column name duplication
[SPARK-20463] - Add support for IS [NOT] DISTINCT FROM to SPARK SQL
[SPARK-20484] - Add documentation to ALS code
[SPARK-20490] - Add eqNullSafe, not and ! to SparkR
[SPARK-20493] - De-duplicate parsing logic for DDL-like type string in R
[SPARK-20495] - Add StorageLevel to cacheTable API
[SPARK-20498] - RandomForestRegressionModel should expose getMaxDepth in PySpark
[SPARK-20519] - When the input parameter is null, a runtime exception may occur
[SPARK-20532] - SparkR should provide grouping and grouping_id
[SPARK-20533] - SparkR Wrappers Model should be private and value should be lazy
[SPARK-20535] - R wrappers for explode_outer and posexplode_outer
[SPARK-20544] - R wrapper for input_file_name
[SPARK-20550] - R wrappers for Dataset.alias
[SPARK-20557] - JdbcUtils doesn't support java.sql.Types.TIMESTAMP_WITH_TIMEZONE
[SPARK-20566] - ColumnVector should support `appendFloats` for array
[SPARK-20599] - ConsoleSink should work with write (batch)
[SPARK-20614] - Use the same log4j configuration with Jenkins in AppVeyor
[SPARK-20619] - StringIndexer supports multiple ways of label ordering
[SPARK-20639] - Add single argument support for to_timestamp in SQL
[SPARK-20668] - Modify ScalaUDF to handle nullability.
[SPARK-20670] - Simplify FPGrowth transform
[SPARK-20679] - Let ML ALS recommend for a subset of users/items
[SPARK-20682] - Add new ORCFileFormat based on Apache ORC
[SPARK-20715] - MapStatuses shouldn't be redundantly stored in both ShuffleMapStage and MapOutputTracker
[SPARK-20720] - 'Executor Summary' should show the exact number, 'Removed Executors' should display the specific number, in the Application Page
[SPARK-20726] - R wrapper for SQL broadcast
[SPARK-20728] - Make ORCFileFormat configurable between sql/hive and sql/core
[SPARK-20730] - Add a new Optimizer rule to combine nested Concats
[SPARK-20736] - PySpark StringIndexer supports StringOrderType
[SPARK-20775] - from_json should also have an API where the schema is specified with a string
[SPARK-20779] - The ASF header placed in an incorrect location in some files
[SPARK-20785] - Spark should provide jump links and add (count) in the SQL web ui.
[SPARK-20806] - Launcher: redundant check for Spark lib dir
[SPARK-20830] - PySpark wrappers for explode_outer and posexplode_outer
[SPARK-20835] - It should exit directly when the --total-executor-cores parameter is set to less than 0 when submitting an application
[SPARK-20841] - Support table column aliases in FROM clause
[SPARK-20842] - Upgrade to 1.2.2 for Hive Metastore Client 1.2
[SPARK-20849] - Document R DecisionTree
[SPARK-20861] - Pyspark CrossValidator & TrainValidationSplit should delegate parameter looping to estimators
[SPARK-20871] - Only log Janino code in debug mode
[SPARK-20875] - Spark should print the log when the directory has been deleted
[SPARK-20883] - Improve StateStore APIs for efficiency
[SPARK-20886] - HadoopMapReduceCommitProtocol to fail with message if FileOutputCommitter.getWorkPath==null
[SPARK-20887] - support alternative keys in ConfigBuilder
[SPARK-20894] - Error while checkpointing to HDFS
[SPARK-20930] - Destroy broadcasted centers after computing cost
[SPARK-20936] - Lack of an important case about the test of resolveURI
[SPARK-20946] - Do not update conf for existing SparkContext in SparkSession.getOrCreate
[SPARK-20950] - add a new config to diskWriteBufferSize which is hard coded before
[SPARK-20966] - Table data is not sorted by startTime time desc, time is not formatted and redundant code in JDBC/ODBC Server page.
[SPARK-20972] - rename HintInfo.isBroadcastable to broadcast
[SPARK-20981] - Add --repositories equivalent configuration for Spark
[SPARK-20985] - Improve KryoSerializerResizableOutputSuite
[SPARK-20994] - Alleviate memory pressure in StreamManager
[SPARK-20995] - 'Spark-env.sh.template' should add 'YARN_CONF_DIR' configuration instructions.
[SPARK-21012] - Support glob path for resources adding to Spark
[SPARK-21039] - Use treeAggregate instead of aggregate in DataFrame.stat.bloomFilter
[SPARK-21060] - CSS style of the paging function is wrong in the executor page.
[SPARK-21070] - Pick up cloudpickle upgrades from cloudpickle python module
[SPARK-21091] - Move constraint code into QueryPlanConstraints
[SPARK-21100] - Add summary method as alternative to describe that gives quartiles similar to Pandas
[SPARK-21103] - QueryPlanConstraints should be part of LogicalPlan
[SPARK-21110] - Structs should be usable in inequality filters
[SPARK-21113] - Support for read ahead input stream to amortize disk IO cost in the Spill reader
[SPARK-21115] - If the cores left are fewer than coresPerExecutor, the cores left will not be allocated, so it should not be checked in every schedule
[SPARK-21125] - PySpark context missing function to set Job Description.
[SPARK-21135] - On history server page, duration of incomplete applications should be hidden instead of showing up as 0
[SPARK-21137] - Spark reads many small files slowly off local filesystem
[SPARK-21142] - spark-streaming-kafka-0-10 has too fat dependency on kafka
[SPARK-21146] - Master/Worker should handle and shutdown when any thread gets UncaughtException
[SPARK-21149] - Add job description API for R
[SPARK-21153] - Time windowing for tumbling windows can use a project instead of expand + filter
[SPARK-21155] - Add (? running tasks) into Spark UI progress
[SPARK-21164] - Remove isTableSample from Sample and isGenerated from Alias and AttributeReference
[SPARK-21174] - Validate sampling fraction in logical operator level
[SPARK-21175] - shuffle service should reject fetch requests if there are already many requests in progress
[SPARK-21189] - Handle unknown error codes in Jenkins rather than leaving incomplete comments in PRs
[SPARK-21192] - Preserve State Store provider class configuration across StreamingQuery restarts
[SPARK-21193] - Specify Pandas version in setup.py
[SPARK-21196] - Split codegen info of query plan into sequence
[SPARK-21217] - Support ColumnVector.Array.to<type>Array()
[SPARK-21222] - Move elimination of Distinct clause from analyzer to optimizer
[SPARK-21229] - remove QueryPlan.preCanonicalized
[SPARK-21238] - allow nested SQL execution
[SPARK-21240] - Fix code style for constructing and stopping a SparkContext in UT
[SPARK-21243] - Limit the number of maps in a single shuffle fetch
[SPARK-21247] - Type comparison should respect case-sensitive SQL conf
[SPARK-21250] - Add a url in the table of 'Running Executors' in worker page to visit job page
[SPARK-21256] - Add WithSQLConf to Catalyst Test
[SPARK-21260] - Remove the unused OutputFakerExec
[SPARK-21266] - Support schema as a DDL-formatted string in dapply/gapply/from_json
[SPARK-21267] - Improvements to the Structured Streaming programming guide
[SPARK-21268] - Move center calculations to a distributed map in KMeans
[SPARK-21273] - Decouple stats propagation from logical plan
[SPARK-21275] - Update GLM test to use supportedFamilyNames
[SPARK-21276] - Update lz4-java to remove custom LZ4BlockInputStream
[SPARK-21285] - VectorAssembler should report the column name when data type used is not supported
[SPARK-21295] - Confusing error message for missing references
[SPARK-21296] - Avoid per-record type dispatch in PySpark createDataFrame schema verification
[SPARK-21297] - Add count in 'JDBC/ODBC Server' page.
[SPARK-21304] - remove unnecessary isNull variable for collection related encoder expressions
[SPARK-21305] - The BKM (best known methods) of using native BLAS to improve ML/MLLIB performance
[SPARK-21308] - Remove SQLConf parameters from the optimizer
[SPARK-21313] - ConsoleSink's string representation
[SPARK-21315] - Skip some spill files when generateIterator(startIndex) in ExternalAppendOnlyUnsafeRowArray.
[SPARK-21321] - Spark very verbose on shutdown confusing users
[SPARK-21323] - Rename sql.catalyst.plans.logical.statsEstimation.Range to ValueInterval
[SPARK-21326] - Use TextFileFormat in implementation of LibSVMFileFormat
[SPARK-21329] - Make EventTimeWatermarkExec explicitly UnaryExecNode
[SPARK-21358] - Argument of repartitionAndSortWithinPartitions in PySpark
[SPARK-21365] - Deduplicate logics parsing DDL-like type definition
[SPARK-21373] - Update Jetty to 9.3.20.v20170531
[SPARK-21381] - SparkR: pass on setHandleInvalid for classification algorithms
[SPARK-21382] - The note about Scala 2.10 in building-spark.md is wrong.
[SPARK-21388] - GBT inherit from HasStepSize & LinearSVC/Binarizer from HasThreshold
[SPARK-21396] - Spark Hive Thriftserver doesn't return UDT field
[SPARK-21401] - add poll function for BoundedPriorityQueue
[SPARK-21408] - Default RPC dispatcher thread pool size too large for small executors
[SPARK-21409] - Expose state store memory usage in SQL metrics and progress updates
[SPARK-21410] - In RangePartitioner(partitions: Int, rdd: RDD[]), RangePartitioner.numPartitions is wrong if the number of elements in RDD (rdd.count()) is less than number of partitions (partitions in constructor).
[SPARK-21415] - Triage scapegoat warnings, part 1
[SPARK-21434] - Add PySpark pip documentation
[SPARK-21435] - Empty files should be skipped while writing to file
[SPARK-21472] - Introduce ArrowColumnVector as a reader for Arrow vectors.
[SPARK-21475] - Change to use NIO's Files API for external shuffle service
[SPARK-21477] - Mark LocalTableScanExec's input data transient
[SPARK-21491] - Performance enhancement: eliminate creation of intermediate collections
[SPARK-21504] - Add spark version info in table metadata
[SPARK-21506] - The description of "spark.executor.cores" may be not correct
[SPARK-21513] - SQL to_json should support all column types
[SPARK-21517] - Fetch local data via block manager cause oom
[SPARK-21524] - ValidatorParamsSuiteHelpers generates wrong temp files
[SPARK-21527] - Use buffer limit in order to take advantage of JAVA NIO Util's buffercache
[SPARK-21530] - Update description of spark.shuffle.maxChunksBeingTransferred
[SPARK-21538] - Attribute resolution inconsistency in Dataset API
[SPARK-21544] - Test jar of some module should not install or deploy twice
[SPARK-21553] - Add the description of the default value of master parameter in the spark-shell
[SPARK-21566] - Python method for summary
[SPARK-21575] - Eliminate needless synchronization in java-R serialization
[SPARK-21578] - Add JavaSparkContextSuite
[SPARK-21583] - Create a ColumnarBatch with ArrowColumnVectors for row based iteration
[SPARK-21584] - Update R method for summary to call new implementation
[SPARK-21589] - Add documents about unsupported functions in Hive UDF/UDTF/UDAF
[SPARK-21592] - Skip maven-compiler-plugin main and test compilations in Maven build
[SPARK-21602] - Add map_keys and map_values functions to R
[SPARK-21603] - Whole-stage codegen will be much slower than with whole-stage codegen disabled when the function is too long
[SPARK-21604] - If the object extends Logging, I suggest removing the var LOG, which is useless.
[SPARK-21608] - Window rangeBetween() API should allow literal boundary
[SPARK-21611] - Error class name for log in several classes.
[SPARK-21619] - Fail the execution of canonicalized plans explicitly
[SPARK-21622] - Support Offset in SparkR
[SPARK-21623] - Comments of parentStats on ml/tree/impl/DTStatsAggregator.scala is wrong
[SPARK-21634] - Change OneRowRelation from a case object to case class
[SPARK-21640] - Method mode with String parameters within DataFrameWriter is error prone
[SPARK-21661] - SparkSQL can't merge load table from Hadoop
[SPARK-21665] - Need to close resources after use
[SPARK-21667] - ConsoleSink should not fail streaming query with checkpointLocation option
[SPARK-21669] - Internal API for collecting metrics/stats during FileFormatWriter jobs
[SPARK-21672] - Remove SHS-specific application / attempt data structures
[SPARK-21675] - Add a navigation bar at the bottom of the Details for Stage Page
[SPARK-21680] - ML/MLLIB Vector compressed optimization
[SPARK-21694] - Support Mesos CNI network labels
[SPARK-21701] - Add TCP send/rcv buffer size support for RPC client
[SPARK-21709] - use sbt 0.13.16 and update sbt plugins
[SPARK-21717] - Decouple the generated codes of consuming rows in operators under whole-stage codegen
[SPARK-21718] - Heavy log of type: "Skipping partition based on stats ..."
[SPARK-21728] - Allow SparkSubmit to use logging
[SPARK-21732] - Lazily init hive metastore client
[SPARK-21745] - Refactor ColumnVector hierarchy to make ColumnVector read-only and to introduce WritableColumnVector.
[SPARK-21751] - CodeGenerator.splitExpressions counts code size more precisely
[SPARK-21756] - Add JSON option to allow unquoted control characters
[SPARK-21765] - Ensure all leaf nodes that are derived from streaming sources have isStreaming=true
[SPARK-21769] - Add a table option for Hive-serde tables to make Spark always respect schemas inferred by Spark SQL
[SPARK-21770] - ProbabilisticClassificationModel: Improve normalization of all-zero raw predictions
[SPARK-21771] - SparkSQLEnv creates a useless meta hive client
[SPARK-21773] - Should Install mkdocs if missing in the path in SQL documentation build
[SPARK-21781] - Modify DataSourceScanExec to use concrete ColumnVector type.
[SPARK-21787] - Support for pushing down filters for DateType in native OrcFileFormat
[SPARK-21789] - Remove obsolete codes for parsing abstract schema strings
[SPARK-21803] - Remove the HiveDDLCommandSuite
[SPARK-21806] - BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading
[SPARK-21807] - The getAliasedConstraints function in LogicalPlan will take a long time when number of expressions is greater than 100
[SPARK-21813] - [core] Modify TaskMemoryManager.MAXIMUM_PAGE_SIZE_BYTES comments
[SPARK-21839] - Support SQL config for ORC compression
[SPARK-21862] - Add overflow check in PCA
[SPARK-21865] - simplify the distribution semantic of Spark SQL
[SPARK-21866] - SPIP: Image support in Spark
[SPARK-21871] - Check actual bytecode size when compiling generated code
[SPARK-21873] - CachedKafkaConsumer throws NonLocalReturnControl during fetching from Kafka
[SPARK-21875] - Jenkins passes Java code that violates ./dev/lint-java
[SPARK-21878] - Create SQLMetricsTestUtils
[SPARK-21886] - Use SparkSession.internalCreateDataFrame to create Dataset with LogicalRDD logical operator
[SPARK-21891] - Add TBLPROPERTIES to DDL statement: CREATE TABLE USING
[SPARK-21897] - Add unionByName API to DataFrame in Python and R
[SPARK-21901] - Define toString for StateOperatorProgress
[SPARK-21902] - BlockManager.doPut will hide actually exception when exception thrown in finally block
[SPARK-21903] - Upgrade scalastyle to 1.0.0
[SPARK-21923] - Avoid calling reserveUnrollMemoryForThisTask for every record
[SPARK-21963] - Created temp files should be deleted after use
[SPARK-21967] - org.apache.spark.unsafe.types.UTF8String#compareTo Should Compare 8 Bytes at a Time for Better Performance
[SPARK-21970] - Do a Project Wide Sweep for Redundant Throws Declarations
[SPARK-21973] - Add a new option to filter queries to run in TPCDSQueryBenchmark
[SPARK-21975] - Histogram support in cost-based optimizer
[SPARK-21981] - Python API for ClusteringEvaluator
[SPARK-21983] - Fix ANTLR 4.7 deprecations
[SPARK-21988] - Add default stats to StreamingRelation and StreamingExecutionRelation
[SPARK-22001] - ImputerModel can do withColumn for all input columns at one pass
[SPARK-22002] - Reading a JDBC table with a custom schema should support specifying partial fields
[SPARK-22003] - vectorized reader does not work with UDF when the column is array
[SPARK-22009] - Use treeAggregate to improve some algorithms
[SPARK-22043] - Python profile, show_profiles() and dump_profiles(), should throw an error with a better message
[SPARK-22049] - Confusing behavior of from_utc_timestamp and to_utc_timestamp
[SPARK-22050] - Allow BlockUpdated events to be optionally logged to the event log
[SPARK-22058] - the BufferedInputStream will not be closed if an exception occurs
[SPARK-22066] - Update checkstyle to 8.2, enable it, fix violations
[SPARK-22072] - Allow the same shell params to be used for all of the different steps in release-build
[SPARK-22075] - GBTs forgot to unpersist datasets cached by Checkpointer
[SPARK-22099] - The 'job ids' list style needs to be changed in the SQL page.
[SPARK-22103] - Move HashAggregateExec parent consume to a separate function in codegen
[SPARK-22106] - Remove support for 0-parameter pandas_udfs
[SPARK-22112] - Add missing method to pyspark api: spark.read.csv(Dataset<String>)
[SPARK-22120] - TestHiveSparkSession.reset() should clean out Hive warehouse directory
[SPARK-22122] - Respect WITH clauses to count input rows in TPCDSQueryBenchmark
[SPARK-22123] - Add latest failure reason for task set blacklist
[SPARK-22124] - Sample and Limit should also defer input evaluation under codegen
[SPARK-22125] - Enable Arrow Stream format for vectorized UDF.
[SPARK-22130] - UTF8String.trim() inefficiently scans all white-space string twice.
[SPARK-22133] - Document Mesos reject offer duration configurations
[SPARK-22138] - Allow retry during release-build
[SPARK-22142] - Move Flume support behind a profile
[SPARK-22147] - BlockId.hashCode allocates a StringBuilder/String on each call
[SPARK-22156] - Word2Vec: incorrect learning rate update equation when numIterations > 1
[SPARK-22170] - Broadcast join holds an extra copy of rows in driver memory
[SPARK-22173] - Table CSS style needs to be adjusted in History Page and in Executors Page.
[SPARK-22188] - Add defense against Cross-Site Scripting, MIME-sniffing and MitM attack
[SPARK-22190] - Add Spark executor task metrics to Dropwizard metrics
[SPARK-22193] - SortMergeJoinExec: typo correction
[SPARK-22203] - Add job description for file listing Spark jobs
[SPARK-22208] - Improve percentile_approx by not rounding up targetError and starting from index 0
[SPARK-22214] - Refactor the list hive partitions code
[SPARK-22217] - ParquetFileFormat to support arbitrary OutputCommitters
[SPARK-22233] - filter out empty InputSplit in HadoopRDD
[SPARK-22247] - Hive partition filter very slow
[SPARK-22263] - Refactor deterministic as lazy value
[SPARK-22266] - The same aggregate function was evaluated multiple times
[SPARK-22268] - Fix java style errors
[SPARK-22282] - Rename OrcRelation to OrcFileFormat and remove ORC_COMPRESSION
[SPARK-22294] - Reset spark.driver.bindAddress when starting a Checkpoint
[SPARK-22301] - Add rule to Optimizer for In with empty list of values
[SPARK-22302] - Remove manual backports for subprocess.check_output and check_call
[SPARK-22308] - Support unit tests of spark code using ScalaTest using suites other than FunSuite
[SPARK-22313] - Mark/print deprecation warnings as DeprecationWarning for deprecated APIs
[SPARK-22315] - Check for version match between R package and JVM
[SPARK-22346] - Update VectorAssembler to work with Structured Streaming
[SPARK-22348] - The table cache providing ColumnarBatch should also do partition batch pruning
[SPARK-22366] - Support ignoreMissingFiles flag parallel to ignoreCorruptFiles
[SPARK-22372] - Make YARN client extend SparkApplication
[SPARK-22378] - Redundant nullcheck is generated for extracting value in complex types
[SPARK-22379] - Reduce duplication setUpClass and tearDownClass in PySpark SQL tests
[SPARK-22385] - MapObjects should not access list element by index
[SPARK-22397] - Add multiple column support to QuantileDiscretizer
[SPARK-22405] - Enrich the event information and add new event of ExternalCatalogEvent
[SPARK-22407] - Add rdd id column on storage page to speed up navigating
[SPARK-22408] - RelationalGroupedDataset's distinct pivot value calculation launches unnecessary stages
[SPARK-22422] - Add Adjusted R2 to RegressionMetrics
[SPARK-22445] - move CodegenContext.copyResult to CodegenSupport
[SPARK-22450] - Safely register class for mllib
[SPARK-22476] - Add new function dayofweek in R
[SPARK-22496] - beeline display operation log
[SPARK-22519] - Remove unnecessary stagingDirPath null check in ApplicationMaster.cleanupStagingDir()
[SPARK-22520] - Support code generation also for complex CASE WHEN
[SPARK-22537] - Aggregation of map output statistics on driver faces single point bottleneck
[SPARK-22554] - Add a config to control if PySpark should use daemon or not
[SPARK-22566] - Better error message for `_merge_type` in Pandas to Spark DF conversion
[SPARK-22569] - Clean up caller of splitExpressions and addMutableState
[SPARK-22592] - cleanup filter converting for hive
[SPARK-22596] - set ctx.currentVars in CodegenSupport.consume
[SPARK-22597] - Add spark-sql script for Windows users
[SPARK-22608] - Avoid code duplication regarding CodeGeneration.splitExpressions()
[SPARK-22614] - Expose range partitioning shuffle
[SPARK-22617] - make splitExpressions extract current input of the context
[SPARK-22638] - Use a separate query for StreamingQueryListenerBus
[SPARK-22649] - localCheckpoint support in Dataset API
[SPARK-22660] - Use position() and limit() to fix ambiguity issue in scala-2.12
[SPARK-22665] - Dataset API: .repartition() inconsistency / issue
[SPARK-22667] - Fix model-specific optimization support for ML tuning: Python API
[SPARK-22673] - InMemoryRelation should utilize on-disk table stats whenever possible
[SPARK-22675] - Refactoring PropagateTypes in TypeCoercion
[SPARK-22677] - cleanup whole stage codegen for hash aggregate
[SPARK-22682] - HashExpression does not need to create global variables
[SPARK-22688] - Upgrade Janino version to 3.0.8
[SPARK-22690] - Imputer inherit HasOutputCols
[SPARK-22692] - Reduce the number of generated mutable states
[SPARK-22701] - add ctx.splitExpressionsWithCurrentInputs
[SPARK-22704] - Reduce # of mutable variables in Least and greatest
[SPARK-22705] - Reduce # of mutable variables in Case, Coalesce, and In
[SPARK-22707] - Optimize CrossValidator memory occupation by models in fitting
[SPARK-22719] - refactor ConstantPropagation
[SPARK-22729] - Add getTruncateQuery to JdbcDialect
[SPARK-22753] - Get rid of dataSource.writeAndRead
[SPARK-22754] - Check spark.executor.heartbeatInterval setting in case of ExecutorLost
[SPARK-22763] - SHS: Ignore unknown events and parse through the file
[SPARK-22767] - use ctx.addReferenceObj in InSet and ScalaUDF
[SPARK-22771] - SQL concat for binary
[SPARK-22774] - Add compilation check for generated code in TPCDSQuerySuite
[SPARK-22786] - only use AppStatusPlugin in history server
[SPARK-22790] - add a configurable factor to describe HadoopFsRelation's size
[SPARK-22799] - Bucketizer should throw exception if single- and multi-column params are both set
[SPARK-22801] - Allow FeatureHasher to specify numeric columns to treat as categorical
[SPARK-22810] - PySpark supports LinearRegression with huber loss
[SPARK-22830] - Scala Coding style has been improved in Spark Examples
[SPARK-22832] - BisectingKMeans unpersist unused datasets
[SPARK-22833] - [Examples] Improvements made at SparkHive Example with Scala
[SPARK-22844] - R date_trunc API
[SPARK-22847] - Remove the duplicate code in AppStatusListener while assigning schedulingPool for stage
[SPARK-22870] - Dynamic allocation should allow 0 idle time
[SPARK-22874] - Modify checking pandas version to use LooseVersion.
[SPARK-22893] - Unified the data type mismatch message
[SPARK-22894] - DateTimeOperations should accept SQL like string type
[SPARK-22895] - Push down the deterministic predicates that are after the first non-deterministic
[SPARK-22896] - Improvement in String interpolation
[SPARK-22897] - Expose stageAttemptId in TaskContext
[SPARK-22914] - Subbing for spark.history.ui.port does not resolve by default
[SPARK-22919] - Bump Apache httpclient versions
[SPARK-22921] - Merge script should prompt for assigning jiras
[SPARK-22922] - Python API for fitMultiple
[SPARK-22937] - SQL elt for binary inputs
[SPARK-22939] - Support Spark UDF in registerFunction
[SPARK-22944] - improve FoldablePropagation
[SPARK-22945] - add java UDF APIs in the functions object
[SPARK-22952] - Deprecate stageAttemptId in favour of stageAttemptNumber
[SPARK-22960] - Make build-push-docker-images.sh more dev-friendly
[SPARK-22979] - Avoid per-record type dispatch in Python data conversion (EvaluatePython.fromJava)
[SPARK-22994] - Require a single container image for Spark-on-K8S
[SPARK-22997] - Add additional defenses against use of freed MemoryBlocks
[SPARK-22999] - 'show databases like command' can remove the like keyword
[SPARK-23005] - Improve RDD.take on small number of partitions
[SPARK-23029] - Doc spark.shuffle.file.buffer units are kb when no units specified
[SPARK-23032] - Add a per-query codegenStageId to WholeStageCodegenExec
[SPARK-23036] - Add withGlobalTempView for testing
[SPARK-23062] - EXCEPT documentation should make it clear that it's EXCEPT DISTINCT
[SPARK-23081] - Add colRegex API to PySpark
[SPARK-23090] - polish ColumnVector
[SPARK-23091] - Incorrect unit test for approxQuantile
[SPARK-23122] - Deprecate register* for UDFs in SQLContext and Catalog in PySpark
[SPARK-23129] - Lazy init DiskMapIterator#deserializeStream to reduce memory usage when ExternalAppendOnlyMap spill too many times
[SPARK-23141] - Support data type string as a returnType for registerJavaFunction.
[SPARK-23142] - Add documentation for Continuous Processing
[SPARK-23143] - Add Python support for continuous trigger
[SPARK-23144] - Add console sink for continuous queries
[SPARK-23149] - polish ColumnarBatch
[SPARK-23170] - Dump the statistics of effective runs of analyzer and optimizer rules
[SPARK-23199] - Remove repetition from group expressions in Aggregate
[SPARK-23238] - Externalize SQLConf spark.sql.execution.arrow.enabled
[SPARK-23248] - Relocate module docstrings to the top in PySpark examples
[SPARK-23249] - Improve partition bin-filling algorithm to have less skew and fewer partitions
[SPARK-23276] - Enable UDT tests in (Hive)OrcHadoopFsRelationSuite
[SPARK-23279] - Avoid triggering distributed job for Console sink
[SPARK-23284] - Document the behavior of several ColumnVector get APIs when accessing a null slot
[SPARK-23296] - Diagnostics message for user code exceptions should include the stacktrace
[SPARK-23305] - Test `spark.sql.files.ignoreMissingFiles` for all file-based data sources
[SPARK-23312] - add a config to turn off vectorized cache reader
[SPARK-23317] - rename ContinuousReader.setOffset to setStartOffset
[SPARK-23454] - Add Trigger information to the Structured Streaming programming guide
[SPARK-23617] - Register a Function without params with Spark SQL Java API
[SPARK-23993] - Support DESC FORMATTED table_name column_name
[SPARK-24328] - Fix scala.MatchError in literals.sql.out
[SPARK-26542] - Support the coordinator to determine post-shuffle partitions more reasonably

Test

[SPARK-19662] - Add Fair Scheduler Unit Test coverage for different build cases
[SPARK-20518] - Supplement the new blockidsuite unit tests
[SPARK-20571] - Flaky SparkR StructuredStreaming tests
[SPARK-20607] - Add new unit tests to ShuffleSuite
[SPARK-20957] - Flaky Test: o.a.s.sql.streaming.StreamingQueryManagerSuite listing
[SPARK-21006] - Create rpcEnv and run later needs shutdown and awaitTermination
[SPARK-21128] - Running R tests multiple times failed due to pre-existing "spark-warehouse" / "metastore_db"
[SPARK-21286] - [spark core UT] Modify an error for unit test
[SPARK-21370] - Avoid doing anything on HDFSBackedStateStore.abort() when there are no updates to commit
[SPARK-21464] - Minimize deprecation warnings caused by ProcessingTime class
[SPARK-21573] - Tests failing with run-tests.py SyntaxError occasionally in Jenkins
[SPARK-21663] - MapOutputTrackerSuite case test("remote fetch below max RPC message size") should call stop
[SPARK-21693] - AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests
[SPARK-21729] - Generic test for ProbabilisticClassifier to ensure consistent output columns
[SPARK-21764] - Tests failures on Windows: resources not being closed and incorrect paths
[SPARK-21843] - testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite
[SPARK-21936] - backward compatibility test framework for HiveExternalCatalog
[SPARK-21949] - Tables created in unit tests should be dropped after use
[SPARK-21982] - Set Locale to US in order to pass UtilsSuite when your jvm Locale is not US
[SPARK-22140] - Add a test suite for TPCDS queries
[SPARK-22161] - Add Impala-modified TPC-DS queries
[SPARK-22418] - Add test cases for NULL Handling
[SPARK-22423] - Scala test source files like TestHiveSingleton.scala should be in scala source root
[SPARK-22595] - flaky test: CastSuite.SPARK-22500: cast for struct should not generate codes beyond 64KB
[SPARK-22644] - Make ML testsuite support StructuredStreaming test
[SPARK-22787] - Add a TPCH query suite
[SPARK-22800] - Add a SSB query suite
[SPARK-22881] - ML test for StructuredStreaming: spark.ml.regression
[SPARK-22938] - Assert that SQLConf.get is accessed only on the driver.
[SPARK-23072] - Add a Unicode schema test for file-based data sources
[SPARK-23132] - Run ml.image doctests in tests
[SPARK-23300] - Print out if Pandas and PyArrow are installed or not in tests
[SPARK-23311] - add FilterFunction test case for test CombineTypedFilters
[SPARK-23319] - Skip PySpark tests for old Pandas and old PyArrow

Task

[SPARK-12297] - Add work-around for Parquet/Hive int96 timestamp bug.
[SPARK-19810] - Remove support for Scala 2.10
[SPARK-20434] - Move Hadoop delegation token code from yarn to core
[SPARK-21366] - Add sql test for window functions
[SPARK-21699] - Remove unused getTableOption in ExternalCatalog
[SPARK-21731] - Upgrade scalastyle to 0.9
[SPARK-21848] - Create trait to identify user-defined functions
[SPARK-21939] - Use TimeLimits instead of Timeouts
[SPARK-22153] - Rename ShuffleExchange -> ShuffleExchangeExec
[SPARK-22416] - Move OrcOptions from `sql/hive` to `sql/core`
[SPARK-22473] - Replace deprecated AsyncAssertions.Waiter and methods of java.sql.Date
[SPARK-22485] - Use `exclude[Problem]` instead of `excludePackage` in MiMa
[SPARK-22634] - Update Bouncy castle dependency
[SPARK-22672] - Refactor ORC Tests
[SPARK-23104] - Document that kubernetes is still "experimental"
[SPARK-23426] - Use `hive` ORC impl and disable PPD for Spark 2.3.0

Dependency upgrade

[SPARK-15526] - Shade JPMML

Brainstorming

[SPARK-7146] - Should ML sharedParams be a public API?

Umbrella

[SPARK-18085] - SPIP: Better History Server scalability for many / large applications
[SPARK-20746] - Built-in SQL Function Improvement
[SPARK-21926] - Compatibility between ML Transformers and Structured Streaming
[SPARK-22820] - Spark 2.3 SQL API audit
[SPARK-23105] - Spark MLlib, GraphX 2.3 QA umbrella

New JIRA Project

[SPARK-20758] - Add Constant propagation optimization

Documentation

[SPARK-20015] - Document R Structured Streaming (experimental) in R vignettes and R & SS programming guide, R example
[SPARK-20132] - Add documentation for column string functions
[SPARK-20192] - SparkR 2.2.0 migration guide, release note
[SPARK-20442] - Fill up documentations for functions in Column API in PySpark
[SPARK-20448] - Document how FileInputDStream works with object storage
[SPARK-20456] - Add examples for functions collection for pyspark
[SPARK-20477] - Document R bisecting k-means in R programming guide
[SPARK-20478] - Document LinearSVC in R programming guide
[SPARK-20855] - Update the Spark kinesis docs to use the KinesisInputDStream builder instead of deprecated KinesisUtils
[SPARK-20858] - Document ListenerBus event queue size property
[SPARK-20889] - SparkR grouped documentation for Column methods
[SPARK-20992] - Link to Nomad scheduler backend in docs
[SPARK-21042] - Document that Dataset.union resolves columns by position, not name
[SPARK-21069] - Add rate source to programming guide
[SPARK-21123] - Options for file stream source are in a wrong table
[SPARK-21292] - R document Catalog function metadata refresh
[SPARK-21293] - R document update structured streaming
[SPARK-21469] - Add doc and example for FeatureHasher
[SPARK-21485] - API Documentation for Spark SQL functions
[SPARK-21616] - SparkR 2.3.0 migration guide, release note
[SPARK-21712] - Clarify PySpark Column.substr() type checking error message
[SPARK-21724] - Missing since information in the documentation of date functions
[SPARK-21925] - Update trigger interval documentation in docs with behavior change in Spark 2.2
[SPARK-21976] - Fix wrong doc about Mean Absolute Error
[SPARK-22110] - Enhance the function description of the trim string function
[SPARK-22335] - Union for DataSet uses column order instead of types for union
[SPARK-22369] - PySpark: Document methods of spark.catalog interface
[SPARK-22399] - reference in mllib-clustering.html is out of date
[SPARK-22412] - Fix incorrect comment in DataSourceScanExec
[SPARK-22428] - Document spark properties for configuring the ContextCleaner
[SPARK-22490] - PySpark doc has misleading string for SparkSession.builder
[SPARK-22541] - Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators
[SPARK-22735] - Add VectorSizeHint to ML features documentation
[SPARK-22993] - checkpointInterval param doc should be clearer
[SPARK-23048] - Update mllib docs to replace OneHotEncoder with OneHotEncoderEstimator
[SPARK-23069] - R doc for describe missing text
[SPARK-23127] - Update FeatureHasher user guide for catCols parameter
[SPARK-23138] - Add user guide example for multiclass logistic regression summary
[SPARK-23154] - Document backwards compatibility guarantees for ML persistence
[SPARK-23163] - Sync Python ML API docs with Scala
[SPARK-23313] - Add a migration guide for ORC
[SPARK-23327] - Update the description of three external API or functions

Release Notes - Spark - Version 2.3.0
    
<h2>        Sub-task
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-9104'>SPARK-9104</a>] -         expose network layer memory usage
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-10365'>SPARK-10365</a>] -         Support Parquet logical type TIMESTAMP_MICROS
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-11034'>SPARK-11034</a>] -         Launcher: add support for monitoring Mesos apps
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-11035'>SPARK-11035</a>] -         Launcher: allow apps to be launched in-process
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-12375'>SPARK-12375</a>] -         VectorIndexer: allow unknown categories
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-13534'>SPARK-13534</a>] -         Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-13969'>SPARK-13969</a>] -         Extend input format that feature hashing can handle
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14280'>SPARK-14280</a>] -         Update change-version.sh and pom.xml to add Scala 2.12 profiles
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14650'>SPARK-14650</a>] -         Compile Spark REPL for Scala 2.12
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14878'>SPARK-14878</a>] -         Support Trim characters in the string trim function
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17074'>SPARK-17074</a>] -         generate equi-height histogram for column
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17139'>SPARK-17139</a>] -         Add model summary for MultinomialLogisticRegression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17642'>SPARK-17642</a>] -         Support DESC FORMATTED TABLE COLUMN command to show column-level statistics
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17729'>SPARK-17729</a>] -         Enable creating hive bucketed tables
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18016'>SPARK-18016</a>] -         Code Generation: Constant Pool Past Limit for Wide/Nested Dataset
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18294'>SPARK-18294</a>] -         Implement commit protocol to support `mapred` package&#39;s committer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19165'>SPARK-19165</a>] -         UserDefinedFunction should verify call arguments and provide readable exception in case of mismatch
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19357'>SPARK-19357</a>] -         Parallel Model Evaluation for ML Tuning: Scala
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19634'>SPARK-19634</a>] -         Feature parity for descriptive statistics in MLlib
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19762'>SPARK-19762</a>] -         Implement aggregator/loss function hierarchy and apply to linear regression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19791'>SPARK-19791</a>] -         Add doc and example for fpgrowth
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20396'>SPARK-20396</a>] -         groupBy().apply() with pandas udf in pyspark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20417'>SPARK-20417</a>] -         Move error reporting for subquery from Analyzer to CheckAnalysis
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20585'>SPARK-20585</a>] -         R generic hint support
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20641'>SPARK-20641</a>] -         Key-value store abstraction and implementation for storing application data
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20642'>SPARK-20642</a>] -         Use key-value store to keep History Server application listing
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20643'>SPARK-20643</a>] -         Implement listener for saving application status data in key-value store
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20644'>SPARK-20644</a>] -         Hook up Spark UI to the new key-value store backend
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20645'>SPARK-20645</a>] -         Make Environment page use new app state store
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20646'>SPARK-20646</a>] -         Make Executors page use new app state store
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20647'>SPARK-20647</a>] -         Make the Storage page use new app state store
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20648'>SPARK-20648</a>] -         Make Jobs and Stages pages use the new app state store
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20649'>SPARK-20649</a>] -         Simplify REST API class hierarchy
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20650'>SPARK-20650</a>] -         Remove JobProgressListener (and other unneeded classes)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20652'>SPARK-20652</a>] -         Make SQL UI use new app state store
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20653'>SPARK-20653</a>] -         Add auto-cleanup of old elements to the new app state store
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20654'>SPARK-20654</a>] -         Add controls for how much disk the SHS can use
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20655'>SPARK-20655</a>] -         In-memory key-value store implementation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20657'>SPARK-20657</a>] -         Speed up Stage page
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20664'>SPARK-20664</a>] -         Remove stale applications from SHS listing
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20727'>SPARK-20727</a>] -         Skip SparkR tests when missing Hadoop winutils on CRAN windows machines
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20748'>SPARK-20748</a>] -         Built-in SQL Function Support - CH[A]R
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20749'>SPARK-20749</a>] -         Built-in SQL Function Support - all variants of LEN[GTH]
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20750'>SPARK-20750</a>] -         Built-in SQL Function Support - REPLACE
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20751'>SPARK-20751</a>] -         Built-in SQL Function Support - COT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20754'>SPARK-20754</a>] -         Add Function Alias For MOD/TRUNCT/POSITION
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20770'>SPARK-20770</a>] -         Improve ColumnStats
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20783'>SPARK-20783</a>] -         Enhance ColumnVector to support compressed representation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20791'>SPARK-20791</a>] -         Use Apache Arrow to Improve Spark createDataFrame from Pandas.DataFrame
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20822'>SPARK-20822</a>] -         Generate code to get value from CachedBatchColumnVector in ColumnarBatch
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20881'>SPARK-20881</a>] -         Clearly document the mechanism to choose between two sources of statistics
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20909'>SPARK-20909</a>] -         Built-in SQL Function Support - DAYOFWEEK
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20910'>SPARK-20910</a>] -         Built-in SQL Function Support - UUID
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20931'>SPARK-20931</a>] -         Built-in SQL Function ABS support string type
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20948'>SPARK-20948</a>] -         Built-in SQL Function UnaryMinus/UnaryPositive support string type
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20961'>SPARK-20961</a>] -         generalize the dictionary in ColumnVector
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20962'>SPARK-20962</a>] -         Support subquery column aliases in FROM clause
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20963'>SPARK-20963</a>] -         Support column aliases for aliased relation in FROM clause
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20988'>SPARK-20988</a>] -         Convert logistic regression to new aggregator framework
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21007'>SPARK-21007</a>] -         Add  SQL function - RIGHT &amp;&amp; LEFT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21031'>SPARK-21031</a>] -         Add `alterTableStats` to store spark&#39;s stats and let `alterTable` keep existing stats
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21046'>SPARK-21046</a>] -         simplify the array offset and length in ColumnVector
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21047'>SPARK-21047</a>] -         Add test suites for complicated cases in ColumnarBatchSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21051'>SPARK-21051</a>] -         Add hash map metrics to aggregate
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21052'>SPARK-21052</a>] -         Add hash map metrics to join
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21083'>SPARK-21083</a>] -         Store zero size and row count after analyzing empty table
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21087'>SPARK-21087</a>] -         CrossValidator, TrainValidationSplit should collect all models when fitting: Scala API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21127'>SPARK-21127</a>] -         Update statistics after data changing commands
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21180'>SPARK-21180</a>] -         Remove conf from stats functions since now we have conf in LogicalPlan
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21190'>SPARK-21190</a>] -         SPIP: Vectorized UDFs in Python
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21205'>SPARK-21205</a>] -         pmod(number, 0) should  be null
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21213'>SPARK-21213</a>] -         Support collecting partition-level statistics: rowCount and sizeInBytes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21237'>SPARK-21237</a>] -         Invalidate stats once table data is changed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21322'>SPARK-21322</a>] -         support histogram in filter cardinality estimation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21324'>SPARK-21324</a>] -         Improve statistics test suites
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21375'>SPARK-21375</a>] -         Add date and timestamp support to ArrowConverters for toPandas() collection
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21440'>SPARK-21440</a>] -         Refactor ArrowConverters and add ArrayType and StructType support.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21456'>SPARK-21456</a>] -         Make the driver failover_timeout configurable (Mesos cluster mode)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21552'>SPARK-21552</a>] -         Add decimal type support to ArrowWriter.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21625'>SPARK-21625</a>] -         Add incompatible Hive UDF describe to DOC
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21654'>SPARK-21654</a>] -         Complement predicates expression description
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21671'>SPARK-21671</a>] -         Move kvstore package to util.kvstore, add annotations
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21720'>SPARK-21720</a>] -         Filter predicate with many conditions throws stackoverflow error
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21778'>SPARK-21778</a>] -         Simpler Dataset.sample API in Scala / Java
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21779'>SPARK-21779</a>] -         Simpler Dataset.sample API in Python
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21780'>SPARK-21780</a>] -         Simpler Dataset.sample API in R
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21805'>SPARK-21805</a>] -         disable R vignettes code on Windows
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21893'>SPARK-21893</a>] -         Put Kafka 0.8 behind a profile
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21895'>SPARK-21895</a>] -         Support changing database in HiveClient
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21934'>SPARK-21934</a>] -         Expose Netty memory usage via Metrics System
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21984'>SPARK-21984</a>] -         Use histogram stats in join estimation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22026'>SPARK-22026</a>] -         data source v2 write path
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22032'>SPARK-22032</a>] -         Speed up StructType.fromInternal
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22053'>SPARK-22053</a>] -         Implement stream-stream inner join in Append mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22078'>SPARK-22078</a>] -         clarify exception behaviors for all data source v2 interfaces
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22086'>SPARK-22086</a>] -         Add expression description for CASE WHEN
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22087'>SPARK-22087</a>] -         Clear remaining compile errors for 2.12; resolve most warnings
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22100'>SPARK-22100</a>] -         Make percentile_approx support date/timestamp type and change the output type to be the same as input type
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22128'>SPARK-22128</a>] -         Update paranamer to 2.8 to avoid BytecodeReadingParanamer ArrayIndexOutOfBoundsException with Scala 2.12 + Java 8 lambda
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22136'>SPARK-22136</a>] -         Implement stream-stream outer joins in append mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22197'>SPARK-22197</a>] -         push down operators to data source before planning
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22221'>SPARK-22221</a>] -         Add User Documentation for Working with Arrow in Spark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22226'>SPARK-22226</a>] -         splitExpression can create too many method calls (generating a Constant Pool limit error)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22278'>SPARK-22278</a>] -         Expose current event time watermark and current processing time in GroupState
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22285'>SPARK-22285</a>] -         Change implementation of ApproxCountDistinctForIntervals to TypedImperativeAggregate
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22310'>SPARK-22310</a>] -         Refactor join estimation to incorporate estimation logic for different kinds of statistics
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22322'>SPARK-22322</a>] -         Update FutureAction for compatibility with Scala 2.12 future
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22324'>SPARK-22324</a>] -         Upgrade Arrow to version 0.8.0 and upgrade Netty to 4.1.17 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22344'>SPARK-22344</a>] -         Prevent R CMD check from using /tmp
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22361'>SPARK-22361</a>] -         Add unit test for Window Frames
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22363'>SPARK-22363</a>] -         Add unit test for Window spilling
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22387'>SPARK-22387</a>] -         propagate session configs to data source read/write options
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22389'>SPARK-22389</a>] -         partitioning reporting
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22392'>SPARK-22392</a>] -         columnar reader interface 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22400'>SPARK-22400</a>] -         rename some APIs and classes to make their meaning clearer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22409'>SPARK-22409</a>] -         Add function type argument to pandas_udf
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22452'>SPARK-22452</a>] -         DataSourceV2Options should have getInt, getBoolean, etc.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22475'>SPARK-22475</a>] -         show histogram in DESC COLUMN command
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22483'>SPARK-22483</a>] -         Exposing java.nio bufferedPool memory metrics to metrics system
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22494'>SPARK-22494</a>] -         Coalesce and AtLeastNNonNulls can cause 64KB JVM bytecode limit exception
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22498'>SPARK-22498</a>] -         64KB JVM bytecode limit problem with concat
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22499'>SPARK-22499</a>] -         64KB JVM bytecode limit problem with least and greatest
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22500'>SPARK-22500</a>] -         64KB JVM bytecode limit problem with cast
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22501'>SPARK-22501</a>] -         64KB JVM bytecode limit problem with in
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22508'>SPARK-22508</a>] -         64KB JVM bytecode limit problem with GenerateUnsafeRowJoiner.create()
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22514'>SPARK-22514</a>] -         move ColumnVector.Array and ColumnarBatch.Row to individual files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22515'>SPARK-22515</a>] -         Estimate relation size based on numRows * rowSize
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22529'>SPARK-22529</a>] -         Relation stats should be consistent with other plans based on cbo config
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22530'>SPARK-22530</a>] -         Add ArrayType Support for working with Pandas and Arrow
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22542'>SPARK-22542</a>] -         remove unused features in ColumnarBatch
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22543'>SPARK-22543</a>] -         fix java 64kb compile error for deeply nested expressions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22549'>SPARK-22549</a>] -         64KB JVM bytecode limit problem with concat_ws
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22550'>SPARK-22550</a>] -         64KB JVM bytecode limit problem with elt
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22570'>SPARK-22570</a>] -         Create a lot of global variables to reuse an object in generated code
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22602'>SPARK-22602</a>] -         remove ColumnVector#loadBytes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22603'>SPARK-22603</a>] -         64KB JVM bytecode limit problem with FormatString
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22604'>SPARK-22604</a>] -         remove the get address methods from ColumnVector
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22626'>SPARK-22626</a>] -         Wrong Hive table statistics may trigger OOM if CBO is enabled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22643'>SPARK-22643</a>] -         ColumnarArray should be an immutable view
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22646'>SPARK-22646</a>] -         Spark on Kubernetes - basic submission client
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22648'>SPARK-22648</a>] -         Documentation for Kubernetes Scheduler Backend
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22652'>SPARK-22652</a>] -         remove set methods in ColumnarRow
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22669'>SPARK-22669</a>] -         Avoid unnecessary function calls in code generation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22693'>SPARK-22693</a>] -         Avoid the generation of useless mutable states in complexTypeCreator and predicates
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22695'>SPARK-22695</a>] -         Avoid the generation of useless mutable states by scalaUDF
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22696'>SPARK-22696</a>] -         Avoid the generation of useless mutable states by objects functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22699'>SPARK-22699</a>] -         Avoid the generation of useless mutable states by GenerateSafeProjection
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22703'>SPARK-22703</a>] -         ColumnarRow should be an immutable view
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22716'>SPARK-22716</a>] -         Avoid the creation of mutable states in addReferenceObj
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22732'>SPARK-22732</a>] -         Add DataSourceV2 streaming APIs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22733'>SPARK-22733</a>] -         refactor StreamExecution for extensibility
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22745'>SPARK-22745</a>] -         read partition stats from Hive
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22746'>SPARK-22746</a>] -         Avoid the generation of useless mutable states by SortMergeJoin
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22750'>SPARK-22750</a>] -         Introduce reusable mutable states
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22757'>SPARK-22757</a>] -         Init-container in the driver/executor pods for downloading remote dependencies
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22762'>SPARK-22762</a>] -         Basic tests for IfCoercion and CaseWhenCoercion
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22772'>SPARK-22772</a>] -         elt should use splitExpressionsWithCurrentInputs to split expression codes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22775'>SPARK-22775</a>] -         move dictionary related APIs from ColumnVector to WritableColumnVector
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22785'>SPARK-22785</a>] -         rename ColumnVector.anyNullsSet to hasNull
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22789'>SPARK-22789</a>] -         Add ContinuousExecution for continuous processing queries
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22807'>SPARK-22807</a>] -         Change configuration options to use &quot;container&quot; instead of &quot;docker&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22816'>SPARK-22816</a>] -         Basic tests for PromoteStrings and InConversion
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22821'>SPARK-22821</a>] -         Basic tests for WidenSetOperationTypes, BooleanEquality, StackCoercion and Division
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22822'>SPARK-22822</a>] -         Basic tests for WindowFrameCoercion and DecimalPrecision
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22829'>SPARK-22829</a>] -         Add new built-in function date_trunc()
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22845'>SPARK-22845</a>] -         Modify spark.kubernetes.allocation.batch.delay to take time instead of int
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22848'>SPARK-22848</a>] -         Avoid the generation of useless mutable states by Stack function
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22890'>SPARK-22890</a>] -         Basic tests for DateTimeOperations
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22892'>SPARK-22892</a>] -         Simplify some estimation logic by using double instead of decimal
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22904'>SPARK-22904</a>] -         Basic tests for decimal operations and string cast
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22908'>SPARK-22908</a>] -         add basic continuous kafka source
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22909'>SPARK-22909</a>] -         Move Structured Streaming v2 APIs to streaming package
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22912'>SPARK-22912</a>] -         Support v2 streaming sources and sinks in MicroBatchExecution
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22917'>SPARK-22917</a>] -         Should not try to generate histogram for empty/null columns
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22930'>SPARK-22930</a>] -         Improve the description of Vectorized UDFs for non-deterministic cases
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22978'>SPARK-22978</a>] -         Register Scalar Vectorized UDFs for SQL Statement
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22980'>SPARK-22980</a>] -         Using pandas_udf when inputs are not Pandas&#39;s Series or DataFrame
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23033'>SPARK-23033</a>] -         disable task-level retry for continuous execution
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23045'>SPARK-23045</a>] -         Have RFormula use OneHotEncoderEstimator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23046'>SPARK-23046</a>] -         Have RFormula include VectorSizeHint in pipeline
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23047'>SPARK-23047</a>] -         Change MapVector to NullableMapVector in ArrowColumnVector
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23052'>SPARK-23052</a>] -         Migrate Microbatch ConsoleSink to v2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23063'>SPARK-23063</a>] -         Changes to publish the spark-kubernetes package
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23064'>SPARK-23064</a>] -         Add documentation for stream-stream joins
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23093'>SPARK-23093</a>] -         don&#39;t modify run id
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23107'>SPARK-23107</a>] -         ML, Graph 2.3 QA: API: New Scala APIs, docs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23108'>SPARK-23108</a>] -         ML, Graph 2.3 QA: API: Experimental, DeveloperApi, final, sealed audit
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23110'>SPARK-23110</a>] -         ML 2.3 QA: API: Java compatibility, docs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23111'>SPARK-23111</a>] -         ML, Graph 2.3 QA: Update user guide for new features &amp; APIs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23112'>SPARK-23112</a>] -         ML, Graph 2.3 QA: Programming guide update and migration guide
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23116'>SPARK-23116</a>] -         SparkR 2.3 QA: Update user guide for new features &amp; APIs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23118'>SPARK-23118</a>] -         SparkR 2.3 QA: Programming guide, migration guide, vignettes updates
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23137'>SPARK-23137</a>] -         spark.kubernetes.executor.podNamePrefix is ignored
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23196'>SPARK-23196</a>] -         Unify continuous and microbatch V2 sinks
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23218'>SPARK-23218</a>] -         simplify ColumnVector.getArray
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23219'>SPARK-23219</a>] -         Rename ReadTask to DataReaderFactory
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23260'>SPARK-23260</a>] -         remove V2 from the class name of data source reader/writer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23261'>SPARK-23261</a>] -         Rename Pandas UDFs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23262'>SPARK-23262</a>] -         mix-in interface should extend the interface it aimed to mix in
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23268'>SPARK-23268</a>] -         Reorganize packages in data source V2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23272'>SPARK-23272</a>] -         add calendar interval type support to ColumnVector
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23280'>SPARK-23280</a>] -         add map type support to ColumnVector
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23314'>SPARK-23314</a>] -         Pandas grouped udf on dataset with timestamp column error 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23334'>SPARK-23334</a>] -         Fix pandas_udf with return type StringType() to handle str type properly in Python 2.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23352'>SPARK-23352</a>] -         Explicitly specify supported types in Pandas UDFs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23446'>SPARK-23446</a>] -         Explicitly check supported types in toPandas
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24077'>SPARK-24077</a>] -         Issue a better error message for `CREATE TEMPORARY FUNCTION IF NOT EXISTS`
</li>
</ul>
            
<h2>        Bug
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-3151'>SPARK-3151</a>] -         DiskStore attempts to map any size BlockId without checking MappedByteBuffer limit
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-3577'>SPARK-3577</a>] -         Add task metric to report spill time
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-3685'>SPARK-3685</a>] -         Spark&#39;s local dir should accept only local paths
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-5484'>SPARK-5484</a>] -         Pregel should checkpoint periodically to avoid StackOverflowError
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-9825'>SPARK-9825</a>] -         Spark overwrites remote cluster &quot;final&quot; properties with local config 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-10719'>SPARK-10719</a>] -         SQLImplicits.rddToDataFrameHolder is not thread safe when using Scala 2.10
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-11334'>SPARK-11334</a>] -         numRunningTasks can&#39;t be less than 0, or it will affect executor allocation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-12552'>SPARK-12552</a>] -         Recovered driver&#39;s resource is not counted in the Master
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-12559'>SPARK-12559</a>] -         Cluster mode doesn&#39;t work with --packages
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-12717'>SPARK-12717</a>] -         pyspark broadcast fails when using multiple threads
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-13669'>SPARK-13669</a>] -         Job will always fail in the external shuffle service unavailable situation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-13757'>SPARK-13757</a>] -         support quoted column names in schema string at types.py#_parse_datatype_string
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-13933'>SPARK-13933</a>] -         hadoop-2.7 profile&#39;s curator version should be 2.7.1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-13983'>SPARK-13983</a>] -         HiveThriftServer2 can not get &quot;--hiveconf&quot; or &quot;--hivevar&quot; variables since 1.6 version (both multi-session and single session)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14034'>SPARK-14034</a>] -         Converting to Dataset causes wrong order and values in nested array of documents
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14228'>SPARK-14228</a>] -         Executor lost after RPC disassociation, followed by exception: Could not find CoarseGrainedScheduler or it has been stopped
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14387'>SPARK-14387</a>] -         Enable Hive-1.x ORC compatibility with spark.sql.hive.convertMetastoreOrc
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14408'>SPARK-14408</a>] -         Update RDD.treeAggregate not to use reduce
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14657'>SPARK-14657</a>] -         RFormula outputs wrong features when the formula has no intercept
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-15243'>SPARK-15243</a>] -         Binarizer.explainParam(u&quot;...&quot;) raises ValueError
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-15474'>SPARK-15474</a>] -          ORC data source fails to write and read back empty dataframe
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16167'>SPARK-16167</a>] -         RowEncoder should preserve array/map type nullability.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16542'>SPARK-16542</a>] -         Type-handling bugs that result in an array of nulls when creating a DataFrame using Python
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16548'>SPARK-16548</a>] -         java.io.CharConversionException: Invalid UTF-32 character  prevents me from querying my data
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16605'>SPARK-16605</a>] -         Spark2.0 cannot &quot;select&quot; data from a table stored as an orc file which has been created by hive while hive or spark1.6 supports
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16628'>SPARK-16628</a>] -         OrcConversions should not convert an ORC table represented by MetastoreRelation to HadoopFsRelation if metastore schema does not match schema stored in ORC files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16986'>SPARK-16986</a>] -         &quot;Started&quot; time, &quot;Completed&quot; time and &quot;Last Updated&quot; time in history server UI are not user local time
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17029'>SPARK-17029</a>] -         Dataset toJSON goes through RDD form instead of transforming dataset itself
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17047'>SPARK-17047</a>] -         Spark 2 cannot create table when CLUSTERED.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17284'>SPARK-17284</a>] -         Remove statistics-related table properties from SHOW CREATE TABLE
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17321'>SPARK-17321</a>] -         YARN shuffle service should use good disk from yarn.nodemanager.local-dirs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17410'>SPARK-17410</a>] -         Move Hive-generated Stats Info to HiveClientImpl
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17528'>SPARK-17528</a>] -         data should be copied properly before saving into InternalRow
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17742'>SPARK-17742</a>] -         Spark Launcher does not get failed state in Listener 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17788'>SPARK-17788</a>] -         RangePartitioner results in few very large tasks and many small to empty tasks 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17851'>SPARK-17851</a>] -         Make sure all test sqls in catalyst pass checkAnalysis
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17902'>SPARK-17902</a>] -         collect() ignores stringsAsFactors
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17914'>SPARK-17914</a>] -         Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17920'>SPARK-17920</a>] -         HiveWriterContainer passes null configuration to serde.initialize, causing NullPointerException in AvroSerde when using avro.schema.url
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18004'>SPARK-18004</a>] -         DataFrame filter Predicate push-down fails for Oracle Timestamp type columns
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18061'>SPARK-18061</a>] -         Spark Thriftserver needs to create SPNego principal
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18355'>SPARK-18355</a>] -         Spark SQL fails to read data from a ORC hive table that has a new column added to it
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18394'>SPARK-18394</a>] -         Executing the same query twice in a row results in CodeGenerator cache misses
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18608'>SPARK-18608</a>] -         Spark ML algorithms that check RDD cache level for internal caching double-cache data
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18646'>SPARK-18646</a>] -         ExecutorClassLoader for spark-shell does not honor spark.executor.userClassPathFirst
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18935'>SPARK-18935</a>] -         Use Mesos &quot;Dynamic Reservation&quot; resource for Spark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18950'>SPARK-18950</a>] -         Report conflicting fields when merging two StructTypes.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19109'>SPARK-19109</a>] -         ORC metadata section can sometimes exceed protobuf message size limit
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19122'>SPARK-19122</a>] -         Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19326'>SPARK-19326</a>] -         Speculated task attempts do not get launched in few scenarios
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19372'>SPARK-19372</a>] -         Code generation for Filter predicate including many OR conditions exceeds JVM method size limit 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19451'>SPARK-19451</a>] -         rangeBetween method should accept Long value as boundary
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19471'>SPARK-19471</a>] -         A confusing NullPointerException when creating table
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19531'>SPARK-19531</a>] -         History server doesn&#39;t refresh jobs for long-life apps like thriftserver
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19580'>SPARK-19580</a>] -         Support for avro.schema.url while writing to hive table
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19644'>SPARK-19644</a>] -         Memory leak in Spark Streaming (Encoder/Scala Reflection)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19688'>SPARK-19688</a>] -         Spark on Yarn Credentials File set to different application directory
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19726'>SPARK-19726</a>] -         Failed to insert null timestamp value to mysql using spark jdbc
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19753'>SPARK-19753</a>] -         Remove all shuffle files on a host in case of slave lost or fetch failure
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19809'>SPARK-19809</a>] -         NullPointerException on zero-size ORC file
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19812'>SPARK-19812</a>] -         YARN shuffle service fails to relocate recovery DB across NFS directories
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19824'>SPARK-19824</a>] -         Standalone master JSON not showing cores for running applications
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19900'>SPARK-19900</a>] -         [Standalone] Master registers application again when driver relaunched
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19910'>SPARK-19910</a>] -         `stack` should not reject NULL values due to type mismatch
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20025'>SPARK-20025</a>] -         Driver fail over will not work, if SPARK_LOCAL* env is set.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20065'>SPARK-20065</a>] -         Empty output files created for aggregation query in append mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20079'>SPARK-20079</a>] -         Re-registration of AM hangs spark cluster in yarn-client mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20098'>SPARK-20098</a>] -         DataType&#39;s typeName method returns with &#39;StructF&#39; in case of StructField
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20140'>SPARK-20140</a>] -         Remove hardcoded kinesis retry wait and max retries
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20205'>SPARK-20205</a>] -         DAGScheduler posts SparkListenerStageSubmitted before updating stage
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20213'>SPARK-20213</a>] -         DataFrameWriter operations do not show up in SQL tab
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20256'>SPARK-20256</a>] -         Fail to start SparkContext/SparkSession with Hive support enabled when user does not have read/write privilege to Hive metastore warehouse dir
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20288'>SPARK-20288</a>] -         Improve BasicSchedulerIntegrationSuite &quot;multi-stage job&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20311'>SPARK-20311</a>] -         SQL &quot;range(N) as alias&quot; or &quot;range(N) alias&quot; doesn&#39;t work
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20312'>SPARK-20312</a>] -         query optimizer calls udf with null values when it doesn&#39;t expect them
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20329'>SPARK-20329</a>] -         Resolution error when HAVING clause uses GROUP BY expression that involves implicit type coercion
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20333'>SPARK-20333</a>] -         Fix HashPartitioner in DAGSchedulerSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20338'>SPARK-20338</a>] -         Spaces in spark.eventLog.dir are not correctly handled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20341'>SPARK-20341</a>] -         Support BigInteger values &gt; 19 precision
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20342'>SPARK-20342</a>] -         DAGScheduler sends SparkListenerTaskEnd before updating task&#39;s accumulators
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20345'>SPARK-20345</a>] -         Fix STS error handling logic on HiveSQLException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20356'>SPARK-20356</a>] -         Spark sql group by returns incorrect results after join + distinct transformations
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20359'>SPARK-20359</a>] -         Catalyst EliminateOuterJoin optimization can cause NPE
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20365'>SPARK-20365</a>] -         Not so accurate classpath format for AM and Containers
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20367'>SPARK-20367</a>] -         Spark silently escapes partition column names
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20380'>SPARK-20380</a>] -         describe table not showing updated table comment after alter operation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20412'>SPARK-20412</a>] -         NullPointerException in places expecting non-optional partitionSpec.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20427'>SPARK-20427</a>] -         Issue with Spark interpreting Oracle datatype NUMBER
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20439'>SPARK-20439</a>] -         Catalog.listTables() depends on all libraries used to create tables
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20451'>SPARK-20451</a>] -         Filter out nested mapType datatypes from sort order in randomSplit
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20453'>SPARK-20453</a>] -         Bump master branch version to 2.3.0-SNAPSHOT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20466'>SPARK-20466</a>] -         HadoopRDD#addLocalConfiguration throws NPE
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20541'>SPARK-20541</a>] -         SparkR SS should support awaitTermination without timeout
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20543'>SPARK-20543</a>] -         R should skip long running or non-essential tests when running on CRAN
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20565'>SPARK-20565</a>] -         Improve the error message for unsupported JDBC types
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20569'>SPARK-20569</a>] -         RuntimeReplaceable functions accept invalid third parameter
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20586'>SPARK-20586</a>] -         Add deterministic to ScalaUDF
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20591'>SPARK-20591</a>] -         Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20605'>SPARK-20605</a>] -         Deprecate not used AM and executor port configuration
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20609'>SPARK-20609</a>] -         Running the SortShuffleSuite unit tests leaves a residual spark_* system directory
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20613'>SPARK-20613</a>] -         Double quotes in Windows batch script
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20626'>SPARK-20626</a>] -         Fix SparkR test warning on Windows with timestamp time zone
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20633'>SPARK-20633</a>] -         FileFormatWriter wrap the FetchFailedException which breaks job&#39;s failover
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20640'>SPARK-20640</a>] -         Make rpc timeout and retry for shuffle registration configurable
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20689'>SPARK-20689</a>] -         python doctest leaking bucketed table
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20690'>SPARK-20690</a>] -         Subqueries in FROM should have alias names
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20704'>SPARK-20704</a>] -         CRAN test should run single threaded
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20706'>SPARK-20706</a>] -         Spark-shell not overriding method/variable definition
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20708'>SPARK-20708</a>] -         Make `addExclusionRules` up-to-date
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20713'>SPARK-20713</a>] -         Speculative task that got CommitDenied exception shows up as failed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20719'>SPARK-20719</a>] -         Support LIMIT ALL
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20756'>SPARK-20756</a>] -         yarn-shuffle jar has references to unshaded guava and contains scala classes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20786'>SPARK-20786</a>] -         Improve how ceil and floor handle unexpected values
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20815'>SPARK-20815</a>] -         NullPointerException in RPackageUtils#checkManifestForR
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20832'>SPARK-20832</a>] -         Standalone master should explicitly inform drivers of worker deaths and invalidate external shuffle service outputs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20865'>SPARK-20865</a>] -         caching dataset throws &quot;Queries with streaming sources must be executed with writeStream.start()&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20873'>SPARK-20873</a>] -         Improve the error message for unsupported Column Type
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20876'>SPARK-20876</a>] -         If the input parameter is float type for ceil or floor, the result is not what we expected
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20898'>SPARK-20898</a>] -         spark.blacklist.killBlacklistedExecutors doesn&#39;t work in YARN
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20904'>SPARK-20904</a>] -         Task failures during shutdown cause problems with preempted executors
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20906'>SPARK-20906</a>] -         Constrained Logistic Regression for SparkR
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20914'>SPARK-20914</a>] -         Javadoc contains code that is invalid
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20916'>SPARK-20916</a>] -         Improve error message for unaliased subqueries in FROM clause
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20918'>SPARK-20918</a>] -         Use FunctionIdentifier as function identifiers in FunctionRegistry
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20922'>SPARK-20922</a>] -         Unsafe deserialization in Spark LauncherConnection
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20923'>SPARK-20923</a>] -         TaskMetrics._updatedBlockStatuses uses a lot of memory
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20926'>SPARK-20926</a>] -         Exposure to Guava libraries by directly accessing tableRelationCache in SessionCatalog caused failures
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20935'>SPARK-20935</a>] -         A daemon thread, &quot;BatchedWriteAheadLog Writer&quot;, left behind after terminating StreamingContext.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20945'>SPARK-20945</a>] -         NoSuchElementException key not found in TaskSchedulerImpl
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20976'>SPARK-20976</a>] -         Unify Error Messages for FAILFAST mode. 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20978'>SPARK-20978</a>] -         CSV emits NPE when the number of tokens is less than given schema and corrupt column is given
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20989'>SPARK-20989</a>] -         Fail to start multiple workers on one host if external shuffle service is enabled in standalone mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20991'>SPARK-20991</a>] -         BROADCAST_TIMEOUT conf should be a timeoutConf
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20997'>SPARK-20997</a>] -         spark-submit&#39;s --driver-cores marked as &quot;YARN-only&quot; but listed under &quot;Spark standalone with cluster deploy mode only&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21033'>SPARK-21033</a>] -         fix the potential OOM in UnsafeExternalSorter
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21041'>SPARK-21041</a>] -         With whole-stage codegen, SparkSession.range()&#39;s behavior is inconsistent with SparkContext.range()
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21050'>SPARK-21050</a>] -         ml word2vec write has overflow issue in calculating numPartitions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21055'>SPARK-21055</a>] -         Support grouping__id
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21057'>SPARK-21057</a>] -         Do not use a PascalDistribution in countApprox
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21064'>SPARK-21064</a>] -         Fix the default value bug in NettyBlockTransferServiceSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21066'>SPARK-21066</a>] -         LibSVM loads just one input file
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21093'>SPARK-21093</a>] -         Multiple gapply execution occasionally failed in SparkR 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21101'>SPARK-21101</a>] -         Error running Hive temporary UDTF on latest Spark 2.2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21102'>SPARK-21102</a>] -         Refresh command is too aggressive in parsing
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21112'>SPARK-21112</a>] -         ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21119'>SPARK-21119</a>] -         unset table properties should keep the table comment
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21124'>SPARK-21124</a>] -         Wrong user shown in UI when using kerberos
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21138'>SPARK-21138</a>] -         Cannot delete staging dir when the clusters of &quot;spark.yarn.stagingDir&quot; and &quot;spark.hadoop.fs.defaultFS&quot; are different 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21145'>SPARK-21145</a>] -         Restarted queries reuse same StateStoreProvider, causing multiple concurrent tasks to update same StateStore
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21147'>SPARK-21147</a>] -         the schema of socket/rate source can not be set.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21163'>SPARK-21163</a>] -         DataFrame.toPandas should respect the data type
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21165'>SPARK-21165</a>] -         Fail to write into partitioned hive table due to attribute reference not working with cast on partition column
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21167'>SPARK-21167</a>] -         Path is not decoded correctly when reading output of FileSink
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21170'>SPARK-21170</a>] -         Utils.tryWithSafeFinallyAndFailureCallbacks throws IllegalArgumentException: Self-suppression not permitted
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21181'>SPARK-21181</a>] -         Suppress memory leak errors reported by netty
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21188'>SPARK-21188</a>] -         releaseAllLocksForTask should synchronize the whole method
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21204'>SPARK-21204</a>] -         RuntimeException with Set and Case Class in Spark 2.1.1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21216'>SPARK-21216</a>] -         Streaming DataFrames fail to join with Hive tables
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21219'>SPARK-21219</a>] -         Task retry occurs on same executor due to race condition with blacklisting
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21223'>SPARK-21223</a>] -         Thread-safety issue in FsHistoryProvider 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21225'>SPARK-21225</a>] -         Decrease the memory used by variable &#39;tasks&#39; in function resourceOffers
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21228'>SPARK-21228</a>] -         InSet incorrect handling of structs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21248'>SPARK-21248</a>] -         Flaky test: o.a.s.sql.kafka010.KafkaSourceSuite.assign from specific offsets (failOnDataLoss: true)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21254'>SPARK-21254</a>] -         History UI: Taking over 1 minute for initial page display
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21255'>SPARK-21255</a>] -         NPE when creating encoder for enum
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21263'>SPARK-21263</a>] -         NumberFormatException is not thrown while converting an invalid string to float/double
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21264'>SPARK-21264</a>] -         Omitting columns with &#39;how&#39; specified in join in PySpark throws NPE
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21271'>SPARK-21271</a>] -         UnsafeRow.hashCode assertion when sizeInBytes not multiple of 8
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21272'>SPARK-21272</a>] -         SortMergeJoin LeftAnti does not update numOutputRows
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21278'>SPARK-21278</a>] -         Upgrade to Py4J 0.10.6
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21281'>SPARK-21281</a>] -         cannot create empty typed array column
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21283'>SPARK-21283</a>] -         FileOutputStream should be created as append mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21284'>SPARK-21284</a>] -         rename SessionCatalog.registerFunction parameter name
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21300'>SPARK-21300</a>] -         ExternalMapToCatalyst should null-check map key prior to converting to internal value.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21306'>SPARK-21306</a>] -         OneVsRest Conceals Columns That May Be Relevant To Underlying Classifier
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21312'>SPARK-21312</a>] -         UnsafeRow writeToStream has incorrect offsetInByteArray calculation for non-zero offset
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21319'>SPARK-21319</a>] -         UnsafeExternalRowSorter.RowComparator memory leak
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21327'>SPARK-21327</a>] -         ArrayConstructor should handle an array of typecode &#39;l&#39; as long rather than int in Python 2.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21330'>SPARK-21330</a>] -         Bad partitioning does not allow to read a JDBC table with extreme values on the partition column
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21332'>SPARK-21332</a>] -         Incorrect result type inferred for some decimal expressions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21333'>SPARK-21333</a>] -         joinWith documents and analysis allow invalid join types
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21335'>SPARK-21335</a>] -         support un-aliased subquery
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21338'>SPARK-21338</a>] -         AggregatedDialect doesn&#39;t override isCascadingTruncateTable() method
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21339'>SPARK-21339</a>] -         spark-shell --packages option does not add jars to classpath on windows
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21342'>SPARK-21342</a>] -         Fix DownloadCallback to work well with RetryingBlockFetcher
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21343'>SPARK-21343</a>] -         Refine the document for spark.reducer.maxReqSizeShuffleToMem
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21345'>SPARK-21345</a>] -         SparkSessionBuilderSuite should clean up stopped sessions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21350'>SPARK-21350</a>] -         Fix the error message when the number of arguments is wrong when invoking a UDF
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21354'>SPARK-21354</a>] -         INPUT FILE related functions do not support more than one source
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21357'>SPARK-21357</a>] -         FileInputDStream does not remove out-of-date RDDs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21369'>SPARK-21369</a>] -         Don&#39;t use Scala classes in external shuffle service
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21374'>SPARK-21374</a>] -         Reading globbed paths from S3 into DF doesn&#39;t work if filesystem caching is disabled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21376'>SPARK-21376</a>] -         Token is not renewed in yarn client process in cluster mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21377'>SPARK-21377</a>] -         Jars specified with --jars or --packages are not added into AM&#39;s system classpath
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21383'>SPARK-21383</a>] -         YARN can allocate too many executors
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21384'>SPARK-21384</a>] -         Spark 2.2 + YARN without spark.yarn.jars / spark.yarn.archive fails
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21394'>SPARK-21394</a>] -         Reviving broken callable objects in UDF in PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21400'>SPARK-21400</a>] -         Spark shouldn&#39;t ignore user defined output committer in append mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21403'>SPARK-21403</a>] -         Cluster mode doesn&#39;t work with --packages [Mesos]
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21411'>SPARK-21411</a>] -         Failed to get new HDFS delegation tokens in AMCredentialRenewer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21414'>SPARK-21414</a>] -         Buffer in SlidingWindowFunctionFrame could be big though window is small
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21418'>SPARK-21418</a>] -         NoSuchElementException: None.get in DataSourceScanExec with sun.io.serialization.extendedDebugInfo=true
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21422'>SPARK-21422</a>] -         Depend on Apache ORC 1.4.0
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21428'>SPARK-21428</a>] -         CliSessionState is never recognized because of IsolatedClientLoader
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21432'>SPARK-21432</a>] -         Reviving broken partial functions in UDF in PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21439'>SPARK-21439</a>] -         Cannot use Spark with Python ABCMeta (exception from cloudpickle)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21441'>SPARK-21441</a>] -         Incorrect Codegen in SortMergeJoinExec results in failures in some cases
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21444'>SPARK-21444</a>] -         Fetch failure due to node reboot causes job failure
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21445'>SPARK-21445</a>] -         NotSerializableException thrown by UTF8String.IntWrapper
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21446'>SPARK-21446</a>] -         [SQL] JDBC Postgres fetchsize parameter ignored again
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21447'>SPARK-21447</a>] -         Spark history server fails to render compressed inprogress history file in some cases.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21451'>SPARK-21451</a>] -         HiveConf in SparkSQLCLIDriver doesn&#39;t respect spark.hadoop.some.hive.variables
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21457'>SPARK-21457</a>] -         ExternalCatalog.listPartitions should correctly handle partition values with dot
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21462'>SPARK-21462</a>] -         Add batchId to the json of StreamingQueryProgress
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21463'>SPARK-21463</a>] -         Output of StructuredStreaming tables don&#39;t respect user specified schema when reading back the table
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21490'>SPARK-21490</a>] -         SparkLauncher may fail to redirect streams
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21494'>SPARK-21494</a>] -         Spark 2.2.0 AES encryption not working with External shuffle
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21498'>SPARK-21498</a>] -         One Python demo in the quick start guide has a bug in its code
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21501'>SPARK-21501</a>] -         Spark shuffle index cache size should be memory based
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21502'>SPARK-21502</a>] -         --supervise causing frameworkId conflicts in mesos cluster mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21503'>SPARK-21503</a>] -         Spark UI shows incorrect task status for a killed Executor Process
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21508'>SPARK-21508</a>] -         Documentation on &#39;Spark Streaming Custom Receivers&#39; has error in example code
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21512'>SPARK-21512</a>] -         DatasetCacheSuite needs to execute unpersist after executing persist
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21516'>SPARK-21516</a>] -         overriding afterEach() in DatasetCacheSuite must call super.afterEach()
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21522'>SPARK-21522</a>] -         Flaky test: LauncherServerSuite.testStreamFiltering
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21523'>SPARK-21523</a>] -         Fix bug where the strong Wolfe line search `init` parameter loses effectiveness
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21534'>SPARK-21534</a>] -         PickleException when creating dataframe from python row with empty bytearray
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21541'>SPARK-21541</a>] -         Spark Logs show incorrect job status for a job that does not create SparkContext
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21546'>SPARK-21546</a>] -         dropDuplicates with watermark yields RuntimeException due to binding failure
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21549'>SPARK-21549</a>] -         Spark fails to complete job correctly in case of an OutputFormat that does not write into HDFS
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21551'>SPARK-21551</a>] -         pyspark&#39;s collect fails when getaddrinfo is too slow
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21555'>SPARK-21555</a>] -         GROUP BY doesn&#39;t work with expressions with NVL and nested objects
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21563'>SPARK-21563</a>] -         Race condition when serializing TaskDescriptions and adding jars
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21565'>SPARK-21565</a>] -         aggregate query fails with watermark on eventTime but works with watermark on timestamp column generated by current_timestamp
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21567'>SPARK-21567</a>] -         Dataset with Tuple of type alias throws error
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21568'>SPARK-21568</a>] -         ConsoleProgressBar should only be enabled in shells
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21571'>SPARK-21571</a>] -         Spark history server leaves incomplete or unreadable history files around forever.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21580'>SPARK-21580</a>] -         A bug with  `Group by ordinal`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21585'>SPARK-21585</a>] -         Application Master marking application status as Failed for Client Mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21587'>SPARK-21587</a>] -         Filter pushdown for EventTime Watermark Operator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21588'>SPARK-21588</a>] -         SQLContext.getConf(key, null) should return null, but it throws NPE
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21593'>SPARK-21593</a>] -         Fix broken configuration page
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21595'>SPARK-21595</a>] -         introduction of spark.sql.windowExec.buffer.spill.threshold in spark 2.2 breaks existing workflow
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21596'>SPARK-21596</a>] -         Audit the places calling HDFSMetadataLog.get
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21597'>SPARK-21597</a>] -         Avg event time calculated in progress may be wrong
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21599'>SPARK-21599</a>] -         Collecting column statistics for datasource tables may fail with java.util.NoSuchElementException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21605'>SPARK-21605</a>] -         Let IntelliJ IDEA correctly detect Language level and Target byte code version
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21610'>SPARK-21610</a>] -         Corrupt records are not handled properly when creating a dataframe from a file
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21615'>SPARK-21615</a>] -         Fix broken redirect in collaborative filtering docs to databricks training repo
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21617'>SPARK-21617</a>] -         ALTER TABLE...ADD COLUMNS broken in Hive 2.1 for DS tables
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21621'>SPARK-21621</a>] -         Reset numRecordsWritten after DiskBlockObjectWriter.commitAndGet called
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21637'>SPARK-21637</a>] -         `hive.metastore.warehouse` in --hiveconf is not respected
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21638'>SPARK-21638</a>] -         Warning message of RF is not accurate
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21642'>SPARK-21642</a>] -         Use FQDN for DRIVER_HOST_ADDRESS instead of ip address
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21644'>SPARK-21644</a>] -         LocalLimit.maxRows is defined incorrectly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21647'>SPARK-21647</a>] -         SortMergeJoin failed when using CROSS
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21648'>SPARK-21648</a>] -         Confusing assert failure in JDBC source when users misspell the option `partitionColumn`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21652'>SPARK-21652</a>] -         Optimizer cannot reach a fixed point on certain queries
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21656'>SPARK-21656</a>] -         spark dynamic allocation should not idle timeout executors when there are enough tasks to run on them
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21657'>SPARK-21657</a>] -         Spark has exponential time complexity to explode(array of structs)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21677'>SPARK-21677</a>] -         json_tuple throws NullPointerException when column is null as string type.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21681'>SPARK-21681</a>] -         MLOR does not work correctly when featureStd contains zero
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21696'>SPARK-21696</a>] -         State Store can&#39;t handle corrupted snapshots
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21714'>SPARK-21714</a>] -         SparkSubmit in Yarn Client mode downloads remote files and then reuploads them again
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21721'>SPARK-21721</a>] -         Memory leak in org.apache.spark.sql.hive.execution.InsertIntoHiveTable
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21723'>SPARK-21723</a>] -         Can&#39;t write LibSVM - key not found: numFeatures
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21727'>SPARK-21727</a>] -         Operating on an ArrayType in a SparkR DataFrame throws error
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21738'>SPARK-21738</a>] -         Thriftserver doesn&#39;t cancel jobs when session is closed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21739'>SPARK-21739</a>] -         timestamp partition would fail in v2.2.0
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21753'>SPARK-21753</a>] -         running pi example with pypy on spark fails to serialize 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21759'>SPARK-21759</a>] -         In.checkInputDataTypes should not wrongly report unresolved plans for IN correlated subquery
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21762'>SPARK-21762</a>] -         FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn&#39;t yet visible
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21766'>SPARK-21766</a>] -         DataFrame toPandas() raises ValueError with nullable int columns
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21767'>SPARK-21767</a>] -         Add Decimal Test For Avro in VersionSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21782'>SPARK-21782</a>] -         Repartition creates skews when numPartitions is a power of 2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21786'>SPARK-21786</a>] -         The &#39;spark.sql.parquet.compression.codec&#39; configuration doesn&#39;t take effect on tables with partition field(s)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21788'>SPARK-21788</a>] -         Handle more exceptions when stopping a streaming query
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21791'>SPARK-21791</a>] -         ORC should support column names with dot
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21793'>SPARK-21793</a>] -         Correct validateAndTransformSchema in GaussianMixture and AFTSurvivalRegression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21798'>SPARK-21798</a>] -         No config to replace deprecated SPARK_CLASSPATH config for launching daemons like History Server
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21801'>SPARK-21801</a>] -         SparkR unit tests randomly fail on trees
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21804'>SPARK-21804</a>] -         json_tuple returns null values within repeated columns except the first one
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21818'>SPARK-21818</a>] -         MultivariateOnlineSummarizer.variance generates negative results
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21826'>SPARK-21826</a>] -         outer broadcast hash join should not throw NPE
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21830'>SPARK-21830</a>] -         Bump the dependency of ANTLR to version 4.7
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21831'>SPARK-21831</a>] -         Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21832'>SPARK-21832</a>] -         Merge SQLBuilderTest into ExpressionSQLBuilderSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21834'>SPARK-21834</a>] -         Incorrect executor request in case of dynamic allocation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21835'>SPARK-21835</a>] -         RewritePredicateSubquery should not produce unresolved query plans
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21837'>SPARK-21837</a>] -         UserDefinedTypeSuite local UDFs not actually testing what it intends
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21845'>SPARK-21845</a>] -         Make codegen fallback of expressions configurable
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21877'>SPARK-21877</a>] -         Windows command script can not handle quotes in parameter
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21880'>SPARK-21880</a>] -         [Spark UI] In the SQL table page, modify jobs trace information
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21890'>SPARK-21890</a>] -         ObtainCredentials does not pass creds to addDelegationTokens
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21904'>SPARK-21904</a>] -         Rename tempTables to tempViews in SessionCatalog
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21907'>SPARK-21907</a>] -         NullPointerException in UnsafeExternalSorter.spill()
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21912'>SPARK-21912</a>] -         ORC/Parquet table should not create invalid column names
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21913'>SPARK-21913</a>] -         `withDatabase` should drop database with CASCADE
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21917'>SPARK-21917</a>] -         Remote http(s) resources are not supported in YARN mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21922'>SPARK-21922</a>] -         When an executor fails and task metrics have not been sent to the driver, the status will always be &#39;RUNNING&#39; and the duration will be &#39;CurrentTime - launchTime&#39;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21924'>SPARK-21924</a>] -         Bug in Structured Streaming Documentation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21928'>SPARK-21928</a>] -         ClassNotFoundException for custom Kryo registrator class during serde in netty threads
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21929'>SPARK-21929</a>] -         Support `ALTER TABLE table_name ADD COLUMNS(..)` for ORC data source
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21941'>SPARK-21941</a>] -         Stop storing unused attemptId in SQLTaskMetrics
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21946'>SPARK-21946</a>] -         Flaky test: InMemoryCatalogedDDLSuite.`alter table: rename cached table`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21947'>SPARK-21947</a>] -         monotonically_increasing_id doesn&#39;t work in Structured Streaming
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21950'>SPARK-21950</a>] -         pyspark.sql.tests.SQLTests2 should stop SparkContext.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21953'>SPARK-21953</a>] -         Show both memory and disk bytes spilled if either is present
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21954'>SPARK-21954</a>] -         JacksonUtils should verify MapType&#39;s value type instead of key type
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21958'>SPARK-21958</a>] -         Attempting to save large Word2Vec model hangs driver in constant GC.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21969'>SPARK-21969</a>] -         CommandUtils.updateTableStats should call refreshTable
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21977'>SPARK-21977</a>] -         SinglePartition optimizations break certain Streaming Stateful Aggregation requirements
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21979'>SPARK-21979</a>] -         Improve QueryPlanConstraints framework
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21980'>SPARK-21980</a>] -         References in grouping functions should be indexed with resolver
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21985'>SPARK-21985</a>] -         PySpark PairDeserializer is broken for double-zipped RDDs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21987'>SPARK-21987</a>] -         Spark 2.3 cannot read 2.2 event logs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21991'>SPARK-21991</a>] -         [LAUNCHER] LauncherServer acceptConnections thread sometimes dies if machine has very high load
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21996'>SPARK-21996</a>] -         Streaming ignores files with spaces in the file names
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21998'>SPARK-21998</a>] -         SortMergeJoinExec did not calculate its outputOrdering correctly during physical planning
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22017'>SPARK-22017</a>] -         watermark evaluation with multi-input stream operators is unspecified
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22018'>SPARK-22018</a>] -         Catalyst Optimizer does not preserve top-level metadata while collapsing projects
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22030'>SPARK-22030</a>] -         GraphiteSink fails to re-connect to Graphite instances behind an ELB or any other auto-scaled LB
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22033'>SPARK-22033</a>] -         BufferHolder, other size checks should account for the specific VM array size limitations
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22036'>SPARK-22036</a>] -         BigDecimal multiplication sometimes returns null
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22042'>SPARK-22042</a>] -         ReorderJoinPredicates can break when child&#39;s partitioning is not decided
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22047'>SPARK-22047</a>] -         HiveExternalCatalogVersionsSuite is Flaky on Jenkins
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22052'>SPARK-22052</a>] -         Incorrect Metric assigned in MetricsReporter.scala
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22060'>SPARK-22060</a>] -         CrossValidator/TrainValidationSplit parallelism param persist/load bug
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22062'>SPARK-22062</a>] -         BlockManager does not account for memory consumed by remote fetches
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22067'>SPARK-22067</a>] -         ArrowWriter StringWriter not using position of ByteBuffer holding data
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22071'>SPARK-22071</a>] -         Improve release build scripts to check correct JAVA version is being used for build
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22074'>SPARK-22074</a>] -         Task killed by other attempt task should not be resubmitted
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22076'>SPARK-22076</a>] -         Expand.projections should not be a Stream
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22083'>SPARK-22083</a>] -         When dropping multiple blocks to disk, Spark should release all locks on a failure
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22088'>SPARK-22088</a>] -         Incorrect scalastyle comment causes wrong styles in stringExpressions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22092'>SPARK-22092</a>] -         Reallocation in OffHeapColumnVector.reserveInternal corrupts array data
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22093'>SPARK-22093</a>] -         UtilsSuite &quot;resolveURIs with multiple paths&quot; test always cancelled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22094'>SPARK-22094</a>] -         processAllAvailable should not block forever when a query is stopped
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22097'>SPARK-22097</a>] -         Request an accurate memory after we unrolled the block
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22107'>SPARK-22107</a>] -         &quot;as&quot; should be &quot;alias&quot; in python quick start documentation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22109'>SPARK-22109</a>] -         Reading tables partitioned by columns that look like timestamps has inconsistent schema inference
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22129'>SPARK-22129</a>] -         Spark release scripts ignore the GPG_KEY and always sign with your default key
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22135'>SPARK-22135</a>] -         metrics in spark-dispatcher not being registered properly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22141'>SPARK-22141</a>] -         Propagate empty relation before checking Cartesian products
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22143'>SPARK-22143</a>] -         OffHeapColumnVector may leak memory
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22145'>SPARK-22145</a>] -         Issues with driver re-starting on mesos (supervise)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22146'>SPARK-22146</a>] -         FileNotFoundException while reading ORC files containing &#39;%&#39;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22158'>SPARK-22158</a>] -         convertMetastore should not ignore storage properties
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22159'>SPARK-22159</a>] -         spark.sql.execution.arrow.enable and spark.sql.codegen.aggregate.map.twolevel.enable -&gt; enabled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22162'>SPARK-22162</a>] -         Executors and the driver use inconsistent Job IDs during the new RDD commit protocol
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22165'>SPARK-22165</a>] -         Type conflicts between dates, timestamps and date in partition column
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22167'>SPARK-22167</a>] -         Spark Packaging w/R distro issues
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22169'>SPARK-22169</a>] -         support byte length literal as identifier
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22171'>SPARK-22171</a>] -         Describe Table Extended Failed when Table Owner is Empty
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22172'>SPARK-22172</a>] -         Worker hangs when the external shuffle service port is already in use
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22176'>SPARK-22176</a>] -         Dataset.show(Int.MaxValue) hits integer overflows
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22178'>SPARK-22178</a>] -         Refresh Table does not refresh the underlying tables of the persistent view
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22206'>SPARK-22206</a>] -         gapply in R can&#39;t work on empty grouping columns
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22209'>SPARK-22209</a>] -         PySpark does not recognize imports from submodules
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22211'>SPARK-22211</a>] -         LimitPushDown optimization for FullOuterJoin generates wrong results
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22218'>SPARK-22218</a>] -         Spark shuffle service fails to update secret on application re-attempts
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22222'>SPARK-22222</a>] -         Fix the ARRAY_MAX in BufferHolder and add a test
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22223'>SPARK-22223</a>] -         ObjectHashAggregate introduces unnecessary shuffle
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22224'>SPARK-22224</a>] -         Override toString of KeyValueGroupedDataset &amp; RelationalGroupedDataset 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22227'>SPARK-22227</a>] -         DiskBlockManager.getAllBlocks could fail if called during shuffle
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22230'>SPARK-22230</a>] -         agg(last(&#39;attr)) gives weird results for streaming
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22238'>SPARK-22238</a>] -         EnsureStatefulOpPartitioning shouldn&#39;t ask for the child RDD before planning is completed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22243'>SPARK-22243</a>] -         streaming job failed to restart from checkpoint
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22249'>SPARK-22249</a>] -         UnsupportedOperationException: empty.reduceLeft when caching a dataframe
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22251'>SPARK-22251</a>] -         Metric &quot;aggregate time&quot; is incorrect when codegen is off
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22252'>SPARK-22252</a>] -         FileFormatWriter should respect the input query schema
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22254'>SPARK-22254</a>] -         clean up the implementation of `growToSize` in CompactBuffer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22257'>SPARK-22257</a>] -         Reserve all non-deterministic expressions in ExpressionSet.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22267'>SPARK-22267</a>] -         Spark SQL incorrectly reads ORC file when column order is different
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22271'>SPARK-22271</a>] -         Describe results in &quot;null&quot; for the value of &quot;mean&quot; of a numeric variable
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22273'>SPARK-22273</a>] -         Fix key/value schema field names in HashMapGenerators.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22280'>SPARK-22280</a>] -         Improve StatisticsSuite to test `convertMetastore` properly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22281'>SPARK-22281</a>] -         Handle R method breaking signature changes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22284'>SPARK-22284</a>] -         Code of class &quot;org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection&quot; grows beyond 64 KB
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22287'>SPARK-22287</a>] -         SPARK_DAEMON_MEMORY not honored by MesosClusterDispatcher
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22289'>SPARK-22289</a>] -         Cannot save LogisticRegressionModel with bounds on coefficients
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22290'>SPARK-22290</a>] -         Starting second context in same JVM fails to get new Hive delegation token
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22291'>SPARK-22291</a>] -         Postgresql UUID[] to Cassandra: Conversion Error
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22300'>SPARK-22300</a>] -         Update ORC to 1.4.1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22303'>SPARK-22303</a>] -         Getting java.sql.SQLException: Unsupported type 101 for BINARY_DOUBLE
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22305'>SPARK-22305</a>] -         HDFSBackedStateStoreProvider fails with StackOverflowException when attempting to recover state
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22306'>SPARK-22306</a>] -         INFER_AND_SAVE overwrites important metadata in Parquet Metastore table
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22319'>SPARK-22319</a>] -         SparkSubmit calls getFileStatus before calling loginUserFromKeytab
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22326'>SPARK-22326</a>] -         Remove unnecessary hashCode and equals methods
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22327'>SPARK-22327</a>] -         R CRAN check fails on non-latest branches
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22328'>SPARK-22328</a>] -         ClosureCleaner misses referenced superclass fields, gives them null values
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22330'>SPARK-22330</a>] -         Linear containsKey operation for serialized maps.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22332'>SPARK-22332</a>] -         NaiveBayes unit test occasionally fails
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22333'>SPARK-22333</a>] -         ColumnReference should get higher priority than timeFunctionCall(CURRENT_DATE, CURRENT_TIMESTAMP)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22349'>SPARK-22349</a>] -         In on-heap mode, when allocating memory from pool, we should fill memory with `MEMORY_DEBUG_FILL_CLEAN_VALUE`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22355'>SPARK-22355</a>] -         Dataset.collect is not threadsafe
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22356'>SPARK-22356</a>] -         data source table should support overlapped columns between data and partition schema
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22370'>SPARK-22370</a>] -         Config values should be captured in Driver.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22373'>SPARK-22373</a>] -         Intermittent NullPointerException in org.codehaus.janino.IClass.isAssignableFrom
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22375'>SPARK-22375</a>] -         Test script can fail if eggs are installed by setup.py during test process
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22376'>SPARK-22376</a>] -         run-tests.py fails at exec-sbt if run with Python 3
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22377'>SPARK-22377</a>] -         Maven nightly snapshot jenkins jobs are broken on multiple workers due to lsof
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22393'>SPARK-22393</a>] -         spark-shell can&#39;t find imported types in class constructors, extends clause
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22395'>SPARK-22395</a>] -         Fix the behavior of timestamp values for Pandas to respect session timezone
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22396'>SPARK-22396</a>] -         Unresolved operator InsertIntoDir for Hive format when Hive Support is not enabled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22403'>SPARK-22403</a>] -         StructuredKafkaWordCount example fails in YARN cluster mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22410'>SPARK-22410</a>] -         Excessive spill for Pyspark UDF when a row has shrunk
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22417'>SPARK-22417</a>] -         createDataFrame from a pandas.DataFrame reads datetime64 values as longs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22429'>SPARK-22429</a>] -         Streaming checkpointing code does not retry after failure due to NullPointerException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22431'>SPARK-22431</a>] -         Creating Permanent view with illegal type
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22437'>SPARK-22437</a>] -         jdbc write fails to set default mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22442'>SPARK-22442</a>] -         Schema generated by Product Encoder doesn&#39;t match case class field name when using non-standard characters
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22443'>SPARK-22443</a>] -         AggregatedDialect doesn&#39;t override quoteIdentifier and other methods in JdbcDialects
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22446'>SPARK-22446</a>] -         Optimizer causing StringIndexerModel&#39;s indexer UDF to throw &quot;Unseen label&quot; exception incorrectly for filtered data.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22454'>SPARK-22454</a>] -         ExternalShuffleClient.close() should check null
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22462'>SPARK-22462</a>] -         SQL metrics missing after foreach operation on dataframe
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22463'>SPARK-22463</a>] -         Missing hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distributed archive
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22464'>SPARK-22464</a>] -         &lt;=&gt; is not supported by Hive metastore partition predicate pushdown
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22465'>SPARK-22465</a>] -         Cogroup of two disproportionate RDDs could lead into 2G limit BUG
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22466'>SPARK-22466</a>] -         SPARK_CONF_DIR is not set by Spark&#39;s launch scripts with default value
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22469'>SPARK-22469</a>] -         Accuracy problem in comparison with string and numeric 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22472'>SPARK-22472</a>] -         Datasets generate random values for null primitive types
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22479'>SPARK-22479</a>] -         SaveIntoDataSourceCommand logs jdbc credentials
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22484'>SPARK-22484</a>] -         PySpark DataFrame.write.csv(quote=&quot;&quot;) uses nullchar as quote
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22487'>SPARK-22487</a>] -         No usages of HIVE_EXECUTION_VERSION found in whole spark project
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22488'>SPARK-22488</a>] -         The view resolution in the SparkSession internal table() API 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22489'>SPARK-22489</a>] -         Shouldn&#39;t change broadcast join buildSide if user clearly specified
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22495'>SPARK-22495</a>] -         Fix setup of SPARK_HOME variable on Windows
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22511'>SPARK-22511</a>] -         Update maven central repo address
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22516'>SPARK-22516</a>] -         CSV Read breaks: When &quot;multiLine&quot; = &quot;true&quot;, if &quot;comment&quot; option is set as last line&#39;s first character
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22525'>SPARK-22525</a>] -         Spark download page doesn&#39;t update package name based on package type
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22533'>SPARK-22533</a>] -         SparkConfigProvider does not handle deprecated config keys
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22535'>SPARK-22535</a>] -         PythonRunner.MonitorThread should give the task a little time to finish before killing the python worker
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22538'>SPARK-22538</a>] -         SQLTransformer.transform(inputDataFrame) uncaches inputDataFrame
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22540'>SPARK-22540</a>] -         HighlyCompressedMapStatus&#39;s avgSize is incorrect
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22544'>SPARK-22544</a>] -         FileStreamSource should use its own hadoop conf to call globPathIfNecessary
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22548'>SPARK-22548</a>] -         Incorrect nested AND expression pushed down to JDBC data source
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22557'>SPARK-22557</a>] -         Use ThreadSignaler explicitly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22559'>SPARK-22559</a>] -         history server: handle exception on opening corrupted listing.ldb
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22572'>SPARK-22572</a>] -         spark-shell does not re-initialize on :replay
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22574'>SPARK-22574</a>] -         Wrong request causing Spark Dispatcher going inactive
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22583'>SPARK-22583</a>] -         First delegation token renewal time is not 75% of renewal time in Mesos
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22585'>SPARK-22585</a>] -         Url encoding of jar path expected?
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22587'>SPARK-22587</a>] -         Spark job fails if fs.defaultFS and application jar are different url
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22591'>SPARK-22591</a>] -         GenerateOrdering shouldn&#39;t change ctx.INPUT_ROW
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22605'>SPARK-22605</a>] -         OutputMetrics empty for DataFrame writes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22607'>SPARK-22607</a>] -         Set large stack size consistently for tests to avoid StackOverflowError
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22615'>SPARK-22615</a>] -         Handle more cases in PropagateEmptyRelation 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22618'>SPARK-22618</a>] -         RDD.unpersist can cause fatal exception when used with dynamic allocation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22635'>SPARK-22635</a>] -         FileNotFoundException again while reading ORC files containing special characters
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22637'>SPARK-22637</a>] -         CatalogImpl.refresh() has quadratic complexity for a view
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22642'>SPARK-22642</a>] -         the createdTempDir will not be deleted if an exception occurs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22651'>SPARK-22651</a>] -         Calling ImageSchema.readImages initiates multiple Hive clients
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22653'>SPARK-22653</a>] -         executorAddress registered in CoarseGrainedSchedulerBackend.executorDataMap is null
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22654'>SPARK-22654</a>] -         Retry download of Spark from ASF mirror in HiveExternalCatalogVersionsSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22655'>SPARK-22655</a>] -         Fail task instead of complete task silently in PythonRunner during shutdown
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22662'>SPARK-22662</a>] -         Failed to prune columns after rewriting predicate subquery
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22668'>SPARK-22668</a>] -         CodegenContext.splitExpressions() creates incorrect results with global variable arguments 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22681'>SPARK-22681</a>] -         Accumulator should only be updated once for each task in result stage
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22686'>SPARK-22686</a>] -         DROP TABLE IF EXISTS should not show AnalysisException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22700'>SPARK-22700</a>] -         Bucketizer.transform incorrectly drops row containing NaN
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22710'>SPARK-22710</a>] -         ConfigBuilder.fallbackConf doesn&#39;t trigger onCreate function
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22712'>SPARK-22712</a>] -         Use `buildReaderWithPartitionValues` in native OrcFileFormat
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22721'>SPARK-22721</a>] -         BytesToBytesMap peak memory usage not accurate after reset()
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22759'>SPARK-22759</a>] -         Filters can be combined iff both are deterministic
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22764'>SPARK-22764</a>] -         Flaky test: SparkContextSuite &quot;Cancelling stages/jobs with custom reasons&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22777'>SPARK-22777</a>] -         Docker container built for Kubernetes doesn&#39;t allow running entrypoint.sh
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22778'>SPARK-22778</a>] -         Kubernetes scheduler at master failing to run applications successfully
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22779'>SPARK-22779</a>] -         ConfigEntry&#39;s default value should actually be a value
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22788'>SPARK-22788</a>] -         HdfsUtils.getOutputStream uses non-existent Hadoop conf &quot;hdfs.append.support&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22791'>SPARK-22791</a>] -         Redact Output of Explain
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22793'>SPARK-22793</a>] -         Memory leak in Spark Thrift Server
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22811'>SPARK-22811</a>] -         pyspark.ml.tests is missing a py4j import.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22813'>SPARK-22813</a>] -         run-tests.py fails when /usr/sbin/lsof does not exist
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22815'>SPARK-22815</a>] -         Keep PromotePrecision in Optimized Plans
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22817'>SPARK-22817</a>] -         Use fixed testthat version for SparkR tests in AppVeyor
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22818'>SPARK-22818</a>] -         csv escape of quote escape
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22819'>SPARK-22819</a>] -         Download page - updating package type does nothing
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22824'>SPARK-22824</a>] -         Spark Structured Streaming Source trait breaking change
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22825'>SPARK-22825</a>] -         Incorrect results of Casting Array to String
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22827'>SPARK-22827</a>] -         Avoid throwing OutOfMemoryError in case of exception in spill
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22834'>SPARK-22834</a>] -         Make insert commands have real children to fix UI issues
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22836'>SPARK-22836</a>] -         Executors page is not showing driver logs links
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22837'>SPARK-22837</a>] -         Session timeout checker does not work in SessionManager
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22843'>SPARK-22843</a>] -         R localCheckpoint API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22846'>SPARK-22846</a>] -         table&#39;s owner property in hive metastore is null
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22849'>SPARK-22849</a>] -         ivy.retrieve pattern should also consider `classifier`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22850'>SPARK-22850</a>] -         Executor page in SHS does not show driver
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22852'>SPARK-22852</a>] -         sbt publishLocal fails due to -Xlint:unchecked flag passed to javadoc
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22854'>SPARK-22854</a>] -         AppStatusListener should get Spark version by SparkListenerLogStart
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22855'>SPARK-22855</a>] -         Sbt publishLocal under scala 2.12 fails due to invalid javadoc comments in tags package
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22861'>SPARK-22861</a>] -         SQLAppStatusListener should track all stages in multi-job executions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22862'>SPARK-22862</a>] -         Docs on lazy elimination of columns missing from an encoder.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22864'>SPARK-22864</a>] -         Flaky test: ExecutorAllocationManagerSuite &quot;cancel pending executors when no longer needed&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22866'>SPARK-22866</a>] -         Kubernetes dockerfile path needs update
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22875'>SPARK-22875</a>] -         Assembly build fails for a high user id
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22889'>SPARK-22889</a>] -         CRAN checks can fail if older Spark install exists
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22891'>SPARK-22891</a>] -         NullPointerException when using a UDF
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22899'>SPARK-22899</a>] -         OneVsRestModel transform on streaming data failed.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22901'>SPARK-22901</a>] -         Add non-deterministic to Python UDF
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22905'>SPARK-22905</a>] -         Fix ChiSqSelectorModel, GaussianMixtureModel save implementation for Row order issues
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22916'>SPARK-22916</a>] -         shouldn&#39;t bias towards build right if user does not specify
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22920'>SPARK-22920</a>] -         R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with trimString
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22924'>SPARK-22924</a>] -         R DataFrame API for sortWithinPartitions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22932'>SPARK-22932</a>] -         Refactor AnalysisContext
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22933'>SPARK-22933</a>] -         R Structured Streaming API for withWatermark, trigger, partitionBy
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22934'>SPARK-22934</a>] -         Make optional clauses order insensitive for CREATE TABLE SQL statement
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22940'>SPARK-22940</a>] -         Test suite HiveExternalCatalogVersionsSuite fails on platforms that don&#39;t have wget installed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22946'>SPARK-22946</a>] -         Recursive withColumn calls cause &quot;org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection&quot; to grow beyond 64 KB
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22948'>SPARK-22948</a>] -         &quot;SparkPodInitContainer&quot; shouldn&#39;t be in &quot;rest&quot; package
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22949'>SPARK-22949</a>] -         Reduce memory requirement for TrainValidationSplit
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22950'>SPARK-22950</a>] -         User classpath first causes no class found exception
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22951'>SPARK-22951</a>] -         count() after dropDuplicates() on emptyDataFrame returns incorrect value
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22953'>SPARK-22953</a>] -         Duplicated secret volumes in Spark pods when init-containers are used
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22956'>SPARK-22956</a>] -         Union Stream Failover Cause `IllegalStateException`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22957'>SPARK-22957</a>] -         ApproxQuantile breaks if the number of rows exceeds MaxInt
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22961'>SPARK-22961</a>] -         Constant columns no longer picked as constraints in 2.3
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22962'>SPARK-22962</a>] -         Kubernetes app fails if local files are used
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22967'>SPARK-22967</a>] -         VersionSuite failed on Windows caused by Windows format path
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22972'>SPARK-22972</a>] -         Couldn&#39;t find corresponding Hive SerDe for data source provider org.apache.spark.sql.hive.orc.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22973'>SPARK-22973</a>] -         Incorrect results of casting Map to String
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22975'>SPARK-22975</a>] -         MetricsReporter producing NullPointerException when there was no progress reported
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22976'>SPARK-22976</a>] -         Worker cleanup can remove running driver directories
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22977'>SPARK-22977</a>] -         DataFrameWriter operations do not show details in SQL tab
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22981'>SPARK-22981</a>] -         Incorrect results of casting Struct to String
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22982'>SPARK-22982</a>] -         Remove unsafe asynchronous close() call from FileDownloadChannel
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22983'>SPARK-22983</a>] -         Don&#39;t push filters beneath aggregates with empty grouping expressions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22984'>SPARK-22984</a>] -         Fix incorrect bitmap copying and offset shifting in GenerateUnsafeRowJoiner
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22985'>SPARK-22985</a>] -         Fix argument escaping bug in from_utc_timestamp / to_utc_timestamp codegen
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22986'>SPARK-22986</a>] -         Avoid instantiating multiple instances of broadcast variables 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22990'>SPARK-22990</a>] -         Fix method isFairScheduler in JobsTab and StagesTab
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22992'>SPARK-22992</a>] -         Remove assumption of cluster domain in Kubernetes mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22998'>SPARK-22998</a>] -         Value for SPARK_MOUNTED_CLASSPATH in executor pods is not set
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23000'>SPARK-23000</a>] -         Flaky test suite DataSourceWithHiveMetastoreCatalogSuite in Spark 2.3
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23001'>SPARK-23001</a>] -         NullPointerException when running desc database
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23009'>SPARK-23009</a>] -         PySpark should not assume Pandas cols are a basestring type
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23018'>SPARK-23018</a>] -         PySpark createDataFrame causes Pandas warning of assignment to a copy of a reference
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23019'>SPARK-23019</a>] -         Flaky Test: org.apache.spark.JavaJdbcRDDSuite.testJavaJdbcRDD
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23021'>SPARK-23021</a>] -         AnalysisBarrier should not cut off the explain output for Parsed Logical Plan
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23023'>SPARK-23023</a>] -         Incorrect results of printing Array/Map/Struct in showString
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23025'>SPARK-23025</a>] -         DataSet with scala.Null causes Exception
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23035'>SPARK-23035</a>] -         Fix improper information of TempTableAlreadyExistsException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23037'>SPARK-23037</a>] -         RFormula should not use deprecated OneHotEncoder and should include VectorSizeHint in pipeline
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23038'>SPARK-23038</a>] -         Update docker/spark-test (JDK/OS)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23049'>SPARK-23049</a>] -         `spark.sql.files.ignoreCorruptFiles` should work for ORC files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23051'>SPARK-23051</a>] -         job description in Spark UI is broken 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23053'>SPARK-23053</a>] -         taskBinarySerialization and task partitions calculate in DagScheduler.submitMissingTasks should keep the same RDD checkpoint status
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23054'>SPARK-23054</a>] -         Incorrect results of casting UserDefinedType to String
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23055'>SPARK-23055</a>] -         KafkaContinuousSourceSuite Kafka column types test failing
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23065'>SPARK-23065</a>] -         R API doc empty in Spark 2.3.0 RC1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23079'>SPARK-23079</a>] -         Fix query constraints propagation with aliases
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23080'>SPARK-23080</a>] -         Improve error message for built-in functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23087'>SPARK-23087</a>] -         CheckCartesianProduct too restrictive when condition is constant folded to false/null
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23089'>SPARK-23089</a>] -         &quot;Unable to create operation log session directory&quot; when parent directory not present
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23095'>SPARK-23095</a>] -         Decorrelation of scalar subquery fails with java.util.NoSuchElementException.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23103'>SPARK-23103</a>] -         LevelDB store not iterating correctly when indexed value has negative value
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23119'>SPARK-23119</a>] -         Fix API annotation in DataSource V2 for streaming
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23121'>SPARK-23121</a>] -         After a Spark Streaming app has been running for a period of time, an error page is reported when accessing &#39;/jobs/&#39; or &#39;/jobs/job/?id=13&#39; and the UI cannot be accessed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23133'>SPARK-23133</a>] -         Spark options are not passed to the Executor in Docker context
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23135'>SPARK-23135</a>] -         Accumulators don&#39;t show up properly in the Stages page anymore
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23140'>SPARK-23140</a>] -         DataSourceV2Strategy is missing in HiveSessionStateBuilder
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23147'>SPARK-23147</a>] -         Stage page will throw exception when there&#39;s no complete tasks
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23148'>SPARK-23148</a>] -         spark.read.csv with multiline=true gives FileNotFoundException if path contains spaces
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23157'>SPARK-23157</a>] -         withColumn fails for a column that is a result of mapped DataSet
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23177'>SPARK-23177</a>] -         PySpark parameter-less UDFs raise exception if applied after distinct
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23184'>SPARK-23184</a>] -         All jobs page is broken when some stage is missing
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23186'>SPARK-23186</a>] -         Initialize DriverManager first before loading Drivers
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23192'>SPARK-23192</a>] -         Hint is lost after using cached data
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23198'>SPARK-23198</a>] -         Fix KafkaContinuousSourceStressForDontFailOnDataLossSuite to test ContinuousExecution
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23205'>SPARK-23205</a>] -         ImageSchema.readImages incorrectly sets alpha channel to 255 for four-channel images
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23207'>SPARK-23207</a>] -         Shuffle+Repartition on a DataFrame could lead to incorrect answers
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23208'>SPARK-23208</a>] -         GenArrayData produces illegal code
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23209'>SPARK-23209</a>] -         HiveDelegationTokenProvider throws an exception if Hive jars are not on the classpath
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23214'>SPARK-23214</a>] -         cached data should not carry extra hint info
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23220'>SPARK-23220</a>] -         broadcast hint not applied in a streaming left anti join
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23222'>SPARK-23222</a>] -         Flaky test: DataFrameRangeSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23223'>SPARK-23223</a>] -         Stacking dataset transforms performs poorly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23230'>SPARK-23230</a>] -         When hive.default.fileformat is set to another file type, creating a textfile table causes a serde error
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23233'>SPARK-23233</a>] -         asNondeterministic in Python UDF not being set when the UDF is called at least once
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23242'>SPARK-23242</a>] -         Don&#39;t run tests in KafkaSourceSuiteBase twice
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23245'>SPARK-23245</a>] -         KafkaContinuousSourceSuite may hang forever
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23250'>SPARK-23250</a>] -         Typo in JavaDoc/ScalaDoc for DataFrameWriter
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23267'>SPARK-23267</a>] -         Increase spark.sql.codegen.hugeMethodLimit to 65535
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23274'>SPARK-23274</a>] -         ReplaceExceptWithFilter fails on dataframes filtered on same column
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23275'>SPARK-23275</a>] -         hive/tests have been failing when run locally on the laptop (Mac) with OOM 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23281'>SPARK-23281</a>] -         Query produces results in incorrect order when a composite order by clause refers to both original columns and aliases
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23289'>SPARK-23289</a>] -         OneForOneBlockFetcher.DownloadCallback.onData may write just a part of data
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23290'>SPARK-23290</a>] -         inadvertent change in handling of DateType when converting to pandas dataframe
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23293'>SPARK-23293</a>] -         data source v2 self join fails
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23301'>SPARK-23301</a>] -         data source v2 column pruning with arbitrary expressions is broken
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23307'>SPARK-23307</a>] -         Spark UI should sort jobs/stages by completed timestamp before cleaning them up
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23310'>SPARK-23310</a>] -         Perf regression introduced by SPARK-21113
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23315'>SPARK-23315</a>] -         failed to get output from canonicalized data source v2 related plans
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23316'>SPARK-23316</a>] -         AnalysisException after max iteration reached for IN query
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23326'>SPARK-23326</a>] -         &quot;Scheduler Delay&quot; of a task is confusing
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23330'>SPARK-23330</a>] -         Spark UI SQL executions page throws NPE
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23345'>SPARK-23345</a>] -         Flaky test: FileBasedDataSourceSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23348'>SPARK-23348</a>] -         append data using saveAsTable should adjust the data types
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23358'>SPARK-23358</a>] -         When the number of partitions is greater than 2^28, it will result in an error result
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23360'>SPARK-23360</a>] -         SparkSession.createDataFrame timestamps can be incorrect with non-Arrow codepath
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23376'>SPARK-23376</a>] -         creating UnsafeKVExternalSorter with BytesToBytesMap may fail
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23377'>SPARK-23377</a>] -         Bucketizer with multiple columns persistence bug
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23384'>SPARK-23384</a>] -         When no incomplete (or completed) applications are found, the last updated time is not formatted and the client local time zone is not shown in the history server web UI
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23387'>SPARK-23387</a>] -         Backport assertPandasEqual to branch-2.3.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23388'>SPARK-23388</a>] -         Support for Parquet Binary DecimalType in VectorizedColumnReader
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23391'>SPARK-23391</a>] -         It may lead to overflow for some integer multiplication 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23394'>SPARK-23394</a>] -         Storage info&#39;s Cached Partitions doesn&#39;t consider the replications (but sc.getRDDStorageInfo does)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23399'>SPARK-23399</a>] -         Register a task completion listener first for OrcColumnarBatchReader
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23400'>SPARK-23400</a>] -         Add the extra constructors for ScalaUDF
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23413'>SPARK-23413</a>] -         Sorting tasks by Host / Executor ID on the Stage page does not work
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23419'>SPARK-23419</a>] -         data source v2 write path should re-throw interruption exceptions directly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23421'>SPARK-23421</a>] -         Document the behavior change in SPARK-22356
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23422'>SPARK-23422</a>] -         YarnShuffleIntegrationSuite failure when SPARK_PREPEND_CLASSES set to 1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23468'>SPARK-23468</a>] -         Failure to authenticate with old shuffle service
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23470'>SPARK-23470</a>] -         org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription is too slow
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23475'>SPARK-23475</a>] -         The &quot;stages&quot; page doesn&#39;t show any completed stages
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23481'>SPARK-23481</a>] -         The job page shows wrong stages when some of stages are evicted
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23484'>SPARK-23484</a>] -         Fix possible race condition in KafkaContinuousReader
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24401'>SPARK-24401</a>] -         Aggregate on Decimal Types does not work
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-25523'>SPARK-25523</a>] -         Multi thread execute sparkSession.read().jdbc(url, table, properties) problem
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-27191'>SPARK-27191</a>] -         union of dataframes depends on order of the columns in 2.4.0
</li>
</ul>
            
<h2>        New Feature
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-3181'>SPARK-3181</a>] -         Add Robust Regression Algorithm with Huber Estimator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-4131'>SPARK-4131</a>] -         Support &quot;Writing data into the filesystem from queries&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-12139'>SPARK-12139</a>] -         REGEX Column Specification for Hive Queries
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14516'>SPARK-14516</a>] -         Clustering evaluator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-15689'>SPARK-15689</a>] -         Data source API v2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-15767'>SPARK-15767</a>] -         Decision Tree Regression wrapper in SparkR
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16026'>SPARK-16026</a>] -         Cost-based Optimizer Framework
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16060'>SPARK-16060</a>] -         Vectorized ORC reader
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16742'>SPARK-16742</a>] -         Kerberos support for Spark on Mesos
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17025'>SPARK-17025</a>] -         Cannot persist PySpark ML Pipeline model that includes custom Transformer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18710'>SPARK-18710</a>] -         Add offset to GeneralizedLinearRegression models
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18791'>SPARK-18791</a>] -         Stream-Stream Joins
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19489'>SPARK-19489</a>] -         Stable serialization format for external &amp; native code integration
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19507'>SPARK-19507</a>] -         pyspark.sql.types._verify_type() exceptions too broad to debug collections or nested data
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19606'>SPARK-19606</a>] -         Support constraints in spark-dispatcher
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20090'>SPARK-20090</a>] -         Add StructType.fieldNames to Python API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20542'>SPARK-20542</a>] -         Add an API into Bucketizer that can bin a lot of columns all at once
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20601'>SPARK-20601</a>] -         Python API Changes for Constrained Logistic Regression Params
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20703'>SPARK-20703</a>] -         Add an operator for writing data out
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20812'>SPARK-20812</a>] -         Add Mesos Secrets support to the spark dispatcher
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20863'>SPARK-20863</a>] -         Add metrics/instrumentation to LiveListenerBus
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20892'>SPARK-20892</a>] -         Add SQL trunc function to SparkR
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20899'>SPARK-20899</a>] -         PySpark supports stringIndexerOrderType in RFormula
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20917'>SPARK-20917</a>] -         SparkR supports string encoding consistent with R
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20953'>SPARK-20953</a>] -         Add hash map metrics to aggregate and join
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20960'>SPARK-20960</a>] -         make ColumnVector public
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20979'>SPARK-20979</a>] -         Add a rate source to generate values for tests and benchmark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21000'>SPARK-21000</a>] -         Add Mesos labels support to the Spark Dispatcher
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21027'>SPARK-21027</a>] -         Parallel One vs. Rest Classifier
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21043'>SPARK-21043</a>] -         Add unionByName API to Dataset
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21092'>SPARK-21092</a>] -         Wire SQLConf in logical plan and expressions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21208'>SPARK-21208</a>] -         Ability to &quot;setLocalProperty&quot; from sc, in sparkR
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21221'>SPARK-21221</a>] -         CrossValidator and TrainValidationSplit Persist Nested Estimators such as OneVsRest
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21310'>SPARK-21310</a>] -         Add offset to PySpark GLM 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21421'>SPARK-21421</a>] -         Add the query id as a local property to allow source and sink using it
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21468'>SPARK-21468</a>] -         FeatureHasher Python API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21499'>SPARK-21499</a>] -         Support creating persistent function for Spark UDAF(UserDefinedAggregateFunction)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21519'>SPARK-21519</a>] -         Add an option to the JDBC data source to initialize the environment of the remote database session
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21542'>SPARK-21542</a>] -         Helper functions for custom Python Persistence
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21633'>SPARK-21633</a>] -         Unary Transformer in Python
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21726'>SPARK-21726</a>] -         Check for structural integrity of the plan in QO in test mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21777'>SPARK-21777</a>] -         Simpler Dataset.sample API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21840'>SPARK-21840</a>] -         Allow multiple SparkSubmit invocations in same JVM without polluting system properties
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21842'>SPARK-21842</a>] -         Support Kerberos ticket renewal and creation in Mesos 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21854'>SPARK-21854</a>] -         Python interface for MLOR summary
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21856'>SPARK-21856</a>] -         Update Python API for MultilayerPerceptronClassifierModel
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21911'>SPARK-21911</a>] -         Parallel Model Evaluation for ML Tuning: PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22131'>SPARK-22131</a>] -         Add Mesos Secrets Support to the Mesos Driver
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22160'>SPARK-22160</a>] -         Allow changing sample points per partition in range shuffle exchange
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22181'>SPARK-22181</a>] -         ReplaceExceptWithFilter if one or both of the datasets are fully derived out of Filters from a same parent
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22456'>SPARK-22456</a>] -         Add new function dayofweek
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22521'>SPARK-22521</a>] -         VectorIndexerModel support for handling unseen categories via handleInvalid: Python API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22734'>SPARK-22734</a>] -         VectorSizeHint Python API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22781'>SPARK-22781</a>] -         Support creating streaming dataset with ORC files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23008'>SPARK-23008</a>] -         OneHotEncoderEstimator Python API
</li>
</ul>
    
<h2>        Improvement
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-7481'>SPARK-7481</a>] -         Add spark-hadoop-cloud module to pull in object store support
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-9221'>SPARK-9221</a>] -         Support IntervalType in Range Frame
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-10216'>SPARK-10216</a>] -         Avoid creating empty files during overwrite into Hive table with group by query
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-10655'>SPARK-10655</a>] -         Enhance DB2 dialect to handle XML, DECIMAL, and DECFLOAT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-10931'>SPARK-10931</a>] -         PySpark ML Models should contain Param values
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-11574'>SPARK-11574</a>] -         Spark should support StatsD sink out of box
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-12664'>SPARK-12664</a>] -         Expose probability, rawPrediction in MultilayerPerceptronClassificationModel
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-13030'>SPARK-13030</a>] -         Change OneHotEncoder to Estimator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-13041'>SPARK-13041</a>] -         Add a driver history ui link and a mesos sandbox link on the dispatcher&#39;s ui page for each driver
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-13656'>SPARK-13656</a>] -         Delete spark.sql.parquet.cacheMetadata
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-13846'>SPARK-13846</a>] -         VectorIndexer output on unknown feature should be more descriptive
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-13947'>SPARK-13947</a>] -         The error message from using an invalid table reference is not clear
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14371'>SPARK-14371</a>] -         OnlineLDAOptimizer should not collect stats for each doc in mini-batch to driver
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14659'>SPARK-14659</a>] -         OneHotEncoder support drop first category alphabetically in the encoded vector 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-14932'>SPARK-14932</a>] -         Allow DataFrame.replace() to replace values with None
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-15648'>SPARK-15648</a>] -         add TeradataDialect
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16019'>SPARK-16019</a>] -         Eliminate unexpected delay during spark on yarn job launch
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16496'>SPARK-16496</a>] -         Add wholetext as option for reading text in SQL.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16931'>SPARK-16931</a>] -         PySpark access to data-frame bucketing api
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-16957'>SPARK-16957</a>] -         Use weighted midpoints for split values.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17006'>SPARK-17006</a>] -         WithColumn Performance Degrades with Number of Invocations
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17310'>SPARK-17310</a>] -         Disable Parquet&#39;s record-by-record filter in normal parquet reader and do it in Spark-side
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17414'>SPARK-17414</a>] -         Set type is not supported for creating data frames
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17701'>SPARK-17701</a>] -         Refactor DataSourceScanExec so its sameResult call does not compare strings
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-17924'>SPARK-17924</a>] -         Consolidate streaming and batch write path
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18136'>SPARK-18136</a>] -         Make PySpark pip install work on Windows
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18540'>SPARK-18540</a>] -         Wholestage code-gen for ORC Hive tables
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18619'>SPARK-18619</a>] -         Make QuantileDiscretizer/Bucketizer/StringIndexer inherit from HasHandleInvalid
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18623'>SPARK-18623</a>] -         Add `returnNullable` to `StaticInvoke` and modify it to handle properly.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18838'>SPARK-18838</a>] -         High latency of event processing for large jobs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18891'>SPARK-18891</a>] -         Support for specific collection types
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19112'>SPARK-19112</a>] -         add codec for ZStandard
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19159'>SPARK-19159</a>] -         PySpark UDF API improvements
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19236'>SPARK-19236</a>] -         Add createOrReplaceGlobalTempView
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19270'>SPARK-19270</a>] -         Add summary table to GLM summary
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19285'>SPARK-19285</a>] -         Java - Provide user-defined function of 0 arguments (UDF0)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19358'>SPARK-19358</a>] -         LiveListenerBus shall log the event name when dropping them due to a fully filled queue
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19439'>SPARK-19439</a>] -         PySpark&#39;s registerJavaFunction Should Support UDAFs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19552'>SPARK-19552</a>] -         Upgrade Netty version to 4.1.x final
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19558'>SPARK-19558</a>] -         Provide a config option to attach QueryExecutionListener to SparkSession
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19732'>SPARK-19732</a>] -         DataFrame.fillna() does not work for bools in PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19759'>SPARK-19759</a>] -         ALSModel.predict on Dataframes : potential optimization by not using blas 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19852'>SPARK-19852</a>] -         StringIndexer.setHandleInvalid should have another option &#39;new&#39;: Python API and docs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19866'>SPARK-19866</a>] -         Add local version of Word2Vec findSynonyms for spark.ml: Python API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19878'>SPARK-19878</a>] -         Add hive configuration when initialize hive serde in InsertIntoHiveTable.scala
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19937'>SPARK-19937</a>] -         Collect metrics of block sizes when shuffle.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19951'>SPARK-19951</a>] -         Add string concatenate operator || to Spark SQL
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19975'>SPARK-19975</a>] -         Add map_keys and map_values functions  to Python 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20014'>SPARK-20014</a>] -         Optimize mergeSpillsWithFileStream method
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20055'>SPARK-20055</a>] -         Documentation for CSV datasets in SQL programming guide
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20073'>SPARK-20073</a>] -         Unexpected Cartesian product when using eqNullSafe in join with a derived table
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20101'>SPARK-20101</a>] -         Use OffHeapColumnVector when &quot;spark.sql.columnVector.offheap.enable&quot; is set to &quot;true&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20109'>SPARK-20109</a>] -         Need a way to convert from IndexedRowMatrix to Dense Block Matrices
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20199'>SPARK-20199</a>] -         GradientBoostedTreesModel doesn&#39;t have  featureSubsetStrategy parameter
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20236'>SPARK-20236</a>] -         Overwrite a partitioned data source table should only overwrite related partitions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20290'>SPARK-20290</a>] -         PySpark Column should provide eqNullSafe
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20307'>SPARK-20307</a>] -         SparkR: pass on setHandleInvalid to spark.mllib functions that use StringIndexer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20331'>SPARK-20331</a>] -         Broaden support for Hive partition pruning predicate pushdown
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20350'>SPARK-20350</a>] -         Apply Complementation Laws during boolean expression simplification
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20355'>SPARK-20355</a>] -         Display Spark version on history page
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20371'>SPARK-20371</a>] -         R wrappers for collect_list and collect_set
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20375'>SPARK-20375</a>] -         R wrappers for array and map
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20376'>SPARK-20376</a>] -         Make StateStoreProvider pluggable
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20379'>SPARK-20379</a>] -         Allow setting SSL-related passwords through env variables
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20383'>SPARK-20383</a>] -         Spark SQL does not support creating a function with the keywords &#39;OR REPLACE&#39; and &#39;IF NOT EXISTS&#39;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20392'>SPARK-20392</a>] -         Slow performance when calling fit on ML pipeline for dataset with many columns but few rows
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20416'>SPARK-20416</a>] -         Column names inconsistent for UDFs in SQL vs Dataset
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20425'>SPARK-20425</a>] -         Support an extended display mode to print a column data per line
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20431'>SPARK-20431</a>] -         Support a DDL-formatted string in DataFrameReader.schema
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20433'>SPARK-20433</a>] -         Update jackson-databind to 2.6.7.1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20437'>SPARK-20437</a>] -         R wrappers for rollup and cube
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20438'>SPARK-20438</a>] -         R wrappers for split and repeat
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20460'>SPARK-20460</a>] -         Make it more consistent to handle column name duplication
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20463'>SPARK-20463</a>] -         Add support for IS [NOT] DISTINCT FROM to Spark SQL
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20484'>SPARK-20484</a>] -         Add documentation to ALS code
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20490'>SPARK-20490</a>] -         Add eqNullSafe, not and ! to SparkR
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20493'>SPARK-20493</a>] -         De-duplicate parsing logic for DDL-like type strings in R
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20495'>SPARK-20495</a>] -         Add StorageLevel to cacheTable API 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20498'>SPARK-20498</a>] -         RandomForestRegressionModel should expose getMaxDepth in PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20519'>SPARK-20519</a>] -         A runtime exception may occur when the input parameter is null
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20532'>SPARK-20532</a>] -         SparkR should provide grouping and grouping_id
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20533'>SPARK-20533</a>] -         SparkR Wrappers Model should be private and value should be lazy
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20535'>SPARK-20535</a>] -         R wrappers for explode_outer and posexplode_outer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20544'>SPARK-20544</a>] -         R wrapper for input_file_name
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20550'>SPARK-20550</a>] -         R wrappers for Dataset.alias
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20557'>SPARK-20557</a>] -         JdbcUtils doesn&#39;t support java.sql.Types.TIMESTAMP_WITH_TIMEZONE
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20566'>SPARK-20566</a>] -         ColumnVector should support `appendFloats` for array
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20599'>SPARK-20599</a>] -         ConsoleSink should work with write (batch)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20614'>SPARK-20614</a>] -         Use the same log4j configuration with Jenkins in AppVeyor
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20619'>SPARK-20619</a>] -         StringIndexer supports multiple ways of label ordering
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20639'>SPARK-20639</a>] -         Add single argument support for to_timestamp in SQL
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20668'>SPARK-20668</a>] -         Modify ScalaUDF to handle nullability.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20670'>SPARK-20670</a>] -         Simplify FPGrowth transform
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20679'>SPARK-20679</a>] -         Let ML ALS recommend for a subset of users/items
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20682'>SPARK-20682</a>] -         Add new ORCFileFormat based on Apache ORC
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20715'>SPARK-20715</a>] -         MapStatuses shouldn&#39;t be redundantly stored in both ShuffleMapStage and MapOutputTracker
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20720'>SPARK-20720</a>] -         &#39;Executor Summary&#39; should show the exact number, &#39;Removed Executors&#39; should display the specific number, in the Application Page
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20726'>SPARK-20726</a>] -         R wrapper for SQL broadcast
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20728'>SPARK-20728</a>] -         Make ORCFileFormat configurable between sql/hive and sql/core
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20730'>SPARK-20730</a>] -         Add a new Optimizer rule to combine nested Concats
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20736'>SPARK-20736</a>] -         PySpark StringIndexer supports StringOrderType
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20775'>SPARK-20775</a>] -         from_json should also have an API where the schema is specified with a string
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20779'>SPARK-20779</a>] -         The ASF header placed in an incorrect location in some files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20785'>SPARK-20785</a>] -         Spark should  provide jump links and add (count) in the SQL web ui.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20806'>SPARK-20806</a>] -         Launcher: redundant check for Spark lib dir
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20830'>SPARK-20830</a>] -         PySpark wrappers for explode_outer and posexplode_outer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20835'>SPARK-20835</a>] -         It should exit directly when the --total-executor-cores parameter is set to less than 0 when submitting an application
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20841'>SPARK-20841</a>] -         Support table column aliases in FROM clause
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20842'>SPARK-20842</a>] -         Upgrade to 1.2.2 for Hive Metastore Client 1.2 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20849'>SPARK-20849</a>] -         Document R DecisionTree
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20861'>SPARK-20861</a>] -         Pyspark CrossValidator &amp; TrainValidationSplit should delegate parameter looping to estimators
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20871'>SPARK-20871</a>] -         Only log Janino code in debug mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20875'>SPARK-20875</a>] -         Spark should print the log when the directory has been deleted
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20883'>SPARK-20883</a>] -         Improve StateStore APIs for efficiency
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20886'>SPARK-20886</a>] -         HadoopMapReduceCommitProtocol to fail with message if FileOutputCommitter.getWorkPath==null
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20887'>SPARK-20887</a>] -         support alternative keys in ConfigBuilder
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20894'>SPARK-20894</a>] -         Error while checkpointing to HDFS
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20930'>SPARK-20930</a>] -          Destroy broadcasted centers after computing cost
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20936'>SPARK-20936</a>] -         Missing an important test case for resolveURI
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20946'>SPARK-20946</a>] -         Do not update conf for existing SparkContext in SparkSession.getOrCreate
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20950'>SPARK-20950</a>] -         add a new config to diskWriteBufferSize which is hard coded before
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20966'>SPARK-20966</a>] -         Table data is not sorted by startTime time desc, time is not formatted and redundant code in JDBC/ODBC Server page.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20972'>SPARK-20972</a>] -         rename HintInfo.isBroadcastable to broadcast
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20981'>SPARK-20981</a>] -         Add --repositories equivalent configuration for Spark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20985'>SPARK-20985</a>] -         Improve KryoSerializerResizableOutputSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20994'>SPARK-20994</a>] -         Alleviate memory pressure in StreamManager
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20995'>SPARK-20995</a>] -         &#39;Spark-env.sh.template&#39; should add &#39;YARN_CONF_DIR&#39; configuration instructions.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21012'>SPARK-21012</a>] -         Support glob path for resources adding to Spark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21039'>SPARK-21039</a>] -         Use treeAggregate instead of aggregate in DataFrame.stat.bloomFilter
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21060'>SPARK-21060</a>] -         CSS style of the paging function is wrong on the executor page.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21070'>SPARK-21070</a>] -         Pick up cloudpickle upgrades from cloudpickle python module
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21091'>SPARK-21091</a>] -         Move constraint code into QueryPlanConstraints
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21100'>SPARK-21100</a>] -         Add summary method as alternative to describe that gives quartiles similar to Pandas
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21103'>SPARK-21103</a>] -         QueryPlanConstraints should be part of LogicalPlan
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21110'>SPARK-21110</a>] -         Structs should be usable in inequality filters
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21113'>SPARK-21113</a>] -         Support for read ahead input stream to amortize disk IO cost in the Spill reader
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21115'>SPARK-21115</a>] -         If the remaining cores are fewer than coresPerExecutor, they will not be allocated, so this should not be checked on every schedule
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21125'>SPARK-21125</a>] -         PySpark context missing function to set Job Description.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21135'>SPARK-21135</a>] -         On the history server page, the duration of incomplete applications should be hidden instead of showing up as 0
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21137'>SPARK-21137</a>] -         Spark reads many small files slowly off local filesystem
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21142'>SPARK-21142</a>] -         spark-streaming-kafka-0-10 has too fat dependency on kafka
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21146'>SPARK-21146</a>] -         Master/Worker should handle and shutdown when any thread gets UncaughtException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21149'>SPARK-21149</a>] -         Add job description API for R
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21153'>SPARK-21153</a>] -         Time windowing for tumbling windows can use a project instead of expand + filter
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21155'>SPARK-21155</a>] -         Add (? running tasks) into Spark UI progress
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21164'>SPARK-21164</a>] -         Remove isTableSample from Sample and isGenerated from Alias and AttributeReference
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21174'>SPARK-21174</a>] -         Validate sampling fraction in logical operator level
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21175'>SPARK-21175</a>] -         shuffle service should reject fetch requests if there are already many requests in progress
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21189'>SPARK-21189</a>] -         Handle unknown error codes in Jenkins rather than leaving incomplete comments in PRs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21192'>SPARK-21192</a>] -         Preserve State Store provider class configuration across StreamingQuery restarts
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21193'>SPARK-21193</a>] -         Specify Pandas version in setup.py
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21196'>SPARK-21196</a>] -         Split codegen info of query plan into sequence
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21217'>SPARK-21217</a>] -         Support ColumnVector.Array.to&lt;type&gt;Array()
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21222'>SPARK-21222</a>] -         Move elimination of Distinct clause from analyzer to optimizer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21229'>SPARK-21229</a>] -         remove QueryPlan.preCanonicalized
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21238'>SPARK-21238</a>] -         allow nested SQL execution
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21240'>SPARK-21240</a>] -         Fix code style for constructing and stopping a SparkContext in UT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21243'>SPARK-21243</a>] -         Limit the number of maps in a single shuffle fetch
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21247'>SPARK-21247</a>] -         Type comparison should respect case-sensitive SQL conf
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21250'>SPARK-21250</a>] -         Add a url in the table of &#39;Running Executors&#39;  in worker page to visit job page
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21256'>SPARK-21256</a>] -         Add WithSQLConf to Catalyst Test
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21260'>SPARK-21260</a>] -         Remove the unused OutputFakerExec
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21266'>SPARK-21266</a>] -         Support schema as a DDL-formatted string in dapply/gapply/from_json
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21267'>SPARK-21267</a>] -         Improvements to the Structured Streaming programming guide
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21268'>SPARK-21268</a>] -         Move center calculations to a distributed map in KMeans
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21273'>SPARK-21273</a>] -         Decouple stats propagation from logical plan
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21275'>SPARK-21275</a>] -         Update GLM test to use supportedFamilyNames
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21276'>SPARK-21276</a>] -         Update  lz4-java to remove custom LZ4BlockInputStream
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21285'>SPARK-21285</a>] -         VectorAssembler should report the column name when data type used is not supported
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21295'>SPARK-21295</a>] -         Confusing error message for missing references
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21296'>SPARK-21296</a>] -         Avoid per-record type dispatch in PySpark createDataFrame schema verification
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21297'>SPARK-21297</a>] -         Add count in &#39;JDBC/ODBC Server&#39; page.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21304'>SPARK-21304</a>] -         remove unnecessary isNull variable for collection related encoder expressions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21305'>SPARK-21305</a>] -         The BKM (best known methods) of using native BLAS to improve ML/MLLIB performance
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21308'>SPARK-21308</a>] -         Remove SQLConf parameters from the optimizer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21313'>SPARK-21313</a>] -         ConsoleSink&#39;s string representation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21315'>SPARK-21315</a>] -         Skip some spill files when generateIterator(startIndex) in ExternalAppendOnlyUnsafeRowArray.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21321'>SPARK-21321</a>] -         Spark very verbose on shutdown confusing users
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21323'>SPARK-21323</a>] -         Rename sql.catalyst.plans.logical.statsEstimation.Range to ValueInterval
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21326'>SPARK-21326</a>] -         Use TextFileFormat in implementation of LibSVMFileFormat
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21329'>SPARK-21329</a>] -         Make EventTimeWatermarkExec explicitly UnaryExecNode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21358'>SPARK-21358</a>] -         Argument of repartitionAndSortWithinPartitions in PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21365'>SPARK-21365</a>] -         Deduplicate logic for parsing DDL-like type definitions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21373'>SPARK-21373</a>] -         Update Jetty to 9.3.20.v20170531
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21381'>SPARK-21381</a>] -         SparkR: pass on setHandleInvalid for classification algorithms
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21382'>SPARK-21382</a>] -         The note about  Scala 2.10 in building-spark.md is wrong.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21388'>SPARK-21388</a>] -         GBT inherit from HasStepSize &amp; LinearSVC/Binarizer from HasThreshold
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21396'>SPARK-21396</a>] -         Spark Hive Thriftserver doesn&#39;t return UDT field
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21401'>SPARK-21401</a>] -         add poll function for BoundedPriorityQueue
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21408'>SPARK-21408</a>] -         Default RPC dispatcher thread pool size too large for small executors
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21409'>SPARK-21409</a>] -         Expose state store memory usage in SQL metrics and progress updates
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21410'>SPARK-21410</a>] -         In RangePartitioner(partitions: Int, rdd: RDD[]), RangePartitioner.numPartitions is wrong if the number of elements in RDD (rdd.count()) is less than number of partitions (partitions in constructor).
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21415'>SPARK-21415</a>] -         Triage scapegoat warnings, part 1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21434'>SPARK-21434</a>] -         Add PySpark pip documentation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21435'>SPARK-21435</a>] -         Empty files should be skipped while writing to file
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21472'>SPARK-21472</a>] -         Introduce ArrowColumnVector as a reader for Arrow vectors.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21475'>SPARK-21475</a>] -         Change to use NIO&#39;s Files API for external shuffle service
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21477'>SPARK-21477</a>] -         Mark LocalTableScanExec&#39;s input data transient
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21491'>SPARK-21491</a>] -         Performance enhancement: eliminate creation of intermediate collections
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21504'>SPARK-21504</a>] -         Add spark version info in table metadata
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21506'>SPARK-21506</a>] -         The description of &quot;spark.executor.cores&quot; may not be correct
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21513'>SPARK-21513</a>] -         SQL to_json should support all column types
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21517'>SPARK-21517</a>] -         Fetching local data via block manager causes OOM
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21524'>SPARK-21524</a>] -         ValidatorParamsSuiteHelpers generates wrong temp files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21527'>SPARK-21527</a>] -         Use buffer limit in order to take advantage of  JAVA NIO Util&#39;s buffercache
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21530'>SPARK-21530</a>] -         Update description of spark.shuffle.maxChunksBeingTransferred
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21538'>SPARK-21538</a>] -         Attribute resolution inconsistency in Dataset API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21544'>SPARK-21544</a>] -         Test jar of some module should not install or deploy twice
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21553'>SPARK-21553</a>] -         Add the description of the default value of master parameter in the spark-shell
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21566'>SPARK-21566</a>] -         Python method for summary
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21575'>SPARK-21575</a>] -         Eliminate needless synchronization in java-R serialization
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21578'>SPARK-21578</a>] -         Add JavaSparkContextSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21583'>SPARK-21583</a>] -         Create a ColumnarBatch with ArrowColumnVectors for row based iteration
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21584'>SPARK-21584</a>] -         Update R method for summary to call new implementation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21589'>SPARK-21589</a>] -         Add documents about unsupported functions in Hive UDF/UDTF/UDAF
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21592'>SPARK-21592</a>] -         Skip maven-compiler-plugin main and test compilations in Maven build
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21602'>SPARK-21602</a>] -         Add map_keys and map_values functions to R
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21603'>SPARK-21603</a>] -         Whole-stage codegen is much slower than running with whole-stage codegen disabled when the function is too long
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21604'>SPARK-21604</a>] -         Remove the useless var LOG from objects that extend Logging
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21608'>SPARK-21608</a>] -         Window rangeBetween() API should allow literal boundary
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21611'>SPARK-21611</a>] -         Wrong class name used for logging in several classes.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21619'>SPARK-21619</a>] -         Fail the execution of canonicalized plans explicitly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21622'>SPARK-21622</a>] -         Support Offset in SparkR
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21623'>SPARK-21623</a>] -         Comment on parentStats in ml/tree/impl/DTStatsAggregator.scala is wrong
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21634'>SPARK-21634</a>] -         Change OneRowRelation from a case object to case class
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21640'>SPARK-21640</a>] -         Method mode with String parameters within DataFrameWriter is error prone
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21661'>SPARK-21661</a>] -         SparkSQL can&#39;t merge load table from Hadoop
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21665'>SPARK-21665</a>] -         Need to close resources  after use
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21667'>SPARK-21667</a>] -         ConsoleSink should not fail streaming query with checkpointLocation option
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21669'>SPARK-21669</a>] -         Internal API for collecting metrics/stats during FileFormatWriter jobs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21672'>SPARK-21672</a>] -         Remove SHS-specific application / attempt data structures
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21675'>SPARK-21675</a>] -         Add a navigation bar at the bottom of the Details for Stage Page
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21680'>SPARK-21680</a>] -         ML/MLLIB Vector compressed optimization
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21694'>SPARK-21694</a>] -         Support Mesos CNI network labels
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21701'>SPARK-21701</a>] -         Add TCP send/rcv buffer size support for RPC client
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21709'>SPARK-21709</a>] -         use sbt 0.13.16 and update sbt plugins
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21717'>SPARK-21717</a>] -         Decouple the generated code for consuming rows in operators under whole-stage codegen
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21718'>SPARK-21718</a>] -         Heavy log of type: &quot;Skipping partition based on stats ...&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21728'>SPARK-21728</a>] -         Allow SparkSubmit to use logging
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21732'>SPARK-21732</a>] -         Lazily init hive metastore client
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21745'>SPARK-21745</a>] -         Refactor ColumnVector hierarchy to make ColumnVector read-only and to introduce WritableColumnVector.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21751'>SPARK-21751</a>] -         CodeGenerator.splitExpressions counts code size more precisely
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21756'>SPARK-21756</a>] -         Add JSON option to allow unquoted control characters
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21765'>SPARK-21765</a>] -         Ensure all leaf nodes that are derived from streaming sources have isStreaming=true
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21769'>SPARK-21769</a>] -         Add a table option for Hive-serde tables to make Spark always respect schemas inferred by Spark SQL
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21770'>SPARK-21770</a>] -         ProbabilisticClassificationModel: Improve normalization of all-zero raw predictions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21771'>SPARK-21771</a>] -         SparkSQLEnv creates a useless meta hive client
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21773'>SPARK-21773</a>] -         Should Install mkdocs if missing in the path in SQL documentation build
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21781'>SPARK-21781</a>] -         Modify DataSourceScanExec to use concrete ColumnVector type.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21787'>SPARK-21787</a>] -         Support for pushing down filters for DateType in native OrcFileFormat
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21789'>SPARK-21789</a>] -         Remove obsolete code for parsing abstract schema strings
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21803'>SPARK-21803</a>] -         Remove the HiveDDLCommandSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21806'>SPARK-21806</a>] -         BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21807'>SPARK-21807</a>] -         The getAliasedConstraints function  in LogicalPlan will take a long time when number of expressions is greater than 100 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21813'>SPARK-21813</a>] -         [core] Modify TaskMemoryManager.MAXIMUM_PAGE_SIZE_BYTES comments
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21839'>SPARK-21839</a>] -         Support SQL config for ORC compression 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21862'>SPARK-21862</a>] -         Add overflow check in PCA
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21865'>SPARK-21865</a>] -         simplify the distribution semantic of Spark SQL
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21866'>SPARK-21866</a>] -         SPIP: Image support in Spark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21871'>SPARK-21871</a>] -         Check actual bytecode size when compiling generated code
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21873'>SPARK-21873</a>] -         CachedKafkaConsumer throws NonLocalReturnControl during fetching from Kafka
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21875'>SPARK-21875</a>] -         Jenkins passes Java code that violates ./dev/lint-java
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21878'>SPARK-21878</a>] -         Create SQLMetricsTestUtils
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21886'>SPARK-21886</a>] -         Use SparkSession.internalCreateDataFrame to create Dataset with LogicalRDD logical operator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21891'>SPARK-21891</a>] -         Add TBLPROPERTIES to DDL statement: CREATE TABLE USING 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21897'>SPARK-21897</a>] -         Add unionByName API to DataFrame in Python and R
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21901'>SPARK-21901</a>] -         Define toString for StateOperatorProgress
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21902'>SPARK-21902</a>] -         BlockManager.doPut hides the actual exception when an exception is thrown in the finally block
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21903'>SPARK-21903</a>] -         Upgrade scalastyle to 1.0.0
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21923'>SPARK-21923</a>] -         Avoid calling reserveUnrollMemoryForThisTask for every record
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21963'>SPARK-21963</a>] -         Temp files should be deleted after use
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21967'>SPARK-21967</a>] -         org.apache.spark.unsafe.types.UTF8String#compareTo Should Compare 8 Bytes at a Time for Better Performance
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21970'>SPARK-21970</a>] -         Do a Project Wide Sweep for Redundant Throws Declarations
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21973'>SPARK-21973</a>] -         Add a new option to filter queries to run in TPCDSQueryBenchmark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21975'>SPARK-21975</a>] -         Histogram support in cost-based optimizer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21981'>SPARK-21981</a>] -         Python API for ClusteringEvaluator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21983'>SPARK-21983</a>] -         Fix ANTLR 4.7 deprecations
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21988'>SPARK-21988</a>] -         Add default stats to StreamingRelation and StreamingExecutionRelation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22001'>SPARK-22001</a>] -         ImputerModel can do withColumn for all input columns at one pass
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22002'>SPARK-22002</a>] -         Support specifying only partial fields in the custom schema when reading a JDBC table
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22003'>SPARK-22003</a>] -         vectorized reader does not work with UDF when the column is array
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22009'>SPARK-22009</a>] -         Use treeAggregate to improve some algorithms
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22043'>SPARK-22043</a>] -         Python profile, show_profiles() and dump_profiles(), should throw an error with a better message
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22049'>SPARK-22049</a>] -         Confusing behavior of from_utc_timestamp and to_utc_timestamp
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22050'>SPARK-22050</a>] -         Allow BlockUpdated events to be optionally logged to the event log
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22058'>SPARK-22058</a>] -         the BufferedInputStream will not be closed if an exception occurs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22066'>SPARK-22066</a>] -         Update checkstyle to 8.2, enable it, fix violations
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22072'>SPARK-22072</a>] -         Allow the same shell params to be used for all of the different steps in release-build
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22075'>SPARK-22075</a>] -         GBTs forgot to unpersist datasets cached by Checkpointer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22099'>SPARK-22099</a>] -         The &#39;job ids&#39; list style needs to be changed in the SQL page.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22103'>SPARK-22103</a>] -         Move HashAggregateExec parent consume to a separate function in codegen
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22106'>SPARK-22106</a>] -         Remove support for 0-parameter pandas_udfs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22112'>SPARK-22112</a>] -         Add missing method to pyspark api: spark.read.csv(Dataset&lt;String&gt;)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22120'>SPARK-22120</a>] -         TestHiveSparkSession.reset() should clean out Hive warehouse directory
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22122'>SPARK-22122</a>] -         Respect WITH clauses to count input rows in TPCDSQueryBenchmark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22123'>SPARK-22123</a>] -         Add latest failure reason for task set blacklist
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22124'>SPARK-22124</a>] -         Sample and Limit should also defer input evaluation under codegen
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22125'>SPARK-22125</a>] -         Enable Arrow Stream format for vectorized UDF.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22130'>SPARK-22130</a>] -         UTF8String.trim() inefficiently scans an all-whitespace string twice.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22133'>SPARK-22133</a>] -         Document Mesos reject offer duration configurations
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22138'>SPARK-22138</a>] -         Allow retry during release-build
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22142'>SPARK-22142</a>] -         Move Flume support behind a profile
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22147'>SPARK-22147</a>] -         BlockId.hashCode allocates a StringBuilder/String on each call
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22156'>SPARK-22156</a>] -         Word2Vec: incorrect learning rate update equation when numIterations &gt; 1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22170'>SPARK-22170</a>] -         Broadcast join holds an extra copy of rows in driver memory
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22173'>SPARK-22173</a>] -         Table CSS style needs to be adjusted in History Page and in Executors Page.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22188'>SPARK-22188</a>] -         Add defense against Cross-Site Scripting, MIME-sniffing and MitM attack
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22190'>SPARK-22190</a>] -         Add Spark executor task metrics to Dropwizard metrics
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22193'>SPARK-22193</a>] -         SortMergeJoinExec: typo correction
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22203'>SPARK-22203</a>] -         Add job description for file listing Spark jobs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22208'>SPARK-22208</a>] -         Improve percentile_approx by not rounding up targetError and starting from index 0
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22214'>SPARK-22214</a>] -         Refactor the list hive partitions code
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22217'>SPARK-22217</a>] -         ParquetFileFormat to support arbitrary OutputCommitters
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22233'>SPARK-22233</a>] -         filter out empty InputSplit in HadoopRDD
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22247'>SPARK-22247</a>] -         Hive partition filter very slow
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22263'>SPARK-22263</a>] -         Refactor deterministic as lazy value
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22266'>SPARK-22266</a>] -         The same aggregate function was evaluated multiple times
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22268'>SPARK-22268</a>] -         Fix java style errors
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22282'>SPARK-22282</a>] -         Rename OrcRelation to OrcFileFormat and remove ORC_COMPRESSION
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22294'>SPARK-22294</a>] -         Reset spark.driver.bindAddress when starting a Checkpoint
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22301'>SPARK-22301</a>] -         Add rule to Optimizer for In with empty list of values
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22302'>SPARK-22302</a>] -         Remove manual backports for subprocess.check_output and check_call
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22308'>SPARK-22308</a>] -         Support unit tests of spark code using ScalaTest using suites other than FunSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22313'>SPARK-22313</a>] -         Mark/print deprecation warnings as DeprecationWarning for deprecated APIs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22315'>SPARK-22315</a>] -         Check for version match between R package and JVM
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22346'>SPARK-22346</a>] -         Update VectorAssembler to work with Structured Streaming
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22348'>SPARK-22348</a>] -         The table cache providing ColumnarBatch should also do partition batch pruning
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22366'>SPARK-22366</a>] -         Support ignoreMissingFiles flag parallel to ignoreCorruptFiles
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22372'>SPARK-22372</a>] -         Make YARN client extend SparkApplication
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22378'>SPARK-22378</a>] -         Redundant nullcheck is generated for extracting value in complex types
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22379'>SPARK-22379</a>] -         Reduce duplication of setUpClass and tearDownClass in PySpark SQL tests
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22385'>SPARK-22385</a>] -         MapObjects should not access list element by index
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22397'>SPARK-22397</a>] -         Add multiple column support to QuantileDiscretizer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22405'>SPARK-22405</a>] -         Enrich the event information and add new event of ExternalCatalogEvent
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22407'>SPARK-22407</a>] -         Add rdd id column on storage page to speed up navigating
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22408'>SPARK-22408</a>] -         RelationalGroupedDataset&#39;s distinct pivot value calculation launches unnecessary stages
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22422'>SPARK-22422</a>] -         Add Adjusted R2 to RegressionMetrics
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22445'>SPARK-22445</a>] -         move CodegenContext.copyResult to CodegenSupport
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22450'>SPARK-22450</a>] -         Safely register class for mllib
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22476'>SPARK-22476</a>] -         Add new function dayofweek in R
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22496'>SPARK-22496</a>] -         beeline display operation log
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22519'>SPARK-22519</a>] -         Remove unnecessary stagingDirPath null check in ApplicationMaster.cleanupStagingDir()
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22520'>SPARK-22520</a>] -         Support code generation also for complex CASE WHEN
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22537'>SPARK-22537</a>] -         Aggregation of map output statistics on driver faces single point bottleneck
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22554'>SPARK-22554</a>] -         Add a config to control if PySpark should use daemon or not
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22566'>SPARK-22566</a>] -         Better error message for `_merge_type` in Pandas to Spark DF conversion
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22569'>SPARK-22569</a>] -         Clean up caller of splitExpressions and addMutableState
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22592'>SPARK-22592</a>] -         cleanup filter converting for hive
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22596'>SPARK-22596</a>] -         set ctx.currentVars in CodegenSupport.consume
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22597'>SPARK-22597</a>] -         Add spark-sql script for Windows users
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22608'>SPARK-22608</a>] -         Avoid code duplication regarding CodeGeneration.splitExpressions()
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22614'>SPARK-22614</a>] -         Expose range partitioning shuffle
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22617'>SPARK-22617</a>] -         make splitExpressions extract current input of the context
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22638'>SPARK-22638</a>] -         Use a separate query for StreamingQueryListenerBus
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22649'>SPARK-22649</a>] -         localCheckpoint support in Dataset API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22660'>SPARK-22660</a>] -         Use position() and limit() to fix ambiguity issue in scala-2.12
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22665'>SPARK-22665</a>] -         Dataset API: .repartition() inconsistency / issue
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22667'>SPARK-22667</a>] -         Fix model-specific optimization support for ML tuning: Python API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22673'>SPARK-22673</a>] -         InMemoryRelation should utilize on-disk table stats whenever possible
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22675'>SPARK-22675</a>] -         Refactoring PropagateTypes in TypeCoercion
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22677'>SPARK-22677</a>] -         cleanup whole stage codegen for hash aggregate
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22682'>SPARK-22682</a>] -         HashExpression does not need to create global variables
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22688'>SPARK-22688</a>] -         Upgrade Janino version to 3.0.8
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22690'>SPARK-22690</a>] -         Imputer inherit HasOutputCols
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22692'>SPARK-22692</a>] -         Reduce the number of generated mutable states
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22701'>SPARK-22701</a>] -         add ctx.splitExpressionsWithCurrentInputs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22704'>SPARK-22704</a>] -         Reduce # of mutable variables in Least and greatest
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22705'>SPARK-22705</a>] -         Reduce # of mutable variables in Case, Coalesce, and In
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22707'>SPARK-22707</a>] -         Optimize CrossValidator memory occupation by models in fitting
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22719'>SPARK-22719</a>] -         refactor ConstantPropagation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22729'>SPARK-22729</a>] -         Add getTruncateQuery to JdbcDialect
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22753'>SPARK-22753</a>] -         Get rid of dataSource.writeAndRead
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22754'>SPARK-22754</a>] -         Check spark.executor.heartbeatInterval setting in case of ExecutorLost
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22763'>SPARK-22763</a>] -         SHS: Ignore unknown events and parse through the file
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22767'>SPARK-22767</a>] -         use ctx.addReferenceObj in InSet and ScalaUDF
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22771'>SPARK-22771</a>] -         SQL concat for binary 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22774'>SPARK-22774</a>] -         Add compilation check for generated code in TPCDSQuerySuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22786'>SPARK-22786</a>] -         only use AppStatusPlugin in history server
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22790'>SPARK-22790</a>] -         add a configurable factor to describe HadoopFsRelation&#39;s size
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22799'>SPARK-22799</a>] -         Bucketizer should throw exception if single- and multi-column params are both set
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22801'>SPARK-22801</a>] -         Allow FeatureHasher to specify numeric columns to treat as categorical
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22810'>SPARK-22810</a>] -         PySpark supports LinearRegression with huber loss
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22830'>SPARK-22830</a>] -         Scala Coding style has been improved in Spark Examples
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22832'>SPARK-22832</a>] -         BisectingKMeans unpersist unused datasets
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22833'>SPARK-22833</a>] -         [Examples] Improvements made at SparkHive Example with Scala
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22844'>SPARK-22844</a>] -         R date_trunc API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22847'>SPARK-22847</a>] -         Remove the duplicate code in AppStatusListener while assigning schedulingPool for stage
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22870'>SPARK-22870</a>] -         Dynamic allocation should allow 0 idle time
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22874'>SPARK-22874</a>] -         Modify checking pandas version to use LooseVersion.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22893'>SPARK-22893</a>] -         Unified the data type mismatch message
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22894'>SPARK-22894</a>] -         DateTimeOperations should accept SQL like string type
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22895'>SPARK-22895</a>] -         Push down the deterministic predicates that are after the first non-deterministic
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22896'>SPARK-22896</a>] -         Improvement in String interpolation 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22897'>SPARK-22897</a>] -         Expose stageAttemptId in TaskContext
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22914'>SPARK-22914</a>] -         Subbing for spark.history.ui.port does not resolve by default
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22919'>SPARK-22919</a>] -         Bump Apache httpclient versions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22921'>SPARK-22921</a>] -         Merge script should prompt for assigning jiras
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22922'>SPARK-22922</a>] -         Python API for fitMultiple
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22937'>SPARK-22937</a>] -         SQL elt for binary inputs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22939'>SPARK-22939</a>] -         Support Spark UDF in registerFunction
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22944'>SPARK-22944</a>] -         improve FoldablePropagation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22945'>SPARK-22945</a>] -         add java UDF APIs in the functions object
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22952'>SPARK-22952</a>] -         Deprecate stageAttemptId in favour of stageAttemptNumber
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22960'>SPARK-22960</a>] -         Make build-push-docker-images.sh more dev-friendly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22979'>SPARK-22979</a>] -         Avoid per-record type dispatch in Python data conversion (EvaluatePython.fromJava)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22994'>SPARK-22994</a>] -         Require a single container image for Spark-on-K8S
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22997'>SPARK-22997</a>] -         Add additional defenses against use of freed MemoryBlocks
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22999'>SPARK-22999</a>] -         &#39;show databases like command&#39; can remove the like keyword
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23005'>SPARK-23005</a>] -         Improve RDD.take on small number of partitions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23029'>SPARK-23029</a>] -         Doc spark.shuffle.file.buffer units are kb when no units specified
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23032'>SPARK-23032</a>] -         Add a per-query codegenStageId to WholeStageCodegenExec
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23036'>SPARK-23036</a>] -         Add withGlobalTempView for testing
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23062'>SPARK-23062</a>] -         EXCEPT documentation should make it clear that it&#39;s EXCEPT DISTINCT
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23081'>SPARK-23081</a>] -         Add colRegex API to PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23090'>SPARK-23090</a>] -         polish ColumnVector
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23091'>SPARK-23091</a>] -         Incorrect unit test for approxQuantile
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23122'>SPARK-23122</a>] -         Deprecate register* for UDFs in SQLContext and Catalog in PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23129'>SPARK-23129</a>] -         Lazily initialize DiskMapIterator#deserializeStream to reduce memory usage when ExternalAppendOnlyMap spills too many times
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23141'>SPARK-23141</a>] -         Support data type string as a returnType for registerJavaFunction.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23142'>SPARK-23142</a>] -         Add documentation for Continuous Processing
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23143'>SPARK-23143</a>] -         Add Python support for continuous trigger
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23144'>SPARK-23144</a>] -         Add console sink for continuous queries
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23149'>SPARK-23149</a>] -         polish ColumnarBatch
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23170'>SPARK-23170</a>] -         Dump the statistics of effective runs of analyzer and optimizer rules
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23199'>SPARK-23199</a>] -         Improve removal of repetition from group expressions in Aggregate
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23238'>SPARK-23238</a>] -         Externalize SQLConf spark.sql.execution.arrow.enabled 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23248'>SPARK-23248</a>] -         Relocate module docstrings to the top in PySpark examples
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23249'>SPARK-23249</a>] -         Improve partition bin-filling algorithm to have less skew and fewer partitions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23276'>SPARK-23276</a>] -         Enable UDT tests in (Hive)OrcHadoopFsRelationSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23279'>SPARK-23279</a>] -         Avoid triggering distributed job for Console sink
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23284'>SPARK-23284</a>] -         Document the behavior of several ColumnVector get APIs when accessing a null slot
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23296'>SPARK-23296</a>] -         Diagnostics message for user code exceptions should include the stacktrace
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23305'>SPARK-23305</a>] -         Test `spark.sql.files.ignoreMissingFiles` for all file-based data sources
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23312'>SPARK-23312</a>] -         add a config to turn off vectorized cache reader
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23317'>SPARK-23317</a>] -         rename ContinuousReader.setOffset to setStartOffset
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23454'>SPARK-23454</a>] -         Add Trigger information to the Structured Streaming programming guide
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23617'>SPARK-23617</a>] -         Register a Function without params with Spark SQL Java API
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23993'>SPARK-23993</a>] -         Support DESC FORMATTED table_name column_name
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-24328'>SPARK-24328</a>] -         Fix scala.MatchError in literals.sql.out 
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-26542'>SPARK-26542</a>] -         Support the coordinator to determine post-shuffle partitions more reasonably
</li>
</ul>
    
<h2>        Test
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19662'>SPARK-19662</a>] -         Add Fair Scheduler Unit Test coverage for different build cases
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20518'>SPARK-20518</a>] -         Supplement the new blockidsuite unit tests
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20571'>SPARK-20571</a>] -         Flaky SparkR StructuredStreaming tests
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20607'>SPARK-20607</a>] -         Add new unit tests to ShuffleSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20957'>SPARK-20957</a>] -         Flaky Test: o.a.s.sql.streaming.StreamingQueryManagerSuite listing
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21006'>SPARK-21006</a>] -         Create rpcEnv and run later needs shutdown and awaitTermination
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21128'>SPARK-21128</a>] -         Running R tests multiple times failed due to pre-existing &quot;spark-warehouse&quot; / &quot;metastore_db&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21286'>SPARK-21286</a>] -         [spark core UT] Fix an error in a unit test
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21370'>SPARK-21370</a>] -         Avoid doing anything on HDFSBackedStateStore.abort() when there are no updates to commit
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21464'>SPARK-21464</a>] -         Minimize deprecation warnings caused by ProcessingTime class
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21573'>SPARK-21573</a>] -         Tests failing with run-tests.py SyntaxError occasionally in Jenkins
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21663'>SPARK-21663</a>] -         MapOutputTrackerSuite case test(&quot;remote fetch below max RPC message size&quot;) should call stop
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21693'>SPARK-21693</a>] -         AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21729'>SPARK-21729</a>] -         Generic test for ProbabilisticClassifier to ensure consistent output columns
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21764'>SPARK-21764</a>] -         Test failures on Windows: resources not being closed and incorrect paths
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21843'>SPARK-21843</a>] -         testNameNote should be &quot;(minNumPostShufflePartitions: &quot; + numPartitions + &quot;)&quot; in ExchangeCoordinatorSuite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21936'>SPARK-21936</a>] -         backward compatibility test framework for HiveExternalCatalog
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21949'>SPARK-21949</a>] -         Tables created in unit tests should be dropped after use
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21982'>SPARK-21982</a>] -         Set Locale to US in order to pass UtilsSuite when your jvm Locale is not US
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22140'>SPARK-22140</a>] -         Add a test suite for TPCDS queries
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22161'>SPARK-22161</a>] -         Add Impala-modified TPC-DS queries
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22418'>SPARK-22418</a>] -         Add test cases for NULL Handling
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22423'>SPARK-22423</a>] -         Scala test source files like TestHiveSingleton.scala should be in scala source root
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22595'>SPARK-22595</a>] -         flaky test: CastSuite.SPARK-22500: cast for struct should not generate codes beyond 64KB
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22644'>SPARK-22644</a>] -         Make ML testsuite support StructuredStreaming test
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22787'>SPARK-22787</a>] -         Add a TPCH query suite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22800'>SPARK-22800</a>] -         Add a SSB query suite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22881'>SPARK-22881</a>] -         ML test for StructuredStreaming: spark.ml.regression
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22938'>SPARK-22938</a>] -         Assert that SQLConf.get is accessed only on the driver.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23072'>SPARK-23072</a>] -         Add a Unicode schema test for file-based data sources
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23132'>SPARK-23132</a>] -         Run ml.image doctests in tests
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23300'>SPARK-23300</a>] -         Print out if Pandas and PyArrow are installed or not in tests
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23311'>SPARK-23311</a>] -         Add FilterFunction test case for CombineTypedFilters
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23319'>SPARK-23319</a>] -         Skip PySpark tests for old Pandas and old PyArrow
</li>
</ul>
        
<h2>        Task
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-12297'>SPARK-12297</a>] -         Add work-around for Parquet/Hive int96 timestamp bug.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-19810'>SPARK-19810</a>] -         Remove support for Scala 2.10
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20434'>SPARK-20434</a>] -         Move Hadoop delegation token code from yarn to core
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21366'>SPARK-21366</a>] -         Add sql test for window functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21699'>SPARK-21699</a>] -         Remove unused getTableOption in ExternalCatalog
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21731'>SPARK-21731</a>] -         Upgrade scalastyle to 0.9
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21848'>SPARK-21848</a>] -         Create trait to identify user-defined functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21939'>SPARK-21939</a>] -         Use TimeLimits instead of Timeouts
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22153'>SPARK-22153</a>] -         Rename ShuffleExchange -&gt; ShuffleExchangeExec
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22416'>SPARK-22416</a>] -         Move OrcOptions from `sql/hive` to `sql/core`
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22473'>SPARK-22473</a>] -         Replace deprecated AsyncAssertions.Waiter and methods of java.sql.Date
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22485'>SPARK-22485</a>] -         Use `exclude[Problem]` instead of `excludePackage` in MiMa
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22634'>SPARK-22634</a>] -         Update Bouncy Castle dependency
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22672'>SPARK-22672</a>] -         Refactor ORC Tests
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23104'>SPARK-23104</a>] -         Document that Kubernetes is still &quot;experimental&quot;
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23426'>SPARK-23426</a>] -         Use `hive` ORC impl and disable PPD for Spark 2.3.0
</li>
</ul>
                                                    
<h2>        Dependency upgrade
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-15526'>SPARK-15526</a>] -         Shade JPMML
</li>
</ul>
                
<h2>        Brainstorming
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-7146'>SPARK-7146</a>] -         Should ML sharedParams be a public API?
</li>
</ul>
    
<h2>        Umbrella
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-18085'>SPARK-18085</a>] -         SPIP: Better History Server scalability for many / large applications
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20746'>SPARK-20746</a>] -         Built-in SQL Function Improvement
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21926'>SPARK-21926</a>] -         Compatibility between ML Transformers and Structured Streaming
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22820'>SPARK-22820</a>] -         Spark 2.3 SQL API audit
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23105'>SPARK-23105</a>] -         Spark MLlib, GraphX 2.3 QA umbrella
</li>
</ul>
                                                        
<h2>        New JIRA Project
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20758'>SPARK-20758</a>] -         Add Constant propagation optimization
</li>
</ul>
        
<h2>        Documentation
</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20015'>SPARK-20015</a>] -         Document R Structured Streaming (experimental) in R vignettes and R &amp; SS programming guide, R example
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20132'>SPARK-20132</a>] -         Add documentation for column string functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20192'>SPARK-20192</a>] -         SparkR 2.2.0 migration guide, release note
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20442'>SPARK-20442</a>] -         Fill in documentation for functions in Column API in PySpark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20448'>SPARK-20448</a>] -         Document how FileInputDStream works with object storage
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20456'>SPARK-20456</a>] -         Add examples for functions collection for pyspark
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20477'>SPARK-20477</a>] -         Document R bisecting k-means in R programming guide
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20478'>SPARK-20478</a>] -         Document LinearSVC in R programming guide
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20855'>SPARK-20855</a>] -         Update the Spark Kinesis docs to use the KinesisInputDStream builder instead of deprecated KinesisUtils
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20858'>SPARK-20858</a>] -         Document ListenerBus event queue size property
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20889'>SPARK-20889</a>] -         SparkR grouped documentation for Column methods
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-20992'>SPARK-20992</a>] -         Link to Nomad scheduler backend in docs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21042'>SPARK-21042</a>] -         Document that Dataset.union resolves columns by position, not name
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21069'>SPARK-21069</a>] -         Add rate source to programming guide
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21123'>SPARK-21123</a>] -         Options for file stream source are in a wrong table
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21292'>SPARK-21292</a>] -         R document Catalog function metadata refresh
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21293'>SPARK-21293</a>] -         R document update structured streaming
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21469'>SPARK-21469</a>] -         Add doc and example for FeatureHasher
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21485'>SPARK-21485</a>] -         API Documentation for Spark SQL functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21616'>SPARK-21616</a>] -         SparkR 2.3.0 migration guide, release note
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21712'>SPARK-21712</a>] -         Clarify PySpark Column.substr() type checking error message
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21724'>SPARK-21724</a>] -         Missing since information in the documentation of date functions
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21925'>SPARK-21925</a>] -         Update trigger interval documentation in docs with behavior change in Spark 2.2
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-21976'>SPARK-21976</a>] -         Fix wrong doc about Mean Absolute Error
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22110'>SPARK-22110</a>] -         Enhance function description of the trim string function
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22335'>SPARK-22335</a>] -         Union for Dataset uses column order instead of types for union
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22369'>SPARK-22369</a>] -         PySpark: Document methods of spark.catalog interface
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22399'>SPARK-22399</a>] -         reference in mllib-clustering.html is out of date
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22412'>SPARK-22412</a>] -         Fix incorrect comment in DataSourceScanExec
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22428'>SPARK-22428</a>] -         Document spark properties for configuring the ContextCleaner
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22490'>SPARK-22490</a>] -         PySpark doc has misleading string for SparkSession.builder
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22541'>SPARK-22541</a>] -         Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22735'>SPARK-22735</a>] -         Add VectorSizeHint to ML features documentation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-22993'>SPARK-22993</a>] -         checkpointInterval param doc should be clearer
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23048'>SPARK-23048</a>] -         Update mllib docs to replace OneHotEncoder with OneHotEncoderEstimator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23069'>SPARK-23069</a>] -         R doc for describe missing text
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23127'>SPARK-23127</a>] -         Update FeatureHasher user guide for catCols parameter
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23138'>SPARK-23138</a>] -         Add user guide example for multiclass logistic regression summary
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23154'>SPARK-23154</a>] -         Document backwards compatibility guarantees for ML persistence
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23163'>SPARK-23163</a>] -         Sync Python ML API docs with Scala
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23313'>SPARK-23313</a>] -         Add a migration guide for ORC
</li>
<li>[<a href='https://issues.apache.org/jira/browse/SPARK-23327'>SPARK-23327</a>] -         Update the description of three external APIs or functions
</li>
</ul>