Sub-task
- [SPARK-21083] - Store zero size and row count after analyzing empty table
- [SPARK-21489] - Update release docs to point out Python 2.6 support is removed.
- [SPARK-21720] - Filter predicate with many conditions throws StackOverflowError
- [SPARK-21805] - disable R vignettes code on Windows
- [SPARK-22344] - Prevent R CMD check from using /tmp
- [SPARK-22494] - Coalesce and AtLeastNNonNulls can cause 64KB JVM bytecode limit exception
- [SPARK-22498] - 64KB JVM bytecode limit problem with concat
- [SPARK-22499] - 64KB JVM bytecode limit problem with least and greatest
- [SPARK-22500] - 64KB JVM bytecode limit problem with cast
- [SPARK-22501] - 64KB JVM bytecode limit problem with in
- [SPARK-22508] - 64KB JVM bytecode limit problem with GenerateUnsafeRowJoiner.create()
- [SPARK-22549] - 64KB JVM bytecode limit problem with concat_ws
- [SPARK-22550] - 64KB JVM bytecode limit problem with elt
Bug
- [SPARK-12717] - pyspark broadcast fails when using multiple threads
- [SPARK-14387] - Enable Hive-1.x ORC compatibility with spark.sql.hive.convertMetastoreOrc
- [SPARK-15757] - Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file
- [SPARK-16605] - Spark 2.0 cannot "select" data from a table stored as an ORC file created by Hive, although Hive and Spark 1.6 support it
- [SPARK-16628] - OrcConversions should not convert an ORC table represented by MetastoreRelation to HadoopFsRelation if metastore schema does not match schema stored in ORC files
- [SPARK-17902] - collect() ignores stringsAsFactors
- [SPARK-17920] - HiveWriterContainer passes null configuration to serde.initialize, causing NullPointerException in AvroSerde when using avro.schema.url
- [SPARK-18355] - Spark SQL fails to read data from an ORC Hive table that has a new column added to it
- [SPARK-18608] - Spark ML algorithms that check RDD cache level for internal caching double-cache data
- [SPARK-19106] - Styling for the configuration docs is broken
- [SPARK-19580] - Support for avro.schema.url while writing to hive table
- [SPARK-19644] - Memory leak in Spark Streaming (Encoder/Scala Reflection)
- [SPARK-20098] - DataType's typeName method returns with 'StructF' in case of StructField
- [SPARK-20256] - Fail to start SparkContext/SparkSession with Hive support enabled when user does not have read/write privilege to Hive metastore warehouse dir
- [SPARK-20342] - DAGScheduler sends SparkListenerTaskEnd before updating task's accumulators
- [SPARK-20466] - HadoopRDD#addLocalConfiguration throws NPE
- [SPARK-20904] - Task failures during shutdown cause problems with preempted executors
- [SPARK-21170] - Utils.tryWithSafeFinallyAndFailureCallbacks throws IllegalArgumentException: Self-suppression not permitted
- [SPARK-21219] - Task retry occurs on same executor due to race condition with blacklisting
- [SPARK-21228] - InSet incorrect handling of structs
- [SPARK-21254] - History UI: Taking over 1 minute for initial page display
- [SPARK-21272] - SortMergeJoin LeftAnti does not update numOutputRows
- [SPARK-21300] - ExternalMapToCatalyst should null-check map key prior to converting to internal value.
- [SPARK-21306] - OneVsRest Conceals Columns That May Be Relevant To Underlying Classifier
- [SPARK-21312] - UnsafeRow writeToStream has incorrect offsetInByteArray calculation for non-zero offset
- [SPARK-21330] - Bad partitioning does not allow to read a JDBC table with extreme values on the partition column
- [SPARK-21332] - Incorrect result type inferred for some decimal expressions
- [SPARK-21333] - joinWith documents and analysis allow invalid join types
- [SPARK-21339] - spark-shell --packages option does not add jars to classpath on windows
- [SPARK-21342] - Fix DownloadCallback to work well with RetryingBlockFetcher
- [SPARK-21343] - Refine the document for spark.reducer.maxReqSizeShuffleToMem
- [SPARK-21344] - BinaryType comparison does signed byte array comparison
- [SPARK-21345] - SparkSessionBuilderSuite should clean up stopped sessions
- [SPARK-21369] - Don't use Scala classes in external shuffle service
- [SPARK-21374] - Reading globbed paths from S3 into DF doesn't work if filesystem caching is disabled
- [SPARK-21376] - Token is not renewed in yarn client process in cluster mode
- [SPARK-21383] - YARN can allocate too many executors
- [SPARK-21384] - Spark 2.2 + YARN without spark.yarn.jars / spark.yarn.archive fails
- [SPARK-21414] - Buffer in SlidingWindowFunctionFrame could be big though window is small
- [SPARK-21418] - NoSuchElementException: None.get in DataSourceScanExec with sun.io.serialization.extendedDebugInfo=true
- [SPARK-21441] - Incorrect Codegen in SortMergeJoinExec results in failures in some cases
- [SPARK-21445] - NotSerializableException thrown by UTF8String.IntWrapper
- [SPARK-21446] - [SQL] JDBC Postgres fetchsize parameter ignored again
- [SPARK-21447] - Spark history server fails to render compressed inprogress history file in some cases.
- [SPARK-21457] - ExternalCatalog.listPartitions should correctly handle partition values with dot
- [SPARK-21494] - Spark 2.2.0 AES encryption not working with External shuffle
- [SPARK-21503] - Spark UI shows incorrect task status for a killed Executor Process
- [SPARK-21508] - Documentation on 'Spark Streaming Custom Receivers' has error in example code
- [SPARK-21522] - Flaky test: LauncherServerSuite.testStreamFiltering
- [SPARK-21523] - Fix bug where the strong Wolfe line search `init` parameter loses effectiveness
- [SPARK-21546] - dropDuplicates with watermark yields RuntimeException due to binding failure
- [SPARK-21549] - Spark fails to complete job correctly in case of OutputFormats that do not write into HDFS
- [SPARK-21551] - pyspark's collect fails when getaddrinfo is too slow
- [SPARK-21555] - GROUP BY doesn't work with expressions with NVL and nested objects
- [SPARK-21563] - Race condition when serializing TaskDescriptions and adding jars
- [SPARK-21565] - aggregate query fails with watermark on eventTime but works with watermark on timestamp column generated by current_timestamp
- [SPARK-21580] - A bug with `Group by ordinal`
- [SPARK-21588] - SQLContext.getConf(key, null) should return null, but it throws NPE
- [SPARK-21593] - Fix broken configuration page
- [SPARK-21595] - introduction of spark.sql.windowExec.buffer.spill.threshold in spark 2.2 breaks existing workflow
- [SPARK-21596] - Audit the places calling HDFSMetadataLog.get
- [SPARK-21597] - Avg event time calculated in progress may be wrong
- [SPARK-21617] - ALTER TABLE...ADD COLUMNS broken in Hive 2.1 for DS tables
- [SPARK-21621] - Reset numRecordsWritten after DiskBlockObjectWriter.commitAndGet called
- [SPARK-21647] - SortMergeJoin failed when using CROSS
- [SPARK-21648] - Confusing assert failure in JDBC source when users misspell the option `partitionColumn`
- [SPARK-21656] - spark dynamic allocation should not idle timeout executors when there are enough tasks to run on them
- [SPARK-21681] - MLOR does not work correctly when featureStd contains zero
- [SPARK-21696] - State Store can't handle corrupted snapshots
- [SPARK-21714] - SparkSubmit in Yarn Client mode downloads remote files and then reuploads them again
- [SPARK-21721] - Memory leak in org.apache.spark.sql.hive.execution.InsertIntoHiveTable
- [SPARK-21723] - Can't write LibSVM - key not found: numFeatures
- [SPARK-21739] - timestamp partition would fail in v2.2.0
- [SPARK-21793] - Correct validateAndTransformSchema in GaussianMixture and AFTSurvivalRegression
- [SPARK-21798] - No config to replace deprecated SPARK_CLASSPATH config for launching daemons like History Server
- [SPARK-21818] - MultivariateOnlineSummarizer.variance generates negative results
- [SPARK-21826] - outer broadcast hash join should not throw NPE
- [SPARK-21834] - Incorrect executor request in case of dynamic allocation
- [SPARK-21890] - ObtainCredentials does not pass creds to addDelegationTokens
- [SPARK-21907] - NullPointerException in UnsafeExternalSorter.spill()
- [SPARK-21915] - Model 1 and Model 2 ParamMaps Missing
- [SPARK-21924] - Bug in Structured Streaming Documentation
- [SPARK-21928] - ClassNotFoundException for custom Kryo registrator class during serde in netty threads
- [SPARK-21946] - Flaky test: InMemoryCatalogedDDLSuite.`alter table: rename cached table`
- [SPARK-21950] - pyspark.sql.tests.SQLTests2 should stop SparkContext.
- [SPARK-21953] - Show both memory and disk bytes spilled if either is present
- [SPARK-21954] - JacksonUtils should verify MapType's value type instead of key type
- [SPARK-21980] - References in grouping functions should be indexed with resolver
- [SPARK-21985] - PySpark PairDeserializer is broken for double-zipped RDDs
- [SPARK-21991] - [LAUNCHER] LauncherServer acceptConnections thread sometimes dies if machine has very high load
- [SPARK-22052] - Incorrect Metric assigned in MetricsReporter.scala
- [SPARK-22071] - Improve release build scripts to check correct JAVA version is being used for build
- [SPARK-22076] - Expand.projections should not be a Stream
- [SPARK-22083] - When dropping multiple blocks to disk, Spark should release all locks on a failure
- [SPARK-22092] - Reallocation in OffHeapColumnVector.reserveInternal corrupts array data
- [SPARK-22094] - processAllAvailable should not block forever when a query is stopped
- [SPARK-22107] - "as" should be "alias" in python quick start documentation
- [SPARK-22109] - Reading tables partitioned by columns that look like timestamps has inconsistent schema inference
- [SPARK-22129] - Spark release scripts ignore the GPG_KEY and always sign with your default key
- [SPARK-22135] - metrics in spark-dispatcher not being registered properly
- [SPARK-22141] - Propagate empty relation before checking Cartesian products
- [SPARK-22143] - OffHeapColumnVector may leak memory
- [SPARK-22146] - FileNotFoundException while reading ORC files containing '%'
- [SPARK-22158] - convertMetastore should not ignore storage properties
- [SPARK-22167] - Spark Packaging w/R distro issues
- [SPARK-22178] - Refresh Table does not refresh the underlying tables of the persistent view
- [SPARK-22206] - gapply in R can't work on empty grouping columns
- [SPARK-22211] - LimitPushDown optimization for FullOuterJoin generates wrong results
- [SPARK-22218] - Spark shuffle service fails to update secret on application re-attempts
- [SPARK-22223] - ObjectHashAggregate introduces unnecessary shuffle
- [SPARK-22227] - DiskBlockManager.getAllBlocks could fail if called during shuffle
- [SPARK-22243] - streaming job failed to restart from checkpoint
- [SPARK-22249] - UnsupportedOperationException: empty.reduceLeft when caching a dataframe
- [SPARK-22252] - FileFormatWriter should respect the input query schema
- [SPARK-22271] - Describe results in "null" for the value of "mean" of a numeric variable
- [SPARK-22273] - Fix key/value schema field names in HashMapGenerators.
- [SPARK-22281] - Handle R method breaking signature changes
- [SPARK-22284] - Code of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
- [SPARK-22287] - SPARK_DAEMON_MEMORY not honored by MesosClusterDispatcher
- [SPARK-22291] - Postgresql UUID[] to Cassandra: Conversion Error
- [SPARK-22306] - INFER_AND_SAVE overwrites important metadata in Parquet Metastore table
- [SPARK-22319] - SparkSubmit calls getFileStatus before calling loginUserFromKeytab
- [SPARK-22327] - R CRAN check fails on non-latest branches
- [SPARK-22328] - ClosureCleaner misses referenced superclass fields, gives them null values
- [SPARK-22332] - NaiveBayes unit test occasionally fails
- [SPARK-22333] - ColumnReference should get higher priority than timeFunctionCall(CURRENT_DATE, CURRENT_TIMESTAMP)
- [SPARK-22355] - Dataset.collect is not threadsafe
- [SPARK-22356] - data source table should support overlapped columns between data and partition schema
- [SPARK-22377] - Maven nightly snapshot jenkins jobs are broken on multiple workers due to lsof
- [SPARK-22403] - StructuredKafkaWordCount example fails in YARN cluster mode
- [SPARK-22406] - pyspark version tag is wrong on PyPi
- [SPARK-22417] - createDataFrame from a pandas.DataFrame reads datetime64 values as longs
- [SPARK-22429] - Streaming checkpointing code does not retry after failure due to NullPointerException
- [SPARK-22442] - Schema generated by Product Encoder doesn't match case class field name when using non-standard characters
- [SPARK-22464] - <=> is not supported by Hive metastore partition predicate pushdown
- [SPARK-22469] - Accuracy problem in comparison with string and numeric
- [SPARK-22471] - SQLListener consumes too much memory, causing OutOfMemoryError
- [SPARK-22472] - Datasets generate random values for null primitive types
- [SPARK-22479] - SaveIntoDataSourceCommand logs jdbc credentials
- [SPARK-22488] - The view resolution in the SparkSession internal table() API
- [SPARK-22495] - Fix setup of SPARK_HOME variable on Windows
- [SPARK-22511] - Update maven central repo address
- [SPARK-22535] - PythonRunner.MonitorThread should give the task a little time to finish before killing the python worker
- [SPARK-22538] - SQLTransformer.transform(inputDataFrame) uncaches inputDataFrame
- [SPARK-22540] - HighlyCompressedMapStatus's avgSize is incorrect
- [SPARK-22544] - FileStreamSource should use its own hadoop conf to call globPathIfNecessary
- [SPARK-22548] - Incorrect nested AND expression pushed down to JDBC data source
- [SPARK-22591] - GenerateOrdering shouldn't change ctx.INPUT_ROW
- [SPARK-22755] - Expression (946-885)*1.0/946 < 0.1 and (946-885)*1.000/946 < 0.1 return different results
- [SPARK-23351] - checkpoint corruption in long running application
New Feature
- [SPARK-19606] - Support constraints in spark-dispatcher
Improvement
- [SPARK-18136] - Make PySpark pip install work on Windows
- [SPARK-19878] - Add Hive configuration when initializing Hive serde in InsertIntoHiveTable.scala
- [SPARK-21243] - Limit the number of maps in a single shuffle fetch
- [SPARK-21267] - Improvements to the Structured Streaming programming guide
- [SPARK-21321] - Spark is very verbose on shutdown, confusing users
- [SPARK-21434] - Add PySpark pip documentation
- [SPARK-21477] - Mark LocalTableScanExec's input data transient
- [SPARK-21538] - Attribute resolution inconsistency in Dataset API
- [SPARK-21667] - ConsoleSink should not fail streaming query with checkpointLocation option
- [SPARK-21901] - Define toString for StateOperatorProgress
- [SPARK-22043] - Python profile, show_profiles() and dump_profiles(), should throw an error with a better message
- [SPARK-22072] - Allow the same shell params to be used for all of the different steps in release-build
- [SPARK-22120] - TestHiveSparkSession.reset() should clean out Hive warehouse directory
- [SPARK-22138] - Allow retry during release-build
- [SPARK-22217] - ParquetFileFormat to support arbitrary OutputCommitters
- [SPARK-22294] - Reset spark.driver.bindAddress when starting a Checkpoint
- [SPARK-22315] - Check for version match between R package and JVM
Test
- [SPARK-21128] - Running R tests multiple times failed due to pre-existing "spark-warehouse" / "metastore_db"
- [SPARK-21464] - Minimize deprecation warnings caused by ProcessingTime class
- [SPARK-21663] - MapOutputTrackerSuite case test("remote fetch below max RPC message size") should call stop
- [SPARK-21693] - AppVeyor tests reach the time limit, 1.5 hours, sometimes in SparkR tests
- [SPARK-21936] - backward compatibility test framework for HiveExternalCatalog
- [SPARK-22140] - Add a test suite for TPCDS queries
- [SPARK-22161] - Add Impala-modified TPC-DS queries
- [SPARK-22595] - flaky test: CastSuite.SPARK-22500: cast for struct should not generate codes beyond 64KB
Task
- [SPARK-21366] - Add sql test for window functions
- [SPARK-21699] - Remove unused getTableOption in ExternalCatalog
Documentation
- [SPARK-21069] - Add rate source to programming guide
- [SPARK-21925] - Update trigger interval documentation in docs with behavior change in Spark 2.2
- [SPARK-21976] - Fix wrong doc about Mean Absolute Error
- [SPARK-22490] - PySpark doc has misleading string for SparkSession.builder
- [SPARK-22627] - Fix formatting of headers in configuration.html page