Release Notes - Spark - Version 1.6.2 - HTML format

Sub-task

[SPARK-15723] - SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name

Bug

[SPARK-8428] - TimSort Comparison method violates its general contract with CLUSTER BY
[SPARK-10722] - Uncaught exception: RDDBlockId not found in driver-heartbeater
[SPARK-11327] - spark-dispatcher doesn't pass along some spark properties
[SPARK-11507] - Error thrown when using BlockMatrix.add
[SPARK-12655] - GraphX does not unpersist RDDs
[SPARK-12712] - test-dependencies.sh script fails when run against empty .m2 cache
[SPARK-12941] - Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype
[SPARK-13023] - Check for presence of 'root' module after computing test_modules, not changed_modules
[SPARK-13207] - _SUCCESS should not break partition discovery
[SPARK-13227] - Risky apply() in OpenHashMap
[SPARK-13242] - Moderately complex `when` expression causes code generation failure
[SPARK-13327] - colnames()<- allows invalid column names
[SPARK-13352] - BlockFetch does not scale well on large block
[SPARK-13444] - QuantileDiscretizer chooses bad splits on large DataFrames
[SPARK-13522] - Executor should kill itself when it's unable to heartbeat to the driver more than N times
[SPARK-13566] - Deadlock between MemoryStore and BlockManager
[SPARK-13622] - Issue creating LevelDB file for YARN shuffle service if URI is used in yarn.nodemanager.local-dirs
[SPARK-13631] - getPreferredLocations race condition in Spark 1.6.0?
[SPARK-13642] - Properly handle signal kill of ApplicationMaster
[SPARK-13648] - org.apache.spark.sql.hive.client.VersionsSuite fails with NoClassDefFoundError on IBM JDK
[SPARK-13652] - TransportClient.sendRpcSync returns wrong results
[SPARK-13697] - TransformFunctionSerializer.loads doesn't restore the function's module name if it's '__main__'
[SPARK-13705] - UpdateStateByKey Operation documentation incorrectly refers to StatefulNetworkWordCount
[SPARK-13711] - Apache Spark driver stopping JVM when master not available
[SPARK-13755] - Escape quotes in SQL plan visualization node labels
[SPARK-13772] - DataType mismatch about decimal
[SPARK-13803] - Standalone master does not balance cluster-mode drivers across workers
[SPARK-13806] - SQL round() produces incorrect results for negative values
[SPARK-13845] - BlockStatus and StreamBlockId keep growing, resulting in driver OOM
[SPARK-13850] - TimSort Comparison method violates its general contract
[SPARK-13901] - Wrong logDebug information when jumping to the next locality level
[SPARK-13958] - Executor OOM due to unbounded growth of pointer array in Sorter
[SPARK-14006] - Builds of 1.6 branch fail R style check
[SPARK-14074] - Use fixed version of install_github in SparkR build
[SPARK-14138] - Generated SpecificColumnarIterator code can exceed JVM size limit for cached DataFrames
[SPARK-14159] - StringIndexerModel sets output column metadata incorrectly
[SPARK-14187] - Incorrect use of binarysearch in SparseMatrix
[SPARK-14204] - [SQL] Failure to register URL-derived JDBC driver on executors in cluster mode
[SPARK-14219] - Fix `pickRandomVertex` not to fall into infinite loops for graphs with one vertex
[SPARK-14232] - Event timeline on job page doesn't show if an executor is removed with multiple line reason
[SPARK-14243] - updatedBlockStatuses does not update correctly when removing blocks
[SPARK-14261] - Memory leak in Spark Thrift Server
[SPARK-14298] - LDA should support disabling checkpointing
[SPARK-14322] - Use treeAggregate instead of reduce in OnlineLDAOptimizer
[SPARK-14357] - Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure
[SPARK-14363] - Executor OOM due to a memory leak in Sorter
[SPARK-14368] - Support spark.python.worker.memory with upper-case unit
[SPARK-14454] - Better exception handling while marking tasks as failed
[SPARK-14468] - Always enable OutputCommitCoordinator
[SPARK-14495] - Distinct aggregation cannot be used in the having clause
[SPARK-14563] - SQLTransformer.transformSchema is not implemented correctly
[SPARK-14665] - PySpark StopWordsRemover default stopwords are Java object
[SPARK-14671] - Pipeline.setStages needs to handle Array non-covariance
[SPARK-14679] - UI DAG visualization causes OOM generating data
[SPARK-14739] - Vectors.parse doesn't handle dense vectors of size 0 and sparse vectors with no indices
[SPARK-14757] - Incorrect behavior of Join operation in Spark SQL JOIN: "false" in the left table is joined to "null" on the right table
[SPARK-14915] - Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job to never complete
[SPARK-14965] - StructType throws exception for missing field
[SPARK-15165] - Codegen can break because toCommentSafeString is not actually safe
[SPARK-15209] - Web UI's timeline visualizations fail to render if descriptions contain single quotes
[SPARK-15260] - UnifiedMemoryManager could be in bad state if any exception happens while evicting blocks
[SPARK-15262] - Race condition in killing an executor and reregistering an executor
[SPARK-15528] - conv function returns inconsistent result for the same data
[SPARK-15601] - CircularBuffer's toString() to print only the contents written if buffer isn't full
[SPARK-15736] - Gracefully handle loss of DiskStore files
[SPARK-15754] - org.apache.spark.deploy.yarn.Client changes the credential of current user
[SPARK-15892] - Incorrectly merged AFTAggregator with zero total count
[SPARK-15975] - Improper Popen.wait() return code handling in dev/run-tests
[SPARK-16017] - YarnClientSchedulerBackend now registers backends as IPs instead of Hostnames which causes all tasks to run with RACK_LOCAL locality.
[SPARK-16035] - The SparseVector parser fails checking for valid end parenthesis
[SPARK-16086] - Python UDF fails when there are no arguments
[SPARK-16466] - names() function allows creation of a column name containing "-"; filter() function subsequently fails
[SPARK-16709] - Task with failed commit will retry infinitely when speculation is set to true

New Feature

[SPARK-11515] - QuantileDiscretizer should take random seed
[SPARK-13465] - Add a task failure listener to TaskContext

Improvement

[SPARK-13599] - Groovy-all ends up in spark-assembly if hive profile set
[SPARK-13601] - Invoke task failure callbacks before calling outputstream.close()
[SPARK-13663] - Upgrade Snappy Java to 1.1.2.1
[SPARK-13810] - Add Port Configuration Suggestions on Bind Exceptions
[SPARK-14058] - Incorrect docstring in Window.orderBy
[SPARK-14107] - PySpark spark.ml GBT algs need seed Param
[SPARK-14149] - Log exceptions in tryOrIOException
[SPARK-14242] - Avoid too many copies in network when a network frame is large
[SPARK-14787] - Upgrade Joda-Time library from 2.9 to 2.9.3
[SPARK-15205] - Codegen can compile the same source code more than twice
[SPARK-15827] - Publish Spark's forked sbt-pom-reader to Maven Central
[SPARK-18391] - Openstack deployment scenarios

Documentation

[SPARK-14618] - RegressionEvaluator doc out of date
[SPARK-15223] - spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
