Release Notes - ASF JIRA

Release Notes - Spark - Version 2.4.8


Sub-task

[SPARK-24266] - Spark client terminates while driver is still running
[SPARK-27421] - RuntimeException when querying a view on a partitioned parquet table
[SPARK-30894] - The nullability of Size function should not depend on SQLConf.get
[SPARK-32247] - scipy installation fails with PyPy
[SPARK-33096] - Use LinkedHashMap instead of Map for newlyCreatedExecutors
[SPARK-33290] - REFRESH TABLE should invalidate cache even though the table itself may not be cached
[SPARK-33464] - Add/remove (un)necessary cache and restructure GitHub Actions yaml
[SPARK-33667] - Respect case sensitivity in V1 SHOW PARTITIONS
[SPARK-33670] - Verify the partition provider is Hive in v1 SHOW TABLE EXTENDED
[SPARK-33732] - Kubernetes integration tests don't work with Minikube 1.9+
[SPARK-33742] - Throw PartitionsAlreadyExistException from HiveExternalCatalog.createPartitions()
[SPARK-33788] - Throw NoSuchPartitionsException from HiveExternalCatalog.dropPartitions()
[SPARK-33911] - Update SQL migration guide about changes in HiveClientImpl
[SPARK-34407] - KubernetesClusterSchedulerBackend.stop should clean up K8s resources
[SPARK-34507] - Spark artefacts built against Scala 2.13 incorrectly depend on Scala 2.12

Bug

[SPARK-25271] - Creating a Parquet table whose columns are all null throws an exception
[SPARK-26625] - spark.redaction.regex should include oauthToken
[SPARK-26645] - CSV schema inference produces decimal(9,-1)
[SPARK-27575] - Spark overwrites the existing value of spark.yarn.dist.* instead of merging the values
[SPARK-27872] - Driver and executors use different service accounts, breaking pull secrets
[SPARK-29574] - Spark with user-provided Hadoop doesn't work on Kubernetes
[SPARK-30201] - HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT
[SPARK-30228] - Update zstd-jni to 1.4.4-3
[SPARK-31882] - DAG-viz is not rendered correctly with pagination.
[SPARK-32635] - When pyspark.sql.functions.lit() is used with a cached DataFrame, it returns a wrong result
[SPARK-32708] - Query optimization fails to reuse exchange with DataSourceV2
[SPARK-32715] - Broadcast block pieces may leak memory
[SPARK-32738] - Thread-safe endpoints may hang due to a fatal error
[SPARK-32794] - Rare corner case error in micro-batch engine with some stateful queries + no-data-batches + V1 streaming sources
[SPARK-32815] - Fix LibSVM data source loading error on file paths with glob metacharacters
[SPARK-32832] - Use CaseInsensitiveMap for DataStreamReader/Writer options
[SPARK-32836] - Fix DataStreamReaderWriterSuite to check writer options correctly
[SPARK-32845] - Add sinkParameter to check sink options robustly in DataStreamReaderWriterSuite
[SPARK-32865] - The Python section of the quickstart page doesn't display SPARK_VERSION correctly
[SPARK-32872] - BytesToBytesMap at MAX_CAPACITY exceeds growth threshold
[SPARK-32886] - '.../jobs/undefined' link from "Event Timeline" in jobs page
[SPARK-32898] - totalExecutorRunTimeMs is too big
[SPARK-32900] - UnsafeExternalSorter.SpillableIterator cannot spill when there are NULLs in the input and radix sorting is used.
[SPARK-32901] - UnsafeExternalSorter may cause a SparkOutOfMemoryError to be thrown while spilling
[SPARK-32908] - percentile_approx() returns incorrect results
[SPARK-32924] - Web UI sort on duration is wrong
[SPARK-32999] - TreeNode.nodeName should not throw malformed class name error
[SPARK-33094] - ORC format does not propagate Hadoop config from DS options to underlying HDFS file system
[SPARK-33101] - LibSVM format does not propagate Hadoop config from DS options to underlying HDFS file system
[SPARK-33131] - Fix grouping sets with a HAVING clause failing to resolve qualified column names
[SPARK-33136] - Handling nullability for complex types is broken during resolution of V2 write command
[SPARK-33183] - Bug in optimizer rule EliminateSorts
[SPARK-33217] - Set upper bound of Pandas and PyArrow version in GitHub Actions in branch-2.4
[SPARK-33230] - FileOutputWriter jobs have duplicate JobIDs if launched in the same second
[SPARK-33268] - Fix bugs for casting data from/to PythonUserDefinedType
[SPARK-33277] - Python/Pandas UDF right after off-heap vectorized reader could cause an executor crash
[SPARK-33292] - Make Literal ArrayBasedMapData string representation unambiguous
[SPARK-33313] - R/run-tests.sh is not compatible with testthat >= 3.0
[SPARK-33333] - Upgrade Jetty to 9.4.28.v20200408
[SPARK-33338] - GROUP BY using literal map should not fail
[SPARK-33339] - PySpark application will hang due to a non-Exception error
[SPARK-33372] - Fix InSet bucket pruning
[SPARK-33405] - Upgrade commons-compress to 1.20
[SPARK-33417] - Correct the behaviour of query filters in TPCDSQueryBenchmark
[SPARK-33472] - IllegalArgumentException when applying RemoveRedundantSorts before EnsureRequirements
[SPARK-33483] - Fix rat exclusion patterns and add a LICENSE
[SPARK-33588] - Partition spec in SHOW TABLE EXTENDED doesn't respect `spark.sql.caseSensitive`
[SPARK-33593] - Vectorized reader gets incorrect data with a binary partition value
[SPARK-33631] - Clean up `spark.core.connection.ack.wait.timeout` from `configuration.md`
[SPARK-33681] - Increase K8s IT timeout to 3 minutes
[SPARK-33725] - Upgrade snappy-java to 1.1.8.2
[SPARK-33726] - Duplicate field names cause wrong answers during aggregation
[SPARK-33733] - PullOutNondeterministic should check and collect deterministic field
[SPARK-33749] - Exclude target directory in pycodestyle and flake8
[SPARK-33756] - BytesToBytesMap's iterator hasNext method should be idempotent.
[SPARK-33757] - Fix the R dependencies build error on GitHub Actions and AppVeyor
[SPARK-33831] - Update Jetty to 9.4.34
[SPARK-34012] - Keep behavior consistent with the migration guide when conf `spark.sql.legacy.parser.havingWithoutGroupByAsWhere` is true
[SPARK-34125] - Make EventLoggingListener.codecMap thread-safe
[SPARK-34187] - Use available offset range obtained during polling when checking offset validation
[SPARK-34212] - For a Parquet table, after changing the precision and scale of a decimal type in Hive, Spark reads incorrect values
[SPARK-34229] - Avro should read decimal values with the file schema
[SPARK-34231] - AvroSuite has test failure when run from IDE due to bad loading of resource file
[SPARK-34260] - UnresolvedException when creating temp view twice
[SPARK-34268] - The Signature for ConcatWs in Spark SQL Docs Is Inconsistent with the Actual Behavior
[SPARK-34270] - Combine StateStoreMetrics should not override StateStoreCustomMetric
[SPARK-34273] - Do not reregister BlockManager when SparkContext is stopped
[SPARK-34318] - Dataset.colRegex should work with column names and qualifiers which contain newlines
[SPARK-34327] - Omit inlining passwords during build process.
[SPARK-34449] - Upgrade Jetty to fix CVE-2020-27218
[SPARK-34596] - NewInstance.doGenCode should not throw malformed class name error
[SPARK-34607] - NewInstance.resolved should not throw malformed class name error
[SPARK-34672] - Fix docker file for creating release
[SPARK-34696] - Fix CodegenInterpretedPlanTest to generate correct test cases
[SPARK-34703] - Fix pyspark test when using sort_values on Pandas
[SPARK-34719] - Fail if the view query has duplicate column names
[SPARK-34724] - Fix Interpreted evaluation by using getClass.getMethod instead of getDeclaredMethod
[SPARK-34726] - Fix collectToPython timeouts
[SPARK-34743] - ExpressionEncoderSuite should use deepEquals when we expect `array of array`
[SPARK-34774] - The `change-scala-version.sh` script does not replace the scala.version property correctly
[SPARK-34776] - Catalyst error on certain struct operations (Couldn't find _gen_alias_)
[SPARK-34811] - Redact fs.s3a.access.key like secret and token
[SPARK-34855] - SparkContext - avoid using local lazy val
[SPARK-34874] - Recover test reports for failed GA builds
[SPARK-34876] - Non-nullable aggregates can return NULL in a correlated subquery
[SPARK-34909] - conv() does not convert negative inputs to unsigned correctly
[SPARK-34939] - Throw fetch failure exception when unable to deserialize broadcasted map statuses
[SPARK-34963] - Nested column pruning fails to extract case-insensitive struct field from array
[SPARK-34988] - Upgrade Jetty for CVE-2021-28165
[SPARK-34994] - Fix git error when pushing the tag after release script succeeds
[SPARK-35080] - Correlated subqueries with equality predicates can return wrong results
[SPARK-35278] - Invoke should find the method with correct number of parameters
[SPARK-35288] - StaticInvoke should find the method without exact argument classes match
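SPARK-34909 concerns how `conv()` renders negative inputs when the target base is unsigned. The sketch below illustrates that reading, assuming the MySQL/Hive-style convention that a negative value is reinterpreted as its unsigned 64-bit two's-complement bit pattern when `to_base` is positive. `conv_unsigned` is a hypothetical helper, not Spark's implementation; it does not model the signed mode (negative `to_base`) or Spark's own overflow handling, and simply raises instead.

```python
_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
_MASK64 = (1 << 64) - 1  # all 64 bits set


def conv_unsigned(num: str, from_base: int, to_base: int) -> str:
    """Hypothetical sketch: convert `num` between bases, reading negatives as unsigned 64-bit."""
    if not (2 <= from_base <= 36 and 2 <= to_base <= 36):
        raise ValueError("this sketch only supports bases in [2, 36]")
    value = int(num, from_base)  # int() accepts a leading '-' sign
    if value < -(1 << 63) or value > _MASK64:
        raise OverflowError("value does not fit in 64 bits")
    value &= _MASK64  # negative -> its two's-complement bit pattern as an unsigned number
    if value == 0:
        return "0"
    out = []
    while value:
        value, rem = divmod(value, to_base)  # peel off the least significant digit
        out.append(_DIGITS[rem])
    return "".join(reversed(out))


print(conv_unsigned("-1", 10, 16))  # FFFFFFFFFFFFFFFF
print(conv_unsigned("100", 2, 10))  # 4
```

Masking with `_MASK64` is what turns -1 into sixteen `F` digits; non-negative inputs pass through unchanged, so only the negative case depends on the assumed unsigned convention.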

Improvement

[SPARK-31225] - Override `sql` method for OuterReference
[SPARK-31807] - Use Python 3 style in release-build.sh
[SPARK-32090] - UserDefinedType.equal() is not symmetric
[SPARK-33123] - Ignore `GitHub Action file` change in Amplab Jenkins
[SPARK-33156] - Upgrade GithubAction image from 18.04 to 20.04
[SPARK-33228] - Don't uncache data when replacing an existing view having the same plan
[SPARK-33535] - Export LANG as en_US.UTF-8 in the Jenkins test script
[SPARK-33675] - Add GitHub Action job to publish snapshot
[SPARK-34059] - Use for/foreach rather than map to make sure it executes eagerly
[SPARK-34118] - Replace filter and check for emptiness with exists or forall
[SPARK-34153] - Remove unused `getRawTable()` from `HiveExternalCatalog.alterPartitions()`
[SPARK-34275] - Replace filter and size with count
[SPARK-34310] - Replace map and flatten with flatMap
[SPARK-35227] - Replace Bintray with the new repository service for the spark-packages resolver in SparkSubmit
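SPARK-34059, SPARK-34118, SPARK-34275 and SPARK-34310 are refactorings of Scala collection code. As a rough illustration only, the same before/after shapes have direct Python analogues; every function below is hypothetical and none of it is Spark's code. Python 3's lazy `map` also shows why relying on `map` for side effects is fragile, which is the concern SPARK-34059 addresses.

```python
from itertools import chain


def exists_before(xs, pred):
    # filter, then check for emptiness: builds a whole intermediate list
    return len([x for x in xs if pred(x)]) > 0


def exists_after(xs, pred):
    # any() stops at the first match (analogue of Scala's exists)
    return any(pred(x) for x in xs)


def forall_after(xs, pred):
    # all() stops at the first failure (analogue of Scala's forall)
    return all(pred(x) for x in xs)


def count_before(xs, pred):
    # filter, then take the size
    return len(list(filter(pred, xs)))


def count_after(xs, pred):
    # count matches without materializing them (analogue of Scala's count)
    return sum(1 for x in xs if pred(x))


def flat_map_before(xs, f):
    # map, then flatten
    return list(chain.from_iterable([f(x) for x in xs]))


def flat_map_after(xs, f):
    # single pass (analogue of Scala's flatMap)
    return [y for x in xs for y in f(x)]


def lazy_map_pitfall(xs):
    seen = []
    map(seen.append, xs)  # Python 3 map is lazy: the appends never run
    return seen


def eager_loop(xs):
    seen = []
    for x in xs:  # a plain loop runs its side effects immediately
        seen.append(x)
    return seen
```

For example, `lazy_map_pitfall([1, 2])` returns an empty list while `eager_loop([1, 2])` returns `[1, 2]`.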

Test

[SPARK-24931] - Recover lint-r job in GitHub Actions workflow
[SPARK-26646] - Flaky test: pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
[SPARK-33051] - Use setup-r to install R in GitHub Actions build
[SPARK-33770] - Test failures: ALTER TABLE .. DROP PARTITION tries to delete files out of partition path
[SPARK-33869] - Have a separate metastore dir for each PySpark test process

Task

[SPARK-35233] - Switch from Bintray to scala.jfrog.io for SBT download in branch 2.4 and 3.0

Documentation

[SPARK-32306] - `approx_percentile` in Spark SQL gives incorrect results
[SPARK-32888] - Reading a parallelized RDD with two identical records via spark.read.csv results in a zero-count DataFrame
[SPARK-33585] - The comment for SQLContext.tables() doesn't mention the `database` column
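As we understand the documentation change for SPARK-32306, `approx_percentile` does not interpolate like a textbook median; it returns a value that actually occurs in the column. A small exact analogue is the nearest-rank percentile below. `nearest_rank_percentile` is a hypothetical helper: Spark computes an approximation with a configurable accuracy, so on large inputs its answer may differ from this exact version.

```python
import math
import statistics
from fractions import Fraction


def nearest_rank_percentile(values, p):
    """Exact nearest-rank percentile: always returns an element of `values`."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    ordered = sorted(values)
    # Use a rational p so that e.g. 0.3 * 10 is exactly 3, not 3.0000000000000004.
    exact_p = Fraction(p).limit_denominator(10**9)
    rank = max(1, math.ceil(exact_p * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


data = [0, 1, 2, 10]
print(nearest_rank_percentile(data, 0.5))  # 1
print(statistics.median(data))             # 1.5 (interpolated, not a data value)
```

The contrast between `1` and `1.5` is the kind of discrepancy that reads as an "incorrect result" if one expects interpolation.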
