Sub-task
- [SPARK-32249] - Run Github Actions builds in other branches as well
- [SPARK-32367] - Fix typo of parameter in KubernetesTestComponents
- [SPARK-32695] - Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
Bug
- [SPARK-28818] - FrequentItems applies an incorrect schema to the resulting dataframe when nulls are present
- [SPARK-31511] - Make BytesToBytesMap iterator() thread-safe
- [SPARK-31703] - Changes made by SPARK-26985 break reading parquet files correctly in BigEndian architectures (AIX + LinuxPPC64)
- [SPARK-31854] - Different results of query execution with wholestage codegen on and off
- [SPARK-31871] - Display the canvas element icon for sorting column
- [SPARK-31903] - toPandas with Arrow enabled doesn't show metrics in Query UI.
- [SPARK-31911] - Using S3A staging committer, pending uploads are committed more than once and listed incorrectly in _SUCCESS data
- [SPARK-31918] - SparkR CRAN check gives a warning with R 4.0.0 on OSX
- [SPARK-31923] - Event log cannot be generated when some internal accumulators use unexpected types
- [SPARK-31935] - Hadoop file system config should be effective in data source options
- [SPARK-31941] - Handling the exception in SparkUI for getSparkUser method
- [SPARK-31967] - Loading jobs UI page takes 40 seconds
- [SPARK-31968] - write.partitionBy() creates duplicate subdirectories when user provides duplicate columns
- [SPARK-31980] - Spark sequence() fails if start and end of range are identical dates
- [SPARK-31997] - Should drop test_udtf table when SingleSessionSuite completed
- [SPARK-32000] - Fix the flaky testcase for partially launched task in barrier-mode.
- [SPARK-32003] - Shuffle files for lost executor are not unregistered if fetch failure occurs after executor is lost
- [SPARK-32024] - Disk usage tracker went negative in HistoryServerDiskManager
- [SPARK-32028] - App id link in history summary page points to wrong application attempt
- [SPARK-32034] - Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown
- [SPARK-32035] - Inconsistent AWS environment variables in documentation
- [SPARK-32044] - [SS] 2.4 Kafka continuous processing prints misleading initial offsets log
- [SPARK-32098] - Use iloc for positional slicing instead of direct slicing in createDataFrame with Arrow
- [SPARK-32115] - Incorrect results for SUBSTRING when overflow
- [SPARK-32131] - Fix AnalysisException messages at UNION/INTERSECT/EXCEPT/MINUS operations
- [SPARK-32167] - nullability of GetArrayStructFields is incorrect
- [SPARK-32214] - The type conversion function generated in makeFromJava for "other" type uses a wrong variable.
- [SPARK-32238] - Use Utils.getSimpleName to avoid hitting Malformed class name in ScalaUDF
- [SPARK-32280] - AnalysisException thrown when query contains several JOINs
- [SPARK-32300] - toPandas with no partitions should work
- [SPARK-32344] - Unevaluable expr is set to FIRST/LAST ignoreNullsExpr in distinct aggregates
- [SPARK-32364] - Use CaseInsensitiveMap for DataFrameReader/Writer options
- [SPARK-32372] - "Resolved attribute(s) XXX missing" after dedup conflict references
- [SPARK-32377] - CaseInsensitiveMap should be deterministic for addition
- [SPARK-32379] - Docker-based Spark release script should use correct CRAN repo.
- [SPARK-32556] - Fix release script to URI-encode the user-provided passwords.
- [SPARK-32609] - Incorrect exchange reuse with DataSourceV2
- [SPARK-32625] - Log error message when falling back to interpreter mode
- [SPARK-32672] - Data corruption in some cached compressed boolean columns
- [SPARK-32693] - Compare two dataframes with same schema except nullable property
- [SPARK-32771] - The example of expressions.Aggregator in Javadoc / Scaladoc is wrong
- [SPARK-32810] - CSV/JSON data sources should avoid globbing paths when inferring schema
- [SPARK-32812] - Run tests script for Python fails in certain environments
Improvement
- [SPARK-31860] - Only push release tags on success
- [SPARK-31889] - Docker release script does not allocate enough memory to reliably publish
- [SPARK-31954] - Delete duplicate test cases in HiveQuerySuite
- [SPARK-32073] - Drop R < 3.5 support
- [SPARK-32089] - Upgrade R version to 4.0.2 in the release Dockerfile
- [SPARK-32397] - Snapshot artifacts can have differing timestamps, making it hard to consume
- [SPARK-32428] - [EXAMPLES] Make BinaryClassificationMetricsExample consistently print the metrics on driver's stdout
- [SPARK-32560] - improve exception message
Test
- [SPARK-31966] - Flaky test: pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
- [SPARK-32318] - Add a test case to EliminateSortsSuite for protecting ORDER BY in DISTRIBUTE BY
Documentation
- [SPARK-32674] - Add suggestion for parallel directory listing in tuning doc