Release Notes - Spark - Version 3.0.3

Sub-task

  • [SPARK-33976] - Add a dedicated SQL document page for the TRANSFORM-related functionality
  • [SPARK-34507] - Spark artefacts built against Scala 2.13 incorrectly depend on Scala 2.12
  • [SPARK-34543] - Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION
  • [SPARK-35093] - AQE columnar mismatch on exchange reuse
  • [SPARK-35159] - Extract documentation of the Hive format
  • [SPARK-35168] - SET mapred.reduce.tasks should map to spark.sql.shuffle.partitions, not spark.sql.adaptive.coalescePartitions.initialPartitionNum (see the sketch after this list)
  • [SPARK-35695] - QueryExecutionListener does not see any observed metrics fired before persist/cache
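
For context on SPARK-35168, here is a minimal sketch of the intended mapping, assuming a local SparkSession; the object and app names are illustrative, not from the ticket:

    import org.apache.spark.sql.SparkSession

    object ReduceTasksAliasDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[2]")
          .appName("reduce-tasks-alias")
          .getOrCreate()

        // Setting the deprecated Hive property logs a deprecation warning and
        // should translate to spark.sql.shuffle.partitions, not to the AQE
        // initial partition number.
        spark.sql("SET mapred.reduce.tasks=7")
        println(spark.conf.get("spark.sql.shuffle.partitions")) // expected: 7

        spark.stop()
      }
    }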

Bug

  • [SPARK-32924] - Web UI sort on duration is wrong
  • [SPARK-33482] - V2 Datasources that extend FileScan preclude exchange reuse
  • [SPARK-33504] - The application log in the Spark history server contains sensitive attributes, such as passwords, that should be redacted rather than stored in plain text
  • [SPARK-34392] - Invalid ID for offset-based ZoneId since Spark 3.0
  • [SPARK-34421] - Custom functions can't be used in temporary views with CTEs
  • [SPARK-34449] - Upgrade Jetty to fix CVE-2020-27218
  • [SPARK-34534] - New protocol FetchShuffleBlocks in OneForOneBlockFetcher leads to data loss or correctness issues
  • [SPARK-34545] - PySpark Python UDFs return inconsistent results when applying two UDFs with different return types to two columns together
  • [SPARK-34556] - Checking duplicate static partition columns doesn't respect the case-sensitivity conf
  • [SPARK-34596] - NewInstance.doGenCode should not throw malformed class name error
  • [SPARK-34607] - NewInstance.resolved should not throw malformed class name error
  • [SPARK-34676] - TableCapabilityCheckSuite should not inherit all tests from AnalysisSuite
  • [SPARK-34696] - Fix CodegenInterpretedPlanTest to generate correct test cases
  • [SPARK-34697] - Allow DESCRIBE FUNCTION and SHOW FUNCTIONS to explain the || (string concatenation) operator
  • [SPARK-34719] - Fail if the view query has duplicated column names
  • [SPARK-34723] - Correct parameter type for subexpression elimination under whole-stage codegen
  • [SPARK-34724] - Fix Interpreted evaluation by using getClass.getMethod instead of getDeclaredMethod
  • [SPARK-34743] - ExpressionEncoderSuite should use deepEquals when we expect `array of array`
  • [SPARK-34747] - Add virtual operators to the built-in function document.
  • [SPARK-34756] - Fix FileScan equality check
  • [SPARK-34760] - Running JavaSQLDataSourceExample fails with an exception in runBasicDataSourceExample()
  • [SPARK-34763] - col(), $"<name>" and df("name") should handle quoted column names properly (see the first sketch after this list)
  • [SPARK-34768] - Respect the default input buffer size in Univocity
  • [SPARK-34772] - RebaseDateTime loadRebaseRecords should use Spark classloader instead of context
  • [SPARK-34774] - The `change-scala-version.sh` script does not replace the scala.version property correctly
  • [SPARK-34776] - Catalyst error on certain struct operations (Couldn't find _gen_alias_)
  • [SPARK-34794] - Nested higher-order functions broken in DSL
  • [SPARK-34798] - Fix incorrect join condition
  • [SPARK-34811] - Redact fs.s3a.access.key like secret and token
  • [SPARK-34832] - ExternalAppendOnlyUnsafeRowArrayBenchmark can't run with spark-submit
  • [SPARK-34834] - There is a potential Netty memory leak in TransportResponseHandler.
  • [SPARK-34845] - ProcfsMetricsGetter.computeAllMetrics may return partial metrics when metrics for some child PIDs are missing
  • [SPARK-34874] - Recover test reports for failed GA builds
  • [SPARK-34876] - Non-nullable aggregates can return NULL in a correlated subquery
  • [SPARK-34897] - Support reconcile schemas based on index after nested column pruning
  • [SPARK-34900] - Some `spark-submit` commands used to run benchmarks in the user's guide are wrong
  • [SPARK-34909] - conv() does not convert negative inputs to unsigned correctly (see the second sketch after this list)
  • [SPARK-34926] - PartitionUtils.getPathFragment should handle null value
  • [SPARK-34933] - Remove from the documentation the description that || and && can be used as logical operators (see the third sketch after this list)
  • [SPARK-34939] - Throw fetch failure exception when unable to deserialize broadcasted map statuses
  • [SPARK-34963] - Nested column pruning fails to extract case-insensitive struct field from array
  • [SPARK-34988] - Upgrade Jetty for CVE-2021-28165
  • [SPARK-35014] - A foldable expression could not be replaced by an AttributeReference
  • [SPARK-35080] - Correlated subqueries with equality predicates can return wrong results
  • [SPARK-35096] - foreachBatch throws ArrayIndexOutOfBoundsException if the schema is case-insensitive
  • [SPARK-35106] - HadoopMapReduceCommitProtocol performs bad rename when dynamic partition overwrite is used
  • [SPARK-35142] - `OneVsRest` classifier uses incorrect data type for `rawPrediction` column
  • [SPARK-35178] - Maven auto-download failing
  • [SPARK-35210] - Upgrade Jetty to 9.4.40 to fix ERR_CONNECTION_RESET issue
  • [SPARK-35244] - Invoke should throw the original exception
  • [SPARK-35278] - Invoke should find the method with correct number of parameters
  • [SPARK-35288] - StaticInvoke should find the method without exact argument classes match
  • [SPARK-35296] - Dataset.observe fails with an assertion
  • [SPARK-35393] - PIP packaging test is skipped in GitHub Actions build
  • [SPARK-35425] - Pin jinja2 in spark-rm/Dockerfile and add as a required dependency in the release README.md
  • [SPARK-35458] - ARM CI failed: failed to validate maven sha512
  • [SPARK-35463] - Skip checksum checking on a system that doesn't have `shasum`
  • [SPARK-35482] - The case-sensitive block manager port key should be used in BasicExecutorFeatureStep
  • [SPARK-35493] - spark.blockManager.port does not work for driver pod
  • [SPARK-35566] - Fix number of output rows for StateStoreRestoreExec
  • [SPARK-35573] - Make SparkR tests pass with R 4.1+
  • [SPARK-35610] - Memory leak in Spark interpreter
  • [SPARK-35653] - [SQL] CatalystToExternalMap interpreted path fails for Map with case classes as keys or values
  • [SPARK-35659] - Avoid write null to StateStore
  • [SPARK-35673] - Spark fails on unrecognized hint in subquery
  • [SPARK-35679] - Overflow on converting valid Timestamp to Microseconds
  • [SPARK-38141] - NoSuchMethodError: org.json4s.JsonDSL$JsonAssoc org.json4s.JsonDSL$.pair2Assoc
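
First sketch, for SPARK-34763: a column whose name contains a dot must be backquoted when referenced, and all three reference styles should resolve it. A minimal check, assuming a local SparkSession; the data and names are illustrative:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("quoted-cols")
      .getOrCreate()
    import spark.implicits._

    // Without backquotes, "a.b" would be parsed as field "b" of column "a".
    val df = Seq((1, 2)).toDF("x", "a.b")
    df.select(col("`a.b`"), $"`a.b`", df("`a.b`")).show()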
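
Second sketch, for SPARK-34909: conv() is expected to treat negative inputs as unsigned 64-bit values, matching Hive's behavior (same illustrative session as above):

    // -1 interpreted as an unsigned 64-bit value is 2^64 - 1.
    spark.sql("SELECT conv('-1', 10, 16) AS hex").show() // expected: FFFFFFFFFFFFFFFF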
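
Third sketch, for SPARK-34697 and SPARK-34933: in Spark SQL, || is the string concatenation operator rather than a logical OR; logical disjunction is spelled OR (same illustrative session as above):

    spark.sql("SELECT 'foo' || 'bar' AS s").show() // s = "foobar"
    spark.sql("SELECT true OR false AS b").show()  // b = true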

Improvement

  • [SPARK-34683] - Update the documentation to explain the usage of LIST FILE and LIST JAR when they take multiple file names
  • [SPARK-34915] - Cache Maven, SBT and Scala in all jobs that use them
  • [SPARK-34922] - Use better CBO cost function
  • [SPARK-34940] - Fix minor unit test in BasicWriteTaskStatsTrackerSuite
  • [SPARK-35002] - Fix the java.net.BindException when testing with GitHub Actions
  • [SPARK-35045] - Add an internal option to control the input buffer in Univocity
  • [SPARK-35127] - When we switch between different stage-detail pages, the entry item in the newly-opened page may be blank.
  • [SPARK-35227] - Replace Bintray with the new repository service for the spark-packages resolver in SparkSubmit
  • [SPARK-35358] - Set maximum Java heap used for release build
  • [SPARK-35373] - Verify checksums of downloaded artifacts in build/mvn
  • [SPARK-35687] - PythonUDFSuite: move assume into its methods
  • [SPARK-35714] - Fix a deadlock during executor shutdown

Test

  • [SPARK-24931] - Recover lint-r job in GitHub Actions workflow
  • [SPARK-34424] - HiveOrcHadoopFsRelationSuite fails with seed 610710213676
  • [SPARK-34604] - Flaky test: TaskContextTestsWithWorkerReuse.test_task_context_correct_with_python_worker_reuse
  • [SPARK-34610] - Fix Python UDF used in GroupedAggPandasUDFTests.
  • [SPARK-34795] - Add a new job in GitHub Actions to check the output of TPC-DS queries
  • [SPARK-34951] - Recover Python linter (Sphinx build) in GitHub Actions
  • [SPARK-35192] - Port minimal TPC-DS datagen code from databricks/spark-sql-perf
  • [SPARK-35293] - Use the newer dsdgen for TPCDSQueryTestSuite
  • [SPARK-35327] - Filter out the TPC-DS queries that can cause flaky test results
  • [SPARK-35413] - Use the SHA of the latest commit when checking out databricks/tpcds-kit

Task

  • [SPARK-34970] - Redact map-type options in the output of explain()
  • [SPARK-35233] - Switch from Bintray to scala.jfrog.io for SBT download in branches 2.4 and 3.0
  • [SPARK-35495] - Change SparkR maintainer for CRAN

Documentation

  • [SPARK-35405] - Submitting Applications documentation has outdated information about K8s client mode support
