Release Notes - Spark - Version 3.3.3

Bug

  • [SPARK-37829] - An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
  • [SPARK-39399] - proxy-user not working for Spark on k8s in cluster deploy mode
  • [SPARK-39696] - Uncaught exception in thread executor-heartbeater java.util.ConcurrentModificationException: mutation occurred during iteration
  • [SPARK-41741] - [SQL] ParquetFilters StringStartsWith push down matching string does not use UTF-8
  • [SPARK-41952] - Upgrade Parquet to fix off-heap memory leaks in Zstd codec
  • [SPARK-41958] - Disallow arbitrary custom classpath with proxy user in cluster mode
  • [SPARK-42286] - Fix internal error for valid CASE WHEN expression with CAST when inserting into a table
  • [SPARK-42445] - Fix SparkR install.spark function
  • [SPARK-42462] - Prevent `docker-image-tool.sh` from publishing OCI manifests
  • [SPARK-42473] - An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL
  • [SPARK-42478] - Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory
  • [SPARK-42516] - Non-captured session time zone in view creation
  • [SPARK-42553] - NonReserved keyword "interval" can't be column name
  • [SPARK-42596] - [YARN] OMP_NUM_THREADS not set to number of executor cores by default
  • [SPARK-42635] - Several counter-intuitive behaviours in the TimestampAdd expression
  • [SPARK-42649] - Remove the standard Apache License header from the top of third-party source files
  • [SPARK-42673] - Make build/mvn build Spark only with the verified maven version
  • [SPARK-42697] - /api/v1/applications returns 0 for duration
  • [SPARK-42784] - Fix the problem of incomplete creation of subdirectories in push merged localDir
  • [SPARK-42785] - [K8S][Core] spark-submit without --deploy-mode fails with an NPE on Kubernetes
  • [SPARK-42799] - Update SBT build `xercesImpl` version to match with pom.xml
  • [SPARK-42906] - Replace a starting digit with `x` in resource name prefix
  • [SPARK-42922] - Use SecureRandom, instead of Random in security sensitive contexts
  • [SPARK-42937] - Join with subquery in condition can fail with wholestage codegen and adaptive execution disabled
  • [SPARK-42967] - Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is cancelled
  • [SPARK-43004] - vendor==vendor typo in ResourceRequest.equals()
  • [SPARK-43005] - `v is v >= 0` typo in pyspark/pandas/config.py
  • [SPARK-43050] - Fix construct aggregate expressions by replacing grouping functions
  • [SPARK-43069] - Use `sbt-eclipse` instead of `sbteclipse-plugin`
  • [SPARK-43113] - Codegen error when full outer join's bound condition has multiple references to the same stream-side column
  • [SPARK-43158] - Set upperbound of pandas version in binder integrations
  • [SPARK-43240] - df.describe() method may return wrong result if the last RDD is RDD[UnsafeRow]
  • [SPARK-43293] - __qualified_access_only should be ignored in normal columns
  • [SPARK-43337] - Asc/desc arrow icons for sorting a column are not displayed in the table column
  • [SPARK-43398] - Executor timeout should be max of idleTimeout rddTimeout shuffleTimeout
  • [SPARK-43541] - Incorrect column resolution on FULL OUTER JOIN with USING
  • [SPARK-43589] - Fix `cannotBroadcastTableOverMaxTableBytesError` to use `bytesToString`
  • [SPARK-43718] - References to a specific side's key in a USING join can have wrong nullability
  • [SPARK-43719] - Handle missing row.excludedInStages field
  • [SPARK-43956] - Fix the bug where the column's SQL is not displayed for Percentile[Cont|Disc]
  • [SPARK-43976] - Handle the case where modifiedConfigs doesn't exist in event logs
  • [SPARK-44040] - Incorrect result after count distinct
  • [SPARK-44134] - Can't set resources (GPU/FPGA) to 0 when they are set to positive value in spark-defaults.conf
  • [SPARK-44142] - Utility to convert python types to spark types compares Python "type" object rather than user's "tpe" for categorical data types
  • [SPARK-44158] - Remove unused `spark.kubernetes.executor.lostCheck.maxAttempts`
  • [SPARK-44184] - Remove a wrong doc about ARROW_PRE_0_15_IPC_FORMAT
  • [SPARK-44215] - Client receives zero number of chunks in merge meta response which doesn't trigger fallback to unmerged blocks
  • [SPARK-44241] - Setting io.connectionTimeout/connectionCreationTimeout to zero or a negative value causes executors to be incessantly created and destroyed
  • [SPARK-44251] - Potential for incorrect results or NPE when full outer USING join has null key value
  • [SPARK-44588] - Migrated shuffle blocks are encrypted multiple times when io.encryption is enabled
  • [SPARK-44653] - non-trivial DataFrame unions should not break caching

Improvement

  • [SPARK-40376] - `np.bool` will be deprecated
  • [SPARK-41660] - only propagate metadata columns if they are used
  • [SPARK-42647] - Remove aliases from deprecated numpy data types
  • [SPARK-42934] - Testing OrcEncryptionSuite using maven is always skipped
  • [SPARK-43395] - Exclude macOS tar extended metadata in make-distribution.sh

Test

  • [SPARK-43587] - Run HealthTrackerIntegrationSuite in a dedicated JVM

Documentation

  • [SPARK-43751] - Document the unbase64 behavior change
