### Development Setup Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

Example properties for a typical development environment. This configuration enables configuration logging at startup, watches all namespaces, and sets the reconciler parallelism and intervals.

```properties
spark.logConf=true
spark.kubernetes.operator.watchedNamespaces=*
spark.kubernetes.operator.reconciler.parallelism=10
spark.kubernetes.operator.reconciler.intervalSeconds=30
spark.kubernetes.operator.kubernetes.client.metricsEnabled=true
spark.kubernetes.operator.josdkMetrics.enabled=true
spark.kubernetes.operator.periodicGC.intervalSeconds=0
```

--------------------------------

### Production Setup Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

Example properties for a production environment. This configuration disables configuration logging, restricts the watched namespaces to an explicit list, and increases parallelism and timeouts for stability.

```properties
spark.logConf=false
spark.kubernetes.operator.watchedNamespaces=default,spark-prod,data-pipeline
spark.kubernetes.operator.reconciler.parallelism=100
spark.kubernetes.operator.reconciler.intervalSeconds=300
spark.kubernetes.operator.reconciler.foregroundRequestTimeoutSeconds=120
spark.kubernetes.operator.api.retryMaxAttempts=20
spark.kubernetes.operator.reconciler.trimStateTransitionHistoryEnabled=true
spark.kubernetes.operator.leaderElection.enabled=true
spark.kubernetes.operator.periodicGC.intervalSeconds=3600
spark.kubernetes.operator.dynamicConfig.enabled=true
```

--------------------------------

### High-Availability Multi-Region Setup Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

Example properties for a high-availability multi-region setup. This configuration emphasizes leader election, high parallelism, and extended timeouts for robust operation across regions.
```properties
spark.kubernetes.operator.leaderElection.enabled=true
spark.kubernetes.operator.reconciler.parallelism=200
spark.kubernetes.operator.informer.cacheSyncTimeoutSeconds=60
spark.kubernetes.operator.reconciler.terminationTimeoutSeconds=120
spark.kubernetes.operator.reconciler.foregroundRequestTimeoutSeconds=180
spark.kubernetes.operator.api.retryMaxAttempts=25
```

--------------------------------

### Install Prometheus with Helm

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/configuration.md

Install Prometheus using its official Helm chart to scrape metrics.

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus
```

--------------------------------

### Install Apache YuniKorn Scheduler

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/README.md

Install the latest version of YuniKorn using Helm. Ensure the admission controller is disabled if not needed.

```bash
helm repo add yunikorn https://apache.github.io/yunikorn-release
helm repo update
helm install yunikorn yunikorn/yunikorn --namespace yunikorn --version 1.8.0 --create-namespace --set embedAdmissionController=false
```

--------------------------------

### SparkApplication Example Usage

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-custom-resources.md

An example of a SparkApplication resource definition in YAML format for deployment on Kubernetes.

```yaml
apiVersion: spark.apache.org/v1
kind: SparkApplication
metadata:
  name: pi-example
  namespace: default
spec:
  mainClass: org.apache.spark.examples.SparkPi
  runtimeVersions:
    sparkVersion: "4.1.2"
  deploymentMode: ClusterMode
  driverSpec:
    podTemplateSpec:
      spec:
        containers:
          - name: spark-kubernetes-driver
            image: apache/spark:4.1.2-scala
status:
  currentState:
    currentStateSummary: Submitted
    message: "Spark application has been created on Kubernetes Cluster."
    lastUpdateTime: "2024-01-10T15:30:00Z"
```

--------------------------------

### Run Spark Pi App on Kubernetes

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/README.md

Apply the example pi.yaml to create a SparkApp, check its status, and then delete it.

```bash
$ kubectl apply -f examples/pi.yaml

$ kubectl get sparkapp
NAME   CURRENT STATE      AGE
pi     ResourceReleased   4m10s

$ kubectl delete sparkapp/pi
```

--------------------------------

### ApplicationState Example Usage

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Demonstrates how to create and use an ApplicationState object.

```APIDOC
## ApplicationState Example

```java
ApplicationState state = new ApplicationState(
    ApplicationStateSummary.DriverReady,
    "Driver pod is ready to accept executor connections");

// Access the state
ApplicationStateSummary summary = state.getCurrentStateSummary();
String message = state.getStateMessage();
Instant timestamp = Instant.parse(state.getLastUpdateTime());
```
```

--------------------------------

### Creating and Configuring MetricsSystem with MetricsSystemFactory

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

Shows how to create a default MetricsSystem instance using MetricsSystemFactory, register custom sources, and start the metrics server.

```java
// Create default configured system
MetricsSystem metricsSystem = MetricsSystemFactory.createMetricsSystem();

// Register custom sources
metricsSystem.registerSource(myCustomSource);

// Start serving metrics
metricsSystem.start();
```

--------------------------------

### Example Usage of ApplicationStatus

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Demonstrates how to initialize an ApplicationStatus, transition it to a new state using appendNewState, and check the current state summary.
```java
ApplicationStatus status = new ApplicationStatus();

// Transition to a new state
ApplicationState newState = new ApplicationState(
    ApplicationStateSummary.DriverStarted,
    "Driver pod is now running");
status = status.appendNewState(newState);

// Check current state
ApplicationStateSummary currentSummary = status.getCurrentState().getCurrentStateSummary();
if (currentSummary == ApplicationStateSummary.RunningHealthy) {
  System.out.println("Application is running healthy");
}
```

--------------------------------

### Registering and Managing Metrics in MetricsSystem

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

Demonstrates how to register a custom metrics source, start the metrics server, and retrieve all collected metrics using the MetricsSystem.

```java
// Register a metrics source
metricsSystem.registerSource(new CustomMetrics());

// Start serving metrics on HTTP port
metricsSystem.start();

// Get metrics for export
Map allMetrics = metricsSystem.getAllMetrics();
```

--------------------------------

### Example of Resource Retention with Restart Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/spark_custom_resources.md

This example demonstrates resource retention settings in conjunction with a restart configuration. The retain policy and TTL are applied after the final state of the application.

```yaml
applicationTolerations:
  restartConfig:
    restartPolicy: OnFailure
    maxRestartAttempts: 1
  resourceRetainPolicy: Always
  resourceRetainDurationMillis: 30000
  ttlAfterStopMillis: 60000
```

--------------------------------

### Install Helm Chart

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/CLAUDE.md

Installs the Spark Kubernetes Operator using its Helm chart, applying configurations from a specified values file.
```bash
helm install spark -f build-tools/helm/spark-kubernetes-operator/values.yaml \
  build-tools/helm/spark-kubernetes-operator/
```

--------------------------------

### Install Spark Kubernetes Operator with Helm

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/README.md

Add the Helm repository, update it, and install the Spark Kubernetes Operator. Ensure you have Helm installed and configured.

```bash
helm repo add spark https://apache.github.io/spark-kubernetes-operator
helm repo update
helm install spark spark/spark-kubernetes-operator
```

--------------------------------

### Start Spark Operator Components

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-operator.md

This code demonstrates how to start the Spark Operator and its various components, including watching for Spark resources, health probes, and metrics.

```java
SparkOperator sparkOperator = new SparkOperator();
// All components are initialized in constructor

// Start watching for Spark resources
for (Operator operator : sparkOperator.registeredOperators) {
  operator.start();
}

// Start health probes
sparkOperator.probeService.start();

// Start metrics
sparkOperator.metricsResourcesSingleThreadPool.submit(sparkOperator.metricsSystem::start);
sparkOperator.metricsResourcesSingleThreadPool.submit(sparkOperator.metricsService::start);
```

--------------------------------

### Full Build with Gradle (Include Tests)

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/CLAUDE.md

Executes a full build including compilation, linters, and all unit tests. Ensure you have JDK 21+ installed.

```bash
./gradlew build
```

--------------------------------

### SparkCluster Example Usage

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-custom-resources.md

Example of a SparkCluster resource definition in YAML format.
```yaml
apiVersion: spark.apache.org/v1
kind: SparkCluster
metadata:
  name: prod-cluster
  namespace: default
spec:
  runtimeVersions:
    sparkVersion: "4.1.2"
  clusterTolerations:
    restartConfig:
      restartPolicy: Never
  masterSpec:
    instances: 1
    podTemplateSpec:
      spec:
        containers:
          - name: spark-kubernetes-master
            image: apache/spark:4.1.2-scala
  workerSpec:
    instances: 3
    podTemplateSpec:
      spec:
        containers:
          - name: spark-kubernetes-worker
            image: apache/spark:4.1.2-scala
```

--------------------------------

### Helm Chart Lint and Install Commands

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/AGENTS.md

Commands to lint the Helm chart for the Spark Kubernetes Operator and to install it using a specified values file.

```bash
helm lint --strict build-tools/helm/spark-kubernetes-operator
```

```bash
helm install spark -f build-tools/helm/spark-kubernetes-operator/values.yaml \
  build-tools/helm/spark-kubernetes-operator/
```

--------------------------------

### ApplicationSpec Constructor Example

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-specs.md

Demonstrates how to construct an ApplicationSpec object with various configuration parameters for a Spark application.

```java
ApplicationSpec spec = ApplicationSpec.builder()
    .mainClass("org.apache.spark.examples.SparkPi")
    .runtimeVersions(RuntimeVersions.builder()
        .sparkVersion("4.1.2")
        .scalaVersion("2.13.12")
        .jdkVersion("21")
        .build())
    .deploymentMode(DeploymentMode.ClusterMode)
    .driverArgs(Arrays.asList("100"))  // Arguments for SparkPi (number of partitions)
    .sparkConf(Map.of(
        "spark.executor.instances", "4",
        "spark.executor.cores", "2",
        "spark.executor.memory", "4g"
    ))
    .build();
```

--------------------------------

### Check Helm Installations

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/operations.md

Lists all deployed Helm releases across all namespaces to verify the installation status of the Spark Kubernetes Operator.
```bash
$ helm list -A
NAME       NAMESPACE  REVISION  UPDATED                               STATUS    CHART                                APP VERSION
us-west-1  us-west-1  1         2026-05-06 10:00:00.000000 -0700 PDT  deployed  spark-kubernetes-operator-1.8.0-dev  1.0.0-SNAPSHOT
us-west-2  us-west-2  1         2026-05-06 10:00:03.000000 -0700 PDT  deployed  spark-kubernetes-operator-1.8.0-dev  1.0.0-SNAPSHOT
```

--------------------------------

### Example Usage

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Demonstrates how to use the ApplicationStateSummary enum and its methods in Java code to check the current state of a Spark application.

```APIDOC
### Example

```java
ApplicationStateSummary state = ApplicationStateSummary.RunningHealthy;

if (state.isFailure()) {
  System.out.println("Application failed");
} else if (state.isTerminated()) {
  System.out.println("Application is terminated");
} else if (state.isStarting()) {
  System.out.println("Application is starting");
}
```
```

--------------------------------

### Example SparkAppStatusListener Implementation

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciliation-framework.md

An example implementation of the SparkAppStatusListener interface that logs Spark application state transitions to an external audit system. Provide the fully-qualified class name in the operator configuration.
```java
public class AuditAppListener implements SparkAppStatusListener {
  @Override
  public void onApplicationStatusUpdate(SparkApplication app, ApplicationStatus newStatus) {
    String appName = app.getMetadata().getName();
    String namespace = app.getMetadata().getNamespace();
    ApplicationStateSummary state = newStatus.getCurrentState().getCurrentStateSummary();

    // Log to external audit system
    auditLog.info("App {}/{} transitioned to {}", namespace, appName, state);
  }
}
```

--------------------------------

### Reconcile Application Example

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

Example of using ReconcileProgress within a SparkApplication reconciler to control requeue behavior based on operation outcomes.

```java
// Reconciler logic
public UpdateControl reconcile(SparkApplication app) {
  try {
    // Submit application
    submitApplication(app);

    // Continue monitoring on next iteration after default interval
    return UpdateControl.patchStatus(app)
        .andThen(buildReconcileProgress(ReconcileProgress.completeAndDefaultRequeue()));
  } catch (TemporaryApiError e) {
    // Requeue sooner to retry
    return UpdateControl.patchStatus(app)
        .andThen(buildReconcileProgress(ReconcileProgress.completeAndRequeueAfter(
            Duration.ofSeconds(5)
        )));
  }
}
```

--------------------------------

### Install Helm Chart in us-west-2

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/operations.md

Installs the Spark Kubernetes Operator Helm chart in the 'us-west-2' namespace. This is similar to the us-west-1 installation but uses distinct names for RBAC resources.
```bash
helm install us-west-2 spark/spark-kubernetes-operator \
  --create-namespace --namespace us-west-2 \
  --set operatorRbac.clusterRole.name=spark-operator-clusterrole-us-west-2 \
  --set operatorRbac.clusterRoleBinding.name=spark-operator-clusterrolebinding-us-west-2 \
  --set workloadResources.clusterRole.name=spark-workload-clusterrole-us-west-2
```

--------------------------------

### BaseSpec Configuration Example

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-specs.md

Demonstrates how to set Spark configuration properties for a BaseSpec. This is useful for overriding default Spark settings when creating a Spark application.

```java
BaseSpec spec = new BaseSpec();
spec.setSparkConf(Map.of(
    "spark.executor.instances", "4",
    "spark.executor.memory", "4g",
    "spark.executor.cores", "2"
));
```

--------------------------------

### isStarting()

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Determines if the application is currently in the starting phase. This covers states from after initialization up to, but not including, the fully running healthy state.

```APIDOC
#### isStarting()

```java
public boolean isStarting()
```

| Return | Description |
|--------|-------------|
| boolean | `true` if state is between `ScheduledToRestart` and `RunningHealthy` (exclusive) |
```

--------------------------------

### ApplicationTimeoutConfig Builder Example

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-specs.md

Demonstrates how to configure application timeouts using the builder pattern. This is useful for setting custom durations for driver and executor startup, and termination grace periods.
```java
ApplicationTimeoutConfig timeoutConfig = ApplicationTimeoutConfig.builder()
    .driverStartTimeoutMillis(5 * 60 * 1000L)           // 5 minutes
    .driverReadyTimeoutMillis(10 * 60 * 1000L)          // 10 minutes (extended for slow startup)
    .executorStartTimeoutMillis(5 * 60 * 1000L)         // 5 minutes
    .forceTerminationGracePeriodMillis(1 * 60 * 1000L)  // 1 minute
    .terminationRequeuePeriodMillis(5 * 1000L)          // 5 seconds
    .build();
```

--------------------------------

### Environment Variable Configuration Examples

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

Set operator configuration options using environment variables. Dots in property keys are replaced with underscores, and the key is converted to uppercase.

```bash
export SPARK_LOGCONF=true
export SPARK_KUBERNETES_OPERATOR_NAMESPACE=spark-operator
export SPARK_KUBERNETES_OPERATOR_WATCHED_NAMESPACES=default,spark-apps,*
export SPARK_KUBERNETES_OPERATOR_RECONCILER_PARALLELISM=100
```

--------------------------------

### Main Application Entry Point

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-operator.md

The main entry point for the Spark Kubernetes Operator application. It initializes and starts the operator, probe service, and metrics system.

```java
public static void main(String[] args)
```

--------------------------------

### Create BaseApplicationTemplateSpec with PodTemplateSpec

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-specs.md

Provides a Kubernetes Pod template specification for driver or executor pods. This example configures container image, resources, and other standard PodTemplateSpec fields.
```java
BaseApplicationTemplateSpec driverSpec = BaseApplicationTemplateSpec.builder()
    .podTemplateSpec(new PodTemplateSpec(
        new ObjectMeta(),
        new PodSpec()
            .withContainers(Collections.singletonList(new Container()
                .withName("spark-kubernetes-driver")
                .withImage("apache/spark:4.1.2-scala")
                .withResources(new ResourceRequirements()
                    .withRequests(Map.of(
                        "cpu", new Quantity("2"),
                        "memory", new Quantity("4Gi")
                    ))
                    .withLimits(Map.of(
                        "cpu", new Quantity("4"),
                        "memory", new Quantity("8Gi")
                    ))
                )
            ))
    ))
    .build();
```

--------------------------------

### Example Prometheus Configuration and Deployment

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

This YAML defines a Prometheus ConfigMap for scrape configurations, a Service to expose Prometheus, and a Deployment to run the Prometheus instance. Adjust scrape_interval and other global settings as needed for your environment.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      evaluation_interval: 30s
    scrape_configs:
      - job_name: spark-operator
        static_configs:
          - targets:
              - spark-operator:8080
        metrics_path: /metrics
---
# Prometheus Service
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
    - port: 9090
      targetPort: 9090
---
# Prometheus Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
      volumes:
        - name: config
          configMap:
            name: prometheus-config
```

--------------------------------

### Run Spark Operator

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/index.md

Start the Spark Operator.
Use environment variables to configure watched namespaces and reconciler parallelism.

```bash
java -cp spark-operator-*.jar org.apache.spark.k8s.operator.SparkOperator
```

```bash
export SPARK_KUBERNETES_OPERATOR_WATCHED_NAMESPACES=default,spark-apps
export SPARK_KUBERNETES_OPERATOR_RECONCILER_PARALLELISM=100
java -cp spark-operator-*.jar org.apache.spark.k8s.operator.SparkOperator
```

--------------------------------

### Example ConfigMap for Dynamic Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

This ConfigMap defines dynamic configuration settings for the Spark Kubernetes Operator. Ensure the ConfigMap is in the same namespace as the operator and has the correct labels.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-operator-config
  namespace: spark-operator
  labels:
    app.kubernetes.io/name: spark-kubernetes-operator
data:
  spark.kubernetes.operator.reconciler.intervalSeconds: "300"
  spark.kubernetes.operator.watchedNamespaces: "default,spark-apps,*"
```

--------------------------------

### Build and Test Commands for Spark Kubernetes Operator

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/AGENTS.md

Use these Gradle wrapper commands for building, linting, testing, and formatting the operator code. Ensure JDK 21+ is installed.
```bash
./gradlew build -x test
```

```bash
./gradlew build
```

```bash
./gradlew :spark-operator:test
```

```bash
./gradlew :spark-operator:test --tests "org.apache.spark.k8s.operator.SparkOperatorTest"
```

```bash
./gradlew spotlessApply
```

```bash
./gradlew spotlessCheck
```

```bash
./gradlew javadoc
```

```bash
./gradlew buildDockerImage
```

```bash
./gradlew dependencyUpdates
```

--------------------------------

### Prometheus Service Configuration for Spark Operator

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

Example YAML configuration for a Kubernetes Service to enable Prometheus scraping of Spark Operator metrics.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spark-operator
  labels:
    app: spark-operator
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  selector:
    app: spark-operator
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
```

--------------------------------

### Kubernetes Readiness Probe Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

Configure the readiness probe for the operator. This probe determines if the operator is ready to reconcile resources and handle traffic. It will remove the operator from traffic if it's starting up or degraded.

```yaml
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 2
```

--------------------------------

### Spark Sentinel Resource Example

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/operations.md

A dummy SparkApplication resource used for operator health probe monitoring. It should be labeled with 'spark.operator/sentinel': 'true' and will not create other Kubernetes resources. The reconciliation delay is controlled by 'health.sentinel.resource.reconciliation.delay.seconds'.
```yaml
apiVersion: spark.apache.org/v1
kind: SparkApplication
metadata:
  name: spark-sentinel-resources
  labels:
    "spark.operator/sentinel": "true"
```

--------------------------------

### Resource Filtering in Custom Code with Label Selector

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/constants-labels-messages.md

Build custom observers or integrations by defining and converting label selectors. This example shows the string form of a selector and builds the equivalent LabelSelector object from a map of match labels for filtering resources.

```java
String selectorStr = "spark.operator/name=spark-kubernetes-operator,spark-role=driver";
LabelSelector selector = Serialization.jsonMapper().convertValue(
    Map.of("matchLabels", Map.of(
        "spark.operator/name", "spark-kubernetes-operator",
        "spark-role", "driver"
    )),
    LabelSelector.class
);
```

--------------------------------

### Restart Configuration for Transient Failures

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/spark_custom_resources.md

Configure the operator to tolerate a high number of transient failures but stop the application if persistent issues arise. This example allows many total restart attempts but stops after 3 consecutive failures.

```yaml
restartConfig:
  restartPolicy: Always
  maxRestartAttempts: 100
  restartBackoffMillis: 30000
  maxRestartOnFailure: 3
  restartBackoffMillisForFailure: 60000
```

--------------------------------

### main(String[] args)

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-operator.md

The main entry point for the Spark Kubernetes Operator application.

```APIDOC
## main(String[] args)

### Description
The main entry point for the Spark Kubernetes Operator. This method orchestrates the application bootstrap sequence, including logging versions, creating the operator instance, starting registered operators, and initiating probe and metrics services.
### Method
```java
public static void main(String[] args)
```

### Application Bootstrap Sequence
1. Logs operator, Java, and built-in Spark versions.
2. Creates a new `SparkOperator()` instance.
3. Starts all registered operators.
4. Starts the probe service for Kubernetes health checks.
5. Submits the metrics system startup to a thread pool.
6. Submits the metrics service startup to a thread pool.
```

--------------------------------

### Install Helm Chart in us-west-1

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/operations.md

Installs the Spark Kubernetes Operator Helm chart in the 'us-west-1' namespace. Ensure the necessary RBAC roles are configured with unique names for this namespace.

```bash
helm install us-west-1 spark/spark-kubernetes-operator \
  --create-namespace --namespace us-west-1 \
  --set operatorRbac.clusterRole.name=spark-operator-clusterrole-us-west-1 \
  --set operatorRbac.clusterRoleBinding.name=spark-operator-clusterrolebinding-us-west-1 \
  --set workloadResources.clusterRole.name=spark-workload-clusterrole-us-west-1
```

--------------------------------

### Create and Access ApplicationState

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Demonstrates how to create an ApplicationState instance with a specific summary and message, and how to access its fields.

```java
ApplicationState state = new ApplicationState(
    ApplicationStateSummary.DriverReady,
    "Driver pod is ready to accept executor connections");

// Access the state
ApplicationStateSummary summary = state.getCurrentStateSummary();
String message = state.getStateMessage();
Instant timestamp = Instant.parse(state.getLastUpdateTime());
```

--------------------------------

### Compile and Lint with Gradle (Skip Tests)

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/CLAUDE.md

Use this command to quickly compile the project and run linters without executing tests. Requires JDK 21+.
```bash
./gradlew build -x test
```

--------------------------------

### Enable Logging of Operator Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

Set this system property to `true` to log all operator configuration settings on startup. This is useful for debugging and verifying configuration.

```properties
-Dspark.logConf=true
```

--------------------------------

### Apply Code Formatting with Gradle

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/CLAUDE.md

Automatically formats the project's sources according to defined style rules. Recommended to run before committing changes. Requires JDK 21+.

```bash
./gradlew spotlessApply
```

--------------------------------

### Check Application State

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Example of how to check the current state of a Spark application using the ApplicationStateSummary enumeration and its helper methods.

```java
ApplicationStateSummary state = ApplicationStateSummary.RunningHealthy;

if (state.isFailure()) {
  System.out.println("Application failed");
} else if (state.isTerminated()) {
  System.out.println("Application is terminated");
} else if (state.isStarting()) {
  System.out.println("Application is starting");
}
```

--------------------------------

### Generate Javadoc with Gradle

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/CLAUDE.md

Creates the Javadoc documentation for the project. This is a gate in the CI process. Requires JDK 21+.

```bash
./gradlew javadoc
```

--------------------------------

### Prometheus Scraping Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

Configure Prometheus to scrape metrics from the Spark operator. This example sets up a job to scrape metrics from the operator's /metrics endpoint.
```yaml
scrape_configs:
  - job_name: spark-operator
    static_configs:
      - targets:
          - spark-operator:8080
    metrics_path: /metrics
    scrape_interval: 30s
```

--------------------------------

### Set Deployment Mode to ClusterMode

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-specs.md

Demonstrates how to set the DeploymentMode to ClusterMode.

```java
DeploymentMode mode = DeploymentMode.ClusterMode;
```

--------------------------------

### SparkApplication initStatus() Method

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-custom-resources.md

Creates a fresh, empty ApplicationStatus object for this SparkApplication. Called automatically during resource creation to initialize status tracking.

```java
@Override
public ApplicationStatus initStatus()
```

--------------------------------

### Get Kubernetes Client Interceptors

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-operator.md

Retrieves a list of HTTP interceptors for the Kubernetes client. If metrics are enabled, a KubernetesMetricsInterceptor is added to track API call metrics.

```java
protected List getClientInterceptors(MetricsSystem metricsSystem)
```

--------------------------------

### Get SparkApplication Status

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/spark_custom_resources.md

After submitting a SparkApplication, use this kubectl command to retrieve its status in YAML format. This allows you to inspect the observed state of your Spark application.

```bash
kubectl get sparkapp pi -o yaml
```

--------------------------------

### Get Spark Application Status Message using kubectl

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/constants-labels-messages.md

Retrieve the detailed status message of a Spark application from its Kubernetes resource using a JSONPath expression.
```bash
kubectl get sparkapp my-app -o jsonpath='{.status.currentState.stateMessage}'
```

--------------------------------

### Chainsaw End-to-End Test Execution

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/AGENTS.md

Command to run end-to-end tests using Chainsaw. This requires a running Kubernetes cluster, a built operator image, and Chainsaw installed.

```bash
chainsaw test --test-dir ./tests/e2e/state-transition --parallel 1
```

--------------------------------

### Build Docker Image with Gradle

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/CLAUDE.md

Builds the Docker image for the Spark Kubernetes Operator, tagged with the project version. Requires JDK 21+.

```bash
./gradlew buildDockerImage
```

--------------------------------

### Deploy Spark Application using kubectl

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/operations.md

Applies a Spark application definition from a YAML file to both 'us-west-1' and 'us-west-2' namespaces using kubectl.

```bash
kubectl apply -f https://apache.github.io/spark-kubernetes-operator/pi.yaml -n us-west-1
kubectl apply -f https://apache.github.io/spark-kubernetes-operator/pi.yaml -n us-west-2
```

--------------------------------

### BaseAttemptInfo Abstract Class

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/types.md

An abstract base class for holding information about a Spark attempt. It includes fields for attempt ID, start time, completion time, and duration.

```java
public abstract class BaseAttemptInfo {
  protected String attemptId;
  protected Long startTime;
  protected Long completionTime;
  protected Long durationMillis;
}
```

--------------------------------

### Define ClusterAttemptSummary in Java

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/types.md

Provides aggregated information about a Spark cluster attempt, extending base summary fields.
Tracks attempt ID, start, completion, and duration.

```java
public class ClusterAttemptSummary extends BaseAttemptSummary {
    // Tracks:
    // - attemptId: Unique identifier
    // - startTime: Start epoch timestamp
    // - completionTime: Completion timestamp
    // - durationMillis: Total duration
}
```

--------------------------------

### Submit Spark App to YuniKorn

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/README.md

Apply the Spark application configuration to submit it to the YuniKorn scheduler. Use 'kubectl describe pod' to monitor the driver pod's scheduling and execution.

```bash
$ kubectl apply -f examples/pi-on-yunikorn.yaml

$ kubectl describe pod pi-on-yunikorn-0-driver
...
Events:
  Type    Reason             Age   From      Message
  ----    ------             ----  ----      -------
  Normal  Scheduling         1s    yunikorn  default/pi-on-yunikorn-0-driver is queued and waiting for allocation
  Normal  Scheduled          1s    yunikorn  Successfully assigned default/pi-on-yunikorn-0-driver to node docker-desktop
  Normal  PodBindSuccessful  1s    yunikorn  Pod default/pi-on-yunikorn-0-driver is successfully bound to node docker-desktop
  Normal  Pulled             0s    kubelet   Container image "apache/spark:4.1.2-scala" already present on machine
  Normal  Created            0s    kubelet   Created container: spark-kubernetes-driver
  Normal  Started            0s    kubelet   Started container spark-kubernetes-driver

$ kubectl delete sparkapp pi-on-yunikorn
sparkapplication.spark.apache.org "pi-on-yunikorn" deleted from default namespace
```

--------------------------------

### isInitializing()

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Checks if the application state is in the initializing phase. This includes states where the application has been submitted but not yet scheduled, or is marked for restart.
```APIDOC
#### isInitializing()

```java
public boolean isInitializing()
```

| Return | Description |
|--------|-------------|
| boolean | `true` if state is `Submitted` or `ScheduledToRestart` |
```

--------------------------------

### Filter Reconciled Spark Resources

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

Use this optional label selector to filter which Spark resources the operator will reconcile. If empty, all resources are reconciled. This is useful for multi-operator setups.

```properties
spark.kubernetes.operator.reconciler.labelSelector=operator=prod,tier=critical
```

--------------------------------

### ApplicationState Constructors

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Provides the default constructor and a constructor to initialize with a specific state summary and message.

```java
public ApplicationState()

public ApplicationState(ApplicationStateSummary currentStateSummary, String message)
```

--------------------------------

### Run Spark Cluster on Kubernetes

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/README.md

Deploy a production Spark cluster with three workers, forward the master port, submit a Pi application, check its status, and finally delete the cluster.
```bash
$ kubectl apply -f examples/prod-cluster-with-three-workers.yaml

$ kubectl get sparkcluster
NAME   CURRENT STATE    AGE
prod   RunningHealthy   10s

$ kubectl port-forward prod-master-0 6066 &

$ ./examples/submit-pi-to-prod.sh
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20260110030233-0000",
  "serverSparkVersion" : "4.1.2",
  "submissionId" : "driver-20260110030233-0000",
  "success" : true
}

$ curl http://localhost:6066/v1/submissions/status/driver-20260110030233-0000/
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "serverSparkVersion" : "4.1.2",
  "submissionId" : "driver-20260110030233-0000",
  "success" : true,
  "workerHostPort" : "10.1.1.172:44233",
  "workerId" : "worker-20260110030145-10.1.1.172-44233"
}

$ kubectl delete sparkcluster prod
sparkcluster.spark.apache.org "prod" deleted
```

--------------------------------

### Configure Periodic Garbage Collection Interval

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

Set the interval in seconds for periodic `System.gc()` invocations. A value of 0 or less disables this feature. Note that this is only effective if the JVM is not started with `-XX:+DisableExplicitGC`.

```properties
spark.kubernetes.operator.periodicGC.intervalSeconds=3600
```

```properties
spark.kubernetes.operator.periodicGC.intervalSeconds=0
```

--------------------------------

### Check Code Formatting with Gradle

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/CLAUDE.md

Verifies that all sources adhere to the project's formatting standards without making changes. Use this to ensure compliance. Requires JDK 21+.

```bash
./gradlew spotlessCheck
```

--------------------------------

### Enable Kubernetes Client Metrics

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

Enable Kubernetes client metrics by setting the corresponding configuration property.
This allows monitoring of API server requests, latency, and errors.

```properties
spark.kubernetes.operator.kubernetes.client.metricsEnabled=true
```

--------------------------------

### initStatus() Method

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-custom-resources.md

Initializes and returns a new, empty ApplicationStatus object for the SparkApplication. This method is called automatically during resource creation to initialize status tracking.

```APIDOC
## initStatus()

### Description
Creates a fresh, empty ApplicationStatus object for this SparkApplication. Called automatically during resource creation to initialize status tracking.

### Method
`@Override public ApplicationStatus initStatus()`

### Returns
`ApplicationStatus`: A new initialized ApplicationStatus instance
```

--------------------------------

### Override Single Helm Parameter

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/operations.md

Use the `--set` flag to override a single configuration parameter during Helm installation. This is useful for quick changes to specific values like the operator image repository.

```bash
helm install --set image.repository=/spark-kubernetes-operator \
  -f build-tools/helm/spark-kubernetes-operator/values.yaml \
  build-tools/helm/spark-kubernetes-operator/
```

--------------------------------

### ApplicationStatus Constructors

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Provides the default constructor and a parameterized constructor for initializing ApplicationStatus with specific state details.
```java
public ApplicationStatus()

public ApplicationStatus(ApplicationState currentState,
                         Map stateTransitionHistory,
                         ApplicationAttemptSummary previousAttemptSummary,
                         ApplicationAttemptSummary currentAttemptSummary)
```

--------------------------------

### ApplicationState Constructors

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Provides the constructors for creating an ApplicationState object.

```APIDOC
## ApplicationState Constructors

### Public Constructors
- `public ApplicationState()`
- `public ApplicationState(ApplicationStateSummary currentStateSummary, String message)`
```

--------------------------------

### Override Multiple Helm Parameters with Custom Values Files

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/operations.md

Provide multiple custom values files using the `-f` flag during Helm installation. The last specified file takes precedence, allowing for layered configuration.

```bash
helm install spark \
  -f build-tools/helm/spark-kubernetes-operator/values.yaml \
  -f my_values.yaml \
  build-tools/helm/spark-kubernetes-operator/
```

--------------------------------

### Configure Application Instance Settings

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/spark_custom_resources.md

Define `instanceConfig` to specify minimum, initial, and maximum executor counts for an application. This helps the operator determine if an application is running healthily, especially in clusters without a batch scheduler.
```yaml
applicationTolerations:
  instanceConfig:
    minExecutors: 3
    initExecutors: 5
    maxExecutors: 10
sparkConf:
  spark.executor.instances: "10"
```

--------------------------------

### Restart Configuration for API Server Stress Mitigation

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/spark_custom_resources.md

Configure the operator to stop an application quickly on scheduling failures to avoid overwhelming the API server. This example limits consecutive scheduling failures to 2 with a longer backoff period.

```yaml
restartConfig:
  restartPolicy: Always
  maxRestartAttempts: 50
  restartBackoffMillis: 30000
  maxRestartOnSchedulingFailure: 2
  restartBackoffMillisForSchedulingFailure: 600000
```

--------------------------------

### Build ClusterSpec with RuntimeVersions and Node Specs

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-specs.md

Use this to define the desired configuration for a Spark cluster, including runtime versions, master and worker node specifications, and Spark configurations. Requires `podTemplate` to be defined elsewhere.

```java
ClusterSpec spec = ClusterSpec.builder()
    .runtimeVersions(RuntimeVersions.builder()
        .sparkVersion("4.1.2")
        .scalaVersion("2.13.12")
        .build())
    .masterSpec(MasterSpec.builder()
        .instances(1)
        .podTemplateSpec(podTemplate)
        .build())
    .workerSpec(WorkerSpec.builder()
        .instances(3)
        .podTemplateSpec(podTemplate)
        .build())
    .sparkConf(Map.of("spark.default.parallelism", "10"))
    .build();
```

--------------------------------

### Define Runtime Versions for Spark, Scala, and JDK

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-specs.md

Specify the exact versions for Spark, Scala, and optionally JDK to be used in the cluster. This is a common configuration when setting up a Spark environment.
```java
RuntimeVersions versions = RuntimeVersions.builder()
    .sparkVersion("4.1.2")
    .scalaVersion("2.13.12")
    .jdkVersion("21")
    .build();
```

--------------------------------

### runSystemGc()

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-operator.md

Invokes the system garbage collector and logs memory usage statistics.

```APIDOC
## runSystemGc()

### Description
Invokes `System.gc()` to trigger garbage collection and logs memory usage before and after the operation. This method is called periodically if `PERIODIC_GC_INTERVAL_SECONDS` is set to a positive value.

### Method
```java
static void runSystemGc()
```

### Logs
- Elapsed time in milliseconds
- Used memory before/after (in MB)
- Total memory before/after (in MB)
```
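
The behavior described for `runSystemGc()` can be approximated with the minimal sketch below. It is not the operator's implementation: the class name `PeriodicGcSketch`, the `schedulePeriodicGc` helper, and the log line format are assumptions made for illustration. Only `System.gc()`, `Runtime`, and `ScheduledExecutorService` are standard JDK APIs.

```java
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and log format are illustrative assumptions.
public class PeriodicGcSketch {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    // Heap currently in use, in MB.
    static long usedMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / BYTES_PER_MB;
    }

    // Heap currently reserved by the JVM, in MB.
    static long totalMb() {
        return Runtime.getRuntime().totalMemory() / BYTES_PER_MB;
    }

    // Requests a GC and returns a line with elapsed time and memory before/after.
    public static String runSystemGc() {
        long usedBefore = usedMb();
        long totalBefore = totalMb();
        long start = System.nanoTime();
        System.gc(); // a request only; the JVM may ignore it
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return String.format(
            "System.gc() took %d ms; used %d -> %d MB; total %d -> %d MB",
            elapsedMs, usedBefore, usedMb(), totalBefore, totalMb());
    }

    // Schedules runSystemGc() at a fixed rate; an interval of 0 or less disables it.
    public static Optional<ScheduledExecutorService> schedulePeriodicGc(long intervalSeconds) {
        if (intervalSeconds <= 0) {
            return Optional.empty();
        }
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "periodic-gc");
            t.setDaemon(true); // do not keep the JVM alive for GC logging
            return t;
        });
        scheduler.scheduleAtFixedRate(
            () -> System.out.println(runSystemGc()),
            intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return Optional.of(scheduler);
    }

    public static void main(String[] args) {
        System.out.println(runSystemGc());
        System.out.println("scheduled with interval 0: " + schedulePeriodicGc(0).isPresent());
    }
}
```

Because `System.gc()` is only a request, the before and after numbers may be identical, and the call has no effect when the JVM is started with `-XX:+DisableExplicitGC`.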