### Development Setup Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

Example properties for a typical development environment setup. This configuration enables logging of the operator configuration at startup, sets watched namespaces, and defines reconciler parallelism and intervals.

```properties
spark.logConf=true
spark.kubernetes.operator.watchedNamespaces=*
spark.kubernetes.operator.reconciler.parallelism=10
spark.kubernetes.operator.reconciler.intervalSeconds=30
spark.kubernetes.operator.kubernetes.client.metricsEnabled=true
spark.kubernetes.operator.josdkMetrics.enabled=true
spark.kubernetes.operator.periodicGC.intervalSeconds=0
```

--------------------------------

### Production Setup Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

Example properties for a production environment. This configuration disables configuration logging, restricts the operator to an explicit list of watched namespaces, and increases parallelism and timeouts for stability.

```properties
spark.logConf=false
spark.kubernetes.operator.watchedNamespaces=default,spark-prod,data-pipeline
spark.kubernetes.operator.reconciler.parallelism=100
spark.kubernetes.operator.reconciler.intervalSeconds=300
spark.kubernetes.operator.reconciler.foregroundRequestTimeoutSeconds=120
spark.kubernetes.operator.api.retryMaxAttempts=20
spark.kubernetes.operator.reconciler.trimStateTransitionHistoryEnabled=true
spark.kubernetes.operator.leaderElection.enabled=true
spark.kubernetes.operator.periodicGC.intervalSeconds=3600
spark.kubernetes.operator.dynamicConfig.enabled=true
```
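The property files above are plain `key=value` text, so they can be read with the JDK's `java.util.Properties`. The following is a minimal sketch of that idea, not the operator's own loader: the class `OperatorPropsReader` and its methods are hypothetical names introduced for illustration, and only the property keys come from the examples above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper (not part of the operator): reads operator-style keys from properties text. */
final class OperatorPropsReader {

  /** Parses properties text; the IOException is wrapped because a StringReader does not fail. */
  static Properties parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
    return props;
  }

  /** Returns the integer value stored under the key, or the fallback when the key is absent. */
  static int intValue(Properties props, String key, int fallback) {
    String raw = props.getProperty(key);
    return raw == null ? fallback : Integer.parseInt(raw.trim());
  }

  /** Splits the comma-separated namespace list; a "*" entry is kept as a literal string. */
  static List<String> watchedNamespaces(Properties props) {
    String raw = props.getProperty("spark.kubernetes.operator.watchedNamespaces", "");
    return Arrays.stream(raw.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .toList();
  }
}
```

Calling `OperatorPropsReader.watchedNamespaces(props)` on the production example would yield the three namespace names as separate list entries.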

--------------------------------

### High-Availability Multi-Region Setup Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

Example properties for a high-availability multi-region setup. This configuration emphasizes leader election, high parallelism, and extended timeouts for robust operation across regions.

```properties
spark.kubernetes.operator.leaderElection.enabled=true
spark.kubernetes.operator.reconciler.parallelism=200
spark.kubernetes.operator.informer.cacheSyncTimeoutSeconds=60
spark.kubernetes.operator.reconciler.terminationTimeoutSeconds=120
spark.kubernetes.operator.reconciler.foregroundRequestTimeoutSeconds=180
spark.kubernetes.operator.api.retryMaxAttempts=25
```

--------------------------------

### Install Prometheus with Helm

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/configuration.md

Install Prometheus using its official Helm chart to scrape metrics.

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus
```

--------------------------------

### Install Apache YuniKorn Scheduler

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/README.md

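Install YuniKorn using Helm, pinned here to chart version 1.8.0. The command below disables the embedded admission controller with `embedAdmissionController=false`; omit that flag if you need the admission controller.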

```bash
helm repo add yunikorn https://apache.github.io/yunikorn-release

helm repo update

helm install yunikorn yunikorn/yunikorn \
  --namespace yunikorn --create-namespace \
  --version 1.8.0 \
  --set embedAdmissionController=false
```

--------------------------------

### SparkApplication Example Usage

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-custom-resources.md

An example of a SparkApplication resource definition in YAML format for deployment on Kubernetes.

```yaml
apiVersion: spark.apache.org/v1
kind: SparkApplication
metadata:
  name: pi-example
  namespace: default
spec:
  mainClass: org.apache.spark.examples.SparkPi
  runtimeVersions:
    sparkVersion: "4.1.2"
  deploymentMode: ClusterMode
  driverSpec:
    podTemplateSpec:
      spec:
        containers:
        - name: spark-kubernetes-driver
          image: apache/spark:4.1.2-scala
status:
  currentState:
    currentStateSummary: Submitted
    message: "Spark application has been created on Kubernetes Cluster."
    lastUpdateTime: "2024-01-10T15:30:00Z"

```

--------------------------------

### Run Spark Pi App on Kubernetes

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/README.md

Apply the example pi.yaml to create a SparkApp, check its status, and then delete it.

```bash
$ kubectl apply -f examples/pi.yaml

$ kubectl get sparkapp
NAME   CURRENT STATE      AGE
pi     ResourceReleased   4m10s

$ kubectl delete sparkapp/pi
```

--------------------------------

### ApplicationState Example Usage

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Demonstrates how to create and use an ApplicationState object.

```java
ApplicationState state = new ApplicationState(
    ApplicationStateSummary.DriverReady,
    "Driver pod is ready to accept executor connections");

// Access the state
ApplicationStateSummary summary = state.getCurrentStateSummary();
String message = state.getStateMessage();
Instant timestamp = Instant.parse(state.getLastUpdateTime());
```

--------------------------------

### Creating and Configuring MetricsSystem with MetricsSystemFactory

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

Shows how to create a default MetricsSystem instance using MetricsSystemFactory, register custom sources, and start the metrics server.

```java
// Create default configured system
MetricsSystem metricsSystem = MetricsSystemFactory.createMetricsSystem();

// Register custom sources
metricsSystem.registerSource(myCustomSource);

// Start serving metrics
metricsSystem.start();
```

--------------------------------

### Example Usage of ApplicationStatus

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Demonstrates how to initialize an ApplicationStatus, transition it to a new state using appendNewState, and check the current state summary.

```java
ApplicationStatus status = new ApplicationStatus();

// Transition to a new state
ApplicationState newState = new ApplicationState(
    ApplicationStateSummary.DriverStarted, 
    "Driver pod is now running");
status = status.appendNewState(newState);

// Check current state (it is DriverStarted at this point, so the branch below is not taken)
ApplicationStateSummary currentSummary = status.getCurrentState().getCurrentStateSummary();
if (currentSummary == ApplicationStateSummary.RunningHealthy) {
    System.out.println("Application is running healthy");
}
```
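The reassignment `status = status.appendNewState(newState)` suggests that a transition returns a new status object instead of mutating the existing one. The sketch below illustrates that pattern with plain JDK types. `DemoAppState` and `DemoAppStatus` are hypothetical stand-ins, not the operator's classes, and both the initial `Submitted` state and the history list are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a state entry: a summary name plus a message. */
record DemoAppState(String summary, String message) {}

/** Hypothetical immutable status: each transition returns a new object and records the previous state. */
final class DemoAppStatus {
  private final DemoAppState currentState;
  private final List<DemoAppState> history;

  /** Starts in an assumed initial state named "Submitted" with an empty history. */
  DemoAppStatus() {
    this(new DemoAppState("Submitted", ""), List.of());
  }

  private DemoAppStatus(DemoAppState currentState, List<DemoAppState> history) {
    this.currentState = currentState;
    this.history = history;
  }

  /** Returns a new status whose current state is newState; this instance is left unchanged. */
  DemoAppStatus appendNewState(DemoAppState newState) {
    List<DemoAppState> updated = new ArrayList<>(history);
    updated.add(currentState);
    return new DemoAppStatus(newState, List.copyOf(updated));
  }

  DemoAppState getCurrentState() {
    return currentState;
  }

  List<DemoAppState> getHistory() {
    return history;
  }
}
```

Because the original object is never modified, a caller that forgets to use the return value keeps seeing the old state, which is why the example reassigns `status`.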

--------------------------------

### Registering and Managing Metrics in MetricsSystem

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

Demonstrates how to register a custom metrics source, start the metrics server, and retrieve all collected metrics using the MetricsSystem.

```java
// Register a metrics source
metricsSystem.registerSource(new CustomMetrics());

// Start serving metrics on HTTP port
metricsSystem.start();

// Get metrics for export
Map<String, Metric> allMetrics = metricsSystem.getAllMetrics();
```

--------------------------------

### Example of Resource Retention with Restart Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/spark_custom_resources.md

This example demonstrates resource retention settings in conjunction with a restart configuration. The retain policy and TTL take effect once the application reaches its final state.

```yaml
applicationTolerations:
  restartConfig:
    restartPolicy: OnFailure
    maxRestartAttempts: 1
  resourceRetainPolicy: Always
  resourceRetainDurationMillis: 30000
  ttlAfterStopMillis: 60000
```

--------------------------------

### Install Helm Chart

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/CLAUDE.md

Installs the Spark Kubernetes Operator using its Helm chart, applying configurations from a specified values file.

```bash
helm install spark -f build-tools/helm/spark-kubernetes-operator/values.yaml \
  build-tools/helm/spark-kubernetes-operator/
```

--------------------------------

### Install Spark Kubernetes Operator with Helm

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/README.md

Add the Helm repository, update it, and install the Spark Kubernetes Operator. Ensure you have Helm installed and configured.

```bash
helm repo add spark https://apache.github.io/spark-kubernetes-operator
helm repo update
helm install spark spark/spark-kubernetes-operator
```

--------------------------------

### Start Spark Operator Components

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-operator.md

This code demonstrates how to start the Spark Operator and its various components, including watching for Spark resources, health probes, and metrics.

```java
SparkOperator sparkOperator = new SparkOperator();

// All components are initialized in constructor
// Start watching for Spark resources
for (Operator operator : sparkOperator.registeredOperators) {
    operator.start();
}

// Start health probes
sparkOperator.probeService.start();

// Start metrics
sparkOperator.metricsResourcesSingleThreadPool.submit(sparkOperator.metricsSystem::start);
sparkOperator.metricsResourcesSingleThreadPool.submit(sparkOperator.metricsService::start);
```

--------------------------------

### Full Build with Gradle (Include Tests)

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/CLAUDE.md

Executes a full build including compilation, linters, and all unit tests. Ensure you have JDK 21+ installed.

```bash
./gradlew build
```

--------------------------------

### SparkCluster Example Usage

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-custom-resources.md

Example of a SparkCluster resource definition in YAML format.

```yaml
apiVersion: spark.apache.org/v1
kind: SparkCluster
metadata:
  name: prod-cluster
  namespace: default
spec:
  runtimeVersions:
    sparkVersion: "4.1.2"
  clusterTolerations:
    restartConfig:
      restartPolicy: Never
  masterSpec:
    instances: 1
    podTemplateSpec:
      spec:
        containers:
        - name: spark-kubernetes-master
          image: apache/spark:4.1.2-scala
  workerSpec:
    instances: 3
    podTemplateSpec:
      spec:
        containers:
        - name: spark-kubernetes-worker
          image: apache/spark:4.1.2-scala

```

--------------------------------

### Helm Chart Lint and Install Commands

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/AGENTS.md

Commands to lint the Helm chart for the Spark Kubernetes Operator and to install it using a specified values file.

```bash
helm lint --strict build-tools/helm/spark-kubernetes-operator
```

```bash
helm install spark -f build-tools/helm/spark-kubernetes-operator/values.yaml \
  build-tools/helm/spark-kubernetes-operator/
```

--------------------------------

### ApplicationSpec Constructor Example

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-specs.md

Demonstrates how to construct an ApplicationSpec object with various configuration parameters for a Spark application.

```java
ApplicationSpec spec = ApplicationSpec.builder()
    .mainClass("org.apache.spark.examples.SparkPi")
    .runtimeVersions(RuntimeVersions.builder()
        .sparkVersion("4.1.2")
        .scalaVersion("2.13.12")
        .jdkVersion("21")
        .build())
    .deploymentMode(DeploymentMode.ClusterMode)
    .driverArgs(Arrays.asList("100"))  // Arguments for SparkPi (number of partitions)
    .sparkConf(Map.of(
        "spark.executor.instances", "4",
        "spark.executor.cores", "2",
        "spark.executor.memory", "4g"
    ))
    .build();
```

--------------------------------

### Check Helm Installations

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/operations.md

Lists all deployed Helm releases across all namespaces to verify the installation status of the Spark Kubernetes Operator.

```bash
$ helm list -A
NAME      NAMESPACE REVISION UPDATED                              STATUS   CHART                               APP VERSION
us-west-1 us-west-1 1        2026-05-06 10:00:00.000000 -0700 PDT deployed spark-kubernetes-operator-1.8.0-dev 1.0.0-SNAPSHOT
us-west-2 us-west-2 1        2026-05-06 10:00:03.000000 -0700 PDT deployed spark-kubernetes-operator-1.8.0-dev 1.0.0-SNAPSHOT
```

--------------------------------

### Example Usage

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Demonstrates how to use the ApplicationStateSummary enum and its methods in Java code to check the current state of a Spark application.

```java
ApplicationStateSummary state = ApplicationStateSummary.RunningHealthy;

if (state.isFailure()) {
    System.out.println("Application failed");
} else if (state.isTerminated()) {
    System.out.println("Application is terminated");
} else if (state.isStarting()) {
    System.out.println("Application is starting");
}
```

--------------------------------

### Example SparkAppStatusListener Implementation

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciliation-framework.md

An example implementation of the SparkAppStatusListener interface that logs Spark application state transitions to an external audit system. Provide the fully-qualified class name in the operator configuration.

```java
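import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AuditAppListener implements SparkAppStatusListener {
  // Dedicated SLF4J logger for audit events; the logger name "audit" is an illustrative choice
  private static final Logger auditLog = LoggerFactory.getLogger("audit");

  @Override
  public void onApplicationStatusUpdate(SparkApplication app, ApplicationStatus newStatus) {
    String appName = app.getMetadata().getName();
    String namespace = app.getMetadata().getNamespace();
    ApplicationStateSummary state = newStatus.getCurrentState().getCurrentStateSummary();

    // Log the transition to the external audit system
    auditLog.info("App {}/{} transitioned to {}", namespace, appName, state);
  }
}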
```

--------------------------------

### Reconcile Application Example

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

Example of using ReconcileProgress within a SparkApplication reconciler to control requeue behavior based on operation outcomes.

```java
// Reconciler logic
public UpdateControl<SparkApplication> reconcile(SparkApplication app) {
  try {
    // Submit application
    submitApplication(app);
    
    // Continue monitoring on next iteration after default interval
    return UpdateControl.patchStatus(app)
        .andThen(buildReconcileProgress(ReconcileProgress.completeAndDefaultRequeue()));
  } catch (TemporaryApiError e) {
    // Requeue sooner to retry
    return UpdateControl.patchStatus(app)
        .andThen(buildReconcileProgress(ReconcileProgress.completeAndRequeueAfter(
            Duration.ofSeconds(5)
        )));
  }
}
```

--------------------------------

### Install Helm Chart in us-west-2

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/operations.md

Installs the Spark Kubernetes Operator Helm chart in the 'us-west-2' namespace. This is similar to the us-west-1 installation but uses distinct names for RBAC resources.

```bash
helm install us-west-2 spark/spark-kubernetes-operator \
  --create-namespace --namespace us-west-2 \
  --set operatorRbac.clusterRole.name=spark-operator-clusterrole-us-west-2 \
  --set operatorRbac.clusterRoleBinding.name=spark-operator-clusterrolebinding-us-west-2 \
  --set workloadResources.clusterRole.name=spark-workload-clusterrole-us-west-2
```

--------------------------------

### BaseSpec Configuration Example

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-specs.md

Demonstrates how to set Spark configuration properties for a BaseSpec. This is useful for overriding default Spark settings when creating a Spark application.

```java
BaseSpec spec = new BaseSpec();
spec.setSparkConf(Map.of(
    "spark.executor.instances", "4",
    "spark.executor.memory", "4g",
    "spark.executor.cores", "2"
));
```

--------------------------------

### isStarting()

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Determines if the application is currently in the starting phase. This covers states from after initialization up to, but not including, the fully running healthy state.

```java
public boolean isStarting()
```

| Return | Description |
|--------|-------------|
| boolean | `true` if state is between `ScheduledToRestart` and `RunningHealthy` (exclusive) |
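An exclusive "between two states" check like the one described above can be expressed through the declaration order of enum constants. The sketch below shows one way to do that. `DemoState` is a hypothetical, shortened enum whose constants and ordering are assumptions for illustration; it is not the operator's `ApplicationStateSummary`.

```java
/** Hypothetical subset of states; the declaration order is assumed and drives the range check. */
enum DemoState {
  Submitted,
  ScheduledToRestart,
  DriverStarted,
  DriverReady,
  RunningHealthy,
  ResourceReleased;

  /** True only for states declared strictly after ScheduledToRestart and strictly before RunningHealthy. */
  boolean isStarting() {
    return ordinal() > ScheduledToRestart.ordinal() && ordinal() < RunningHealthy.ordinal();
  }
}
```

With this ordering, `DemoState.DriverReady.isStarting()` is `true`, while both boundary states return `false`. A check built on ordinals depends on the declaration order, so reordering the constants changes its result.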

--------------------------------

### ApplicationTimeoutConfig Builder Example

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-specs.md

Demonstrates how to configure application timeouts using the builder pattern. This is useful for setting custom durations for driver and executor startup, and termination grace periods.

```java
ApplicationTimeoutConfig timeoutConfig = ApplicationTimeoutConfig.builder()
    .driverStartTimeoutMillis(5 * 60 * 1000L)      // 5 minutes
    .driverReadyTimeoutMillis(10 * 60 * 1000L)     // 10 minutes (extended for slow startup)
    .executorStartTimeoutMillis(5 * 60 * 1000L)    // 5 minutes
    .forceTerminationGracePeriodMillis(1 * 60 * 1000L)  // 1 minute
    .terminationRequeuePeriodMillis(5 * 1000L)     // 5 seconds
    .build();
```
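The millisecond values above are written as arithmetic such as `5 * 60 * 1000L`. As an alternative sketch, `java.time.Duration` can produce the same numbers with the unit spelled out. The class `TimeoutMillis` and its constant names are hypothetical and exist only for this example.

```java
import java.time.Duration;

/** Hypothetical constants: the same timeouts as above, derived from java.time.Duration. */
final class TimeoutMillis {
  static final long DRIVER_START = Duration.ofMinutes(5).toMillis();
  static final long DRIVER_READY = Duration.ofMinutes(10).toMillis();
  static final long EXECUTOR_START = Duration.ofMinutes(5).toMillis();
  static final long FORCE_TERMINATION_GRACE = Duration.ofMinutes(1).toMillis();
  static final long TERMINATION_REQUEUE = Duration.ofSeconds(5).toMillis();

  private TimeoutMillis() {}
}
```

Each constant is a `long`, so it can be passed wherever the builder methods above take a millisecond value.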

--------------------------------

### Environment Variable Configuration Examples

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

Set operator configuration options using environment variables. Dots in property keys are replaced with underscores, and the key is converted to uppercase.

```bash
export SPARK_LOGCONF=true
export SPARK_KUBERNETES_OPERATOR_NAMESPACE=spark-operator
export SPARK_KUBERNETES_OPERATOR_WATCHED_NAMESPACES='default,spark-apps,*'
export SPARK_KUBERNETES_OPERATOR_RECONCILER_PARALLELISM=100
```
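The key-to-variable mapping can be sketched in a few lines of Java. This is an illustration of the stated rule only: separators become underscores and the result is uppercased. `EnvKeyMapper` is a hypothetical helper, and it does not split camelCase words, so it reproduces names such as `SPARK_LOGCONF` and `SPARK_KUBERNETES_OPERATOR_RECONCILER_PARALLELISM` from the examples above.

```java
import java.util.Locale;

/** Hypothetical helper: derives an environment variable name from a dotted property key. */
final class EnvKeyMapper {

  /** Replaces each '.' with '_' and uppercases the result using a locale-independent rule. */
  static String toEnvName(String propertyKey) {
    return propertyKey.replace('.', '_').toUpperCase(Locale.ROOT);
  }

  private EnvKeyMapper() {}
}
```

`Locale.ROOT` keeps the uppercasing stable regardless of the JVM's default locale.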

--------------------------------

### Main Application Entry Point

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-operator.md

The main entry point for the Spark Kubernetes Operator application. It initializes and starts the operator, probe service, and metrics system.

```java
public static void main(String[] args)
```

--------------------------------

### Create BaseApplicationTemplateSpec with PodTemplateSpec

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-specs.md

Provides a Kubernetes Pod template specification for driver or executor pods. This example configures container image, resources, and other standard PodTemplateSpec fields.

```java
BaseApplicationTemplateSpec driverSpec = BaseApplicationTemplateSpec.builder()
    .podTemplateSpec(new PodTemplateSpec(
        new ObjectMeta(),
        new PodSpec()
            .withContainers(Collections.singletonList(new Container()
                .withName("spark-kubernetes-driver")
                .withImage("apache/spark:4.1.2-scala")
                .withResources(new ResourceRequirements()
                    .withRequests(Map.of(
                        "cpu", new Quantity("2"),
                        "memory", new Quantity("4Gi")
                    ))
                    .withLimits(Map.of(
                        "cpu", new Quantity("4"),
                        "memory", new Quantity("8Gi")
                    ))
                )
            ))
    ))
    .build();
```

--------------------------------

### Example Prometheus Configuration and Deployment

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

This YAML defines a Prometheus ConfigMap for scrape configurations, a Service to expose Prometheus, and a Deployment to run the Prometheus instance. Adjust scrape_interval and other global settings as needed for your environment.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      evaluation_interval: 30s
    
    scrape_configs:
      - job_name: spark-operator
        static_configs:
          - targets:
              - spark-operator:8080
        metrics_path: /metrics

---
# Prometheus Service
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
    - port: 9090
      targetPort: 9090

---
# Prometheus Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
      volumes:
        - name: config
          configMap:
            name: prometheus-config

```

--------------------------------

### Run Spark Operator

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/index.md

Start the Spark Operator. Use environment variables to configure watched namespaces and reconciler parallelism.

```bash
java -cp spark-operator-*.jar org.apache.spark.k8s.operator.SparkOperator
```

```bash
export SPARK_KUBERNETES_OPERATOR_WATCHED_NAMESPACES=default,spark-apps
export SPARK_KUBERNETES_OPERATOR_RECONCILER_PARALLELISM=100
java -cp spark-operator-*.jar org.apache.spark.k8s.operator.SparkOperator
```

--------------------------------

### Example ConfigMap for Dynamic Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

This ConfigMap defines dynamic configuration settings for the Spark Kubernetes Operator. Ensure the ConfigMap is in the same namespace as the operator and has the correct labels.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-operator-config
  namespace: spark-operator
  labels:
    app.kubernetes.io/name: spark-kubernetes-operator
data:
  spark.kubernetes.operator.reconciler.intervalSeconds: "300"
  spark.kubernetes.operator.watchedNamespaces: "default,spark-apps,*"
```

--------------------------------

### Build and Test Commands for Spark Kubernetes Operator

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/AGENTS.md

Use these Gradle wrapper commands for building, linting, testing, and formatting the operator code. Ensure a JDK 21+ is installed.

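```bash
# Compile and lint, skipping tests
./gradlew build -x test

# Full build, including all unit tests
./gradlew build

# Run the tests of the spark-operator module
./gradlew :spark-operator:test

# Run a single test class
./gradlew :spark-operator:test --tests "org.apache.spark.k8s.operator.SparkOperatorTest"

# Apply code formatting
./gradlew spotlessApply

# Check code formatting without modifying files
./gradlew spotlessCheck

# Generate Javadoc
./gradlew javadoc

# Build the operator Docker image
./gradlew buildDockerImage

# Report available dependency updates
./gradlew dependencyUpdates
```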

--------------------------------

### Prometheus Service Configuration for Spark Operator

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

Example YAML configuration for a Kubernetes Service to enable Prometheus scraping of Spark Operator metrics.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spark-operator
  labels:
    app: spark-operator
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  selector:
    app: spark-operator
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
```

--------------------------------

### Kubernetes Readiness Probe Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

Configure the readiness probe for the operator. This probe determines if the operator is ready to reconcile resources and handle traffic. It will remove the operator from traffic if it's starting up or degraded.

```yaml
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 2
```
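The probe settings above translate into rough timing bounds. The sketch below computes two of them under standard Kubernetes probe semantics, ignoring probe timeouts and scheduling jitter. `ProbeTiming` is a hypothetical helper written for this example.

```java
/** Hypothetical helper: rough timing bounds derived from readiness probe settings. */
final class ProbeTiming {

  /** Earliest time, in seconds after container start, at which the first probe can run. */
  static int earliestFirstProbeSeconds(int initialDelaySeconds) {
    return initialDelaySeconds;
  }

  /**
   * Approximate upper bound, in seconds, between the endpoint starting to fail and the pod
   * being marked not ready: failureThreshold consecutive failures, one per probe period.
   */
  static int maxSecondsToNotReady(int periodSeconds, int failureThreshold) {
    return periodSeconds * failureThreshold;
  }

  private ProbeTiming() {}
}
```

For the values shown (`periodSeconds: 5`, `failureThreshold: 2`), a failing `/readyz` endpoint would be reflected in the pod's readiness within roughly 10 seconds.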

--------------------------------

### Spark Sentinel Resource Example

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/operations.md

A dummy SparkApplication resource used for operator health probe monitoring. It should be labeled with 'spark.operator/sentinel': 'true' and will not create other Kubernetes resources. The reconciliation delay is controlled by 'health.sentinel.resource.reconciliation.delay.seconds'.

```yaml
apiVersion: spark.apache.org/v1
kind: SparkApplication
metadata:
  name: spark-sentinel-resources
  labels:
    "spark.operator/sentinel": "true"
```

--------------------------------

### Resource Filtering in Custom Code with Label Selector

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/constants-labels-messages.md

Build custom observers or integrations by defining label selectors for filtering resources. This example builds a LabelSelector object from a map of labels that is equivalent to the string form held in `selectorStr`.

```java
// String form of the selector, shown for reference only; it is not parsed below
String selectorStr = "spark.operator/name=spark-kubernetes-operator,spark-role=driver";
LabelSelector selector = Serialization.jsonMapper().convertValue(
    Map.of("matchLabels", Map.of(
        "spark.operator/name", "spark-kubernetes-operator",
        "spark-role", "driver"
    )),
    LabelSelector.class
);
```
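If the selector is only available as a string, it has to be split into key/value pairs before it can be used as `matchLabels`. The following is a minimal sketch of such a parser using only the JDK. `SelectorParser` is a hypothetical helper, and it handles plain `key=value` terms only, not set-based or `!=` expressions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: turns "k1=v1,k2=v2" into a label map suitable for matchLabels. */
final class SelectorParser {

  /** Parses comma-separated key=value terms; rejects any term without a key and '='. */
  static Map<String, String> parseEqualitySelector(String selector) {
    Map<String, String> labels = new LinkedHashMap<>();
    for (String term : selector.split(",")) {
      String trimmed = term.trim();
      if (trimmed.isEmpty()) {
        continue; // tolerate stray commas
      }
      int eq = trimmed.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Expected key=value but got: " + trimmed);
      }
      labels.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
    }
    return labels;
  }

  private SelectorParser() {}
}
```

The resulting map could then take the place of the inline `Map.of(...)` passed under `"matchLabels"` in the example above.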

--------------------------------

### Restart Configuration for Transient Failures

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/spark_custom_resources.md

Configure the operator to tolerate a high number of transient failures but stop the application if persistent issues arise. This example allows many total restart attempts but stops after 3 consecutive failures.

```yaml
restartConfig:
  restartPolicy: Always
  maxRestartAttempts: 100
  restartBackoffMillis: 30000
  maxRestartOnFailure: 3
  restartBackoffMillisForFailure: 60000
```
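The interaction between a total restart cap and a consecutive-failure cap can be illustrated with a small counter. This is a sketch of the idea, not the operator's logic: `RestartBudget` is hypothetical, and it assumes that restarts stop as soon as the number of consecutive failures reaches the limit and that a successful run resets that counter.

```java
/** Hypothetical model of two restart limits: a total cap and a consecutive-failure cap. */
final class RestartBudget {
  private final int maxRestartAttempts;
  private final int maxRestartOnFailure;
  private int totalRestarts;
  private int consecutiveFailures;

  RestartBudget(int maxRestartAttempts, int maxRestartOnFailure) {
    this.maxRestartAttempts = maxRestartAttempts;
    this.maxRestartOnFailure = maxRestartOnFailure;
  }

  /** Records how the latest run ended and reports whether another restart is allowed. */
  boolean recordRunAndCheckRestart(boolean failed) {
    // A success resets the consecutive-failure counter (assumed behavior)
    consecutiveFailures = failed ? consecutiveFailures + 1 : 0;
    if (consecutiveFailures >= maxRestartOnFailure) {
      return false; // persistent failure: stop restarting
    }
    if (totalRestarts >= maxRestartAttempts) {
      return false; // overall budget exhausted
    }
    totalRestarts++;
    return true;
  }
}
```

With `new RestartBudget(100, 3)`, two failures in a row still allow a restart, while a third consecutive failure ends the cycle. A success in between resets the failure count.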

--------------------------------

### main(String[] args)

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-operator.md

The main entry point for the Spark Kubernetes Operator. This method orchestrates the application bootstrap sequence, including logging versions, creating the operator instance, starting registered operators, and initiating probe and metrics services.

```java
public static void main(String[] args)
```

Application bootstrap sequence:

1. Logs operator, Java, and built-in Spark versions.
2. Creates a new `SparkOperator()` instance.
3. Starts all registered operators.
4. Starts the probe service for Kubernetes health checks.
5. Submits the metrics system startup to a thread pool.
6. Submits the metrics service startup to a thread pool.

--------------------------------

### Install Helm Chart in us-west-1

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/operations.md

Installs the Spark Kubernetes Operator Helm chart in the 'us-west-1' namespace. Ensure the necessary RBAC roles are configured with unique names for this namespace.

```bash
helm install us-west-1 spark/spark-kubernetes-operator \
  --create-namespace --namespace us-west-1 \
  --set operatorRbac.clusterRole.name=spark-operator-clusterrole-us-west-1 \
  --set operatorRbac.clusterRoleBinding.name=spark-operator-clusterrolebinding-us-west-1 \
  --set workloadResources.clusterRole.name=spark-workload-clusterrole-us-west-1
```

--------------------------------

### Create and Access ApplicationState

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Demonstrates how to create an ApplicationState instance with a specific summary and message, and how to access its fields.

```java
ApplicationState state = new ApplicationState(
    ApplicationStateSummary.DriverReady,
    "Driver pod is ready to accept executor connections");

// Access the state
ApplicationStateSummary summary = state.getCurrentStateSummary();
String message = state.getStateMessage();
Instant timestamp = Instant.parse(state.getLastUpdateTime());
```
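Once the timestamp string has been parsed with `Instant.parse`, it can be compared against another instant, for example to find out how long an application has been in its current state. The sketch below uses only `java.time`. `StateAge` is a hypothetical helper, and it assumes the timestamp is an ISO-8601 UTC string such as `2024-01-10T15:30:00Z`.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: measures how long ago a state timestamp was recorded. */
final class StateAge {

  /** Parses an ISO-8601 instant string and returns the elapsed time up to the given reference instant. */
  static Duration ageOf(String lastUpdateTime, Instant now) {
    return Duration.between(Instant.parse(lastUpdateTime), now);
  }

  private StateAge() {}
}
```

Passing the reference instant as a parameter, instead of calling `Instant.now()` inside the method, keeps the calculation deterministic and easy to test.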

--------------------------------

### Compile and Lint with Gradle (Skip Tests)

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/CLAUDE.md

Use this command to quickly compile the project and run linters without executing tests. Requires a JDK 21+.

```bash
./gradlew build -x test
```

--------------------------------

### Enable Logging of Operator Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

Set this system property to `true` to log all operator configuration settings on startup. This is useful for debugging and verifying configuration.

```properties
-Dspark.logConf=true
```

--------------------------------

### Apply Code Formatting with Gradle

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/CLAUDE.md

Automatically formats the project's sources according to defined style rules. Recommended to run before committing changes. Requires JDK 21+.

```bash
./gradlew spotlessApply
```

--------------------------------

### Check Application State

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Example of how to check the current state of a Spark application using the ApplicationStateSummary enumeration and its helper methods.

```java
ApplicationStateSummary state = ApplicationStateSummary.RunningHealthy;

if (state.isFailure()) {
    System.out.println("Application failed");
} else if (state.isTerminated()) {
    System.out.println("Application is terminated");
} else if (state.isStarting()) {
    System.out.println("Application is starting");
}
```

--------------------------------

### Generate Javadoc with Gradle

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/CLAUDE.md

Creates the Javadoc documentation for the project. This is a gate in the CI process. Requires JDK 21+.

```bash
./gradlew javadoc
```

--------------------------------

### Prometheus Scraping Configuration

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

Configure Prometheus to scrape metrics from the Spark operator. This example sets up a job to scrape metrics from the operator's /metrics endpoint.

```yaml
scrape_configs:
  - job_name: spark-operator
    static_configs:
      - targets:
          - spark-operator:8080
    metrics_path: /metrics
    scrape_interval: 30s
```

--------------------------------

### Set Deployment Mode to ClusterMode

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-specs.md

Demonstrates how to set the DeploymentMode to ClusterMode.

```java
DeploymentMode mode = DeploymentMode.ClusterMode;
```

--------------------------------

### SparkApplication initStatus() Method

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-custom-resources.md

Creates a fresh, empty ApplicationStatus object for this SparkApplication. Called automatically during resource creation to initialize status tracking.

```java
@Override
public ApplicationStatus initStatus()

```

--------------------------------

### Get Kubernetes Client Interceptors

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-operator.md

Retrieves a list of HTTP interceptors for the Kubernetes client. If metrics are enabled, a KubernetesMetricsInterceptor is added to track API call metrics.

```java
protected List<Interceptor> getClientInterceptors(MetricsSystem metricsSystem)
```

--------------------------------

### Get SparkApplication Status

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/spark_custom_resources.md

After submitting a SparkApplication, use this kubectl command to retrieve its status in YAML format. This allows you to inspect the observed state of your Spark application.

```bash
kubectl get sparkapp pi -o yaml
```

--------------------------------

### Get Spark Application Status Message using kubectl

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/constants-labels-messages.md

Retrieve the detailed status message of a Spark application from its Kubernetes resource using a JSONPath expression.

```bash
kubectl get sparkapp my-app -o jsonpath='{.status.currentState.stateMessage}'
```

--------------------------------

### Chainsaw End-to-End Test Execution

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/AGENTS.md

Command to run end-to-end tests using Chainsaw. This requires a running Kubernetes cluster, a built operator image, and Chainsaw installed.

```bash
chainsaw test --test-dir ./tests/e2e/state-transition --parallel 1
```

--------------------------------

### Build Docker Image with Gradle

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/CLAUDE.md

Builds the Docker image for the Spark Kubernetes Operator, tagged with the project version. Requires JDK 21+.

```bash
./gradlew buildDockerImage
```

--------------------------------

### Deploy Spark Application using kubectl

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/operations.md

Applies a Spark application definition from a YAML file to both 'us-west-1' and 'us-west-2' namespaces using kubectl.

```bash
kubectl apply -f https://apache.github.io/spark-kubernetes-operator/pi.yaml -n us-west-1
kubectl apply -f https://apache.github.io/spark-kubernetes-operator/pi.yaml -n us-west-2
```

--------------------------------

### BaseAttemptInfo Abstract Class

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/types.md

An abstract base class for holding information about a Spark attempt. It includes fields for attempt ID, start time, completion time, and duration.

```java
public abstract class BaseAttemptInfo {
  protected String attemptId;
  protected Long startTime;
  protected Long completionTime;
  protected Long durationMillis;
}
```

--------------------------------

### Define ClusterAttemptSummary in Java

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/types.md

Provides aggregated information about a Spark cluster attempt, extending base summary fields. Tracks attempt ID, start, completion, and duration.

```java
public class ClusterAttemptSummary extends BaseAttemptSummary {
  // Tracks:
  // - attemptId — Unique identifier
  // - startTime — Start epoch timestamp
  // - completionTime — Completion timestamp
  // - durationMillis — Total duration
}
```

--------------------------------

### Submit Spark App to YuniKorn

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/README.md

Apply the Spark application configuration to submit it to the YuniKorn scheduler. Use 'kubectl describe pod' to monitor the driver pod's scheduling and execution.

```bash
$ kubectl apply -f examples/pi-on-yunikorn.yaml

$ kubectl describe pod pi-on-yunikorn-0-driver
...
Events:
  Type    Reason             Age   From      Message
  ----    ------             ----  ----      -------
  Normal  Scheduling         1s    yunikorn  default/pi-on-yunikorn-0-driver is queued and waiting for allocation
  Normal  Scheduled          1s    yunikorn  Successfully assigned default/pi-on-yunikorn-0-driver to node docker-desktop
  Normal  PodBindSuccessful  1s    yunikorn  Pod default/pi-on-yunikorn-0-driver is successfully bound to node docker-desktop
  Normal  Pulled             0s    kubelet   Container image "apache/spark:4.1.2-scala" already present on machine
  Normal  Created            0s    kubelet   Created container: spark-kubernetes-driver
  Normal  Started            0s    kubelet   Started container spark-kubernetes-driver

$ kubectl delete sparkapp pi-on-yunikorn
sparkapplication.spark.apache.org "pi-on-yunikorn" deleted from default namespace
```

--------------------------------

### isInitializing()

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Checks if the application state is in the initializing phase. This includes states where the application has been submitted but not yet scheduled, or is marked for restart.

```APIDOC
#### isInitializing()

`public boolean isInitializing()`

| Return | Description |
|--------|-------------|
| boolean | `true` if state is `Submitted` or `ScheduledToRestart` |
```
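The documented return contract can be sketched as a small enum method. This is a minimal illustration, not the operator's source: `StateSummarySketch`, `IsInitializingDemo`, and the extra constants (`RunningHealthy`, `Succeeded`, `Failed`) are hypothetical stand-ins, and only `Submitted` and `ScheduledToRestart` come from the description above.

```java
// Hypothetical demo class; not part of the operator API.
final class IsInitializingDemo {
    public static void main(String[] args) {
        for (StateSummarySketch s : StateSummarySketch.values()) {
            System.out.println(s + " -> isInitializing=" + s.isInitializing());
        }
    }
}

// Hypothetical, trimmed-down stand-in for the operator's state summary enum.
enum StateSummarySketch {
    Submitted, ScheduledToRestart, RunningHealthy, Succeeded, Failed;

    // Documented contract: true only for Submitted or ScheduledToRestart.
    boolean isInitializing() {
        return this == Submitted || this == ScheduledToRestart;
    }
}
```

Keeping the check on the enum itself means callers never have to repeat the list of initializing states.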

--------------------------------

### Filter Reconciled Spark Resources

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

Use this optional label selector to filter which Spark resources the operator will reconcile. If empty, all resources are reconciled. This is useful for multi-operator setups.

```properties
spark.kubernetes.operator.reconciler.labelSelector=operator=prod,tier=critical
```
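To make the selector semantics concrete, the sketch below evaluates an equality-based selector such as `operator=prod,tier=critical` against a resource's labels. It is an illustration only: `LabelSelectorSketch` is a hypothetical class, it handles just comma-separated `key=value` terms (Kubernetes selectors also support other operators), and it does not show how the operator applies the selector.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of the operator or the Kubernetes client.
final class LabelSelectorSketch {
    // Parses "k1=v1,k2=v2" into required labels; a blank selector requires nothing.
    static Map<String, String> parse(String selector) {
        Map<String, String> required = new LinkedHashMap<>();
        if (selector == null || selector.isBlank()) {
            return required;
        }
        for (String term : selector.split(",")) {
            String[] kv = term.trim().split("=", 2);
            if (kv.length != 2 || kv[0].isEmpty()) {
                throw new IllegalArgumentException("Unsupported selector term: " + term);
            }
            required.put(kv[0].trim(), kv[1].trim());
        }
        return required;
    }

    // A resource matches when every required label is present with the same value.
    static boolean matches(String selector, Map<String, String> labels) {
        return parse(selector).entrySet().stream()
            .allMatch(e -> e.getValue().equals(labels.get(e.getKey())));
    }

    public static void main(String[] args) {
        String selector = "operator=prod,tier=critical";
        System.out.println(matches(selector, Map.of("operator", "prod", "tier", "critical")));
        System.out.println(matches(selector, Map.of("operator", "prod")));
        System.out.println(matches("", Map.of("operator", "dev")));
    }
}
```

An empty selector matches every resource in this sketch, which mirrors the documented behavior that all resources are reconciled when the property is empty.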

--------------------------------

### ApplicationState Constructors

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Provides the default constructor and a constructor to initialize with a specific state summary and message.

```java
public ApplicationState()
public ApplicationState(ApplicationStateSummary currentStateSummary, String message)
```

--------------------------------

### Run Spark Cluster on Kubernetes

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/README.md

Deploy a production Spark cluster with three workers, forward the master port, submit a Pi application, check its status, and finally delete the cluster.

```bash
$ kubectl apply -f examples/prod-cluster-with-three-workers.yaml

$ kubectl get sparkcluster
NAME   CURRENT STATE    AGE
prod   RunningHealthy   10s

$ kubectl port-forward prod-master-0 6066 &

$ ./examples/submit-pi-to-prod.sh
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20260110030233-0000",
  "serverSparkVersion" : "4.1.2",
  "submissionId" : "driver-20260110030233-0000",
  "success" : true
}

$ curl http://localhost:6066/v1/submissions/status/driver-20260110030233-0000/
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "serverSparkVersion" : "4.1.2",
  "submissionId" : "driver-20260110030233-0000",
  "success" : true,
  "workerHostPort" : "10.1.1.172:44233",
  "workerId" : "worker-20260110030145-10.1.1.172-44233"
}

$ kubectl delete sparkcluster prod
sparkcluster.spark.apache.org "prod" deleted
```

--------------------------------

### Configure Periodic Garbage Collection Interval

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/configuration.md

Set the interval, in seconds, between periodic `System.gc()` invocations. A value of 0 or less disables the feature. The setting has no effect if the JVM is started with `-XX:+DisableExplicitGC`. The first example below triggers a collection every hour; the second disables periodic collection.

```properties
spark.kubernetes.operator.periodicGC.intervalSeconds=3600
```

```properties
spark.kubernetes.operator.periodicGC.intervalSeconds=0
```
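The rule that only positive intervals enable the feature can be sketched with a standard scheduled executor. This is a minimal sketch under assumptions: `PeriodicGcSketch` and `schedule` are hypothetical names, and the operator's actual scheduling code may differ.

```java
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; illustrates the "0 or less disables" rule only.
final class PeriodicGcSketch {
    // Starts a daemon scheduler for positive intervals; returns empty otherwise.
    static Optional<ScheduledExecutorService> schedule(long intervalSeconds, Runnable gcTask) {
        if (intervalSeconds <= 0) {
            return Optional.empty(); // feature disabled
        }
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "periodic-gc-sketch");
            t.setDaemon(true); // do not keep the JVM alive just for GC
            return t;
        });
        scheduler.scheduleAtFixedRate(gcTask, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return Optional.of(scheduler);
    }

    public static void main(String[] args) {
        System.out.println("interval=0 scheduled: " + schedule(0, System::gc).isPresent());
        Optional<ScheduledExecutorService> hourly = schedule(3600, System::gc);
        System.out.println("interval=3600 scheduled: " + hourly.isPresent());
        hourly.ifPresent(ScheduledExecutorService::shutdownNow);
    }
}
```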

--------------------------------

### Check Code Formatting with Gradle

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/CLAUDE.md

Verifies that all sources adhere to the project's formatting standards without making changes. Use this to ensure compliance. Requires JDK 21+.

```bash
./gradlew spotlessCheck
```

--------------------------------

### Enable Kubernetes Client Metrics

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/reconciler-progress-metrics.md

Enable Kubernetes client metrics by setting the corresponding configuration property. This allows monitoring of API server requests, latency, and errors.

```properties
spark.kubernetes.operator.kubernetes.client.metricsEnabled=true
```

--------------------------------

### initStatus() Method

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-custom-resources.md

Initializes and returns a new, empty ApplicationStatus object for the SparkApplication. This method is called automatically during resource creation to initialize status tracking.

```APIDOC
## initStatus()

### Description
Creates a fresh, empty ApplicationStatus object for this SparkApplication. Called automatically during resource creation to initialize status tracking.

### Method
`@Override public ApplicationStatus initStatus()`

### Returns
`ApplicationStatus` — A new initialized ApplicationStatus instance
```

--------------------------------

### Override Single Helm Parameter

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/operations.md

Use the `--set` flag to override a single configuration parameter during Helm installation. This is useful for quick changes to specific values like the operator image repository.

```bash
helm install spark \
   --set image.repository=<my_registry>/spark-kubernetes-operator \
   -f build-tools/helm/spark-kubernetes-operator/values.yaml \
   build-tools/helm/spark-kubernetes-operator/
```

--------------------------------

### ApplicationStatus Constructors

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Provides the default constructor and a parameterized constructor for initializing ApplicationStatus with specific state details.

```java
public ApplicationStatus()
public ApplicationStatus(ApplicationState currentState, Map<Long, ApplicationState> stateTransitionHistory, 
    ApplicationAttemptSummary previousAttemptSummary, ApplicationAttemptSummary currentAttemptSummary)
```

--------------------------------

### ApplicationState Constructors

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-status.md

Provides the constructors for creating an ApplicationState object.

```APIDOC
## ApplicationState Constructors

### Public Constructors

- `public ApplicationState()`
- `public ApplicationState(ApplicationStateSummary currentStateSummary, String message)`
```

--------------------------------

### Override Multiple Helm Parameters with Custom Values Files

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/operations.md

Provide multiple custom values files using the `-f` flag during Helm installation. The last specified file takes precedence, allowing for layered configuration.

```bash
helm install spark \
   -f build-tools/helm/spark-kubernetes-operator/values.yaml \
   -f my_values.yaml \
   build-tools/helm/spark-kubernetes-operator/
```
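The "last file wins" layering can be pictured as a recursive map merge in which later layers override earlier ones key by key. The sketch below is an illustration of that idea rather than Helm's implementation: `LayeredValuesSketch` is a hypothetical class, and the `image.repository` and `image.tag` keys are example values only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; models layered values files as nested maps.
final class LayeredValuesSketch {
    // Returns a new map where entries from 'override' take precedence over 'base'.
    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(Map<String, Object> base, Map<String, Object> override) {
        Map<String, Object> merged = new LinkedHashMap<>(base);
        for (Map.Entry<String, Object> e : override.entrySet()) {
            Object existing = merged.get(e.getKey());
            if (existing instanceof Map && e.getValue() instanceof Map) {
                // Both sides are maps: merge them key by key.
                merged.put(e.getKey(),
                    merge((Map<String, Object>) existing, (Map<String, Object>) e.getValue()));
            } else {
                // Scalars and lists are replaced wholesale by the later layer.
                merged.put(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = Map.of("image", Map.of("repository", "default-repo", "tag", "1.0"));
        Map<String, Object> custom = Map.of("image", Map.of("repository", "my-registry/operator"));
        System.out.println(merge(defaults, custom));
    }
}
```

Keys that appear only in the earlier layer survive the merge, which is why a custom values file needs to list only the settings it changes.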

--------------------------------

### Configure Application Instance Settings

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/spark_custom_resources.md

Define `instanceConfig` to specify minimum, initial, and maximum executor counts for an application. This helps the operator determine if an application is running healthily, especially in clusters without a batch scheduler.

```yaml
applicationTolerations:
  instanceConfig:
    minExecutors: 3
    initExecutors: 5
    maxExecutors: 10
sparkConf:
  spark.executor.instances: "10"
```
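One way to picture how such thresholds could feed a health decision is the sketch below. It rests on stated assumptions: that "healthy" means at least `minExecutors` executors are running, and that the three values satisfy `min <= init <= max`. `InstanceConfigSketch`, the `Health` labels, and `classify` are hypothetical names, not the operator's API.

```java
// Hypothetical helper; the operator's real health logic may differ.
final class InstanceConfigSketch {
    // Illustrative labels for this sketch only.
    enum Health { HEALTHY, BELOW_MIN_EXECUTORS }

    // Mirrors the three instanceConfig fields from the YAML example.
    record InstanceConfig(int minExecutors, int initExecutors, int maxExecutors) {
        InstanceConfig {
            // Assumption of this sketch: the bounds must be ordered.
            if (minExecutors < 0 || minExecutors > initExecutors || initExecutors > maxExecutors) {
                throw new IllegalArgumentException("expected 0 <= min <= init <= max");
            }
        }
    }

    // Assumed rule: healthy when at least minExecutors executors are running.
    static Health classify(InstanceConfig cfg, int runningExecutors) {
        return runningExecutors >= cfg.minExecutors() ? Health.HEALTHY : Health.BELOW_MIN_EXECUTORS;
    }

    public static void main(String[] args) {
        InstanceConfig cfg = new InstanceConfig(3, 5, 10);
        System.out.println("4 executors: " + classify(cfg, 4));
        System.out.println("2 executors: " + classify(cfg, 2));
    }
}
```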

--------------------------------

### Restart Configuration for API Server Stress Mitigation

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/docs/spark_custom_resources.md

Configure the operator to stop an application quickly on scheduling failures so that repeated restarts do not overwhelm the API server. This example limits consecutive scheduling failures to 2 and applies a 10-minute backoff (600000 ms) to them, compared with 30 seconds (30000 ms) for other restarts.

```yaml
restartConfig:
  restartPolicy: Always
  maxRestartAttempts: 50
  restartBackoffMillis: 30000
  maxRestartOnSchedulingFailure: 2
  restartBackoffMillisForSchedulingFailure: 600000
```
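How these four settings might interact can be sketched as a small decision function. The exact semantics are an assumption of this sketch: it treats `maxRestartOnSchedulingFailure` as the number of consecutive scheduling-failure restarts allowed before the application is stopped, and it applies the overall `maxRestartAttempts` limit first. `RestartBackoffSketch` and its members are hypothetical names.

```java
import java.util.OptionalLong;

// Hypothetical helper; the operator's real restart logic may differ.
final class RestartBackoffSketch {
    // Mirrors the numeric restartConfig fields from the YAML example.
    record RestartConfig(long maxRestartAttempts,
                         long restartBackoffMillis,
                         long maxRestartOnSchedulingFailure,
                         long restartBackoffMillisForSchedulingFailure) {}

    // Returns the backoff before the next restart, or empty when the app should stop.
    static OptionalLong nextBackoffMillis(RestartConfig cfg,
                                          long restartsSoFar,
                                          long consecutiveSchedulingFailureRestarts,
                                          boolean schedulingFailure) {
        if (restartsSoFar >= cfg.maxRestartAttempts()) {
            return OptionalLong.empty(); // overall restart budget exhausted
        }
        if (schedulingFailure) {
            if (consecutiveSchedulingFailureRestarts >= cfg.maxRestartOnSchedulingFailure()) {
                return OptionalLong.empty(); // stop early on repeated scheduling failures
            }
            return OptionalLong.of(cfg.restartBackoffMillisForSchedulingFailure());
        }
        return OptionalLong.of(cfg.restartBackoffMillis());
    }

    public static void main(String[] args) {
        RestartConfig cfg = new RestartConfig(50, 30_000, 2, 600_000);
        System.out.println("regular failure: " + nextBackoffMillis(cfg, 0, 0, false));
        System.out.println("2nd scheduling failure: " + nextBackoffMillis(cfg, 1, 1, true));
        System.out.println("3rd scheduling failure: " + nextBackoffMillis(cfg, 2, 2, true));
    }
}
```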

--------------------------------

### Build ClusterSpec with RuntimeVersions and Node Specs

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-specs.md

Use this to define the desired configuration for a Spark cluster, including runtime versions, master and worker node specifications, and Spark configurations. Requires `podTemplate` to be defined elsewhere.

```java
ClusterSpec spec = ClusterSpec.builder()
    .runtimeVersions(RuntimeVersions.builder()
        .sparkVersion("4.1.2")
        .scalaVersion("2.13.12")
        .build())
    .masterSpec(MasterSpec.builder()
        .instances(1)
        .podTemplateSpec(podTemplate)
        .build())
    .workerSpec(WorkerSpec.builder()
        .instances(3)
        .podTemplateSpec(podTemplate)
        .build())
    .sparkConf(Map.of("spark.default.parallelism", "10"))
    .build();
```

--------------------------------

### Define Runtime Versions for Spark, Scala, and JDK

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-specs.md

Specify the exact versions for Spark, Scala, and optionally JDK to be used in the cluster. This is a common configuration when setting up a Spark environment.

```java
RuntimeVersions versions = RuntimeVersions.builder()
    .sparkVersion("4.1.2")
    .scalaVersion("2.13.12")
    .jdkVersion("21")
    .build();
```

--------------------------------

### runSystemGc()

Source: https://github.com/apache/spark-kubernetes-operator/blob/main/_autodocs/api-reference-operator.md

Invokes the system garbage collector and logs memory usage statistics.

```APIDOC
## runSystemGc()

### Description
Invokes `System.gc()` to trigger garbage collection and logs memory usage before and after the operation. This method is called periodically if `PERIODIC_GC_INTERVAL_SECONDS` is set to a positive value.

### Method
`static void runSystemGc()`

### Logs
- Elapsed time in milliseconds
- Used memory before/after (in MB)
- Total memory before/after (in MB)
```
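A method with this logging behavior could look roughly like the following sketch, which uses only `java.lang.Runtime` and `System.gc()`. `GcLogSketch`, `GcReport`, and `runGc` are hypothetical names, and the operator's actual implementation and log format are not shown in the source.

```java
// Hypothetical helper; measures the same quantities the description lists.
final class GcLogSketch {
    // Values a GC run would report: elapsed time plus memory before and after, in MB.
    record GcReport(long elapsedMillis, long usedBeforeMb, long usedAfterMb,
                    long totalBeforeMb, long totalAfterMb) {}

    static GcReport runGc() {
        final long mb = 1024L * 1024L;
        Runtime rt = Runtime.getRuntime();
        long totalBefore = rt.totalMemory();
        long usedBefore = totalBefore - rt.freeMemory();
        long start = System.nanoTime();
        System.gc(); // a request only; the JVM may ignore it (e.g. -XX:+DisableExplicitGC)
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        long totalAfter = rt.totalMemory();
        long usedAfter = totalAfter - rt.freeMemory();
        return new GcReport(elapsedMillis, usedBefore / mb, usedAfter / mb,
                            totalBefore / mb, totalAfter / mb);
    }

    public static void main(String[] args) {
        GcReport r = runGc();
        System.out.printf("GC took %d ms; used %d -> %d MB; total %d -> %d MB%n",
            r.elapsedMillis(), r.usedBeforeMb(), r.usedAfterMb(),
            r.totalBeforeMb(), r.totalAfterMb());
    }
}
```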