### Initialize Overwatch Workspace (Scala)

Source: https://databrickslabs.github.io/overwatch/gettingstarted

Initializes the Overwatch workspace by defining configuration parameters such as storage paths, data targets, security tokens, and cost details. This setup is crucial for the pipeline's operation and requires manual validation on the first run.

```scala
private val storagePrefix = "/mnt/global/Overwatch/working/directory"
private val workspaceID = ??? //automatically generated in the runner scripts referenced above
private val dataTarget = DataTarget(
  Some(etlDB), Some(s"${storagePrefix}/${workspaceID}/${etlDB}.db"), Some(s"${storagePrefix}/global_share"),
  Some(consumerDB), Some(s"${storagePrefix}/${workspaceID}/${consumerDB}.db")
)

private val tokenSecret = TokenSecret(secretsScope, dbPATKey)
private val badRecordsPath = s"${storagePrefix}/${workspaceID}/sparkEventsBadRecords"
private val auditSourcePath = "/path/to/my/workspaces/raw_audit_logs" // INPUT: workspace audit log directory
private val interactiveDBUPrice = 0.56
private val automatedDBUPrice = 0.26
private val customWorkspaceName = workspaceID // customize this to a custom name if custom workspace_name is desired

val params = OverwatchParams(
  auditLogConfig = AuditLogConfig(rawAuditPath = Some(auditSourcePath)),
  dataTarget = Some(dataTarget),
  tokenSecret = Some(tokenSecret),
  badRecordsPath = Some(badRecordsPath),
  overwatchScope = Some(scopes),
  maxDaysToLoad = maxDaysToLoad,
  databricksContractPrices = DatabricksContractPrices(interactiveDBUPrice, automatedDBUPrice),
  primordialDateString = Some(primordialDateString),
  workspace_name = Some(customWorkspaceName), // as of 0.6.0
  externalizeOptimize = false // as of 0.6.0
)

// args are the full json config to be used for the Overwatch workspace deployment
val args = JsonUtils.objToJson(params).compactString

// the workspace object contains everything Overwatch needs to build, run, maintain its pipelines
val workspace = Initializer(args) // 0.5.x+
```

--------------------------------

### Setup Widgets for Configuration (Scala)

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/azure_runner_docs_example

Initializes and retrieves values from Databricks widgets for configuring ETL and consumer database names, secrets scope, and other parameters.

```scala
// dbutils.widgets.removeAll
// dbutils.widgets.text("storagePrefix", "", "1. ETL Storage Prefix")
// dbutils.widgets.text("etlDBName", "overwatch_etl", "2. ETL Database Name")
// dbutils.widgets.text("consumerDBName", "overwatch", "3. Consumer DB Name")
// dbutils.widgets.text("secretsScope", "my_secret_scope", "4. Secret Scope")
// dbutils.widgets.text("dbPATKey", "my_key_with_api", "5. Secret Key (DBPAT)")
// dbutils.widgets.text("ehKey", "overwatch_eventhub_conn_string", "6. Secret Key (EH)")
// dbutils.widgets.text("ehName", "my_eh_name", "7. EH Topic Name")
// dbutils.widgets.text("primordialDateString", "2021-04-01", "8. Primordial Date")
// dbutils.widgets.text("maxDaysToLoad", "60", "9. Max Days")
// dbutils.widgets.text("scopes", "all", "A1. Scopes")
```

```scala
val storagePrefix = dbutils.widgets.get("storagePrefix")
val etlDB = dbutils.widgets.get("etlDBName")
val consumerDB = dbutils.widgets.get("consumerDBName")
val secretsScope = dbutils.widgets.get("secretsScope")
val dbPATKey = dbutils.widgets.get("dbPATKey")
val ehName = dbutils.widgets.get("ehName")
val ehKey = dbutils.widgets.get("ehKey")
val primordialDateString = dbutils.widgets.get("primordialDateString")
val maxDaysToLoad = dbutils.widgets.get("maxDaysToLoad").toInt
val scopes = if (dbutils.widgets.get("scopes") == "all") {
  "audit,sparkEvents,jobs,clusters,clusterEvents,notebooks,pools".split(",")
} else dbutils.widgets.get("scopes").split(",")
```

```scala
if (storagePrefix.isEmpty || consumerDB.isEmpty || etlDB.isEmpty || ehName.isEmpty || secretsScope.isEmpty || ehKey.isEmpty || dbPATKey.isEmpty) {
  throw new IllegalArgumentException("Please specify all required parameters!")
}
```

--------------------------------

### Initialize Pipeline for Custom Costs (Scala)

Source: https://databrickslabs.github.io/overwatch/gettingstarted

Initializes the pipeline to create custom costing tables in the ETL database. This step is necessary for the first-time initialization when custom compute costs need to be configured according to regional or contract pricing.

```scala
Bronze(workspace)
```

--------------------------------

### Databricks Overwatch Main Class Parameters

Source: https://databrickslabs.github.io/overwatch/gettingstarted

Demonstrates how parameters are passed to the Databricks Overwatch Main Class, requiring a JSON array of escaped strings. This format is crucial for configuring job parameters, especially when using the 'escapedConfigString' utility.

```shell
["<overwatch_escaped_params>"]
["bronze", "<overwatch_escaped_params>"]
["silver", "<overwatch_escaped_params>"]
["gold", "<overwatch_escaped_params>"]
```

--------------------------------

### Databricks Overwatch Main Class Argument Structure

Source: https://databrickslabs.github.io/overwatch/gettingstarted

Explains the expected structure for arguments passed to the Databricks Overwatch Main Class. It covers scenarios with a single argument (Overwatch parameters JSON string) and two arguments (pipeline layer and Overwatch parameters JSON string).

```text
Single argument: Arg(0) is Overwatch Parameters json string (escaped if sent through API).
Two arguments: Arg(0) must be ‘bronze’, ‘silver’, or ‘gold’ corresponding to the Overwatch Pipeline layer you want to be run. Arg(1) will be the Overwatch Parameters json string.
```

--------------------------------

### Add Cluster Dependencies for OverWatch

Source: https://databrickslabs.github.io/overwatch/gettingstarted

Specifies the Maven coordinates for the OverWatch assembly (fat jar) and the Azure EventHubs Spark integration. These are required dependencies to be added to your Databricks cluster.

```maven
com.databricks.labs:overwatch_2.12:<latest>
```

```maven
com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.21
```

--------------------------------

### Execute Overwatch Pipeline Stages (Scala)

Source: https://databrickslabs.github.io/overwatch/gettingstarted

Executes the main stages of the Overwatch pipeline: Bronze, Silver, and Gold. This is typically done from a Databricks notebook after the workspace has been initialized and configured.

```scala
private val args = JsonUtils.objToJson(params).compactString
val workspace = Initializer(Array(args), debugFlag = false)

Bronze(workspace).run()
Silver(workspace).run()
Gold(workspace).run()
```

--------------------------------

### Construct Overwatch Parameters and Instantiate Workspace

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example

Defines data targets, token secrets, file paths, DBU prices, and constructs the OverwatchParams object. It then uses these parameters to instantiate the Initializer, converting the parameters to a JSON string. This is a key setup step for the Overwatch project.

```scala
private val dataTarget = DataTarget(
  Some(etlDB), Some(s"${storagePrefix}/${workspaceID}/${etlDB}.db"), Some(s"${storagePrefix}/global_share"),
  Some(consumerDB), Some(s"${storagePrefix}/${workspaceID}/${consumerDB}.db")
)
private val tokenSecret = TokenSecret(secretsScope, dbPATKey)
private val badRecordsPath = s"${storagePrefix}/${workspaceID}/sparkEventsBadrecords"
private val auditSourcePath = s"${storagePrefix}/${workspaceID}/raw_audit_logs"
private val interactiveDBUPrice = 0.56
private val automatedDBUPrice = 0.26

val params = OverwatchParams(
  auditLogConfig = AuditLogConfig(rawAuditPath = Some(auditSourcePath)),
  dataTarget = Some(dataTarget),
  tokenSecret = Some(tokenSecret),
  badRecordsPath = Some(badRecordsPath),
  overwatchScope = Some(scopes),
  maxDaysToLoad = maxDaysToLoad,
  databricksContractPrices = DatabricksContractPrices(interactiveDBUPrice, automatedDBUPrice),
  primordialDateString = Some(primordialDateString)
)

private val args = JsonUtils.objToJson(params).compactString
val workspace = if (args.length != 0) {
  Initializer(Array(args), debugFlag = true)
} else { Initializer(Array()) }
```

--------------------------------

### Interacting With Overwatch and its State

Source: https://databrickslabs.github.io/overwatch/dataengineer/advancedtopics

Guide on how to instantiate Overwatch configurations and retrieve workspace state, and interact with pipeline modules.

```APIDOC
## Interacting With Overwatch and its State

### Description
This section explains how to use your production notebook to instantiate Overwatch configurations, obtain condensed parameters using `JsonUtils.compactString`, and retrieve workspace objects using helper functions.

### Method
Scala code execution within a Databricks notebook.

### Endpoint
N/A

### Parameters
#### Instantiating Workspace (As of 0.6.0.4)
- **Database Name** (string) - The name of the Overwatch ETL database (e.g., `"overwatch_etl"`).

#### Instantiating Workspace (Older versions/Compact String)
- **Compact Config String** (string) - The condensed configuration string obtained from a previous run.

### Request Example
#### Getting Compact String
```scala
import com.databricks.labs.overwatch.utils.JsonUtils
val params = OverwatchParams(...)
print(JsonUtils.objToJson(params).compactString)
```

#### Instantiating Workspace (Helper Function)
```scala
import com.databricks.labs.overwatch.utils.Helpers
val prodWorkspace = Helpers.getWorkspaceByDatabase("overwatch_etl")
```

#### Instantiating Workspace (Compact String)
```scala
// For older versions
import com.databricks.labs.overwatch.pipeline.Initializer
val workspace = Initializer("<compact config string>")
```

### Response
- **Workspace Object**: Represents the Overwatch configuration and state of a workspace.

### Using the Workspace
#### Exploring the Config
```scala
prodWorkspace.getConfig.*
```

#### Interacting with the Pipeline
```scala
val bronzePipeline = Bonze(prodWorkspace)
val silverPipeline = Silver(prodWorkspace)
val goldPipeline = Gold(prodWorkspace)
bronzePipeline.*
```

### Error Handling
- Ensure the compact string used is from a compatible Overwatch version.
- `Helpers.getWorkspaceByDatabase` may not work across different Overwatch versions.
```

--------------------------------

### Materialize JSON Configurations (Scala)

Source: https://databrickslabs.github.io/overwatch/gettingstarted

Generates different string representations of the Overwatch pipeline parameters (params). These configurations are used for various purposes, including Databricks job runs and notebook instantiations.

```scala
val escapedConfigString = JsonUtils.objToJson(params).escapedString // used for job runs using main class
val compactConfigString = JsonUtils.objToJson(params).compactString // used to instantiate workspace in a notebook
val prettyConfigString = JsonUtils.objToJson(params).prettyString // human-readable json config
```

--------------------------------

### Build Overwatch Uber JAR with Maven

Source: https://databrickslabs.github.io/overwatch/deployoverwatch/runningoverwatch/clusterconfig

Demonstrates the Maven command to compile an uber JAR of Overwatch with all its dependencies. This is an alternative to downloading pre-built JARs from GitHub releases, requiring Maven to be installed and configured.

```bash
mvn clean package
```

--------------------------------

### Azure Production Workspace Config - Scala

Source: https://databrickslabs.github.io/overwatch/assets/ChangeLog/Upgrade_Example

Provides an example of a production workspace configuration string for Azure. This JSON string includes settings for audit logging, token secrets, data targets, and intelligent scaling, formatted for use with Databricks Overwatch.

```Scala
val prodArgs = """{"auditLogConfig":{"auditLogFormat":"json","azureAuditLogEventhubConfig":{"connectionString":"{{secrets/overwatch_shared/eh_conn}}","eventHubName":"EHName","auditRawEventsPrefix":"EHChkDir","maxEventsPerTrigger":10000}},"tokenSecret":{"scope":"mySecretScope","key":"PATKeyName"},"dataTarget":{"databaseName":"overwatch_etl","databaseLocation":"/path/to/etl.db","etlDataPathPrefix":"/path/to/global_share","consumerDatabaseName":"overwatch","consumerDatabaseLocation":"/path/to/consumer.db"},"badRecordsPath":"/path/to/bad/recordsTracker","overwatchScope":["audit","accounts","jobs","sparkEvents","clusters","clusterEvents","notebooks","pools"],"maxDaysToLoad":60,"databricksContractPrices":{"interactiveDBUCostUSD":0.55,"automatedDBUCostUSD":0.15,"sqlComputeDBUCostUSD":0.22,"jobsLightDBUCostUSD":0.1},"primordialDateString":"2021-01-01","intelligentScaling":{"enabled":true,"minimumCores":8,"maximumCores":64,"coeff":1.0}}"""
```

--------------------------------

### Initialize Overwatch Workspace (Scala)

Source: https://databrickslabs.github.io/overwatch/assets/ChangeLog/Upgrade_0610_91LTS

Initializes the Overwatch workspace with a compact configuration string. This sets up the necessary objects and configurations to run Overwatch jobs.

```Scala
val compactString = """compact string"""
val prodWorkspace = Initializer(compactString)
```

--------------------------------

### AWS Production Workspace Config - Scala

Source: https://databrickslabs.github.io/overwatch/assets/ChangeLog/Upgrade_Example

Provides an example of a production workspace configuration string for AWS. This JSON string outlines settings for audit logging, token secrets, data targets, and intelligent scaling, tailored for Databricks Overwatch on AWS.

```Scala
val prodArgs = """{"auditLogConfig":{"rawAuditPath":"/path/to/AuditLogs","auditLogFormat":"json"},"tokenSecret":{"scope":"mySecretScope","key":"PATKeyName"},"dataTarget":{"databaseName":"overwatch_etl","databaseLocation":"/path/to/etl.db","etlDataPathPrefix":"/path/to/global_share","consumerDatabaseName":"overwatch","consumerDatabaseLocation":"/path/to/consumer.db"},"badRecordsPath":"/tmp/overwatch_etl/sparkEventsBadrecords","overwatchScope":["audit","sparkEvents","jobs","clusters","clusterEvents","notebooks","pools"],"maxDaysToLoad":30,"databricksContractPrices":{"interactiveDBUCostUSD":0.56,"automatedDBUCostUSD":0.26,"sqlComputeDBUCostUSD":0.22,"jobsLightDBUCostUSD":0.1},"primordialDateString":"2021-01-01","intelligentScaling":{"enabled":false,"minimumCores":4,"maximumCores":512,"coeff":1.0}}"""
```

--------------------------------

### Import Overwatch Utilities - Scala

Source: https://databrickslabs.github.io/overwatch/assets/ChangeLog/Upgrade_Example

Imports necessary classes from the Overwatch utility and pipeline modules for use in Scala notebooks.

```Scala
import com.databricks.labs.overwatch.utils.Upgrade
```

```Scala
import com.databricks.labs.overwatch.pipeline.Initializer
```

--------------------------------

### Initialize Overwatch Workspace (Scala)

Source: https://databrickslabs.github.io/overwatch/assets/ChangeLog/Upgrade_Example

Initializes the Overwatch workspace using production arguments. This step prepares the environment for subsequent operations, such as upgrades. This snippet uses Scala.

```Scala
val prodWorkspace = Initializer(prodArgs)
```

--------------------------------

### Configure Overwatch Parameters and Instantiate Workspace

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example_070

This snippet shows how to construct the Overwatch parameters, including data targets, token secrets, paths, pricing, and workspace name, and then instantiate the Overwatch workspace.

```scala
private val dataTarget = DataTarget(
  Some(etlDB), Some(s"${storagePrefix}/${workspaceID}/${etlDB}.db"), Some(s"${storagePrefix}/global_share"),
  Some(consumerDB), Some(s"${storagePrefix}/${workspaceID}/${consumerDB}.db")
)

private val tokenSecret = TokenSecret(secretsScope, dbPATKey)
private val badRecordsPath = s"${storagePrefix}/${workspaceID}/sparkEventsBadrecords"
private val auditSourcePath = "/path/to/my/workspaces/raw_audit_logs" // INPUT: workspace audit log directory
private val interactiveDBUPrice = 0.56
private val automatedDBUPrice = 0.26
private val customWorkspaceName = workspaceID // customize this to a custom name if custom workspace_name is desired

val params = OverwatchParams(
  auditLogConfig = AuditLogConfig(rawAuditPath = Some(auditSourcePath)),
  dataTarget = Some(dataTarget),
  tokenSecret = Some(tokenSecret),
  badRecordsPath = Some(badRecordsPath),
  overwatchScope = Some(scopes),
  maxDaysToLoad = maxDaysToLoad,
  databricksContractPrices = DatabricksContractPrices(interactiveDBUPrice, automatedDBUPrice),
  primordialDateString = Some(primordialDateString),
  workspace_name = Some(customWorkspaceName), 
  externalizeOptimize = false // recommend true -- reference this https://databrickslabs.github.io/overwatch/gettingstarted/advancedtopics/#externalize-optimize--z-order-as-of-060
)

private val args = JsonUtils.objToJson(params).compactString
val workspace = Initializer(args, debugFlag = false)
```

--------------------------------

### AWS Example Configuration for Overwatch Pipeline

Source: https://databrickslabs.github.io/overwatch/gettingstarted/configuration

This Scala code snippet demonstrates how to configure the Overwatch pipeline for AWS, setting up data targets, security tokens, and pricing details. It initializes parameters for audit logs, data storage, and intelligent scaling.

```scala
import com.databricks.labs.overwatch.pipeline.{Initializer, Bronze, Silver, Gold}
import com.databricks.labs.overwatch.utils._
private val storagePrefix = /mnt/overwatch_global
private val dataTarget = DataTarget(
  Some(etlDB), Some(s"${storagePrefix}/${workspaceID}/${etlDB}.db"), Some(s"${storagePrefix}/global_share"),
  Some(consumerDB), Some(s"${storagePrefix}/${workspaceID}/${consumerDB}.db")
)

private val tokenSecret = TokenSecret(secretsScope, dbPATKey)
private val badRecordsPath = s"${storagePrefix}/${workspaceID}/sparkEventsBadrecords"
private val auditSourcePath = "/path/to/my/workspaces/raw_audit_logs" // INPUT: workspace audit log directory
private val interactiveDBUPrice = 0.56
private val automatedDBUPrice = 0.26
private val DatabricksSQLDBUPrice = 0.22
private val automatedJobsLightDBUPrice = 0.10
private val customWorkspaceName = workspaceID // customize this to a custom name if custom workspace_name is desired

val params = OverwatchParams(
  auditLogConfig = AuditLogConfig(rawAuditPath = Some(auditSourcePath), auditLogFormat = "json"),
  dataTarget = Some(dataTarget),
  tokenSecret = Some(tokenSecret),
  badRecordsPath = Some(badRecordsPath),
  overwatchScope = Some(scopes),
  maxDaysToLoad = maxDaysToLoad,
  databricksContractPrices = DatabricksContractPrices(interactiveDBUPrice, automatedDBUPrice, DatabricksSQLDBUPrice, automatedJobsLightDBUPrice),
  primordialDateString = Some(primordialDateString),
  intelligentScaling = IntelligentScaling(enabled = true, minimumCores = 16, maximumCores = 64, coeff = 1.0),
  workspace_name = Some(customWorkspaceName), // as of 0.6.0
  externalizeOptimize = false // as of 0.6.0
)

```

--------------------------------

### Scala Example for Azure Spark UI API

Source: https://databrickslabs.github.io/overwatch/assets/_index/realtime_helpers

Example Scala code demonstrating how to construct the URL and retrieve data from the Azure Spark UI API.

```APIDOC
## Scala Example for Azure Spark UI API

### Description
This Scala code snippet shows how to dynamically construct the API URL for the Azure Spark UI and set up variables needed for the request.

### Method
N/A (Code Example)

### Endpoint
N/A (Code Example)

### Parameters
#### Scala Variables
- **`env`** (String) - The Databricks API URL.
- **`token`** (String) - The Databricks API token.
- **`clusterId`** (String) - The ID of the Spark cluster.
- **`sparkContextID`** (String) - The ID of the Spark context.
- **`orgId`** (String) - The organization ID.
- **`uiPort`** (String) - The Spark UI port.
- **`apiPath`** (String) - The specific API endpoint path (e.g., 'applications').
- **`url`** (String) - The complete constructed URL for the API request.

### Request Example
```scala
val env = dbutils.notebook.getContext.apiUrl.get
val token = dbutils.notebook.getContext.apiToken.get
val clusterId = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
val sparkContextID = spark.conf.get("spark.databricks.sparkContextId")
val orgId = spark.conf.get("spark.databricks.clusterUsageTags.clusterOwnerOrgId")
val uiPort = spark.conf.get("spark.ui.port")
val apiPath = "applications"
val url = s"$env/driver-proxy-api/o/$orgId/$clusterId/$uiPort/api/v1/$apiPath"
```

### Response
N/A (Code Example)
```

--------------------------------

### Display Configuration Strings

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example

Shows how to convert the Overwatch parameters object to different JSON string formats (escaped and compact). These commands are useful for preparing the configuration to be run as a job with a main class.

```scala
JsonUtils.objToJson(params).escapedString
```

```scala
JsonUtils.objToJson(params).compactString
```

--------------------------------

### Scala Example for AWS Spark UI API

Source: https://databrickslabs.github.io/overwatch/assets/_index/realtime_helpers

Example Scala code for constructing the URL to access the AWS Spark UI API.

```APIDOC
## Scala Example for AWS Spark UI API

### Description
This Scala code snippet demonstrates how to construct the API endpoint URL for accessing the AWS Spark UI through the Databricks driver proxy.

### Method
N/A (Code Example)

### Endpoint
N/A (Code Example)

### Parameters
#### Scala Variables
- **`uiPort`** (String) - The Spark UI port.
- **`env`** (String) - The base URL of the Databricks environment (shard name).
- **`token`** (String) - The Databricks API token.
- **`clusterId`** (String) - The ID of the Spark cluster.
- **`url`** (String) - The complete constructed URL for the API request.

### Request Example
```scala
val uiPort = spark.conf.get("spark.ui.port")
val env = dbutils.notebook.getContext.apiUrl.get
val token = dbutils.notebook.getContext.apiToken.get
val clusterId = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
val url = s"https://$env/driver-proxy-api/o/0/$clusterId/$uiPort/api/v1/"

// Example output from execution:
// uiPort: String = 42285
// env: String = https://demo.cloud.databricks.com
// token: String = [REDACTED]
// clusterId: String = 0318-151752-abed99
// url: String = https://https://demo.cloud.databricks.com/driver-proxy-api/o/0/0318-151752-abed99/42285/api/v1/
```

### Response
N/A (Code Example)
```

--------------------------------

### AWS EC2 DescribeSpotPriceHistory API Response Example

Source: https://databrickslabs.github.io/overwatch/faq

A sample XML response from the AWS EC2 DescribeSpotPriceHistory API, detailing instance type, product description, spot price, timestamp, and availability zone for spot instances.

```xml
  <requestId>59dbff89-35bd-4eac-99ed-be587EXAMPLE</requestId> 
  <spotPriceHistorySet>
    <item>
      <instanceType>m3.medium</instanceType>
      <productDescription>Linux/UNIX</productDescription>
      <spotPrice>0.287</spotPrice>
      <timestamp>2020-11-01T20:56:05.000Z</timestamp>
      <availabilityZone>us-west-2a</availabilityZone>
    </item>
    <item>
      <instanceType>m3.medium</instanceType>
      <productDescription>Windows</productDescription>
      <spotPrice>0.033</spotPrice>
      <timestamp>2020-11-01T22:33:47.000Z</timestamp>
      <availabilityZone>us-west-2a</availabilityZone>
    </item>
  </spotPriceHistorySet>
  <nextToken/>
</DescribeSpotPriceHistoryResponse>
```

--------------------------------

### Import Databricks Overwatch Libraries (Scala)

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example

Imports necessary classes from the Databricks Overwatch library for pipeline operations and utility functions.

```scala
import com.databricks.labs.overwatch.pipeline.{Initializer, Bronze, Silver, Gold}
import com.databricks.labs.overwatch.utils._
import com.databricks.labs.overwatch.pipeline.TransformFunctions
```

--------------------------------

### Generate Compact Config String - Scala

Source: https://databrickslabs.github.io/overwatch/assets/ChangeLog/Upgrade_Example

Demonstrates how to instantiate Overwatch parameters and then convert the object to a compact JSON string using `JsonUtils.objToJson` and `compactString`. This is essential for passing configuration parameters efficiently.

```Scala
val params = OverwatchParams(...)
println(JsonUtils.objToJson(params).compactString)
```

--------------------------------

### List Storage Directory (Scala)

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example

Displays the contents of a specified directory within the Databricks file system, typically used for initial verification or debugging.

```scala
display(dbutils.fs.ls(s"${storagePrefix}/${workspaceID}").toDF)
```

--------------------------------

### Get Config Strings for Overwatch

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example_060

Provides methods to serialize Overwatch parameters into different string formats (escaped, compact, pretty) for job execution or reference.

```Scala
// use this to run Overwatch as a job
JsonUtils.objToJson(params).escapedString
```

```Scala
// use this to get a compact string of the created parameters for reference and/or creating workspace object on the fly
JsonUtils.objToJson(params).compactString
```

```Scala
// Prints a pretty string of the entire config to be used in the run
JsonUtils.objToJson(params).prettyString
```

--------------------------------

### Install and Configure Collectd

Source: https://databrickslabs.github.io/overwatch/assets/_index/realtime_helpers

Installs the collectd monitoring agent and its utilities, then configures the collectd.conf file to monitor various system metrics and send them to a Graphite server. This involves setting up plugins for CPU, memory, disk, network interfaces, and more, along with specifying the Graphite host and port.

```bash
apt-get update -y
DEBIAN_FRONTEND=noninteractive apt install collectd collectd-utils -y
cat >"/etc/collectd/collectd.conf" <<EOL
Hostname "${IP_CLEAN}"
FQDNLookup false
BaseDir "/local_disk0/tmp"

LoadPlugin cpu
LoadPlugin CPUFreq
LoadPlugin df
LoadPlugin entropy
LoadPlugin interface
LoadPlugin load
LoadPlugin memory
LoadPlugin mysql
LoadPlugin processes
LoadPlugin rrdtool
LoadPlugin users
LoadPlugin write_graphite

<Plugin df>
    Device "/dev/mapper/vg-lv"
    MountPoint "/local_disk0"
    FSType "ext4"
</Plugin>

<Plugin interface>
    Interface "ens3"
    IgnoreSelected false
</Plugin>

<Plugin write_graphite>
    <Node "graphing">
        Host "${OVERWATCH_RELAY}"
        Port "2003"
        Protocol "tcp"
        LogSendErrors true
        Prefix "v1.${WORKSPACE}.cluster_${DB_CLUSTER_ID}.machine.raw.${IS_DRIVER}."
        StoreRates true
        AlwaysAppendDS false
        EscapeCharacter "_"
    </Node>
</Plugin>
EOL
service collectd restart
```

--------------------------------

### Convert Parameters to Compact String for Reference

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example_070

This snippet demonstrates converting the Overwatch parameters object into a compact JSON string. This is useful for referencing the configuration or instantiating the workspace object on the fly.

```scala
// use this to get a compact string of the created parameters for reference and/or creating workspace object on the fly
JsonUtils.objToJson(params).compactString
```

--------------------------------

### Execute Overwatch Upgrade - Scala Example

Source: https://databrickslabs.github.io/overwatch/assets/ChangeLog/060_upgrade_process

Provides an example of how to execute the Overwatch upgrade to version 0.6.0 using Scala. It demonstrates calling the 'upgradeTo060' function with specific parameters, including disabling Spark table rebuilds.

```scala
Upgrade.upgradeTo060(prodWorkspace, workspaceNameMap, rebuildSparkTables = false, startStep = 1.0)
```

--------------------------------

### Import Overwatch Initialization Utility (Scala)

Source: https://databrickslabs.github.io/overwatch/assets/ChangeLog/Upgrade_0610_91LTS

Imports the Initializer class from the Overwatch pipeline utilities. This class is essential for setting up and configuring Overwatch jobs.

```Scala
import com.databricks.labs.overwatch.pipeline.Initializer
```

--------------------------------

### Execute Silver Pipeline Stage

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example_070

This command executes the Silver layer of the Overwatch pipeline, transforming bronze data into a silver data format.

```scala
Silver(workspace).run()
```

--------------------------------

### Construct Overwatch Parameters and Instantiate Workspace

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example_042

Defines data targets, token secrets, paths, and pricing for the Overwatch project. It then constructs the OverwatchParams object, serializes it to JSON, and initializes the workspace. Dependencies include `DataTarget`, `TokenSecret`, `DatabricksContractPrices`, `AuditLogConfig`, and `OverwatchParams`.

```scala
private val dataTarget = DataTarget(
  Some(etlDB), Some(s"${storagePrefix}/${workspaceID}/${etlDB}.db"), Some(s"${storagePrefix}/global_share"),
  Some(consumerDB), Some(s"${storagePrefix}/${workspaceID}/${consumerDB}.db")
)
```

```scala
private val tokenSecret = TokenSecret(secretsScope, dbPATKey)
```

```scala
private val badRecordsPath = s"${storagePrefix}/${workspaceID}/sparkEventsBadrecords"
```

```scala
private val auditSourcePath = "/path/to/my/workspaces/raw_audit_logs" // INPUT: workspace audit log directory
```

```scala
private val interactiveDBUPrice = 0.56
```

```scala
private val automatedDBUPrice = 0.26
```

```scala
val params = OverwatchParams(
  auditLogConfig = AuditLogConfig(rawAuditPath = Some(auditSourcePath)),
  dataTarget = Some(dataTarget),
  tokenSecret = Some(tokenSecret),
  badRecordsPath = Some(badRecordsPath),
  overwatchScope = Some(scopes),
  maxDaysToLoad = maxDaysToLoad,
  databricksContractPrices = DatabricksContractPrices(interactiveDBUPrice, automatedDBUPrice),
  primordialDateString = Some(primordialDateString)
)
```

```scala
private val args = JsonUtils.objToJson(params).compactString
```

```scala
val workspace = Initializer(args, debugFlag = true)
```

--------------------------------

### Configure Widgets for Job Parameters (Scala)

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example

Sets up Databricks widgets for user-configurable parameters like storage prefix, database names, secrets, and date ranges. It also defines the scopes for data loading.

```scala
// dbutils.widgets.removeAll
// dbutils.widgets.text("storagePrefix", "", "1. ETL Storage Prefix")
// dbutils.widgets.text("etlDBName", "overwatch_etl", "2. ETL Database Name")
// dbutils.widgets.text("consumerDBName", "overwatch", "3. Consumer DB Name")
// dbutils.widgets.text("secretsScope", "my_secret_scope", "4. Secret Scope")
// dbutils.widgets.text("dbPATKey", "my_key_with_api", "5. Secret Key (DBPAT)")
// dbutils.widgets.text("primordialDateString", "2020-10-27", "6. Primordial Date")
// dbutils.widgets.text("maxDaysToLoad", "60", "7. Max Days")
// dbutils.widgets.text("scopes", "all", "8. Scopes")
```

--------------------------------

### Show The Run Report

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example

Displays the run report for the specified etlDB. This command is used to visualize the results and performance of the executed pipeline stages.

```scala
display(pipReport(etlDB))
```

--------------------------------

### Construct Overwatch Parameters and Instantiate Workspace

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/azure_runner_docs_example

Defines data targets, token secrets, event hub configurations, DBU prices, and constructs the OverwatchParams object. It then initializes the workspace based on the generated parameters.

```scala
private val dataTarget = DataTarget(
  Some(etlDB), Some(s"${storagePrefix}/${workspaceID}/${etlDB}.db"), Some(s"${storagePrefix}/global_share"),
  Some(consumerDB), Some(s"${storagePrefix}/${workspaceID}/${consumerDB}.db")
)
```

```scala
private val tokenSecret = TokenSecret(secretsScope, dbPATKey)
```

```scala
private val ehConnString = dbutils.secrets.get(secretsScope, ehKey)
```

```scala
private val ehStatePath = s"${storagePrefix}/${workspaceID}/ehState"
```

```scala
private val badRecordsPath = s"${storagePrefix}/${workspaceID}/sparkEventsBadrecords"
```

```scala
private val azureLogConfig = AzureAuditLogEventhubConfig(connectionString = ehConnString, eventHubName = ehName, auditRawEventsPrefix = ehStatePath)
```

```scala
private val interactiveDBUPrice = 0.56
```

```scala
private val automatedDBUPrice = 0.26
```

```scala
val params = OverwatchParams(
  auditLogConfig = AuditLogConfig(azureAuditLogEventhubConfig = Some(azureLogConfig)),
  dataTarget = Some(dataTarget), 
  tokenSecret = Some(tokenSecret),
  badRecordsPath = Some(badRecordsPath),
  overwatchScope = Some(scopes),
  maxDaysToLoad = maxDaysToLoad,
  databricksContractPrices = DatabricksContractPrices(interactiveDBUPrice, automatedDBUPrice),
  primordialDateString = Some(primordialDateString)
)
```

```scala
private val args = JsonUtils.objToJson(params).compactString
```

```scala
val workspace = if (args.length != 0) {
  Initializer(Array(args), debugFlag = true)
} else {
  Initializer(Array())
}
```

--------------------------------

### Execute Gold Pipeline Stage

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example_070

This command executes the Gold layer of the Overwatch pipeline, further processing silver data into a gold data format.

```scala
Gold(workspace).run()
```

--------------------------------

### Install and Configure Collectd for Metrics Collection

Source: https://databrickslabs.github.io/overwatch/assets/_index/realtime_helpers

Installs the collectd package and configures it to collect various system metrics (CPU, memory, disk, network interfaces) and send them to a Graphite relay. The configuration includes specifying devices, mount points, and the Graphite node details.

```bash
apt-get update -y
DEBIAN_FRONTEND=noninteractive apt install collectd collectd-utils -y
```

```bash
cat >"/etc/collectd/collectd.conf" <<EOL
Hostname "${IP_CLEAN}"
FQDNLookup false
BaseDir "/local_disk0/tmp"

LoadPlugin cpu
LoadPlugin CPUFreq
LoadPlugin df
LoadPlugin entropy
LoadPlugin interface
LoadPlugin load
LoadPlugin memory
LoadPlugin mysql
LoadPlugin processes
LoadPlugin rrdtool
LoadPlugin users
LoadPlugin write_graphite

<Plugin df>
    Device "/dev/mapper/vg-lv"
    MountPoint "/local_disk0"
    FSType "ext4"
</Plugin>

<Plugin interface>
    Interface "ens3"
    IgnoreSelected false
</Plugin>

<Plugin write_graphite>
    <Node "graphing">
        Host "${OVERWATCH_RELAY}"
        Port "2003"
        Protocol "tcp"
        LogSendErrors true
        Prefix "v1.${WORKSPACE}.cluster_${DB_CLUSTER_ID}.machine.raw.${IS_DRIVER}."
        StoreRates true
        AlwaysAppendDS false
        EscapeCharacter "_"
    </Node>
</Plugin>
EOL
```

--------------------------------

### Configure Overwatch Job Parameters via Widgets (Scala)

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example_070

Sets up Databricks widgets for interactive configuration of Overwatch job parameters such as storage prefix, database names, secrets scope, and date ranges. These parameters are then retrieved and validated.

```Scala
// dbutils.widgets.removeAll
// dbutils.widgets.text("storagePrefix", "", "1. ETL Storage Prefix")
// dbutils.widgets.text("etlDBName", "overwatch_etl", "2. ETL Database Name")
// dbutils.widgets.text("consumerDBName", "overwatch", "3. Consumer DB Name")
// dbutils.widgets.text("secretsScope", "my_secret_scope", "4. Secret Scope")
// dbutils.widgets.text("dbPATKey", "SECRET_NAME_my_key_with_api", "5. Secret Key (DBPAT)")
// dbutils.widgets.text("primordialDateString", "2021-09-30", "6. Primordial Date")
// dbutils.widgets.text("maxDaysToLoad", "60", "7. Max Days")
// dbutils.widgets.text("scopes", "all", "8. Scopes")
```

```Scala
val storagePrefix = dbutils.widgets.get("storagePrefix").toLowerCase // OUTPUT: PRIMARY OVERWATCH OUTPUT PREFIX
val etlDB = dbutils.widgets.get("etlDBName").toLowerCase
val consumerDB = dbutils.widgets.get("consumerDBName").toLowerCase
val secretsScope = dbutils.widgets.get("secretsScope")
val dbPATKey = dbutils.widgets.get("dbPATKey")
val primordialDateString = dbutils.widgets.get("primordialDateString")
val maxDaysToLoad = dbutils.widgets.get("maxDaysToLoad").toInt
val scopes = if (dbutils.widgets.get("scopes") == "all") {
  "audit,sparkEvents,jobs,clusters,clusterEvents,notebooks,pools,accounts,dbsql".split(",")
} else dbutils.widgets.get("scopes").split(",")

if (storagePrefix.isEmpty || consumerDB.isEmpty || etlDB.isEmpty || secretsScope.isEmpty || dbPATKey.isEmpty) {
  throw new IllegalArgumentException("Please specify all required parameters!")
}
```

--------------------------------

### Execute Bronze Pipeline Stage

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example_070

This command executes the Bronze layer of the Overwatch pipeline, processing raw data into a bronze data format.

```scala
Bronze(workspace).run()
```

--------------------------------

### AWS EC2 DescribeSpotPriceHistory API Request Example

Source: https://databrickslabs.github.io/overwatch/faq

A sample request to the AWS EC2 API to retrieve historical spot pricing data. It specifies the time range, availability zone, and requires authentication parameters.

```http
&StartTime=2020-11-01T00:00:00.000Z
&EndTime=2020-11-01T23:59:59.000Z
&AvailabilityZone=us-west-2a
&AUTHPARAMS
```

--------------------------------

### List Storage Files (Scala)

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/azure_runner_docs_example

Lists the files in the specified storage path for the current workspace. This is typically used to check if data already exists.

```scala
// If first run this should be empty
display(dbutils.fs.ls(s"${storagePrefix}/${workspaceID}").toDF)
```

--------------------------------

### Databricks Overwatch Init Script

Source: https://databrickslabs.github.io/overwatch/assets/_index/realtime_helpers

A bash script that initializes the Databricks environment for Overwatch. It sets up environment variables, configures Spark metrics, installs and configures collectd, and restarts the collectd service.

```bash
#!/bin/bash

OVERWATCH_RELAY=127.0.0.1 # CHANGE_ME
WORKSPACE=AWS_DEMO # CHANGE_ME

if [[ $DB_IS_DRIVER = "TRUE" ]]; then
  IS_DRIVER="driver"
else
  IS_DRIVER="worker"
fi

IP_CLEAN=$( echo ${DB_CONTAINER_IP} | tr '.' '_' )

LOG_DIR=/dbfs/cluster-logs/${DB_CLUSTER_ID}/csv_metrics/${DB_CONTAINER_IP}
if [ ! -d "/dbfs/cluster-logs/${DB_CLUSTER_ID}/csv_metrics/${DB_CONTAINER_IP}" ]
then
  mkdir -p $LOG_DIR
fi
TMP_SCRIPT=/tmp/custom_metrics.sh

cat >"$TMP_SCRIPT" <<EOL

master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=${OVERWATCH_RELAY}
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=v1.${WORKSPACE}.cluster_${DB_CLUSTER_ID}.spark.raw
# *.sink.prometheusServlet.class=org.apache.spark.metrics.sing.PrometheusServlet
# *.sink.prometheusServlet.path=/metrics/prometheus
# master.sink.prometheusServlet.path=/metrics/master/prometheus
# applications.sink.prometheusServlet.path=/metrics/applications/prometheus
EOL

cat "$TMP_SCRIPT" >> /databricks/spark/conf/metrics.properties

apt-get update -y
DEBIAN_FRONTEND=noninteractive apt install collectd collectd-utils -y

cat >"/etc/collectd/collectd.conf" <<EOL
Hostname "${IP_CLEAN}"
FQDNLookup false
BaseDir "/local_disk0/tmp"

LoadPlugin cpu
LoadPlugin CPUFreq
LoadPlugin df
LoadPlugin entropy
LoadPlugin interface
LoadPlugin load
LoadPlugin memory
LoadPlugin mysql
LoadPlugin processes
LoadPlugin rrdtool
LoadPlugin users
LoadPlugin write_graphite

<Plugin df>
    Device "/dev/mapper/vg-lv"
    MountPoint "/local_disk0"
    FSType "ext4"
</Plugin>

<Plugin interface>
    Interface "ens3"
    IgnoreSelected false
</Plugin>

<Plugin write_graphite>
    <Node "graphing">
        Host "${OVERWATCH_RELAY}"
        Port "2003"
        Protocol "tcp"
        LogSendErrors true
        Prefix "v1.${WORKSPACE}.cluster_${DB_CLUSTER_ID}.machine.raw.${IS_DRIVER}."
        StoreRates true
        AlwaysAppendDS false
        EscapeCharacter "_"
    </Node>
</Plugin>
EOL

service collectd restart

```

--------------------------------

### Construct Overwatch Parameters and Instantiate Workspace

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/azure_runner_docs_example_070

Constructs the `OverwatchParams` object using the retrieved configuration values, including data targets, token secrets, and event hub configurations. It then uses these parameters to initialize the Overwatch workspace, preparing it for pipeline execution.

```scala
private val dataTarget = DataTarget(
  Some(etlDB), Some(s"${storagePrefix}/${workspaceID}/${etlDB}.db"), Some(s"${storagePrefix}/global_share"),
  Some(consumerDB), Some(s"${storagePrefix}/${workspaceID}/${consumerDB}.db")
)

private val tokenSecret = TokenSecret(secretsScope, dbPATKey)
private val ehConnString = s"{{secrets/${secretsScope}/${ehKey}}}"

private val ehStatePath = s"${storagePrefix}/${workspaceID}/ehState"
private val badRecordsPath = s"${storagePrefix}/${workspaceID}/sparkEventsBadrecords"
private val azureLogConfig = AzureAuditLogEventhubConfig(connectionString = ehConnString, eventHubName = ehName, auditRawEventsPrefix = ehStatePath)
private val interactiveDBUPrice = 0.55
private val automatedDBUPrice = 0.30
private val customWorkspaceName = workspaceID // customize this to a custom name if custom workspace_name is desired

val params = OverwatchParams(
  auditLogConfig = AuditLogConfig(azureAuditLogEventhubConfig = Some(azureLogConfig)),
  dataTarget = Some(dataTarget), 
  tokenSecret = Some(tokenSecret),
  badRecordsPath = Some(badRecordsPath),
  overwatchScope = Some(scopes),
  maxDaysToLoad = maxDaysToLoad,
  databricksContractPrices = DatabricksContractPrices(interactiveDBUPrice, automatedDBUPrice),
  primordialDateString = Some(primordialDateString),
  workspace_name = Some(customWorkspaceName), 
  externalizeOptimize = false // recommend true -- reference this https://databrickslabs.github.io/overwatch/gettingstarted/advancedtopics/#externalize-optimize--z-order-as-of-060
)

private val args = JsonUtils.objToJson(params).compactString
val workspace = Initializer(args, debugFlag = false)
```

--------------------------------

### Import Overwatch Environment and Pipeline Components (Scala)

Source: https://databrickslabs.github.io/overwatch/assets/FAQ/multi_workspace_cleanup

Imports necessary classes from the Overwatch library for environment setup and pipeline initialization. These imports are essential for interacting with Databricks workspaces.

```Scala
import org.apache.spark.sql.functions._
```

```Scala
import com.databricks.labs.overwatch.env.Workspace
```

```Scala
import scala.collection.parallel.ForkJoinTaskSupport
```

```Scala
import java.util.concurrent.ForkJoinPool
```

```Scala
import com.databricks.labs.overwatch.pipeline.Initializer
```

--------------------------------

### Scala: Initialize Overwatch with updated arguments (0.4.2+)

Source: https://databrickslabs.github.io/overwatch/changelog

This Scala code shows the updated syntax for initializing Overwatch starting from version 0.4.2 when instantiating it from a notebook. It reflects a change in how arguments are passed to the `Initializer` class.

```scala
// 0.4.1 (OLD)
val workspace = Initializer(Array(args), debugFlag = true)
// 0.4.2+ (NEW)
val workspace = Initializer(args, debugFlag = true)
```

--------------------------------

### Convert Parameters to Pretty String for Display

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example_070

This code snippet converts the Overwatch parameters object into a pretty-printed JSON string, making the entire configuration easily readable for review.

```scala
// Prints a pretty string of the entire config to be used in the run
JsonUtils.objToJson(params).prettyString
```

--------------------------------

### Execute The Pipeline

Source: https://databrickslabs.github.io/overwatch/assets/GettingStarted/aws_runner_docs_example

Executes the Overwatch data processing pipeline in stages: Bronze, Silver, and Gold. Each stage is represented by a class that is instantiated with the workspace object and then its run method is called.

```scala
Bronze(workspace).run()
```

```scala
Silver(workspace).run()
```

```scala
Gold(workspace).run()
```