### Getting Started Tutorials

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

A collection of tutorials to help users get started with data analysis and tools at Mozilla.

```markdown
[Getting Started](cookbooks/getting_started/index.md)
```

--------------------------------

### Serve Documentation Locally

Source: https://github.com/mozilla/data-docs/blob/main/README.md

Starts a local server to preview the documentation. This command is used after installing mdbook-dtmo and assumes the documentation source files are present in the current directory.

```bash
mdbook-dtmo serve
```

--------------------------------

### BigQuery ETL for Live Data Setup

Source: https://github.com/mozilla/data-docs/blob/main/src/cookbooks/live_data.md

This snippet references a SQL view definition from bigquery-etl that sets up direct access to live data. It is presented as an example of how little setup this method needs, particularly for creating user-facing views.

```Python
# Reference to a bigquery-etl script for setting up live data views
# Example: https://github.com/mozilla/bigquery-etl/blob/main/sql/moz-fx-data-shared-prod/hubs/active_subscription_ids/view.sql

# This is a conceptual representation. The actual code would be within the linked SQL file
# which defines a BigQuery view. The bigquery-etl tool orchestrates the creation of such views.

# Example of a SQL view definition (as found in the linked file):
# CREATE VIEW `your_project.your_dataset.hubs_active_subscription_ids_live` AS
# SELECT
#   user_id,
#   MAX(CASE WHEN subscription_active THEN 1 ELSE 0 END) AS is_active
# FROM
#   `your_project.your_dataset.hubs_subscriptions_live`
# WHERE
#   submission_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)
# GROUP BY
#   user_id;
```

--------------------------------

### Install mdbook-dtmo via curl

Source: https://github.com/mozilla/data-docs/blob/main/README.md

Installs the mdbook-dtmo tool, a fork of mdBook with custom additions for Mozilla's environment, using an install script fetched from a URL. The script downloads a pre-built release binary, so this method is convenient when you do not want to build the tool from source.

```bash
curl -LSfs https://japaric.github.io/trust/install.sh | sh -s -- --git badboy/mdbook-dtmo
```

--------------------------------

### Example Query for Telemetry (BigQuery)

Source: https://github.com/mozilla/data-docs/blob/main/src/datasets/other/addons_daily/intro.md

This snippet shows an example query for accessing the addons_daily table through the Telemetry (BigQuery) data source. It demonstrates how to retrieve data related to add-ons and their users.

```SQL
SELECT
  addon_id,
  submission_date,
  users_with_addon
FROM
  `project.dataset.addons_daily`
WHERE
  submission_date BETWEEN '2023-01-01' AND '2023-01-31'
LIMIT 100;
```

--------------------------------

### Working with Looker Introduction

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

An introductory guide to using Looker for data visualization and analysis at Mozilla.

```markdown
[Working with Looker](cookbooks/looker/index.md)
```

--------------------------------

### Stub Installer Ping Dataset

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

Provides data from the stub installer ping, likely related to the initial installation process of Firefox. Useful for tracking installation success and early user experience.

```markdown
[Stub installer ping](datasets/other/stub_installer/reference.md)
```

--------------------------------

### Introduction to Cookbooks

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

An introduction to the collection of tutorials and practical guides for data analysis at Mozilla.

```markdown
[Tutorials & Cookbooks](cookbooks/index.md)
```

--------------------------------

### Install mdbook-dtmo via Cargo

Source: https://github.com/mozilla/data-docs/blob/main/README.md

Builds and installs the mdbook-dtmo preprocessors using the Cargo package manager. This method is suitable if you have the Rust toolchain and Cargo installed.

```bash
cargo install mdbook-dtmo
```

--------------------------------

### BigQuery Table Naming Convention

Source: https://github.com/mozilla/data-docs/blob/main/src/tools/guiding_principles.md

This example shows the naming convention for BigQuery tables used by the data pipeline. It includes the dataset, table name, and version, differentiating between live and stable (historical) data.

```sql
activity_stream_live.impression_stats_v1
activity_stream_stable.impression_stats_v1
```
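The convention can be made concrete with a small parser. This is a minimal sketch: the pattern is inferred only from the two example names above, and `PipelineTable` and `parse_table_name` are hypothetical names introduced for illustration, not part of any Mozilla tooling.

```python
import re
from typing import NamedTuple


class PipelineTable(NamedTuple):
    namespace: str      # e.g. "activity_stream"
    kind: str           # "live" or "stable"
    document_type: str  # e.g. "impression_stats"
    version: int        # e.g. 1


# Assumption: names look like <namespace>_<live|stable>.<document_type>_v<version>,
# as in the two examples above. Other datasets may not follow this shape.
_TABLE_RE = re.compile(
    r"^(?P<namespace>\w+)_(?P<kind>live|stable)\.(?P<doctype>\w+)_v(?P<version>\d+)$"
)


def parse_table_name(name: str) -> PipelineTable:
    """Split a pipeline table name into its components."""
    match = _TABLE_RE.match(name)
    if match is None:
        raise ValueError(f"not a live/stable pipeline table name: {name!r}")
    return PipelineTable(
        match["namespace"], match["kind"], match["doctype"], int(match["version"])
    )


print(parse_table_name("activity_stream_live.impression_stats_v1"))
```

Keeping the live/stable marker as its own field makes it easy to switch between the two variants of the same table.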

--------------------------------

### BigQuery ETL Logic Examples

Source: https://github.com/mozilla/data-docs/blob/main/src/cookbooks/data_modeling/where_to_store.md

Examples of business logic and data transformations expected in the 'bigquery-etl' repository for BigQuery datasets. This includes core metrics, search metrics, acquisition/retention/churn calculations, and partner code mapping.

```markdown
- The calculation of [core metrics](https://docs.telemetry.mozilla.org/metrics/index.html): DAU, WAU, MAU, new profiles.
- Calculation of [search metrics](https://docs.telemetry.mozilla.org/datasets/search.html?highlight=search#terminology). E.g. Ad clicks, search with ads, organic search.
- Calculation of acquisition, retention and churn metrics.
- Mapping from partner code to platform for Bing revenue.
- Segmentation of clients that require the implementation of business logic, not just filtering on specific columns.
```

--------------------------------

### Looker Explore Definition Example (VPN Subscriptions)

Source: https://github.com/mozilla/data-docs/blob/main/src/cookbooks/data_modeling/where_to_store.md

An example of a Looker explore definition in LKML for 'subscriptions', showcasing how to join with other views and define specific dimensions and time frames.

```lkml
explore: subscriptions {
  label: "VPN Subscriptions"
  from: subscriptions
  join: users {
    type: left_outer
    sql_on: ${subscriptions.user_id} = ${users.id} ;;
  }
  # ... other join and view definitions
}
```

--------------------------------

### OpMon Project Configuration Example

Source: https://github.com/mozilla/data-docs/blob/main/src/cookbooks/operational_monitoring.md

Example TOML configuration for an OpMon project, detailing sections like [project], [data_sources], [metrics], and [dimensions]. This snippet illustrates how to define project-specific monitoring parameters.

```toml
[project]
# A custom, descriptive name of the project.
# This will be used as the generated Looker dashboard title.
name = "A new operational monitoring project"

# The name of the platform this project targets.
# For example, "firefox_desktop", "fenix", "firefox_ios", ...
platform = "firefox_desktop"

# Specifies the type of monitoring desired as described above.
# Either "submission_date" (to monitor each day) or "build_id" (to monitor build over build)
xaxis = "submission_date"

# Both start_date and end_date can be overridden, otherwise the dates configured in
# Experimenter will be used as defaults.
start_date = "2022-01-01"

# Whether to skip the analysis for this project entirely.
# Useful for skipping rollouts for which OpMon projects are generated automatically otherwise.
skip = false

# Whether the project is related to a rollout.
is_rollout = false

# Ignore the default metrics that would be computed.
skip_default_metrics = false

# Whether to have all the results in a single tile on the Looker dashboard (compact)
# or to have separate tiles for each metric.
compact_visualization = false

# Metrics, that are based on metrics, to compute.
# Defined as a list of strings. These strings are the "slug" of the metric, which is the

```

--------------------------------

### Querying Live Tables Directly Example

Source: https://github.com/mozilla/data-docs/blob/main/src/cookbooks/live_data.md

This example demonstrates how to query live tables, specifically accessing data from the last two days from live tables and older data from stable tables. It highlights the importance of disabling caching to ensure fresh data is returned. The example links to a SQL view definition for active hub subscriptions.

```SQL
-- Example SQL for accessing live data (conceptual)
-- Assumes a view named 'monitoring.topsites_click_rate_live'
-- and filtering by submission_timestamp.

SELECT
    *
FROM
    `your_project.monitoring.topsites_click_rate_live`
WHERE
    submission_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)
    AND submission_timestamp < CURRENT_TIMESTAMP();

-- Note: Actual query will depend on the specific view and table structure.
-- Caching should be disabled in the query execution environment.
```
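The live/stable split described above can be sketched as a small helper that decides which part of a requested time range falls in each kind of table. The two-day window mirrors the query above; `split_live_stable` and `LIVE_WINDOW` are hypothetical names, and the sketch does not cover disabling the query cache.

```python
from datetime import datetime, timedelta, timezone

# Assumption: live tables serve the most recent two days, as in the query above.
LIVE_WINDOW = timedelta(days=2)


def split_live_stable(start, end, now):
    """Split [start, end) into a stable-table range and a live-table range.

    Returns (stable_range, live_range); either is None when it would be empty.
    """
    if start >= end:
        raise ValueError("start must be before end")
    cutoff = now - LIVE_WINDOW
    stable = (start, min(end, cutoff)) if start < cutoff else None
    live = (max(start, cutoff), end) if end > cutoff else None
    return stable, live


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
stable, live = split_live_stable(now - timedelta(days=5), now, now)
print("stable:", stable)
print("live:", live)
```

Passing `now` explicitly keeps the helper deterministic, which makes the boundary behaviour easy to check.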

--------------------------------

### Example BigQuery Query: Clients Last Seen

Source: https://github.com/mozilla/data-docs/blob/main/src/cookbooks/bigquery/querying.md

An example SQL query to retrieve and aggregate data from the `mozdata.telemetry.clients_last_seen` table, demonstrating filtering, grouping, and ordering.

```sql
SELECT
    submission_date,
    os,
    COUNT(*) AS count
FROM
    mozdata.telemetry.clients_last_seen
WHERE
    submission_date >= DATE_SUB(CURRENT_DATE, INTERVAL 1 WEEK)
    AND days_since_seen = 0
GROUP BY
    submission_date,
    os
HAVING
    count > 10 -- remove outliers
    AND lower(os) NOT LIKE '%windows%'
ORDER BY
    os,
    submission_date DESC
```

--------------------------------

### SQL Style Guide

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

Presents a style guide for writing SQL queries, promoting consistency, readability, and maintainability. Includes best practices for formatting, naming conventions, and query optimization.

```markdown
[SQL Style Guide](concepts/sql_style.md)
```

--------------------------------

### Looker View Definition Example (Browser KPIs)

Source: https://github.com/mozilla/data-docs/blob/main/src/cookbooks/data_modeling/where_to_store.md

An example of a Looker view definition in LKML for 'browser_kpis', demonstrating the implementation of cumulative days of use using a SUM aggregation.

```lkml
view: browser_kpis {
  # ... other dimension and measure definitions
  measure: cumulative_days_of_use {
    type: sum
    sql: ${days_of_use} ;;
    label: "Cumulative Days of Use"
  }
  # ... other dimension and measure definitions
}
```

--------------------------------

### Creating a Prototype Data Project on Google Cloud Platform

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

Steps for setting up a prototype data project within Google Cloud Platform, including environment configuration and initial setup.

```markdown
[Creating a Prototype Data Project on Google Cloud Platform](cookbooks/gcp-projects.md)
```

--------------------------------

### Project Configuration Example

Source: https://github.com/mozilla/data-docs/blob/main/src/cookbooks/operational_monitoring.md

This snippet shows a typical project configuration, defining metrics, alerts, data sources, and rollout parameters.

```toml
# name of the metric definition section in either the project configuration or the platform-specific
# configuration file.
# See [metrics] section on how these metrics get defined.
metrics = [
    'shutdown_hangs',
    'main_crashes',
    'startup_crashes',
    'memory_unique_content_startup',
    'perf_page_load_time_ms'
]

alerts = [
    "ci_diffs"
]

# This section specifies the clients that should be monitored.
[project.population]

# Slug/name of the data source definition section in either the project configuration or the platform-specific
# configuration file. This data source refers to a database table.
# See [data_sources] section on how this gets defined.
data_source = "main"

# The name of the branches that have been configured for a rollout or experiment.
# If defined, this configuration overrides boolean_pref.
branches = ["enabled", "disabled"]

# A SQL snippet that results in a boolean representing whether a client is included in the rollout or experiment or not.
boolean_pref = "environment.settings.fission_enabled"

# The channel the clients should be monitored from: "release", "beta", or "nightly".
channel = "beta"

# If set to "true", the rollout and experiment configurations will be ignored and instead
# the entire client population (regardless of whether they are part of the experiment or rollout)
# will be monitored.
# This option is useful if the project is not associated to a rollout or experiment and the general
# client population of a product should be monitored.
monitor_entire_population = false

# References to dimension slugs that are used to segment the client population.
# Defined as a list of strings. These strings are the "slug" of the dimension, which is the
# name of the dimension definition section in either the project configuration or the platform-specific
# configuration file. See [dimensions] section on how these get defined.
dimensions = ["os"]

# A set of metrics that should be part of the same visualization
[project.metric_groups.crashes]
friendly_name = "Crashes"
description = "Breakdown of crashes"
metrics = [
    "main_crashes",
    "startup_crashes",
]

```
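A configuration like this can be sanity-checked once it has been parsed into a dictionary. The sketch below is not part of OpMon: `check_project` is a hypothetical helper, the allowed `xaxis` and `channel` values are taken from the comments in the configuration examples, and the rule that grouped metrics must also appear in `metrics` is an assumption made for illustration.

```python
VALID_XAXIS = {"submission_date", "build_id"}
VALID_CHANNELS = {"release", "beta", "nightly"}


def check_project(project):
    """Return a list of problems found in a parsed [project] table (a dict)."""
    problems = []

    xaxis = project.get("xaxis")
    if xaxis is not None and xaxis not in VALID_XAXIS:
        problems.append(f"xaxis must be one of {sorted(VALID_XAXIS)}, got {xaxis!r}")

    channel = project.get("population", {}).get("channel")
    if channel is not None and channel not in VALID_CHANNELS:
        problems.append(f"channel must be one of {sorted(VALID_CHANNELS)}, got {channel!r}")

    # Assumption (not stated in the source): grouped metrics should also be
    # listed in the project's `metrics` list.
    listed = set(project.get("metrics", []))
    for group_name, group in project.get("metric_groups", {}).items():
        for metric in group.get("metrics", []):
            if metric not in listed:
                problems.append(
                    f"metric_groups.{group_name} references unlisted metric {metric!r}"
                )
    return problems


project = {
    "xaxis": "submission_date",
    "metrics": ["main_crashes", "startup_crashes"],
    "population": {"channel": "beta"},
    "metric_groups": {"crashes": {"metrics": ["main_crashes", "startup_crashes"]}},
}
print(check_project(project))
```

The dictionary could come from any TOML parser; the check itself only relies on plain `dict` and `list` values.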

--------------------------------

### GCP Project Cookbook

Source: https://github.com/mozilla/data-docs/blob/main/src/cookbooks/gcp-projects.md

This snippet provides a link to the GCP Project Cookbook on docs.telemetry.mozilla.org. This resource offers detailed guidance on setting up and managing GCP projects within Mozilla, likely covering best practices, configurations, and procedures.

```APIDOC
GCP Project Cookbook:
  URL: https://docs.telemetry.mozilla.org/cookbooks/gcp-projects.html
```

--------------------------------

### Left Alignment of Root Keywords in SQL

Source: https://github.com/mozilla/data-docs/blob/main/src/concepts/sql_style.md

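Demonstrates the recommended practice of left-aligning root SQL keywords (like SELECT, FROM, WHERE, LIMIT) so they start on the same character boundary, improving visual structure. The first block shows the recommended layout; the second shows the right-aligned "river" layout that the guide advises against.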

```sql
SELECT
  client_id,
  submission_date
FROM
  main_summary
WHERE
  sample_id = '42'
  AND submission_date > '20180101'
LIMIT
  10
```

```sql
SELECT client_id,
       submission_date
  FROM main_summary
 WHERE sample_id = '42'
   AND submission_date > '20180101'
```

--------------------------------

### SQL Join Condition Formatting

Source: https://github.com/mozilla/data-docs/blob/main/src/concepts/sql_style.md

Illustrates the correct indentation and placement for JOIN conditions using the ON clause. The ON keyword should start on a new line, indented further than the JOIN keyword, with conditions following on the same line.

```sql
FROM
  telemetry_stable.main_v4
LEFT JOIN
  static.normalized_os_name
  ON main_v4.environment.system.os.name = normalized_os_name.os_name
```

--------------------------------

### SQL Multi-line Parentheses Formatting

Source: https://github.com/mozilla/data-docs/blob/main/src/concepts/sql_style.md

Details the proper formatting for parentheses that span multiple lines. The opening parenthesis should end its line, the closing parenthesis should align with the start of the multi-line construct, and the content should be indented.

```sql
WITH sample AS (
  SELECT
    client_id,
  FROM
    main_summary
  WHERE
    sample_id = '42'
)
```

--------------------------------

### SQL Grouping Columns: Aliases vs. Implicit

Source: https://github.com/mozilla/data-docs/blob/main/src/concepts/sql_style.md

Illustrates grouping by aliases or by implicit column numbers, with examples for BigQuery and Presto syntax. The first two blocks show the preferred forms; the third repeats the full expression in `GROUP BY`, which is the repetition the guide recommends avoiding.

```sql
-- BigQuery SQL Syntax
SELECT
  submission_date,
  normalized_channel IN ('nightly', 'aurora', 'beta') AS is_prerelease,
  count(*) AS count
FROM
  telemetry.clients_daily
WHERE
  submission_date > '2019-07-01'
GROUP BY
  submission_date,
  is_prerelease -- Grouping by aliases is supported in BigQuery
```

```sql
-- Presto SQL Syntax
SELECT
  submission_date,
  normalized_channel IN ('nightly', 'aurora', 'beta') AS is_prerelease,
  count(*) AS count
FROM
  telemetry.clients_daily
WHERE
  submission_date > '20190701'
GROUP BY
  1, 2 -- Implicit grouping avoids repeating expressions
```

```sql
-- Presto SQL Syntax
SELECT
  submission_date,
  normalized_channel IN ('nightly', 'aurora', 'beta') AS is_prerelease,
  count(*) AS count
FROM
  telemetry.clients_daily
WHERE
  submission_date > '20190701'
GROUP BY
  submission_date,
  normalized_channel IN ('nightly', 'aurora', 'beta')
```

--------------------------------

### Introduction to Operational Tasks

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

An introduction to operational procedures and best practices for managing data infrastructure.

```markdown
[Operational](cookbooks/operational/index.md)
```

--------------------------------

### Data Pipeline Ingestion Endpoint

Source: https://github.com/mozilla/data-docs/blob/main/src/tools/guiding_principles.md

This example shows the structure of the HTTPS endpoint used to submit data payloads to the Mozilla data pipeline. It includes the namespace, document type, version, and a unique document ID for deduplication.

```bash
https://incoming.telemetry.mozilla.org/submit/activity-stream/impression-stats/1/<document_id>
```
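The URL structure can be sketched as a small builder. This is an illustration only: `build_submit_url` is a hypothetical helper, and generating the document ID as a random UUID is an assumption; the source only says the ID must be unique so it can be used for deduplication.

```python
import uuid

BASE_URL = "https://incoming.telemetry.mozilla.org/submit"


def build_submit_url(namespace, document_type, version, document_id=None):
    """Assemble a submission URL from its four path components."""
    if document_id is None:
        # Assumption: a random UUID serves as the unique document ID.
        document_id = str(uuid.uuid4())
    return f"{BASE_URL}/{namespace}/{document_type}/{version}/{document_id}"


print(build_submit_url("activity-stream", "impression-stats", 1))
```

A client retrying the same payload would reuse the same `document_id`, since a fresh ID would make the retry look like a new document.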

--------------------------------

### JSON to BigQuery Map Conversion Example

Source: https://github.com/mozilla/data-docs/blob/main/src/tools/guiding_principles.md

Demonstrates the transformation of a JSON object representing a map into a BigQuery-compatible array of key-value pairs. This is necessary when dealing with free-form maps in BigQuery, following conventions for complex Avro types.

```json
{
  "key1": "value1",
  "key2": "value2"
}
```

```json
[
  {
    "key": "key1",
    "value": "value1"
  },
  {
    "key": "key2",
    "value": "value2"
  }
]
```
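The same transformation can be written in a few lines of Python. This is a minimal sketch of the shape change shown above, not the pipeline's actual implementation; both function names are hypothetical.

```python
def map_to_key_value_pairs(mapping):
    """Turn {"k": "v", ...} into [{"key": "k", "value": "v"}, ...]."""
    return [{"key": key, "value": value} for key, value in mapping.items()]


def key_value_pairs_to_map(pairs):
    """Inverse direction, e.g. for working with query results locally."""
    return {pair["key"]: pair["value"] for pair in pairs}


original = {"key1": "value1", "key2": "value2"}
pairs = map_to_key_value_pairs(original)
print(pairs)
```

The array form gives every entry the same fixed set of fields (`key`, `value`), which is what lets a free-form map fit a table schema.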

--------------------------------

### Clients Daily Dataset Reference

Source: https://github.com/mozilla/data-docs/blob/main/src/cookbooks/bigquery/accessing_desktop_data.md

Provides an introduction to the 'Clients Daily' derived dataset, which is built from raw ping data with transformations for easier analysis. Users are directed to a specific reference document for more information.

```markdown
See the [`clients_daily` reference](../../datasets/batch_view/clients_daily/reference.md) for more information.
```

--------------------------------

### Using the Data Catalog

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

Instructions on how to use the Data Catalog to find, understand, and access various datasets.

```markdown
[Using the Data Catalog](cookbooks/analysis/data_catalog.md)
```

--------------------------------

### Explicit JOIN Types in SQL

Source: https://github.com/mozilla/data-docs/blob/main/src/concepts/sql_style.md

Highlights the best practice of explicitly stating the `JOIN` type (e.g., `CROSS JOIN`) instead of relying on implicit joins, improving code clarity. Both examples use BigQuery Standard SQL: the first shows the explicit `CROSS JOIN`, and the second shows the implicit comma join to avoid.

```sql
-- BigQuery Standard SQL Syntax
SELECT
  submission_date,
  experiment.key AS experiment_id,
  experiment.value AS experiment_branch,
  count(*) AS count
FROM
  telemetry.clients_daily
CROSS JOIN
  UNNEST(experiments.key_value) AS experiment
WHERE
  submission_date > '2019-07-01'
  AND sample_id = '10'
GROUP BY
  submission_date,
  experiment_id,
  experiment_branch
```

```sql
-- BigQuery Standard SQL Syntax
SELECT
  submission_date,
  experiment.key AS experiment_id,
  experiment.value AS experiment_branch,
  count(*) AS count
FROM
  telemetry.clients_daily,
  UNNEST(experiments.key_value) AS experiment -- Implicit JOIN
WHERE
  submission_date > '2019-07-01'
  AND sample_id = '10'
GROUP BY
  1, 2, 3 -- Implicit grouping column names
```

--------------------------------

### Data Monitoring Introduction

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

An introduction to data monitoring practices and tools used at Mozilla.

```markdown
[Data Monitoring - Intro to Bigeye](cookbooks/data_monitoring/intro.md)
```

--------------------------------

### Get Main Crashes on Windows Over a Small Interval

Source: https://github.com/mozilla/data-docs/blob/main/src/datasets/obsolete/error_aggregates/reference.md

This SQL query sums `main_crashes` for Firefox release builds on Windows for a specific version ('58.0.2') within a one-day interval. It excludes rows that belong to experiments and groups the results by the window start time.

```sql
SELECT window_start as time, sum(main_crashes) AS main_crashes
FROM error_aggregates_v2
  WHERE application = 'Firefox'
  AND os_name = 'Windows_NT'
  AND channel = 'release'
  AND version = '58.0.2'
  AND window_start > timestamp '2018-02-21'
  AND window_end < timestamp '2018-02-22'
  AND experiment_id IS NULL
  AND experiment_branch IS NULL
GROUP BY window_start
```

--------------------------------

### JSON Schema Definition Example

Source: https://github.com/mozilla/data-docs/blob/main/src/tools/guiding_principles.md

This snippet illustrates how a JSON Schema is defined for a new document namespace and type within the Mozilla pipeline schemas repository. It specifies the namespace ('activity-stream') and document type ('impression-stats') with a version ('1').

```json
{
  "namespace": "activity-stream",
  "document_type": "impression-stats",
  "version": 1
}
```

--------------------------------

### Building and Deploying Containers to GCR with CircleCI

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

A tutorial on building container images and deploying them to Google Container Registry (GCR) using CircleCI for continuous integration and deployment.

```markdown
[Building and Deploying Containers to GCR with CircleCI](cookbooks/deploying-containers.md)
```

--------------------------------

### Example Query for Event Counts

Source: https://github.com/mozilla/data-docs/blob/main/src/datasets/bigquery/events_daily/reference.md

This SQL query retrieves daily event and client counts from the events_daily table. It utilizes the mozfun.event_analysis.extract_event_counts function to parse the event strings and joins with the event_types table to get event names. The query filters data for the last 28 days.

```sql
SELECT
  submission_date,
  category,
  event,
  COUNT(*) AS client_count,
  SUM(count) AS event_count
FROM
  `moz-fx-data-shared-prod`.fenix.events_daily
CROSS JOIN
  UNNEST(mozfun.event_analysis.extract_event_counts(events))
JOIN
  `moz-fx-data-shared-prod`.fenix.event_types
  USING (index)
WHERE
  submission_date >= DATE_SUB(current_date, INTERVAL 28 DAY)
GROUP BY
  submission_date,
  category,
  event
```

--------------------------------

### Firefox Profile Creation Commands

Source: https://github.com/mozilla/data-docs/blob/main/src/concepts/profile/profile_creation.md

Demonstrates command-line arguments for creating and managing Firefox profiles. The `--createprofile` argument creates a new profile, while `--profile` allows starting Firefox with a specified existing or new profile directory.

```bash
firefox --createprofile <profile_name>
firefox --profile /path/to/profile/directory
```

--------------------------------

### BigQuery Table Listing Example

Source: https://github.com/mozilla/data-docs/blob/main/src/concepts/pipeline/schemas.md

Demonstrates how to list BigQuery tables within a specific namespace using the 'bq ls' command, showing table details like schema ID, labels, partitioning, and clustering.

```bash
$ bq ls --max_results=3 moz-fx-data-shared-prod:org_mozilla_fenix_stable

       tableId        Type                   Labels                           Time Partitioning                 Clustered Fields
 ------------------- ------- --------------------------------------- ----------------------------------- -------------------------------
  activation_v1       TABLE   schema_id:glean_ping_1                  DAY (field: submission_timestamp)   normalized_channel, sample_id
                              schemas_build_id:202001230145_be1f11e
  baseline_v1         TABLE   schema_id:glean_ping_1                  DAY (field: submission_timestamp)   normalized_channel, sample_id
                              schemas_build_id:202001230145_be1f11e
  bookmarks_sync_v1   TABLE   schema_id:glean_ping_1                  DAY (field: submission_timestamp)   normalized_channel, sample_id
                              schemas_build_id:202001230145_be1f11e
```

--------------------------------

### Install Node.js Dependencies for Spell and Link Checking

Source: https://github.com/mozilla/data-docs/blob/main/README.md

Installs the necessary Node.js packages for spell checking (markdown-spellcheck) and link checking (markdown-link-check) by running `npm install` in the repository's root directory. This requires Node.js to be installed.

```bash
npm install
```

--------------------------------

### Example Query for Day 2-7 Activation by Product

Source: https://github.com/mozilla/data-docs/blob/main/src/datasets/non_desktop/day_2_7_activation/reference.md

This SQL query calculates the Day 2-7 activation metric, broken down by product. It aggregates new profiles and activated users from the firefox_nondesktop_day_2_7_activation table for a specific cohort date.

```sql
SELECT
  cohort_date,
  product,
  SUM(day_2_7_activated) as day_2_7_activated,
  SUM(new_profiles) as new_profiles,
  SAFE_DIVIDE(SUM(day_2_7_activated), SUM(new_profiles)) as day_2_7_activation
FROM
  mozdata.telemetry.firefox_nondesktop_day_2_7_activation
WHERE
  cohort_date = "2020-03-01"
GROUP BY 1,2
ORDER BY 1
```
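The aggregation in this query can be mirrored in plain Python to show what `SAFE_DIVIDE` contributes: it yields NULL instead of an error when the denominator is zero. The sketch below uses made-up rows purely for illustration; `safe_divide` and `activation_by_product` are hypothetical helpers.

```python
from collections import defaultdict


def safe_divide(numerator, denominator):
    """Like BigQuery's SAFE_DIVIDE: None instead of an error on division by zero."""
    if denominator == 0:
        return None
    return numerator / denominator


def activation_by_product(rows):
    """Sum activated users and new profiles per (cohort_date, product), then divide."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        key = (row["cohort_date"], row["product"])
        totals[key][0] += row["day_2_7_activated"]
        totals[key][1] += row["new_profiles"]
    return {key: safe_divide(a, n) for key, (a, n) in sorted(totals.items())}


# Made-up rows, not real data.
rows = [
    {"cohort_date": "2020-03-01", "product": "A", "day_2_7_activated": 30, "new_profiles": 100},
    {"cohort_date": "2020-03-01", "product": "A", "day_2_7_activated": 20, "new_profiles": 100},
    {"cohort_date": "2020-03-01", "product": "B", "day_2_7_activated": 0, "new_profiles": 0},
]
print(activation_by_product(rows))
```

Summing before dividing, as the query does, weights each row by its number of new profiles rather than averaging per-row rates.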

--------------------------------

### Query Successful Installs per Country Code

Source: https://github.com/mozilla/data-docs/blob/main/src/datasets/other/stub_installer/reference.md

This SQL query counts installs per normalized country code on a specific date from the `firefox_installer.install` BigQuery table, split by whether the install succeeded. It demonstrates how to access and analyze stub installer ping data.

```sql
SELECT normalized_country_code,
       succeeded,
       count(*)
FROM firefox_installer.install
WHERE DATE(submission_timestamp) = '2021-04-20'
GROUP BY normalized_country_code,
         succeeded
```

--------------------------------

### Introduction to Operational Monitoring

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

An overview of the systems and practices used for operational monitoring of Mozilla's services and infrastructure.

```markdown
[Introduction to Operational Monitoring](cookbooks/operational_monitoring.md)
```

--------------------------------

### SQL Date Conversion Examples

Source: https://github.com/mozilla/data-docs/blob/main/src/concepts/analysis_gotchas.md

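Provides SQL expressions for converting two awkwardly encoded date fields. The first turns `environment.profile.creation_date`, stored as a number of days since the Unix epoch, into a `DATE`. The second parses the `metadata.header.date` string into a `TIMESTAMP`, after normalizing a `GMT+00:00` suffix to `GMT`.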

```SQL
DATE_FROM_UNIX_DATE(SAFE_CAST(environment.profile.creation_date AS INT64))
```

```SQL
SAFE.PARSE_TIMESTAMP('%a, %d %b %Y %T %Z', REPLACE(metadata.header.date, 'GMT+00:00', 'GMT'))
```
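For readers who post-process exported rows outside BigQuery, the two conversions can be approximated in Python. This is a rough sketch under stated assumptions: both function names are hypothetical, returning `None` stands in for the NULL that the `SAFE` variants produce, and edge cases such as fractional inputs are handled more simply here than in BigQuery.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = date(1970, 1, 1)


def date_from_unix_date(days):
    """Approximate DATE_FROM_UNIX_DATE(SAFE_CAST(x AS INT64)); None if unusable."""
    try:
        return EPOCH + timedelta(days=int(days))
    except (TypeError, ValueError, OverflowError):
        return None


def parse_header_date(value):
    """Approximate SAFE.PARSE_TIMESTAMP('%a, %d %b %Y %T %Z', ...); None on failure."""
    try:
        cleaned = value.replace("GMT+00:00", "GMT")
        parsed = datetime.strptime(cleaned, "%a, %d %b %Y %H:%M:%S %Z")
    except (AttributeError, ValueError):
        return None
    # strptime returns a naive datetime here; the header is in GMT, so attach UTC.
    return parsed.replace(tzinfo=timezone.utc)


print(date_from_unix_date("18262"))
print(parse_header_date("Tue, 21 Apr 2020 12:34:56 GMT+00:00"))
```

Note that `strptime` matches day and month names using the current locale, so the sketch assumes an English or C locale.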

--------------------------------

### Google Cloud Platform Prototype Project Creation

Source: https://github.com/mozilla/data-docs/blob/main/src/cookbooks/gcp-projects.md

This section details the steps and benefits of creating a prototype GCP project. It covers provisioning service accounts for BigQuery and other GCP resources, writing and querying data in private BigQuery tables, making Docker images available via Google Container Registry, creating Google Cloud Storage buckets, Compute Instances, and Kubernetes clusters. It also touches on cost tracking and the advantages over traditional sandbox projects.

```APIDOC
Google Cloud Platform Prototype Project:
  Purpose: To provision a dedicated GCP project for data-intensive development and production deployment.
  Benefits:
    - Easy cost tracking of individual components.
    - Self-service administrative credentials with limited lifespan.
    - Ability to spin down projects and resources after use.
  Features:
    - Service Accounts: For BigQuery access and command-line/Docker container operations.
    - BigQuery: Private tables for writing and querying data without impacting production.
    - Google Container Registry: For hosting Docker images.
    - Google Cloud Storage: For temporary data storage.
    - Google Compute Instances: For testing software in the cloud.
    - Kubernetes Clusters: For testing scheduled jobs with telemetry-airflow.
    - Protosaur: For creating static dashboards.
  Request Process:
    - File a bug using the provided template.
  Support:
    - Contact Data Engineering contact for project creation and advice.
    - Get in touch with the data platform team for project necessity, contact, or budget queries.
  Tracking:
    - Projects are tracked on Confluence (requires Mozilla LDAP).
```

--------------------------------

### Get Crash Measures Across Platforms

Source: https://github.com/mozilla/data-docs/blob/main/src/datasets/obsolete/error_aggregates/reference.md

This SQL query retrieves various crash measures (usage hours, main crashes, content crashes, etc.) for Firefox across different operating systems and channels. It filters for specific build IDs, time windows, and excludes experiments. The results are grouped by window start, channel, build ID, version, and OS name.

```sql
SELECT window_start,
       build_id,
       channel,
       os_name,
       version,
       sum(usage_hours) AS usage_hours,
       sum(main_crashes) AS main,
       sum(content_crashes) AS content,
       sum(gpu_crashes) AS gpu,
       sum(plugin_crashes) AS plugin,
       sum(gmplugin_crashes) AS gmplugin
FROM error_aggregates_v2
  WHERE application = 'Firefox'
  AND (os_name = 'Darwin' or os_name = 'Linux' or os_name = 'Windows_NT')
  AND (channel = 'beta' or channel = 'release' or channel = 'nightly' or channel = 'esr')
  AND build_id > '201801'
  AND window_start > current_timestamp - (1 * interval '24' hour)
  AND experiment_id IS NULL
  AND experiment_branch IS NULL
GROUP BY window_start, channel, build_id, version, os_name
```

--------------------------------

### Day 2-7 Activation Dataset

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

Details the 'Day 2-7 Activation' dataset, likely tracking user activation within the first week of using a product. Useful for onboarding analysis.

```markdown
[Day 2-7 Activation](datasets/non_desktop/day_2_7_activation/reference.md)
```

--------------------------------

### SendPing Subroutine in NSIS

Source: https://github.com/mozilla/data-docs/blob/main/src/datasets/other/stub_installer/reference.md

This refers to the SendPing subroutine within the NSIS script of the stub installer. It is responsible for forming and sending the installer pings. The exact code is located in the mozilla-central repository.

```nsis
SendPing subroutine in stub.nsi
```

--------------------------------

### Retained Installer Tables Schema

Source: https://github.com/mozilla/data-docs/blob/main/src/datasets/non_desktop/google_play_store/reference.md

Defines the schema for the retained installer tables in the Google Play Store dataset. It includes fields like Date, Package_Name, Acquisition_Channel, Store_Listing_Visitors, Installers, and various retention rates.

```json
{
    "root": {
        "Date": "date",
        "Package_Name": "string",
        "Acquisition_channel | country | UTM_source_campaign": "string",
        "Store_Listing_Visitors": "integer",
        "Installers": "integer",
        "Visitor_to_installer_conversion_rate": "float",
        "installers_retained_for_1_day": "integer",
        "installers_to_1_day_retention_rate": "float",
        "installers_retained_for_7_days": "integer",
        "installers_to_7_days_retention_rate": "float",
        "installers_retained_for_15_days": "integer",
        "installers_to_15_days_retention_rate": "float",
        "installers_retained_for_30_days": "integer",
        "installers_to_30_days_retention_rate": "float"
    }
}
```
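The retention-rate fields can be related to the count fields with a small sketch like the one below. It assumes each rate is the retained-installer count divided by `Installers`; that relationship is inferred from the field names, not confirmed by the schema, and `retention_rate` is a hypothetical helper.

```python
from typing import Optional


def retention_rate(row: dict, days: int) -> Optional[float]:
    """Compute an N-day retention rate from one row of the table.

    Assumption: rate = installers_retained_for_N_day(s) / Installers.
    Returns None when there are no installers to avoid dividing by zero.
    """
    # The schema uses "_1_day" for one day and "_N_days" otherwise.
    suffix = "day" if days == 1 else "days"
    retained = row[f"installers_retained_for_{days}_{suffix}"]
    installers = row["Installers"]
    if installers == 0:
        return None
    return retained / installers
```

A result computed this way could be compared against the stored `installers_to_N_day(s)_retention_rate` value as a sanity check.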

--------------------------------

### Data Monitoring - Intro to Bigeye

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

An introduction to using Bigeye for data quality monitoring, covering its features and initial setup.

```markdown
[Data Monitoring - Intro to Bigeye](cookbooks/data_monitoring/intro.md)
```

--------------------------------

### Implementing Experiments

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

Guidelines for implementing experiments within Mozilla products, focusing on telemetry collection and analysis for A/B testing and feature validation.

```markdown
[Implementing Experiments](cookbooks/client_guidelines.md)
```

--------------------------------

### Guiding Principles for Data Infrastructure

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

The core principles and philosophies that guide the design, development, and maintenance of Mozilla's data infrastructure.

```markdown
[Guiding Principles for Data Infrastructure](tools/guiding_principles.md)
```

--------------------------------

### Profile Creation

Source: https://github.com/mozilla/data-docs/blob/main/src/SUMMARY.md

Explains the process of creating and initializing user profiles. Covers the data points collected during profile creation and their significance.

```markdown
[Profile Creation](concepts/profile/profile_creation.md)
```

--------------------------------

### Looker Explore Definition Example

Source: https://github.com/mozilla/data-docs/blob/main/src/cookbooks/data_modeling/where_to_store.md

An example of a Looker explore definition in LKML, specifically for 'pocket_tile_impressions', demonstrating how to define a subset of data for analysis.

```lkml
explore: pocket_tile_impressions {
  label: "Pocket Tile Impressions"
  from: pocket_tile_impressions
  join: user_devices {
    type: left_outer
    sql_on: ${pocket_tile_impressions.device_id} = ${user_devices.id} ;;
  }
  # ... other join and view definitions
}
```

--------------------------------

### Exposed Views Filtering Example

Source: https://github.com/mozilla/data-docs/blob/main/src/concepts/pipeline/filtering.md

Reference to a bigquery-etl view definition that filters rows at the exposed-view layer: the data stays in the stable table but is not exposed to users through the view.

```sql
-- See https://github.com/mozilla/bigquery-etl/blob/master/sql/moz-fx-data-shared-prod/telemetry/lockwise_mobile_events_v1/view.sql#L17
```

--------------------------------

### OpMon Preview Command

Source: https://github.com/mozilla/data-docs/blob/main/src/cookbooks/operational_monitoring.md

Provides a detailed breakdown of the `opmon preview` command, its options, and usage examples for generating data previews of OpMon projects. It covers project targeting, date ranges, configuration sources, and the output link to Looker dashboards.

```APIDOC
OpMon Preview Command:
  Usage: opmon preview [OPTIONS]
  Description: Create a preview for a specific project based on a subset of data.
  Options:
    --project_id, --project-id TEXT
      Project to write to
    --dataset_id, --dataset-id TEXT
      Temporary dataset to write to
    --derived_dataset_id, --derived-dataset-id TEXT
      Temporary derived dataset to write to
    --start_date, --start-date YYYY-MM-DD
      Date on which analysis of the project should start. Default: current date - 3 days
    --end_date, --end-date YYYY-MM-DD
      Date on which analysis of the project should stop. Default: current date
    --slug TEXT
      Experimenter or Normandy slug associated with the project to create a preview for [required]
    --config_file, --config-file PATH
      Custom local config file
    --config_repos, --config-repos TEXT
      URLs to public repos with configs
    --private_config_repos, --private-config-repos TEXT
      URLs to private repos with configs
    --help
      Show this message and exit.
  Example Usage:
    gcloud auth login --update-adc
    gcloud config set project mozdata
    opmon preview --slug=firefox-install-demo --config_file='/local/path/to/opmon/firefox-install-demo.toml'
  Output:
    Start running backfill for firefox-install-demo: 2022-12-17 to 2022-12-19
    Backfill 2022-12-17
    ...
    A preview is available at: https://mozilla.cloud.looker.com/dashboards/operational_monitoring::opmon_preview?Table='mozdata.tmp.firefox_install_demo_statistics'&Submission+Date=2022-12-17+to+2022-12-20
```
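For scripted use, the documented options can be assembled into an argument list, as in the minimal sketch below. `build_preview_command` and its `today` parameter are hypothetical and exist only for this example; the flags and the date defaults (current date minus 3 days, and current date) are taken from the option list above.

```python
from datetime import date, timedelta
from typing import Optional


def build_preview_command(
    slug: str,
    config_file: Optional[str] = None,
    start_date: Optional[date] = None,
    end_date: Optional[date] = None,
    today: Optional[date] = None,
) -> list[str]:
    """Build an `opmon preview` argument list (hypothetical helper).

    Dates fall back to the documented defaults, made explicit here:
    start = current date - 3 days, end = current date.
    """
    today = today or date.today()
    start = start_date or today - timedelta(days=3)
    end = end_date or today
    cmd = [
        "opmon",
        "preview",
        f"--slug={slug}",
        f"--start_date={start.isoformat()}",
        f"--end_date={end.isoformat()}",
    ]
    if config_file:
        cmd.append(f"--config_file={config_file}")
    return cmd
```

The resulting list could then be passed to something like `subprocess.run(cmd, check=True)` once `gcloud` authentication is set up as in the example usage above.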