### Setup Python Environment

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/enrichment/README.md

Navigates to the src directory and installs dependencies for the Python virtual environment.

```bash
cd src
source env.sh --install
```

--------------------------------

### Setup and Build Enrichment Agent

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/enrichment/README.md

Clone the repository, install dependencies, and build the enrichment agent project.

```bash
git clone https://github.com/googlecloudplatform/knowledge-catalog
cd knowledge-catalog/toolbox/enrichment
npm install

npm run build
```

--------------------------------

### Setup Demo Resources in BigQuery

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/enrichment/README.md

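Create a BigQuery dataset and table for the demo, populating the table with the public GA4 obfuscated sample e-commerce data. Replace the `<projectId>`, `<datasetId>`, and `<tableId>` placeholders with your own names. The heredoc delimiter is quoted so the shell does not treat the backtick-quoted table names as command substitution.

```bash
bq query --use_legacy_sql=false <<'EOF'
CREATE SCHEMA IF NOT EXISTS `<projectId>.<datasetId>`
OPTIONS (
  location = 'US',
  labels = [('usage', 'demo')]
);

-- Copy every daily events_* shard into one table partitioned by a DATE column.
CREATE TABLE IF NOT EXISTS `<projectId>.<datasetId>.<tableId>`
PARTITION BY event_date_dt
AS
SELECT
  *,
  PARSE_DATE('%Y%m%d', event_date) AS event_date_dt
FROM
  `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`;
EOF
```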

--------------------------------

### Set up Python Virtual Environment and Install Dependencies

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/discovery/README.md

Create a Python virtual environment and install the necessary dependencies for the discovery agent. This ensures a clean and isolated environment for the project.

```shell
python3 -m venv /tmp/kcsearch
source /tmp/kcsearch/bin/activate

cd samples/discovery
pip3 install -r requirements.txt
```

--------------------------------

### Catalog Manifest Configuration Example

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/docs/concept.md

This example demonstrates the structure of a catalog manifest file, including scope definitions, type aliases, and snapshot/publishing configurations for metadata.

```yaml
scope: <type>.<name>              # The resources in the snapshot. Examples:
                                  # scope: entryGroup.<projectId>.<locationId>.<entryGroupId>
                                  # scope: bq-dataset.<projectId>.<datasetId>
                                  # scope: kb.<projectId>.<locationId>.<entryGroupId>
                                  # Or multiple BigQuery datasets in array list format:
                                  # scope:
                                  #   - bq-dataset.<projectId>.<datasetId1>
                                  #   - bq-dataset.<projectId>.<datasetId2>


aliases:                          # Optional. Can always use 3-part fully qualified references.
                                  # NOTE: All built-in types have predefined simple aliases.
  ca-guidelines:
    aspect: data-agents-project.global.ca-guidelines
  ecommerce:
    glossary: data-gov-project.global.ecommerce-glossary

# Defines the specific metadata to be retrieved locally from Knowledge Catalog.
# NOTE: Required aspects of listed entry types are implicitly included.
snapshot:
  entries:
  - bigquery-dataset
  - bigquery-table
  aspects:
  - overview
  - descriptions
  - queries
  - ca-guidelines
  entryLinks:
  - definition

# Optional configuration to identify which types should be published.
# This must be a subset of the types specified in the snapshot configuration.
publishing:
  entries:
  aspects:
  - ca-guidelines
  entryLinks:
  - definition

```
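
The rule that `publishing` must be a subset of `snapshot` can be checked mechanically once the manifest is parsed. The sketch below assumes the YAML has already been loaded into plain dicts and lists (for example with a YAML library); `publishing_violations` is a hypothetical helper, not part of `kcmd`. It compares type names literally, so it does not account for aliases or for the required aspects that are implicitly included in the snapshot.

```python
from typing import Any

SECTIONS = ("entries", "aspects", "entryLinks")


def publishing_violations(manifest: dict[str, Any]) -> dict[str, list[str]]:
    """Return, per section, the publishing types that the snapshot does not list."""
    snapshot = manifest.get("snapshot") or {}
    publishing = manifest.get("publishing") or {}
    violations: dict[str, list[str]] = {}
    for section in SECTIONS:
        # An empty YAML key such as `entries:` loads as None; treat it as [].
        allowed = set(snapshot.get(section) or [])
        extra = [t for t in (publishing.get(section) or []) if t not in allowed]
        if extra:
            violations[section] = extra
    return violations


# The example manifest above, as it would look after YAML parsing.
manifest = {
    "snapshot": {
        "entries": ["bigquery-dataset", "bigquery-table"],
        "aspects": ["overview", "descriptions", "queries", "ca-guidelines"],
        "entryLinks": ["definition"],
    },
    "publishing": {
        "entries": None,
        "aspects": ["ca-guidelines"],
        "entryLinks": ["definition"],
    },
}
print(publishing_violations(manifest))  # {}
```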

--------------------------------

### Clone Repository and Install Dependencies

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md

Clone the knowledge-catalog repository and install npm dependencies for the mdcode tool.

```bash
git clone https://github.com/googlecloudplatform/knowledge-catalog
cd knowledge-catalog/toolbox/mdcode
npm install
```

--------------------------------

### Install kcmd Library

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md

Installs the kcmd library. Use this to programmatically interact with the Knowledge Catalog.

```bash
npm install kcmd
```

--------------------------------

### Setup BigQuery Demo Resources

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/demo/README.md

Creates a BigQuery dataset and table for the demo and generates a `catalog.yaml` manifest. Run `bun setup.ts` to create the resources, then print the generated manifest with `cat catalog.yaml`.

```bash
bun setup.ts
```

```bash
cat catalog.yaml
```

--------------------------------

### Install Dependencies

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/README.md

Installs the project dependencies, including development packages, using pip within a virtual environment.

```shell
python3.13 -m venv .venv
.venv/bin/pip install --index-url https://pypi.org/simple/ -e '.[dev]'
```

--------------------------------

### Setup OKF Wiki Demo

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/demo/README.md

Creates an empty Dataplex EntryGroup for an OKF Wiki and a catalog.yaml manifest. This prepares the environment for publishing the OKF Wiki bundle.

```bash
bun setup.ts
```

```bash
cat catalog.yaml
```

```bash
ls -R catalog
```

--------------------------------

### Minimal Example Bundle Structure

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/SPEC.md

Illustrates the directory structure for a minimal OKF bundle, including index files and nested resource directories.

```tree
my_bundle/
├── index.md
├── datasets/
│   ├── index.md
│   └── sales.md
└── tables/
    ├── index.md
    ├── orders.md
    └── customers.md
```
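
A bundle-relative path without the `.md` suffix makes a convenient concept ID; the Stack Overflow viewer bundle uses IDs of this shape, such as `datasets/stackoverflow`. The sketch below rebuilds the example tree in a temporary directory and lists those IDs. Skipping `index.md` files and the helper name `concept_ids` are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def concept_ids(bundle_root: Path) -> list[str]:
    """Return sorted bundle-relative IDs for every non-index Markdown document."""
    return sorted(
        path.relative_to(bundle_root).with_suffix("").as_posix()
        for path in bundle_root.rglob("*.md")
        if path.name != "index.md"  # assumption: index files are not concepts
    )


EXAMPLE_FILES = [
    "index.md",
    "datasets/index.md",
    "datasets/sales.md",
    "tables/index.md",
    "tables/orders.md",
    "tables/customers.md",
]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my_bundle"
    for rel in EXAMPLE_FILES:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")
    ids = concept_ids(root)

print(ids)  # ['datasets/sales', 'tables/customers', 'tables/orders']
```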

--------------------------------

### Example SQL Query Patterns

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/src/reference_agent/prompts/reference_instruction.md

Provides example SQL snippets to demonstrate common query patterns for a given data asset. These are typically included in the 'Common query patterns' section of an OKF document.

```sql
SELECT * FROM my_table LIMIT 10;
```

--------------------------------

### Clone the Knowledge Catalog Repository

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/discovery/README.md

Clone the official Github repository to get started with the Knowledge Catalog Discovery Agent.

```shell
git clone https://github.com/GoogleCloudPlatform/knowledge-catalog.git
```

--------------------------------

### Standard Layout Example

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/docs/concept.md

Illustrates the Standard Layout for bq-dataset and entryGroup scopes. Metadata is in YAML files, with unstructured aspects in Markdown sidecars.

```yaml
path/to/root/
├── catalog.yaml                      # Catalog metadata, processing config/directives
└── catalog/                          # Contains all the Entries and EntryLinks
    └── <dir1>/<dir2>
        ├── <entry-id1>.yaml          # Single-file entry with all metadata contained within
        ├── <entry-id2>.yaml          # Multi-file entry with unstructured metadata split into
        └── <entry-id2>.<aspect>.md   # markdown sidecar files for unstructured aspects
```
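
The Standard Layout ties a multi-file entry together by filename: `<entry-id>.yaml` holds the structured metadata and each `<entry-id>.<aspect>.md` sidecar holds one unstructured aspect. A minimal sketch of grouping a directory listing this way, assuming neither entry IDs nor aspect names contain dots; `group_entry_files` is a hypothetical helper.

```python
def group_entry_files(filenames: list[str]) -> dict[str, list[str]]:
    """Map each entry ID to the aspects of its `<entry-id>.<aspect>.md` sidecars."""
    entries: dict[str, list[str]] = {}
    for name in filenames:
        parts = name.split(".")
        if len(parts) == 2 and parts[1] == "yaml":
            entries.setdefault(parts[0], [])  # the entry's structured metadata
        elif len(parts) == 3 and parts[2] == "md":
            entries.setdefault(parts[0], []).append(parts[1])  # one aspect sidecar
    return {entry: sorted(aspects) for entry, aspects in entries.items()}


listing = ["orders.yaml", "orders.overview.md", "customers.yaml"]
print(group_entry_files(listing))  # {'orders': ['overview'], 'customers': []}
```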

--------------------------------

### Concept Bound to a Resource Example

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/SPEC.md

An example of a concept document describing a BigQuery table, including its schema and related resources.

```markdown
---
type: BigQuery Table
title: Customer Orders
description: One row per completed customer order across all channels.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, orders, revenue]
timestamp: 2026-05-28T14:30:00Z
---

# Schema

| Column        | Type      | Description                              |
|---------------|-----------|------------------------------------------|
| `order_id`    | STRING    | Globally unique order identifier.        |
| `customer_id` | STRING    | Foreign key into [customers](/tables/customers.md). |
| `total_usd`   | NUMERIC   | Order total in US dollars.               |
| `placed_at`   | TIMESTAMP | When the customer submitted the order.   |

# Joins

Joined with [customers](/tables/customers.md) on `customer_id`.

# Citations

[1] [BigQuery table schema](https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders)
```
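
Frontmatter and body can be separated with a few lines of code. This is a minimal sketch, not a YAML parser: it assumes flat `key: value` lines and inline `[a, b]` lists, which is all the examples here use. In these examples, a concept bound to a resource carries a `resource` field, while the playbook example that follows does not; `parse_frontmatter` is a hypothetical helper.

```python
def parse_frontmatter(text: str) -> tuple[dict[str, object], str]:
    """Split a document into (frontmatter fields, Markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        raise ValueError("frontmatter is not terminated by '---'") from None
    fields: dict[str, object] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            fields[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            fields[key.strip()] = value
    body = "\n".join(lines[end + 1 :]).lstrip("\n")
    return fields, body


doc = """---
type: BigQuery Table
title: Customer Orders
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, orders, revenue]
---

# Schema
"""
meta, body = parse_frontmatter(doc)
print(meta["tags"])        # ['sales', 'orders', 'revenue']
print("resource" in meta)  # True, so this concept is bound to a resource
```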

--------------------------------

### Documents Layout Example

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/docs/concept.md

Illustrates the Documents Layout for kb scopes. The entry is a single Markdown file with metadata in YAML frontmatter and the overview in the body.

```yaml
path/to/root/
├── catalog.yaml                      # Catalog metadata, processing config/directives
└── catalog/                          # Contains all the Entries and EntryLinks
    └── <dir1>/<dir2>
        └── <entry-id1>.md            # Single markdown file: structured metadata in
                                      # frontmatter, Overview in body
```

--------------------------------

### Concept Not Bound to a Resource Example

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/SPEC.md

An example of a concept document for a playbook, not directly tied to a specific resource URI.

```markdown
---
type: Playbook
title: Incident response — data freshness alert
description: Steps to triage a freshness alert on the orders pipeline.
tags: [oncall, incident]
timestamp: 2026-04-12T09:00:00Z
---

# Trigger

A freshness alert fires when `orders` lags more than 30 minutes behind
its expected SLA. See the [orders table](/tables/orders.md).

# Steps

1. Check the [ingestion job dashboard](https://example.com/dash).
2. …
```

--------------------------------

### Configure gcloud CLI

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/demo/README.md

Set up Application Default Credentials, then set the default compute region and project for the gcloud CLI. This assumes gcloud is installed and `DEMO_CLOUD_PROJECT` is set to your project ID.

```bash
gcloud auth application-default login
gcloud config set compute/region us-central1
gcloud config set project $DEMO_CLOUD_PROJECT
```

--------------------------------

### BigQuery Dataset Metadata Example

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/SPEC.md

Defines metadata for a BigQuery dataset using OKF frontmatter. Includes type, title, description, resource link, tags, and timestamp.

```markdown
---
type: BigQuery Dataset
title: Sales
description: All sales-related tables for the retail business.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales
tags: [sales]
timestamp: 2026-05-28T00:00:00Z
---

The sales dataset contains transactional tables, including
[orders](/tables/orders.md) and [customers](/tables/customers.md).
```

--------------------------------

### Metadata Directory Layout Example

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md

Illustrates the hierarchical organization of metadata artifacts within a directory structure for Knowledge Catalog. Includes the manifest file and nested metadata entries.

```yaml
path/to/root/
├── catalog.yaml                       # Manifest and config directives
└── catalog/                           # Contains the metadata snapshot
    └── <dir1>/
        ├── <entry-id1>.yaml           # Entry
        └── <dir2>/
            ├── <entry-id2>.yaml       # Entry with a markdown sidecar
            └── <entry-id2>.<aspect>.md # Sidecar file for one aspect
```

--------------------------------

### Example Cross-linking in Markdown

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/src/reference_agent/prompts/reference_instruction.md

Illustrates how to create file-relative links to other concepts within a markdown document. These links are essential for navigating between related documents in a bundle.

```markdown
Sibling table: `[users](users.md)`
Parent dataset from a table: `[dataset](../datasets/<slug>.md)`
Reference doc: `[event parameters](../references/event_parameters.md)`
```
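
Turning a link into the concept it points to is plain path arithmetic. The sketch below resolves file-relative targets against the linking document's directory. The SPEC examples also use root-relative links such as `/tables/customers.md`; treating a leading `/` as the bundle root is an assumption here, as are the helper name `resolve_link` and the choice to reject links that climb out of the bundle.

```python
import posixpath


def resolve_link(source_doc: str, target: str) -> str:
    """Resolve a link in `source_doc` (a bundle-relative path) to a concept ID."""
    if target.startswith("/"):
        path = target.lstrip("/")  # assumption: leading "/" means the bundle root
    else:
        path = posixpath.join(posixpath.dirname(source_doc), target)
    path = posixpath.normpath(path)
    if path == ".." or path.startswith("../"):
        raise ValueError(f"link leaves the bundle: {target!r}")
    return path[: -len(".md")] if path.endswith(".md") else path


print(resolve_link("tables/orders.md", "users.md"))              # tables/users
print(resolve_link("tables/orders.md", "../datasets/sales.md"))  # datasets/sales
print(resolve_link("datasets/sales.md", "/tables/orders.md"))    # tables/orders
```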

--------------------------------

### Example Citations in OKF Format

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/src/reference_agent/prompts/reference_instruction.md

Shows the Open Knowledge Format (OKF) for citing external sources or related documents. The resource URI of the concept is typically listed first.

```markdown
[1] [Source Title](https://example.com/...)
[2] [Another Source](https://example.com/...)
```
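
A small regular expression is enough to pull numbered citations out of a document. This sketch assumes each citation sits on its own line in exactly the `[n] [Title](url)` shape shown above and that URLs contain no spaces or closing parentheses; `parse_citations` is a hypothetical helper.

```python
import re

# `[n] [Title](url)`, one citation per line.
CITATION = re.compile(r"^\[(\d+)\]\s+\[([^\]]+)\]\(([^)\s]+)\)$")


def parse_citations(text: str) -> list[tuple[int, str, str]]:
    """Return (number, title, url) for each line that is a citation."""
    citations = []
    for line in text.splitlines():
        match = CITATION.match(line.strip())
        if match:
            number, title, url = match.groups()
            citations.append((int(number), title, url))
    return citations


section = """[1] [BigQuery table schema](https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders)
[2] [Another Source](https://example.com/docs)
"""
for number, title, url in parse_citations(section):
    print(number, title, url)
```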

--------------------------------

### BigQuery Table Metadata Example

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/SPEC.md

Defines metadata for a BigQuery table using OKF frontmatter and markdown for schema description. Includes type, title, description, resource link, tags, and timestamp.

```markdown
---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, orders]
timestamp: 2026-05-28T00:00:00Z
---

# Schema

| Column        | Type      | Description                  |
|---------------|-----------|------------------------------|
| `order_id`    | STRING    | Unique order identifier.     |
| `customer_id` | STRING    | FK to [customers](/tables/customers.md). |
| `total_usd`   | NUMERIC   | Order total in USD.          |

Part of the [sales dataset](/datasets/sales.md).
```
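
The schema section is an ordinary Markdown table, so it can be generated from column metadata. A minimal sketch with a hypothetical `schema_table` helper; it pads cells to equal widths like the example above and assumes descriptions contain no `|` characters, which would need escaping.

```python
def schema_table(columns: list[tuple[str, str, str]]) -> str:
    """Render (column, type, description) tuples as a padded Markdown table."""
    rows = [("Column", "Type", "Description")]
    rows += [(f"`{name}`", col_type, desc) for name, col_type, desc in columns]
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    lines = []
    for index, row in enumerate(rows):
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        lines.append("| " + " | ".join(cells) + " |")
        if index == 0:  # separator row goes right after the header
            lines.append("|" + "|".join("-" * (width + 2) for width in widths) + "|")
    return "\n".join(lines)


table = schema_table([
    ("order_id", "STRING", "Unique order identifier."),
    ("total_usd", "NUMERIC", "Order total in USD."),
])
print(table)
```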

--------------------------------

### Initialize Demo Environment

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/enrichment/README.md

Set the demo cloud project ID and configure gcloud for the demo.

```bash
export DEMO_CLOUD_PROJECT="<your-gcp-project-id>"

gcloud auth application-default login
gcloud config set project $DEMO_CLOUD_PROJECT
gcloud config set compute/region us
```

--------------------------------

### Clone Repository and Navigate to Sample Directory

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/enrichment/README.md

Clones the knowledge-catalog repository and changes the directory to the enrichment sample.

```bash
git clone https://github.com/googlecloudplatform/knowledge-catalog
cd knowledge-catalog/samples/enrichment
```

--------------------------------

### Initialize and Run Enrichment CLI

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/enrichment/README.md

Initialize a catalog snapshot, pull the latest data, and run the enrichment tool using the `kcmd` and `kcagent` CLIs.

```bash
# Initialize a new catalog snapshot for a bigquery dataset
kcmd init --bigquery-dataset <projectId>.<datasetId>

# Pull the latest catalog snapshot from the Knowledge Catalog service
kcmd pull

# Run the enrichment tool
kcagent enrich --catalog-path . --tools-path tools --prompt-path prompt.md
```

--------------------------------

### Retrieve Body of a Specific Tag Wiki

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/tables/posts_tag_wiki.md

Use this query to get the wiki content for a specific tag by its ID.

```sql
SELECT
    body
  FROM
    `bigquery-public-data.stackoverflow.posts_tag_wiki`
  WHERE
    id = 5046395
```

--------------------------------

### CLI: Initialize Catalog Snapshot for BigQuery Dataset

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md

Initializes a new catalog snapshot for a specified BigQuery dataset. Requires project and dataset IDs.

```bash
kcmd init --bigquery-dataset <projectId>.<datasetId>
```

--------------------------------

### CLI: Initialize Catalog Snapshot for Multiple BigQuery Datasets

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md

Initializes a new catalog snapshot for multiple BigQuery datasets by specifying each dataset with the --bigquery-dataset flag.

```bash
kcmd init --bigquery-dataset <projectId>.<datasetId1> --bigquery-dataset <projectId>.<datasetId2>
```

--------------------------------

### Download Metadata Snapshot

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/enrichment/README.md

Downloads an initial metadata snapshot for the enrichment process. Requires the KC_ENRICH_SAMPLE_PROJECT environment variable to be set.

```bash
python3 -m enrichment.download \
  --dir ../sample/metadata.initial \
  --dataset ${KC_ENRICH_SAMPLE_PROJECT}.kc_enrich_sample_data
```

--------------------------------

### CLI: Initialize Catalog Snapshot for Custom EntryGroup

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md

Initializes a catalog snapshot for a custom EntryGroup, requiring project, location, and entry group IDs.

```bash
kcmd init --entry-group <projectId>.<locationId>.<entryGroupId>
```
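
The manifest `scope` strings shown earlier (`bq-dataset.<projectId>.<datasetId>`, `entryGroup.<projectId>.<locationId>.<entryGroupId>`, and `kb.<projectId>.<locationId>.<entryGroupId>`) carry the same IDs that these `init` commands take. A minimal sketch of splitting a scope into its parts; `parse_scope` is hypothetical, and it assumes no ID contains a dot, so domain-scoped project IDs such as `example.com:my-project` would be rejected.

```python
SCOPE_PARTS = {
    "bq-dataset": ("projectId", "datasetId"),
    "entryGroup": ("projectId", "locationId", "entryGroupId"),
    "kb": ("projectId", "locationId", "entryGroupId"),
}


def parse_scope(scope: str) -> tuple[str, dict[str, str]]:
    """Split a manifest scope string into its type and named IDs."""
    scope_type, _, rest = scope.partition(".")
    if scope_type not in SCOPE_PARTS:
        raise ValueError(f"unknown scope type: {scope_type!r}")
    names = SCOPE_PARTS[scope_type]
    values = rest.split(".")  # assumption: IDs never contain dots
    if len(values) != len(names) or not all(values):
        raise ValueError(f"{scope_type} scope needs {len(names)} IDs: {scope!r}")
    return scope_type, dict(zip(names, values))


print(parse_scope("bq-dataset.my-project.sales"))
# ('bq-dataset', {'projectId': 'my-project', 'datasetId': 'sales'})
```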

--------------------------------

### Get Latest Tag Wiki Excerpts

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html

Retrieve the title, creation date, and last editor for the 10 most recently created tag wiki excerpts.

```sql
SELECT
  title,
  creation_date,
  last_editor_display_name
FROM
  `bigquery-public-data.stackoverflow.posts_tag_wiki_excerpt`
ORDER BY
  creation_date DESC
LIMIT 10
```

--------------------------------

### Identify Top Post Editors

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html

Finds the top 5 users who have made the most edits to posts, joining with the users table to get display names.

```sql
SELECT
 t2.display_name,
 COUNT(t1.id) AS edit_count
FROM
 `bigquery-public-data.stackoverflow.post_history` AS t1
INNER JOIN
 `bigquery-public-data.stackoverflow.users` AS t2
ON
 t1.user_id = t2.id
WHERE
 t1.post_history_type_id = 2 -- Assuming \'2\' means \'Post Edited\'
GROUP BY
 t2.display_name
ORDER BY
 edit_count DESC
LIMIT 5;
```

--------------------------------

### Initial Node Display

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/ga4/viz.html

Automatically displays the detail panel for the first node found in the bundle, prioritizing 'BigQuery' type nodes.

```javascript
// Auto-show the first node (a dataset if available, else first concept)
const initial = bundle.nodes.find((n) => n.data.type === "BigQuery" || n.data.type === "Dataset");
if (initial) {
  showDetail(initial.data.id);
} else if (bundle.elements.length > 0) {
  showDetail(bundle.elements[0].data.id);
}
```

--------------------------------

### CLI: Initialize Catalog Snapshot with Specific Types and Aspects

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md

Initializes a catalog snapshot for a BigQuery dataset, specifying the entry types and aspects to include.

```bash
kcmd init --bigquery-dataset <projectId>.<datasetId> \
  --entry bigquery-table --entry bigquery-view \
  --aspect overview --aspect descriptions
```

--------------------------------

### Get Questions with Accepted Answers and Answer Count

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html

Retrieves questions that have an accepted answer, ordered by the number of answers. Useful for identifying popular or well-answered questions.

```sql
SELECT
  id,
  title,
  accepted_answer_id,
  answer_count
FROM
  `bigquery-public-data.stackoverflow.posts_questions`
WHERE
  accepted_answer_id IS NOT NULL
ORDER BY
  answer_count DESC
LIMIT 5
```

--------------------------------

### Catalog Entry YAML File Structure

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md

Defines metadata for a specific catalog entry, including its ID, type, resource details, and schema. This is an example for a BigQuery table.

```yaml
id: products
type: bigquery-table

resource:
  name: projects/prod-data/datasets/ecommerce/tables/products
  displayName: Products Table
  description: All products in the catalog
  labels:
    env: prod
  createTime: 2026-04-23T00:44:03Z
  updateTime: 2026-04-23T00:44:03Z

schema:
  ...

contacts:
  ...
```
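
The `resource.name` of a BigQuery table follows the `projects/<project>/datasets/<dataset>/tables/<table>` pattern, as in the example above. A minimal sketch that splits such a name and rebuilds the `project.dataset.table` reference used in SQL; the helper name `split_table_name` is hypothetical, and other entry types would need their own patterns.

```python
import re

TABLE_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/datasets/(?P<dataset>[^/]+)/tables/(?P<table>[^/]+)$"
)


def split_table_name(name: str) -> dict[str, str]:
    """Split `projects/<p>/datasets/<d>/tables/<t>` into its three IDs."""
    match = TABLE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a BigQuery table resource name: {name!r}")
    return match.groupdict()


parts = split_table_name("projects/prod-data/datasets/ecommerce/tables/products")
print("{project}.{dataset}.{table}".format(**parts))  # prod-data.ecommerce.products
```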

--------------------------------

### Run the Knowledge Catalog Discovery Agent using ADK CLI

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/discovery/README.md

Execute the Knowledge Catalog Discovery agent using the ADK CLI. Ensure you provide the correct path to the parent directory containing the agent's source code.

```shell
adk run path/to/agent/parent/folder
```

--------------------------------

### Create Sample BigQuery Data

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/enrichment/README.md

Executes a Python script to create a sample BigQuery dataset for metadata curation.

```bash
python3 ../sample/data/create_data.py
```

--------------------------------

### Get Top 10 Scored Answers in a Date Range

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html

Retrieves the 10 highest-scored answers within a specified date range from the Stack Overflow posts_answers table.

```sql
SELECT
  id,
  score,
  creation_date,
  body
FROM
  `bigquery-public-data.stackoverflow.posts_answers`
WHERE
  creation_date >= '2023-01-01' AND creation_date < '2023-02-01'
ORDER BY
  score DESC
LIMIT 10
```

--------------------------------

### Generate Visualize HTML with Custom Output and Name

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/README.md

Generates an interactive HTML visualization of an OKF bundle, specifying a custom output path and a display name for the viewer header.

```bash
.venv/bin/python -m reference_agent visualize \
    --bundle ./bundles/crypto_bitcoin \
    --out /tmp/btc.html \
    --name "Bitcoin OKF"
```

--------------------------------

### Get Questions with Accepted Answers

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/tables/posts_questions.md

Retrieves questions that have an accepted answer, ordered by the number of answers. This snippet limits the results to the top 5 questions with accepted answers.

```sql
SELECT
  id,
  title,
  accepted_answer_id,
  answer_count
FROM
  `bigquery-public-data.stackoverflow.posts_questions`
WHERE
  accepted_answer_id IS NOT NULL
ORDER BY
  answer_count DESC
LIMIT 5
```

--------------------------------

### Standard Layout: Entry Resource Info, Source, and Aspects

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/docs/concept.md

YAML file structure for standard layout, including entry metadata, resource information, schema, aspects, and links.

```yaml
id: <id>                                # Entry metadata
type: <entryType>
resource:                               # Entry.EntrySource
  name: <entrySource.name>
  displayName: <entrySource.displayName>
  description: <entrySource.description>
  labels:
    key: value
  location: <entrySource.location>
  parent: <entry.parent>
  ancestors: <entrySource.ancestors>
  createTime: <entrySource.createTime>
  updateTime: <entrySource.updateTime>

schema:
  fields:
  - name: <schemaField.name>
    dataType: <schemaField.dataType>
    mode: <schemaField.mode>
    …
    links:                              # EntryLinks associated with Schema.path are inlined
      definition:                       # into a Schema field to leverage context of field
      - target: glossary.term           # specification

<aspect-type>:                          # In the general-case, each top-level field is an
  [aspect.data]                         # aspect. Nested field represents aspect.data

links:                                  # EntryLinks with this entry as source listed here
  <entryLink-type>:
  - target: <target-entry-reference>
    <aspect-type>:
      [aspect.data]
```
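
Reading this layout back mostly means separating the fixed keys from the aspects. The sketch below assumes the file has already been parsed into a dict, treats every top-level key other than `id`, `type`, `resource`, `schema`, and `links` as an aspect, and collects the EntryLinks inlined into schema fields. It also assumes each field records its name under `name`; both helpers and the sample entry data are hypothetical.

```python
from typing import Any

# Top-level keys with a fixed meaning in the Standard Layout; all others are aspects.
RESERVED_KEYS = {"id", "type", "resource", "schema", "links"}


def aspect_keys(entry: dict[str, Any]) -> list[str]:
    """Return the aspect types stored as top-level keys of a parsed entry file."""
    return sorted(key for key in entry if key not in RESERVED_KEYS)


def schema_links(entry: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (field name, link type, target) for EntryLinks inlined into schema fields."""
    found = []
    for field in (entry.get("schema") or {}).get("fields") or []:
        for link_type, targets in (field.get("links") or {}).items():
            for link in targets:
                found.append((field["name"], link_type, link["target"]))
    return found


entry = {
    "id": "orders",
    "type": "bigquery-table",
    "resource": {"name": "projects/acme/datasets/sales/tables/orders"},
    "schema": {
        "fields": [
            {"name": "customer_id", "dataType": "STRING",
             "links": {"definition": [{"target": "glossary.term"}]}},
            {"name": "total_usd", "dataType": "NUMERIC"},
        ]
    },
    "overview": {"text": "One row per completed customer order."},  # hypothetical aspect.data
    "links": {},
}
print(aspect_keys(entry))   # ['overview']
print(schema_links(entry))  # [('customer_id', 'definition', 'glossary.term')]
```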

--------------------------------

### Run Reference Agent with Bitcoin Dataset

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/samples/crypto_bitcoin/README.md

Execute the reference agent to enrich data from the crypto_bitcoin BigQuery dataset. This command specifies the source, dataset, seed file, and output directory.

```bash
.venv/bin/python -m reference_agent enrich \
    --source bq \
    --dataset bigquery-public-data.crypto_bitcoin \
    --web-seed-file samples/crypto_bitcoin/seeds.txt \
    --out ./bundles/crypto_bitcoin
```

--------------------------------

### Get Details for a Specific Stack Overflow Tag

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html

Fetches detailed information, including ID, name, count, and associated post IDs for excerpts and wikis, for a tag named 'python'.

```sql
SELECT
  id,
  tag_name,
  count,
  excerpt_post_id,
  wiki_post_id
FROM
  `bigquery-public-data.stackoverflow.tags`
WHERE
  tag_name = 'python'
```

--------------------------------

### Get Details for a Specific Tag

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/tables/tags.md

Fetch detailed information for a particular tag, such as 'python'. This query selects the tag's ID, name, count, and post IDs for its excerpt and wiki.

```sql
SELECT
    id,
    tag_name,
    count,
    excerpt_post_id,
    wiki_post_id
  FROM
    `bigquery-public-data.stackoverflow.tags`
  WHERE
    tag_name = 'python'
```

--------------------------------

### Auto-show Initial Node

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html

Automatically displays the details for the first node in the bundle, prioritizing 'BigQuery Dataset' type nodes.

```javascript
const initial = bundle.nodes.find((n) => n.data.type === "BigQuery Dataset") || bundle.nodes[0];
if (initial) showDetail(initial.data.id);
```

--------------------------------

### Get Top 10 Answers by Score in a Date Range

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/tables/posts_answers.md

Retrieves the 10 highest-scored answers within a specified month. Ensure the date range and table name are correct for your query.

```sql
SELECT
  id,
  score,
  creation_date,
  body
FROM
  `bigquery-public-data.stackoverflow.posts_answers`
WHERE
  creation_date >= '2023-01-01' AND creation_date < '2023-02-01'
ORDER BY
  score DESC
LIMIT 10
```

--------------------------------

### Get Total Transactions Per Day

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/crypto_bitcoin/viz.html

Calculates the total number of Bitcoin transactions for each day within a specified date range. Ensure the date range is correctly set for the desired period.

```sql
SELECT
 DATE(block_timestamp) AS transaction_date,
 COUNT(hash) AS transaction_count
FROM
 `bigquery-public-data.crypto_bitcoin.transactions`
WHERE
 block_timestamp >= '2023-01-01' AND block_timestamp < '2023-02-01'
GROUP BY
 transaction_date
ORDER BY
 transaction_date DESC;
```
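
With a TIMESTAMP column, a filter such as `BETWEEN '2023-01-01' AND '2023-01-31'` ends at midnight at the start of January 31, so almost all of that day is left out. A half-open range, `>= start AND < next month's start`, covers the whole month. A minimal sketch that computes those bounds; `month_bounds` is a hypothetical helper.

```python
from datetime import date


def month_bounds(year: int, month: int) -> tuple[str, str]:
    """Return ISO dates for the half-open range [first day, first day of next month)."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start.isoformat(), end.isoformat()


start, end = month_bounds(2023, 1)
print(f"block_timestamp >= '{start}' AND block_timestamp < '{end}'")
# block_timestamp >= '2023-01-01' AND block_timestamp < '2023-02-01'
```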

--------------------------------

### Generate Visualize HTML

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/README.md

Generates an interactive HTML visualization of an OKF bundle. The output file is named 'viz.html' and is placed within the bundle directory.

```bash
.venv/bin/python -m reference_agent visualize --bundle ./<name>
```

--------------------------------

### Run Enrichment with Fileset Source

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/enrichment/README.md

Execute the `kcagent enrich` command with the specified catalog path, tools path, and prompt file, utilizing the fileset source for enrichment.

```bash
../dist/kcagent enrich --catalog-path . --tools-path tools --prompt-path prompt.md
```

--------------------------------

### Count Total Events, Unique Users, and Days

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/enrichment/sample/docs/example1.md

Use this query to get a high-level overview of your dataset's scale. It counts all events, distinct users based on device cookies, and the number of unique days present.

```sql
SELECT 
  COUNT(*) AS total_events, 
  COUNT(DISTINCT user_pseudo_id) AS total_unique_users, 
  COUNT(DISTINCT event_date) AS total_days
FROM 
  `kc_enrich_sample_data.ga_events`
```

--------------------------------

### Join Moderator Nominations with User Data

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/tables/posts_moderator_nomination.md

Fetches the ID, creation date, nominator's display name, and body of moderator nominations, joining with the users table to get the nominator's name. Limits results to 10 nominations created on or after January 1, 2022.

```sql
SELECT
    p.id,
    p.creation_date,
    u.display_name AS nominator_name,
    p.body
  FROM
    `bigquery-public-data.stackoverflow.posts_moderator_nomination` AS p
  JOIN
    `bigquery-public-data.stackoverflow.users` AS u
    ON p.owner_user_id = u.id
  WHERE
    p.creation_date >= '2022-01-01'
  LIMIT 10
```

--------------------------------

### Build the mdcode Tool

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md

Build the mdcode tool using the provided npm script.

```bash
npm run build
```

--------------------------------

### Run Reference Agent

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/README.md

Runs the reference agent to enrich a BigQuery dataset's metadata with web crawl data and write the result as an OKF bundle. Specify the dataset, the web seed file, and the output directory.

```shell
.venv/bin/python -m reference_agent enrich \
    --source bq \
    --dataset <project>.<dataset> \
    --web-seed-file <path/to/seeds.txt> \
    --out ./bundles/<name>
```

--------------------------------

### CatalogManifest Library Methods

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/docs/design.md

Methods for initializing and loading CatalogManifest objects.

```APIDOC
## CatalogManifest

### Static Methods

- `static initWithEntryGroup(entryGroup: string, ctx: ApiContext): CatalogManifest`
- `static initWithBigQuery(dataset: string, ctx: ApiContext): CatalogManifest`
- `static initWithKB(kb: string, ctx: ApiContext): CatalogManifest`
- `static load(path: string, ctx: ApiContext): Promise<CatalogManifest>`

### Instance Methods

- `save(path: string): void`
```

--------------------------------

### Initialize and Render Cytoscape Graph

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html

Initializes Cytoscape.js with the bundle's graph elements, a default node and edge style, and a single-row grid layout.

```javascript
const cy = cytoscape({
    container: document.getElementById('cy'),
    elements: bundle.elements,
    style: [
        {
            selector: 'node',
            style: {
                'background-color': '#6FB1FC',
                'color': '#fff',
                'label': 'data(id)'
            }
        },
        {
            selector: 'edge',
            style: {
                'curve-style': 'bezier',
                'target-arrow-color': '#fff',
                'target-arrow-shape': 'triangle',
                'line-color': '#fff',
                'width': 1
            }
        }
    ],
    layout: {
        name: 'grid',
        rows: 1
    }
});
```
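Cytoscape.js cannot create an edge whose `source` or `target` does not name an existing node, so a bundle with a dangling edge can render incompletely. A minimal pre-flight sketch in Python, assuming the bundle follows Cytoscape's element JSON (`{"nodes": [...], "edges": [...]}`, each entry wrapping a `data` object). The `edges` key is an assumption, since the Stack Overflow bundle excerpt only shows `nodes`, and `dangling_edges` is a hypothetical helper.

```python
def dangling_edges(bundle: dict) -> list[str]:
    """Return ids of edges whose source or target is not a known node id.

    Assumes Cytoscape element JSON; a missing "edges" key counts as no edges.
    """
    node_ids = {node["data"]["id"] for node in bundle.get("nodes", [])}
    bad = []
    for edge in bundle.get("edges", []):
        data = edge["data"]
        if data.get("source") not in node_ids or data.get("target") not in node_ids:
            # Edges may omit an id; describe them by their endpoints instead.
            bad.append(data.get("id", f"{data.get('source')}->{data.get('target')}"))
    return bad


if __name__ == "__main__":
    # Hypothetical bundle: node ids taken from the Stack Overflow excerpt,
    # edges invented for illustration.
    sample = {
        "nodes": [
            {"data": {"id": "datasets/stackoverflow"}},
            {"data": {"id": "references/badge_classes"}},
        ],
        "edges": [
            {"data": {"id": "e1", "source": "datasets/stackoverflow",
                      "target": "references/badge_classes"}},
            {"data": {"id": "e2", "source": "datasets/stackoverflow",
                      "target": "references/missing"}},
        ],
    }
    print(dangling_edges(sample))  # ['e2']
```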

--------------------------------

### Initialize Bundle Viewer

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html

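Defines the globals `BUNDLE_NAME` and `BUNDLE` for the Stack Overflow bundle, which the viewer reads to populate the page. Each entry in `BUNDLE.nodes` wraps a `data` object with the node's `id`, `label`, `type`, `description`, `resource` URL, `tags`, and display `color` and `size`. The literal below is an excerpt and is cut off.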

```javascript
window.BUNDLE_NAME = "stackoverflow";
window.BUNDLE = {"nodes": [{"data": {"id": "datasets/stackoverflow", "label": "Stack Overflow Public Dataset", "type": "BigQuery Dataset", "description": "The Stack Overflow public dataset contains a variety of tables related to Stack Overflow user activity, posts, and tags. This dataset is no longer actively updated.", "resource": "https://bigquery.googleapis.com/v2/projects/bigquery-public-data/datasets/stackoverflow", "tags": ["Stack Overflow, public data, community, Q&A"], "color": "#8b5cf6", "size": 41}}, {"data": {"id": "references/badge_classes", "label": "Badge Classes", "type": "Reference", "description": "Enumerated classes for badges awarded on Stack Exchange sites.", "resource": "https://meta.stackexchange.com/questions/2677/database-schema-documentation-for-the-public-data-dump-and-sede", "tags": ["badges", "classes", "enum", "stackoverflow"], "color": "#10b981", "size": 31}}, {"data": {"id": "references/close_as_off_topic_reason_types", "label": "Close As Off-Topic Reason Types", "type": "Reference", "description": "Defines the types and guidance for reasons why a post might be closed as off-topic.", "resource": "https://meta.stackexchange.com/questions/2677/database-schema-documentation-for-the-public-data-dump-and-sede", "tags": ["close reasons", "off-topic", "enum", "schema", "data dump"], "color": "#10b981", "size": 37}}, {"data": {"id": "references/close_reason_types", "label": "Close Reason Types", "type": "Reference", "description": "Enumerated types for reasons why a post"
```
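Bundle pages embed these globals as an inline script. A minimal sketch of generating that script from a bundle dict in Python; `render_bundle_script` is a hypothetical helper, not the project's own generator. `json.dumps` output (ASCII-escaped by default) is a valid JavaScript literal, and escaping `</` keeps a value such as `</script>` from ending the inline script early.

```python
import json


def render_bundle_script(name: str, bundle: dict) -> str:
    """Return JavaScript that assigns window.BUNDLE_NAME and window.BUNDLE."""

    def literal(value) -> str:
        # "<\/" is a valid escape in both JSON and JavaScript strings and
        # cannot close an enclosing <script> tag.
        return json.dumps(value).replace("</", "<\\/")

    return (
        f"window.BUNDLE_NAME = {literal(name)};\n"
        f"window.BUNDLE = {literal(bundle)};\n"
    )


if __name__ == "__main__":
    bundle = {"nodes": [{"data": {"id": "datasets/stackoverflow"}}]}
    print(render_bundle_script("stackoverflow", bundle))
```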

--------------------------------

### kcmd Library: Load Catalog Snapshot

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md

Loads an existing catalog snapshot from the filesystem.

```typescript
// Loading a catalog snapshot from the filesystem
const snapshot = kcmd.CatalogSnapshot.fromPath('/path/to/root');
```

--------------------------------

### Clean Up Demo Resources

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/enrichment/README.md

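Remove the demo BigQuery dataset and all of its tables once you are done. The `-r` flag deletes the tables along with the dataset, and `bq rm` prompts for confirmation unless you also pass `-f`.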

```bash
bq rm -r "${DEMO_CLOUD_PROJECT}:demo_dataset"
```
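BigQuery dataset IDs may contain only letters, digits, and underscores, up to 1,024 characters, so hyphens are not allowed. A small local check written from those naming rules (it does not call BigQuery) can catch a bad name before running `bq` commands; `is_valid_dataset_id` is a hypothetical helper.

```python
import re

# BigQuery dataset IDs: letters (either case), digits, and underscores only,
# at most 1,024 characters.
_DATASET_ID = re.compile(r"[A-Za-z0-9_]{1,1024}")


def is_valid_dataset_id(dataset_id: str) -> bool:
    """Return True if dataset_id uses only characters BigQuery allows."""
    return _DATASET_ID.fullmatch(dataset_id) is not None


if __name__ == "__main__":
    for candidate in ("demo_dataset", "demo-dataset"):
        print(candidate, is_valid_dataset_id(candidate))
```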

--------------------------------

### Configure Google Cloud Project and Authentication

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/enrichment/README.md

Sets the `CLOUD_PROJECT` environment variable, signs in with Application Default Credentials, makes the project the gcloud default, and sets it as the quota project for Application Default Credentials.

```bash
export CLOUD_PROJECT=<cloud-project-id>

gcloud auth application-default login
gcloud config set core/project $CLOUD_PROJECT
gcloud auth application-default set-quota-project $CLOUD_PROJECT
```

--------------------------------

### Configure FilesKB MCP Server in Agent

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/enrichment/src/tools/fileskb/README.md

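This JSON configuration registers the FilesKB MCP server with the enrichment agent. `command` is the Python interpreter from the sample's virtual environment, and `args` runs the server's `main.py` with `--dir` set to the directory of documents to serve. The paths are absolute; adjust them to point at your own checkout of the enrichment sample.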

```json
{
  "mcpServers": {
    "fileskb": {
      "command": "/path/to/enrichment/src/.venv/bin/python3",
      "args": [
        "/path/to/enrichment/src/tools/fileskb/main.py",
        "--dir",
        "/path/to/enrichment/demo/docs"
      ]
    }
  }
}
```
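The same entry can be generated for any checkout location instead of hand-editing paths. A minimal sketch, assuming the directory layout implied by the paths above (`src/.venv/bin/python3` and `src/tools/fileskb/main.py` under the enrichment directory); `fileskb_mcp_config` is a hypothetical helper. It emits absolute paths because the agent may launch the server from a different working directory.

```python
import json
import os


def fileskb_mcp_config(enrichment_dir: str, docs_dir: str) -> dict:
    """Build an mcpServers entry that runs FilesKB over docs_dir."""
    root = os.path.abspath(enrichment_dir)
    return {
        "mcpServers": {
            "fileskb": {
                # Interpreter from the sample's virtual environment.
                "command": os.path.join(root, "src", ".venv", "bin", "python3"),
                "args": [
                    os.path.join(root, "src", "tools", "fileskb", "main.py"),
                    "--dir",
                    os.path.abspath(docs_dir),
                ],
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical checkout location; replace with your own.
    config = fileskb_mcp_config("/opt/kc/enrichment", "/opt/kc/enrichment/demo/docs")
    print(json.dumps(config, indent=2))
```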

--------------------------------

### Authenticate to Google Cloud

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/samples/ga4_merch_store/README.md

Signs in with Application Default Credentials and sets the default gcloud project used for BigQuery access. BigQuery bills query bytes to the caller's project, so set it to the project you want billed.

```bash
gcloud auth application-default login
gcloud config set project <your-billing-project>
```

--------------------------------

### Auto-show Initial Bitcoin Concept

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/crypto_bitcoin/viz.html

On load, shows the details of the first node whose type is `BigQuery Dataset`. If the bundle has no dataset node, it falls back to the first node; if the bundle has no nodes at all, nothing is shown.

```javascript
// Auto-show the first node (a dataset if available, else first concept)
const initial = bundle.nodes.find((n) => n.data.type === "BigQuery Dataset") || bundle.nodes[0];
if (initial) showDetail(initial.data.id);
```

--------------------------------

### Set Environment Variables for Google Cloud and Vertex AI

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/discovery/README.md

Set the Google Cloud project and enable Vertex AI as the model backend for the discovery agent. Replace `<PROJECT_ID>` with your consumer project ID.

```shell
# Replace <PROJECT_ID> with your consumer project ID.
export GOOGLE_CLOUD_PROJECT=<PROJECT_ID>
export GOOGLE_GENAI_USE_VERTEXAI=True
```
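Google's Gen AI SDK for Python (`google-genai`) reads `GOOGLE_GENAI_USE_VERTEXAI` and `GOOGLE_CLOUD_PROJECT` when a client is created without explicit arguments. Assuming the discovery agent goes through that SDK, a quick preflight check can catch an unreplaced placeholder before the agent starts. A minimal sketch: `vertex_settings` is a hypothetical helper, and treating `true` or `1` (any case) as enabled is an assumption about how the flag is parsed.

```python
import os
from collections.abc import Mapping


def vertex_settings(env: Mapping[str, str] = os.environ) -> tuple[str, bool]:
    """Return (project, use_vertexai) from the discovery agent's env vars."""
    project = env.get("GOOGLE_CLOUD_PROJECT", "").strip()
    if not project or project == "<PROJECT_ID>":
        raise RuntimeError("Set GOOGLE_CLOUD_PROJECT to your consumer project ID")
    # Assumed parsing rule: "true" or "1", case-insensitive, enables Vertex AI.
    flag = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip().lower()
    return project, flag in {"true", "1"}


if __name__ == "__main__":
    print(vertex_settings())
```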