### Setup Python Environment Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/enrichment/README.md Navigates to the src directory and installs dependencies for the Python virtual environment. ```bash cd src source env.sh --install ``` -------------------------------- ### Setup and Build Enrichment Agent Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/enrichment/README.md Clone the repository, install dependencies, and build the enrichment agent project. ```bash git clone https://github.com/googlecloudplatform/knowledge-catalog cd toolbox/enrichment npm install npm run build ``` -------------------------------- ### Setup Demo Resources in BigQuery Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/enrichment/README.md Create a BigQuery dataset and table for the demo, populating it with sample e-commerce data. ```sql bq query --use_legacy_sql=false <. # The resources in the snapshot. Examples: # scope: entryGroup... # scope: bq-dataset.. # scope: kb... # Or multiple BigQuery datasets in array list format: # scope: # - bq-dataset.. # - bq-dataset.. aliases: # Optional. Can always use 3-part fully qualified references. # NOTE: All built-in types have predefined simple aliases. ca-guidelines: aspect: data-agents-project.global.ca-guidelines ecommerce: glossary: data-gov-project.global.ecommerce-glossary # Defines the specific metadata to be retrieved locally from Knowledge Catalog. # NOTE: Required aspects of listed entry types are implicitly included. snapshot: entries: - bigquery-dataset - bigquery-table aspects: - overview - descriptions - queries - ca-guidelines entryLinks: - definition # Optional configuration to identify which types should be published. # This must be a subset of the types specified in the snapshot configuration. 
publishing: entries: aspects: - ca-guidelines entryLinks: - definition ``` -------------------------------- ### Clone Repository and Install Dependencies Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md Clone the knowledge-catalog repository and install npm dependencies for the mdcode tool. ```bash git clone https://github.com/googlecloudplatform/knowledge-catalog cd toolbox/mdcode npm install ``` -------------------------------- ### Initialize kcmd Library Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md Installs the kcmd library. Use this to programmatically interact with the Knowledge Catalog. ```bash npm install kcmd ``` -------------------------------- ### Setup BigQuery Demo Resources Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/demo/README.md Creates a BigQuery dataset and table for the demo, and generates a catalog.yaml manifest. Run this command to set up the necessary BigQuery resources. ```bash bun setup.ts ``` ```bash cat catalog.yaml ``` -------------------------------- ### Install Dependencies Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/README.md Installs the project dependencies, including development packages, using pip within a virtual environment. ```shell python3.13 -m venv .venv .venv/bin/pip install --index-url https://pypi.org/simple/ -e .[dev] ``` -------------------------------- ### Setup OKF Wiki Demo Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/demo/README.md Creates an empty Dataplex EntryGroup for an OKF Wiki and a catalog.yaml manifest. This prepares the environment for publishing the OKF Wiki bundle. 
```bash
bun setup.ts
```

```bash
cat catalog.yaml
```

```bash
ls -R catalog
```

--------------------------------

### Minimal Example Bundle Structure

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/SPEC.md

Illustrates the directory structure for a minimal OKF bundle, including index files and nested resource directories.

```tree
my_bundle/
├── index.md
├── datasets/
│   ├── index.md
│   └── sales.md
└── tables/
    ├── index.md
    ├── orders.md
    └── customers.md
```

--------------------------------

### Example SQL Query Patterns

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/src/reference_agent/prompts/reference_instruction.md

Provides example SQL snippets to demonstrate common query patterns for a given data asset. These are typically included in the 'Common query patterns' section of an OKF document.

```sql
SELECT * FROM my_table LIMIT 10;
```

--------------------------------

### Clone the Knowledge Catalog Repository

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/discovery/README.md

Clone the official GitHub repository to get started with the Knowledge Catalog Discovery Agent.

```shell
git clone https://github.com/GoogleCloudPlatform/knowledge-catalog.git
```

--------------------------------

### Standard Layout Example

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/docs/concept.md

Illustrates the Standard Layout for bq-dataset and entryGroup scopes. Metadata is in YAML files, with unstructured aspects in Markdown sidecars.
```yaml path/to/root/ ├── catalog.yaml # Catalog metadata, processing config/directives └── catalog/ # Contains all the Entries and EntryLinks └── / ├── .yaml # Single-file entry with all metadata contained within ├── .yaml # Multi-file entry with unstructured metadata split into └── ..md # markdown sidecar files for unstructured aspects ``` -------------------------------- ### Concept Bound to a Resource Example Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/SPEC.md An example of a concept document describing a BigQuery table, including its schema and related resources. ```markdown --- type: BigQuery Table title: Customer Orders description: One row per completed customer order across all channels. resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders tags: [sales, orders, revenue] timestamp: 2026-05-28T14:30:00Z --- # Schema | Column | Type | Description | |---------------|-----------|------------------------------------------| | `order_id` | STRING | Globally unique order identifier. | | `customer_id` | STRING | Foreign key into [customers](/tables/customers.md). | | `total_usd` | NUMERIC | Order total in US dollars. | | `placed_at` | TIMESTAMP | When the customer submitted the order. | # Joins Joined with [customers](/tables/customers.md) on `customer_id`. # Citations [1] [BigQuery table schema](https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders) ``` -------------------------------- ### Documents Layout Example Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/docs/concept.md Illustrates the Documents Layout for kb scopes. The entry is a single Markdown file with metadata in YAML frontmatter and the overview in the body. 
```yaml path/to/root/ ├── catalog.yaml # Catalog metadata, processing config/directives └── catalog/ # Contains all the Entries and EntryLinks └── / └── .md # Single markdown file: structured metadata in # frontmatter, Overview in body ``` -------------------------------- ### Concept Not Bound to a Resource Example Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/SPEC.md An example of a concept document for a playbook, not directly tied to a specific resource URI. ```markdown --- type: Playbook title: Incident response — data freshness alert description: Steps to triage a freshness alert on the orders pipeline. tags: [oncall, incident] timestamp: 2026-04-12T09:00:00Z --- # Trigger A freshness alert fires when `orders` lags more than 30 minutes behind its expected SLA. See the [orders table](/tables/orders.md). # Steps 1. Check the [ingestion job dashboard](https://example.com/dash). 2. … ``` -------------------------------- ### Configure gcloud CLI Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/demo/README.md Log in to your Google Cloud account and set the default compute region and project for the gcloud CLI. Ensure gcloud is installed and configured. ```bash gcloud auth application-default login gcloud config set compute/region us-central1 gcloud config set project $DEMO_CLOUD_PROJECT ``` -------------------------------- ### BigQuery Dataset Metadata Example Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/SPEC.md Defines metadata for a BigQuery dataset using OKF frontmatter. Includes type, title, description, resource link, tags, and timestamp. ```markdown --- type: BigQuery Dataset title: Sales description: All sales-related tables for the retail business. 
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales tags: [sales] timestamp: 2026-05-28T00:00:00Z --- The sales dataset contains transactional tables, including [orders](/tables/orders.md) and [customers](/tables/customers.md). ``` -------------------------------- ### Metadata Directory Layout Example Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md Illustrates the hierarchical organization of metadata artifacts within a directory structure for Knowledge Catalog. Includes the manifest file and nested metadata entries. ```yaml path/to/root/ ├── catalog.yaml # Manifest and config directives └── catalog/ # Contains the metadata snapshot └── / └── .yaml # Entry └── / ├── .yaml # Entry with sidecar markdown └── .aspect.md # files ``` -------------------------------- ### Example Cross-linking in Markdown Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/src/reference_agent/prompts/reference_instruction.md Illustrates how to create file-relative links to other concepts within a markdown document. These links are essential for navigating between related documents in a bundle. ```markdown Sibling table: `[users](users.md)` Parent dataset from a table: `[dataset](../datasets/.md)` Reference doc: `[event parameters](../references/event_parameters.md)` ``` -------------------------------- ### Example Citations in OKF Format Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/src/reference_agent/prompts/reference_instruction.md Shows the Open Knowledge Format (OKF) for citing external sources or related documents. The resource URI of the concept is typically listed first. ```markdown [1] [Source Title](https://example.com/...) [2] [Another Source](https://example.com/...) 
``` -------------------------------- ### BigQuery Table Metadata Example Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/SPEC.md Defines metadata for a BigQuery table using OKF frontmatter and markdown for schema description. Includes type, title, description, resource link, tags, and timestamp. ```markdown --- type: BigQuery Table title: Orders description: One row per completed customer order. resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders tags: [sales, orders] timestamp: 2026-05-28T00:00:00Z --- # Schema | Column | Type | Description | |---------------|-----------|------------------------------| | `order_id` | STRING | Unique order identifier. | | `customer_id` | STRING | FK to [customers](/tables/customers.md). | | `total_usd` | NUMERIC | Order total in USD. | Part of the [sales dataset](/datasets/sales.md). ``` -------------------------------- ### Initialize Demo Environment Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/enrichment/README.md Set the demo cloud project ID and configure gcloud for the demo. ```bash export DEMO_CLOUD_PROJECT="" gcloud auth application-default login gcloud config set project $DEMO_CLOUD_PROJECT gcloud config set compute/region us ``` -------------------------------- ### Clone Repository and Navigate to Sample Directory Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/enrichment/README.md Clones the knowledge-catalog repository and changes the directory to the enrichment sample. ```bash git clone https://github.com/googlecloudplatform/knowledge-catalog cd samples/enrichment ``` -------------------------------- ### Initialize and Run Enrichment CLI Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/enrichment/README.md Initialize a catalog snapshot, pull the latest data, and run the enrichment tool using the `kcmd` and `kcagent` CLIs. 
```bash
# Initialize a new catalog snapshot for a BigQuery dataset
kcmd init --bigquery-dataset .

# Pull the latest catalog snapshot from the Knowledge Catalog service
kcmd pull

# Run the enrichment tool
kcagent enrich --catalog-path . --tools-path tools --prompt-path prompt.md
```

--------------------------------

### Retrieve Body of a Specific Tag Wiki

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/tables/posts_tag_wiki.md

Use this query to get the wiki content for a specific tag by its ID.

```sql
SELECT body
FROM `bigquery-public-data.stackoverflow.posts_tag_wiki`
WHERE id = 5046395
```

--------------------------------

### CLI: Initialize Catalog Snapshot for BigQuery Dataset

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md

Initializes a new catalog snapshot for a specified BigQuery dataset. Requires project and dataset IDs.

```bash
kcmd init --bigquery-dataset .
```

--------------------------------

### CLI: Initialize Catalog Snapshot for Multiple BigQuery Datasets

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md

Initializes a new catalog snapshot for multiple BigQuery datasets by specifying each dataset with the --bigquery-dataset flag.

```bash
kcmd init --bigquery-dataset . --bigquery-dataset .
```

--------------------------------

### Download Metadata Snapshot

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/enrichment/README.md

Downloads an initial metadata snapshot for the enrichment process. Requires the KC_ENRICH_SAMPLE_PROJECT environment variable to be set.
```bash
python3 -m enrichment.download \
  --dir ../sample/metadata.initial \
  --dataset ${KC_ENRICH_SAMPLE_PROJECT}.kc_enrich_sample_data
```

--------------------------------

### CLI: Initialize Catalog Snapshot for Custom EntryGroup

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md

Initializes a catalog snapshot for a custom EntryGroup, requiring project, location, and entry group IDs.

```bash
kcmd init --entry-group ..
```

--------------------------------

### Get Latest Tag Wiki Excerpts

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html

Retrieve the title, creation date, and last editor for the 10 most recently created tag wiki excerpts.

```sql
SELECT
  title,
  creation_date,
  last_editor_display_name
FROM `bigquery-public-data.stackoverflow.posts_tag_wiki_excerpt`
ORDER BY creation_date DESC
LIMIT 10
```

--------------------------------

### Identify Top Post Editors

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html

Finds the top 5 users who have made the most edits to posts, joining with the users table to get display names.

```sql
SELECT
  t2.display_name,
  COUNT(t1.id) AS edit_count
FROM `bigquery-public-data.stackoverflow.post_history` AS t1
INNER JOIN `bigquery-public-data.stackoverflow.users` AS t2
  ON t1.user_id = t2.id
WHERE t1.post_history_type_id = 2 -- Assuming '2' means 'Post Edited'
GROUP BY t2.display_name
ORDER BY edit_count DESC
LIMIT 5;
```

--------------------------------

### Initial Node Display

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/ga4/viz.html

Automatically displays the detail panel for the first node found in the bundle, prioritizing 'BigQuery' type nodes.
```javascript // Auto-show the first node (a dataset if available, else first concept) const initial = bundle.nodes.find((n) => n.data.type === "BigQuery" || n.data.type === "Dataset"); if (initial) { showDetail(initial.data.id); } else if (bundle.elements.length > 0) { showDetail(bundle.elements[0].data.id); } ``` -------------------------------- ### CLI: Initialize Catalog Snapshot with Specific Types and Aspects Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md Initializes a catalog snapshot for a BigQuery dataset, specifying the entry types and aspects to include. ```bash kcmd init --bigquery-dataset . \ --entry bigquery-table --entry bigquery-view \ --aspect overview --aspect description ``` -------------------------------- ### Get Questions with Accepted Answers and Answer Count Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html Retrieves questions that have an accepted answer, ordered by the number of answers. Useful for identifying popular or well-answered questions. ```sql SELECT id, title, accepted_answer_id, answer_count FROM `bigquery-public-data.stackoverflow.posts_questions` WHERE accepted_answer_id IS NOT NULL ORDER BY answer_count DESC LIMIT 5 ``` -------------------------------- ### Catalog Entry YAML File Structure Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md Defines metadata for a specific catalog entry, including its ID, type, resource details, and schema. This is an example for a BigQuery table. ```yaml id: products type: bigquery-table resource: name: projects/prod-data/datasets/ecommerce/tables/products displayName: Products Table description: All products in the catalog labels: env: prod createTime: 2026-04-23T00:44:03Z updateTime: 2026-04-23T00:44:03Z schema: ... contacts: ... 
``` -------------------------------- ### Run the Knowledge Catalog Discovery Agent using ADK CLI Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/discovery/README.md Execute the Knowledge Catalog Discovery agent using the ADK CLI. Ensure you provide the correct path to the parent directory containing the agent's source code. ```shell adk run path/to/agent/parent/folder ``` -------------------------------- ### Create Sample BigQuery Data Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/enrichment/README.md Executes a Python script to create a sample BigQuery dataset for metadata curation. ```bash python3 ../sample/data/create_data.py ``` -------------------------------- ### Get Top 10 Scored Answers in a Date Range Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html Retrieves the 10 highest-scored answers within a specified date range from the Stack Overflow posts_answers table. ```sql SELECT id, score, creation_date, body FROM `bigquery-public-data.stackoverflow.posts_answers` WHERE creation_date BETWEEN '2023-01-01' AND '2023-01-31' ORDER BY score DESC LIMIT 10 ``` -------------------------------- ### Generate Visualize HTML with Custom Output and Name Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/README.md Generates an interactive HTML visualization of an OKF bundle, specifying a custom output path and a display name for the viewer header. ```bash .venv/bin/python -m reference_agent visualize \ --bundle ./bundles/crypto_bitcoin \ --out /tmp/btc.html \ --name "Bitcoin OKF" ``` -------------------------------- ### Get Questions with Accepted Answers Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/tables/posts_questions.md Retrieves questions that have an accepted answer, ordered by the number of answers. 
This snippet limits the results to the top 5 questions with accepted answers. ```sql SELECT id, title, accepted_answer_id, answer_count FROM `bigquery-public-data.stackoverflow.posts_questions` WHERE accepted_answer_id IS NOT NULL ORDER BY answer_count DESC LIMIT 5 ``` -------------------------------- ### Standard Layout: Entry Resource Info, Source, and Aspects Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/docs/concept.md YAML file structure for standard layout, including entry metadata, resource information, schema, aspects, and links. ```yaml id: # Entry metadata type: resource: # Entry.EntrySource name: displayName: description: labels: key: value location: parent: ancestors: createTime: updateTime: schema: fields: - name1: dataType: mode: … links: # EntryLinks associated with Schema.path are inlined definition: # into a Schema field to leverage context of field - target: glossary.term # specification : # In the general-case, each top-level field is an [aspect.data] # aspect. Nested field represents aspect.data links: # EntryLinks with this entry as source listed here : - target: : [aspect.data] ``` -------------------------------- ### Run Reference Agent with Bitcoin Dataset Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/samples/crypto_bitcoin/README.md Execute the reference agent to enrich data from the crypto_bitcoin BigQuery dataset. This command specifies the source, dataset, seed file, and output directory. 
```bash
.venv/bin/python -m reference_agent enrich \
  --source bq \
  --dataset bigquery-public-data.crypto_bitcoin \
  --web-seed-file samples/crypto_bitcoin/seeds.txt \
  --out ./bundles/crypto_bitcoin
```

--------------------------------

### Get Details for a Specific Stack Overflow Tag

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html

Fetches the ID, name, count, and the excerpt and wiki post IDs for the tag named 'python'.

```sql
SELECT id, tag_name, count, excerpt_post_id, wiki_post_id
FROM `bigquery-public-data.stackoverflow.tags`
WHERE tag_name = 'python'
```

--------------------------------

### Get Details for a Specific Tag

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/tables/tags.md

Fetch detailed information for a particular tag, such as 'python'. This query selects the tag's ID, name, count, and post IDs for its excerpt and wiki.

```sql
SELECT id, tag_name, count, excerpt_post_id, wiki_post_id
FROM `bigquery-public-data.stackoverflow.tags`
WHERE tag_name = 'python'
```

--------------------------------

### Auto-show Initial Node

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html

Automatically displays the details for the first 'BigQuery Dataset' node in the bundle, falling back to the first node if no dataset is present.

```javascript
// Prefer a BigQuery Dataset node; otherwise fall back to the first node.
const initial =
  bundle.nodes.find((n) => n.data.type === "BigQuery Dataset") || bundle.nodes[0];
if (initial) showDetail(initial.data.id);
```

--------------------------------

### Get Top 10 Answers by Score in a Date Range

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/tables/posts_answers.md

Retrieves the 10 highest-scored answers within a specified month. Ensure the date range and table name are correct for your query.
```sql
SELECT id, score, creation_date, body
FROM `bigquery-public-data.stackoverflow.posts_answers`
-- Half-open range so that all of January 31 is included.
WHERE creation_date >= '2023-01-01'
  AND creation_date < '2023-02-01'
ORDER BY score DESC
LIMIT 10
```

--------------------------------

### Get Total Transactions Per Day

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/crypto_bitcoin/viz.html

Calculates the total number of Bitcoin transactions for each day within a specified date range. Ensure the date range is correctly set for the desired period.

```sql
SELECT
  DATE(block_timestamp) AS transaction_date,
  COUNT(`hash`) AS transaction_count -- `hash` is a reserved keyword, so it is quoted
FROM `bigquery-public-data.crypto_bitcoin.transactions`
-- Half-open range so that all of January 31 is included.
WHERE block_timestamp >= '2023-01-01'
  AND block_timestamp < '2023-02-01'
GROUP BY transaction_date
ORDER BY transaction_date DESC;
```

--------------------------------

### Generate Visualize HTML

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/README.md

Generates an interactive HTML visualization of an OKF bundle. The output file is named 'viz.html' and is placed within the bundle directory.

```bash
.venv/bin/python -m reference_agent visualize --bundle ./
```

--------------------------------

### Run Enrichment with Fileset Source

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/enrichment/README.md

Execute the `kcagent enrich` command with the specified catalog path, tools path, and prompt file, utilizing the fileset source for enrichment.

```bash
../dist/kcagent enrich --catalog-path . --tools-path tools --prompt-path prompt.md
```

--------------------------------

### Count Total Events, Unique Users, and Days

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/enrichment/sample/docs/example1.md

Use this query to get a high-level overview of your dataset's scale. It counts all events, distinct users based on device cookies, and the number of unique days present.
```sql SELECT COUNT(*) AS total_events, COUNT(DISTINCT user_pseudo_id) AS total_unique_users, COUNT(DISTINCT event_date) AS total_days FROM `kc_enrich_sample_data.ga_events` ``` -------------------------------- ### Join Moderator Nominations with User Data Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/tables/posts_moderator_nomination.md Fetches the ID, creation date, nominator's display name, and body of moderator nominations, joining with the users table to get the nominator's name. Limits results to 10 nominations created on or after January 1, 2022. ```sql SELECT p.id, p.creation_date, u.display_name AS nominator_name, p.body FROM `bigquery-public-data.stackoverflow.posts_moderator_nomination` AS p JOIN `bigquery-public-data.stackoverflow.users` AS u ON p.owner_user_id = u.id WHERE p.creation_date >= '2022-01-01' LIMIT 10 ``` -------------------------------- ### Build the mdcode Tool Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md Build the mdcode tool using the provided npm script. ```bash npm run build ``` -------------------------------- ### Run Reference Agent Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/README.md Executes the reference agent to enrich data catalog with BigQuery metadata and web crawl data. Specify dataset, web seeds, and output directory. ```shell .venv/bin/python -m reference_agent enrich \ --source bq \ --dataset . \ --web-seed-file \ --out ./bundles/ ``` -------------------------------- ### CatalogManifest Library Methods Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/docs/design.md Methods for initializing and loading CatalogManifest objects. 
```APIDOC ## CatalogManifest ### Static Methods - `static initWithEntryGroup(entryGroup: string, ctx: ApiContext): CatalogManifest` - `static initWithBigQuery(dataset: string, ctx: ApiContext): CatalogManifest` - `static initWithKB(kb: string, ctx: ApiContext): CatalogManifest` - `static load(path: string, ctx: ApiContext): Promise` ### Instance Methods - `save(path: string): void` ``` -------------------------------- ### Initialize and Render Cytoscape Graph Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html Initializes Cytoscape.js with graph data and renders it. It also sets up event listeners for node clicks to display details. ```javascript const cy = cytoscape({ container: document.getElementById('cy'), elements: bundle.elements, style: [ { selector: 'node', style: { 'background-color': '#6FB1FC', 'color': '#fff', 'label': 'data(id)' } }, { selector: 'edge', style: { 'curve-style': 'bezier', 'target-arrow-color': '#fff', 'target-arrow-shape': 'triangle', 'line-color': '#fff', 'width': 1 } } ], layout: { name: 'grid', rows: 1 } }); ``` -------------------------------- ### Initialize Bundle Viewer Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html Sets the global BUNDLE_NAME and BUNDLE variables with the Stack Overflow dataset information. This is typically done on page load to populate the viewer. ```javascript window.BUNDLE_NAME = "stackoverflow"; window.BUNDLE = {"nodes": [{"data": {"id": "datasets/stackoverflow", "label": "Stack Overflow Public Dataset", "type": "BigQuery Dataset", "description": "The Stack Overflow public dataset contains a variety of tables related to Stack Overflow user activity, posts, and tags. 
This dataset is no longer actively updated.", "resource": "https://bigquery.googleapis.com/v2/projects/bigquery-public-data/datasets/stackoverflow", "tags": ["Stack Overflow, public data, community, Q&A"], "color": "#8b5cf6", "size": 41}}, {"data": {"id": "references/badge_classes", "label": "Badge Classes", "type": "Reference", "description": "Enumerated classes for badges awarded on Stack Exchange sites.", "resource": "https://meta.stackexchange.com/questions/2677/database-schema-documentation-for-the-public-data-dump-and-sede", "tags": ["badges", "classes", "enum", "stackoverflow"], "color": "#10b981", "size": 31}}, {"data": {"id": "references/close_as_off_topic_reason_types", "label": "Close As Off-Topic Reason Types", "type": "Reference", "description": "Defines the types and guidance for reasons why a post might be closed as off-topic.", "resource": "https://meta.stackexchange.com/questions/2677/database-schema-documentation-for-the-public-data-dump-and-sede", "tags": ["close reasons", "off-topic", "enum", "schema", "data dump"], "color": "#10b981", "size": 37}}, {"data": {"id": "references/close_reason_types", "label": "Close Reason Types", "type": "Reference", "description": "Enumerated types for reasons why a post" ``` -------------------------------- ### kcmd Library: Load Catalog Snapshot Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/mdcode/README.md Loads an existing catalog snapshot from the filesystem. ```typescript // Loading a catalog snapshot from the filesystem const snapshot = kcmd.CatalogSnapshot.fromPath('/path/to/root'); ``` -------------------------------- ### Clean Up Demo Resources Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/toolbox/enrichment/README.md Remove the demo BigQuery dataset and its contents after the demonstration. 
```bash bq rm -r ${DEMO_CLOUD_PROJECT}:demo-dataset ``` -------------------------------- ### Configure Google Cloud Project and Authentication Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/enrichment/README.md Sets the CLOUD_PROJECT environment variable and configures gcloud CLI for application default login and project settings. ```bash export CLOUD_PROJECT= gcloud auth application-default login gcloud config set core/project $CLOUD_PROJECT gcloud auth application-default set-quota-project $CLOUD_PROJECT ``` -------------------------------- ### Configure FilesKB MCP Server in Agent Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/enrichment/src/tools/fileskb/README.md This JSON configuration shows how to provide the FilesKB MCP server to the enrichment agent. It specifies the command to run the server and its arguments, including the directory to serve. ```json { "mcpServers": { "fileskb": { "command": "/usr/local/google/home/nikhilko/p/kc/enrichment2/src/.venv/bin/python3", "args": [ "/usr/local/google/home/nikhilko/p/kc/enrichment2/src/tools/fileskb/main.py", "--dir", "/usr/local/google/home/nikhilko/p/kc/enrichment2/demo/docs" ] } } } ``` -------------------------------- ### Authenticate to Google Cloud Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/samples/ga4_merch_store/README.md Logs in to Google Cloud using application default credentials and sets the project for BigQuery access. The caller's project is billed for query bytes. ```bash gcloud auth application-default login gcloud config set project ``` -------------------------------- ### Auto-show Initial Bitcoin Concept Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/okf/bundles/crypto_bitcoin/viz.html Automatically displays the details of the first node in the graph, prioritizing 'BigQuery Dataset' types, or the very first node if no dataset is found. 
```javascript
// Auto-show the first node (a dataset if available, else first concept)
const initial = bundle.nodes.find((n) => n.data.type === "BigQuery Dataset") || bundle.nodes[0];
if (initial) showDetail(initial.data.id);
```

--------------------------------

### Set Environment Variables for Google Cloud and Vertex AI

Source: https://github.com/googlecloudplatform/knowledge-catalog/blob/main/samples/discovery/README.md

Configure the environment variables that identify your Google Cloud project and enable Vertex AI for the discovery agent. Set `GOOGLE_CLOUD_PROJECT` to your consumer project ID.

```shell
# Set this to your consumer project ID.
export GOOGLE_CLOUD_PROJECT=
export GOOGLE_GENAI_USE_VERTEXAI=True
```
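Before launching the agent, it can help to confirm that both variables are actually set in the current shell. The following is a minimal sketch and not part of the repository: `missing_discovery_env` is a hypothetical helper, and treating only a case-insensitive `true` or `1` as enabling Vertex AI is an assumption about how the flag is interpreted.

```python
import os
from collections.abc import Mapping


def missing_discovery_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems found in the discovery agent's environment."""
    problems = []
    # GOOGLE_CLOUD_PROJECT must be a non-empty project ID.
    if not env.get("GOOGLE_CLOUD_PROJECT", "").strip():
        problems.append("GOOGLE_CLOUD_PROJECT is not set")
    # Assumption: the flag counts as enabled only for "true" or "1" (any case).
    flag = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip().lower()
    if flag not in ("true", "1"):
        problems.append("GOOGLE_GENAI_USE_VERTEXAI is not enabled")
    return problems


if __name__ == "__main__":
    # Print one line per problem; no output means both variables look usable.
    for problem in missing_discovery_env(os.environ):
        print(problem)
```

Taking the environment as a mapping argument, rather than reading `os.environ` inside the function, keeps the check easy to exercise with a plain dictionary.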