### Installing Python Prerequisites
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Installs the necessary Python packages that are not part of the standard library, specifically devtools, requests, and tqdm, which are required to run the subsequent code and interact with the GraphRAG API.

```Python
! pip install devtools requests tqdm
```

--------------------------------

### Installing Pre-commit Hooks - Shell
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEVELOPMENT-GUIDE.md

This command installs the pre-commit hooks configured for the repository. These hooks automatically run code style and linting checks using tools like Ruff before each commit.

```shell
pre-commit install
```

--------------------------------

### Installing Test Dependencies - Poetry Shell
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEVELOPMENT-GUIDE.md

This command navigates to the backend directory of the repository and installs project dependencies, including those required specifically for running tests, using the Poetry package manager.

```shell
cd /backend
poetry install --with test
```

--------------------------------

### Executing Solution Accelerator Deployment Script (Shell)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEPLOYMENT-GUIDE.md

This snippet demonstrates how to navigate to the 'infra' directory and execute the 'deploy.sh' script. It shows commands for viewing the help menu for additional options and performing the actual deployment using a specified parameters file.
```shell
cd infra
bash deploy.sh -h  # view help menu for additional options
bash deploy.sh -p deploy.parameters.json
```

--------------------------------

### Registering and Verifying Azure Resource Providers
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEPLOYMENT-GUIDE.md

Registers necessary Azure resource providers for Operations Management, Alerts Management, and Compute using `az provider register`, and then verifies their successful registration status using `az provider show` formatted as a table.

```shell
# register providers
az provider register --namespace Microsoft.OperationsManagement
az provider register --namespace Microsoft.AlertsManagement
az provider register --namespace Microsoft.Compute

# verify providers were registered
az provider show --namespace Microsoft.OperationsManagement -o table
az provider show --namespace Microsoft.AlertsManagement -o table
az provider show --namespace Microsoft.Compute -o table
```

--------------------------------

### Logging In and Setting Azure Subscription with Azure CLI
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEPLOYMENT-GUIDE.md

Authenticates the user to Azure via `az login`, displays the current account details using `az account show`, and then sets the active subscription for subsequent commands using `az account set`, specifying either the subscription name or ID.
```shell
# login to Azure - may need to use the "--use-device-code" flag if using a remote host/virtual machine
az login

# check what subscription you are logged into
az account show

# set appropriate subscription
az account set --subscription "<subscription_name or subscription_id>"
```

--------------------------------

### Importing Required Python Libraries
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Imports the Python libraries needed for the script: getpass for securely getting input, json for handling JSON data, time for time operations, pathlib for path manipulation, requests for making HTTP requests, devtools.pprint for pretty printing, and tqdm for progress bars.

```Python
import getpass
import json
import time
from pathlib import Path

import requests
from devtools import pprint
from tqdm import tqdm
```

--------------------------------

### Creating an Azure Resource Group with Azure CLI
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEPLOYMENT-GUIDE.md

Creates a new Azure resource group using the `az group create` command, specifying the desired name for the resource group and the Azure geographic location where it should be provisioned.

```shell
az group create --name <resource_group_name> --location <location>
```

--------------------------------

### Starting Azurite Emulator - Docker Shell
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEVELOPMENT-GUIDE.md

This command runs Azurite (the Azure Storage emulator) as a detached Docker container, exposing the default ports for Blob, Queue, and Table storage for local development and testing.
```shell
docker run -d -p 10000:10000 -p 10001:10001 -p 10002:10002 mcr.microsoft.com/azure-storage/azurite:latest
```

--------------------------------

### Install GraphRAG API Prerequisites Python Packages
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Installs the required Python packages for the GraphRAG API demo notebook. It uses `pip` to install `devtools`, `pandas`, `requests`, and `tqdm`.

```shell
! pip install devtools pandas requests tqdm
```

--------------------------------

### Configuring GraphRAG API Subscription Key
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Prompts the user to enter their Azure API Management subscription key for authenticating requests to the GraphRAG API. It then creates a dictionary containing the `Ocp-Apim-Subscription-Key` header with the provided key, which will be used in subsequent API calls.

```Python
ocp_apim_subscription_key = getpass.getpass(
    "Enter the subscription key to the GraphRag APIM:"
)

"""
"Ocp-Apim-Subscription-Key": This is a custom HTTP header used by Azure API
Management service (APIM) to authenticate API requests. The value for this key
should be set to the subscription key provided by the Azure APIM instance in
your GraphRAG resource group.
"""

headers = {"Ocp-Apim-Subscription-Key": ocp_apim_subscription_key}
```

--------------------------------

### Starting Cosmos DB Emulator - Docker Shell
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEVELOPMENT-GUIDE.md

This command runs the Azure Cosmos DB emulator as a detached Docker container, mapping the default HTTP (8081) and MongoDB (1234) ports for local development and testing purposes.
```shell
docker run -d -p 8081:8081 -p 1234:1234 mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:vnext-preview
```

--------------------------------

### Starting Indexing Job using GraphRAG API (Python)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Initiates the knowledge graph construction process (indexing) for data in a given storage container. It demonstrates how to pass optional custom prompts generated previously, or defaults to None.

```python
# check if custom prompts were generated
if "auto_template_response" in locals() and auto_template_response.ok:
    entity_extraction_prompt = prompts["entity_extraction_prompt"]
    community_summarization_prompt = prompts["community_summarization_prompt"]
    summarize_description_prompt = prompts["entity_summarization_prompt"]
else:
    entity_extraction_prompt = community_summarization_prompt = (
        summarize_description_prompt
    ) = None

response = build_index(
    storage_name=storage_name,
    index_name=index_name,
    entity_extraction_prompt=entity_extraction_prompt,
    community_summarization_prompt=community_summarization_prompt,
    entity_summarization_prompt=summarize_description_prompt,
)
if response.ok:
    pprint(response.json())
else:
    print(f"Failed to submit job.\nStatus: {response.text}")
```

--------------------------------

### Running GraphRAG Frontend Locally with Docker
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/frontend/README.md

This snippet shows the shell commands to build the frontend Docker image and run it locally. It requires Docker to be installed and can optionally load environment variables from a file. The application will be accessible on `localhost:8080`.

```Shell
# cd to the root directory of the repo
> docker build -t graphrag:frontend -f docker/Dockerfile-frontend .
> docker run --env-file <env_file_path> -p 8080:8080 graphrag:frontend
```

--------------------------------

### Building GraphRAG Knowledge Graph Index
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Defines and calls the `build_index` function to initiate the creation of a knowledge graph index using the GraphRAG API's `/index` endpoint. It sends a POST request specifying the storage container name (`storage_name`) containing the data and the desired index name (`index_name`).

```Python
def build_index(
    storage_name: str,
    index_name: str,
) -> requests.Response:
    """Create a search index.

    This function kicks off a job that builds a knowledge graph index from files
    located in a blob storage container.
    """
    url = endpoint + "/index"
    return requests.post(
        url,
        params={
            "index_container_name": index_name,
            "storage_container_name": storage_name,
        },
        headers=headers,
    )


response = build_index(storage_name=storage_name, index_name=index_name)
print(response)
if response.ok:
    print(response.text)
else:
    print(f"Failed to submit job.\nStatus: {response.text}")
```

--------------------------------

### Get All Helm Release Details (Shell)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/infra/helm/graphrag/templates/NOTES.txt

This command retrieves comprehensive details for the specified Helm release, including the configuration values used, hooks, manifests, and notes.

```shell
$ helm get all {{ .Release.Name }}
```

--------------------------------

### Performing a Local GraphRAG Query
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Defines and calls the `local_search` function to execute a local query against the specified knowledge graph index using the GraphRAG API's `/query/local` endpoint. It sends a POST request with the index name, the query string, and the community level, then parses and prints the result using the helper function.
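The query helpers in these notebooks accept either a single index name or a list of names (their signatures use `index_name: str | list[str]`), so one JSON request body can target several knowledge graphs at once. A minimal sketch of such a body, with illustrative index names:

```python
import json

# a local-query request body spanning two indexes; the names are illustrative
request_body = {
    "index_name": ["index-a", "index-b"],
    "query": "Summarize the main topics found in this data",
    "community_level": 2,  # default community level for a local query
}

# requests.post(url, json=...) serializes the body like this under the hood
payload = json.dumps(request_body)
print(payload)
```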
```Python
def local_search(
    index_name: str | list[str], query: str, community_level: int
) -> requests.Response:
    """Run a local query over the knowledge graph(s) associated with one or more indexes"""
    url = endpoint + "/query/local"
    # optional parameter: community level to query the graph at (default for local query = 2)
    request = {
        "index_name": index_name,
        "query": query,
        "community_level": community_level,
    }
    return requests.post(url, json=request, headers=headers)


# perform a local query
local_response = local_search(
    index_name=index_name,
    query="Summarize the main topics found in this data",
    community_level=2,
)
local_response_data = parse_query_response(local_response, return_context_data=True)
local_response_data
```

--------------------------------

### Checking GraphRAG Indexing Job Status
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Defines and calls the `index_status` function to retrieve the current status of an ongoing or completed indexing job using the GraphRAG API's `/index/status/{index_name}` endpoint. It sends a GET request and pretty prints the JSON response containing the status details.

```Python
def index_status(index_name: str) -> requests.Response:
    url = endpoint + f"/index/status/{index_name}"
    return requests.get(url, headers=headers)


response = index_status(index_name)
pprint(response.json())
```

--------------------------------

### Running Pytest Tests - Shell
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEVELOPMENT-GUIDE.md

This command navigates to the backend directory and executes the Pytest test suite. The `-s` flag shows print statements, `--cov=src` reports code coverage for the source directory, and `tests` is the directory pytest collects tests from.
```shell
cd /backend
pytest -s --cov=src tests
```

--------------------------------

### Asserting User Configuration Variables are Set
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Performs an assertion to ensure that the required user configuration variables (`file_directory`, `storage_name`, `index_name`, and `endpoint`) have been updated from their initial empty string values. This prevents proceeding with API calls without necessary configuration.

```Python
assert (
    file_directory != ""
    and storage_name != ""
    and index_name != ""
    and endpoint != ""
)
```

--------------------------------

### Performing a Global GraphRAG Query
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Defines and calls the `global_search` function to execute a global query against the specified knowledge graph index using the GraphRAG API's `/query/global` endpoint. It sends a POST request with the index name, the query string, and the community level, then parses and prints the result using the helper function.
```Python
def global_search(
    index_name: str | list[str], query: str, community_level: int
) -> requests.Response:
    """Run a global query over the knowledge graph(s) associated with one or more indexes"""
    url = endpoint + "/query/global"
    # optional parameter: community level to query the graph at (default for global query = 1)
    request = {
        "index_name": index_name,
        "query": query,
        "community_level": community_level,
    }
    return requests.post(url, json=request, headers=headers)


# perform a global query
global_response = global_search(
    index_name=index_name,
    query="Summarize the main topics found in this data",
    community_level=1,
)
global_response_data = parse_query_response(global_response, return_context_data=True)
global_response_data
```

--------------------------------

### Deploying GraphRAG Frontend to Azure Web App
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/frontend/README.md

This shell command executes the deployment script for hosting the frontend application on an Azure Web App. It requires navigating to the frontend directory and providing a path to a populated parameters JSON file. Requires Azure CLI installed and configured.

```Shell
# cd to graphrag-accelerator/frontend directory
> bash deploy.sh -p frontend_deploy.parameters.json
```

--------------------------------

### Listing Files GraphRAG API Python
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Retrieves a list of all Azure storage containers that hold raw data managed by the GraphRAG API. It performs a GET request to the '/data' endpoint. Requires the 'requests' library.
```Python
def list_files() -> requests.Response:
    """Get a list of all azure storage containers that hold raw data."""
    url = endpoint + "/data"
    return requests.get(url=url, headers=headers)
```

--------------------------------

### Defining Required User Configuration Variables
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Declares placeholder variables that the user must configure with specific values: `file_directory` (local path to data files), `storage_name` (Azure Blob Storage container name), `index_name` (unique name for the knowledge graph index), and `endpoint` (GraphRAG API Gateway URL). These are crucial for subsequent steps.

```Python
"""
These parameters must be defined by the notebook user:

- file_directory: a local directory of text files. The file structure should be
    flat, with no nested directories.
    (i.e. file_directory/file1.txt, file_directory/file2.txt, etc.)
- storage_name: a unique name to identify a blob storage container in Azure
    where files from `file_directory` will be uploaded.
- index_name: a unique name to identify a single graphrag knowledge graph index.
    Note: Multiple indexes may be created from the same `storage_name` blob
    storage container.
- endpoint: the base/endpoint URL for the GraphRAG API (this is the Gateway URL
    found in the APIM resource).
"""

file_directory = ""
storage_name = ""
index_name = ""
endpoint = ""
```

--------------------------------

### Parsing GraphRAG Query Response
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Defines a helper function `parse_query_response` that takes a `requests.Response` object from a query API call. It checks if the response was successful, prints the `result` field from the JSON payload, and optionally returns the `context_data` field if `return_context_data` is True.
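The two payload fields this helper reads (`result` and `context_data`) can be exercised without a live endpoint. A minimal sketch with a hand-built JSON body; the sample values, and the shape of `context_data`, are illustrative assumptions rather than the documented response schema:

```python
import json

# hand-built stand-in for a query response body; values are illustrative
sample_body = json.dumps({
    "result": "The dataset mainly discusses ...",
    "context_data": {"reports": [{"id": "0", "title": "Community 0"}]},
})

# mirror of the helper's logic: print the "result", keep the "context_data"
parsed = json.loads(sample_body)
print(parsed["result"])
context_data = parsed["context_data"]
print(context_data["reports"][0]["title"])
```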
```Python
# a helper function to parse out the result from a query response
def parse_query_response(
    response: requests.Response, return_context_data: bool = False
) -> requests.Response | dict[str, list[dict]]:
    """
    Print response['result'] value and return context data.
    """
    if response.ok:
        print(json.loads(response.text)["result"])
        if return_context_data:
            return json.loads(response.text)["context_data"]
        return response
    else:
        print(response.reason)
        print(response.content)
        return response
```

--------------------------------

### Building and Running GraphRAG Backend Docker Image (Shell)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/infra/managed-app/README.md

Builds the GraphRAG backend Docker image using the specified Dockerfile and tags it `graphrag:latest`. Then runs the built image in detached mode, mapping host port 8080 to container port 80 for accessing the API. Requires Docker to be installed.

```shell
cd <repo_root_directory>
docker build -t graphrag:latest -f docker/Dockerfile-backend .
docker run -d -p 8080:80 graphrag:latest
```

--------------------------------

### Checking Index Status GraphRAG API Python
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Gets the current status of a specific GraphRAG index build job via the API. It sends a GET request to the '/index/status/{container_name}' endpoint. Requires the 'requests' library.
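Index builds are long-running, so a status endpoint like this is typically polled until the job finishes. A hedged sketch of such a loop, using a stubbed status sequence in place of the live API; the `status` field name and its values are assumptions for illustration, not the documented response schema:

```python
import time

# stubbed sequence of payloads standing in for index_status(...).json();
# the "status"/"percent_complete" fields are assumed for illustration only
stub_payloads = iter([
    {"status": "running", "percent_complete": 40},
    {"status": "running", "percent_complete": 80},
    {"status": "complete", "percent_complete": 100},
])


def poll_until_done(poll_seconds: float = 0.0) -> dict:
    """Poll the (stubbed) status endpoint until the job leaves the running state."""
    while True:
        payload = next(stub_payloads)
        print(payload)
        if payload["status"] != "running":
            return payload
        time.sleep(poll_seconds)


final = poll_until_done()
```

Against the live API, `next(stub_payloads)` would be replaced by a real `index_status(...)` call and a poll interval of several seconds.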
```Python
def index_status(container_name: str) -> requests.Response:
    """Get the status of a specific index."""
    url = endpoint + f"/index/status/{container_name}"
    return requests.get(url, headers=headers)
```

--------------------------------

### Uploading Files to GraphRAG Storage
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Defines and calls the `upload_files` function to upload text files from a local directory to the Azure Blob Storage container specified by `storage_name` using the GraphRAG API's `/data` endpoint. It supports batching, retries on API busy errors (status 500), and reports the response status.

```Python
def upload_files(
    file_directory: str,
    container_name: str,
    batch_size: int = 100,
    overwrite: bool = True,
    max_retries: int = 5,
) -> requests.Response | list[Path]:
    """
    Upload files to a blob storage container.

    Args:
        file_directory - a local directory of .txt files to upload. All files
            must be in utf-8 encoding.
        container_name - a unique name for the Azure storage container.
        batch_size - the number of files to upload in a single batch.
        overwrite - whether or not to overwrite files if they already exist in
            the storage container.
        max_retries - the maximum number of times to retry uploading a batch of
            files if the API is busy.

    NOTE: Uploading files may sometimes fail if the blob container was recently
    deleted (i.e. a few seconds before). The solution "in practice" is to sleep
    a few seconds and try again.
    """
    url = endpoint + "/data"

    def upload_batch(
        files: list, container_name: str, overwrite: bool, max_retries: int
    ) -> requests.Response:
        for _ in range(max_retries):
            response = requests.post(
                url=url,
                files=files,
                params={"container_name": container_name, "overwrite": overwrite},
                headers=headers,
            )
            # API may be busy, retry
            if response.status_code == 500:
                print("API busy. Sleeping and will try again.")
                time.sleep(10)
                continue
            return response
        return response

    batch_files = []
    filepaths = list(Path(file_directory).iterdir())
    for file in tqdm(filepaths):
        # skip anything that is not a regular file (the API expects a flat
        # directory of utf-8 encoded .txt files)
        if not file.is_file():
            print(f"Skipping invalid file: {file}")
            continue
        batch_files.append(("files", open(file=file, mode="rb")))
        # upload batch of files
        if len(batch_files) == batch_size:
            response = upload_batch(batch_files, container_name, overwrite, max_retries)
            # if response is not ok, return early
            if not response.ok:
                return response
            batch_files.clear()
    # upload last batch of remaining files
    if len(batch_files) > 0:
        response = upload_batch(batch_files, container_name, overwrite, max_retries)
    return response


response = upload_files(
    file_directory=file_directory,
    container_name=storage_name,
    batch_size=100,
    overwrite=True,
)
if not response.ok:
    print(response.text)
else:
    print(response)
```

--------------------------------

### Formatting and Linting Bicep Files (Bash)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/infra/managed-app/README.md

Uses `find` to locate all `.bicep` files in the specified directory and its subdirectories, then applies `az bicep format` and `az bicep lint` to each file. Helps catch syntax errors and maintain code style early in the process. Requires Azure CLI with the Bicep module installed.

```bash
cd /infra
find . -type f -name "*.bicep" -exec az bicep format --file {} \;
find . -type f -name "*.bicep" -exec az bicep lint --file {} \;
```

--------------------------------

### Deleting an Index using GraphRAG API (Python)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Provides an example of how to delete a specific knowledge graph index from the GraphRAG service. This snippet is commented out and intended for demonstration.
```python
# # uncomment this cell to delete an index
# response = delete_index(index_name)
# print(response)
# pprint(response.json())
```

--------------------------------

### Deleting Data Containers using GraphRAG API (Python)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Provides an example of how to remove a specific data storage container from the GraphRAG service using the `delete_files` helper function. This snippet is commented out and intended for demonstration.

```python
# # uncomment this cell to delete data container
# response = delete_files(storage_name)
# print(response)
# pprint(response.text)
```

--------------------------------

### Listing Indexes GraphRAG API Python
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Retrieves a list of all Azure storage containers that hold GraphRAG search indexes managed by the API. It performs a GET request to the '/index' endpoint and attempts to parse the JSON response. Requires the 'requests' and 'json' libraries.

```Python
def list_indexes() -> list:
    """Get a list of all azure storage containers that hold search indexes."""
    url = endpoint + "/index"
    response = requests.get(url, headers=headers)
    try:
        indexes = json.loads(response.text)
        return indexes["index_name"]
    except json.JSONDecodeError:
        print(response.text)
        return response
```

--------------------------------

### Pushing GraphRAG Images and Charts to ACR (Shell)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/infra/managed-app/README.md

Commands to log into Azure Container Registry (ACR), build and push the GraphRAG backend Docker image, and package and push the Helm chart to the ACR as an OCI artifact. Requires Azure CLI, Docker, and Helm installed and configured. Replace `<registry_name>` and `<version>` with your ACR name and Helm chart version.
```shell
# push docker image
az acr login --name <registry_name>.azurecr.io
cd <repo_root_directory>
az acr build --registry <registry_name>.azurecr.io -f docker/Dockerfile-backend --image graphrag:latest .

# push helm chart
cd /infra/helm
helm package graphrag
helm push graphrag-<version>.tgz oci://<registry_name>.azurecr.io/helm
```

--------------------------------

### Defining GraphRAG API Helper Functions (Python)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Defines Python helper functions (`get_relationship`, `get_claim`, `get_text_unit`, `parse_query_response`, `generate_prompts`) for interacting with the GraphRAG API endpoints. These functions wrap HTTP GET requests to retrieve various data types (relationships, claims, text units), parse query responses, or generate prompts.

```python
def get_relationship(index_name: str, relationship_id: str) -> requests.Response:
    """Retrieve a relationship generated by GraphRAG for a specific index."""
    url = endpoint + f"/source/relationship/{index_name}/{relationship_id}"
    return requests.get(url, headers=headers)


def get_claim(index_name: str, claim_id: str) -> requests.Response:
    """Retrieve a claim/covariate generated by GraphRAG for a specific index."""
    url = endpoint + f"/source/claim/{index_name}/{claim_id}"
    return requests.get(url, headers=headers)


def get_text_unit(index_name: str, text_unit_id: str) -> requests.Response:
    """Retrieve a text unit generated by GraphRAG for a specific index."""
    url = endpoint + f"/source/text/{index_name}/{text_unit_id}"
    return requests.get(url, headers=headers)


def parse_query_response(
    response: requests.Response, return_context_data: bool = False
) -> requests.Response | dict[str, list[dict]]:
    """
    Prints response['result'] value and optionally returns associated context data.
""" if response.ok: print(json.loads(response.text)["result"]) if return_context_data: return json.loads(response.text)["context_data"] return response else: print(response.reason) print(response.content) return response def generate_prompts(container_name: str, limit: int = 1) -> None: """Generate graphrag prompts using data provided in a specific storage container.""" url = endpoint + "/index/config/prompts" params = {"container_name": container_name, "limit": limit} return requests.get(url, params=params, headers=headers) ``` -------------------------------- ### Retrieving Report GraphRAG API Python Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb Retrieves a specific report generated by GraphRAG for a given index and report ID via the API. It sends a GET request to the '/source/report/{index_name}/{report_id}' endpoint. Requires the 'requests' library. ```Python def get_report(index_name: str, report_id: str) -> requests.Response: """Retrieve a report generated by GraphRAG for a specific index.""" url = endpoint + f"/source/report/{index_name}/{report_id}" return requests.get(url, headers=headers) ``` -------------------------------- ### Retrieving Entity GraphRAG API Python Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb Retrieves a specific entity generated by GraphRAG for a given index and entity ID via the API. It sends a GET request to the '/source/entity/{index_name}/{entity_id}' endpoint. Requires the 'requests' library. 
```Python
def get_entity(index_name: str, entity_id: str) -> requests.Response:
    """Retrieve an entity generated by GraphRAG for a specific index."""
    url = endpoint + f"/source/entity/{index_name}/{entity_id}"
    return requests.get(url, headers=headers)
```

--------------------------------

### Generating Minimal Test Data with get-wiki-articles.py (Shell)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/README.md

This shell command executes the `get-wiki-articles.py` script with arguments to download a smaller dataset suitable for a faster demonstration. It downloads only one article (`--num-articles 1`) and uses short summaries (`--short-summary`). The data is saved to the `testdata` directory.

```shell
> python get-wiki-articles.py --short-summary --num-articles 1 testdata
```

--------------------------------

### Creating Managed App Deployment Package (Bash)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/infra/managed-app/README.md

Fetches the OpenAPI specification from the locally running GraphRAG backend. Compiles the main Bicep file into an ARM template (`mainTemplate.json`). Zips up the scripts directory, `createUiDefinition.json`, `mainTemplate.json`, and `viewDefinition.json` into the final deployment package zip file. Requires `curl`, Azure CLI Bicep, and a zip utility like `tar` or `zip`.
```bash
cd /infra

# get the openapi specification file
curl --fail-with-body -o core/apim/openapi.json http://localhost:8080/manpage/openapi.json

# compile bicep -> ARM
az bicep build --file main.bicep --outfile managed-app/mainTemplate.json

# zip up all files
cd managed-app
tar -a -cf managed-app-deployment-pkg.zip scripts createUiDefinition.json mainTemplate.json viewDefinition.json
```

--------------------------------

### Import GraphRAG API Demo Python Libraries
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Imports essential Python libraries for the GraphRAG API demonstration. Includes standard libraries like `getpass` and `json`, as well as third-party libraries like `pandas`, `requests`, and `tqdm`.

```python
import getpass
import json
import sys
import time
from pathlib import Path

import pandas as pd
import requests
from devtools import pprint
from tqdm import tqdm
```

--------------------------------

### Generating Default Test Data with get-wiki-articles.py (Shell)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/README.md

This shell command executes the `get-wiki-articles.py` script to download a default set of Wikipedia articles for use as a dataset with GraphRAG. The articles will be saved into the specified directory, `testdata`.

```shell
> python get-wiki-articles.py testdata
```

--------------------------------

### Listing Data Containers using GraphRAG API (Python)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Shows how to list all existing data storage containers managed by the GraphRAG service using the `list_files` helper function. It prints the raw response and the parsed JSON content if the request is successful.
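The success/failure branching used throughout these snippets relies only on a response's `ok` flag, `text` body, and `json()` method, so it can be exercised offline. A minimal sketch with a stubbed response object; the stub class and the sample payload are assumptions for illustration, not part of the notebook:

```python
import json


class StubResponse:
    """Minimal stand-in exposing the attributes the notebook code uses."""

    def __init__(self, ok: bool, payload: dict):
        self.ok = ok
        self.text = json.dumps(payload)

    def json(self) -> dict:
        return json.loads(self.text)


response = StubResponse(ok=True, payload={"storage_name": ["my-data-container"]})
if response.ok:
    print(response.json())  # parsed JSON on success
else:
    print(response.text)    # raw body on failure
```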
```python
response = list_files()
print(response)
if response.ok:
    pprint(response.json())
else:
    pprint(response.text)
```

--------------------------------

### Building Index GraphRAG API Python
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Submits a job to the GraphRAG API to build a knowledge graph index from data stored in a specified container. Allows optional custom prompts for entity extraction, entity summarization, and community summarization. Requires the 'requests' library.

```Python
def build_index(
    storage_name: str,
    index_name: str,
    entity_extraction_prompt: str = None,
    entity_summarization_prompt: str = None,
    community_summarization_prompt: str = None,
) -> requests.Response:
    """Build a graphrag index.

    This function submits a job that builds a graphrag index (i.e. a knowledge
    graph) from data files located in a blob storage container.
    """
    url = endpoint + "/index"
    prompts = dict()
    if entity_extraction_prompt:
        prompts["entity_extraction_prompt"] = entity_extraction_prompt
    if entity_summarization_prompt:
        prompts["summarize_descriptions_prompt"] = entity_summarization_prompt
    if community_summarization_prompt:
        prompts["community_report_prompt"] = community_summarization_prompt
    return requests.post(
        url,
        files=prompts if len(prompts) > 0 else None,
        params={
            "index_container_name": index_name,
            "storage_container_name": storage_name,
        },
        headers=headers,
    )
```

--------------------------------

### Listing Indexes using GraphRAG API (Python)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Retrieves a list of all knowledge graph indexes that currently exist in the GraphRAG service. It prints the response using `pprint` for better readability.
```python
all_indexes = list_indexes()
pprint(all_indexes)
```

--------------------------------

### Running Global Search GraphRAG API Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Executes a non-streaming global query over one or more GraphRAG knowledge graphs associated with specified indexes. It sends a POST request with the index name(s), query string, and community level. Requires the `requests` library.

```python
def global_search(
    index_name: str | list[str], query: str, community_level: int
) -> requests.Response:
    """Run a global query over the knowledge graph(s) associated with one or more indexes"""
    url = endpoint + "/query/global"
    # optional parameter: community level to query the graph at (default for global query = 1)
    request = {
        "index_name": index_name,
        "query": query,
        "community_level": community_level,
    }
    return requests.post(url, json=request, headers=headers)
```

--------------------------------

### Running Local Search GraphRAG API Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Executes a non-streaming local query over one or more GraphRAG knowledge graphs associated with specified indexes. It sends a POST request with the index name(s), query string, and community level. Requires the `requests` library.
```python
def local_search(
    index_name: str | list[str], query: str, community_level: int
) -> requests.Response:
    """Run a local query over the knowledge graph(s) associated with one or more indexes"""
    url = endpoint + "/query/local"
    # optional parameter: community level to query the graph at (default for local query = 2)
    request = {
        "index_name": index_name,
        "query": query,
        "community_level": community_level,
    }
    return requests.post(url, json=request, headers=headers)
```

--------------------------------

### Check Helm Release Status (Shell)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/infra/helm/graphrag/templates/NOTES.txt

This command displays the status of the specified Helm release, including its state, last deployment time, chart information, and any user-supplied notes.

```shell
$ helm status {{ .Release.Name }}
```

--------------------------------

### Uploading Files using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Demonstrates how to upload a collection of files to a new storage blob container using the `upload_files` helper function. It includes basic error handling to print the response status.

```python
response = upload_files(
    file_directory=file_directory,
    container_name=storage_name,
    batch_size=100,
    overwrite=True,
)
if not response.ok:
    print(response.text)
else:
    print(response)
```

--------------------------------

### Declare GraphRAG API Configuration Variables Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Declares placeholder variables (`file_directory`, `storage_name`, `index_name`, `endpoint`) that require user configuration. These variables define local data paths, Azure storage identifiers, index names, and the base API URL.

```python
"""
These parameters must be defined by the notebook user:
- file_directory: a local directory of text files.
  The file structure should be flat, with no nested directories
  (i.e. file_directory/file1.txt, file_directory/file2.txt, etc.)
- storage_name: a unique name to identify a blob storage container in Azure
  where files from `file_directory` will be uploaded.
- index_name: a unique name to identify a single graphrag knowledge graph index.
  Note: Multiple indexes may be created from the same `storage_name` blob storage container.
- endpoint: the base/endpoint URL for the GraphRAG API (this is the Gateway URL found in the APIM resource).
"""
file_directory = ""
storage_name = ""
index_name = ""
endpoint = ""
```

--------------------------------

### Configure Azure API Management Subscription Key Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Prompts the user to securely enter their Azure API Management subscription key using `getpass`. This key is then assigned to the `Ocp-Apim-Subscription-Key` header for authenticating API requests.

```python
ocp_apim_subscription_key = getpass.getpass(
    "Enter the subscription key to the GraphRag APIM:"
)

"""
"Ocp-Apim-Subscription-Key": This is a custom HTTP header used by Azure API
Management service (APIM) to authenticate API requests. The value for this key
should be set to the subscription key provided by the Azure APIM instance in
your GraphRAG resource group.
"""
headers = {"Ocp-Apim-Subscription-Key": ocp_apim_subscription_key}
```

--------------------------------

### Generating Auto-Templates (Prompts) using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Calls the `generate_prompts` function to create custom entity and relationship extraction prompts based on data samples in a specified storage container. It checks the response status and prints errors if generation fails.
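The `generate_prompts` helper is not defined in this excerpt. A minimal sketch under stated assumptions — the `/index/config/prompts` route and its query parameters are guesses, and `endpoint`/`headers` are placeholders for the notebook's configured values:

```python
import requests

# placeholders: use the endpoint/headers configured in this notebook
endpoint = ""  # APIM gateway URL
headers = {}   # {"Ocp-Apim-Subscription-Key": "..."}


def generate_prompts(container_name: str, limit: int = 1) -> requests.Response:
    """Hypothetical helper: ask the service to auto-generate graphrag prompts
    from a sample of files in a storage container (route and parameter names
    are assumptions, not confirmed by this excerpt)."""
    url = endpoint + "/index/config/prompts"
    return requests.get(
        url,
        params={"container_name": container_name, "limit": limit},
        headers=headers,
    )
```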
```python
auto_template_response = generate_prompts(container_name=storage_name, limit=1)
if auto_template_response.ok:
    prompts = auto_template_response.json()
else:
    print(auto_template_response.text)
```

--------------------------------

### Rebasing and Force Pushing Git Branch for PR Updates

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/CONTRIBUTING.md

These shell commands are used when updating a pull request branch in response to feedback or to changes in the target branch. `git rebase master -i` interactively integrates changes from the `master` branch (or potentially `main`), allowing commits to be squashed or reordered, and `git push -f` force-pushes the rebased branch to your remote fork, overwriting its history.

```shell
git rebase master -i
git push -f
```

--------------------------------

### Validating Graphrag Helm Chart Locally - Shell

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/infra/helm/README.md

Demonstrates how to use the `helm template` command to validate a local Helm chart without deploying it. This command renders the Kubernetes manifests based on the chart and the provided values. It includes setting the image repository and tag for the 'master' component, useful for testing specific image versions.

```shell
helm template test ./graphrag \
  --namespace graphrag \
  --set "master.image.repository=registry.azurecr.io/graphrag" \
  --set "master.image.tag=latest"
```

--------------------------------

### Running Global Search Streaming GraphRAG API Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Executes a streaming global query over one or more GraphRAG knowledge graphs via the API. This functionality is currently marked as not implemented but includes logic to process streamed JSON chunks containing tokens and context.
Requires the `requests`, `json`, `sys`, and `pandas` libraries.

```python
def global_search_streaming(
    index_name: str | list[str], query: str, community_level: int
) -> requests.Response:
    """Run a global query across one or more indexes and stream back the response"""
    raise NotImplementedError("this functionality has been temporarily removed")
    url = endpoint + "/query/streaming/global"
    # optional parameter: community level to query the graph at (default for global query = 1)
    request = {
        "index_name": index_name,
        "query": query,
        "community_level": community_level,
    }
    context_list = []
    with requests.post(url, json=request, headers=headers, stream=True) as r:
        r.raise_for_status()
        for chunk in r.iter_lines(chunk_size=256 * 1024, decode_unicode=True):
            try:
                payload = json.loads(chunk)
                token = payload["token"]
                context = payload["context"]
                if token != "":
                    print(token, end="")
                elif (token == "") and not context:
                    print("\n")  # transition from output message to context
                else:
                    context_list.append(context)
            except json.JSONDecodeError:
                print(type(chunk), len(chunk), sys.getsizeof(chunk), chunk, end="\n")
    display(pd.DataFrame.from_dict(context_list[0]["reports"]).head(10))
```

--------------------------------

### Running Local Search Streaming GraphRAG API Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Executes a streaming local query over one or more GraphRAG knowledge graphs via the API. This functionality is currently marked as not implemented but includes logic to process streamed JSON chunks and display results using pandas. Requires the `requests`, `json`, `sys`, and `pandas` libraries.
```python
def local_search_streaming(
    index_name: str | list[str], query: str, community_level: int
) -> requests.Response:
    """Run a local query across one or more indexes and stream back the response"""
    raise NotImplementedError("this functionality has been temporarily removed")
    url = endpoint + "/query/streaming/local"
    # optional parameter: community level to query the graph at (default for local query = 2)
    request = {
        "index_name": index_name,
        "query": query,
        "community_level": community_level,
    }
    context_list = []
    with requests.post(url, json=request, headers=headers, stream=True) as r:
        r.raise_for_status()
        for chunk in r.iter_lines(chunk_size=256 * 1024, decode_unicode=True):
            try:
                payload = json.loads(chunk)
                token = payload["token"]
                context = payload["context"]
                if token != "":
                    print(token, end="")
                elif (token == "") and not context:
                    print("\n")  # transition from output message to context
                else:
                    context_list.append(context)
            except json.JSONDecodeError:
                print(type(chunk), len(chunk), sys.getsizeof(chunk), chunk, end="\n")
    for key in context_list[0].keys():
        display(pd.DataFrame.from_dict(context_list[0][key]).head(10))
```

--------------------------------

### Performing Global Search Query using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Executes a global search query against a specified index (or list of indexes). This type of query is suitable for questions requiring an understanding of the dataset as a whole. It uses `parse_query_response` to handle the result.
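`parse_query_response` is used by the query cells but not defined in this excerpt. A hedged sketch of what such a helper might do, assuming the API returns a JSON body with a `result` key (the generated answer) and a `context_data` key (the supporting sources) — neither key is confirmed by this excerpt:

```python
def parse_query_response(response, return_context_data: bool = False):
    """Hypothetical helper: print the answer from a query response and
    optionally return its context data. The 'result'/'context_data'
    response keys are assumptions."""
    if not response.ok:
        print(response.reason)
        print(response.content)
        return response
    payload = response.json()
    print(payload["result"])  # the generated answer
    if return_context_data:
        return payload["context_data"]
    return response
```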
```python
# pass in a single index name as a string or, to query across multiple indexes, set index_name=[myindex1, myindex2]
global_response = global_search(
    index_name=index_name,
    query="Summarize the qualifications to being a delivery data scientist",
    community_level=2,
)
# print the result and save context data in a variable
global_response_data = parse_query_response(global_response, return_context_data=True)
global_response_data
```

--------------------------------

### Uploading Files GraphRAG API Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Uploads files from a local directory to an Azure storage container via the GraphRAG API. It processes files in batches, handles retries for busy API states, and supports overwriting existing files. Requires the `requests`, `Path`, and `tqdm` libraries.

```python
def upload_files(
    file_directory: str,
    container_name: str,
    batch_size: int = 100,
    overwrite: bool = True,
    max_retries: int = 5,
) -> requests.Response | list[Path]:
    """
    Upload files to a blob storage container.

    Args:
        file_directory - a local directory of .txt files to upload. All files must be in utf-8 encoding.
        container_name - a unique name for the Azure storage container.
        batch_size - the number of files to upload in a single batch.
        overwrite - whether or not to overwrite files if they already exist in the storage container.
        max_retries - the maximum number of times to retry uploading a batch of files if the API is busy.

    NOTE: Uploading files may sometimes fail if the blob container was recently
    deleted (i.e. a few seconds before). The solution "in practice" is to sleep
    a few seconds and try again.
""" url = endpoint + "/data" def upload_batch( files: list, container_name: str, overwrite: bool, max_retries: int ) -> requests.Response: for _ in range(max_retries): response = requests.post( url=url, files=files, params={"container_name": container_name, "overwrite": overwrite}, headers=headers, ) # API may be busy, retry if response.status_code == 500: print("API busy. Sleeping and will try again.") time.sleep(10) continue return response return response batch_files = [] filepaths = list(Path(file_directory).iterdir()) for file in tqdm(filepaths): # validate that file is a file, has acceptable file type, has a .txt extension, and has utf-8 encoding if (not file.is_file()): print(f"Skipping invalid file: {file}") continue batch_files.append( ("files", open(file=file, mode="rb")) ) # upload batch of files if len(batch_files) == batch_size: response = upload_batch(batch_files, container_name, overwrite, max_retries) # if response is not ok, return early if not response.ok: return response batch_files.clear() # upload last batch of remaining files if len(batch_files) > 0: response = upload_batch(batch_files, container_name, overwrite, max_retries) return response ``` -------------------------------- ### Retrieving Claim Source using GraphRAG API (Python) Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb Retrieves details for a specific claim/covariate referenced as a source using the `get_claim` helper function. It prints the JSON representation of the claim if successful. 
```python
claim_response = get_claim(index_name, 1)
if claim_response.ok:
    pprint(claim_response.json())
else:
    print(claim_response)
    print(claim_response.text)
```

--------------------------------

### Retrieving Report Source using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Fetches the content of a specific report referenced as a source in query results using the `get_report` helper function. It prints the report text if successful.

```python
report_response = get_report(index_name, 0)
print(report_response.json()["text"]) if report_response.ok else (
    report_response.reason,
    report_response.content,
)
```

--------------------------------

### Performing Local Search Query using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Executes a local search query against a specified index (or list of indexes). This type of query is best for focused questions about specific entities. It uses `parse_query_response` to handle the result.

```python
# pass in a single index name as a string or, to query across multiple indexes, set index_name=[myindex1, myindex2]
local_response = local_search(
    index_name=index_name,
    query="Who are the primary actors in these communities?",
    community_level=2,
)
# print the result and save context data in a variable
local_response_data = parse_query_response(local_response, return_context_data=True)
local_response_data
```

--------------------------------

### Retrieving Relationship Source using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Fetches details for a specific relationship referenced as a source using the `get_relationship` helper function. It prints the JSON representation of the relationship if successful.
```python
relationship_response = get_relationship(index_name, 1)
relationship_response.json() if relationship_response.ok else (
    relationship_response.reason,
    relationship_response.content,
)
```

--------------------------------

### Saving GraphML File GraphRAG API Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Retrieves a GraphML representation of a GraphRAG knowledge graph from a specific index via the API and saves it to a local file. The file is downloaded in chunks. Raises a UserWarning if the output file name does not end with `.graphml`. Requires the `requests` and `Path` libraries.

```python
def save_graphml_file(index_name: str, graphml_file_name: str) -> None:
    """Retrieve and save a graphml file that represents the knowledge graph.

    The file is downloaded in chunks and saved to the local file system.
    """
    url = endpoint + f"/graph/graphml/{index_name}"
    if Path(graphml_file_name).suffix != ".graphml":
        raise UserWarning(f"{graphml_file_name} must have a .graphml file extension")
    with requests.get(url, headers=headers, stream=True) as r:
        r.raise_for_status()
        with open(graphml_file_name, "wb") as f:
            for chunk in r.iter_content(chunk_size=1024):
                f.write(chunk)
```

--------------------------------

### Retrieving Entity Source using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Retrieves details for a specific entity referenced as a source using the `get_entity` helper function. It prints the JSON representation of the entity if successful.
```python
entity_response = get_entity(index_name, 0)
entity_response.json() if entity_response.ok else (
    entity_response.reason,
    entity_response.content,
)
```

--------------------------------

### Retrieving Text Unit Source using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Retrieves the raw text content for a specific text unit referenced as a source using the `get_text_unit` helper function. It includes a check to ensure a text unit ID is provided before making the request.

```python
# get a text unit id from one of the previous Source endpoint results (look for 'text_units' in the response)
text_unit_id = ""
if not text_unit_id:
    raise ValueError(
        "Must provide a text_unit_id from previous source results. Look for 'text_units' in the response."
    )
text_unit_response = get_text_unit(index_name, text_unit_id)
if text_unit_response.ok:
    print(text_unit_response.json()["text"])
else:
    print(text_unit_response.reason)
    print(text_unit_response.content)
```

--------------------------------

### Saving GraphRAG Knowledge Graph to GraphML File in Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

This snippet demonstrates how to save the GraphRAG knowledge graph to a local GraphML file using the `save_graphml_file` function. It requires the name of the index (`index_name`) and the desired output filename. The file will be saved in the current working directory.

```python
# will save graphml file to the current local directory
save_graphml_file(index_name, "knowledge_graph.graphml")
```
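Once downloaded, the GraphML file can be sanity-checked without extra dependencies (for real graph analysis, `networkx.read_graphml` is the more common choice). A minimal sketch using only the standard library, with a tiny inline sample standing in for the downloaded `knowledge_graph.graphml`:

```python
import xml.etree.ElementTree as ET


def count_graphml_nodes(graphml_text: str) -> int:
    """Count the <node> elements in a GraphML document string."""
    ns = {"g": "http://graphml.graphdrawing.org/xmlns"}
    root = ET.fromstring(graphml_text)
    return len(root.findall(".//g:node", ns))


# small stand-in for the contents of knowledge_graph.graphml
sample = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected">'
    '<node id="a"/><node id="b"/><edge source="a" target="b"/>'
    "</graph></graphml>"
)
print(count_graphml_nodes(sample))  # 2
```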