### Installing Python Prerequisites
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Installs the necessary Python packages that are not part of the standard library, specifically devtools, requests, and tqdm, which are required to run the subsequent code and interact with the GraphRAG API.

```Python
! pip install devtools requests tqdm
```

--------------------------------

### Installing Pre-commit Hooks - Shell
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEVELOPMENT-GUIDE.md

This command installs the pre-commit hooks configured for the repository. These hooks automatically run code style and linting checks using tools like Ruff before each commit.

```shell
pre-commit install
```

--------------------------------

### Installing Test Dependencies - Poetry Shell
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEVELOPMENT-GUIDE.md

This command navigates to the backend directory of the repository and installs project dependencies, including those required specifically for running tests, using the Poetry package manager.

```shell
cd /backend
poetry install --with test
```

--------------------------------

### Executing Solution Accelerator Deployment Script (Shell)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEPLOYMENT-GUIDE.md

This snippet demonstrates how to navigate to the 'infra' directory and execute the 'deploy.sh' script. It shows commands for viewing the help menu for additional options and performing the actual deployment using a specified parameters file.
```shell
cd infra
bash deploy.sh -h  # view help menu for additional options
bash deploy.sh -p deploy.parameters.json
```

--------------------------------

### Registering and Verifying Azure Resource Providers
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEPLOYMENT-GUIDE.md

Registers necessary Azure resource providers for Operations Management, Alerts Management, and Compute using `az provider register`, and then verifies their successful registration status using `az provider show` formatted as a table.

```shell
# register providers
az provider register --namespace Microsoft.OperationsManagement
az provider register --namespace Microsoft.AlertsManagement
az provider register --namespace Microsoft.Compute

# verify providers were registered
az provider show --namespace Microsoft.OperationsManagement -o table
az provider show --namespace Microsoft.AlertsManagement -o table
az provider show --namespace Microsoft.Compute -o table
```

--------------------------------

### Logging In and Setting Azure Subscription with Azure CLI
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEPLOYMENT-GUIDE.md

Authenticates the user to Azure via `az login`, displays the current account details using `az account show`, and then sets the active subscription for subsequent commands using `az account set`, specifying either the subscription name or ID.
```shell
# login to Azure - may need to use the "--use-device-code" flag if using a remote host/virtual machine
az login

# check what subscription you are logged into
az account show

# set appropriate subscription
az account set --subscription "<subscription_name or subscription_id>"
```

--------------------------------

### Importing Required Python Libraries
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Imports the Python libraries needed for the script: getpass for securely getting input, json for handling JSON data, time for time operations, pathlib for path manipulation, requests for making HTTP requests, devtools.pprint for pretty printing, and tqdm for progress bars.

```Python
import getpass
import json
import time
from pathlib import Path

import requests
from devtools import pprint
from tqdm import tqdm
```

--------------------------------

### Creating an Azure Resource Group with Azure CLI
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEPLOYMENT-GUIDE.md

Creates a new Azure resource group using the `az group create` command, specifying the desired name for the resource group and the Azure geographic location where it should be provisioned.

```shell
az group create --name <resource_group_name> --location <location>
```

--------------------------------

### Starting Azurite Emulator - Docker Shell
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEVELOPMENT-GUIDE.md

This command runs Azurite (the Azure Storage emulator) as a detached Docker container, exposing the default ports for Blob, Queue, and Table storage for local development and testing.
```shell
docker run -d -p 10000:10000 -p 10001:10001 -p 10002:10002 mcr.microsoft.com/azure-storage/azurite:latest
```

--------------------------------

### Install GraphRAG API Prerequisites Python Packages
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Installs the required Python packages for the GraphRAG API demo notebook. It uses `pip` to install `devtools`, `pandas`, `requests`, and `tqdm`.

```shell
! pip install devtools pandas requests tqdm
```

--------------------------------

### Configuring GraphRAG API Subscription Key
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Prompts the user to enter their Azure API Management subscription key for authenticating requests to the GraphRAG API. It then creates a dictionary containing the `Ocp-Apim-Subscription-Key` header with the provided key, which will be used in subsequent API calls.

```Python
ocp_apim_subscription_key = getpass.getpass(
    "Enter the subscription key to the GraphRag APIM:"
)

"""
"Ocp-Apim-Subscription-Key": This is a custom HTTP header used by Azure API
Management service (APIM) to authenticate API requests. The value for this key
should be set to the subscription key provided by the Azure APIM instance in
your GraphRAG resource group.
"""

headers = {"Ocp-Apim-Subscription-Key": ocp_apim_subscription_key}
```

--------------------------------

### Starting Cosmos DB Emulator - Docker Shell
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEVELOPMENT-GUIDE.md

This command runs the Azure Cosmos DB emulator as a detached Docker container, mapping the default HTTP (8081) and MongoDB (1234) ports for local development and testing purposes.
```shell
docker run -d -p 8081:8081 -p 1234:1234 mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:vnext-preview
```

--------------------------------

### Starting Indexing Job using GraphRAG API (Python)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Initiates the knowledge graph construction process (indexing) for data in a given storage container. It demonstrates how to pass optional custom prompts generated previously, or defaults to None.

```python
# check if custom prompts were generated
if "auto_template_response" in locals() and auto_template_response.ok:
    entity_extraction_prompt = prompts["entity_extraction_prompt"]
    community_summarization_prompt = prompts["community_summarization_prompt"]
    summarize_description_prompt = prompts["entity_summarization_prompt"]
else:
    entity_extraction_prompt = community_summarization_prompt = (
        summarize_description_prompt
    ) = None

response = build_index(
    storage_name=storage_name,
    index_name=index_name,
    entity_extraction_prompt=entity_extraction_prompt,
    community_summarization_prompt=community_summarization_prompt,
    entity_summarization_prompt=summarize_description_prompt,
)
if response.ok:
    pprint(response.json())
else:
    print(f"Failed to submit job.\nStatus: {response.text}")
```

--------------------------------

### Running GraphRAG Frontend Locally with Docker
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/frontend/README.md

This snippet shows the shell commands to build the frontend Docker image and run it locally. It requires Docker to be installed and can optionally load environment variables from a file. The application will be accessible on `localhost:8080`.

```Shell
# cd to the root directory of the repo
> docker build -t graphrag:frontend -f docker/Dockerfile-frontend .
> docker run --env-file <env_file_path> -p 8080:8080 graphrag:frontend
```

--------------------------------

### Building GraphRAG Knowledge Graph Index
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Defines and calls the `build_index` function to initiate the creation of a knowledge graph index using the GraphRAG API's `/index` endpoint. It sends a POST request specifying the storage container name (`storage_name`) containing the data and the desired index name (`index_name`).

```Python
def build_index(
    storage_name: str,
    index_name: str,
) -> requests.Response:
    """Create a search index.

    This function kicks off a job that builds a knowledge graph index from files
    located in a blob storage container.
    """
    url = endpoint + "/index"
    return requests.post(
        url,
        params={
            "index_container_name": index_name,
            "storage_container_name": storage_name,
        },
        headers=headers,
    )


response = build_index(storage_name=storage_name, index_name=index_name)
print(response)
if response.ok:
    print(response.text)
else:
    print(f"Failed to submit job.\nStatus: {response.text}")
```

--------------------------------

### Get All Helm Release Details (Shell)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/infra/helm/graphrag/templates/NOTES.txt

This command retrieves comprehensive details for the specified Helm release, including the configuration values used, hooks, manifests, and notes.

```shell
$ helm get all {{ .Release.Name }}
```

--------------------------------

### Performing a Local GraphRAG Query
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Defines and calls the `local_search` function to execute a local query against the specified knowledge graph index using the GraphRAG API's `/query/local` endpoint. It sends a POST request with the index name, the query string, and the community level, then parses and prints the result using the helper function.
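The query helpers in these notebooks accept either a single index name or a list of names (their signatures use `index_name: str | list[str]`), so one JSON request body can target several knowledge graphs at once. A minimal sketch of such a body, with illustrative index names:

```python
import json

# a local-query request body spanning two indexes; the names are illustrative
request_body = {
    "index_name": ["index-a", "index-b"],
    "query": "Summarize the main topics found in this data",
    "community_level": 2,  # default community level for a local query
}

# requests.post(url, json=...) serializes the body like this under the hood
payload = json.dumps(request_body)
print(payload)
```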
```Python
def local_search(
    index_name: str | list[str], query: str, community_level: int
) -> requests.Response:
    """Run a local query over the knowledge graph(s) associated with one or more indexes"""
    url = endpoint + "/query/local"
    # optional parameter: community level to query the graph at (default for local query = 2)
    request = {
        "index_name": index_name,
        "query": query,
        "community_level": community_level,
    }
    return requests.post(url, json=request, headers=headers)


# perform a local query
local_response = local_search(
    index_name=index_name,
    query="Summarize the main topics found in this data",
    community_level=2,
)
local_response_data = parse_query_response(local_response, return_context_data=True)
local_response_data
```

--------------------------------

### Checking GraphRAG Indexing Job Status
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Defines and calls the `index_status` function to retrieve the current status of an ongoing or completed indexing job using the GraphRAG API's `/index/status/{index_name}` endpoint. It sends a GET request and pretty prints the JSON response containing the status details.

```Python
def index_status(index_name: str) -> requests.Response:
    url = endpoint + f"/index/status/{index_name}"
    return requests.get(url, headers=headers)


response = index_status(index_name)
pprint(response.json())
```

--------------------------------

### Running Pytest Tests - Shell
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/docs/DEVELOPMENT-GUIDE.md

This command navigates to the backend directory and executes the Pytest test suite. The `-s` flag shows print statements, `--cov=src` reports code coverage for the source directory, and `tests` is the directory pytest collects tests from.
```shell
cd /backend
pytest -s --cov=src tests
```

--------------------------------

### Asserting User Configuration Variables are Set
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Performs an assertion to ensure that the required user configuration variables (`file_directory`, `storage_name`, `index_name`, and `endpoint`) have been updated from their initial empty string values. This prevents proceeding with API calls without necessary configuration.

```Python
assert (
    file_directory != ""
    and storage_name != ""
    and index_name != ""
    and endpoint != ""
)
```

--------------------------------

### Performing a Global GraphRAG Query
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Defines and calls the `global_search` function to execute a global query against the specified knowledge graph index using the GraphRAG API's `/query/global` endpoint. It sends a POST request with the index name, the query string, and the community level, then parses and prints the result using the helper function.
```Python
def global_search(
    index_name: str | list[str], query: str, community_level: int
) -> requests.Response:
    """Run a global query over the knowledge graph(s) associated with one or more indexes"""
    url = endpoint + "/query/global"
    # optional parameter: community level to query the graph at (default for global query = 1)
    request = {
        "index_name": index_name,
        "query": query,
        "community_level": community_level,
    }
    return requests.post(url, json=request, headers=headers)


# perform a global query
global_response = global_search(
    index_name=index_name,
    query="Summarize the main topics found in this data",
    community_level=1,
)
global_response_data = parse_query_response(global_response, return_context_data=True)
global_response_data
```

--------------------------------

### Deploying GraphRAG Frontend to Azure Web App
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/frontend/README.md

This shell command executes the deployment script for hosting the frontend application on an Azure Web App. It requires navigating to the frontend directory and providing a path to a populated parameters JSON file. Requires Azure CLI installed and configured.

```Shell
# cd to graphrag-accelerator/frontend directory
> bash deploy.sh -p frontend_deploy.parameters.json
```

--------------------------------

### Listing Files GraphRAG API Python
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Retrieves a list of all Azure storage containers that hold raw data managed by the GraphRAG API. It performs a GET request to the '/data' endpoint. Requires the 'requests' library.
```Python
def list_files() -> requests.Response:
    """Get a list of all azure storage containers that hold raw data."""
    url = endpoint + "/data"
    return requests.get(url=url, headers=headers)
```

--------------------------------

### Defining Required User Configuration Variables
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Declares placeholder variables that the user must configure with specific values: `file_directory` (local path to data files), `storage_name` (Azure Blob Storage container name), `index_name` (unique name for the knowledge graph index), and `endpoint` (GraphRAG API Gateway URL). These are crucial for subsequent steps.

```Python
"""
These parameters must be defined by the notebook user:

- file_directory: a local directory of text files. The file structure should be
    flat, with no nested directories.
    (i.e. file_directory/file1.txt, file_directory/file2.txt, etc.)
- storage_name: a unique name to identify a blob storage container in Azure
    where files from `file_directory` will be uploaded.
- index_name: a unique name to identify a single graphrag knowledge graph index.
    Note: Multiple indexes may be created from the same `storage_name` blob
    storage container.
- endpoint: the base/endpoint URL for the GraphRAG API (this is the Gateway URL
    found in the APIM resource).
"""

file_directory = ""
storage_name = ""
index_name = ""
endpoint = ""
```

--------------------------------

### Parsing GraphRAG Query Response
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Defines a helper function `parse_query_response` that takes a `requests.Response` object from a query API call. It checks if the response was successful, prints the `result` field from the JSON payload, and optionally returns the `context_data` field if `return_context_data` is True.
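The two payload fields this helper reads (`result` and `context_data`) can be exercised without a live endpoint. A minimal sketch with a hand-built JSON body; the sample values, and the shape of `context_data`, are illustrative assumptions rather than the documented response schema:

```python
import json

# hand-built stand-in for a query response body; values are illustrative
sample_body = json.dumps({
    "result": "The dataset mainly discusses ...",
    "context_data": {"reports": [{"id": "0", "title": "Community 0"}]},
})

# mirror of the helper's logic: print the "result", keep the "context_data"
parsed = json.loads(sample_body)
print(parsed["result"])
context_data = parsed["context_data"]
print(context_data["reports"][0]["title"])
```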
```Python
# a helper function to parse out the result from a query response
def parse_query_response(
    response: requests.Response, return_context_data: bool = False
) -> requests.Response | dict[str, list[dict]]:
    """
    Print response['result'] value and return context data.
    """
    if response.ok:
        print(json.loads(response.text)["result"])
        if return_context_data:
            return json.loads(response.text)["context_data"]
        return response
    else:
        print(response.reason)
        print(response.content)
        return response
```

--------------------------------

### Building and Running GraphRAG Backend Docker Image (Shell)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/infra/managed-app/README.md

Builds the GraphRAG backend Docker image using the specified Dockerfile and tags it `graphrag:latest`. Then runs the built image in detached mode, mapping host port 8080 to container port 80 for accessing the API. Requires Docker to be installed.

```shell
cd <repo_root_directory>
docker build -t graphrag:latest -f docker/Dockerfile-backend .
docker run -d -p 8080:80 graphrag:latest
```

--------------------------------

### Checking Index Status GraphRAG API Python
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Gets the current status of a specific GraphRAG index build job via the API. It sends a GET request to the '/index/status/{container_name}' endpoint. Requires the 'requests' library.
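Index builds are long-running, so a status endpoint like this is typically polled until the job finishes. A hedged sketch of such a loop, using a stubbed status sequence in place of the live API; the `status` field name and its values are assumptions for illustration, not the documented response schema:

```python
import time

# stubbed sequence of payloads standing in for index_status(...).json();
# the "status"/"percent_complete" fields are assumed for illustration only
stub_payloads = iter([
    {"status": "running", "percent_complete": 40},
    {"status": "running", "percent_complete": 80},
    {"status": "complete", "percent_complete": 100},
])


def poll_until_done(poll_seconds: float = 0.0) -> dict:
    """Poll the (stubbed) status endpoint until the job leaves the running state."""
    while True:
        payload = next(stub_payloads)
        print(payload)
        if payload["status"] != "running":
            return payload
        time.sleep(poll_seconds)


final = poll_until_done()
```

Against the live API, `next(stub_payloads)` would be replaced by a real `index_status(...)` call and a poll interval of several seconds.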
```Python
def index_status(container_name: str) -> requests.Response:
    """Get the status of a specific index."""
    url = endpoint + f"/index/status/{container_name}"
    return requests.get(url, headers=headers)
```

--------------------------------

### Uploading Files to GraphRAG Storage
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/1-Quickstart.ipynb

Defines and calls the `upload_files` function to upload text files from a local directory to the Azure Blob Storage container specified by `storage_name` using the GraphRAG API's `/data` endpoint. It supports batching, retries on API busy errors (status 500), and reports the response status.

```Python
def upload_files(
    file_directory: str,
    container_name: str,
    batch_size: int = 100,
    overwrite: bool = True,
    max_retries: int = 5,
) -> requests.Response | list[Path]:
    """
    Upload files to a blob storage container.

    Args:
        file_directory - a local directory of .txt files to upload. All files
            must be in utf-8 encoding.
        container_name - a unique name for the Azure storage container.
        batch_size - the number of files to upload in a single batch.
        overwrite - whether or not to overwrite files if they already exist in
            the storage container.
        max_retries - the maximum number of times to retry uploading a batch of
            files if the API is busy.

    NOTE: Uploading files may sometimes fail if the blob container was recently
    deleted (i.e. a few seconds before). The solution "in practice" is to sleep
    a few seconds and try again.
    """
    url = endpoint + "/data"

    def upload_batch(
        files: list, container_name: str, overwrite: bool, max_retries: int
    ) -> requests.Response:
        for _ in range(max_retries):
            response = requests.post(
                url=url,
                files=files,
                params={"container_name": container_name, "overwrite": overwrite},
                headers=headers,
            )
            # API may be busy, retry
            if response.status_code == 500:
                print("API busy. Sleeping and will try again.")
                time.sleep(10)
                continue
            return response
        return response

    batch_files = []
    filepaths = list(Path(file_directory).iterdir())
    for file in tqdm(filepaths):
        # skip anything that is not a regular file (the API expects a flat
        # directory of utf-8 encoded .txt files)
        if not file.is_file():
            print(f"Skipping invalid file: {file}")
            continue
        batch_files.append(("files", open(file=file, mode="rb")))
        # upload batch of files
        if len(batch_files) == batch_size:
            response = upload_batch(batch_files, container_name, overwrite, max_retries)
            # if response is not ok, return early
            if not response.ok:
                return response
            batch_files.clear()
    # upload last batch of remaining files
    if len(batch_files) > 0:
        response = upload_batch(batch_files, container_name, overwrite, max_retries)
    return response


response = upload_files(
    file_directory=file_directory,
    container_name=storage_name,
    batch_size=100,
    overwrite=True,
)
if not response.ok:
    print(response.text)
else:
    print(response)
```

--------------------------------

### Formatting and Linting Bicep Files (Bash)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/infra/managed-app/README.md

Uses `find` to locate all `.bicep` files in the specified directory and its subdirectories, then applies `az bicep format` and `az bicep lint` to each file. Helps catch syntax errors and maintain code style early in the process. Requires Azure CLI with the Bicep module installed.

```bash
cd /infra
find . -type f -name "*.bicep" -exec az bicep format --file {} \;
find . -type f -name "*.bicep" -exec az bicep lint --file {} \;
```

--------------------------------

### Deleting an Index using GraphRAG API (Python)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Provides an example of how to delete a specific knowledge graph index from the GraphRAG service. This snippet is commented out and intended for demonstration.
```python
# # uncomment this cell to delete an index
# response = delete_index(index_name)
# print(response)
# pprint(response.json())
```

--------------------------------

### Deleting Data Containers using GraphRAG API (Python)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Provides an example of how to remove a specific data storage container from the GraphRAG service using the `delete_files` helper function. This snippet is commented out and intended for demonstration.

```python
# # uncomment this cell to delete data container
# response = delete_files(storage_name)
# print(response)
# pprint(response.text)
```

--------------------------------

### Listing Indexes GraphRAG API Python
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Retrieves a list of all Azure storage containers that hold GraphRAG search indexes managed by the API. It performs a GET request to the '/index' endpoint and attempts to parse the JSON response. Requires the 'requests' and 'json' libraries.

```Python
def list_indexes() -> list:
    """Get a list of all azure storage containers that hold search indexes."""
    url = endpoint + "/index"
    response = requests.get(url, headers=headers)
    try:
        indexes = json.loads(response.text)
        return indexes["index_name"]
    except json.JSONDecodeError:
        print(response.text)
        return response
```

--------------------------------

### Pushing GraphRAG Images and Charts to ACR (Shell)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/infra/managed-app/README.md

Commands to log into Azure Container Registry (ACR), build and push the GraphRAG backend Docker image, and package and push the Helm chart to the ACR as an OCI artifact. Requires Azure CLI, Docker, and Helm installed and configured. Replace `<registry_name>` and `<version>` with your ACR name and Helm chart version.
```shell
# push docker image
az acr login --name <registry_name>.azurecr.io
cd <repo_root_directory>
az acr build --registry <registry_name>.azurecr.io -f docker/Dockerfile-backend --image graphrag:latest .

# push helm chart
cd /infra/helm
helm package graphrag
helm push graphrag-<version>.tgz oci://<registry_name>.azurecr.io/helm
```

--------------------------------

### Defining GraphRAG API Helper Functions (Python)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Defines Python helper functions (`get_relationship`, `get_claim`, `get_text_unit`, `parse_query_response`, `generate_prompts`) for interacting with the GraphRAG API endpoints. These functions wrap HTTP GET requests to retrieve various data types (relationships, claims, text units), parse query responses, or generate prompts.

```python
def get_relationship(index_name: str, relationship_id: str) -> requests.Response:
    """Retrieve a relationship generated by GraphRAG for a specific index."""
    url = endpoint + f"/source/relationship/{index_name}/{relationship_id}"
    return requests.get(url, headers=headers)


def get_claim(index_name: str, claim_id: str) -> requests.Response:
    """Retrieve a claim/covariate generated by GraphRAG for a specific index."""
    url = endpoint + f"/source/claim/{index_name}/{claim_id}"
    return requests.get(url, headers=headers)


def get_text_unit(index_name: str, text_unit_id: str) -> requests.Response:
    """Retrieve a text unit generated by GraphRAG for a specific index."""
    url = endpoint + f"/source/text/{index_name}/{text_unit_id}"
    return requests.get(url, headers=headers)


def parse_query_response(
    response: requests.Response, return_context_data: bool = False
) -> requests.Response | dict[str, list[dict]]:
    """
    Prints response['result'] value and optionally returns associated context data.
""" if response.ok: print(json.loads(response.text)["result"]) if return_context_data: return json.loads(response.text)["context_data"] return response else: print(response.reason) print(response.content) return response def generate_prompts(container_name: str, limit: int = 1) -> None: """Generate graphrag prompts using data provided in a specific storage container.""" url = endpoint + "/index/config/prompts" params = {"container_name": container_name, "limit": limit} return requests.get(url, params=params, headers=headers) ``` -------------------------------- ### Retrieving Report GraphRAG API Python Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb Retrieves a specific report generated by GraphRAG for a given index and report ID via the API. It sends a GET request to the '/source/report/{index_name}/{report_id}' endpoint. Requires the 'requests' library. ```Python def get_report(index_name: str, report_id: str) -> requests.Response: """Retrieve a report generated by GraphRAG for a specific index.""" url = endpoint + f"/source/report/{index_name}/{report_id}" return requests.get(url, headers=headers) ``` -------------------------------- ### Retrieving Entity GraphRAG API Python Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb Retrieves a specific entity generated by GraphRAG for a given index and entity ID via the API. It sends a GET request to the '/source/entity/{index_name}/{entity_id}' endpoint. Requires the 'requests' library. 
```Python
def get_entity(index_name: str, entity_id: str) -> requests.Response:
    """Retrieve an entity generated by GraphRAG for a specific index."""
    url = endpoint + f"/source/entity/{index_name}/{entity_id}"
    return requests.get(url, headers=headers)
```

--------------------------------

### Generating Minimal Test Data with get-wiki-articles.py (Shell)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/README.md

This shell command executes the `get-wiki-articles.py` script with arguments to download a smaller dataset suitable for a faster demonstration. It downloads only one article (`--num-articles 1`) and uses short summaries (`--short-summary`). The data is saved to the `testdata` directory.

```shell
> python get-wiki-articles.py --short-summary --num-articles 1 testdata
```

--------------------------------

### Creating Managed App Deployment Package (Bash)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/infra/managed-app/README.md

Fetches the OpenAPI specification from the locally running GraphRAG backend. Compiles the main Bicep file into an ARM template (`mainTemplate.json`). Zips up the scripts directory, `createUiDefinition.json`, `mainTemplate.json`, and `viewDefinition.json` into the final deployment package zip file. Requires `curl`, Azure CLI Bicep, and a zip utility like `tar` or `zip`.
```bash
cd /infra

# get the openapi specification file
curl --fail-with-body -o core/apim/openapi.json http://localhost:8080/manpage/openapi.json

# compile bicep -> ARM
az bicep build --file main.bicep --outfile managed-app/mainTemplate.json

# zip up all files
cd managed-app
tar -a -cf managed-app-deployment-pkg.zip scripts createUiDefinition.json mainTemplate.json viewDefinition.json
```

--------------------------------

### Import GraphRAG API Demo Python Libraries
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Imports essential Python libraries for the GraphRAG API demonstration. Includes standard libraries like `getpass` and `json`, as well as third-party libraries like `pandas`, `requests`, and `tqdm`.

```python
import getpass
import json
import sys
import time
from pathlib import Path

import pandas as pd
import requests
from devtools import pprint
from tqdm import tqdm
```

--------------------------------

### Generating Default Test Data with get-wiki-articles.py (Shell)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/README.md

This shell command executes the `get-wiki-articles.py` script to download a default set of Wikipedia articles for use as a dataset with GraphRAG. The articles will be saved into the specified directory, `testdata`.

```shell
> python get-wiki-articles.py testdata
```

--------------------------------

### Listing Data Containers using GraphRAG API (Python)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Shows how to list all existing data storage containers managed by the GraphRAG service using the `list_files` helper function. It prints the raw response and the parsed JSON content if the request is successful.
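The success/failure branching used throughout these snippets relies only on a response's `ok` flag, `text` body, and `json()` method, so it can be exercised offline. A minimal sketch with a stubbed response object; the stub class and the sample payload are assumptions for illustration, not part of the notebook:

```python
import json


class StubResponse:
    """Minimal stand-in exposing the attributes the notebook code uses."""

    def __init__(self, ok: bool, payload: dict):
        self.ok = ok
        self.text = json.dumps(payload)

    def json(self) -> dict:
        return json.loads(self.text)


response = StubResponse(ok=True, payload={"storage_name": ["my-data-container"]})
if response.ok:
    print(response.json())  # parsed JSON on success
else:
    print(response.text)    # raw body on failure
```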
```python
response = list_files()
print(response)
if response.ok:
    pprint(response.json())
else:
    pprint(response.text)
```

--------------------------------

### Building Index GraphRAG API Python
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Submits a job to the GraphRAG API to build a knowledge graph index from data stored in a specified container. Allows optional custom prompts for entity extraction, entity summarization, and community summarization. Requires the 'requests' library.

```Python
def build_index(
    storage_name: str,
    index_name: str,
    entity_extraction_prompt: str = None,
    entity_summarization_prompt: str = None,
    community_summarization_prompt: str = None,
) -> requests.Response:
    """Build a graphrag index.

    This function submits a job that builds a graphrag index (i.e. a knowledge
    graph) from data files located in a blob storage container.
    """
    url = endpoint + "/index"
    prompts = dict()
    if entity_extraction_prompt:
        prompts["entity_extraction_prompt"] = entity_extraction_prompt
    if entity_summarization_prompt:
        prompts["summarize_descriptions_prompt"] = entity_summarization_prompt
    if community_summarization_prompt:
        prompts["community_report_prompt"] = community_summarization_prompt
    return requests.post(
        url,
        files=prompts if len(prompts) > 0 else None,
        params={
            "index_container_name": index_name,
            "storage_container_name": storage_name,
        },
        headers=headers,
    )
```

--------------------------------

### Listing Indexes using GraphRAG API (Python)
Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Retrieves a list of all knowledge graph indexes that currently exist in the GraphRAG service. It prints the response using `pprint` for better readability.
```python
all_indexes = list_indexes()
pprint(all_indexes)
```

--------------------------------

### Running Global Search GraphRAG API Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Executes a non-streaming global query over one or more GraphRAG knowledge graphs associated with specified indexes. It sends a POST request with the index name(s), query string, and community level. Requires the `requests` library.

```python
def global_search(
    index_name: str | list[str], query: str, community_level: int
) -> requests.Response:
    """Run a global query over the knowledge graph(s) associated with one or more indexes"""
    url = endpoint + "/query/global"
    # optional parameter: community level to query the graph at (default for global query = 1)
    request = {
        "index_name": index_name,
        "query": query,
        "community_level": community_level,
    }
    return requests.post(url, json=request, headers=headers)
```

--------------------------------

### Running Local Search GraphRAG API Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Executes a non-streaming local query over one or more GraphRAG knowledge graphs associated with specified indexes. It sends a POST request with the index name(s), query string, and community level. Requires the `requests` library.
```python
def local_search(
    index_name: str | list[str], query: str, community_level: int
) -> requests.Response:
    """Run a local query over the knowledge graph(s) associated with one or more indexes"""
    url = endpoint + "/query/local"
    # optional parameter: community level to query the graph at (default for local query = 2)
    request = {
        "index_name": index_name,
        "query": query,
        "community_level": community_level,
    }
    return requests.post(url, json=request, headers=headers)
```

--------------------------------

### Check Helm Release Status (Shell)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/infra/helm/graphrag/templates/NOTES.txt

This command displays the status of the specified Helm release, including its state, last deployment time, chart information, and any user-supplied notes.

```shell
$ helm status {{ .Release.Name }}
```

--------------------------------

### Uploading Files using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Demonstrates how to upload a collection of files to a new storage blob container using the `upload_files` helper function. It includes basic error handling to print the response status.

```python
response = upload_files(
    file_directory=file_directory,
    container_name=storage_name,
    batch_size=100,
    overwrite=True,
)
if not response.ok:
    print(response.text)
else:
    print(response)
```

--------------------------------

### Declare GraphRAG API Configuration Variables Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Declares placeholder variables (`file_directory`, `storage_name`, `index_name`, `endpoint`) that require user configuration. These variables define local data paths, Azure storage identifiers, index names, and the base API URL.

```python
"""
These parameters must be defined by the notebook user:
- file_directory: a local directory of text files.
  The file structure should be flat, with no nested directories
  (i.e. file_directory/file1.txt, file_directory/file2.txt, etc.)
- storage_name: a unique name to identify a blob storage container in Azure
  where files from `file_directory` will be uploaded.
- index_name: a unique name to identify a single graphrag knowledge graph index.
  Note: Multiple indexes may be created from the same `storage_name` blob storage container.
- endpoint: the base/endpoint URL for the GraphRAG API (this is the Gateway URL found in the APIM resource).
"""
file_directory = ""
storage_name = ""
index_name = ""
endpoint = ""
```

--------------------------------

### Configure Azure API Management Subscription Key Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Prompts the user to securely enter their Azure API Management subscription key using `getpass`. This key is then assigned to the `Ocp-Apim-Subscription-Key` header for authenticating API requests.

```python
ocp_apim_subscription_key = getpass.getpass(
    "Enter the subscription key to the GraphRag APIM:"
)

"""
"Ocp-Apim-Subscription-Key": This is a custom HTTP header used by Azure API
Management service (APIM) to authenticate API requests. The value for this key
should be set to the subscription key provided by the Azure APIM instance in
your GraphRAG resource group.
"""
headers = {"Ocp-Apim-Subscription-Key": ocp_apim_subscription_key}
```

--------------------------------

### Generating Auto-Templates (Prompts) using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Calls the `generate_prompts` function to create custom entity and relationship extraction prompts based on data samples in a specified storage container. It checks the response status and prints errors if generation fails.
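The `generate_prompts` helper is not defined in this excerpt. A minimal sketch under stated assumptions — the `/index/config/prompts` route and its query parameters are guesses, and `endpoint`/`headers` are placeholders for the notebook's configured values:

```python
import requests

# placeholders: use the endpoint/headers configured in this notebook
endpoint = ""  # APIM gateway URL
headers = {}   # {"Ocp-Apim-Subscription-Key": "..."}


def generate_prompts(container_name: str, limit: int = 1) -> requests.Response:
    """Hypothetical helper: ask the service to auto-generate graphrag prompts
    from a sample of files in a storage container (route and parameter names
    are assumptions, not confirmed by this excerpt)."""
    url = endpoint + "/index/config/prompts"
    return requests.get(
        url,
        params={"container_name": container_name, "limit": limit},
        headers=headers,
    )
```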
```python
auto_template_response = generate_prompts(container_name=storage_name, limit=1)
if auto_template_response.ok:
    prompts = auto_template_response.json()
else:
    print(auto_template_response.text)
```

--------------------------------

### Rebasing and Force Pushing Git Branch for PR Updates

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/CONTRIBUTING.md

These shell commands are used when updating a pull request branch in response to feedback or to changes in the target branch. `git rebase master -i` interactively integrates changes from the `master` branch (or potentially `main`), allowing commits to be squashed or reordered, and `git push -f` force-pushes the rebased branch to your remote fork, overwriting its history.

```shell
git rebase master -i
git push -f
```

--------------------------------

### Validating Graphrag Helm Chart Locally - Shell

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/infra/helm/README.md

Demonstrates how to use the `helm template` command to validate a local Helm chart without deploying it. This command renders the Kubernetes manifests based on the chart and the provided values. It includes setting the image repository and tag for the 'master' component, useful for testing specific image versions.

```shell
helm template test ./graphrag \
  --namespace graphrag \
  --set "master.image.repository=registry.azurecr.io/graphrag" \
  --set "master.image.tag=latest"
```

--------------------------------

### Running Global Search Streaming GraphRAG API Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Executes a streaming global query over one or more GraphRAG knowledge graphs via the API. This functionality is currently marked as not implemented but includes logic to process streamed JSON chunks containing tokens and context.
Requires the `requests`, `json`, `sys`, and `pandas` libraries.

```python
def global_search_streaming(
    index_name: str | list[str], query: str, community_level: int
) -> requests.Response:
    """Run a global query across one or more indexes and stream back the response"""
    raise NotImplementedError("this functionality has been temporarily removed")
    url = endpoint + "/query/streaming/global"
    # optional parameter: community level to query the graph at (default for global query = 1)
    request = {
        "index_name": index_name,
        "query": query,
        "community_level": community_level,
    }
    context_list = []
    with requests.post(url, json=request, headers=headers, stream=True) as r:
        r.raise_for_status()
        for chunk in r.iter_lines(chunk_size=256 * 1024, decode_unicode=True):
            try:
                payload = json.loads(chunk)
                token = payload["token"]
                context = payload["context"]
                if token != "":
                    print(token, end="")
                elif (token == "") and not context:
                    print("\n")  # transition from output message to context
                else:
                    context_list.append(context)
            except json.JSONDecodeError:
                print(type(chunk), len(chunk), sys.getsizeof(chunk), chunk, end="\n")
    display(pd.DataFrame.from_dict(context_list[0]["reports"]).head(10))
```

--------------------------------

### Running Local Search Streaming GraphRAG API Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Executes a streaming local query over one or more GraphRAG knowledge graphs via the API. This functionality is currently marked as not implemented but includes logic to process streamed JSON chunks and display results using pandas. Requires the `requests`, `json`, `sys`, and `pandas` libraries.
```python
def local_search_streaming(
    index_name: str | list[str], query: str, community_level: int
) -> requests.Response:
    """Run a local query across one or more indexes and stream back the response"""
    raise NotImplementedError("this functionality has been temporarily removed")
    url = endpoint + "/query/streaming/local"
    # optional parameter: community level to query the graph at (default for local query = 2)
    request = {
        "index_name": index_name,
        "query": query,
        "community_level": community_level,
    }
    context_list = []
    with requests.post(url, json=request, headers=headers, stream=True) as r:
        r.raise_for_status()
        for chunk in r.iter_lines(chunk_size=256 * 1024, decode_unicode=True):
            try:
                payload = json.loads(chunk)
                token = payload["token"]
                context = payload["context"]
                if token != "":
                    print(token, end="")
                elif (token == "") and not context:
                    print("\n")  # transition from output message to context
                else:
                    context_list.append(context)
            except json.JSONDecodeError:
                print(type(chunk), len(chunk), sys.getsizeof(chunk), chunk, end="\n")
    for key in context_list[0].keys():
        display(pd.DataFrame.from_dict(context_list[0][key]).head(10))
```

--------------------------------

### Performing Global Search Query using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Executes a global search query against a specified index (or list of indexes). This type of query is suitable for questions requiring an understanding of the dataset as a whole. It uses `parse_query_response` to handle the result.
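`parse_query_response` is used by the query cells but not defined in this excerpt. A hedged sketch of what such a helper might do, assuming the API returns a JSON body with a `result` key (the generated answer) and a `context_data` key (the supporting sources) — neither key is confirmed by this excerpt:

```python
def parse_query_response(response, return_context_data: bool = False):
    """Hypothetical helper: print the answer from a query response and
    optionally return its context data. The 'result'/'context_data'
    response keys are assumptions."""
    if not response.ok:
        print(response.reason)
        print(response.content)
        return response
    payload = response.json()
    print(payload["result"])  # the generated answer
    if return_context_data:
        return payload["context_data"]
    return response
```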
```python
# pass in a single index name as a string or, to query across multiple indexes, set index_name=[myindex1, myindex2]
global_response = global_search(
    index_name=index_name,
    query="Summarize the qualifications to being a delivery data scientist",
    community_level=2,
)
# print the result and save context data in a variable
global_response_data = parse_query_response(global_response, return_context_data=True)
global_response_data
```

--------------------------------

### Uploading Files GraphRAG API Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Uploads files from a local directory to an Azure storage container via the GraphRAG API. It processes files in batches, handles retries for busy API states, and supports overwriting existing files. Requires the `requests`, `Path`, and `tqdm` libraries.

```python
def upload_files(
    file_directory: str,
    container_name: str,
    batch_size: int = 100,
    overwrite: bool = True,
    max_retries: int = 5,
) -> requests.Response | list[Path]:
    """
    Upload files to a blob storage container.

    Args:
        file_directory - a local directory of .txt files to upload. All files must be in utf-8 encoding.
        container_name - a unique name for the Azure storage container.
        batch_size - the number of files to upload in a single batch.
        overwrite - whether or not to overwrite files if they already exist in the storage container.
        max_retries - the maximum number of times to retry uploading a batch of files if the API is busy.

    NOTE: Uploading files may sometimes fail if the blob container was recently
    deleted (i.e. a few seconds before). The solution "in practice" is to sleep
    a few seconds and try again.
""" url = endpoint + "/data" def upload_batch( files: list, container_name: str, overwrite: bool, max_retries: int ) -> requests.Response: for _ in range(max_retries): response = requests.post( url=url, files=files, params={"container_name": container_name, "overwrite": overwrite}, headers=headers, ) # API may be busy, retry if response.status_code == 500: print("API busy. Sleeping and will try again.") time.sleep(10) continue return response return response batch_files = [] filepaths = list(Path(file_directory).iterdir()) for file in tqdm(filepaths): # validate that file is a file, has acceptable file type, has a .txt extension, and has utf-8 encoding if (not file.is_file()): print(f"Skipping invalid file: {file}") continue batch_files.append( ("files", open(file=file, mode="rb")) ) # upload batch of files if len(batch_files) == batch_size: response = upload_batch(batch_files, container_name, overwrite, max_retries) # if response is not ok, return early if not response.ok: return response batch_files.clear() # upload last batch of remaining files if len(batch_files) > 0: response = upload_batch(batch_files, container_name, overwrite, max_retries) return response ``` -------------------------------- ### Retrieving Claim Source using GraphRAG API (Python) Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb Retrieves details for a specific claim/covariate referenced as a source using the `get_claim` helper function. It prints the JSON representation of the claim if successful. 
```python
claim_response = get_claim(index_name, 1)
if claim_response.ok:
    pprint(claim_response.json())
else:
    print(claim_response)
    print(claim_response.text)
```

--------------------------------

### Retrieving Report Source using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Fetches the content of a specific report referenced as a source in query results using the `get_report` helper function. It prints the report text if successful.

```python
report_response = get_report(index_name, 0)
print(report_response.json()["text"]) if report_response.ok else (
    report_response.reason,
    report_response.content,
)
```

--------------------------------

### Performing Local Search Query using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Executes a local search query against a specified index (or list of indexes). This type of query is best for focused questions about specific entities. It uses `parse_query_response` to handle the result.

```python
# pass in a single index name as a string or, to query across multiple indexes, set index_name=[myindex1, myindex2]
local_response = local_search(
    index_name=index_name,
    query="Who are the primary actors in these communities?",
    community_level=2,
)
# print the result and save context data in a variable
local_response_data = parse_query_response(local_response, return_context_data=True)
local_response_data
```

--------------------------------

### Retrieving Relationship Source using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Fetches details for a specific relationship referenced as a source using the `get_relationship` helper function. It prints the JSON representation of the relationship if successful.
```python
relationship_response = get_relationship(index_name, 1)
relationship_response.json() if relationship_response.ok else (
    relationship_response.reason,
    relationship_response.content,
)
```

--------------------------------

### Saving GraphML File GraphRAG API Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Retrieves a GraphML representation of a GraphRAG knowledge graph from a specific index via the API and saves it to a local file. The file is downloaded in chunks. Raises a UserWarning if the output file name does not end with `.graphml`. Requires the `requests` and `Path` libraries.

```python
def save_graphml_file(index_name: str, graphml_file_name: str) -> None:
    """Retrieve and save a graphml file that represents the knowledge graph.

    The file is downloaded in chunks and saved to the local file system.
    """
    url = endpoint + f"/graph/graphml/{index_name}"
    if Path(graphml_file_name).suffix != ".graphml":
        raise UserWarning(f"{graphml_file_name} must have a .graphml file extension")
    with requests.get(url, headers=headers, stream=True) as r:
        r.raise_for_status()
        with open(graphml_file_name, "wb") as f:
            for chunk in r.iter_content(chunk_size=1024):
                f.write(chunk)
```

--------------------------------

### Retrieving Entity Source using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Retrieves details for a specific entity referenced as a source using the `get_entity` helper function. It prints the JSON representation of the entity if successful.
```python
entity_response = get_entity(index_name, 0)
entity_response.json() if entity_response.ok else (
    entity_response.reason,
    entity_response.content,
)
```

--------------------------------

### Retrieving Text Unit Source using GraphRAG API (Python)

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

Retrieves the raw text content for a specific text unit referenced as a source using the `get_text_unit` helper function. It includes a check to ensure a text unit ID is provided before making the request.

```python
# get a text unit id from one of the previous Source endpoint results (look for 'text_units' in the response)
text_unit_id = ""
if not text_unit_id:
    raise ValueError(
        "Must provide a text_unit_id from previous source results. Look for 'text_units' in the response."
    )
text_unit_response = get_text_unit(index_name, text_unit_id)
if text_unit_response.ok:
    print(text_unit_response.json()["text"])
else:
    print(text_unit_response.reason)
    print(text_unit_response.content)
```

--------------------------------

### Saving GraphRAG Knowledge Graph to GraphML File in Python

Source: https://github.com/azure-samples/graphrag-accelerator/blob/main/notebooks/2-Advanced_Getting_Started.ipynb

This snippet demonstrates how to save the GraphRAG knowledge graph to a local GraphML file using the `save_graphml_file` function. It requires the name of the index (`index_name`) and the desired output filename. The file will be saved in the current working directory.

```python
# will save graphml file to the current local directory
save_graphml_file(index_name, "knowledge_graph.graphml")
```
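Once downloaded, the GraphML file can be sanity-checked without extra dependencies (for real graph analysis, `networkx.read_graphml` is the more common choice). A minimal sketch using only the standard library, with a tiny inline sample standing in for the downloaded `knowledge_graph.graphml`:

```python
import xml.etree.ElementTree as ET


def count_graphml_nodes(graphml_text: str) -> int:
    """Count the <node> elements in a GraphML document string."""
    ns = {"g": "http://graphml.graphdrawing.org/xmlns"}
    root = ET.fromstring(graphml_text)
    return len(root.findall(".//g:node", ns))


# small stand-in for the contents of knowledge_graph.graphml
sample = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected">'
    '<node id="a"/><node id="b"/><edge source="a" target="b"/>'
    "</graph></graphml>"
)
print(count_graphml_nodes(sample))  # 2
```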