### NCSP CLI: Manage VM States (Start, Stop, Restart)

Source: https://context7.com/nvidia/ngc-examples/llms.txt

Controls the running state of VM instances across different cloud providers. 'stop' suspends the VM while retaining resources, 'start' resumes a stopped VM, and 'restart' performs a reboot operation. These commands are consistent across AWS, GCP, and Alibaba Cloud.

```bash
# Stop a running AWS VM
ncsp aws stopVM

# Start a stopped AWS VM
ncsp aws startVM

# Restart (reboot) a running VM
ncsp aws restartVM

# Same commands work across all providers
ncsp gcp stopVM
ncsp gcp startVM
ncsp ali restartVM
```

--------------------------------

### Trace ncsp Commands and Responses Example Output

Source: https://github.com/nvidia/ngc-examples/blob/master/ncsp/README.md

Example output demonstrating the trace functionality of ncsp when creating a VM on AWS with '--trace 1'. It shows the AWS CLI commands executed by ncsp.

```bash
aws ec2 describe-security-groups --region us-west-2
aws ec2 describe-images --region us-west-2 --filters Name=name,Values="ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server*"
aws ec2 run-instances --image-id ami-01eb3061 --instance-type t2.micro --region us-west-2 --key-name baseos-awskey-oregon --security-group-ids sg-0cca0173
aws ec2 create-tags --resource i-0e18d6bdac9e994e1 --tags Key=Name,Value=pbradstr-Thu-2018Feb01-185738
. . .
```

--------------------------------

### GET /ncsp/{provider}/ssh

Source: https://context7.com/nvidia/ngc-examples/llms.txt

Establishes an SSH connection to a running virtual machine instance or executes remote commands.

```APIDOC
## GET /ncsp/{provider}/ssh

### Description
Connects to the VM via SSH or runs a command string on the remote host.
### Method
GET

### Endpoint
/ncsp/{provider}/ssh

### Parameters
#### Path Parameters
- **provider** (string) - Required - The cloud provider (aws, gcp, or ali)

#### Request Body
- **command** (string) - Optional - Command to execute on the remote VM

### Request Example
`ncsp aws ssh "nvidia-smi"`

### Response
#### Success Response (200)
- **output** (string) - Standard output from the remote command execution
```

--------------------------------

### NCSP CLI: Timing Test for VM Operations

Source: https://context7.com/nvidia/ngc-examples/llms.txt

Runs automated performance tests to measure the time taken for create, stop, start, restart, and delete VM operations. Results are logged to persistent files for analysis. Supports customization of test loops and reporting.

```bash
# Run timing test on AWS
ncsp aws test

# Run test with multiple inner loops
ncsp aws --inner_loop_cnt 5 test

# Run test with multiple outer loops (create/delete cycles)
ncsp aws --outer_loop_cnt 3 test

# Disable summary report
ncsp aws --summary_report 0 test
```

--------------------------------

### NVIDIA Docker - Run TensorFlow Container

Source: https://context7.com/nvidia/ngc-examples/llms.txt

This script demonstrates how to run GPU-accelerated TensorFlow training using NVIDIA Docker. It executes the MNIST training example within the official NVIDIA TensorFlow container. Users can run with a default tag or specify a particular container version. The manual `nvidia-docker run` command is also provided.
```bash
# Run MNIST training with default tag (17.10)
./mnist_tensorflow.sh

# Run with specific container version
./mnist_tensorflow.sh 18.03

# Manual nvidia-docker command
nvidia-docker run --rm \
  -w /opt/tensorflow/tensorflow/examples/tutorials/mnist \
  nvcr.io/nvidia/tensorflow:18.03 \
  python mnist_with_summaries.py
```

--------------------------------

### CSP Configuration - Setting Default Values in Python

Source: https://context7.com/nvidia/ngc-examples/llms.txt

This Python code snippet shows how default configuration values for cloud providers are set within the `*_funcs.py` files. It provides an example for AWS, demonstrating the definition of default values for SSH key names, regions, user, image name, instance type, and a list of instance type choices.

```python
# aws_funcs.py configuration example
default_key_name = "my-aws-security-key"
default_region = "us-west-2"
default_user = "ubuntu"
default_image_name = "NVIDIA Volta Deep Learning AMI*"
default_instance_type = "p3.2xlarge"
default_choices = ['p3.2xlarge', 'p3.8xlarge', 'p3.16xlarge']
```

--------------------------------

### AWS Bash Scripts for Managing Last Created Instance

Source: https://context7.com/nvidia/ngc-examples/llms.txt

This script manages previously created AWS instances using their saved instance ID. It supports operations such as starting, stopping, terminating, and SSH reconnecting to the instance. It can also display the SSH command for manual connections.
```bash
# SSH into last created instance (starts if stopped)
./aws_last.sh

# Check instance state
./aws_last.sh state

# Stop the instance
./aws_last.sh stop

# Terminate the instance
./aws_last.sh terminate

# Show SSH command for manual connection
./aws_last.sh showssh
# Output: ssh -i ~/.ssh/my-key.pem ubuntu@ec2-35-164-247-54.us-west-2.compute.amazonaws.com
```

--------------------------------

### Automate Cross-Platform VM Lifecycle

Source: https://context7.com/nvidia/ngc-examples/llms.txt

A shell script demonstrating how to iterate through multiple cloud providers to create, test, and terminate virtual machine instances using the NCSP CLI tool.

```bash
#!/bin/bash
set -e

for CSP in $(./ncsp csps); do
    echo "Testing on $CSP..."
    ./ncsp $CSP createVM
    ./ncsp $CSP ssh "nvidia-smi && uname -a"
    ./ncsp $CSP deleteVM
done

for CSP in "aws" "gcp" "ali"; do
    ./ncsp $CSP validCSP && {
        ./ncsp $CSP createVM
        ./ncsp $CSP ssh "run_my_benchmark.sh"
        ./ncsp $CSP deleteVM
    }
done
```

--------------------------------

### Create VM on Google Cloud using ncsp

Source: https://github.com/nvidia/ngc-examples/blob/master/ncsp/README.md

This command creates a virtual machine instance on Google Cloud Platform (GCP) using the ncsp tool. It shows how to specify accelerator counts for GPU instances.

```bash
ncsp gcp --accelerator_count=8 createVM
```

--------------------------------

### NCSP CLI: Create Virtual Machine on Cloud Providers

Source: https://context7.com/nvidia/ngc-examples/llms.txt

Creates a new virtual machine on AWS, Google Cloud Platform, or Alibaba Cloud using the NCSP CLI. It handles security group creation, image lookup, and waits for the VM to be fully operational and SSH-accessible. Supports specifying instance types and accelerator configurations.
```bash
# Create a VM on AWS with default settings
ncsp aws createVM

# Create a VM on Google Cloud Platform
ncsp gcp createVM

# Create a VM on Alibaba Cloud
ncsp ali createVM

# Create an 8-GPU instance on AWS
ncsp aws --instance_type p3.16xlarge createVM

# Create a VM on GCP with GPU accelerators
ncsp gcp --accelerator_type nvidia-tesla-p100 --accelerator_count 2 createVM

# Enable tracing to see commands being executed
ncsp aws --trace 1 createVM
```

--------------------------------

### POST /ncsp/{provider}/createVM

Source: https://context7.com/nvidia/ngc-examples/llms.txt

Creates a new virtual machine instance on the specified cloud provider with optional custom configurations for instance types and accelerators.

```APIDOC
## POST /ncsp/{provider}/createVM

### Description
Creates a new VM on the specified cloud service provider. Handles security group creation, image lookup, and waits until the VM is fully running.

### Method
POST

### Endpoint
/ncsp/{provider}/createVM

### Parameters
#### Path Parameters
- **provider** (string) - Required - The cloud provider (aws, gcp, or ali)

#### Query Parameters
- **instance_type** (string) - Optional - The hardware instance type (e.g., p3.16xlarge)
- **accelerator_type** (string) - Optional - GPU accelerator model (e.g., nvidia-tesla-p100)
- **accelerator_count** (integer) - Optional - Number of GPUs to attach

### Request Example
`ncsp aws --instance_type p3.16xlarge createVM`

### Response
#### Success Response (200)
- **status** (string) - VM creation status and instance ID

#### Response Example
{
  "status": "success",
  "instance_id": "i-0e18d6bdac9e994e1"
}
```

--------------------------------

### Manage VM on Google Cloud using ncsp

Source: https://github.com/nvidia/ngc-examples/blob/master/ncsp/README.md

Demonstrates how to execute commands on a remote GCP VM via SSH using ncsp, such as checking uptime, and how to delete the VM.
```bash
ncsp gcp ssh uptime
```

```bash
ncsp gcp deleteVM
```

--------------------------------

### Execute Scripted VM Test on AWS, GCP, and Alibaba

Source: https://github.com/nvidia/ngc-examples/blob/master/ncsp/README.md

This script iterates through a list of CSPs (AWS, GCP, Alibaba) and executes a predefined test script ('mytest') on each. It ensures that the script exits on the first error.

```bash
set -e                          # have bash exit script on first non-zero return
for CSP in "aws" "gcp" "ali" ; do
    mytest $CSP                 # call script with each CSP name
done
```

--------------------------------

### Display ncsp Help Information

Source: https://github.com/nvidia/ngc-examples/blob/master/ncsp/README.md

This command displays help information for ncsp, including CSP-specific command-line options. It's useful for understanding available parameters.

```bash
ncsp --help
```

--------------------------------

### Configure Cloud Service Provider Settings

Source: https://context7.com/nvidia/ngc-examples/llms.txt

Configuration variables for GCP and Alibaba Cloud environments. These settings define the default keys, regions, instance types, and machine images required for VM deployment.

```python
default_key_name = "my-gcp-key"
default_region = "us-west1-b"
default_user = "myusername"
default_project = "my-gcp-project"
default_service_account = "12345-compute@developer.gserviceaccount.com"
default_image_name = "nvidia-gpu-cloud-image"
default_instance_type = "n1-standard-8"
default_accelerator_type = "nvidia-tesla-p100"
```

```python
default_key_name = "my-alibaba-key"
default_region = "us-west-1"
default_user = "root"
default_image_name = "NVIDIA GPU Cloud Virtual Machine Image 18.03.0"
default_instance_type = "ecs.gn5-c4g1.xlarge"
```

--------------------------------

### Automate VM Lifecycle with Bash

Source: https://github.com/nvidia/ngc-examples/blob/master/ncsp/README.md

A Bash script that automates the creation, repeated starting/stopping, and deletion of a VM.
It uses 'set -e' for automatic error handling based on command exit codes.

```bash
#!/bin/bash
set -e

CSP="ali"

./ncsp $CSP validCSP
./ncsp $CSP createVM

CNT=0
while [ $CNT -lt 10 ]; do
    echo "Loop $CNT ------------------"
    ./ncsp $CSP ssh uptime
    ./ncsp $CSP stopVM
    echo ""
    sleep 10
    ./ncsp $CSP startVM
    echo ""
    sleep 10
    CNT=$((CNT+1))
done

./ncsp $CSP deleteVM
exit 0
```

--------------------------------

### Create VM on AWS using ncsp

Source: https://github.com/nvidia/ngc-examples/blob/master/ncsp/README.md

This command creates a virtual machine instance on Amazon Web Services (AWS) using the ncsp tool. It allows specifying instance types for different needs.

```bash
ncsp aws createVM
```

```bash
ncsp aws --instance_type p3.16xlarge createVM
```

--------------------------------

### NCSP CLI: Query Running Instances and Regions

Source: https://context7.com/nvidia/ngc-examples/llms.txt

Displays information about currently running VM instances for a specific cloud provider or across all configured providers. Also provides functionality to list available regions for a given provider.

```bash
# Show running instances on AWS
ncsp aws running

# Show running instances on GCP
ncsp gcp running

# Show running instances across all providers
ncsp ALL running

# Show available regions for a provider
ncsp aws regions
ncsp gcp regions
```

--------------------------------

### Configure AWS Script Parameters

Source: https://github.com/nvidia/ngc-examples/blob/master/aws/bash/README.md

Demonstrates how to use configuration files to override default variables in the AWS management scripts. Users can copy template files like sample.cfg and modify them to suit their environment.
```bash
cp sample.cfg work.cfg
vi work.cfg
aws_create_instance work.cfg
```

--------------------------------

### Manage VM on AWS using ncsp

Source: https://github.com/nvidia/ngc-examples/blob/master/ncsp/README.md

Demonstrates how to execute commands on a remote AWS VM via SSH using ncsp, such as checking uptime, and how to delete the VM.

```bash
ncsp aws ssh uptime
```

```bash
ncsp aws deleteVM
```

--------------------------------

### NCSP CLI Commands for VM Status and Information

Source: https://context7.com/nvidia/ngc-examples/llms.txt

This section details the NCSP CLI commands used to retrieve status and diagnostic information about the current VM instance and its configuration. It covers commands for checking VM status, showing detailed information, printing IP addresses, displaying arguments, cleaning cached files, validating CSP names, and listing supported cloud providers.

```bash
# Show current VM status
ncsp aws status

# Show detailed VM information
ncsp aws show
# Output: vm "pbradstr-Thu-2018Feb01-185738" i-0e18d6bdac9e994e1 ec2-35-164-247-54.us-west-2.compute.amazonaws.com nsg "pbradstrNSG" sg-0cca0173

# Print the VM's IP address
ncsp aws ip
# Output: ec2-35-164-247-54.us-west-2.compute.amazonaws.com

# Display persistent args file
ncsp aws args

# Clean cached files and restore defaults
ncsp aws clean

# Validate CSP name (returns 0 if valid)
ncsp aws validCSP

# List all supported cloud providers
ncsp csps
# Output:
# aws
# gcp
# ali
```

--------------------------------

### AWS Bash Scripts for Instance Creation

Source: https://context7.com/nvidia/ngc-examples/llms.txt

This Bash script automates the creation of AWS instances, including setting up security groups, EBS volumes, and EFS mounts. It is highly configurable through external configuration files, allowing users to specify various parameters like key paths, regions, image names, instance types, and storage options.
```bash
# Run with default settings
./aws_create_instance.sh

# Run with custom configuration file
./aws_create_instance.sh myconfig.cfg

# Sample configuration file (sample.cfg):
# KEY_PATH="~/.ssh"
# KEY_NAME="my-aws-key"
# KEY_FILE=$KEY_PATH/$KEY_NAME.pem
# REGION="us-west-2"
# IMAGE_NAME="NVIDIA Volta Deep Learning AMI*"
# LOGIN_NAME="ubuntu"
# SSH_SECURITY_GROUP_NAME="my-security-group"
# INSTANCE_TYPE=p3.2xlarge
# EBS_BOOT_VOLSIZE=64
# CREATE_PRIVATE_EBS_VOLUME="true"
# MOUNT_EFS_VOLUME="true"
# EFS_VOLUME_NAME_LIST=("My Data Volume, /efs/data")
```

--------------------------------

### NCSP CLI: SSH into Virtual Machine and Execute Commands

Source: https://context7.com/nvidia/ngc-examples/llms.txt

Establishes an SSH connection to a running VM instance or executes remote commands. The tool automatically uses stored IP address and SSH key configurations. Supports interactive sessions and running single or multiple commands on the remote VM.

```bash
# SSH into the AWS VM (interactive session)
ncsp aws ssh

# Run a command on the VM
ncsp aws ssh uptime

# Run multiple commands
ncsp aws ssh "nvidia-smi && df -h"

# SSH into GCP instance
ncsp gcp ssh

# SSH into Alibaba Cloud instance
ncsp ali ssh hostname
```

--------------------------------

### Manage NCSP Command Tracing

Source: https://github.com/nvidia/ngc-examples/blob/master/ncsp/README.md

Demonstrates how to enable and disable command tracing for debugging purposes. Tracing remains persistent until explicitly disabled or the VM is deleted.

```bash
ncsp aws --trace 1 createVM     # enable tracing; the setting persists for later commands
ncsp aws deleteVM               # deleting the VM clears the persistent trace setting
ncsp aws createVM               # this VM is created without tracing
```

--------------------------------

### Execute Scripted VM Test on All Known CSPs

Source: https://github.com/nvidia/ngc-examples/blob/master/ncsp/README.md

This script dynamically determines all known CSPs using './ncsp csps' and then executes a predefined test script ('mytest') on each. It ensures the script exits on the first error.
```bash
set -e                          # have bash exit script on first non-zero return
for CSP in $(./ncsp csps); do
    mytest $CSP                 # call script with CSP name
done
```

--------------------------------

### Configure CSP Default Variables

Source: https://github.com/nvidia/ngc-examples/blob/master/ncsp/README.md

Defines the required configuration variables for CSP modules, including authentication keys, regions, and instance specifications. These settings should be updated at the top of the configuration file.

```text
default_key_name = "my-security-key-name"
default_region = "my-region-name"
default_user = "my-user-name"
default_image_name = "Generic starter AMI*"
default_instance_type = "type1.small"
default_choices = ['type1.small', 'type1.med', 'type1.large']
```

--------------------------------

### Scripted VM Test Across Multiple CSPs

Source: https://github.com/nvidia/ngc-examples/blob/master/ncsp/README.md

The `mytest` script invoked by the CSP loops. For the CSP name passed as its first argument, it creates a VM, runs a test on it over SSH, and deletes the VM, repeating 1000 times. On the first failure it opens an SSH session for inspection and exits non-zero so that the calling loop stops.

```bash
#!/bin/bash
# mytest script -- create/delete and run a test on a VM 1000 times
#
# 'set -e' is not used here: it would exit before the error handling below runs.
CSP="$1"                        # CSP name passed in by the calling loop

for i in $(seq 1 1000); do
    if ! { ncsp "$CSP" createVM && ncsp "$CSP" ssh mytest && ncsp "$CSP" deleteVM; }; then
        ncsp "$CSP" ssh         # poke around if an error leaves the VM up; fails if the VM died
        exit 1                  # non-zero exit stops the outer loop
    fi
done
exit 0                          # test ran successfully
```

--------------------------------

### NVIDIA Docker - Run PyTorch Container

Source: https://context7.com/nvidia/ngc-examples/llms.txt

This script facilitates running GPU-accelerated PyTorch training using NVIDIA Docker with the official NGC PyTorch container image. It supports running MNIST training with a default tag or a specified container version. The equivalent manual `nvidia-docker run` command is also included.
```bash
# Run MNIST training with default tag
./mnist_pytorch.sh

# Run with specific container version
./mnist_pytorch.sh 18.03

# Manual nvidia-docker command
nvidia-docker run --rm \
  -w /opt/pytorch/examples/mnist \
  nvcr.io/nvidia/pytorch:18.03 \
  python main.py
```

--------------------------------

### Trace ncsp Commands and Responses on AWS

Source: https://github.com/nvidia/ngc-examples/blob/master/ncsp/README.md

This command enables tracing of ncsp operations on AWS. The '--trace 1' option shows the commands being sent to the CSP, useful for debugging and understanding the underlying API calls.

```bash
./ncsp aws --trace 1 createVM
```

--------------------------------

### DELETE /ncsp/{provider}/deleteVM

Source: https://context7.com/nvidia/ngc-examples/llms.txt

Terminates and removes the VM instance along with associated resources.

```APIDOC
## DELETE /ncsp/{provider}/deleteVM

### Description
Terminates the VM, clears persistent configuration data, and removes the IP from known hosts.

### Method
DELETE

### Endpoint
/ncsp/{provider}/deleteVM

### Parameters
#### Path Parameters
- **provider** (string) - Required - The cloud provider (aws, gcp, or ali)

### Request Example
`ncsp aws deleteVM`

### Response
#### Success Response (200)
- **status** (string) - Confirmation of VM termination
```

--------------------------------

### Manage AWS Instance Lifecycle

Source: https://github.com/nvidia/ngc-examples/blob/master/aws/bash/README.md

The aws_last.sh script allows users to check the state, stop, terminate, or connect to the most recently created AWS instance. It relies on a local state file generated during the instance creation process.
```bash
aws_last [options]
# Options:
#   state:     shows current state
#   stop:      stops instance if running
#   terminate: terminates instance if running
#   showssh:   displays ssh command needed
#   -h, help:  displays this help text
```

--------------------------------

### AWS Terraform Configuration for DL Instances

Source: https://context7.com/nvidia/ngc-examples/llms.txt

Terraform configuration for deploying NVIDIA Volta Deep Learning AMI instances on AWS. This configuration defines variables for region, instance type, key name, security group, and EBS volume size. It includes instructions for initialization, planning, and applying the configuration, with a note on how to modify it for spot instances.

```hcl
# variables.tf - Define variables
variable "region" {
  default = "us-west-2"
}

variable "instance-type" {
  default = "p3.2xlarge"
}

variable "key-name" {
  description = "Name of SSH key pair"
}

variable "security-group" {
  description = "Security group name"
}

variable "name" {
  default = "NGC-DL-Instance"
}

variable "ebs-size" {
  default = 50
}

# Deploy instance
# terraform init
# terraform plan
# terraform apply

# For spot instances, modify instance.tf:
# resource "aws_spot_instance_request" "dl_instance" {
#   spot_price           = "1.50"
#   wait_for_fulfillment = true
#   ami                  = "${data.aws_ami.nv_volta_dl_ami.id}"
#   instance_type        = "${var.instance-type}"
#   ...
# }
```

--------------------------------

### NCSP CLI: Network Security Group Management

Source: https://context7.com/nvidia/ngc-examples/llms.txt

Manages network security groups (NSGs) to control inbound and outbound traffic for VM instances. This includes creating, displaying, and deleting NSGs. Note that GCP handles network security differently and may not directly support NSG operations in the same manner.
```bash
# Create a network security group
ncsp aws createNSG

# Show all security groups
ncsp aws showNSGs

# Delete a network security group
ncsp aws deleteNSG

# Note: GCP does not use NSGs in the same way
ncsp gcp showNSGs
```

--------------------------------

### NCSP CLI: Delete Virtual Machine

Source: https://context7.com/nvidia/ngc-examples/llms.txt

Terminates and removes a VM instance along with its associated resources from the specified cloud provider. This action clears persistent configuration data and removes the VM's IP address from SSH known hosts.

```bash
# Delete an AWS VM
ncsp aws deleteVM

# Delete a GCP VM
ncsp gcp deleteVM

# Delete an Alibaba Cloud VM
ncsp ali deleteVM
```
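The scripted tests in this document rely on `set -e`, so a failing step can end a script before `deleteVM` runs and leave a VM up. The following is a minimal sketch of one way to always attempt the delete, not part of the ncsp tool itself. The function name `with_vm` and the variable `NCSP_BIN` are hypothetical names introduced for this example; only the `createVM`, `ssh`, and `deleteVM` subcommands come from the sections above.

```shell
#!/bin/bash
# Hypothetical helper: run one remote command on a fresh VM and always
# attempt to delete the VM afterwards, even if the remote command fails.
# NCSP_BIN is an assumed override so the ncsp path can be changed.
NCSP_BIN="${NCSP_BIN:-ncsp}"

with_vm() {
    local csp="$1"; shift
    local rc=0

    # If creation fails there is nothing to test, so report failure.
    "$NCSP_BIN" "$csp" createVM || return 1

    # Remember the exit code of the remote command instead of exiting.
    "$NCSP_BIN" "$csp" ssh "$@" || rc=$?

    # Cleanup runs regardless of the remote command's result.
    "$NCSP_BIN" "$csp" deleteVM || rc=$?

    return "$rc"
}
```

A call such as `with_vm aws "nvidia-smi && df -h"` would then return non-zero if the remote command or the delete failed, which lets a calling loop under `set -e` stop without leaving the instance behind. If `createVM` itself fails partway, a manual `deleteVM` may still be needed.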