### Install Docker and NVIDIA Container Toolkit on Ubuntu

Source: https://github.com/devforth/gpt-j-6b-gpu-docker/blob/main/README.md

Installs Docker and the NVIDIA Container Toolkit on Ubuntu. The commands add the NVIDIA Docker GPG key and apt repository, update the package lists, install Docker, and then install nvidia-docker2. A reboot is required after installation.

```bash
# Add the NVIDIA Docker GPG key and apt repository for this distribution
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list \
      | tee /etc/apt/sources.list.d/nvidia-docker.list

apt update && apt -y upgrade

# Install Docker and restart the daemon
curl https://get.docker.com | sh && systemctl --now restart docker

apt install -y nvidia-docker2
```

--------------------------------

### Install NVIDIA Drivers and Docker Toolkit on Ubuntu

Source: https://context7.com/devforth/gpt-j-6b-gpu-docker/llms.txt

A bash script that purges existing drivers, installs the recommended NVIDIA drivers, configures the NVIDIA Container Toolkit, and verifies GPU access from inside a Docker container. This setup is required to enable hardware acceleration for the GPT-J-6B model.

```bash
# Check if NVIDIA drivers are installed
nvidia-smi

# Install NVIDIA drivers on Ubuntu 20.04
# (the pattern is quoted so the shell does not expand it against local files)
apt purge '*nvidia*'
apt autoremove
add-apt-repository ppa:graphics-drivers/ppa
apt update
apt install -y ubuntu-drivers-common
ubuntu-drivers autoinstall

# Install Docker with NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list \
      | tee /etc/apt/sources.list.d/nvidia-docker.list
apt update && apt -y upgrade
curl https://get.docker.com | sh && systemctl --now restart docker
apt install -y nvidia-docker2

# Verify CUDA works in Docker
docker run --rm --gpus all nvidia/cuda:11.1-base nvidia-smi
```

--------------------------------

### Interact with Text Generation API

Source: https://context7.com/devforth/gpt-j-6b-gpu-docker/llms.txt

An example cURL POST request to the `/generate/` endpoint. It shows a basic text completion with the sampling parameters set explicitly; the same request shape works for chatbot-style prompts.

```bash
curl -X POST http://localhost:8080/generate/ \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello my name is Paul and",
    "generate_tokens_limit": 40,
    "top_p": 0.7,
    "top_k": 0,
    "temperature": 1.0
  }'
```

--------------------------------

### Install NVIDIA Drivers on Ubuntu

Source: https://github.com/devforth/gpt-j-6b-gpu-docker/blob/main/README.md

Commands to install NVIDIA drivers on Ubuntu: purge existing drivers, add the graphics-drivers PPA, update the package lists, and automatically install the recommended drivers. This is a prerequisite for using NVIDIA GPUs with Docker.

```bash
# The pattern is quoted so the shell does not expand it against local files
apt purge '*nvidia*'
apt autoremove
add-apt-repository ppa:graphics-drivers/ppa
apt update
apt install -y ubuntu-drivers-common
ubuntu-drivers autoinstall
```

--------------------------------

### Generate Text with GPT-J-6B REST API

Source: https://github.com/devforth/gpt-j-6b-gpu-docker/blob/main/README.md

An example request body for a POST to the GPT-J-6B model's REST API endpoint for text generation. It includes the prompt text, the desired token limit, and the sampling parameters top_p, top_k, and temperature.
```json
{
  "text": "Client: Hi, who are you?\nAI: I am Vincent and I am barista!\nClient: What do you do every day?\nAI:",
  "generate_tokens_limit": 40,
  "top_p": 0.7,
  "top_k": 0,
  "temperature": 1.0
}
```

--------------------------------

### Test CUDA in Docker

Source: https://github.com/devforth/gpt-j-6b-gpu-docker/blob/main/README.md

This command verifies that the NVIDIA Container Toolkit is correctly installed and configured, allowing Docker containers to access the host's GPUs. It runs a simple `nvidia-smi` command within a CUDA-enabled Docker container.

```bash
docker run --rm --gpus all nvidia/cuda:11.1-base nvidia-smi
```

--------------------------------

### POST /generate/

Source: https://github.com/devforth/gpt-j-6b-gpu-docker/blob/main/README.md

Generates text completions using the GPT-J-6B model based on the provided input parameters.

```APIDOC
## POST /generate/

### Description
Sends a text prompt to the GPT-J-6B model and returns the generated completion. Note that while the web server is asynchronous, model inference is a blocking operation.

### Method
POST

### Endpoint
/generate/

### Parameters
#### Request Body
- **text** (string) - Required - The input prompt for text generation.
- **generate_tokens_limit** (integer) - Optional - The maximum number of tokens to generate.
- **top_p** (float) - Optional - Nucleus sampling probability threshold.
- **top_k** (integer) - Optional - The number of highest probability vocabulary tokens to keep for top-k-filtering.
- **temperature** (float) - Optional - Controls the randomness of the output.

### Request Example
{
  "text": "Client: Hi, who are you?\nAI: I am Vincent and I am barista!\nClient: What do you do every day?\nAI:",
  "generate_tokens_limit": 40,
  "top_p": 0.7,
  "top_k": 0,
  "temperature": 1.0
}

### Response
#### Success Response (200)
- **generated_text** (string) - The text generated by the model.

#### Response Example
{
  "generated_text": "I make coffee for people!"
}
```

--------------------------------

### Build and Run Docker Image for Development

Source: https://github.com/devforth/gpt-j-6b-gpu-docker/blob/main/README.md

This command builds the Docker image from the current directory's Dockerfile and then runs it. This is useful for development purposes, allowing you to test changes locally before pushing them. It maps port 8080 and enables GPU access.

```bash
docker run -p8080:8080 --gpus all --rm -it $(docker build -q .)
```

--------------------------------

### Run GPT-J-6B Docker Image

Source: https://github.com/devforth/gpt-j-6b-gpu-docker/blob/main/README.md

This command launches the GPT-J-6B Docker container, exposing the internal HTTP server on port 8080 to the host machine. The `--gpus all` flag ensures that the container has access to the host's GPUs for model inference.

```bash
docker run -p8080:8080 --gpus all --rm -it devforth/gpt-j-6b-gpu
```

--------------------------------

### Run and Manage GPT-J-6B Docker Container

Source: https://context7.com/devforth/gpt-j-6b-gpu-docker/llms.txt

Commands to pull, build, and execute the GPT-J-6B container on GPU-enabled systems. It also includes a command to verify the NVIDIA Docker runtime environment.

```bash
# Pull and run the pre-built image
docker run -p8080:8080 --gpus all --rm -it devforth/gpt-j-6b-gpu

# Build and run from source
docker run -p8080:8080 --gpus all --rm -it $(docker build -q .)

# Verify NVIDIA Docker setup first
docker run --rm --gpus all nvidia/cuda:11.1-base nvidia-smi
```

--------------------------------

### Define Request Schema and Python Client Integration

Source: https://context7.com/devforth/gpt-j-6b-gpu-docker/llms.txt

Defines the Pydantic model for input validation and demonstrates how to consume the REST API using the Python requests library.
```python
from pydantic import BaseModel
import requests


# Request schema used for input validation
class Input(BaseModel):
    text: str
    generate_tokens_limit: int = 100
    top_p: float = 0.7
    top_k: float = 0
    temperature: float = 1.0


# Client side: POST a prompt to the running container
response = requests.post(
    "http://localhost:8080/generate/",
    json={
        "text": "Once upon a time",
        "generate_tokens_limit": 150,
        "top_p": 0.7,
        "top_k": 0,
        "temperature": 0.9
    }
)
print(response.json()["completion"])
```

--------------------------------

### POST /generate/ - Text Generation Endpoint

Source: https://context7.com/devforth/gpt-j-6b-gpu-docker/llms.txt

This endpoint accepts a JSON payload with input text and generation parameters and returns a text completion from the GPT-J-6B model. It supports configurable sampling parameters and has a 2048-token context limit.

````APIDOC
## POST /generate/

### Description
Generates text completions based on the provided input text and sampling parameters.

### Method
POST

### Endpoint
/generate/

### Parameters
#### Request Body
- **text** (string) - Required - The input prompt text for generation.
- **generate_tokens_limit** (integer) - Optional - The maximum number of tokens to generate (default: 100).
- **top_p** (float) - Optional - Nucleus sampling threshold (default: 0.7).
- **top_k** (float) - Optional - Top-k sampling threshold (0 = disabled, default: 0).
- **temperature** (float) - Optional - Controls randomness of output (default: 1.0).

### Request Example
```json
{
  "text": "Hello my name is Paul and",
  "generate_tokens_limit": 40,
  "top_p": 0.7,
  "top_k": 0,
  "temperature": 1.0
}
```

### Response
#### Success Response (200)
- **completion** (string) - The generated text completion, including the original prompt.

#### Error Response
- **error** (string) - Description of the error, e.g., token limit exceeded.

#### Response Example
```json
{
  "completion": "Hello my name is Paul and I am a software developer..."
}
```

#### Error Response Example
```json
{
  "error": "This model can't generate more then 2048 tokens, you passed 2000 input tokens and requested to generate 100 tokens"
}
```
````

--------------------------------

### Internal Model Evaluation Execution

Source: https://context7.com/devforth/gpt-j-6b-gpu-docker/llms.txt

Demonstrates the internal usage of the model.eval() function within the container environment to process generation requests.

```python
# `Input` is the request schema shown earlier; `model` is provided by the
# container's server code and is not importable from a client machine.
input_data = Input(
    text="Explain quantum computing in simple terms:",
    generate_tokens_limit=200,
    top_p=0.7,
    top_k=0,
    temperature=0.8
)
result = model.eval(input_data)
```
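
--------------------------------

### Client-Side Request and Response Handling (Sketch)

The `/generate/` documentation describes two response shapes, a `completion` field on success and an `error` field when the 2048-token limit is exceeded. The following is a minimal sketch of a client that handles both, not code from the project. The helper names (`fits_context`, `build_payload`, `extract_completion`, `generate`) are hypothetical, and two things are assumed: that the limit is inclusive (prompt tokens plus requested tokens may equal 2048), and that the error arrives as a JSON body with a 2xx status. The sketch uses only the standard library, so it does not depend on `requests`.

```python
import json
import urllib.request

# Total token budget quoted in the documented error message (assumed inclusive).
CONTEXT_LIMIT = 2048


def fits_context(input_tokens: int, generate_tokens_limit: int,
                 limit: int = CONTEXT_LIMIT) -> bool:
    """Return True when prompt tokens plus requested tokens stay within the limit."""
    return input_tokens + generate_tokens_limit <= limit


def build_payload(text: str, generate_tokens_limit: int = 40, top_p: float = 0.7,
                  top_k: int = 0, temperature: float = 1.0) -> dict:
    """Assemble the request body using the field names from the request examples."""
    return {
        "text": text,
        "generate_tokens_limit": generate_tokens_limit,
        "top_p": top_p,
        "top_k": top_k,
        "temperature": temperature,
    }


def extract_completion(body: dict) -> str:
    """Return the generated text, or raise if the body carries an `error` field."""
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["completion"]


def generate(text: str, url: str = "http://localhost:8080/generate/", **params) -> str:
    """POST a prompt and return the completion (requires a running container)."""
    data = json.dumps(build_payload(text, **params)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError if the server answers with a 4xx/5xx status.
    with urllib.request.urlopen(request) as response:
        return extract_completion(json.load(response))
```

With the container running, `generate("Hello my name is Paul and", generate_tokens_limit=40)` would send the same body as the cURL example. `fits_context` only helps if the caller already knows the prompt's token count; the server's own tokenizer remains authoritative.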