### Install Docker and NVIDIA Container Toolkit on Ubuntu

Source: https://github.com/devforth/gpt-j-6b-gpu-docker/blob/main/README.md

This section covers installing Docker and the NVIDIA Container Toolkit on Ubuntu. The commands add the NVIDIA Docker GPG key and repository, update the package lists, install Docker with the convenience script, and then install `nvidia-docker2`. They are written without `sudo`, so run them as root. A reboot is required after installation.

```bash
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | tee /etc/apt/sources.list.d/nvidia-docker.list

apt update && apt -y upgrade
curl https://get.docker.com | sh && systemctl --now restart docker 
apt install -y nvidia-docker2
```

--------------------------------

### Install NVIDIA Drivers and Docker Toolkit on Ubuntu

Source: https://context7.com/devforth/gpt-j-6b-gpu-docker/llms.txt

A bash script that purges existing drivers, installs the recommended NVIDIA drivers, configures the NVIDIA Container Toolkit, and verifies GPU access from inside a Docker container. This setup is required to enable hardware acceleration for the GPT-J-6B model.

```bash
# Check if NVIDIA drivers are installed
nvidia-smi

# Install NVIDIA drivers on Ubuntu 20.04
apt purge '*nvidia*'
apt autoremove
add-apt-repository ppa:graphics-drivers/ppa
apt update
apt install -y ubuntu-drivers-common
ubuntu-drivers autoinstall

# Install Docker with NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
  && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add - \
  && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list \
     | tee /etc/apt/sources.list.d/nvidia-docker.list

apt update && apt -y upgrade
curl https://get.docker.com | sh && systemctl --now restart docker
apt install -y nvidia-docker2

# Verify CUDA works in Docker
docker run --rm --gpus all nvidia/cuda:11.1-base nvidia-smi
```

--------------------------------

### Interact with Text Generation API

Source: https://context7.com/devforth/gpt-j-6b-gpu-docker/llms.txt

Example of using cURL to send a POST request to the `/generate/` endpoint. The request below performs a basic text completion with explicit sampling parameters; the same shape works for chatbot-style prompts and for tuning `top_p`, `top_k`, and `temperature`.

```bash
curl -X POST http://localhost:8080/generate/ \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello my name is Paul and",
    "generate_tokens_limit": 40,
    "top_p": 0.7,
    "top_k": 0,
    "temperature": 1.0
  }'
```
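The same request can be prepared from Python with only the standard library. This is a minimal sketch: `build_generate_request` is a hypothetical helper, not part of the project, and the base URL assumes the container is published on `localhost:8080` as in the run commands.

```python
import json
import urllib.request


def build_generate_request(text, base_url="http://localhost:8080", **params):
    """Prepare (but do not send) a POST request for the /generate/ endpoint.

    Hypothetical helper for illustration; extra keyword arguments are
    passed through as JSON fields (e.g. top_p, top_k, temperature).
    """
    payload = {"text": text, **params}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request(
    "Hello my name is Paul and",
    generate_tokens_limit=40,
    top_p=0.7,
    top_k=0,
    temperature=1.0,
)

# To actually send it (requires the container to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it makes it easy to inspect the URL and body before the container is up.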

--------------------------------

### Install NVIDIA Drivers on Ubuntu

Source: https://github.com/devforth/gpt-j-6b-gpu-docker/blob/main/README.md

This snippet provides commands to install NVIDIA drivers on Ubuntu systems. It includes purging existing drivers, adding a PPA for graphics drivers, updating package lists, and automatically installing recommended drivers. This is a prerequisite for using NVIDIA GPUs with Docker.

```bash
apt purge '*nvidia*'
apt autoremove
add-apt-repository ppa:graphics-drivers/ppa
apt update
apt install -y ubuntu-drivers-common
ubuntu-drivers autoinstall
```

--------------------------------

### Generate Text with GPT-J-6B REST API

Source: https://github.com/devforth/gpt-j-6b-gpu-docker/blob/main/README.md

Example JSON body for a POST request to the GPT-J-6B REST API's text generation endpoint. It contains the prompt text (here a chatbot-style dialogue that ends with an open `AI:` turn), the token limit, and the sampling parameters `top_p`, `top_k`, and `temperature`.

```json
{
  "text": "Client: Hi, who are you?\nAI: I am Vincent and I am barista!\nClient: What do you do every day?\nAI:",
  "generate_tokens_limit": 40,
  "top_p": 0.7,
  "top_k": 0,
  "temperature": 1.0
}
```
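A chatbot-style prompt like the one above can be assembled from earlier dialogue turns. The following is a minimal sketch; `build_chat_prompt` and its parameters are hypothetical names chosen for illustration, and the `Client:` / `AI:` labels simply mirror the example.

```python
def build_chat_prompt(history, next_user_message, user="Client", bot="AI"):
    """Join past (user, bot) turns and a new user message into one prompt.

    The prompt ends with an open bot label so the model continues
    the conversation as the bot.
    """
    lines = []
    for user_msg, bot_msg in history:
        lines.append(f"{user}: {user_msg}")
        lines.append(f"{bot}: {bot_msg}")
    lines.append(f"{user}: {next_user_message}")
    lines.append(f"{bot}:")
    return "\n".join(lines)


prompt = build_chat_prompt(
    [("Hi, who are you?", "I am Vincent and I am barista!")],
    "What do you do every day?",
)
print(prompt)
```

The resulting string can be sent as the `text` field of the request body.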

--------------------------------

### Test CUDA in Docker

Source: https://github.com/devforth/gpt-j-6b-gpu-docker/blob/main/README.md

This command verifies that the NVIDIA Container Toolkit is correctly installed and configured, allowing Docker containers to access the host's GPUs. It runs a simple `nvidia-smi` command within a CUDA-enabled Docker container.

```bash
docker run --rm --gpus all nvidia/cuda:11.1-base nvidia-smi
```

--------------------------------

### POST /generate/

Source: https://github.com/devforth/gpt-j-6b-gpu-docker/blob/main/README.md

Generates text completions using the GPT-J-6B model based on the provided input parameters.

```APIDOC
## POST /generate/

### Description
Sends a text prompt to the GPT-J-6B model and returns the generated completion. Note that while the web server is asynchronous, model inference is a blocking operation.

### Method
POST

### Endpoint
/generate/

### Parameters
#### Request Body
- **text** (string) - Required - The input prompt for text generation.
- **generate_tokens_limit** (integer) - Optional - The maximum number of tokens to generate.
- **top_p** (float) - Optional - Nucleus sampling probability threshold.
- **top_k** (integer) - Optional - The number of highest probability vocabulary tokens to keep for top-k-filtering.
- **temperature** (float) - Optional - Controls the randomness of the output.

### Request Example
{
  "text": "Client: Hi, who are you?\nAI: I am Vincent and I am barista!\nClient: What do you do every day?\nAI:",
  "generate_tokens_limit": 40,
  "top_p": 0.7,
  "top_k": 0,
  "temperature": 1.0
}

### Response
#### Success Response (200)
- **completion** (string) - The text generated by the model.

#### Response Example
{
  "completion": "I make coffee for people!"
}
```
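The note about blocking inference matters for clients that send requests concurrently. The sketch below is an illustration only, not the server's code: `time.sleep` stands in for model inference, and `handle_request` is a hypothetical handler. Because the blocking call never yields to the event loop, the two "requests" run one after the other even though they are scheduled together.

```python
import asyncio
import time


async def handle_request(request_id, log):
    # time.sleep stands in for model inference: it blocks the event loop,
    # so no other coroutine can run until it returns.
    log.append(f"start {request_id}")
    time.sleep(0.05)
    log.append(f"end {request_id}")


async def main():
    log = []
    # Both handlers are scheduled concurrently, but neither ever awaits.
    await asyncio.gather(handle_request(1, log), handle_request(2, log))
    return log


log = asyncio.run(main())
print(log)  # ['start 1', 'end 1', 'start 2', 'end 2']
```

If the server behaves as described, a second request waits until the first generation finishes, so client timeouts should allow for queued requests.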

--------------------------------

### Build and Run Docker Image for Development

Source: https://github.com/devforth/gpt-j-6b-gpu-docker/blob/main/README.md

This command builds the Docker image from the current directory's Dockerfile and then runs it. This is useful for development purposes, allowing you to test changes locally before pushing them. It maps port 8080 and enables GPU access.

```bash
docker run -p8080:8080 --gpus all --rm -it $(docker build -q .)
```

--------------------------------

### Run GPT-J-6B Docker Image

Source: https://github.com/devforth/gpt-j-6b-gpu-docker/blob/main/README.md

This command launches the GPT-J-6B Docker container, exposing the internal HTTP server on port 8080 to the host machine. The `--gpus all` flag ensures that the container has access to the host's GPUs for model inference.

```bash
docker run -p8080:8080 --gpus all --rm -it devforth/gpt-j-6b-gpu
```

--------------------------------

### Run and Manage GPT-J-6B Docker Container

Source: https://context7.com/devforth/gpt-j-6b-gpu-docker/llms.txt

Commands to pull, build, and execute the GPT-J-6B container on GPU-enabled systems. It also includes a command to verify the NVIDIA Docker runtime environment.

```bash
# Pull and run the pre-built image
docker run -p8080:8080 --gpus all --rm -it devforth/gpt-j-6b-gpu

# Build and run from source
docker run -p8080:8080 --gpus all --rm -it $(docker build -q .)

# Verify the NVIDIA Docker setup (run this before the commands above)
docker run --rm --gpus all nvidia/cuda:11.1-base nvidia-smi
```

--------------------------------

### Define Request Schema and Python Client Integration

Source: https://context7.com/devforth/gpt-j-6b-gpu-docker/llms.txt

Defines the Pydantic model for input validation and demonstrates how to consume the REST API using the Python requests library.

```python
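from pydantic import BaseModel
import requests


# Request schema used for input validation; unset fields use these defaults.
class Input(BaseModel):
    text: str
    generate_tokens_limit: int = 100
    top_p: float = 0.7
    top_k: float = 0
    temperature: float = 1.0


# Client side: send a generation request to a locally running container.
response = requests.post(
    "http://localhost:8080/generate/",
    json={
        "text": "Once upon a time",
        "generate_tokens_limit": 150,
        "top_p": 0.7,
        "top_k": 0,
        "temperature": 0.9,
    },
)
response.raise_for_status()  # stop early on HTTP-level errors
print(response.json()["completion"])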
```

--------------------------------

### POST /generate/ - Text Generation Endpoint

Source: https://context7.com/devforth/gpt-j-6b-gpu-docker/llms.txt

This endpoint accepts a JSON payload with input text and generation parameters and returns a text completion from the GPT-J-6B model. The sampling parameters are configurable, and the prompt tokens plus the requested tokens must fit within the model's 2048-token limit.

```APIDOC
## POST /generate/

### Description
Generates text completions based on the provided input text and sampling parameters.

### Method
POST

### Endpoint
/generate/

### Parameters
#### Request Body
- **text** (string) - Required - The input prompt text for generation.
- **generate_tokens_limit** (integer) - Optional - The maximum number of tokens to generate (default: 100).
- **top_p** (float) - Optional - Nucleus sampling threshold (default: 0.7).
- **top_k** (float) - Optional - Top-k sampling threshold (0 = disabled, default: 0).
- **temperature** (float) - Optional - Controls randomness of output (default: 1.0).

### Request Example
{
  "text": "Hello my name is Paul and",
  "generate_tokens_limit": 40,
  "top_p": 0.7,
  "top_k": 0,
  "temperature": 1.0
}

### Response
#### Success Response (200)
- **completion** (string) - The generated text completion, including the original prompt.

#### Error Response
- **error** (string) - Description of the error, e.g., token limit exceeded.

#### Response Example
{
  "completion": "Hello my name is Paul and I am a software developer..."
}

#### Error Response Example
{
  "error": "This model can't generate more then 2048 tokens, you passed 2000 input tokens and requested to generate 100 tokens"
}
```
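The token limit and the two response shapes can be handled on the client side. The following is a minimal sketch under stated assumptions: `fits_context` and `extract_new_text` are hypothetical helpers, the token counts must come from a tokenizer of your choice, and treating a total of exactly 2048 as allowed is an assumption rather than documented behavior.

```python
CONTEXT_LIMIT = 2048  # token limit stated in the endpoint description


def fits_context(input_tokens, generate_tokens_limit, limit=CONTEXT_LIMIT):
    """Return True if prompt tokens plus requested tokens fit in the limit.

    Assumption: a total equal to the limit is accepted.
    """
    return input_tokens + generate_tokens_limit <= limit


def extract_new_text(prompt, body):
    """Return only the newly generated text from a parsed JSON response."""
    if "error" in body:
        raise RuntimeError(body["error"])
    completion = body["completion"]
    # The completion is documented as including the original prompt.
    if completion.startswith(prompt):
        return completion[len(prompt):]
    return completion


print(fits_context(2000, 100))  # False: 2100 tokens exceed the limit
print(extract_new_text(
    "Hello my name is Paul and",
    {"completion": "Hello my name is Paul and I am a software developer..."},
))
```

Checking the budget before sending avoids a round trip that would only return the error response.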

--------------------------------

### Internal Model Evaluation Execution

Source: https://context7.com/devforth/gpt-j-6b-gpu-docker/llms.txt

Demonstrates the internal usage of the model.eval() function within the container environment to process generation requests.

```python
input_data = Input(
    text="Explain quantum computing in simple terms:",
    generate_tokens_limit=200,
    top_p=0.7,
    top_k=0,
    temperature=0.8
)

result = model.eval(input_data)
```
