### Install CodeGeeX from Source

Source: https://context7.com/thudm/codegeex/llms.txt

Clone the repository and install the package using pip. Alternatively, use the pre-built Docker image for a quick setup.

```bash
git clone git@github.com:THUDM/CodeGeeX.git
cd CodeGeeX
pip install -e .
```

```bash
# Or use the pre-built Docker image (requires nvidia-docker)
docker pull codegeex/codegeex:latest
docker run --gpus '"device=0,1"' -it --ipc=host --name=codegeex codegeex/codegeex
```

--------------------------------

### Install CodeGeeX from Source

Source: https://github.com/thudm/codegeex/blob/main/README.md

Clone the repository and install the package in editable mode. Requires Python 3.7+, CUDA 11+, PyTorch 1.10+, and DeepSpeed 0.6+.

```bash
git clone git@github.com:THUDM/CodeGeeX.git
cd CodeGeeX
pip install -e .
```

--------------------------------

### Go Bubble Sort Example (Stealth Mode)

Source: https://github.com/thudm/codegeex/blob/main/vscode-extension/README.md

Demonstrates code generation in Stealth mode. The generated code appears in gray and can be inserted by pressing Tab. Note that modifying code before generation finishes may cause bugs.

```go
package main

import "fmt"

func main() {
	// CodeGeeX will generate the following code
	// when you stop writing.
	// Press Tab to insert the generated code.

	// Example: Bubble sort implementation in Go
	arr := []int{64, 34, 25, 12, 22, 11, 90}
	fmt.Println("Unsorted array: ", arr)
	
	// Bubble sort logic
	n := len(arr)
	for i := 0; i < n-1; i++ {
		for j := 0; j < n-i-1; j++ {
			if arr[j] > arr[j+1] {
				arr[j], arr[j+1] = arr[j+1], arr[j]
			}
		}
	}

	fmt.Println("Sorted array: ", arr)
}

```

--------------------------------

### Launch 8-GPU Pre-training with DeepSpeed ZeRO-2

Source: https://context7.com/thudm/codegeex/llms.txt

Launches an 8-GPU pre-training job using DeepSpeed ZeRO-2. Consult `configs/codegeex_13b.sh` for the full list of arguments.

```bash
deepspeed --num_gpus 8 codegeex/megatron/tools/pretrain_codegeex.py \
    --num-layers 40 \
    --hidden-size 5120 \
    --num-attention-heads 40 \
    --seq-length 2048 \
    --max-position-embeddings 2048 \
    --micro-batch-size 4 \
    --global-batch-size 512 \
    --lr 1e-4 \
    --train-iters 500000 \
    --lr-decay-iters 480000 \
    --data-path /data/code_corpus_text_document \
    --vocab-file tokenizer/vocab.json \
    --merge-file tokenizer/merges.txt \
    --data-impl mmap \
    --split 949,50,1 \
    --distributed-backend nccl \
    --fp16 \
    --deepspeed \
    --deepspeed_config configs/ds_config.json \
    --tokenizer-type GPT2BPETokenizer \
    --save /checkpoints/codegeex_13b \
    --load /checkpoints/codegeex_13b
```
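The batch-size flags above determine how many micro-batches each GPU processes per optimizer step. The sketch below assumes Megatron's usual rule (global batch = micro-batch × data-parallel size × accumulation steps). It also assumes no tensor or pipeline model parallelism, so all 8 GPUs are data-parallel. `accumulation_steps` is a hypothetical helper, not part of CodeGeeX.

```python
def accumulation_steps(global_batch: int, micro_batch: int, num_gpus: int,
                       tensor_parallel: int = 1, pipeline_parallel: int = 1) -> int:
    """Hypothetical helper: micro-batches each data-parallel rank runs per optimizer step."""
    data_parallel = num_gpus // (tensor_parallel * pipeline_parallel)
    samples_per_micro_step = micro_batch * data_parallel
    if global_batch % samples_per_micro_step != 0:
        raise ValueError("global batch must be divisible by micro_batch * data_parallel")
    return global_batch // samples_per_micro_step


steps = accumulation_steps(global_batch=512, micro_batch=4, num_gpus=8)
tokens_per_iteration = 512 * 2048  # --global-batch-size x --seq-length
print(steps)                 # 16
print(tokens_per_iteration)  # 1048576
```

If you enable model parallelism, the data-parallel size shrinks and the number of accumulation steps grows to match.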

--------------------------------

### Run CodeGeeX with Docker

Source: https://github.com/thudm/codegeex/blob/main/README.md

Pull the latest CodeGeeX Docker image and run it. Use the --gpus flag to enable GPU support, specifying device IDs.

```bash
docker pull codegeex/codegeex:latest
# To enable GPU support, select device IDs with --gpus (here GPUs 0 and 1)
docker run --gpus '"device=0,1"' -it --ipc=host --name=codegeex codegeex/codegeex
```

--------------------------------

### Building CodeGeeX Docker Image

Source: https://github.com/thudm/codegeex/blob/main/codegeex/benchmark/README.md

Build the Docker image from the provided Dockerfile if you prefer to customize the environment.

```bash
cd codegeex/docker
docker build [OPTIONS] .
```

--------------------------------

### Generate Samples with LM Evaluation Harness

Source: https://context7.com/thudm/codegeex/llms.txt

Use this wrapper, which is compatible with the EleutherAI lm-evaluation-harness, to tokenize a context string, run generation, and decode the generated text. Megatron, the model, and the tokenizer must already be initialized.

```python
from codegeex.megatron.code_generation_utils import generate_samples_eval
from codegeex.megatron import get_args, get_tokenizer

# Assumes initialize_megatron() has already been called
args = get_args()
args.seq_length = 2048
args.temperature = 1.0
args.top_k = 0
args.top_p = 1.0
args.greedy = False
args.beam_search = False

tokenizer = get_tokenizer()
context = "# language: Python\ndef add(a, b):\n    \"\"\"Return the sum of a and b.\"\"\"\n"

# `model` is assumed to be an already-loaded CodeGeeXModel
generated_text = generate_samples_eval(
    model,
    context=context,
    max_gen_length=64,
    eos_token_id=tokenizer.eod,
)
print("Generated:", generated_text)
# Possible output (sampling at temperature 1.0 is not deterministic):
#     return a + b
```

--------------------------------

### Running CodeGeeX Docker Container

Source: https://github.com/thudm/codegeex/blob/main/codegeex/benchmark/README.md

Launch a container from the CodeGeeX Docker image, mounting local directories as needed.

```bash
docker run -it --gpus all --mount type=bind,source=<LOCAL PATH>,target=<PATH IN CONTAINER> [OPTIONS] <IMAGE NAME:TAG>
```

--------------------------------

### Run Inference with Quantization

Source: https://github.com/thudm/codegeex/blob/main/README.md

Run inference with quantization, which needs more than 15 GB of GPU memory. Pass the GPU ID and the path to the prompt file.

```bash
# With quantization (more than 15 GB of GPU memory)
bash ./scripts/test_inference_quantized.sh <GPU_ID> ./tests/test_prompt.txt
```

--------------------------------

### Download and Merge Model Weights

Source: https://context7.com/thudm/codegeex/llms.txt

Download the model checkpoint using aria2c and then merge and extract the files. The result is a directory containing model state files.

```bash
# Download all parts in parallel
aria2c -x 16 -s 16 -j 4 --continue=true -i urls.txt

# Merge and extract
cat codegeex_13b.tar.gz.* > codegeex_13b.tar.gz
tar xvf codegeex_13b.tar.gz
# Result: directory containing mp_rank_00_model_states.pt (and others for MP)
```

--------------------------------

### Python Code Explanation Template

Source: https://github.com/thudm/codegeex/blob/main/vscode-extension/README.md

Use this template in prompt mode to explain Python code line by line. The `<INPUT>` tag is where the selected code will be inserted.

```python
# language: Python

def sum_squares(lst):
    sum = 0
    for i in range(len(lst)):
        if i % 3 == 0:
            lst[i] = lst[i]**2
        elif i % 4 == 0:
            lst[i] = lst[i]**3
        sum += lst[i]
    return sum

<INPUT>

# Explain the code line by line
def sum_squares(lst):
    # initialize sum
    sum = 0
    # loop through the list
    for i in range(len(lst)):
        # if the index is a multiple of 3
        if i % 3 == 0:
            # square the entry
            lst[i] = lst[i]**2
        # if the index is a multiple of 4
        elif i % 4 == 0:
            # cube the entry
            lst[i] = lst[i]**3
        # add the entry to the sum
        sum += lst[i]
    # return the sum
    return sum

# Explain the code line by line
<INPUT:0,1>
```

--------------------------------

### Download Model Weights with aria2c

Source: https://github.com/thudm/codegeex/blob/main/README.md

Use aria2c to download model weights from a list of URLs provided in urls.txt. Ensure sufficient disk space for the ~26GB checkpoint.

```bash
aria2c -x 16 -s 16 -j 4 --continue=true -i urls.txt
```

--------------------------------

### Pulling CodeGeeX Docker Image

Source: https://github.com/thudm/codegeex/blob/main/codegeex/benchmark/README.md

Use this command to pull the pre-built Docker image containing the required environments for evaluation.

```bash
docker pull rishubi/codegeex:latest
```

--------------------------------

### Single-GPU Inference with CodeGeeX

Source: https://context7.com/thudm/codegeex/llms.txt

Run code generation from a prompt file on a single NVIDIA GPU. Quantized mode reduces memory requirements. Multi-GPU inference requires checkpoint conversion first.

```bash
# Write a prompt to a file (a quoted heredoc keeps the """ docstring quotes intact)
cat > tests/test_prompt.txt << 'EOF'
# language: Python
def bubble_sort(arr):
    """Sort a list using bubble sort algorithm."""
EOF

# Standard inference (>27 GB GPU RAM)
bash ./scripts/test_inference.sh 0 ./tests/test_prompt.txt

# Quantized inference (>15 GB GPU RAM)
bash ./scripts/test_inference_quantized.sh 0 ./tests/test_prompt.txt

# Multi-GPU inference (first convert checkpoint, then run)
bash ./scripts/convert_ckpt_parallel.sh /path/to/ckpt /path/to/mp_ckpt 2
bash ./scripts/test_inference_parallel.sh 2 ./tests/test_prompt.txt
```

--------------------------------

### Build Cross-lingual Translation Prompts

Source: https://context7.com/thudm/codegeex/llms.txt

Load source and target language HumanEval-X datasets to construct code translation prompts. The prompt format includes source language declaration, solution, and target language declaration.

```python
from codegeex.benchmark.utils import read_translation_dataset

dataset = read_translation_dataset(
    data_file_src="codegeex/benchmark/humaneval-x/python/data/humaneval_python.jsonl.gz",
    data_file_tgt="codegeex/benchmark/humaneval-x/cpp/data/humaneval_cpp.jsonl.gz",
    lang_src="python",
    lang_tgt="cpp",
    dataset_type="humaneval",
)

sample = list(dataset.values())[0]
print(sample["prompt"])
# code translation
# Python:
# def has_close_elements(numbers: List[float], threshold: float) -> bool:
#     for idx, elem in enumerate(numbers):
#         ...
# C++:
# bool has_close_elements(vector<float> numbers, float threshold) {

```
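The prompt layout described above can be approximated with plain string concatenation. This is a hypothetical sketch and does not reproduce the body of `read_translation_dataset`. It assumes that HumanEval-X records expose `prompt`, `canonical_solution` and `declaration` fields, and that language headers are written as `Python:` and `C++:`, mirroring the sample output.

```python
def build_translation_prompt(src_record: dict, tgt_record: dict,
                             src_name: str, tgt_name: str) -> str:
    """Hypothetical helper mirroring the 'code translation' prompt layout."""
    # Source side: full reference solution (signature + body)
    source_code = (src_record["prompt"] + src_record["canonical_solution"]).rstrip()
    # Target side: only the declaration, which the model must complete
    return (
        "code translation\n"
        f"{src_name}:\n"
        f"{source_code}\n"
        f"{tgt_name}:\n"
        f"{tgt_record['declaration']}"
    )


src = {"prompt": "def add(a, b):\n", "canonical_solution": "    return a + b\n"}
tgt = {"declaration": "int add(int a, int b) {\n"}
prompt = build_translation_prompt(src, tgt, "Python", "C++")
print(prompt)
```

The prompt ends on the open target declaration, so the model's continuation is the translated body.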

--------------------------------

### Run Inference on Single GPU

Source: https://github.com/thudm/codegeex/blob/main/README.md

Run inference on a single GPU with more than 27 GB of memory. Pass the GPU ID and the path to the prompt file.

```bash
# On a single GPU (more than 27 GB of GPU memory)
bash ./scripts/test_inference.sh <GPU_ID> ./tests/test_prompt.txt
```

--------------------------------

### Sliding Window for Pre-training Data

Source: https://context7.com/thudm/codegeex/llms.txt

Generates overlapping prompt-code token pairs from a long source file for pre-training. Each window fits within `seq_len` tokens and contains at least `minimum_code_len` code tokens. Requires the `sliding_window` utility from `codegeex.data.data_utils`.

```python
from codegeex.data.data_utils import sliding_window

# Token IDs are assumed to be pre-encoded int lists; these are synthetic stand-ins
prompt_tokens = [1, 2, 3, 4, 5]           # e.g., language tag tokens
code_tokens   = list(range(100, 600))      # 500 tokens of code

windows = list(sliding_window(
    prompt_tokens=prompt_tokens,
    code_tokens=code_tokens,
    seq_len=128,
    sliding_stride=64,
    minimum_code_len=8,
))

print(f"Total windows: {len(windows)}")
for i, (p, c) in enumerate(windows[:3]):
    print(f"Window {i}: prompt_len={len(p)}, code_len={len(c)}, total={len(p)+len(c)}")
```
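The windowing idea fits in a few lines. This is a hypothetical re-implementation for illustration only, not CodeGeeX's `sliding_window`. It makes the following assumptions:

* Every window repeats the full prompt.
* The remaining `seq_len - len(prompt_tokens)` slots are filled with code.
* The window advances by `sliding_stride` and stops once it reaches the end of the file.
* Windows shorter than `minimum_code_len` are dropped.

The real function's edge-case behavior may differ.

```python
from typing import Iterator, List, Tuple


def sliding_window_sketch(prompt_tokens: List[int], code_tokens: List[int],
                          seq_len: int, sliding_stride: int,
                          minimum_code_len: int = 1) -> Iterator[Tuple[List[int], List[int]]]:
    """Hypothetical sliding window over code tokens, each window prefixed by the prompt."""
    budget = seq_len - len(prompt_tokens)  # code tokens that fit next to the prompt
    if budget < minimum_code_len:
        return
    start = 0
    while start < len(code_tokens):
        chunk = code_tokens[start:start + budget]
        if len(chunk) >= minimum_code_len:
            yield prompt_tokens, chunk
        if start + budget >= len(code_tokens):
            break  # this window already reached the end of the file
        start += sliding_stride


windows = list(sliding_window_sketch([1, 2, 3, 4, 5], list(range(100, 600)),
                                     seq_len=128, sliding_stride=64, minimum_code_len=8))
print(len(windows))  # 7
```

With a stride smaller than the code budget, consecutive windows overlap. Here each window shares 59 code tokens with its neighbor, which gives the model context across window boundaries.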

--------------------------------

### Evaluating Generated Codes

Source: https://github.com/thudm/codegeex/blob/main/codegeex/benchmark/README.md

Run this script to evaluate generated samples on the HumanEval-X benchmark. The script executes model-generated code, so run it in an isolated environment such as the provided Docker image.

```bash
bash scripts/evaluate_humaneval_x.sh <RESULT_FILE> <LANG> <N_WORKERS>
```

--------------------------------

### Convert Checkpoint for Model Parallelism

Source: https://github.com/thudm/codegeex/blob/main/README.md

Partition a checkpoint into `MP_SIZE` shards for model parallelism. This is a prerequisite for multi-GPU inference when no single GPU has enough memory to hold the full model.

```bash
# On multiple GPUs (more than 6 GB of memory each); first convert the checkpoint into MP_SIZE partitions
bash ./scripts/convert_ckpt_parallel.sh <LOAD_CKPT_PATH> <SAVE_CKPT_PATH> <MP_SIZE>
```
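A rough estimate shows why partitioning helps. Assume about 13 billion parameters stored in FP16 (2 bytes each), split evenly across `MP_SIZE` shards. The weights alone then take about 26 GB, or about 6.5 GB per GPU at `MP_SIZE=4`. The estimate ignores activations, the KV cache and framework overhead, so treat it as a lower bound. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2, mp_size: int = 1) -> float:
    """Hypothetical helper: GB (10**9 bytes) of weights held by each model-parallel rank."""
    return num_params * bytes_per_param / mp_size / 1e9


for mp in (1, 2, 4):
    print(mp, weight_memory_gb(13e9, mp_size=mp))
# 1 26.0
# 2 13.0
# 4 6.5
```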

--------------------------------

### generate_samples_eval — LM Evaluation Harness Integration

Source: https://context7.com/thudm/codegeex/llms.txt

A convenience wrapper compatible with the EleutherAI lm-evaluation-harness. It tokenizes a context string, runs generation up to max_gen_length new tokens, and returns the decoded generated text.

See the [EleutherAI lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) for the harness this wrapper targets.

```python
from codegeex.megatron.code_generation_utils import generate_samples_eval
from codegeex.megatron import get_args, get_tokenizer

# Assumes initialize_megatron() has already been called
args = get_args()
args.seq_length = 2048
args.temperature = 1.0
args.top_k = 0
args.top_p = 1.0
args.greedy = False
args.beam_search = False

tokenizer = get_tokenizer()
context = "# language: Python\ndef add(a, b):\n    \"\"\"Return the sum of a and b.\"\"\"\n"

# `model` is assumed to be an already-loaded CodeGeeXModel
generated_text = generate_samples_eval(
    model,
    context=context,
    max_gen_length=64,
    eos_token_id=tokenizer.eod,
)
print("Generated:", generated_text)
# Possible output (sampling at temperature 1.0 is not deterministic):
#     return a + b
```

--------------------------------

### CodeGeeXModel PyTorch Class for Inference

Source: https://context7.com/thudm/codegeex/llms.txt

Instantiate and use the `CodeGeeXModel` for inference. This involves initializing Megatron-LM, loading a checkpoint, preparing tokenized input, and generating logits.

```python
import torch
from codegeex.megatron.model import CodeGeeXModel
from codegeex.megatron import initialize_megatron, get_tokenizer

# Initialize Megatron-LM runtime (sets up distributed, tokenizer, args)
initialize_megatron(args_defaults={"tokenizer_type": "GPT2BPETokenizer"})

# Instantiate model for inference (no distributed output splitting)
model = CodeGeeXModel(num_tokentypes=0, parallel_output=False)
model = model.half().cuda()

# Load checkpoint
state_dict = torch.load("mp_rank_00_model_states.pt", map_location="cpu")
if "module" in state_dict:
    state_dict = state_dict["module"]
model.load_state_dict(state_dict)
model.eval()

# Prepare a tokenized prompt
tokenizer = get_tokenizer()
prompt = "# language: Python\ndef fibonacci(n):\n    \"\"\"Return the nth Fibonacci number.\"\"\"\n"
tokens = tokenizer.tokenize(prompt)  # list of int token IDs

# Build a minimal batch tensor
input_ids = torch.tensor([tokens], dtype=torch.long).cuda()    # (1, seq_len)
position_ids = torch.arange(len(tokens)).unsqueeze(0).cuda()  # (1, seq_len)
# Megatron-style masks mark positions to EXCLUDE with True, so invert the causal (lower-triangular) mask
causal = torch.tril(torch.ones(1, 1, len(tokens), len(tokens), dtype=torch.bool))
attention_mask = (~causal).cuda()

with torch.no_grad():
    logits = model(input_ids, position_ids, attention_mask)
    # logits shape: (1, seq_len, vocab_size)
    next_token_id = logits[0, -1, :].argmax().item()
    print("Next token:", tokenizer.detokenize([next_token_id]))
```

--------------------------------

### REST API for Multilingual Code Generation

Source: https://context7.com/thudm/codegeex/llms.txt

Integrate with the Tianqi platform's REST API for code generation. Authenticate using an API key and secret. The API returns multiple completions for a given prompt.

```python
import json
import requests

API_KEY    = "YOUR_API_KEY"
API_SECRET = "YOUR_API_SECRET"

# Code generation endpoint
url = "https://tianqi.aminer.cn/api/v2/multilingual_code_generate"
headers = {"Content-Type": "application/json"}

payload = {
    "apikey":    API_KEY,
    "apisecret": API_SECRET,
    "prompt": (
        "from typing import List\n\n"
        "def has_close_elements(numbers: List[float], threshold: float) -> bool:\n"
        "    \"\"\" Check if any two numbers in the list are closer than threshold.\n"
        "    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n"
        "    False\n"
        "    \"\"\"\n"
    ),
    "n":    3,      # number of completions to return
    "lang": "Python",
}

response = requests.post(url, headers=headers, data=json.dumps(payload), timeout=60)
response.raise_for_status()
result = response.json()
# result format: {"result": [{"code": "    ..."}, {"code": "    ..."}, ...]}
for i, item in enumerate(result.get("result", [])):
    print(f"--- Completion {i+1} ---\n{item['code']}\n")
```

--------------------------------

### Parallel Nucleus Sampling Generation

Source: https://context7.com/thudm/codegeex/llms.txt

Generate multiple independent completions in parallel using nucleus (top-p) sampling with `generate_nuclear_sampling`. It returns a list of `Handle` objects, each holding a token sequence and its log-probability score. The snippet assumes Megatron has been initialized and `model` is a loaded `CodeGeeXModel`.

```python
from codegeex.megatron.code_generation_utils import generate_nuclear_sampling, Handle
from codegeex.megatron import get_args, get_tokenizer

args = get_args()
args.seq_length = 2048
args.out_seq_length = 128

tokenizer = get_tokenizer()
prompt = "# language: Python\ndef merge_sorted_lists(a, b):\n    \"\"\"Merge two sorted lists.\"\"\"\n"
context_tokens = tokenizer.tokenize(prompt)
n_prompt = len(context_tokens)

handles = generate_nuclear_sampling(
    model,
    context_tokens=context_tokens,
    num_samples=10,
    temperature=0.8,
    top_p=0.95,
    top_k=0,
)

finished = [h for h in handles if h.is_finished()]
print(f"Finished: {len(finished)}/{len(handles)}")
for h in finished[:3]:
    code = tokenizer.detokenize(h.tokens[n_prompt:])
    print(f"Score: {h.score:.3f}\n{code}\n")
```

--------------------------------

### REST API — `multilingual_code_generate`

Source: https://context7.com/thudm/codegeex/llms.txt

The Tianqi platform exposes a JSON REST endpoint for code generation. Authenticate with an API key/secret obtained from tianqi.aminer.cn.

Obtain the API key and secret from [tianqi.aminer.cn](https://tianqi.aminer.cn/open/).

```python
import json
import requests

API_KEY    = "YOUR_API_KEY"
API_SECRET = "YOUR_API_SECRET"

# Code generation endpoint
url = "https://tianqi.aminer.cn/api/v2/multilingual_code_generate"
headers = {"Content-Type": "application/json"}

payload = {
    "apikey":    API_KEY,
    "apisecret": API_SECRET,
    "prompt": (
        "from typing import List\n\n"
        "def has_close_elements(numbers: List[float], threshold: float) -> bool:\n"
        "    \"\"\" Check if any two numbers in the list are closer than threshold.\n"
        "    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n"
        "    False\n"
        "    \"\"\"\n"
    ),
    "n":    3,      # number of completions to return
    "lang": "Python",
}

response = requests.post(url, headers=headers, data=json.dumps(payload), timeout=60)
response.raise_for_status()
result = response.json()
# result format: {"result": [{"code": "    ..."}, {"code": "    ..."}, ...]}
for i, item in enumerate(result.get("result", [])):
    print(f"--- Completion {i+1} ---\n{item['code']}\n")
```

--------------------------------

### Stream and Write JSONL Datasets

Source: https://context7.com/thudm/codegeex/llms.txt

Use `stream_jsonl` to lazily read `.jsonl` or `.jsonl.gz` files and `write_jsonl` to write iterables of dictionaries. These helpers handle both HumanEval-X datasets and generation outputs.

```python
from codegeex.data.data_utils import stream_jsonl, write_jsonl

# Read a gzip-compressed HumanEval-X dataset
data_file = "codegeex/benchmark/humaneval-x/python/data/humaneval_python.jsonl.gz"
problems = {task["task_id"]: task for task in stream_jsonl(data_file)}

# Inspect a sample
sample = problems["Python/0"]
print("task_id:", sample["task_id"])
print("prompt:\n", sample["prompt"])
print("canonical_solution:\n", sample["canonical_solution"])
# task_id: Python/0
# prompt: from typing import List\ndef has_close_elements(...) -> bool:\n    ...
# canonical_solution:     for idx, elem in enumerate(numbers): ...

# Write generated completions to a gzip output file
generations = [
    {"task_id": "Python/0", "generation": "    for i in range(len(numbers)):\n        ..."},
    {"task_id": "Python/1", "generation": "    return numbers[0] if len(numbers) == 1 else ..."},
]
write_jsonl("my_generations.jsonl.gz", generations)
```
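Both helpers reduce to line-by-line JSON with optional gzip compression. Below is a minimal standard-library sketch. It assumes the compression mode is chosen by the `.gz` suffix. `stream_jsonl_sketch` and `write_jsonl_sketch` are hypothetical stand-ins, not the CodeGeeX functions.

```python
import gzip
import json
import os
import tempfile
from typing import Dict, Iterable, Iterator


def _open(path: str, mode: str):
    # Choose gzip or plain text based on the file suffix (assumption)
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")


def stream_jsonl_sketch(path: str) -> Iterator[Dict]:
    """Lazily yield one dict per non-blank line."""
    with _open(path, "r") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def write_jsonl_sketch(path: str, rows: Iterable[Dict]) -> None:
    """Write each dict as a single JSON line."""
    with _open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


path = os.path.join(tempfile.mkdtemp(), "demo.jsonl.gz")
write_jsonl_sketch(path, [{"task_id": "Python/0", "generation": "    return 1\n"}])
loaded = list(stream_jsonl_sketch(path))
print(loaded)
```

Because `stream_jsonl_sketch` is a generator, large generation files can be processed without loading them fully into memory.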

--------------------------------

### Python Docstring Generation Template

Source: https://github.com/thudm/codegeex/blob/main/vscode-extension/README.md

This template can be used in prompt mode for generating Python docstrings. The `<INPUT>` tag is where the selected code will be inserted.

```python
def add_binary(a, b):
    '''
    Returns the sum of two decimal numbers in binary digits.

    Parameters:
            a (int): A decimal integer
            b (int): Another decimal integer

    Returns:
            binary_sum (str): Binary string of the sum of a and b
    '''
    binary_sum = bin(a+b)[2:]
    return binary_sum

<INPUT>
```

--------------------------------

### Extract Model Weights

Source: https://github.com/thudm/codegeex/blob/main/README.md

Concatenate split model weight files and then extract the tar archive to obtain the full model weights.

```bash
cat codegeex_13b.tar.gz.* > codegeex_13b.tar.gz
tar xvf codegeex_13b.tar.gz
```

--------------------------------

### sliding_window

Source: https://context7.com/thudm/codegeex/llms.txt

Generates overlapping (prompt_tokens, code_tokens) pairs from a long source file for pre-training data preparation. Ensures each window fits within a specified sequence length.

```APIDOC
## sliding_window

### Description
Generates overlapping (prompt_tokens, code_tokens) pairs from a long source file for pre-training data preparation. Ensures each window fits within a specified sequence length.

### Parameters
- **prompt_tokens** (list of integers) - Required - Tokens representing the prompt.
- **code_tokens** (list of integers) - Required - Tokens representing the code.
- **seq_len** (integer) - Required - The maximum sequence length for each window.
- **sliding_stride** (integer) - Required - The stride for sliding the window.
- **minimum_code_len** (integer) - Required - The minimum length of code tokens required in a window.
```

--------------------------------

### Run Inference on Multiple GPUs

Source: https://github.com/thudm/codegeex/blob/main/README.md

Execute inference across multiple GPUs using a pre-converted checkpoint. Specify the model parallelism size and the path to the prompt file.

```bash
bash ./scripts/test_inference_parallel.sh <MP_SIZE> ./tests/test_prompt.txt
```

--------------------------------

### read_translation_dataset — Cross-lingual Translation Prompt Builder

Source: https://context7.com/thudm/codegeex/llms.txt

read_translation_dataset loads a source and target language HumanEval-X dataset and constructs the "code translation" prompt format: source language declaration + solution, followed by the target language declaration.

```python
from codegeex.benchmark.utils import read_translation_dataset

dataset = read_translation_dataset(
    data_file_src="codegeex/benchmark/humaneval-x/python/data/humaneval_python.jsonl.gz",
    data_file_tgt="codegeex/benchmark/humaneval-x/cpp/data/humaneval_cpp.jsonl.gz",
    lang_src="python",
    lang_tgt="cpp",
    dataset_type="humaneval",
)

sample = list(dataset.values())[0]
print(sample["prompt"])
# code translation
# Python:
# def has_close_elements(numbers: List[float], threshold: float) -> bool:
#     for idx, elem in enumerate(numbers):
#         ...
# C++:
# bool has_close_elements(vector<float> numbers, float threshold) {
```

--------------------------------

### generate_nuclear_sampling

Source: https://context7.com/thudm/codegeex/llms.txt

`generate_nuclear_sampling` generates `num_samples` independent completions in parallel using nucleus (top-p) sampling, returning a list of `Handle` objects each containing a token sequence and log-probability score.

```python
from codegeex.megatron.code_generation_utils import generate_nuclear_sampling, Handle
from codegeex.megatron import get_args, get_tokenizer

args = get_args()
args.seq_length = 2048
args.out_seq_length = 128

tokenizer = get_tokenizer()
prompt = "# language: Python\ndef merge_sorted_lists(a, b):\n    \"\"\"Merge two sorted lists.\"\"\"\n"
context_tokens = tokenizer.tokenize(prompt)
n_prompt = len(context_tokens)

# `model` is assumed to be an already-loaded CodeGeeXModel
handles = generate_nuclear_sampling(
    model,
    context_tokens=context_tokens,
    num_samples=10,
    temperature=0.8,
    top_p=0.95,
    top_k=0,
)

finished = [h for h in handles if h.is_finished()]
print(f"Finished: {len(finished)}/{len(handles)}")
for h in finished[:3]:
    code = tokenizer.detokenize(h.tokens[n_prompt:])
    print(f"Score: {h.score:.3f}\n{code}\n")
```

--------------------------------

### is_code_generation_finished / cleanup_code

Source: https://context7.com/thudm/codegeex/llms.txt

Detects natural completion boundaries in code generation and trims trailing content. `is_code_generation_finished` identifies the boundary, and `cleanup_code` performs the trimming.

```APIDOC
## is_code_generation_finished / cleanup_code

### Description
Detects natural completion boundaries in code generation and trims trailing content. `is_code_generation_finished` identifies the boundary, and `cleanup_code` performs the trimming.

### Parameters for `is_code_generation_finished`
- **code** (string) - Required - The code string to check.
- **language_type** (string) - Required - The programming language of the code (e.g., "python", "java").
- **dataset** (string) - Required - The dataset used for generation (e.g., "humaneval").

### Parameters for `cleanup_code`
- **code** (string) - Required - The code string to clean.
- **language_type** (string) - Required - The programming language of the code (e.g., "python", "java").
- **dataset** (string) - Required - The dataset used for generation (e.g., "humaneval").
```
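The boundary rules described above can be approximated with two simple heuristics, consistent with the Python and Java examples in this document:

* A Python function body ends at the first non-empty line that starts at column 0.
* A brace-language body ends at the closing brace that balances the `{` left open by the prompt.

`is_finished_sketch` and `cleanup_sketch` are hypothetical helpers. The real functions may cover more languages and edge cases, such as braces inside strings or comments.

```python
def is_finished_sketch(code: str, language_type: str) -> bool:
    """Hypothetical stop check for a generated function body."""
    if language_type.lower() == "python":
        # Done once any non-empty line starts at column 0 (new def, class, statement, ...)
        return any(line and not line[0].isspace() for line in code.split("\n"))
    # Brace languages: the prompt left one "{" open, so one extra "}" closes the body
    return code.count("{") + 1 == code.count("}")


def cleanup_sketch(code: str, language_type: str) -> str:
    """Hypothetical trimming of everything after the function body."""
    if language_type.lower() == "python":
        lines = code.split("\n")
        for i, line in enumerate(lines):
            if line and not line[0].isspace():
                return "\n".join(lines[:i]).rstrip()
        return code
    depth = 1  # generation starts inside the body opened by the prompt
    for i, ch in enumerate(code):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return code[: i + 1]
    return code


sample = "    return x + 1\n\nprint(f(2))"
print(is_finished_sketch(sample, "python"))    # True
print(repr(cleanup_sketch(sample, "python")))  # '    return x + 1'
```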

--------------------------------

### stream_jsonl / write_jsonl — JSONL Dataset Utilities

Source: https://context7.com/thudm/codegeex/llms.txt

stream_jsonl lazily reads .jsonl or .jsonl.gz files as dictionaries, while write_jsonl writes an iterable of dicts to the same formats. These are the primary I/O primitives for HumanEval-X datasets and generation outputs.

```python
from codegeex.data.data_utils import stream_jsonl, write_jsonl

# Read a gzip-compressed HumanEval-X dataset
data_file = "codegeex/benchmark/humaneval-x/python/data/humaneval_python.jsonl.gz"
problems = {task["task_id"]: task for task in stream_jsonl(data_file)}

# Inspect a sample
sample = problems["Python/0"]
print("task_id:", sample["task_id"])
print("prompt:\n", sample["prompt"])
print("canonical_solution:\n", sample["canonical_solution"])
# task_id: Python/0
# prompt: from typing import List\ndef has_close_elements(...) -> bool:\n    ...
# canonical_solution:     for idx, elem in enumerate(numbers): ...

# Write generated completions to a gzip output file
generations = [
    {"task_id": "Python/0", "generation": "    for i in range(len(numbers)):\n        ..."},
    {"task_id": "Python/1", "generation": "    return numbers[0] if len(numbers) == 1 else ..."},
]
write_jsonl("my_generations.jsonl.gz", generations)
```

--------------------------------

### Beam Search Decoding

Source: https://context7.com/thudm/codegeex/llms.txt

Use `beam_search` for greedy beam search decoding that returns the top-scoring sequences. It takes the model, the context tokens, and the number of beams, and it does not support model parallelism. The snippet assumes Megatron has been initialized and `model` is a loaded `CodeGeeXModel`.

```python
from codegeex.megatron.code_generation_utils import beam_search, Beam
from codegeex.megatron import get_args, get_tokenizer

args = get_args()
args.seq_length = 2048
args.out_seq_length = 256
args.beam_warmup = False
args.evaluation = True   # use is_code_generation_finished() as stop criterion

tokenizer = get_tokenizer()
prompt = "// language: Java\npublic static int factorial(int n) {\n    // Return n!\n"
context_tokens = tokenizer.tokenize(prompt)

beams = beam_search(model, context_tokens=context_tokens, num_beams=5)

for rank, beam in enumerate(beams):
    code = tokenizer.detokenize(beam.tokens[len(context_tokens):])
    print(f"Beam {rank} | score={beam.score:.4f}\n{code}\n")
    # Example output:
    # Beam 0 | score=-3.2145
    #     if (n <= 1) return 1;
    #     return n * factorial(n - 1);
    # }
```
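For intuition, here is a generic beam search over a toy next-token table. It is a hypothetical illustration, not CodeGeeX's `beam_search`. Scores are summed log-probabilities with no length normalization. Hypotheses that emit the assumed end token `</s>` are set aside, and the best `num_beams` completed or open hypotheses are returned.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[float, List[str]]  # (summed log-probability, tokens)


def beam_search_sketch(step: Callable[[List[str]], Dict[str, float]], start: List[str],
                       num_beams: int, max_new_tokens: int, eos: str) -> List[Hypothesis]:
    """Generic beam search: keep the num_beams best open hypotheses at every step."""
    beams: List[Hypothesis] = [(0.0, list(start))]
    finished: List[Hypothesis] = []
    for _ in range(max_new_tokens):
        # Expand every open beam by every candidate next token
        candidates = [(score + lp, seq + [tok])
                      for score, seq in beams
                      for tok, lp in step(seq).items()]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates:
            if seq[-1] == eos:
                finished.append((score, seq))  # completed hypothesis
            else:
                beams.append((score, seq))
            if len(beams) == num_beams:
                break
        if not beams:
            break
    # Rank completed and still-open hypotheses together
    ranked = sorted(finished + beams, key=lambda c: c[0], reverse=True)
    return ranked[:num_beams]


# Toy next-token model: log-probabilities depend only on the last token
TABLE = {
    "a": {"a": math.log(0.1), "b": math.log(0.6), "</s>": math.log(0.3)},
    "b": {"a": math.log(0.2), "b": math.log(0.1), "</s>": math.log(0.7)},
}
results = beam_search_sketch(lambda seq: TABLE[seq[-1]], ["a"], num_beams=2,
                             max_new_tokens=3, eos="</s>")
for score, seq in results:
    print(" ".join(seq), round(math.exp(score), 3))
# a b </s> 0.42
# a </s> 0.3
```

Keeping several hypotheses lets the search recover `a b </s>` (probability 0.42). A search that committed to the single best-scoring complete sequence at the first step would stop at `a </s>` (0.3).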

--------------------------------

### Code Generation Stopping and Cleanup

Source: https://context7.com/thudm/codegeex/llms.txt

Detects natural completion boundaries in model output and trims trailing content. `is_code_generation_finished` checks whether generation has reached a boundary, and `cleanup_code` performs the trimming. The examples below cover Python and Java.

```python
from codegeex.benchmark.utils import is_code_generation_finished, cleanup_code

# --- Python: stop at a new top-level statement ---
code_py = "    result = []\n    for x in nums:\n        result.append(x)\n    return result\n\ndef helper():\n    pass"
print(is_code_generation_finished(code_py, language_type="python", dataset="humaneval"))
# True  (new top-level def detected)
cleaned_py = cleanup_code(code_py, language_type="python", dataset="humaneval")
print(cleaned_py)
# "    result = []\n    for x in nums:\n        result.append(x)\n    return result"

# --- Java: stop when braces balance (one extra closing brace) ---
code_java = "    return n <= 1 ? n : fib(n-1) + fib(n-2);\n}"
print(is_code_generation_finished(code_java, language_type="java", dataset="humaneval"))
# True  (opens=0 + 1 == closes=1)
cleaned_java = cleanup_code(code_java, language_type="java", dataset="humaneval")
print(cleaned_java)
# "    return n <= 1 ? n : fib(n-1) + fib(n-2);\n}"
```

--------------------------------

### Sampling Filter for Logits

Source: https://context7.com/thudm/codegeex/llms.txt

Use `top_k_logits` to filter logits in place, keeping only the top-k tokens or those within a cumulative top-p probability mass. Apply it before softmax sampling. The example moves the logits to the GPU, so it needs a CUDA device.

```python
import torch
import torch.nn.functional as F
from codegeex.megatron.code_generation_utils import top_k_logits

# Simulated logits for a vocabulary of 50400 tokens
logits = torch.randn(1, 50400).cuda()

# Apply top-k=50, top-p=0.9 filtering
filtered = top_k_logits(logits.clone(), top_k=50, top_p=0.9)

# Sample from filtered distribution
probs = F.softmax(filtered, dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
print("Sampled token id:", next_token.item())

# Pure top-p (nucleus) sampling with no top-k cap
filtered_p = top_k_logits(logits.clone(), top_k=0, top_p=0.95)
probs_p = F.softmax(filtered_p, dim=-1)
print("Non-zero logits remaining:", (filtered_p > -1e9).sum().item())
```

--------------------------------

### beam_search

Source: https://context7.com/thudm/codegeex/llms.txt

`beam_search` implements a greedy beam search over the model's token distribution, returning the top-scoring complete or partial beam sequences. Note: does not support model parallelism.

```python
from codegeex.megatron.code_generation_utils import beam_search
from codegeex.megatron import get_args, get_tokenizer

# Assumes Megatron is already initialized and `model` is a loaded CodeGeeX model.
args = get_args()
args.seq_length = 2048
args.out_seq_length = 256
args.beam_warmup = False
args.evaluation = True   # use is_code_generation_finished() as stop criterion

tokenizer = get_tokenizer()
prompt = "// language: Java\npublic static int factorial(int n) {\n    // Return n!"
context_tokens = tokenizer.tokenize(prompt)

beams = beam_search(model, context_tokens=context_tokens, num_beams=5)

for rank, beam in enumerate(beams):
    # Decode only the generated continuation, not the prompt tokens
    code = tokenizer.detokenize(beam.tokens[len(context_tokens):])
    print(f"Beam {rank} | score={beam.score:.4f}\n{code}\n")
    # Example output:
    # Beam 0 | score=-3.2145
    #     if (n <= 1) return 1;
    #     return n * factorial(n - 1);
    # }
```
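To make the decoding strategy concrete, here is a toy, framework-free beam search over a hypothetical `step_fn` that returns candidate next tokens with their log-probabilities. It sketches the general technique and is not CodeGeeX's `beam_search`: beams are scored by summed log-probabilities, beams that emit `eos` are set aside as finished, and partial beams still open when the token budget runs out are returned alongside them.

```python
import math
from typing import Callable, List, Sequence, Tuple

# step_fn(prefix) -> list of (next_token, log_prob) candidates
StepFn = Callable[[Sequence[int]], List[Tuple[int, float]]]
Beam = Tuple[List[int], float]


def toy_beam_search(step_fn: StepFn, prompt: Sequence[int], num_beams: int,
                    max_new_tokens: int, eos: int) -> List[Beam]:
    """Keep the `num_beams` highest-scoring candidates at every step."""
    beams: List[Beam] = [(list(prompt), 0.0)]
    finished: List[Beam] = []
    for _ in range(max_new_tokens):
        # Expand every open beam by every candidate token
        candidates = [(tokens + [tok], score + logp)
                      for tokens, score in beams
                      for tok, logp in step_fn(tokens)]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:num_beams]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    # Complete beams plus any partial beams left when the budget ran out
    return sorted(finished + beams, key=lambda b: b[1], reverse=True)[:num_beams]


def toy_step(prefix: Sequence[int]) -> List[Tuple[int, float]]:
    # Hypothetical next-token table; token 2 plays the role of end-of-sequence
    table = {0: [(1, math.log(0.6)), (2, math.log(0.4))],
             1: [(1, math.log(0.3)), (2, math.log(0.7))]}
    return table[prefix[-1]]


for tokens, score in toy_beam_search(toy_step, [0], num_beams=2, max_new_tokens=3, eos=2):
    print(tokens, round(math.exp(score), 3))
```

In this simplified version a finished beam takes one of the `num_beams` slots, so the set of open beams can shrink over time; production implementations usually refill the open beams instead.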

--------------------------------

### evaluate_functional_correctness — HumanEval-X Evaluation Pipeline

Source: https://context7.com/thudm/codegeex/llms.txt

`evaluate_functional_correctness` runs the full evaluation pipeline: it loads a JSONL file of model generations, executes each generation against the language-specific test suite in parallel, and computes the unbiased pass@k metric for the requested values of k (for example 1, 10, and 100).

```python
from codegeex.benchmark.evaluate_humaneval_x import evaluate_functional_correctness
```

--------------------------------

### estimate_pass_at_k

Source: https://context7.com/thudm/codegeex/llms.txt

Computes the unbiased pass@k estimator introduced in the Codex paper. Given n total samples and c correct samples for a problem, it estimates the probability that at least one of k samples drawn at random (without replacement) is correct.

```APIDOC
## estimate_pass_at_k

### Description
Computes the unbiased pass@k estimator introduced in the Codex paper. Given n total samples and c correct samples for a problem, it estimates the probability that at least one of k samples drawn at random (without replacement) is correct.

### Parameters
- **num_samples** (integer) - Required - Total number of samples per problem.
- **num_correct** (numpy.array) - Required - Array of correct counts per problem.
- **k** (integer) - Required - The value of k for the pass@k metric.
```
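For one problem the estimator is pass@k = 1 - C(n - c, k) / C(n, k). A minimal sketch of that formula for a single problem, using the numerically stable running-product form described in the Codex paper and cross-checked against exact binomial coefficients; `pass_at_k_single` is a hypothetical name and this is not CodeGeeX's own code.

```python
import math

import numpy as np


def pass_at_k_single(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem with n samples, c of them correct.

    Equals 1 - C(n - c, k) / C(n, k), evaluated as a running product so that
    no large binomial coefficients are formed.
    """
    if n - c < k:
        # Fewer than k incorrect samples: every draw of k contains a correct one
        return 1.0
    # C(n - c, k) / C(n, k) == prod over i = n-c+1 .. n of (1 - k / i)
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))


# Cross-check the product form against exact binomial coefficients
n, c, k = 200, 20, 10
exact = 1.0 - math.comb(n - c, k) / math.comb(n, k)
print(pass_at_k_single(n, c, k), exact)
```

For k = 1 the formula reduces to c / n, so `pass_at_k_single(200, 20, 1)` is about 0.1. Averaging the per-problem values over the benchmark gives the reported pass@k.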

--------------------------------

### Evaluate Python Generations

Source: https://context7.com/thudm/codegeex/llms.txt

Evaluates the functional correctness of Python generations against the HumanEval-X Python problem set. Pass the generations file, the problem file, and an output directory; the same evaluation can also be launched from bash with the bundled script.

```python
from codegeex.benchmark.evaluate_humaneval_x import evaluate_functional_correctness

evaluate_functional_correctness(
    input_file="my_generations.jsonl",         # {"task_id": "Python/0", "generation": "..."}
    problem_file="codegeex/benchmark/humaneval-x/python/data/humaneval_python.jsonl.gz",
    tmp_dir="./tmp_eval/",
    n_workers=16,
    timeout=10.0,
    out_dir="./results/",
    k=[1, 10, 100],
    example_test=False,                        # True = use public test cases only
)
```

```bash
bash scripts/evaluate_humaneval_x.sh my_generations.jsonl python 16
```
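The generations file holds one JSON object per line with `task_id` and `generation` keys, as the comment in the call above shows. A minimal sketch for producing it, assuming those two keys are all the evaluator needs; `write_generations` is a hypothetical helper and the completions are placeholders.

```python
import json
from typing import Iterable, Tuple


def write_generations(path: str, samples: Iterable[Tuple[str, str]]) -> int:
    """Write (task_id, generation) pairs to `path` as JSONL; return the record count.

    Repeat a task_id once per sample when several samples per problem are
    needed, e.g. for pass@k with k > 1.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for task_id, generation in samples:
            record = {"task_id": task_id, "generation": generation}
            f.write(json.dumps(record) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Placeholder completions for illustration; real ones come from the model
    samples = [
        ("Python/0", "    return True\n"),
        ("Python/0", "    return False\n"),
    ]
    print(write_generations("my_generations.jsonl", samples))  # 2
```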

--------------------------------

### get_token_stream

Source: https://context7.com/thudm/codegeex/llms.txt

`get_token_stream` is the central generation function: it returns a Python generator that yields successive token tensors as the model decodes. Greedy decoding, top-k, top-p (nucleus), temperature sampling, and beam search are selected through runtime arguments.

```python
import copy
from codegeex.megatron import get_args, get_tokenizer
from codegeex.megatron.code_generation_utils import get_token_stream

# Assumes Megatron is already initialized and `model` is a loaded CodeGeeX model.
tokenizer = get_tokenizer()
args = get_args()
args.out_seq_length = 128   # max new tokens to generate
args.temperature = 0.8
args.top_k = 0
args.top_p = 0.95
args.greedy = False
args.beam_search = False

# Single-quoted so the docstring's triple double quotes stay inside the string
prompt = '# language: Python\ndef is_prime(n):\n    """Check if n is a prime number."""\n'
context_tokens = tokenizer.tokenize(prompt)
n_prompt = len(context_tokens)

# Batched generation: micro_batch_size copies of the same prompt
micro_batch_size = 4
token_stream = get_token_stream(
    model,
    [copy.deepcopy(context_tokens) for _ in range(micro_batch_size)],
    return_scores=True,
    prompt_length=n_prompt,
    micro_batch_size=micro_batch_size,
    bad_ids=None,          # token IDs to suppress, e.g. [tokenizer.eod]
    temperature=0.8,
    topp=0.95,
    topk=0,
)

# Consume the stream; last iteration holds the full generated sequences
for generated_tokens, (lengths, scores) in token_stream:
    pass

# Decode each sample, skipping the prompt tokens
for i in range(micro_batch_size):
    toks = generated_tokens[i].cpu().numpy().tolist()
    code = tokenizer.detokenize(toks[n_prompt:])
    print(f"--- Sample {i} (score={scores[i]:.3f}) ---\n{code}\n")
```
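The consume-until-exhausted loop above works because each item the generator yields carries the whole sequence produced so far. A toy, framework-free analogue of that behaviour follows; `toy_token_stream` and `next_logits` are hypothetical stand-ins (a real model would supply the logits), and only greedy and temperature sampling are shown.

```python
from typing import Callable, Iterator, List, Optional

import torch


def toy_token_stream(
    next_logits: Callable[[List[int]], torch.Tensor],
    prompt: List[int],
    max_new_tokens: int,
    temperature: float = 1.0,
    greedy: bool = False,
    eos_id: Optional[int] = None,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield the full token sequence (prompt + generated) after every new token."""
    gen = torch.Generator().manual_seed(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)
        if greedy:
            next_id = int(torch.argmax(logits))
        else:
            # Temperature < 1 sharpens the distribution, > 1 flattens it
            probs = torch.softmax(logits / temperature, dim=-1)
            next_id = int(torch.multinomial(probs, num_samples=1, generator=gen))
        tokens.append(next_id)
        yield list(tokens)
        if eos_id is not None and next_id == eos_id:
            break


def next_logits(tokens: List[int]) -> torch.Tensor:
    # Toy "model" over a 4-token vocabulary: strongly prefers (last + 1) % 4
    logits = torch.zeros(4)
    logits[(tokens[-1] + 1) % 4] = 5.0
    return logits


# The last item yielded holds the complete sequence
for seq in toy_token_stream(next_logits, [0], max_new_tokens=3, greedy=True):
    pass
print(seq)  # [0, 1, 2, 3]
```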

--------------------------------

### evaluate_functional_correctness

Source: https://context7.com/thudm/codegeex/llms.txt

Evaluates the functional correctness of generated code (for example, Python) against a set of problems. It takes a JSONL file of generations and a problem file, and writes the results to the specified output directory.

```APIDOC
## evaluate_functional_correctness

### Description
Evaluates the functional correctness of generated code (for example, Python) against a set of problems. It takes a JSONL file of generations and a problem file, and writes the results to the specified output directory.

### Parameters
- **input_file** (string) - Required - Path to the JSONL file containing code generations.
- **problem_file** (string) - Required - Path to the JSONL.gz file containing the problems.
- **tmp_dir** (string) - Optional - Directory for temporary files during evaluation.
- **n_workers** (integer) - Optional - Number of worker processes to use for evaluation.
- **timeout** (float) - Optional - Timeout in seconds for each test case.
- **out_dir** (string) - Optional - Directory to save the evaluation results.
- **k** (list of integers) - Optional - List of k values for pass@k calculation.
- **example_test** (boolean) - Optional - If True, use only public test cases.
```
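The `n_workers` and `timeout` parameters hint at how generations are executed. A minimal standard-library sketch of that idea, assuming each program string already combines prompt, generation, and tests into one runnable Python script: programs run concurrently in subprocesses and each is classified as passed, failed, or timed out. `run_program` and `run_all` are hypothetical names; CodeGeeX's actual harness also builds and runs the other HumanEval-X languages, and untrusted generated code should be executed in a sandbox.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_program(source: str, timeout: float) -> str:
    """Run one Python program in a subprocess and classify the outcome."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            timeout=timeout,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return "timed out"
    # A failing assert or any uncaught exception gives a non-zero exit code
    return "passed" if result.returncode == 0 else "failed"


def run_all(programs: List[str], n_workers: int = 4, timeout: float = 10.0) -> List[str]:
    """Execute programs concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda src: run_program(src, timeout), programs))


if __name__ == "__main__":
    programs = [
        "assert 1 + 1 == 2",           # test passes
        "assert 1 + 1 == 3",           # assertion fails
        "import time; time.sleep(5)",  # exceeds the timeout
    ]
    print(run_all(programs, n_workers=3, timeout=1.0))
```

Threads are enough here because each worker spends its time waiting on a child process rather than running Python code.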

--------------------------------

### Estimate Pass@k Metric

Source: https://context7.com/thudm/codegeex/llms.txt

Computes the unbiased pass@k estimator. Use this to estimate the probability that at least one of k random draws is correct, given total samples and per-problem correct counts. Requires numpy.

```python
import numpy as np
from codegeex.benchmark.metric import estimate_pass_at_k

# Example: 200 samples per problem, with varying correct counts
num_samples = 200
num_correct = np.array([20, 50, 100, 180, 0, 200])  # per-problem correct counts

for k in [1, 10, 100]:
    scores = estimate_pass_at_k(num_samples, num_correct, k)
    print(f"pass@{k}: {scores.mean():.4f}  (per-problem: {np.round(scores, 3)})")
```

--------------------------------

### Check for Close Elements in Array (Java)

Source: https://github.com/thudm/codegeex/blob/main/tests/test_prompt.txt

Returns `true` if any two elements of an integer array differ by strictly less than `threshold`, and `false` otherwise.

```Java
public class Solution {
    public static boolean hasCloseElements(int[] nums, int threshold) {
        for (int i = 0; i < nums.length - 1; i++) {
            for (int j = i + 1; j < nums.length; j++) {
                if (Math.abs(nums[i] - nums[j]) < threshold) {
                    return true;
                }
            }
        }
        return false;
    }
}
```
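The nested loops compare every pair, which is O(n^2). Because the closest pair in a sorted array is always adjacent, sorting first brings this down to O(n log n). The Python sketch below illustrates that alternative under the same strict `< threshold` rule; it is not part of the original test prompt.

```python
from typing import List


def has_close_elements(nums: List[int], threshold: int) -> bool:
    """True if some pair of elements differs by strictly less than `threshold`.

    After sorting, the minimum pairwise difference is between neighbours,
    so a single pass over adjacent pairs is enough.
    """
    ordered = sorted(nums)
    return any(b - a < threshold for a, b in zip(ordered, ordered[1:]))


print(has_close_elements([1, 5, 6], 2))  # True
```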
