### Plugin Supports Format Combination Example

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/cpp-api/classnvinfer1_1_1_i_plugin_v2_dynamic_ext.html

This example shows a `supportsFormatCombination` check for a plugin that supports only FP16 tensors in NCHW (linear) format. It returns true only when the tensor at position `pos` has that format and data type.

```cpp
return inOut[pos].format == TensorFormat::kLINEAR && inOut[pos].type == DataType::kHALF;
```

--------------------------------

### Deconvolution Example with Input, Output, and Expected Values - Python

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/Deconvolution.html

This example demonstrates setting up input data, determining output shape, and defining expected output values for a deconvolution layer. It's useful for testing and verifying deconvolution operations.

```python
inputs[in1.name] = np.array([[[[-3.0, -2.0, -1.0], [0.0, 1.0, 2.0], [2.0, 5.0, 6.0]]]])

outputs[layer.get_output(0).name] = layer.get_output(0).shape

expected[layer.get_output(0).name] = np.array(
    [
        [
            [
                [-3.0, -5.0, -6.0, -3.0, -1.0],
                [-3.0, -4.0, -3.0, 0.0, 1.0],
                [-1.0, 3.0, 10.0, 11.0, 7.0],
                [2.0, 8.0, 16.0, 14.0, 8.0],
                [2.0, 7.0, 13.0, 11.0, 6.0],
            ]
        ]
    ]
)
```
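The snippet above does not show the layer's kernel, stride, or padding. As a hedged sketch, the expected values can be reproduced in plain Python by assuming a 3x3 all-ones kernel, stride 1, and no padding. The helper `deconv2d` and the kernel are illustrative assumptions, not part of the TensorRT API.

```python
def deconv2d(image, kernel):
    """Stride-1, unpadded transposed convolution of a single 2-D channel."""
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out = [[0.0] * (in_w + k_w - 1) for _ in range(in_h + k_h - 1)]
    for i in range(in_h):
        for j in range(in_w):
            # Each input element scatters a scaled copy of the kernel into the output.
            for u in range(k_h):
                for v in range(k_w):
                    out[i + u][j + v] += image[i][j] * kernel[u][v]
    return out


image = [[-3.0, -2.0, -1.0], [0.0, 1.0, 2.0], [2.0, 5.0, 6.0]]
ones_kernel = [[1.0] * 3 for _ in range(3)]  # assumed weights, not shown in the snippet above
result = deconv2d(image, ones_kernel)
```

A 3x3 input with a 3x3 kernel yields a 5x5 output, which matches the shape of the expected array above.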

--------------------------------

### Activation Operator Example

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/Activation.html

This example demonstrates how to use the add_activation method to apply a RELU activation function to an input tensor.

```APIDOC
## Activation Operator

### Description
Applies an activation function to an input tensor.

### Method
`network.add_activation(input_tensor, type)`

### Parameters
*   **input_tensor** (tensor) - The input tensor to apply the activation function to.
*   **type** (ActivationType) - The type of activation function to apply. Supported types include `RELU`, `SIGMOID`, `TANH`, `LEAKY_RELU`, `ELU`, `SELU`, `SOFTSIGN`, `SOFTPLUS`, `CLIP`, `HARD_SIGMOID`, `SCALED_TANH`, `THRESHOLDED_RELU`.
*   **alpha** (float, optional) - Parameter used for activation functions like `LEAKY_RELU`, `ELU`, `SELU`, `SOFTPLUS`, `CLIP`, `HARD_SIGMOID`, `SCALED_TANH`, `THRESHOLDED_RELU`.
*   **beta** (float, optional) - Parameter used for activation functions like `SELU`, `SOFTPLUS`, `CLIP`, `HARD_SIGMOID`, `SCALED_TANH`.

### Inputs
*   **input** (tensor of type T) - The input tensor.

### Outputs
*   **output** (tensor of type T) - The output tensor with the activation function applied.

### Data Types
*   **T**: `float16`, `float32`, `bfloat16`, `int32`, `int64` (Note: `int32` and `int64` are supported only for `RELU`)

### Shape Information
The output tensor has the same shape as the input tensor.

### Example
```python
in1 = network.add_input("input1", dtype=trt.float32, shape=(2, 3))
layer = network.add_activation(in1, type=trt.ActivationType.RELU)
network.mark_output(layer.get_output(0))

# Example usage with numpy for input data
inputs[in1.name] = np.array([[-3.0, -2.0, -1.0], [0.0, 1.0, 2.0]])

# Storing output shape
outputs[layer.get_output(0).name] = layer.get_output(0).shape

# Expected output for RELU activation
expected[layer.get_output(0).name] = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 2.0]])
```
```
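To make the role of `alpha` and `beta` concrete, here is a minimal plain-Python sketch of three of the listed activation types. It assumes the conventional definitions (`alpha` as the negative slope for `LEAKY_RELU`, `alpha` and `beta` as the lower and upper bounds for `CLIP`). The helper names are hypothetical and are not TensorRT functions.

```python
def relu(x):
    return max(0.0, x)


def leaky_relu(x, alpha):
    # alpha is assumed to be the slope applied to negative inputs
    return x if x >= 0.0 else alpha * x


def clip(x, alpha, beta):
    # alpha and beta are assumed to be the lower and upper bounds
    return max(alpha, min(beta, x))


def apply_elementwise(fn, rows, *params):
    """Apply fn to every element; the output keeps the input's shape."""
    return [[fn(v, *params) for v in row] for row in rows]


data = [[-3.0, -2.0, -1.0], [0.0, 1.0, 2.0]]
relu_out = apply_elementwise(relu, data)
leaky_out = apply_elementwise(leaky_relu, data, 0.5)
clip_out = apply_elementwise(clip, data, -1.0, 1.0)
```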

--------------------------------

### Block Quantize and Dequantize Example

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/Quantize.html

This example illustrates block quantization and dequantization, useful for specific quantization schemes where blocks of data are quantized together.

```APIDOC
## Block Quantize and Dequantize Example

### Description
This example demonstrates block quantization and dequantization.

### Method
`network.add_quantize()` and `network.add_dequantize()` with block shape specified.

### Parameters
- **input**: Input tensor.
- **scale**: Quantization scale tensor.
- **toType**: The DataType of the output tensor (e.g., `trt.int4`).
- **block_shape**: The shape of the quantization block.

### Request Example
```python
weights = network.add_constant(shape=(4, 8), weights=np.array([
                                                                [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0],
                                                                [1.1, 1.2, 2.1, 2.2, 3.1, 3.2, 4.1, 4.2],
                                                                [4.0, 4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0],
                                                                [4.1, 4.2, 5.1, 5.2, 6.1, 6.2, 7.1, 7.2],
                                                               ], dtype=np.float32))
scale = network.add_constant(shape=(2, 8), weights=np.array([
                                                            [1, 1, 2, 2, 3, 3, 4, 4],
                                                            [4, 4, 5, 5, 6, 6, 7, 7]
                                                          ], dtype=np.float32))
quantize = network.add_quantize(weights.get_output(0), scale.get_output(0), trt.int4)
dequantize = network.add_dequantize(quantize.get_output(0), scale.get_output(0), trt.float32)
network.mark_output(dequantize.get_output(0))

outputs[dequantize.get_output(0).name] = dequantize.get_output(0).shape
expected[dequantize.get_output(0).name] = np.array(
    [
        [
            [1, 1, 2, 2, 3, 3, 4, 4],
            [1, 1, 2, 2, 3, 3, 4, 4],
            [4, 4, 5, 5, 6, 6, 7, 7],
            [4, 4, 5, 5, 6, 6, 7, 7],
        ]
    ]
)
```
```
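The arithmetic behind this example can be sketched in plain Python. The sketch assumes that each row of the `(2, 8)` scale tensor covers a block of two consecutive rows of the `(4, 8)` weights, and that quantization rounds to the nearest integer and saturates to the signed 4-bit range [-8, 7]. The helper names are hypothetical.

```python
def block_quantize(weights, scales, qmin=-8, qmax=7):
    rows_per_block = len(weights) // len(scales)  # 4 // 2 == 2 in this example
    return [
        [
            max(qmin, min(qmax, round(w / scales[r // rows_per_block][c])))
            for c, w in enumerate(row)
        ]
        for r, row in enumerate(weights)
    ]


def block_dequantize(quantized, scales):
    rows_per_block = len(quantized) // len(scales)
    return [
        [float(q) * scales[r // rows_per_block][c] for c, q in enumerate(row)]
        for r, row in enumerate(quantized)
    ]


weights = [
    [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0],
    [1.1, 1.2, 2.1, 2.2, 3.1, 3.2, 4.1, 4.2],
    [4.0, 4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0],
    [4.1, 4.2, 5.1, 5.2, 6.1, 6.2, 7.1, 7.2],
]
scales = [
    [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0],
    [4.0, 4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0],
]
quantized = block_quantize(weights, scales)
round_trip = block_dequantize(quantized, scales)
```

Every weight divided by its block scale rounds to 1, so the dequantized result is simply the scale row of the corresponding block, which is what the expected array above contains.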

--------------------------------

### Plugin Supports Conditional Format Combination Example

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/cpp-api/classnvinfer1_1_1_i_plugin_v2_dynamic_ext.html

This example shows a plugin that supports FP16 NCHW for its first two inputs and FP32 NCHW for its single output. The support is conditional based on the input/output position.

```cpp
return inOut[pos].format == TensorFormat::kLINEAR && (inOut[pos].type == (pos < 2 ? DataType::kHALF :
  DataType::kFLOAT));
```

--------------------------------

### Quantize and Dequantize Example

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/Quantize.html

This example demonstrates how to quantize a tensor and then dequantize it back to its original floating-point type using the Quantize and Dequantize operators.

```APIDOC
## Quantize and Dequantize Example

### Description
This example demonstrates quantizing a tensor and then dequantizing it.

### Method
`network.add_quantize()` and `network.add_dequantize()`

### Parameters
- **input**: Input tensor.
- **scale**: Quantization scale tensor.
- **axis**: The axis to perform quantization on (optional).
- **toType**: The DataType of the output tensor (optional, defaults to `int8`).

### Request Example
```python
in1 = network.add_input("input1", dtype=trt.float32, shape=(1, 1, 3, 3))
scale = network.add_constant(shape=(1,), weights=np.array([1 / 127], dtype=np.float32))
quantize = network.add_quantize(in1, scale.get_output(0))
quantize.axis = 3
dequantize = network.add_dequantize(quantize.get_output(0), scale.get_output(0))
dequantize.axis = 3
network.mark_output(dequantize.get_output(0))

inputs[in1.name] = np.array(
    [
        [
            [0.56, 0.89, 1.4],
            [-0.56, 0.39, 6.0],
            [0.67, 0.11, -3.6],
        ]
    ]
)

outputs[dequantize.get_output(0).name] = dequantize.get_output(0).shape
expected[dequantize.get_output(0).name] = np.array(
    [
        [
            [0.56, 0.89, 1],
            [-0.56, 0.39, 1.0],
            [0.67, 0.11, -1.0],
        ]
    ]
)
```
```
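As a rough sketch of why the expected values saturate near 1.0 and -1.0, the round trip can be modeled in plain Python. It assumes round-to-nearest quantization clamped to the int8 range [-128, 127] with the single scale `1 / 127`. The function names are illustrative only.

```python
def quantize_int8(x, scale):
    # Scale, round to the nearest integer, then saturate to the int8 range.
    return max(-128, min(127, round(x / scale)))


def dequantize(q, scale):
    return q * scale


scale = 1 / 127
values = [0.56, 0.89, 1.4, -0.56, 0.39, 6.0, 0.67, 0.11, -3.6]
round_trip = [dequantize(quantize_int8(v, scale), scale) for v in values]
```

Inputs inside roughly [-1, 1] survive with a small rounding error, while 1.4, 6.0, and -3.6 are clamped to the ends of the int8 range before being scaled back.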

--------------------------------

### Polymorphic Plugin Format Combination Example

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/cpp-api/classnvinfer1_1_1_i_plugin_v2_dynamic_ext.html

This example defines a 'polymorphic' plugin with two inputs and one output. It supports any format or type, but requires that all inputs and the output share the same format and type as the first input.

```cpp
return pos == 0 || (inOut[pos].format == inOut[0].format && inOut[pos].type == inOut[0].type);
```
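The logic of this predicate can be modeled outside C++ to see which combinations it accepts. The following is a small Python sketch; `TensorDesc` is a hypothetical stand-in for the plugin tensor descriptor, and the format and type strings are placeholders.

```python
from collections import namedtuple

# Hypothetical stand-in for the descriptor passed to the plugin.
TensorDesc = namedtuple("TensorDesc", ["format", "type"])


def supports_format_combination(pos, in_out):
    # Position 0 accepts anything; every later position must match position 0.
    return pos == 0 or (
        in_out[pos].format == in_out[0].format and in_out[pos].type == in_out[0].type
    )


matching = [TensorDesc("LINEAR", "HALF")] * 3
mismatched = [
    TensorDesc("LINEAR", "HALF"),
    TensorDesc("LINEAR", "HALF"),
    TensorDesc("LINEAR", "FLOAT"),
]
```

With `matching`, every position is accepted. With `mismatched`, positions 0 and 1 are accepted and position 2 is rejected because its type differs from the first input.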

--------------------------------

### Attention Operator Example

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/Attention.html

Example demonstrating how to use the Attention operator with various configurations, including input definitions, mask application, and output marking.

```APIDOC
## Attention Operator

### Description
Generates an attention mechanism. This operator supports various configurations through its attributes and inputs.

### Method
`network.add_attention()`

### Parameters
#### Attributes
- `normalizationOp` (enum): Specifies the normalization function to apply. Can be `NONE` or `SOFTMAX`.
- `causal` (boolean): Determines if the attention runs causal inference.
- `decomposable` (boolean): Determines if the attention can be decomposed into multiple kernels if a fused kernel is not found.
- `normalizationQuantizeToType` (enum, optional): Specifies the datatype for attention normalization quantization. Options include `DataType::kFP8` and `DataType::kINT8`.
- `nbRanks` (integer, default: 1): Specifies the number of ranks for multi-device attention execution.

#### Inputs
- **query** (tensor T1): The query tensor.
- **key** (tensor T1): The key tensor.
- **value** (tensor T1): The value tensor.
- **mask** (tensor T2, optional): An optional mask tensor. If boolean, `True` indicates allowed attention. If float, it's an add mask.
- **normalizationQuantizeScale** (tensor T1, optional): The quantization scale for the attention normalization output.

#### Outputs
- **outputs** (tensor T1): The output tensor of the attention operation.

### Data Types
- T1: `float32`, `float16`, `bfloat16`
- T2: `float32`, `float16`, `bfloat16`, `bool`. T2 must match T1 if not `bool`.

### Shape Information
- **query** and **outputs**: [b, dq, sq, h]
- **key** and **value**: [b, dkv, skv, h]
- **mask**: [a0, a1, sq, skv] where a0 and a1 are broadcastable to b and h.
- **normalizationQuantizeScale**: [a0,...,an], 0 <= n <= 1

### Example
```python
network = get_runner.builder.create_network(flags=1 << int(trt.NetworkDefinitionCreationFlag.STRONGLY_TYPED))
qkv_shape = (1, 8, 1, 16)
mask_shape = (1, 1, 1, 1)
query = network.add_input("query", dtype=trt.float16, shape=qkv_shape)
key = network.add_input("key", dtype=trt.float16, shape=qkv_shape)
value = network.add_input("value", dtype=trt.float16, shape=qkv_shape)
mask = network.add_input("mask", dtype=trt.bool, shape=mask_shape)
layer = network.add_attention(query, key, value, trt.AttentionNormalizationOp.SOFTMAX, False)
layer.mask = mask
network.mark_output(layer.get_output(0))

# Input data preparation and execution would follow here...
```

### C++ API Reference
For more information about the C++ IAttention operator, refer to the C++ IAttention documentation.
```
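The two mask flavors described under Inputs can be compared with a small plain-Python sketch for a single row of attention scores. It assumes the mask is applied to the scores before `SOFTMAX` normalization and that at least one position is allowed. The helper names are hypothetical, and the sketch does not model the rest of the operator.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def with_bool_mask(scores, mask):
    # True means the position may be attended to.
    return softmax([s if keep else -math.inf for s, keep in zip(scores, mask)])


def with_add_mask(scores, mask):
    # The float mask is added to the scores element by element.
    return softmax([s + m for s, m in zip(scores, mask)])


scores = [0.5, 2.0, -1.0]
bool_weights = with_bool_mask(scores, [True, False, True])
add_weights = with_add_mask(scores, [0.0, -math.inf, 0.0])
```

Under these assumptions a boolean mask behaves like an additive mask that holds 0 for allowed positions and negative infinity for blocked ones.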

--------------------------------

### Example: Transpose Last Two Dimensions

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/cpp-api/classnvinfer1_1_1_i_plugin_v2_dynamic_ext.html

This example demonstrates how to override getOutputDimensions for a plugin that transposes the last two dimensions of its single input.

```cpp
DimsExprs output(inputs[0]);
std::swap(output.d[output.nbDims-1], output.d[output.nbDims-2]);
return output;
```

--------------------------------

### Assertion Firing Example (Build Time)

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/Assertion.html

This example is designed to trigger a build-time error by creating a condition that the builder can prove is false. It uses inputs with different shapes to ensure inequality.

```python
# This test should fail during build stage
in1 = network.add_input("input1", dtype=trt.float32, shape=(3, 4, 4))
shape1 = network.add_shape(in1)
in2 = network.add_input("input2", dtype=trt.float32, shape=(3, 3, 4))
shape2 = network.add_shape(in2)
identity = network.add_identity(in1)
cond = network.add_elementwise(shape1.get_output(0), shape2.get_output(0), op=trt.ElementWiseOperation.EQUAL)
assertion = network.add_assertion(cond.get_output(0), message="Should fail")
network.mark_output(identity.get_output(0))

inputs[in1.name] = np.zeros(shape=(2, 4))
outputs[identity.get_output(0).name] = identity.get_output(0).shape
```

--------------------------------

### Reduce Operator Example (Keep Dims False)

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/Reduce.html

This example shows how to use the `add_reduce` function to perform a PROD reduction, removing the reduced dimensions by setting `keep_dims` to False.

```APIDOC
## Reduce Operator Example (Keep Dims False)

### Description
This example shows how to use the `add_reduce` function to perform a PROD reduction, removing the reduced dimensions by setting `keep_dims` to False.

### Method
`network.add_reduce(input_tensor, op, axes, keep_dims)`

### Parameters
* `input_tensor` (tensor): The input tensor to reduce.
* `op` (trt.ReduceOperation): The reduction operation to perform (e.g., `trt.ReduceOperation.PROD`).
* `axes` (int): A bitmask representing the axes to reduce.
* `keep_dims` (bool): If True, preserves the reduced dimensions with a size of 1. If False, removes the reduced dimensions.

### Request Example
```python
in1 = network.add_input("input1", dtype=trt.float32, shape=(1, 2, 2, 3))
layer = network.add_reduce(in1, op=trt.ReduceOperation.PROD, axes=6, keep_dims=False)
network.mark_output(layer.get_output(0))

inputs[in1.name] = np.array(
    [
        [
            [[-3.0, -2.0, -1.0], [0.0, 1.0, 2.0]],
            [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
        ]
    ]
)

outputs[layer.get_output(0).name] = layer.get_output(0).shape
expected[layer.get_output(0).name] = np.array([[0.0, -56.0, -80.0]])
```
```
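Because `axes` is a bitmask, `axes=6` (binary `110`) selects dimensions 1 and 2. The following plain-Python sketch decodes the mask and recomputes the expected values for this example; `axes_from_bitmask` is a hypothetical helper, not a TensorRT function.

```python
from math import prod


def axes_from_bitmask(mask):
    """Return the dimension indices whose bits are set in the mask."""
    return [axis for axis in range(mask.bit_length()) if (mask >> axis) & 1]


axes = axes_from_bitmask(6)

data = [
    [
        [[-3.0, -2.0, -1.0], [0.0, 1.0, 2.0]],
        [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
    ]
]  # shape (1, 2, 2, 3)

# PROD over dimensions 1 and 2; with keep_dims=False the result has shape (1, 3).
reduced = [
    [prod(data[b][i][j][k] for i in range(2) for j in range(2)) for k in range(3)]
    for b in range(1)
]
```

With `keep_dims=True` the same values would instead be laid out with shape (1, 1, 1, 3).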

--------------------------------

### Padding Operator Example

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/Padding.html

This example demonstrates how to use the add_padding_nd method to pad an input tensor. It shows the creation of the layer, marking the output, and provides expected input and output shapes and values.

```APIDOC
## Padding Operator

### Description
Pads with zeros (or trims) an input tensor along the two innermost dimensions and stores the result in an output tensor.

### Attributes
- `pre_padding_nd` (int tuple) - The amount of pre-padding to use for each dimension. If positive, the tensor is padded with zeros; otherwise, it’s trimmed.
- `post_padding_nd` (int tuple) - The amount of post-padding to use for each dimension. If positive, the tensor is padded with zeros; otherwise, it’s trimmed.

### Inputs
- **input** (tensor of type `T`) - The input tensor to be padded or trimmed.

### Outputs
- **output** (tensor of type `T`) - The resulting tensor after padding or trimming.

### Data Types
- **T**: `int8`, `int32`, `float16`, `float32`

### Shape Information
- **input** is a tensor with a shape of [a_0, ..., a_{n−1}], n ≥ 4
- **output** is a tensor with a shape of [b_0, ..., b_{n−1}], where:
  p_j^pre = pre-padding at spatial dimension j
  p_j^post = post-padding at spatial dimension j
  b_i = a_i, 0 ≤ i < n−2
  b_i = a_i + p_j^pre + p_j^post, n−2 ≤ i < n, j = i − (n−2)

### Example
```python
in1 = network.add_input("input1", dtype=trt.float32, shape=(1, 1, 3, 5))
layer = network.add_padding_nd(in1, pre_padding=(-1, 3), post_padding=(3, -2))
network.mark_output(layer.get_output(0))

inputs[in1.name] = np.array(
    [[[[-3.0, -2.0, -1.0, 10.0, -25.0], [-4.0, -9.0, -1.0, 10.0, -25.0], [0.0, 1.0, 2.0, -2.0, -1.0]]]]
)

outputs[layer.get_output(0).name] = layer.get_output(0).shape
expected[layer.get_output(0).name] = np.array(
    [
        [
            [
                [0.0, 0.0, 0.0, -4.0, -9.0, -1.0],
                [0.0, 0.0, 0.0, 0.0, 1.0, 2.0],
                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
            ]
        ]
    ]
)
```
```
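The pad-or-trim rule can be checked with a short plain-Python sketch over the two innermost dimensions. `pad_or_trim_2d` is a hypothetical helper written for illustration: positive amounts add zeros and negative amounts drop elements, as described above.

```python
def pad_or_trim_2d(rows, pre, post):
    """pre and post are (height, width) amounts; positive pads, negative trims."""

    def along(seq, before, after, make_fill):
        start = max(0, -before)            # elements trimmed at the front
        stop = len(seq) - max(0, -after)   # elements trimmed at the back
        core = list(seq[start:stop])
        front = [make_fill() for _ in range(max(0, before))]
        back = [make_fill() for _ in range(max(0, after))]
        return front + core + back

    # Handle the width (innermost) dimension first, then the height.
    widened = [along(row, pre[1], post[1], lambda: 0.0) for row in rows]
    out_width = len(widened[0])
    return along(widened, pre[0], post[0], lambda: [0.0] * out_width)


image = [
    [-3.0, -2.0, -1.0, 10.0, -25.0],
    [-4.0, -9.0, -1.0, 10.0, -25.0],
    [0.0, 1.0, 2.0, -2.0, -1.0],
]
padded = pad_or_trim_2d(image, pre=(-1, 3), post=(3, -2))
```

The output height is 3 - 1 + 3 = 5 and the width is 5 + 3 - 2 = 6, in line with the shape formula above.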

--------------------------------

### Assertion Not Firing Example

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/Assertion.html

This example demonstrates an assertion that is not expected to fail during build or runtime. It sets up inputs and operations to create a condition that remains true.

```python
in1 = network.add_input("input1", dtype=trt.float32, shape=(3, 4, 4))
shape = network.add_shape(in1)
identity = network.add_identity(in1)
cond = network.add_elementwise(shape.get_output(0), shape.get_output(0), op=trt.ElementWiseOperation.EQUAL)
assertion = network.add_assertion(cond.get_output(0), message="Shouldn't fail")
network.mark_output(identity.get_output(0))

inputs[in1.name] = np.zeros(shape=(2, 4))
outputs[identity.get_output(0).name] = identity.get_output(0).shape
```

--------------------------------

### 2D Block Dynamic Quantize Example

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/DynamicQuantize.html

This example demonstrates how to use the `add_dynamic_quantize_v2` operator for 2D block dynamic quantization. It quantizes FP32 input to FP8, then dequantizes it back to FP32 using per-block scales.

```APIDOC
## add_dynamic_quantize_v2

### Description
Adds a dynamic quantize layer to the network. This layer dynamically quantizes the input tensor to a specified format (e.g., FP8) and produces a scale tensor.

### Method
`network.add_dynamic_quantize_v2(input, block_shape, output_dtype, scale_dtype)`

### Parameters
- **input** (ITensor) - The input tensor to be quantized.
- **block_shape** (Dims) - The shape of the quantization block.
- **output_dtype** (DataType) - The data type of the quantized output tensor.
- **scale_dtype** (DataType) - The data type of the scale tensor.

### Request Example
```python
# Assuming 'network' is a valid trt.INetworkDefinition object and 'in1' is an ITensor
block_shape = trt.Dims([4, 3])
dynq = network.add_dynamic_quantize_v2(in1, block_shape, trt.fp8, trt.float32)
data_f8 = dynq.get_output(0)
scale_f32 = dynq.get_output(1)
```

### Response
#### Success Response (200)
Returns an object with two outputs:
- **Output 0**: The quantized tensor.
- **Output 1**: The scale tensor.

## add_dequantize

### Description
Adds a dequantize layer to the network. This layer dequantizes a tensor from a specified format (e.g., FP8) to another format (e.g., FP32) using provided scales.

### Method
`network.add_dequantize(input, scale, output_dtype)`

### Parameters
- **input** (ITensor) - The tensor to be dequantized.
- **scale** (ITensor) - The scale tensor used for dequantization.
- **output_dtype** (DataType) - The data type of the dequantized output tensor.

### Request Example
```python
# Assuming 'network' is a valid trt.INetworkDefinition object, 'data_f8' and 'scale_f32' are ITensors
dequantize_data = network.add_dequantize(data_f8, scale_f32, trt.float32)
dequantize_data.block_shape = block_shape # Set block_shape if applicable
data_dq = dequantize_data.get_output(0)
```

### Response
#### Success Response (200)
Returns the dequantized tensor.

## mark_output

### Description
Marks a tensor as an output of the network.

### Method
`network.mark_output(tensor)`

### Parameters
- **tensor** (ITensor) - The tensor to be marked as an output.

### Request Example
```python
# Assuming 'network' is a valid trt.INetworkDefinition object and 'data_dq' is an ITensor
network.mark_output(data_dq)
```

### Response
None
```

--------------------------------

### RotaryEmbedding Operator Example

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/RotaryEmbedding.html

This snippet demonstrates how to create and configure a RotaryEmbedding layer within a TensorRT network. It shows the setup of inputs, caches, and position IDs, and how to mark the output. The reference implementation for computing the rotary embedding is also provided for clarity.

```python
get_runner.network = get_runner.builder.create_network(flags=1 << int(trt.NetworkDefinitionCreationFlag.STRONGLY_TYPED))
network = get_runner.network
input = network.add_input("input", dtype=trt.float32, shape=(2, 8, 4, 512))
cos_cache = network.add_input("cos_cache", dtype=trt.float32, shape=(100, 256))
sin_cache = network.add_input("sin_cache", dtype=trt.float32, shape=(100, 256))
position_ids = network.add_input("position_ids", dtype=trt.int64, shape=(2, 4))
layer = network.add_rotary_embedding(input=input, cos_cache=cos_cache, sin_cache=sin_cache, interleaved=False, rotary_embedding_dim=0)
layer.set_input(3, position_ids)
network.mark_output(layer.get_output(0))
```

```python
inputs[input.name] = np.random.rand(2, 8, 4, 512).astype("f")
inputs[cos_cache.name] = np.random.rand(100, 256).astype("f")
inputs[sin_cache.name] = np.random.rand(100, 256).astype("f")
inputs[position_ids.name] = np.array([[6, 2, 1, 7], [2, 8, 3, 6]])
```

```python
outputs[layer.get_output(0).name] = layer.get_output(0).shape
```

```python
# This is a reference implementation of the rotary embedding operator.
def compute_rotary_embedding(
    input,
    cos_cache,
    sin_cache,
    position_ids=None,
    interleaved=False,
    rotary_embedding_dim=0,
):
    # Shape of input: (batch_size, num_heads, seq_len, head_size)
    head_size = input.shape[3]

    # Process partial RoPE
    rotary_embedding_dim = head_size if rotary_embedding_dim == 0 else rotary_embedding_dim
    x_rotate, x_not_rotate = np.split(input, [rotary_embedding_dim], axis=-1)

    # Get cached cosine and sine values
    cache = cos_cache + 1j * sin_cache
    if position_ids is not None:
        cache = cache[position_ids] # Shape: (batch_size, seq_len, rotary_embedding_dim/2)
    cache = cache[:, np.newaxis, :, :] # Shape: (batch_size, 1, seq_len, rotary_embedding_dim/2)

    # Get the 2-d vectors to rotate
    if interleaved:
        x1, x2 = x_rotate[..., 0::2], x_rotate[..., 1::2]
    else:
        x1, x2 = np.split(x_rotate, 2, axis=-1)
    x = x1 + 1j * x2

    # Rotate the vectors
    x = x * cache

    # Put the rotated vectors back
    if interleaved:
        x = np.expand_dims(x, axis=-1)
        x = np.concatenate((np.real(x), np.imag(x)), axis=-1)
        x = np.reshape(x, x_rotate.shape)
    else:
        x = np.concatenate((np.real(x), np.imag(x)), axis=-1)

    # Process partial RoPE
    output = np.concatenate((x, x_not_rotate), axis=-1)
    return output
```

```python
expected[layer.get_output(0).name] = compute_rotary_embedding(inputs[input.name], inputs[cos_cache.name], inputs[sin_cache.name], inputs[position_ids.name], interleaved=False, rotary_embedding_dim=0)
```

--------------------------------

### Create a Loop with Accumulator and Trip Limit

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/Loop.html

This example demonstrates creating a loop structure using TensorRT. It includes an ElementWise layer for accumulation, a recurrence layer for managing state, and a TripLimit layer to control the number of iterations. This setup is useful for recurrent computations where a value is updated iteratively.

```python
'''
This example creates a Loop consisting of an ElementWise layer that is used as an accumulator.
The accumulator value is named `accumaltor_value`, and the value added on each iteration is named `accumaltor_added_value`.
The Loop stop condition is a counter initialized to `num_iterations`, which is implemented using the TripLimit layer.
The expected output is `accumaltor_value` + `num_iterations`*`accumaltor_added_value`
'''
num_iterations = 3
trip_limit = network.add_constant(shape=(), weights=trt.Weights(np.array([num_iterations], dtype=np.dtype("i4"))))
accumaltor_value = network.add_input("input1", dtype=trt.float32, shape=(2, 3))
accumaltor_added_value = network.add_input("input2", dtype=trt.float32, shape=(2, 3))
loop = network.add_loop()
# setting the ITripLimit layer to stop after `num_iterations` iterations
loop.add_trip_limit(trip_limit.get_output(0), trt.TripLimit.COUNT)
# initializing the IRecurrenceLayer with an initial value
rec = loop.add_recurrence(accumaltor_value)
# eltwise inputs are 'accumaltor_added_value', and the IRecurrenceLayer output.
eltwise = network.add_elementwise(accumaltor_added_value, rec.get_output(0), op=trt.ElementWiseOperation.SUM)
# wiring the IRecurrenceLayer with the output of eltwise.
# The IRecurrenceLayer output would now be `accumaltor_value` for the first iteration, and the eltwise output for any other iteration
rec.set_input(1, eltwise.get_output(0))
# marking the IRecurrenceLayer output as the Loop output
loop_out = loop.add_loop_output(rec.get_output(0), trt.LoopOutput.LAST_VALUE)
# marking the Loop output as the network output
network.mark_output(loop_out.get_output(0))

inputs[accumaltor_value.name] = np.array(
    [
        [2.7, -4.9, 23.34],
        [8.9, 10.3, -19.8],
    ])
inputs[accumaltor_added_value.name] = np.array(
    [
        [1.1, 2.2, 3.3],
        [-5.7, 1.3, 4.6],
    ])

outputs[loop_out.get_output(0).name] = eltwise.get_input(0).shape
expected[loop_out.get_output(0).name] = inputs[accumaltor_value.name] + inputs[accumaltor_added_value.name] * num_iterations
```
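For comparison, the computation that the loop network expresses corresponds to an ordinary Python loop. This is only an analogy under the assumption that the recurrence holds the running sum and the trip limit fixes the iteration count; `run_accumulator_loop` is a hypothetical helper.

```python
def run_accumulator_loop(initial, added, num_iterations):
    # The recurrence starts from the initial value.
    acc = [row[:] for row in initial]
    for _ in range(num_iterations):  # plays the role of the COUNT trip limit
        # Element-wise SUM whose result feeds the next iteration.
        acc = [
            [a + d for a, d in zip(acc_row, add_row)]
            for acc_row, add_row in zip(acc, added)
        ]
    return acc  # analogous to taking the last value as the loop output


initial = [[2.7, -4.9, 23.34], [8.9, 10.3, -19.8]]
added = [[1.1, 2.2, 3.3], [-5.7, 1.3, 4.6]]
final = run_accumulator_loop(initial, added, num_iterations=3)
```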

--------------------------------

### Create and Set Runtime Cache (C++)

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/performance/best-practices.html

Demonstrates how to create a runtime cache and set it in the runtime configuration using C++. This enables caching of compiled kernels for future use.

```cpp
// Create a runtime cache.
auto runtimeCache = std::unique_ptr<nvinfer1::IRuntimeCache>(runtimeConfig->createRuntimeCache());

// Set the runtime cache in runtime configuration.
runtimeConfig->setRuntimeCache(*runtimeCache);
```

--------------------------------

### KVCacheUpdate Operator Example

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/KVCacheUpdate.html

This example demonstrates how to use the KVCacheUpdate operator in Python with TensorRT.

```APIDOC
## KVCacheUpdate

Performs Key (K) / Value (V) cache update for attention computations.
Users provide the newly computed K/V values as inputs, and the layer will output the updated K/V cache. The writeIndices input specifies where to write K/V updates for each sequence in the batch.
Separate KVCacheUpdate layers should be used for K and V.

### Attributes
`cacheMode` specifies the cache update mode:
  * `LINEAR` In linear mode, for each batch element i and sequence position s: `output[i, :, writeIndices[i] + s, :] = update[i, :, s, :]`

### Inputs
**cache** : tensor of type `T`, the key/value cache tensor. Must be a network input and have a static sequence length dimension.
**update** : tensor of type `T`, the newly computed key/value tensor to write into the cache.
**writeIndices** : tensor of type `M`, specifies the write position index for each batch element i. Values must satisfy `writeIndices[i] + sequenceLength <= maxSequenceLength`.

### Outputs
**output** : tensor of type `T`, the updated cache tensor. Must be a network output and shares the same device memory address with the cache input (in-place update).

### Data Types
T: `float32`, `float16`, `bfloat16`
M: `int32`, `int64`

### Shape Information
**cache** and **output** are tensors with the same shape of [b,d,smax,h]
**update** is a tensor with the shape of [b,d,s,h] where s≤smax
**writeIndices** is a tensor with the shape of [b]
Where:
  * b is the batch size
  * d is the number of heads
  * smax is the maximum sequence length (must be static)
  * s is the update sequence length
  * h is the head size

### DLA Support
Not supported.

### Examples
KVCacheUpdate
```python
network = get_runner.builder.create_network(flags=1 << int(trt.NetworkDefinitionCreationFlag.STRONGLY_TYPED))
cache_shape = (4, 2, 8, 1)
update_shape = (4, 2, 4, 1)
write_indices_shape = (4,)

cache = network.add_input("cache", dtype=trt.float32, shape=cache_shape)
update = network.add_input("update", dtype=trt.float32, shape=update_shape)
write_indices = network.add_input("write_indices", dtype=trt.int32, shape=write_indices_shape)
layer = network.add_kv_cache_update(cache, update, write_indices, trt.KVCacheMode.LINEAR)
network.mark_output(layer.get_output(0))

cache_data = np.array(
    [
        [0.53, 0.88, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.41, 0.0,  0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.67, 0.0,  0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.32, 0.79, 0.64, 0.0, 0.0, 0.0, 0.0, 0.0],
    ],
    dtype=np.float32,
)
inputs[cache.name] = cache_data[:, None, :, None] + np.zeros((1, 2, 1, 1))

update_data = np.array(
    [
        [0.72, 0.0,  0.0, 0.0],
        [0.55, 0.94, 0.0, 0.0],
        [0.61, 0.28, 0.0, 0.0],
        [0.83, 0.0,  0.0, 0.0],
    ],
    dtype=np.float32,
)
inputs[update.name] = update_data[:, None, :, None] + np.zeros((1, 2, 1, 1))

write_indices_data = np.array([2, 1, 1, 3], dtype=np.int32)
inputs[write_indices.name] = write_indices_data

outputs[layer.get_output(0).name] = layer.get_output(0).shape

expected_data = np.array(
    [
        [0.53, 0.88, 0.72, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.41, 0.55, 0.94, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.67, 0.61, 0.28, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.32, 0.79, 0.64, 0.83, 0.0, 0.0, 0.0, 0.0],
    ],
    dtype=np.float32,
)

expected[layer.get_output(0).name] = expected_data[:, None, :, None] + np.zeros((1, 2, 1, 1))

# Set get_runner.network back to the new STRONGLY_TYPED network
get_runner.network = network
```

## C++ API
For more information about the C++ IKVCacheUpdateLayer operator, refer to the C++ IKVCacheUpdateLayer documentation.
## Python API
For more information about the Python IKVCacheUpdateLayer operator, refer to the Python IKVCacheUpdateLayer documentation.
```
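The `LINEAR` update rule, `output[i, :, writeIndices[i] + s, :] = update[i, :, s, :]`, can be sketched with nested Python lists. `kv_cache_update_linear` is a hypothetical reference helper. Unlike the layer, which updates the cache in place, this sketch returns a copy.

```python
def kv_cache_update_linear(cache, update, write_indices):
    """cache: [b][d][s_max][h], update: [b][d][s][h], write_indices: [b]."""
    out = [[[vec[:] for vec in head] for head in batch] for batch in cache]
    for i, start in enumerate(write_indices):
        for d, head in enumerate(update[i]):
            for s, vec in enumerate(head):
                out[i][d][start + s] = vec[:]
    return out


def expand(rows, num_heads=2):
    """Turn [b][s] rows into [b][d][s][1], repeating the data for every head."""
    return [[[[v] for v in row] for _ in range(num_heads)] for row in rows]


cache = expand([
    [0.53, 0.88, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.41, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.67, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.32, 0.79, 0.64, 0.0, 0.0, 0.0, 0.0, 0.0],
])
update = expand([
    [0.72, 0.0, 0.0, 0.0],
    [0.55, 0.94, 0.0, 0.0],
    [0.61, 0.28, 0.0, 0.0],
    [0.83, 0.0, 0.0, 0.0],
])
updated = kv_cache_update_linear(cache, update, [2, 1, 1, 3])
```

Each batch element writes its four update positions starting at its own write index, so trailing zeros in the update overwrite cache slots that were already zero in this example.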

--------------------------------

### initialize()

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/cpp-api/functions_func_i.html

Initializes a plugin.

```APIDOC
## initialize()

### Description
Initializes a plugin. This method is called by TensorRT when the plugin is first used.

### Method
`bool initialize()`

### Endpoint
N/A (C++ API)

### Parameters
None

### Request Example
N/A

### Response
Returns `true` if initialization is successful, `false` otherwise.
```

--------------------------------

### NMS Operator Example

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/NMS.html

Example demonstrating the usage of the NMS operator in Python with TensorRT.

```APIDOC
## NMS Operator

### Description
The NMS algorithm iterates through a set of bounding boxes and their confidence scores, in decreasing order of score. Boxes are selected if their score is above a given threshold, and their intersection-over-union (IoU) with previously selected boxes is less than or equal to a given threshold. This layer implements NMS per batch item and per class.

Per batch item, boxes are initially sorted by their scores without regard to class. Only boxes up to a maximum of the TopK limit are considered for selection (per batch). During selection, only overlapping boxes of the same class are compared, so that overlapping boxes of different classes do not suppress each other.

### Attributes
- `fmt`: The bounding box format can be one of: `CORNER_PAIRS` (x1, y1, x2, y2) or `CENTER_SIZES` (x_center, y_center, width, height). Default is `CORNER_PAIRS`.
- `limit`: The TopK box limit, maximum number of filtered boxes considered for selection per batch item. Default is 2000 for SM 5.3 and 6.2 devices, and 5000 otherwise.

### Inputs
- **Boxes**: tensor of type `T1`.
- **Scores**: tensor of type `T1`.
- **MaxOutputBoxesPerClass**: tensor of type `int32`.
- **IoUThreshold** (optional): tensor of type `float32`. Scalar value in range [0.0f, 1.0f]. Default is 0.0f.
- **ScoreThreshold** (optional): tensor of type `float32`. Default is 0.0f.

### Outputs
- **SelectedIndices**: tensor of type `T2`. Shape [NumOutputBoxes, 3]. Each row contains (batchIndex, classIndex, boxIndex).
- **NumOutputBoxes**: tensor of type `int32`. Scalar value.

### Data Types
- **T1**: `float16`, `float32`, `bfloat16`
- **T2**: `int32`, `int64`

### Shape Information
- **Boxes**: [batchSize, numInputBoundingBoxes, numClasses, 4] or [batchSize, numInputBoundingBoxes, 4]
- **Scores**: [batchSize, numInputBoundingBoxes, numClasses]
- **MaxOutputBoxesPerClass**: 0D tensor (scalar)
- **IoUThreshold**: 0D tensor (scalar)
- **ScoreThreshold**: 0D tensor (scalar)
- **SelectedIndices**: [NumOutputBoxes, 3]
- **NumOutputBoxes**: 0D tensor (scalar)

### Volume Limits
- **Boxes**, **Scores**, and **SelectedIndices** can have up to 2^31 - 1 elements.

### Example
```python
opt_profile = get_runner.builder.create_optimization_profile()
get_runner.config.add_optimization_profile(opt_profile)

boxes = network.add_input("boxes", dtype=trt.float32, shape=(1, 3, 4))
scores = network.add_input("scores", dtype=trt.float32, shape=(1, 3, 3))
constant = network.add_constant(shape=(), weights=np.ones(shape=(), dtype=np.int32))
max_output_boxes_per_class = constant.get_output(0)

layer = network.add_nms(boxes, scores, max_output_boxes_per_class)
network.mark_output(layer.get_output(0))
network.mark_output(layer.get_output(1))
layer.get_output(0).dtype = trt.int32
layer.get_output(1).dtype = trt.int32

inputs[boxes.name] = np.array([[[0.0, 0.0, 0.1, 0.1], [0.2, 0.2, 0.4, 0.4], [0.5, 0.5, 0.6, 0.6]]])
inputs[scores.name] = np.array([[[0.7, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.9]]])

# Expected shape is [2, 3]
outputs[layer.get_output(0).name] = layer.get_output(0).shape
expected[layer.get_output(0).name] = np.array([[0, 2, 2], [0, 0, 0]])

# Expected shape is [] with a scalar value of 2
outputs[layer.get_output(1).name] = layer.get_output(1).shape
expected[layer.get_output(1).name] = np.array(2)
```
```
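
The selection rule described above can be sketched in plain Python as a reference for reasoning about expected outputs. This is a minimal sketch, not the TensorRT implementation: `nms_reference` and `iou` are hypothetical helper names, only the `CORNER_PAIRS` format and the `[batchSize, numInputBoundingBoxes, 4]` box layout are handled, and the row order of the result is an assumption based on the example above.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in CORNER_PAIRS form (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_reference(boxes, scores, max_per_class,
                  iou_threshold=0.0, score_threshold=0.0, limit=None):
    """Hypothetical reference NMS (not the TensorRT API).

    boxes:  nested list [batch][box][4]; the same boxes are shared by all classes.
    scores: nested list [batch][box][class].
    Returns (rows, count) where each row is (batchIndex, classIndex, boxIndex).
    """
    selected = []
    for batch_idx, (batch_boxes, batch_scores) in enumerate(zip(boxes, scores)):
        # Keep candidates whose score is strictly above the score threshold.
        candidates = [
            (score, box_idx, cls)
            for box_idx, row in enumerate(batch_scores)
            for cls, score in enumerate(row)
            if score > score_threshold
        ]
        # Sort by decreasing score, without regard to class.
        candidates.sort(key=lambda c: c[0], reverse=True)
        if limit is not None:
            candidates = candidates[:limit]  # TopK limit per batch item

        kept = {}  # class index -> boxes already selected for that class
        for score, box_idx, cls in candidates:
            kept_for_class = kept.setdefault(cls, [])
            if len(kept_for_class) >= max_per_class:
                continue
            box = batch_boxes[box_idx]
            # Only boxes of the same class can suppress each other.
            if all(iou(box, other) <= iou_threshold for other in kept_for_class):
                kept_for_class.append(box)
                selected.append((batch_idx, cls, box_idx))
    return selected, len(selected)


boxes = [[[0.0, 0.0, 0.1, 0.1], [0.2, 0.2, 0.4, 0.4], [0.5, 0.5, 0.6, 0.6]]]
scores = [[[0.7, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.9]]]
rows, count = nms_reference(boxes, scores, max_per_class=1)
print(rows, count)  # [(0, 2, 2), (0, 0, 0)] 2
```

With the inputs from the example above, the sketch reproduces the expected `SelectedIndices` rows and a `NumOutputBoxes` value of 2.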

--------------------------------

### Linear Resize Example

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/Resize.html

Demonstrates how to use the Resize operator for linear interpolation, specifying the output shape directly.

```APIDOC
## Linear Resize

### Description
Resizes an input tensor to a specified output shape using linear interpolation.

### Method
`network.add_resize()`

### Parameters
- `input`: The input tensor.
- `resize_mode`: Set to `trt.InterpolationMode.LINEAR`.
- `shape`: The desired output shape. Example: `(1, 1, 5, 5)`.
- `coordinate_transformation`: Controls coordinate mapping. Example: `trt.ResizeCoordinateTransformation.ALIGN_CORNERS`.

### Request Example
```python
input_tensor = network.add_input("input", dtype=trt.float32, shape=(1, 1, 3, 3))
layer = network.add_resize(input_tensor)
layer.resize_mode = trt.InterpolationMode.LINEAR
layer.shape = (1, 1, 5, 5)
layer.coordinate_transformation = trt.ResizeCoordinateTransformation.ALIGN_CORNERS
network.mark_output(layer.get_output(0))
```

### Response Example
```json
{
  "output_shape": [1, 1, 5, 5]
}
```
```
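
To see which values a 3x3 to 5x5 linear resize with `ALIGN_CORNERS` should produce, the interpolation can be sketched in plain Python. This is a minimal sketch assuming the standard align-corners mapping `src = dst * (in - 1) / (out - 1)`; `resize_linear_align_corners` is a hypothetical helper, it handles a single 2D plane only, and exact numerical agreement with TensorRT is not verified here.

```python
def resize_linear_align_corners(image, out_h, out_w):
    """Bilinear resize of a 2D list of floats using align-corners mapping.

    Hypothetical reference helper, not part of the TensorRT API.
    """
    in_h, in_w = len(image), len(image[0])

    def src_coord(dst, in_size, out_size):
        # Corner pixels of input and output map onto each other exactly.
        return dst * (in_size - 1) / (out_size - 1) if out_size > 1 else 0.0

    out = []
    for i in range(out_h):
        y = src_coord(i, in_h, out_h)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = src_coord(j, in_w, out_w)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along x on the two neighboring rows, then along y.
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


plane = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
resized = resize_linear_align_corners(plane, 5, 5)
print(resized[0])  # [0.0, 0.5, 1.0, 1.5, 2.0]
```

Under this mapping, every second output sample coincides with an input sample and the samples in between are midpoints of their neighbors.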

--------------------------------

### Squeeze Operator Example

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/operators/Squeeze.html

This example demonstrates how to add a Squeeze layer to a TensorRT network. It specifies the input tensor and the axes to squeeze. The example also includes setting up test data and verifying the output shape.

```python
in1 = network.add_input("input1", dtype=trt.float32, shape=(3, 1, 4, 1))
axes_weights = trt.Weights(np.array([1, -1], dtype=np.int64))
axes_layer = network.add_constant((2,), axes_weights)
axes_tensor = axes_layer.get_output(0)
layer = network.add_squeeze(in1, axes_tensor)
network.mark_output(layer.get_output(0))

test_data = np.array(
    [
        [1.0, 2.0, 3.0, 4.0],
        [10.0, 20.0, 30.0, 40.0],
        [100.0, 200.0, 300.0, 400.0],
    ]
)

inputs[in1.name] = test_data.reshape(3, 1, 4, 1)

outputs[layer.get_output(0).name] = layer.get_output(0).shape

expected[layer.get_output(0).name] = test_data
```
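
The effect of the `axes` tensor on the output shape can be sketched without TensorRT. This is a minimal sketch: `squeezed_shape` is a hypothetical helper, and rejecting an axis whose extent is not 1 is an assumption about the operator's behavior rather than something stated above.

```python
def squeezed_shape(shape, axes):
    """Return the shape left after removing the listed unit dimensions.

    Hypothetical helper, not part of the TensorRT API. Negative axes count
    from the end, as with the [1, -1] axes used in the example above.
    """
    rank = len(shape)
    to_remove = set()
    for axis in axes:
        if not -rank <= axis < rank:
            raise ValueError(f"axis {axis} is out of range for rank {rank}")
        axis %= rank  # normalize negative axes
        if shape[axis] != 1:
            # Assumption: only dimensions of extent 1 may be squeezed.
            raise ValueError(f"cannot squeeze axis {axis} with extent {shape[axis]}")
        to_remove.add(axis)
    return tuple(dim for i, dim in enumerate(shape) if i not in to_remove)


print(squeezed_shape((3, 1, 4, 1), [1, -1]))  # (3, 4)
```

For the `(3, 1, 4, 1)` input above, axes `1` and `-1` both point at unit dimensions, so the result has shape `(3, 4)`, matching `test_data`.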

--------------------------------

### Create and Configure Optimization Profile (Python)

Source: https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/inference-library/work-with-dynamic-shapes.html

This Python code demonstrates how to create an optimization profile, set the minimum, optimum, and maximum shapes for the input named `foo`, and add the profile to the builder configuration.

```python
profile = builder.create_optimization_profile()
profile.set_shape("foo", (3, 100, 200), (3, 150, 250), (3, 200, 300))
config.add_optimization_profile(profile)
```
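
The relationship between the three shapes can be checked with a small amount of plain Python before building. This is a minimal sketch, assuming that a profile is consistent when `min <= opt <= max` holds in every dimension and that a runtime shape is usable when it lies between `min` and `max`; `profile_is_consistent` and `shape_in_profile` are hypothetical helpers, not TensorRT functions.

```python
def profile_is_consistent(min_shape, opt_shape, max_shape):
    """Check that min <= opt <= max holds for every dimension (hypothetical helper)."""
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        return False
    return all(lo <= opt <= hi for lo, opt, hi in zip(min_shape, opt_shape, max_shape))


def shape_in_profile(shape, min_shape, max_shape):
    """Check that a runtime shape lies within the profile's range (hypothetical helper)."""
    if len(shape) != len(min_shape):
        return False
    return all(lo <= dim <= hi for dim, lo, hi in zip(shape, min_shape, max_shape))


min_shape, opt_shape, max_shape = (3, 100, 200), (3, 150, 250), (3, 200, 300)
print(profile_is_consistent(min_shape, opt_shape, max_shape))  # True
print(shape_in_profile((3, 120, 250), min_shape, max_shape))   # True
print(shape_in_profile((3, 250, 250), min_shape, max_shape))   # False
```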