### Install and Use CodeBLEU CLI

Source: https://context7.com/k4black/codebleu/llms.txt

Install CodeBLEU and its language-specific tree-sitter parsers, then evaluate code files from the command line. The CLI accepts multiple reference files and custom component weights.

```bash
# Installation
pip install codebleu

# Install language-specific tree-sitter parser
pip install tree-sitter-python

# Or install all supported languages at once
pip install codebleu[all]

# Basic command-line usage
# Create reference file (references.txt):
# def foo ( x ) :
#     return x

# Create hypothesis file (predictions.txt):
# def bar ( x ) :
#     return x

codebleu --refs references.txt --hyp predictions.txt --lang python

# Output:
# ngram_match: 0.6514
# weighted_ngram_match: 0.6585
# syntax_match: 1.0
# dataflow_match: 1.0
# CodeBLEU score: 0.8275

# With multiple reference files
codebleu --refs ref1.txt ref2.txt --hyp predictions.txt --lang java

# With custom weights (alpha, beta, gamma, theta)
codebleu --refs references.txt --hyp predictions.txt --lang python --params 0.1,0.1,0.4,0.4

# Using as Python module from command line
python -m codebleu --refs references.txt --hyp predictions.txt --lang python
```
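The `--params` value above is a comma-separated string of four weights (alpha, beta, gamma, theta). As a minimal sketch of how such a string can be turned into a weight tuple, consider the helper below. `parse_params` is a hypothetical name used only for illustration; it is not part of the `codebleu` package, and the CLI's own parsing may differ.

```python
def parse_params(params):
    """Parse 'alpha,beta,gamma,theta' into a tuple of four floats."""
    # float() tolerates surrounding whitespace, so "0.1, 0.1" also works
    values = tuple(float(part) for part in params.split(","))
    if len(values) != 4:
        raise ValueError(f"expected 4 comma-separated weights, got {len(values)}")
    return values


weights = parse_params("0.1,0.1,0.4,0.4")
print(weights)  # (0.1, 0.1, 0.4, 0.4)
```

A tuple produced this way has the same shape as the `weights` argument that `calc_codebleu` takes in the Python examples.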

--------------------------------

### Install CodeBLEU via pip

Source: https://github.com/k4black/codebleu/blob/main/README.md

Commands to install the CodeBLEU package from PyPI or directly from the repository.

```bash
pip install codebleu
```

```bash
pip install git+https://github.com/k4black/codebleu.git
```

--------------------------------

### Install Development Dependencies

Source: https://github.com/k4black/codebleu/blob/main/README.md

Install the library with all precompiled languages and test extras. Requires an internet connection to download tree-sitter dependencies.

```bash
python -m pip install -e .[all,test]
```

```bash
python -m pip install -e .\[all,test\]  # for macOS (zsh), where the brackets must be escaped
```

--------------------------------

### Install tree-sitter language dependencies

Source: https://github.com/k4black/codebleu/blob/main/README.md

Commands to install specific or all tree-sitter language parsers required for AST matching.

```bash
pip install tree-sitter-python
```

```bash
pip install codebleu[all]
```

```bash
pip install git+https://github.com/tree-sitter/tree-sitter-python.git
```

--------------------------------

### Load and Use CodeBLEU Metric with HuggingFace Evaluate

Source: https://context7.com/k4black/codebleu/llms.txt

Shows how to load the CodeBLEU metric using HuggingFace's `evaluate` library and compute scores for single predictions and batch evaluations. Requires the `evaluate` library and the CodeBLEU metric to be installed.

```python
import evaluate

# Load the CodeBLEU metric from HuggingFace Hub
metric = evaluate.load("dvitel/codebleu")

# Single prediction evaluation
ref = "def sum ( first , second ) :\n return second + first"
pred = "def add ( a , b ) :\n return a + b"

results = metric.compute(
    references=[ref],
    predictions=[pred],
    lang=["python"],  # Note: must be a list
    weights=(0.25, 0.25, 0.25, 0.25)
)

print(results)
# Output:
# {
#     'codebleu': 0.5537,
#     'ngram_match_score': 0.1041,
#     'weighted_ngram_match_score': 0.1109,
#     'syntax_match_score': 1.0,
#     'dataflow_match_score': 1.0
# }

# Batch evaluation for model comparison
model_outputs = [
    "def factorial ( n ) :\n    if n <= 1 :\n        return 1\n    return n * factorial ( n - 1 )",
    "def fibonacci ( n ) :\n    if n <= 1 :\n        return n\n    return fibonacci ( n - 1 ) + fibonacci ( n - 2 )"
]

ground_truth = [
    "def factorial ( n ) :\n    result = 1\n    for i in range ( 1 , n + 1 ) :\n        result *= i\n    return result",
    "def fib ( n ) :\n    if n <= 1 :\n        return n\n    return fib ( n - 1 ) + fib ( n - 2 )"
]

results = metric.compute(
    references=ground_truth,
    predictions=model_outputs,
    lang=["python"]
)

print(f"Batch CodeBLEU: {results['codebleu']:.4f}")
```

--------------------------------

### Calculate CodeBLEU for Rust

Source: https://context7.com/k4black/codebleu/llms.txt

Demonstrates calculating the CodeBLEU score for Rust code snippets using the `calc_codebleu` function. Ensure the `codebleu` library is installed.

```python
from codebleu import calc_codebleu

rust_pred = "fn foo ( x ) -> i32 { x }"
rust_ref = "fn bar ( y ) -> i32 { y }"

result = calc_codebleu([rust_ref], [rust_pred], lang="rust")
print(f"Rust CodeBLEU: {result['codebleu']:.4f}")
```

--------------------------------

### Calculate CodeBLEU score

Source: https://github.com/k4black/codebleu/blob/main/README.md

Example usage of the calc_codebleu function to evaluate code similarity between a prediction and a reference.

```python
from codebleu import calc_codebleu

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = calc_codebleu([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25), tokenizer=None)
print(result)
# {
#   'codebleu': 0.5537, 
#   'ngram_match_score': 0.1041, 
#   'weighted_ngram_match_score': 0.1109, 
#   'syntax_match_score': 1.0, 
#   'dataflow_match_score': 1.0
# }
```

--------------------------------

### Calculate CodeBLEU using pip package

Source: https://github.com/k4black/codebleu/blob/main/evaluate_app/README.md

Use `calc_codebleu` from the `codebleu` package to compute the CodeBLEU score. Ensure you have the necessary tree-sitter language installed. The `lang` parameter is required.

```python
from codebleu import calc_codebleu

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = calc_codebleu([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25), tokenizer=None)
print(result)
```

```json
{
  "codebleu": 0.5537, 
  "ngram_match_score": 0.1041, 
  "weighted_ngram_match_score": 0.1109, 
  "syntax_match_score": 1.0, 
  "dataflow_match_score": 1.0
}
```

--------------------------------

### Clone the CodeBLEU Repository

Source: https://github.com/k4black/codebleu/blob/main/README.md

Use this command to create a local copy of the project from GitHub.

```bash
git clone https://github.com/k4black/codebleu
```

--------------------------------

### CodeBLEU with Evaluate Library

Source: https://github.com/k4black/codebleu/blob/main/evaluate_app/README.md

This snippet shows how to load and use the CodeBLEU metric via the `evaluate` library.

```APIDOC
## Load and Use CodeBLEU with Evaluate Library

### Description
Loads the CodeBLEU metric using the `evaluate` library and computes the score.

### Method
`evaluate.load("k4black/codebleu").compute(references, predictions, lang, weights, tokenizer)`

### Parameters
- **references** (list[str] or list[list[str]]) - Required - The reference code(s).
- **predictions** (list[str]) - Required - The predicted code.
- **lang** (str) - Required - The programming language of the code (e.g., 'python', 'c_sharp').
- **weights** (tuple[float,float,float,float]) - Optional - Weights for ngram_match, weighted_ngram_match, syntax_match, and dataflow_match. Defaults to (0.25, 0.25, 0.25, 0.25).
- **tokenizer** (callable) - Optional - A function to tokenize code strings. Defaults to `s.split()`.

### Request Example
```python
import evaluate
metric = evaluate.load("k4black/codebleu")

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = metric.compute([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25), tokenizer=None)
```

### Response
#### Success Response (dict)
- **codebleu** (float) - The final CodeBLEU score.
- **ngram_match_score** (float) - The ngram_match score (BLEU).
- **weighted_ngram_match_score** (float) - The weighted_ngram_match score (BLEU-weighted).
- **syntax_match_score** (float) - The syntax_match score (AST match).
- **dataflow_match_score** (float) - The dataflow_match score (data-flow match).

Each score is in the range [0, 1], where 1 is the best score.

#### Response Example
```json
{
  "codebleu": 0.5537,
  "ngram_match_score": 0.1041,
  "weighted_ngram_match_score": 0.1109,
  "syntax_match_score": 1.0,
  "dataflow_match_score": 1.0
}
```
```

--------------------------------

### Perform Style and Type Checks

Source: https://github.com/k4black/codebleu/blob/main/README.md

Run static analysis and formatting checks to ensure code quality.

```bash
python -m isort codebleu --check
python -m black codebleu --check
python -m ruff codebleu
python -m mypy codebleu
```

--------------------------------

### Run Project Tests

Source: https://github.com/k4black/codebleu/blob/main/README.md

Execute the test suite using pytest.

```bash
python -m pytest
```

--------------------------------

### Evaluate using HuggingFace evaluate library

Source: https://github.com/k4black/codebleu/blob/main/README.md

Alternative usage of CodeBLEU via the HuggingFace evaluate library.

```python
import evaluate
metric = evaluate.load("dvitel/codebleu")

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = metric.compute([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25))
```

--------------------------------

### Calculate CodeBLEU using evaluate library

Source: https://github.com/k4black/codebleu/blob/main/evaluate_app/README.md

Load the CodeBLEU metric using `evaluate.load` and then use the `compute` method. The `lang` parameter is required.

```python
import evaluate
metric = evaluate.load("k4black/codebleu")

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = metric.compute([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25), tokenizer=None)
```

--------------------------------

### Compute Syntax Match Score using corpus_syntax_match

Source: https://context7.com/k4black/codebleu/llms.txt

Demonstrates calculating the syntax match score with the `corpus_syntax_match` function, which compares Abstract Syntax Tree (AST) subtrees between reference and candidate code. `references` is a list of lists (one or more references per sample), `candidates` is a flat list of strings, and `lang` names the programming language.

```python
from codebleu.syntax_match import corpus_syntax_match, calc_syntax_match

# Single pair syntax matching
references = [["def foo ( x ) :\n    return x * x"]]
candidates = ["def bar ( y ) :\n    return y * y"]

syntax_score = corpus_syntax_match(references, candidates, lang="python")
print(f"Syntax Match Score: {syntax_score:.4f}")
# Output: Syntax Match Score: 1.0000 (identical AST structure)

# Different structures
references = [["def foo ( x ) :\n    return x"]]
candidates = ["def bar ( x , y ) :\n    if x > y :\n        return x\n    return y"]

syntax_score = corpus_syntax_match(references, candidates, lang="python")
print(f"Syntax Match Score (different structure): {syntax_score:.4f}")

# Batch evaluation across multiple samples
references_batch = [
    ["int foo ( int x ) { return x ; }"],
    ["public void bar ( ) { System.out.println ( \"hello\" ) ; }"]
]
candidates_batch = [
    "int bar ( int y ) { return y ; }",
    "public void foo ( ) { System.out.println ( \"world\" ) ; }"
]

syntax_score = corpus_syntax_match(references_batch, candidates_batch, lang="java")
print(f"Batch Syntax Score: {syntax_score:.4f}")

# Single reference-candidate pair (convenience function)
score = calc_syntax_match(
    references=["def foo ( x ) : return x"],
    candidate="def bar ( x ) : return x",
    lang="python"
)
print(f"Single Pair Syntax Score: {score:.4f}")
```
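To make the idea of subtree matching concrete, here is a toy sketch that uses Python's standard-library `ast` module as a stand-in for tree-sitter. It is an illustration under stated assumptions, not the library's implementation: `shape`, `subtree_shapes`, and `toy_syntax_match` are hypothetical names, the sketch handles Python source only, and it scores the fraction of reference subtrees whose shape also occurs in the candidate.

```python
import ast


def shape(node):
    """Nested tuple of node type names; identifiers and constants are ignored."""
    return (type(node).__name__, tuple(shape(c) for c in ast.iter_child_nodes(node)))


def subtree_shapes(code):
    """Shapes of every subtree in the parsed module."""
    return [shape(node) for node in ast.walk(ast.parse(code))]


def toy_syntax_match(reference, candidate):
    """Fraction of reference subtrees whose shape also appears in the candidate."""
    ref_shapes = subtree_shapes(reference)
    cand_shapes = subtree_shapes(candidate)
    matched = sum(1 for s in ref_shapes if s in cand_shapes)
    return matched / len(ref_shapes)


same = toy_syntax_match("def foo(x):\n    return x * x", "def bar(y):\n    return y * y")
print(f"Renamed identifiers only: {same:.4f}")

different = toy_syntax_match(
    "def foo(x):\n    return x",
    "def bar(x, y):\n    if x > y:\n        return x\n    return y",
)
print(f"Different structure: {different:.4f}")
```

Because only node types are compared, renaming `foo` to `bar` or `x` to `y` leaves the score unchanged, while adding a branch changes the shapes of the enclosing subtrees and lowers it.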

--------------------------------

### Load Tree-Sitter Language Parsers

Source: https://context7.com/k4black/codebleu/llms.txt

Dynamically loads language parsers for AST analysis and iterates through available languages.

```python
from codebleu.utils import get_tree_sitter_language, AVAILABLE_LANGS
from tree_sitter import Parser

# Load Python language parser
python_lang = get_tree_sitter_language("python")

# Create parser and parse code
parser = Parser()
parser.language = python_lang

code = b"""
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
"""

tree = parser.parse(code)
root_node = tree.root_node

print(f"Root node type: {root_node.type}")
print(f"Children: {[child.type for child in root_node.children]}")

# Parse Java code
java_lang = get_tree_sitter_language("java")
parser.language = java_lang

java_code = b"""
public class Main {
    public static void main(String[] args) {
        System.out.println("Hello");
    }
}
"""

tree = parser.parse(java_code)
print(f"Java root: {tree.root_node.type}")

# Iterate through all supported languages
for lang in AVAILABLE_LANGS:
    try:
        lang_parser = get_tree_sitter_language(lang)
        print(f"✓ {lang} parser loaded successfully")
    except ImportError as e:
        print(f"✗ {lang}: {e}")
```

--------------------------------

### Preprocess Code by Removing Comments and Docstrings

Source: https://context7.com/k4black/codebleu/llms.txt

Strips docstrings and comments from source code so that CodeBLEU compares code logic rather than documentation.

```python
from codebleu.parser import remove_comments_and_docstrings

# Python code with comments and docstrings
python_code = '''
def factorial(n):
    """
    Calculate the factorial of n.

    Args:
        n: A non-negative integer

    Returns:
        The factorial of n
    """
    # Base case
    if n <= 1:
        return 1  # Return 1 for 0 and 1
    # Recursive case
    return n * factorial(n - 1)
'''

cleaned = remove_comments_and_docstrings(python_code, "python")
print("Cleaned Python code:")
print(cleaned)

# Java code with comments
java_code = '''
/**
 * Main application class
 */
public class Calculator {
    // Addition method
    public int add(int a, int b) {
        return a + b; // Return sum
    }
}
'''

cleaned_java = remove_comments_and_docstrings(java_code, "java")
print("\nCleaned Java code:")
print(cleaned_java)

# Use in evaluation pipeline
from codebleu import calc_codebleu

# Code with heavy documentation
ref_with_docs = '''
def add(a, b):
    """Add two numbers together."""
    return a + b
'''

pred_no_docs = '''
def add(x, y):
    return x + y
'''

# Without preprocessing, docstrings affect the score
result = calc_codebleu([ref_with_docs], [pred_no_docs], lang="python")
print(f"\nWith docstrings in ref: {result['codebleu']:.4f}")

# Preprocess both before evaluation
ref_cleaned = remove_comments_and_docstrings(ref_with_docs, "python")
pred_cleaned = remove_comments_and_docstrings(pred_no_docs, "python")

result_cleaned = calc_codebleu([ref_cleaned], [pred_cleaned], lang="python")
print(f"After preprocessing: {result_cleaned['codebleu']:.4f}")
```
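For Python sources, a similar effect can be sketched with the standard library alone. The helper below is a hypothetical stand-in named `strip_docs_and_comments`; it is not the library's `remove_comments_and_docstrings`, it assumes Python 3.9+ for `ast.unparse`, and it reformats the code as a side effect because the source is regenerated from the AST.

```python
import ast


def strip_docs_and_comments(source):
    """Drop docstrings and comments from Python source by round-tripping the AST."""
    tree = ast.parse(source)  # comments are already discarded by the parser
    owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(tree):
        if not isinstance(node, owners) or not node.body:
            continue
        first = node.body[0]
        is_docstring = (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        )
        if is_docstring:
            # Keep the body non-empty so the result is still valid Python
            node.body = node.body[1:] or [ast.Pass()]
    return ast.unparse(tree)


src = '''
def add(a, b):
    """Add two numbers together."""
    # add them
    return a + b  # sum
'''
print(strip_docs_and_comments(src))
```

The printed function keeps only the signature and the `return` statement, which is the part of the code that the preprocessing step is meant to preserve.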

--------------------------------

### AVAILABLE_LANGS - Supported Languages Constant

Source: https://context7.com/k4black/codebleu/llms.txt

A list constant containing all programming languages supported by CodeBLEU for AST parsing and dataflow analysis.

```APIDOC
## AVAILABLE_LANGS - Supported Languages Constant

### Description
A list constant containing all programming languages supported by CodeBLEU for AST parsing and dataflow analysis. Use this to validate language inputs before calling `calc_codebleu`.

### Value
A list of strings, where each string is a supported language identifier.

### Example
```python
from codebleu import AVAILABLE_LANGS, calc_codebleu

print(AVAILABLE_LANGS)
# Output: ['java', 'javascript', 'c_sharp', 'php', 'c', 'cpp', 'python', 'go', 'ruby', 'rust']

# Validate language before evaluation
def evaluate_code(predictions, references, lang):
    if lang not in AVAILABLE_LANGS:
        raise ValueError(f"Language '{lang}' not supported. Choose from: {AVAILABLE_LANGS}")
    return calc_codebleu(references, predictions, lang)
```
```

--------------------------------

### Cite CodeBLEU

Source: https://github.com/k4black/codebleu/blob/main/README.md

BibTeX entry for citing the official CodeBLEU paper.

```bibtex
@misc{ren2020codebleu,
      title={CodeBLEU: a Method for Automatic Evaluation of Code Synthesis}, 
      author={Shuo Ren and Daya Guo and Shuai Lu and Long Zhou and Shujie Liu and Duyu Tang and Neel Sundaresan and Ming Zhou and Ambrosio Blanco and Shuai Ma},
      year={2020},
      eprint={2009.10297},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}
```

--------------------------------

### Evaluate Dataflow Matching

Source: https://context7.com/k4black/codebleu/llms.txt

Computes similarity scores based on dataflow structures. Supports single and batch comparisons across different programming languages.

```python
from codebleu.dataflow_match import corpus_dataflow_match

references = [["def foo ( x ) :\n    y = x * 2\n    return y"]]
candidates = ["def bar ( a ) :\n    b = a * 2\n    return b"]

# Same dataflow structure (variable assigned from parameter, then returned)
dataflow_score = corpus_dataflow_match(references, candidates, lang="python")
print(f"Dataflow Match Score: {dataflow_score:.4f}")
```

```python
# Different dataflow patterns
references = [["def foo ( x ) :\n    return x"]]
candidates = ["def bar ( x ) :\n    y = x * x\n    z = y + 1\n    return z"]

dataflow_score = corpus_dataflow_match(references, candidates, lang="python")
print(f"Different Dataflow Score: {dataflow_score:.4f}")
```

```python
# Java example with complex dataflow
java_refs = [["public int calc ( int x , int y ) { int sum = x + y ; return sum ; }"]]
java_cands = ["public int compute ( int a , int b ) { int result = a + b ; return result ; }"]

dataflow_score = corpus_dataflow_match(java_refs, java_cands, lang="java")
print(f"Java Dataflow Score: {dataflow_score:.4f}")
```

```python
# Batch evaluation
references_batch = [
    ["def add ( a , b ) :\n    return a + b"],
    ["def mult ( x , y ) :\n    result = x * y\n    return result"]
]
candidates_batch = [
    "def sum ( x , y ) :\n    return x + y",
    "def product ( a , b ) :\n    val = a * b\n    return val"
]

dataflow_score = corpus_dataflow_match(references_batch, candidates_batch, lang="python")
print(f"Batch Dataflow Score: {dataflow_score:.4f}")
```
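The reason renamed variables still match can be illustrated with a toy data-flow extractor. The sketch below is not the library's algorithm: it uses Python's `ast` module, looks only at simple assignments, and the names `dataflow_edges` and `toy_dataflow_match` are hypothetical. Variables are renamed to `var_0`, `var_1`, ... in order of first appearance, so only the dependency structure is compared.

```python
import ast


def dataflow_edges(code):
    """(target, sources) pairs for simple assignments, with normalized names."""
    names = {}

    def norm(name):
        # First variable seen becomes var_0, the next var_1, and so on
        return names.setdefault(name, f"var_{len(names)}")

    edges = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Assign):
            used = {norm(n.id) for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            for target in node.targets:
                if isinstance(target, ast.Name):
                    edges.append((norm(target.id), tuple(sorted(used))))
    return edges


def toy_dataflow_match(reference, candidate):
    """Fraction of reference edges that also occur in the candidate."""
    ref_edges = dataflow_edges(reference)
    cand_edges = dataflow_edges(candidate)
    if not ref_edges:
        return 0.0  # assumption of this sketch: nothing to match scores zero
    matched = 0
    for edge in ref_edges:
        if edge in cand_edges:
            matched += 1
            cand_edges.remove(edge)  # each candidate edge may match only once
    return matched / len(ref_edges)


ref = "def foo(x):\n    y = x * 2\n    return y"
cand = "def bar(a):\n    b = a * 2\n    return b"
print(f"Toy dataflow score: {toy_dataflow_match(ref, cand):.4f}")
```

Both functions assign one variable from the parameter, so after normalization their single edge is identical and the toy score is 1.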

--------------------------------

### Calculate Keyword-Weighted BLEU

Source: https://context7.com/k4black/codebleu/llms.txt

Computes the BLEU score based on tokenized references and hypotheses with keyword weighting.

```python
from codebleu.weighted_ngram_match import corpus_bleu as weighted_corpus_bleu

# Keywords receive weight 1.0; every other token receives 0.2
python_keywords = ['def', 'return', 'if', 'else', 'for', 'while', 'class', 'import', 'from', 'in', 'not', 'and', 'or', 'True', 'False', 'None']

def make_weights(tokens, keywords):
    """Create weight dictionary: 1.0 for keywords, 0.2 for others"""
    return {token: 1.0 if token in keywords else 0.2 for token in tokens}

ref_tokens = ['def', 'factorial', '(', 'n', ')', ':', 'if', 'n', '<=', '1', ':', 'return', '1', 'return', 'n', '*', 'factorial', '(', 'n', '-', '1', ')']
hyp_tokens = ['def', 'fact', '(', 'x', ')', ':', 'if', 'x', '<=', '1', ':', 'return', '1', 'return', 'x', '*', 'fact', '(', 'x', '-', '1', ')']

ref_weights = make_weights(ref_tokens, python_keywords)
references_with_weights = [[[ref_tokens, ref_weights]]]

score = weighted_corpus_bleu(references_with_weights, [hyp_tokens])
print(f"Keyword-Weighted BLEU: {score:.4f}")
```

--------------------------------

### CodeBLEU Calculation

Source: https://github.com/k4black/codebleu/blob/main/evaluate_app/README.md

This snippet demonstrates how to calculate the CodeBLEU score using the `calc_codebleu` function from the `codebleu` package.

```APIDOC
## Calculate CodeBLEU Score

### Description
Calculates the CodeBLEU score for given predictions and references.

### Method
`calc_codebleu(references, predictions, lang, weights, tokenizer)`

### Parameters
- **references** (list[str] or list[list[str]]) - Required - The reference code(s).
- **predictions** (list[str]) - Required - The predicted code.
- **lang** (str) - Required - The programming language of the code (e.g., 'python', 'c_sharp').
- **weights** (tuple[float,float,float,float]) - Optional - Weights for ngram_match, weighted_ngram_match, syntax_match, and dataflow_match. Defaults to (0.25, 0.25, 0.25, 0.25).
- **tokenizer** (callable) - Optional - A function to tokenize code strings. Defaults to `s.split()`.

### Request Example
```python
from codebleu import calc_codebleu

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = calc_codebleu([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25), tokenizer=None)
print(result)
```

### Response
#### Success Response (dict)
- **codebleu** (float) - The final CodeBLEU score.
- **ngram_match_score** (float) - The ngram_match score (BLEU).
- **weighted_ngram_match_score** (float) - The weighted_ngram_match score (BLEU-weighted).
- **syntax_match_score** (float) - The syntax_match score (AST match).
- **dataflow_match_score** (float) - The dataflow_match score (data-flow match).

Each score is in the range [0, 1], where 1 is the best score.

#### Response Example
```json
{
  "codebleu": 0.5537,
  "ngram_match_score": 0.1041,
  "weighted_ngram_match_score": 0.1109,
  "syntax_match_score": 1.0,
  "dataflow_match_score": 1.0
}
```
```

--------------------------------

### get_tree_sitter_language

Source: https://context7.com/k4black/codebleu/llms.txt

Dynamically loads and returns the tree-sitter Language object for parsing code in the specified programming language.

```APIDOC
## get_tree_sitter_language

### Description
Dynamically loads and returns the tree-sitter Language object for parsing code in the specified programming language. This is used internally but can also be used directly for custom AST analysis.

### Parameters
#### Arguments
- **lang** (string) - Required - The name of the programming language to load (e.g., "python", "java").

### Response
- **Language** (object) - The tree-sitter Language object for the requested language.
```

--------------------------------

### Validate supported languages with AVAILABLE_LANGS

Source: https://context7.com/k4black/codebleu/llms.txt

Use the AVAILABLE_LANGS constant to ensure the target language is supported before performing evaluation.

```python
from codebleu import AVAILABLE_LANGS, calc_codebleu

# Check available languages
print(AVAILABLE_LANGS)
# Output: ['java', 'javascript', 'c_sharp', 'php', 'c', 'cpp', 'python', 'go', 'ruby', 'rust']

# Validate language before evaluation
def evaluate_code(predictions, references, lang):
    if lang not in AVAILABLE_LANGS:
        raise ValueError(f"Language '{lang}' not supported. Choose from: {AVAILABLE_LANGS}")
    return calc_codebleu(references, predictions, lang)

# Example with Java code
java_pred = "public static int Sign ( double d ) { return ( int ) ( ( d == 0 ) ? 0 : ( d < 0 ) ? -1 : 1 ) ; }"
java_ref = "public static int Sign ( double d ) { return ( int ) ( ( d == 0 ) ? 0 : ( d < 0 ) ? -1 : 1 ) ; }"

result = calc_codebleu([java_ref], [java_pred], lang="java")
print(f"Java CodeBLEU: {result['codebleu']:.4f}")

# Example with JavaScript code
js_pred = "function foo ( x ) { return x }"
js_ref = "function bar ( y ) {\n   return y\n}"

result = calc_codebleu([js_ref], [js_pred], lang="javascript")
print(f"JavaScript CodeBLEU: {result['codebleu']:.4f}")
```

--------------------------------

### calc_codebleu - Main Evaluation Function

Source: https://context7.com/k4black/codebleu/llms.txt

The primary function for computing CodeBLEU scores. It takes reference code and predictions, then returns a dictionary containing the overall CodeBLEU score along with individual component scores.

```APIDOC
## calc_codebleu - Main Evaluation Function

### Description
Computes CodeBLEU scores by comparing predicted code against reference code. It returns a dictionary with the overall CodeBLEU score and individual component scores (n-gram match, weighted n-gram match, syntax match, dataflow match).

### Method
`calc_codebleu`

### Parameters
- **references** (list of str or list of list of str) - Required - A list of reference code strings, or a list of lists of reference strings if multiple references are provided per prediction.
- **predictions** (list of str) - Required - A list of predicted code strings.
- **lang** (str) - Required - The programming language of the code (e.g., 'python', 'java'). Must be one of the languages listed in `AVAILABLE_LANGS`.
- **weights** (tuple of float, optional) - Optional - A tuple of four floats representing the weights for n-gram match, weighted n-gram match, syntax match, and dataflow match, respectively. Defaults to (0.25, 0.25, 0.25, 0.25).
- **tokenizer** (callable, optional) - Optional - A custom tokenizer function that takes a code string and returns a list of tokens. Defaults to `None`, which uses a whitespace tokenizer.

### Request Example
```python
from codebleu import calc_codebleu

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = calc_codebleu(
    references=[reference],
    predictions=[prediction],
    lang="python",
    weights=(0.25, 0.25, 0.25, 0.25),
    tokenizer=None
)
print(result)
```

### Response
#### Success Response (dict)
- **codebleu** (float) - The overall CodeBLEU score.
- **ngram_match_score** (float) - The score for standard n-gram matching.
- **weighted_ngram_match_score** (float) - The score for keyword-weighted n-gram matching.
- **syntax_match_score** (float) - The score for Abstract Syntax Tree (AST) matching.
- **dataflow_match_score** (float) - The score for data-flow analysis matching.

#### Response Example
```json
{
    "codebleu": 0.5537,
    "ngram_match_score": 0.1041,
    "weighted_ngram_match_score": 0.1109,
    "syntax_match_score": 1.0,
    "dataflow_match_score": 1.0
}
```
```
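Since `references` may be either a flat list of strings or a list of lists, callers sometimes find it convenient to convert both shapes to the list-of-lists form before further processing. A minimal sketch of that conversion follows; `normalize_references` is a hypothetical helper, not a function exported by `codebleu`.

```python
def normalize_references(references):
    """Wrap each bare string so every prediction has a list of references."""
    return [[ref] if isinstance(ref, str) else list(ref) for ref in references]


single = ["def foo ( x ) : pass"]
multiple = [["def bar ( x ) : pass", "def foo ( x ) : pass"]]

print(normalize_references(single))    # [['def foo ( x ) : pass']]
print(normalize_references(multiple))  # [['def bar ( x ) : pass', 'def foo ( x ) : pass']]
```

After normalization, element `i` always holds the list of acceptable references for prediction `i`, whichever form the caller supplied.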

--------------------------------

### Compute Keyword-Weighted BLEU

Source: https://context7.com/k4black/codebleu/llms.txt

Calculates BLEU scores where specific tokens (like language keywords) are assigned higher importance weights.

```python
from codebleu.weighted_ngram_match import corpus_bleu as weighted_corpus_bleu

# Tokenized code with weights
# Format: [[tokens, weights_dict], ...]
hypothesis = ['def', 'foo', '(', 'x', ')', ':', 'return', 'x']

# Reference with keyword weights (keywords like 'def', 'return' have weight 1.0, others 0.2)
reference_tokens = ['def', 'bar', '(', 'y', ')', ':', 'return', 'y']
weights = {
    'def': 1.0, 'return': 1.0,  # Keywords
    'bar': 0.2, 'y': 0.2, '(': 0.2, ')': 0.2, ':': 0.2  # Non-keywords
}

references_with_weights = [[[reference_tokens, weights]]]
hypotheses = [hypothesis]

weighted_score = weighted_corpus_bleu(references_with_weights, hypotheses)
print(f"Weighted N-gram Score: {weighted_score:.4f}")

# Practical example with Python keywords
python_keywords = ['def', 'return', 'if', 'else', 'for', 'while', 'class', 'import', 'from', 'in', 'not', 'and', 'or', 'True', 'False', 'None']

def make_weights(tokens, keywords):
    """Create weight dictionary: 1.0 for keywords, 0.2 for others"""
    return {token: 1.0 if token in keywords else 0.2 for token in tokens}
```
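To see what the keyword weights change, the following simplified sketch computes a weighted unigram overlap by hand. It is an illustration only, not the formula used by `weighted_ngram_match`: `weighted_unigram_overlap` is a hypothetical helper, it looks at unigrams only, and tokens missing from the weight dictionary are assumed to have weight 1.0.

```python
from collections import Counter


def weighted_unigram_overlap(reference, hypothesis, weights):
    """Weighted share of reference tokens that the hypothesis reproduces."""
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    # A reference token counts at most as often as it appears in the hypothesis
    matched = sum(weights.get(tok, 1.0) * min(n, hyp_counts[tok]) for tok, n in ref_counts.items())
    total = sum(weights.get(tok, 1.0) * n for tok, n in ref_counts.items())
    return matched / total if total else 0.0


reference = ['def', 'bar', '(', 'y', ')', ':', 'return', 'y']
hypothesis = ['def', 'foo', '(', 'x', ')', ':', 'return', 'x']
keyword_weights = {
    'def': 1.0, 'return': 1.0,
    'bar': 0.2, 'y': 0.2, '(': 0.2, ')': 0.2, ':': 0.2,
}

print(f"Unweighted: {weighted_unigram_overlap(reference, hypothesis, {}):.4f}")  # 0.6250
print(f"Keyword-weighted: {weighted_unigram_overlap(reference, hypothesis, keyword_weights):.4f}")  # 0.8125
```

The hypothesis misses only identifiers (`bar`, `y`), which carry low weight, so the keyword-weighted overlap is higher than the unweighted one.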

--------------------------------

### Compute CodeBLEU scores with calc_codebleu

Source: https://context7.com/k4black/codebleu/llms.txt

The primary function for evaluating code similarity. It supports single or multiple references, custom weights, and user-defined tokenizers.

```python
from codebleu import calc_codebleu

# Basic usage: Compare predicted code against reference code
prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = calc_codebleu(
    references=[reference],
    predictions=[prediction],
    lang="python",
    weights=(0.25, 0.25, 0.25, 0.25),  # Equal weights for all components
    tokenizer=None  # Uses default whitespace tokenizer
)

print(result)
# Output:
# {
#     'codebleu': 0.5537,
#     'ngram_match_score': 0.1041,
#     'weighted_ngram_match_score': 0.1109,
#     'syntax_match_score': 1.0,
#     'dataflow_match_score': 1.0
# }

# Multiple predictions and references
predictions = [
    "def foo ( x ) :\n    return x * x",
    "def bar ( x ) :\n    return x"
]
references = [
    "def bar ( x ) :\n    return x",
    "def foo ( x ) :\n    return x"
]

result = calc_codebleu(references, predictions, lang="python")
print(f"CodeBLEU: {result['codebleu']:.4f}")

# Using multiple references per prediction (list of lists format)
predictions = ["def foo ( x ) : pass"]
references = [["def bar ( x ) : pass", "def foo ( x ) : pass"]]  # Multiple valid references

result = calc_codebleu(references, predictions, lang="python")
print(f"CodeBLEU with multiple refs: {result['codebleu']:.4f}")

# Custom weights emphasizing syntax correctness
result = calc_codebleu(
    references=["def foo ( x ) :\n    return x"],
    predictions=["def bar ( x ) :\n    return x"],
    lang="python",
    weights=(0.1, 0.1, 0.4, 0.4)  # Higher weight for syntax and dataflow
)
print(f"Syntax-weighted CodeBLEU: {result['codebleu']:.4f}")

# Custom tokenizer for specialized tokenization
def custom_tokenizer(code):
    """Custom tokenizer that handles camelCase splitting"""
    import re
    tokens = code.split()
    expanded = []
    for token in tokens:
        # Split camelCase
        parts = re.findall(r'[A-Z]?[a-z]+|[A-Z]+(?=[A-Z]|$)', token)
        expanded.extend(parts if parts else [token])
    return expanded

result = calc_codebleu(
    references=["def calculateSum ( numbers ) :\n    return sum ( numbers )"],
    predictions=["def computeTotal ( values ) :\n    return sum ( values )"],
    lang="python",
    tokenizer=custom_tokenizer
)
```

--------------------------------

### Compute Dataflow Match Score using corpus_dataflow_match

Source: https://context7.com/k4black/codebleu/llms.txt

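Compares the data-flow graphs (DFGs) of reference and candidate code with `corpus_dataflow_match`, so that snippets with the same variable dependencies score highly even when identifiers differ.

```python
from codebleu.dataflow_match import corpus_dataflow_match, calc_dataflow_match

references = [["def foo ( x ) :\n    y = x * 2\n    return y"]]
candidates = ["def bar ( a ) :\n    b = a * 2\n    return b"]

dataflow_score = corpus_dataflow_match(references, candidates, lang="python")
print(f"Dataflow Match Score: {dataflow_score:.4f}")
```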

--------------------------------

### calc_codebleu

Source: https://github.com/k4black/codebleu/blob/main/README.md

Calculates the CodeBLEU score by comparing reference code against predicted code using weighted metrics.

```APIDOC
## calc_codebleu

### Description
Calculates the CodeBLEU score, which is a weighted combination of n-gram match, weighted n-gram match, AST match, and data-flow match scores.

### Parameters
- **references** (list[str] or list[list[str]]) - Required - The reference code snippets.
- **predictions** (list[str]) - Required - The predicted code snippets.
- **lang** (str) - Required - The programming language of the code (e.g., python, c_sharp, c, cpp, javascript, java, php, go, ruby, rust).
- **weights** (tuple[float, float, float, float]) - Optional - Weights for ngram_match, weighted_ngram_match, syntax_match, and dataflow_match. Defaults to (0.25, 0.25, 0.25, 0.25).
- **tokenizer** (callable) - Optional - Function to split code string into tokens. Defaults to s.split().

### Response
#### Success Response (dict)
- **codebleu** (float) - The final CodeBLEU score.
- **ngram_match_score** (float) - The BLEU score.
- **weighted_ngram_match_score** (float) - The weighted BLEU score.
- **syntax_match_score** (float) - The AST match score.
- **dataflow_match_score** (float) - The data-flow match score.

### Response Example
{
  "codebleu": 0.5537,
  "ngram_match_score": 0.1041,
  "weighted_ngram_match_score": 0.1109,
  "syntax_match_score": 1.0,
  "dataflow_match_score": 1.0
}
```
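The weighted combination described above can be checked by hand against the response example. The sketch below recomputes the final score from the four component scores; `combine` is a hypothetical helper for illustration, and the result differs from the reported value only in the last digit because the inputs are already rounded.

```python
COMPONENTS = (
    "ngram_match_score",
    "weighted_ngram_match_score",
    "syntax_match_score",
    "dataflow_match_score",
)


def combine(scores, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four component scores, in the order of COMPONENTS."""
    return sum(w * scores[name] for w, name in zip(weights, COMPONENTS))


example = {
    "ngram_match_score": 0.1041,
    "weighted_ngram_match_score": 0.1109,
    "syntax_match_score": 1.0,
    "dataflow_match_score": 1.0,
}

print(f"Equal weights: {combine(example):.3f}")
print(f"Syntax/dataflow emphasis: {combine(example, (0.1, 0.1, 0.4, 0.4)):.3f}")
```

Shifting weight toward the syntax and data-flow components raises the score for this pair, since the two snippets differ only in identifier names.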

--------------------------------

### Compute N-gram BLEU Scores

Source: https://context7.com/k4black/codebleu/llms.txt

Calculates standard BLEU scores for code evaluation, including support for custom weights and smoothing functions for short sequences.

```python
from codebleu.bleu import corpus_bleu, sentence_bleu, SmoothingFunction

# Tokenized code for BLEU computation
hypothesis = ['def', 'foo', '(', 'x', ')', ':', 'return', 'x']
reference = ['def', 'bar', '(', 'x', ')', ':', 'return', 'x']

# Sentence-level BLEU
score = sentence_bleu([reference], hypothesis)
print(f"Sentence BLEU: {score:.4f}")

# Corpus-level BLEU with multiple samples
hypotheses = [
    ['def', 'add', '(', 'a', ',', 'b', ')', ':', 'return', 'a', '+', 'b'],
    ['def', 'mult', '(', 'x', ',', 'y', ')', ':', 'return', 'x', '*', 'y']
]
list_of_references = [
    [['def', 'sum', '(', 'x', ',', 'y', ')', ':', 'return', 'x', '+', 'y']],
    [['def', 'product', '(', 'a', ',', 'b', ')', ':', 'return', 'a', '*', 'b']]
]

corpus_score = corpus_bleu(list_of_references, hypotheses)
print(f"Corpus BLEU: {corpus_score:.4f}")

# Custom weights for different n-gram orders
# BLEU-2 (unigrams and bigrams only)
weights_bleu2 = (0.5, 0.5, 0.0, 0.0)
score = sentence_bleu([reference], hypothesis, weights=weights_bleu2)
print(f"BLEU-2 Score: {score:.4f}")

# Using smoothing for short sequences
chencherry = SmoothingFunction()
short_hyp = ['return', 'x']
short_ref = ['return', 'y']

# Without smoothing, may return 0 for short sequences
score_no_smooth = sentence_bleu([short_ref], short_hyp)
print(f"Without smoothing: {score_no_smooth:.4f}")

# With smoothing (method1 adds epsilon to zero counts)
score_smoothed = sentence_bleu([short_ref], short_hyp, smoothing_function=chencherry.method1)
print(f"With smoothing: {score_smoothed:.4f}")
```
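BLEU is built on modified (clipped) n-gram precision: each hypothesis n-gram is credited at most as many times as it occurs in the reference. The standalone sketch below computes that quantity for a single reference using only the standard library; `ngrams` and `modified_precision` are illustrative names here, not imports from `codebleu.bleu`, and the brevity penalty and geometric mean across orders are left out.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, hypothesis, n):
    """Clipped n-gram precision of the hypothesis against one reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    # Clip each hypothesis count by how often the n-gram occurs in the reference
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return clipped / total if total else 0.0


hypothesis = ['def', 'foo', '(', 'x', ')', ':', 'return', 'x']
reference = ['def', 'bar', '(', 'x', ')', ':', 'return', 'x']

print(f"Unigram precision: {modified_precision(reference, hypothesis, 1):.4f}")  # 0.8750
print(f"Bigram precision: {modified_precision(reference, hypothesis, 2):.4f}")   # 0.7143
```

Only the function name differs between the two token lists, so 7 of 8 unigrams and 5 of 7 bigrams match.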

--------------------------------

### parser.remove_comments_and_docstrings

Source: https://context7.com/k4black/codebleu/llms.txt

Removes comments and docstrings from source code before evaluation to focus on actual code logic.

```APIDOC
## parser.remove_comments_and_docstrings

### Description
Removes comments and docstrings from source code before evaluation. This ensures that documentation differences don't affect the CodeBLEU score, focusing the evaluation on actual code logic.

### Parameters
#### Arguments
- **source_code** (string) - Required - The raw source code string.
- **lang** (string) - Required - The programming language of the source code.

### Response
- **cleaned_code** (string) - The source code with all comments and docstrings removed.
```
