### Install and Use CodeBLEU CLI

Source: https://context7.com/k4black/codebleu/llms.txt

Provides instructions for installing CodeBLEU and its language-specific parsers, along with examples of basic command-line usage for evaluating code files. Supports multiple reference files and custom weights.

```bash
# Installation
pip install codebleu

# Install language-specific tree-sitter parser
pip install tree-sitter-python

# Or install all supported languages at once
pip install codebleu[all]

# Basic command-line usage
# Create reference file (references.txt):
# def foo ( x ) :
# return x
# Create hypothesis file (predictions.txt):
# def bar ( x ) :
# return x
codebleu --refs references.txt --hyp predictions.txt --lang python
# Output:
# ngram_match: 0.6514
# weighted_ngram_match: 0.6585
# syntax_match: 1.0
# dataflow_match: 1.0
# CodeBLEU score: 0.8275

# With multiple reference files
codebleu --refs ref1.txt ref2.txt --hyp predictions.txt --lang java

# With custom weights (alpha, beta, gamma, theta)
codebleu --refs references.txt --hyp predictions.txt --lang python --params 0.1,0.1,0.4,0.4

# Using as Python module from command line
python -m codebleu --refs references.txt --hyp predictions.txt --lang python
```

--------------------------------

### Install CodeBLEU via pip

Source: https://github.com/k4black/codebleu/blob/main/README.md

Commands to install the CodeBLEU package from PyPI or directly from the repository.

```bash
pip install codebleu
```

```bash
pip install git+https://github.com/k4black/codebleu.git
```

--------------------------------

### Install Development Dependencies

Source: https://github.com/k4black/codebleu/blob/main/README.md

Install the library with all precompiled languages and test extras. Requires an internet connection to download tree-sitter dependencies.
```bash
python -m pip install -e .[all,test]
```

```bash
python -m pip install -e .\[all,test\]  # for macOS (zsh needs the brackets escaped)
```

--------------------------------

### Install tree-sitter language dependencies

Source: https://github.com/k4black/codebleu/blob/main/README.md

Commands to install specific or all tree-sitter language parsers required for AST matching.

```bash
pip install tree-sitter-python
```

```bash
pip install codebleu[all]
```

```bash
pip install git+https://github.com/tree-sitter/tree-sitter-python.git
```

--------------------------------

### Load and Use CodeBLEU Metric with HuggingFace Evaluate

Source: https://context7.com/k4black/codebleu/llms.txt

Shows how to load the CodeBLEU metric using HuggingFace's `evaluate` library and compute scores for single predictions and batch evaluations. Requires the `evaluate` library and the `codebleu` package to be installed.

```python
import evaluate

# Load the CodeBLEU metric from HuggingFace Hub
metric = evaluate.load("dvitel/codebleu")

# Single prediction evaluation
ref = "def sum ( first , second ) :\n return second + first"
pred = "def add ( a , b ) :\n return a + b"

results = metric.compute(
    references=[ref],
    predictions=[pred],
    lang=["python"],  # Note: must be a list
    weights=(0.25, 0.25, 0.25, 0.25)
)
print(results)
# Output:
# {
#   'codebleu': 0.5537,
#   'ngram_match_score': 0.1041,
#   'weighted_ngram_match_score': 0.1109,
#   'syntax_match_score': 1.0,
#   'dataflow_match_score': 1.0
# }

# Batch evaluation for model comparison
model_outputs = [
    "def factorial ( n ) :\n    if n <= 1 :\n        return 1\n    return n * factorial ( n - 1 )",
    "def fibonacci ( n ) :\n    if n <= 1 :\n        return n\n    return fibonacci ( n - 1 ) + fibonacci ( n - 2 )"
]
ground_truth = [
    "def factorial ( n ) :\n    result = 1\n    for i in range ( 1 , n + 1 ) :\n        result *= i\n    return result",
    "def fib ( n ) :\n    if n <= 1 :\n        return n\n    return fib ( n - 1 ) + fib ( n - 2 )"
]

results = metric.compute(
    references=ground_truth,
    predictions=model_outputs,
    lang=["python"]
)
print(f"Batch CodeBLEU: {results['codebleu']:.4f}")
```

--------------------------------

### Calculate CodeBLEU for Rust

Source: https://context7.com/k4black/codebleu/llms.txt

Demonstrates calculating the CodeBLEU score for Rust code snippets using the `calc_codebleu` function. Ensure the `codebleu` library and the Rust tree-sitter parser are installed.

```python
from codebleu import calc_codebleu

# Rust function parameters need type annotations to parse cleanly
rust_pred = "fn foo ( x : i32 ) -> i32 { x }"
rust_ref = "fn bar ( y : i32 ) -> i32 { y }"

result = calc_codebleu([rust_ref], [rust_pred], lang="rust")
print(f"Rust CodeBLEU: {result['codebleu']:.4f}")
```

--------------------------------

### Calculate CodeBLEU score

Source: https://github.com/k4black/codebleu/blob/main/README.md

Example usage of the calc_codebleu function to evaluate code similarity between a prediction and a reference.

```python
from codebleu import calc_codebleu

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = calc_codebleu([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25), tokenizer=None)
print(result)
# {
#   'codebleu': 0.5537,
#   'ngram_match_score': 0.1041,
#   'weighted_ngram_match_score': 0.1109,
#   'syntax_match_score': 1.0,
#   'dataflow_match_score': 1.0
# }
```

--------------------------------

### Calculate CodeBLEU using pip package

Source: https://github.com/k4black/codebleu/blob/main/evaluate_app/README.md

Use `calc_codebleu` from the `codebleu` package to compute the CodeBLEU score. Ensure the tree-sitter parser for the target language is installed. The `lang` parameter is required.
```python
from codebleu import calc_codebleu

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = calc_codebleu([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25), tokenizer=None)
print(result)
```

```json
{
  "codebleu": 0.5537,
  "ngram_match_score": 0.1041,
  "weighted_ngram_match_score": 0.1109,
  "syntax_match_score": 1.0,
  "dataflow_match_score": 1.0
}
```

--------------------------------

### Clone the CodeBLEU Repository

Source: https://github.com/k4black/codebleu/blob/main/README.md

Use this command to create a local copy of the project from GitHub.

```bash
git clone https://github.com/k4black/codebleu
```

--------------------------------

### CodeBLEU with Evaluate Library

Source: https://github.com/k4black/codebleu/blob/main/evaluate_app/README.md

This snippet shows how to load and use the CodeBLEU metric via the `evaluate` library.

```APIDOC
## Load and Use CodeBLEU with Evaluate Library

### Description
Loads the CodeBLEU metric using the `evaluate` library and computes the score.

### Method
`evaluate.load("k4black/codebleu").compute(references, predictions, lang, weights, tokenizer)`

### Parameters
#### Arguments
- **references** (list[str] or list[list[str]]) - Required - The reference code(s).
- **predictions** (list[str]) - Required - The predicted code.
- **lang** (str) - Required - The programming language of the code (e.g., 'python', 'c_sharp').
- **weights** (tuple[float,float,float,float]) - Optional - Weights for ngram_match, weighted_ngram_match, syntax_match, and dataflow_match. Defaults to (0.25, 0.25, 0.25, 0.25).
- **tokenizer** (callable) - Optional - A function to tokenize code strings. Defaults to `s.split()`.
### Request Example
```python
import evaluate

metric = evaluate.load("k4black/codebleu")

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = metric.compute([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25), tokenizer=None)
```

### Response
#### Success Response (dict)
- **codebleu** (float) - The final CodeBLEU score.
- **ngram_match_score** (float) - The ngram_match score (BLEU).
- **weighted_ngram_match_score** (float) - The weighted_ngram_match score (BLEU-weighted).
- **syntax_match_score** (float) - The syntax_match score (AST match).
- **dataflow_match_score** (float) - The dataflow_match score (data-flow match).

Each score is in the range [0, 1], where 1 is the best score.

#### Response Example
```json
{
  "codebleu": 0.5537,
  "ngram_match_score": 0.1041,
  "weighted_ngram_match_score": 0.1109,
  "syntax_match_score": 1.0,
  "dataflow_match_score": 1.0
}
```
```

--------------------------------

### Perform Style and Type Checks

Source: https://github.com/k4black/codebleu/blob/main/README.md

Run static analysis and formatting checks to ensure code quality.

```bash
python -m isort codebleu --check
python -m black codebleu --check
python -m ruff check codebleu
python -m mypy codebleu
```

--------------------------------

### Run Project Tests

Source: https://github.com/k4black/codebleu/blob/main/README.md

Execute the test suite using pytest.

```bash
python -m pytest
```

--------------------------------

### Evaluate using HuggingFace evaluate library

Source: https://github.com/k4black/codebleu/blob/main/README.md

Alternative usage of CodeBLEU via the HuggingFace evaluate library (the `codebleu` package must still be installed).
```python
import evaluate

metric = evaluate.load("dvitel/codebleu")

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = metric.compute([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25))
```

--------------------------------

### Calculate CodeBLEU using evaluate library

Source: https://github.com/k4black/codebleu/blob/main/evaluate_app/README.md

Load the CodeBLEU metric using `evaluate.load` and then use the `compute` method. The `lang` parameter is required.

```python
import evaluate

metric = evaluate.load("k4black/codebleu")

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = metric.compute([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25), tokenizer=None)
```

--------------------------------

### Compute Syntax Match Score using corpus_syntax_match

Source: https://context7.com/k4black/codebleu/llms.txt

Demonstrates calculating the syntax match score by comparing Abstract Syntax Tree (AST) subtrees between reference and candidate code using the `corpus_syntax_match` function. `references` is a list of lists of strings (one inner list of references per candidate), `candidates` is a list of strings, and `lang` names the programming language.
```python
from codebleu.syntax_match import corpus_syntax_match, calc_syntax_match

# Single pair syntax matching
references = [["def foo ( x ) :\n return x * x"]]
candidates = ["def bar ( y ) :\n return y * y"]

syntax_score = corpus_syntax_match(references, candidates, lang="python")
print(f"Syntax Match Score: {syntax_score:.4f}")
# Output: Syntax Match Score: 1.0000 (identical AST structure)

# Different structures
references = [["def foo ( x ) :\n return x"]]
candidates = ["def bar ( x , y ) :\n    if x > y :\n        return x\n    return y"]

syntax_score = corpus_syntax_match(references, candidates, lang="python")
print(f"Syntax Match Score (different structure): {syntax_score:.4f}")

# Batch evaluation across multiple samples
references_batch = [
    ["int foo ( int x ) { return x ; }"],
    ["public void bar ( ) { System.out.println ( \"hello\" ) ; }"]
]
candidates_batch = [
    "int bar ( int y ) { return y ; }",
    "public void foo ( ) { System.out.println ( \"world\" ) ; }"
]

syntax_score = corpus_syntax_match(references_batch, candidates_batch, lang="java")
print(f"Batch Syntax Score: {syntax_score:.4f}")

# Single reference-candidate pair (convenience function)
score = calc_syntax_match(
    references=["def foo ( x ) : return x"],
    candidate="def bar ( x ) : return x",
    lang="python"
)
print(f"Single Pair Syntax Score: {score:.4f}")
```

--------------------------------

### Load Tree-Sitter Language Parsers

Source: https://context7.com/k4black/codebleu/llms.txt

Dynamically loads language parsers for AST analysis and iterates through available languages.
```python
from codebleu.utils import get_tree_sitter_language, AVAILABLE_LANGS
from tree_sitter import Parser

# Load Python language parser
python_lang = get_tree_sitter_language("python")

# Create parser and parse code
parser = Parser()
parser.language = python_lang

code = b"""
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
"""
tree = parser.parse(code)
root_node = tree.root_node
print(f"Root node type: {root_node.type}")
print(f"Children: {[child.type for child in root_node.children]}")

# Parse Java code
java_lang = get_tree_sitter_language("java")
parser.language = java_lang
java_code = b"""
public class Main {
    public static void main(String[] args) {
        System.out.println("Hello");
    }
}
"""
tree = parser.parse(java_code)
print(f"Java root: {tree.root_node.type}")

# Iterate through all supported languages
for lang in AVAILABLE_LANGS:
    try:
        lang_parser = get_tree_sitter_language(lang)
        print(f"✓ {lang} parser loaded successfully")
    except ImportError as e:
        print(f"✗ {lang}: {e}")
```

--------------------------------

### Preprocess Code by Removing Comments and Docstrings

Source: https://context7.com/k4black/codebleu/llms.txt

Strips comments and docstrings from source code so that documentation differences do not affect the CodeBLEU score and the evaluation focuses on code logic.

```python
from codebleu.parser import remove_comments_and_docstrings

# Python code with comments and docstrings
python_code = '''
def factorial(n):
    """
    Calculate the factorial of n.
    Args:
        n: A non-negative integer

    Returns:
        The factorial of n
    """
    # Base case
    if n <= 1:
        return 1  # Return 1 for 0 and 1
    # Recursive case
    return n * factorial(n - 1)
'''

cleaned = remove_comments_and_docstrings(python_code, "python")
print("Cleaned Python code:")
print(cleaned)

# Java code with comments
java_code = '''
/**
 * Main application class
 */
public class Calculator {
    // Addition method
    public int add(int a, int b) {
        return a + b;  // Return sum
    }
}
'''

cleaned_java = remove_comments_and_docstrings(java_code, "java")
print("\nCleaned Java code:")
print(cleaned_java)

# Use in evaluation pipeline
from codebleu import calc_codebleu

# Code with heavy documentation
ref_with_docs = '''
def add(a, b):
    """Add two numbers together."""
    return a + b
'''

pred_no_docs = '''
def add(x, y):
    return x + y
'''

# Without preprocessing, docstrings affect the score
result = calc_codebleu([ref_with_docs], [pred_no_docs], lang="python")
print(f"\nWith docstrings in ref: {result['codebleu']:.4f}")

# Preprocess both before evaluation
ref_cleaned = remove_comments_and_docstrings(ref_with_docs, "python")
pred_cleaned = remove_comments_and_docstrings(pred_no_docs, "python")
result_cleaned = calc_codebleu([ref_cleaned], [pred_cleaned], lang="python")
print(f"After preprocessing: {result_cleaned['codebleu']:.4f}")
```

--------------------------------

### AVAILABLE_LANGS - Supported Languages Constant

Source: https://context7.com/k4black/codebleu/llms.txt

A list constant containing all programming languages supported by CodeBLEU for AST parsing and dataflow analysis.

```APIDOC
## AVAILABLE_LANGS - Supported Languages Constant

### Description
A list constant containing all programming languages supported by CodeBLEU for AST parsing and dataflow analysis. Use this to validate language inputs before calling `calc_codebleu`.

### Value
A list of strings, where each string is a supported language identifier.
### Example
```python
from codebleu import AVAILABLE_LANGS, calc_codebleu

print(AVAILABLE_LANGS)
# Output: ['java', 'javascript', 'c_sharp', 'php', 'c', 'cpp', 'python', 'go', 'ruby', 'rust']

# Validate language before evaluation
def evaluate_code(predictions, references, lang):
    if lang not in AVAILABLE_LANGS:
        raise ValueError(f"Language '{lang}' not supported. Choose from: {AVAILABLE_LANGS}")
    return calc_codebleu(references, predictions, lang)
```
```

--------------------------------

### Cite CodeBLEU

Source: https://github.com/k4black/codebleu/blob/main/README.md

BibTeX entry for citing the official CodeBLEU paper.

```bibtex
@misc{ren2020codebleu,
      title={CodeBLEU: a Method for Automatic Evaluation of Code Synthesis},
      author={Shuo Ren and Daya Guo and Shuai Lu and Long Zhou and Shujie Liu and Duyu Tang and Neel Sundaresan and Ming Zhou and Ambrosio Blanco and Shuai Ma},
      year={2020},
      eprint={2009.10297},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}
```

--------------------------------

### Evaluate Dataflow Matching

Source: https://context7.com/k4black/codebleu/llms.txt

Computes similarity scores by comparing the data-flow structures (variable dependencies) of reference and candidate code. Supports single and batch comparisons across different programming languages.
```python
from codebleu.dataflow_match import corpus_dataflow_match

references = [["def foo ( x ) :\n y = x * 2\n return y"]]
candidates = ["def bar ( a ) :\n b = a * 2\n return b"]

# Same dataflow structure (variable assigned from parameter, then returned)
dataflow_score = corpus_dataflow_match(references, candidates, lang="python")
print(f"Dataflow Match Score: {dataflow_score:.4f}")
```

```python
from codebleu.dataflow_match import corpus_dataflow_match

# Different dataflow patterns
references = [["def foo ( x ) :\n return x"]]
candidates = ["def bar ( x ) :\n y = x * x\n z = y + 1\n return z"]

dataflow_score = corpus_dataflow_match(references, candidates, lang="python")
print(f"Different Dataflow Score: {dataflow_score:.4f}")
```

```python
from codebleu.dataflow_match import corpus_dataflow_match

# Java example: same dataflow with renamed variables
java_refs = [["public int calc ( int x , int y ) { int sum = x + y ; return sum ; }"]]
java_cands = ["public int compute ( int a , int b ) { int result = a + b ; return result ; }"]

dataflow_score = corpus_dataflow_match(java_refs, java_cands, lang="java")
print(f"Java Dataflow Score: {dataflow_score:.4f}")
```

```python
from codebleu.dataflow_match import corpus_dataflow_match

# Batch evaluation
references_batch = [
    ["def add ( a , b ) :\n return a + b"],
    ["def mult ( x , y ) :\n result = x * y\n return result"]
]
candidates_batch = [
    "def sum ( x , y ) :\n return x + y",
    "def product ( a , b ) :\n val = a * b\n return val"
]

dataflow_score = corpus_dataflow_match(references_batch, candidates_batch, lang="python")
print(f"Batch Dataflow Score: {dataflow_score:.4f}")
```

--------------------------------

### Calculate Keyword-Weighted BLEU

Source: https://context7.com/k4black/codebleu/llms.txt

Computes the BLEU score based on tokenized references and hypotheses with keyword weighting.
```python
from codebleu.weighted_ngram_match import corpus_bleu as weighted_corpus_bleu

# Keyword list and weight helper: 1.0 for keywords, 0.2 for other tokens
python_keywords = ['def', 'return', 'if', 'else', 'for', 'while', 'class', 'import',
                   'from', 'in', 'not', 'and', 'or', 'True', 'False', 'None']

def make_weights(tokens, keywords):
    return {token: 1.0 if token in keywords else 0.2 for token in tokens}

ref_tokens = ['def', 'factorial', '(', 'n', ')', ':', 'if', 'n', '<=', '1', ':', 'return', '1', 'return', 'n', '*', 'factorial', '(', 'n', '-', '1', ')']
hyp_tokens = ['def', 'fact', '(', 'x', ')', ':', 'if', 'x', '<=', '1', ':', 'return', '1', 'return', 'x', '*', 'fact', '(', 'x', '-', '1', ')']

ref_weights = make_weights(ref_tokens, python_keywords)
references_with_weights = [[[ref_tokens, ref_weights]]]

score = weighted_corpus_bleu(references_with_weights, [hyp_tokens])
print(f"Keyword-Weighted BLEU: {score:.4f}")
```

--------------------------------

### CodeBLEU Calculation

Source: https://github.com/k4black/codebleu/blob/main/evaluate_app/README.md

This snippet demonstrates how to calculate the CodeBLEU score using the `calc_codebleu` function from the `codebleu` package.

```APIDOC
## Calculate CodeBLEU Score

### Description
Calculates the CodeBLEU score for given predictions and references.

### Method
`calc_codebleu(references, predictions, lang, weights, tokenizer)`

### Parameters
#### Arguments
- **references** (list[str] or list[list[str]]) - Required - The reference code(s).
- **predictions** (list[str]) - Required - The predicted code.
- **lang** (str) - Required - The programming language of the code (e.g., 'python', 'c_sharp').
- **weights** (tuple[float,float,float,float]) - Optional - Weights for ngram_match, weighted_ngram_match, syntax_match, and dataflow_match. Defaults to (0.25, 0.25, 0.25, 0.25).
- **tokenizer** (callable) - Optional - A function to tokenize code strings. Defaults to `s.split()`.
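### Tokenization Note (sketch)
Because the default tokenizer is plain whitespace splitting, the examples in this document write code with spaces around punctuation (`def add ( a , b ) :`). The sketch below illustrates the difference. `regex_tokenizer` is a hypothetical helper, not part of `codebleu`, and its regular expression is an illustrative assumption; a callable like it could be passed as `tokenizer=`.

```python
import re

def whitespace_tokenizer(code: str) -> list[str]:
    # Mirrors the documented default tokenizer: s.split()
    return code.split()

# Hypothetical alternative: word-like runs, or single punctuation characters
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def regex_tokenizer(code: str) -> list[str]:
    return _TOKEN_RE.findall(code)

compact = "def add(a, b):"
spaced = "def add ( a , b ) :"

print(whitespace_tokenizer(compact))  # ['def', 'add(a,', 'b):']
print(whitespace_tokenizer(spaced))   # ['def', 'add', '(', 'a', ',', 'b', ')', ':']
print(regex_tokenizer(compact))       # ['def', 'add', '(', 'a', ',', 'b', ')', ':']
```

This simple pattern splits multi-character operators such as `<=` into separate tokens, so it is only a starting point.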
### Request Example
```python
from codebleu import calc_codebleu

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = calc_codebleu([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25), tokenizer=None)
print(result)
```

### Response
#### Success Response (dict)
- **codebleu** (float) - The final CodeBLEU score.
- **ngram_match_score** (float) - The ngram_match score (BLEU).
- **weighted_ngram_match_score** (float) - The weighted_ngram_match score (BLEU-weighted).
- **syntax_match_score** (float) - The syntax_match score (AST match).
- **dataflow_match_score** (float) - The dataflow_match score (data-flow match).

Each score is in the range [0, 1], where 1 is the best score.

#### Response Example
```json
{
  "codebleu": 0.5537,
  "ngram_match_score": 0.1041,
  "weighted_ngram_match_score": 0.1109,
  "syntax_match_score": 1.0,
  "dataflow_match_score": 1.0
}
```
```

--------------------------------

### get_tree_sitter_language

Source: https://context7.com/k4black/codebleu/llms.txt

Dynamically loads and returns the tree-sitter Language object for parsing code in the specified programming language.

```APIDOC
## get_tree_sitter_language

### Description
Dynamically loads and returns the tree-sitter Language object for parsing code in the specified programming language. This is used internally but can also be used directly for custom AST analysis.

### Parameters
#### Arguments
- **lang** (string) - Required - The name of the programming language to load (e.g., "python", "java").

### Response
- **Language** (object) - The tree-sitter Language object for the requested language.
```

--------------------------------

### Validate supported languages with AVAILABLE_LANGS

Source: https://context7.com/k4black/codebleu/llms.txt

Use the AVAILABLE_LANGS constant to ensure the target language is supported before performing evaluation.
```python
from codebleu import AVAILABLE_LANGS, calc_codebleu

# Check available languages
print(AVAILABLE_LANGS)
# Output: ['java', 'javascript', 'c_sharp', 'php', 'c', 'cpp', 'python', 'go', 'ruby', 'rust']

# Validate language before evaluation
def evaluate_code(predictions, references, lang):
    if lang not in AVAILABLE_LANGS:
        raise ValueError(f"Language '{lang}' not supported. Choose from: {AVAILABLE_LANGS}")
    return calc_codebleu(references, predictions, lang)

# Example with Java code
java_pred = "public static int Sign ( double d ) { return ( int ) ( ( d == 0 ) ? 0 : ( d < 0 ) ? -1 : 1 ) ; }"
java_ref = "public static int Sign ( double d ) { return ( int ) ( ( d == 0 ) ? 0 : ( d < 0 ) ? -1 : 1 ) ; }"
result = calc_codebleu([java_ref], [java_pred], lang="java")
print(f"Java CodeBLEU: {result['codebleu']:.4f}")

# Example with JavaScript code
js_pred = "function foo ( x ) { return x }"
js_ref = "function bar ( y ) {\n return y\n}"
result = calc_codebleu([js_ref], [js_pred], lang="javascript")
print(f"JavaScript CodeBLEU: {result['codebleu']:.4f}")
```

--------------------------------

### calc_codebleu - Main Evaluation Function

Source: https://context7.com/k4black/codebleu/llms.txt

The primary function for computing CodeBLEU scores. It takes reference code and predictions, then returns a dictionary containing the overall CodeBLEU score along with individual component scores.

```APIDOC
## calc_codebleu - Main Evaluation Function

### Description
Computes CodeBLEU scores by comparing predicted code against reference code. It returns a dictionary with the overall CodeBLEU score and individual component scores (n-gram match, weighted n-gram match, syntax match, dataflow match).

### Method
`calc_codebleu`

### Parameters
- **references** (list of str or list of list of str) - Required - A list of reference code strings, or a list of lists of reference strings if multiple references are provided per prediction.
- **predictions** (list of str) - Required - A list of predicted code strings.
- **lang** (str) - Required - The programming language of the code (e.g., 'python', 'java'). Must be one of the languages listed in `AVAILABLE_LANGS`.
- **weights** (tuple of float, optional) - Optional - A tuple of four floats representing the weights for n-gram match, weighted n-gram match, syntax match, and dataflow match, respectively. Defaults to (0.25, 0.25, 0.25, 0.25).
- **tokenizer** (callable, optional) - Optional - A custom tokenizer function that takes a code string and returns a list of tokens. Defaults to `None`, which uses a whitespace tokenizer.

### Request Example
```python
from codebleu import calc_codebleu

prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = calc_codebleu(
    references=[reference],
    predictions=[prediction],
    lang="python",
    weights=(0.25, 0.25, 0.25, 0.25),
    tokenizer=None
)
print(result)
```

### Response
#### Success Response (dict)
- **codebleu** (float) - The overall CodeBLEU score.
- **ngram_match_score** (float) - The score for standard n-gram matching.
- **weighted_ngram_match_score** (float) - The score for keyword-weighted n-gram matching.
- **syntax_match_score** (float) - The score for Abstract Syntax Tree (AST) matching.
- **dataflow_match_score** (float) - The score for data-flow analysis matching.

#### Response Example
```json
{
  "codebleu": 0.5537,
  "ngram_match_score": 0.1041,
  "weighted_ngram_match_score": 0.1109,
  "syntax_match_score": 1.0,
  "dataflow_match_score": 1.0
}
```
```

--------------------------------

### Compute Keyword-Weighted BLEU

Source: https://context7.com/k4black/codebleu/llms.txt

Calculates BLEU scores where specific tokens (like language keywords) are assigned higher importance weights.

```python
from codebleu.weighted_ngram_match import corpus_bleu as weighted_corpus_bleu

# Tokenized code with weights
# Format: [[tokens, weights_dict], ...]
hypothesis = ['def', 'foo', '(', 'x', ')', ':', 'return', 'x']

# Reference with keyword weights (keywords like 'def', 'return' have weight 1.0, others 0.2)
reference_tokens = ['def', 'bar', '(', 'y', ')', ':', 'return', 'y']
weights = {
    'def': 1.0, 'return': 1.0,  # Keywords
    'bar': 0.2, 'y': 0.2, '(': 0.2, ')': 0.2, ':': 0.2  # Non-keywords
}

references_with_weights = [[[reference_tokens, weights]]]
hypotheses = [hypothesis]

weighted_score = weighted_corpus_bleu(references_with_weights, hypotheses)
print(f"Weighted N-gram Score: {weighted_score:.4f}")

# Practical example with Python keywords
python_keywords = ['def', 'return', 'if', 'else', 'for', 'while', 'class', 'import',
                   'from', 'in', 'not', 'and', 'or', 'True', 'False', 'None']

def make_weights(tokens, keywords):
    """Create weight dictionary: 1.0 for keywords, 0.2 for others"""
    return {token: 1.0 if token in keywords else 0.2 for token in tokens}
```

--------------------------------

### Compute CodeBLEU scores with calc_codebleu

Source: https://context7.com/k4black/codebleu/llms.txt

The primary function for evaluating code similarity. It supports single or multiple references, custom weights, and user-defined tokenizers.
```python
import re

from codebleu import calc_codebleu

# Basic usage: Compare predicted code against reference code
prediction = "def add ( a , b ) :\n return a + b"
reference = "def sum ( first , second ) :\n return second + first"

result = calc_codebleu(
    references=[reference],
    predictions=[prediction],
    lang="python",
    weights=(0.25, 0.25, 0.25, 0.25),  # Equal weights for all components
    tokenizer=None  # Uses default whitespace tokenizer
)
print(result)
# Output:
# {
#   'codebleu': 0.5537,
#   'ngram_match_score': 0.1041,
#   'weighted_ngram_match_score': 0.1109,
#   'syntax_match_score': 1.0,
#   'dataflow_match_score': 1.0
# }

# Multiple predictions and references
predictions = [
    "def foo ( x ) :\n return x * x",
    "def bar ( x ) :\n return x"
]
references = [
    "def bar ( x ) :\n return x",
    "def foo ( x ) :\n return x"
]
result = calc_codebleu(references, predictions, lang="python")
print(f"CodeBLEU: {result['codebleu']:.4f}")

# Using multiple references per prediction (list of lists format)
predictions = ["def foo ( x ) : pass"]
references = [["def bar ( x ) : pass", "def foo ( x ) : pass"]]  # Multiple valid references
result = calc_codebleu(references, predictions, lang="python")
print(f"CodeBLEU with multiple refs: {result['codebleu']:.4f}")

# Custom weights emphasizing syntax correctness
result = calc_codebleu(
    references=["def foo ( x ) :\n return x"],
    predictions=["def bar ( x ) :\n return x"],
    lang="python",
    weights=(0.1, 0.1, 0.4, 0.4)  # Higher weight for syntax and dataflow
)
print(f"Syntax-weighted CodeBLEU: {result['codebleu']:.4f}")

# Custom tokenizer for specialized tokenization
def custom_tokenizer(code):
    """Custom tokenizer that handles camelCase splitting"""
    tokens = code.split()
    expanded = []
    for token in tokens:
        # Split camelCase (note: this simple pattern drops digits and underscores)
        parts = re.findall(r'[A-Z]?[a-z]+|[A-Z]+(?=[A-Z]|$)', token)
        expanded.extend(parts if parts else [token])
    return expanded

result = calc_codebleu(
    references=["def calculateSum ( numbers ) :\n return sum ( numbers )"],
    predictions=["def computeTotal ( values ) :\n return sum ( values )"],
    lang="python",
    tokenizer=custom_tokenizer
)
```

--------------------------------

### Compute Dataflow Match Score using corpus_dataflow_match

Source: https://context7.com/k4black/codebleu/llms.txt

The `corpus_dataflow_match` function analyzes data-flow graphs (DFG) to compare the logical variable dependencies of reference and candidate code. Only the import is shown here; see "Evaluate Dataflow Matching" for usage.

```python
from codebleu.dataflow_match import corpus_dataflow_match, calc_dataflow_match
```

--------------------------------

### calc_codebleu

Source: https://github.com/k4black/codebleu/blob/main/README.md

Calculates the CodeBLEU score by comparing reference code against predicted code using weighted metrics.

```APIDOC
## calc_codebleu

### Description
Calculates the CodeBLEU score, which is a weighted combination of n-gram match, weighted n-gram match, AST match, and data-flow match scores.

### Parameters
- **references** (list[str] or list[list[str]]) - Required - The reference code snippets.
- **predictions** (list[str]) - Required - The predicted code snippets.
- **lang** (str) - Required - The programming language of the code (e.g., python, c_sharp, c, cpp, javascript, java, php, go, ruby).
- **weights** (tuple[float, float, float, float]) - Optional - Weights for ngram_match, weighted_ngram_match, syntax_match, and dataflow_match. Defaults to (0.25, 0.25, 0.25, 0.25).
- **tokenizer** (callable) - Optional - Function to split code string into tokens. Defaults to s.split().

### Response
#### Success Response (dict)
- **codebleu** (float) - The final CodeBLEU score.
- **ngram_match_score** (float) - The BLEU score.
- **weighted_ngram_match_score** (float) - The weighted BLEU score.
- **syntax_match_score** (float) - The AST match score.
- **dataflow_match_score** (float) - The data-flow match score.
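### Score Combination (sketch)
The final `codebleu` value is described as a weighted combination of the four component scores, with `weights` giving the coefficients in the order ngram_match, weighted_ngram_match, syntax_match, dataflow_match. A minimal sketch, assuming a plain weighted sum; `combine_scores` is a hypothetical helper, not a `codebleu` function.

```python
def combine_scores(ngram, weighted_ngram, syntax, dataflow,
                   weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four CodeBLEU components (assumed linear combination)."""
    alpha, beta, gamma, delta = weights
    return alpha * ngram + beta * weighted_ngram + gamma * syntax + delta * dataflow

# Component scores from the README example
score = combine_scores(0.1041, 0.1109, 1.0, 1.0)
print(f"{score:.4f}")
```

With the default weights this gives about 0.55375, which matches the reported 0.5537 up to the rounding of the components. The same sum over the CLI example components (0.6514, 0.6585, 1.0, 1.0) gives about 0.8275.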
### Response Example
```json
{
  "codebleu": 0.5537,
  "ngram_match_score": 0.1041,
  "weighted_ngram_match_score": 0.1109,
  "syntax_match_score": 1.0,
  "dataflow_match_score": 1.0
}
```
```

--------------------------------

### Compute N-gram BLEU Scores

Source: https://context7.com/k4black/codebleu/llms.txt

Calculates standard BLEU scores for code evaluation, including support for custom weights and smoothing functions for short sequences.

```python
from codebleu.bleu import corpus_bleu, sentence_bleu, SmoothingFunction

# Tokenized code for BLEU computation
hypothesis = ['def', 'foo', '(', 'x', ')', ':', 'return', 'x']
reference = ['def', 'bar', '(', 'x', ')', ':', 'return', 'x']

# Sentence-level BLEU
score = sentence_bleu([reference], hypothesis)
print(f"Sentence BLEU: {score:.4f}")

# Corpus-level BLEU with multiple samples
hypotheses = [
    ['def', 'add', '(', 'a', ',', 'b', ')', ':', 'return', 'a', '+', 'b'],
    ['def', 'mult', '(', 'x', ',', 'y', ')', ':', 'return', 'x', '*', 'y']
]
list_of_references = [
    [['def', 'sum', '(', 'x', ',', 'y', ')', ':', 'return', 'x', '+', 'y']],
    [['def', 'product', '(', 'a', ',', 'b', ')', ':', 'return', 'a', '*', 'b']]
]
corpus_score = corpus_bleu(list_of_references, hypotheses)
print(f"Corpus BLEU: {corpus_score:.4f}")

# Custom weights for different n-gram orders
# BLEU-2 (unigrams and bigrams only)
weights_bleu2 = (0.5, 0.5, 0.0, 0.0)
score = sentence_bleu([reference], hypothesis, weights=weights_bleu2)
print(f"BLEU-2 Score: {score:.4f}")

# Using smoothing for short sequences
chencherry = SmoothingFunction()
short_hyp = ['return', 'x']
short_ref = ['return', 'y']

# Without smoothing, may return 0 for short sequences
score_no_smooth = sentence_bleu([short_ref], short_hyp)
print(f"Without smoothing: {score_no_smooth:.4f}")

# With smoothing (method1 adds epsilon to zero counts)
score_smoothed = sentence_bleu([short_ref], short_hyp, smoothing_function=chencherry.method1)
print(f"With smoothing: {score_smoothed:.4f}")
```

--------------------------------
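### Clipped N-gram Precision (sketch)

The BLEU functions above build on clipped ("modified") n-gram precision: each hypothesis n-gram is credited at most as many times as it occurs in the reference. The following is a minimal single-reference sketch using only the standard library. `clipped_precision` is a hypothetical helper written for illustration. It is not the `codebleu.bleu` implementation, which also handles multiple references, the brevity penalty, and smoothing.

```python
from collections import Counter
from fractions import Fraction

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(reference, hypothesis, n):
    """Modified n-gram precision of a hypothesis against a single reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    # Clip each hypothesis n-gram count by its count in the reference
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return Fraction(matched, total) if total else Fraction(0)

hypothesis = ['def', 'foo', '(', 'x', ')', ':', 'return', 'x']
reference = ['def', 'bar', '(', 'x', ')', ':', 'return', 'x']

print(clipped_precision(reference, hypothesis, 1))  # 7/8
print(clipped_precision(reference, hypothesis, 2))  # 5/7
```

Only the renamed function (`foo` vs `bar`) fails to match at the unigram level. At the bigram level it breaks the two bigrams that contain it, leaving 5 of 7.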
### parser.remove_comments_and_docstrings

Source: https://context7.com/k4black/codebleu/llms.txt

Removes comments and docstrings from source code before evaluation to focus on actual code logic.

```APIDOC
## parser.remove_comments_and_docstrings

### Description
Removes comments and docstrings from source code before evaluation. This ensures that documentation differences don't affect the CodeBLEU score, focusing the evaluation on actual code logic.

### Parameters
#### Arguments
- **source_code** (string) - Required - The raw source code string.
- **lang** (string) - Required - The programming language of the source code.

### Response
- **cleaned_code** (string) - The source code with all comments and docstrings removed.
```
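--------------------------------

### Strip Python Comments and Docstrings with the Standard Library (sketch)

For Python sources, a comparable effect can be approximated with the standard library alone. `ast.parse` discards comments, and docstrings are the leading string-literal statements of modules, classes, and functions. This is a minimal sketch assuming Python 3.9+ (for `ast.unparse`). `strip_python_comments_and_docstrings` is a hypothetical helper, not the `codebleu.parser` implementation. It also normalizes formatting (for example `def add(a, b):` without spaces), which changes whitespace-based n-gram tokens.

```python
import ast

def strip_python_comments_and_docstrings(source: str) -> str:
    """Drop comments and docstrings from Python source (formatting is normalized)."""
    tree = ast.parse(source)  # comments never make it into the AST
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            # A docstring is a leading expression statement holding a string constant
            if (body and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                # Keep the block valid if the docstring was its only statement
                node.body = body[1:] or [ast.Pass()]
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

code = '''
def add(a, b):
    """Add two numbers together."""
    return a + b  # Return sum
'''
print(strip_python_comments_and_docstrings(code))
```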