### Example Java Code Context Extraction

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Illustrates a potential novel Java context extraction format for code2vec, differing from the standard AST path approach. It shows a space-delimited list of tokens and ternary contexts.

```java
void fooBar() {
    System.out.println("Hello World");
}
```

--------------------------------

### Perform Manual Model Prediction

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Starts the interactive prediction mode to test the model on specific Java code snippets provided in Input.java.

```bash
python3 code2vec.py --load models/java14_model/saved_model_iter8.release --predict
```

--------------------------------

### Preprocess New Java Datasets

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Prepares a custom dataset for training by executing the preprocess script. Users must configure the script to point to their specific data directories.

```bash
source preprocess.sh
```

--------------------------------

### Train code2vec Model from Scratch

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Initiates the training process using the configured dataset. Hyper-parameters can be adjusted in config.py before execution.

```bash
source train.sh
```

--------------------------------

### Configure and Execute Training

Source: https://context7.com/tech-srl/code2vec/llms.txt

A template script for training a new model, allowing users to define dataset paths, model directories, and execution parameters.
```bash
type=java14m
dataset_name=java14m
data_dir=data/${dataset_name}
data=${data_dir}/${dataset_name}
test_data=${data_dir}/${dataset_name}.val.c2v
model_dir=models/${type}

mkdir -p ${model_dir}
set -e
python3 -u code2vec.py --data ${data} --test ${test_data} --save ${model_dir}/saved_model
```

--------------------------------

### Train code2vec Model with Keras and TensorBoard

Source: https://context7.com/tech-srl/code2vec/llms.txt

This command initiates the training of a code2vec model using the Keras backend. It specifies the data and test sets, the directory for saving the model, and enables TensorBoard for visualization of training progress. The verbose flag controls the logging level.

```bash
python3 -u code2vec.py --framework keras --data ${data} --test ${test_data} --save ${model_dir}/saved_model --tensorboard --verbose 2
```

--------------------------------

### Evaluate Trained Model

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Runs the model against a test dataset to calculate performance metrics. Results are logged to log.txt.

```bash
python3 code2vec.py --load models/java14_model/saved_model_iter8.release --test data/java14m/java14m.test.c2v
```

--------------------------------

### Download Java-small Dataset for code2vec

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Downloads the Java-small dataset, which is preprocessed for training code2vec models. This dataset is derived from Allamanis et al. (ICML'2016) and split by-project.

```bash
wget https://s3.amazonaws.com/code2vec/data/java-small_data.tar.gz
```

--------------------------------

### Download Datasets and Models

Source: https://context7.com/tech-srl/code2vec/llms.txt

Provides shell commands to retrieve pre-trained models and various datasets for the Code2Vec framework from S3 storage.
```bash
wget https://s3.amazonaws.com/code2vec/data/java14m_data.tar.gz
tar -xvzf java14m_data.tar.gz

wget https://s3.amazonaws.com/code2vec/model/java14m_model.tar.gz
tar -xvzf java14m_model.tar.gz

wget https://s3.amazonaws.com/code2vec/model/java14m_model_trainable.tar.gz
tar -xvzf java14m_model_trainable.tar.gz
```

--------------------------------

### Manage Vocabulary Indices and Persistence

Source: https://context7.com/tech-srl/code2vec/llms.txt

Demonstrates how to map tokens to indices and vice versa, access special tokens, and persist vocabulary state to disk. This is essential for preparing data for model input.

```python
from vocabularies import VocabType

# Assumes `vocabs` (a Code2VecVocabs instance) and `token_vocab` already exist.
oov = token_vocab.special_words.OOV

# Fall back to the OOV entry when a token or index is unknown
word_index = token_vocab.word_to_index.get('System', token_vocab.word_to_index[oov])
print(f"Index of 'System': {word_index}")

word = token_vocab.index_to_word.get(100, oov)
print(f"Word at index 100: {word}")

print(f"OOV token: {token_vocab.special_words.OOV}")

vocab = vocabs.get(VocabType.Token)
vocabs.save('path/to/dictionaries.bin')
```

--------------------------------

### Download Pre-trained code2vec Models

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Downloads either a stripped model for inference or a full model for continued training. The stripped version is optimized for prediction tasks.

```bash
wget https://s3.amazonaws.com/code2vec/model/java14m_model.tar.gz
tar -xvzf java14m_model.tar.gz

wget https://s3.amazonaws.com/code2vec/model/java14m_model_trainable.tar.gz
tar -xvzf java14m_model_trainable.tar.gz
```

--------------------------------

### Train code2vec Model with Console/File Logging

Source: https://context7.com/tech-srl/code2vec/llms.txt

This command trains a code2vec model and demonstrates how to manage training logs. By default, logs are written to the console. The `--logs-path` argument can be used to redirect these logs to a specified file for persistent storage and later analysis.
```bash
python3 -u code2vec.py --data ${data} --test ${test_data} --save ${model_dir}/saved_model --logs-path ${model_dir}/training.log
```

--------------------------------

### Download and Extract code2vec Datasets

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Downloads the preprocessed Java dataset from S3 and extracts the archive. This provides the necessary training, test, and validation files for the model.

```bash
wget https://s3.amazonaws.com/code2vec/data/java14m_data.tar.gz
tar -xvzf java14m_data.tar.gz
```

--------------------------------

### Download Java-large Dataset for code2vec

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Downloads the Java-large dataset, preprocessed for code2vec. This dataset includes 9500 top-starred Java projects created since January 2007.

```bash
wget https://s3.amazonaws.com/code2vec/data/java-large_data.tar.gz
```

--------------------------------

### Code2vec CLI: Train, Evaluate, Predict, and Export

Source: https://context7.com/tech-srl/code2vec/llms.txt

Command-line interface for code2vec. Supports training, evaluation, interactive prediction, model release, and exporting token/target embeddings and code vectors. Uses TensorFlow or Keras.
```bash
python3 code2vec.py --data data/java14m/java14m --test data/java14m/java14m.val.c2v --save models/java14m/saved_model
```

```bash
python3 code2vec.py --load models/java14m/saved_model_iter8.release --test data/java14m/java14m.test.c2v
```

```bash
python3 code2vec.py --load models/java14m/saved_model_iter8.release --predict
```

```bash
python3 code2vec.py --framework keras --load models/java14m/saved_model_iter8.release --predict
```

```bash
python3 code2vec.py --load models/java14m/saved_model_iter8 --release
```

```bash
python3 code2vec.py --load models/java14m/saved_model_iter8.release --save_w2v models/java14m/tokens.txt
```

```bash
python3 code2vec.py --load models/java14m/saved_model_iter8.release --save_t2v models/java14m/targets.txt
```

```bash
python3 code2vec.py --load models/java14m/saved_model_iter8.release --test data/java14m/java14m.test.c2v --export_code_vectors
```

--------------------------------

### Download Trainable Java-large Model for code2vec

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Downloads a trainable code2vec model that was trained on the Java-large dataset. This model can be used as a baseline or further fine-tuned.

```bash
wget https://code2vec.s3.amazonaws.com/model/java-large-model.tar.gz
```

--------------------------------

### Data Preprocessing Pipeline

Source: https://context7.com/tech-srl/code2vec/llms.txt

Scripts to convert raw Java source code into the format required for training, including path extraction and vocabulary creation.

```APIDOC
## Data Preprocessing Pipeline

### Description
The preprocessing scripts convert raw Java source code into the format required for training, including path extraction and vocabulary creation.

### Setup
Edit `preprocess.sh` to set your data directories:
```bash
TRAIN_DIR=my_train_dir
VAL_DIR=my_val_dir
TEST_DIR=my_test_dir
DATASET_NAME=my_dataset
```

### Running the Pipeline
```bash
source preprocess.sh
```

### Step 1: Extract Paths
```bash
python3 JavaExtractor/extract.py --dir ${VAL_DIR} --max_path_length 8 --max_path_width 2 --num_threads 64 --jar JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar > dataset.val.raw.txt
```

### Step 2: Create Histograms
```bash
# Target vocabulary histogram
cat train.raw.txt | cut -d' ' -f1 | awk '{n[$0]++} END {for (i in n) print i,n[i]}' > histo.tgt.c2v

# Original token vocabulary histogram
cat train.raw.txt | cut -d' ' -f2- | tr ' ' '\n' | cut -d',' -f1,3 | tr ',' '\n' | awk '{n[$0]++} END {for (i in n) print i,n[i]}' > histo.ori.c2v

# Path vocabulary histogram
cat train.raw.txt | cut -d' ' -f2- | tr ' ' '\n' | cut -d',' -f2 | awk '{n[$0]++} END {for (i in n) print i,n[i]}' > histo.path.c2v
```

### Step 3: Build Vocabularies and Create Dataset Files
```bash
python3 preprocess.py --train_data train.raw.txt --test_data test.raw.txt --val_data val.raw.txt \
  --max_contexts 200 --word_vocab_size 1301136 --path_vocab_size 911417 --target_vocab_size 261245 \
  --word_histogram histo.ori.c2v --path_histogram histo.path.c2v --target_histogram histo.tgt.c2v \
  --output_name data/my_dataset/my_dataset
```
```

--------------------------------

### Config Class: Managing Hyperparameters and Runtime Settings

Source: https://context7.com/tech-srl/code2vec/llms.txt

Python class for managing code2vec hyperparameters and runtime configuration. Loads defaults and command-line arguments, providing access to training, testing, and model settings.
```python
from config import Config

# Create config with defaults and load from command-line args
config = Config(set_defaults=True, load_from_args=True, verify=True)

# Access training hyperparameters
print(f"Batch size: {config.TRAIN_BATCH_SIZE}")  # Default: 1024
print(f"Epochs: {config.NUM_TRAIN_EPOCHS}")  # Default: 20
print(f"Max contexts: {config.MAX_CONTEXTS}")  # Default: 200
print(f"Embedding size: {config.DEFAULT_EMBEDDINGS_SIZE}")  # Default: 128
print(f"Dropout keep rate: {config.DROPOUT_KEEP_RATE}")  # Default: 0.75

# Check runtime mode
if config.is_training:
    print(f"Training data: {config.train_data_path}")
if config.is_testing:
    print(f"Test data: {config.TEST_DATA_PATH}")
if config.is_loading:
    print(f"Loading model from: {config.MODEL_LOAD_PATH}")

# Vocabulary size limits
print(f"Max token vocab: {config.MAX_TOKEN_VOCAB_SIZE}")  # Default: 1301136
print(f"Max target vocab: {config.MAX_TARGET_VOCAB_SIZE}")  # Default: 261245
print(f"Max path vocab: {config.MAX_PATH_VOCAB_SIZE}")  # Default: 911417

# Code vector size (computed from embeddings)
print(f"Code vector size: {config.context_vector_size}")  # PATH_EMBEDDINGS_SIZE + 2 * TOKEN_EMBEDDINGS_SIZE
```

--------------------------------

### Continue Training code2vec Model from Checkpoint

Source: https://context7.com/tech-srl/code2vec/llms.txt

This command allows for resuming training of a code2vec model from a previously saved checkpoint. It loads the model from the specified checkpoint path and continues saving subsequent checkpoints to the new save directory.

```bash
python3 -u code2vec.py --data ${data} --test ${test_data} --load ${model_dir}/saved_model_iter5 --save ${model_dir}/saved_model
```

--------------------------------

### Configure Code2vec Training Parameters

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Defines essential configuration constants for the training process, including batch sizes, vocabulary limits, and embedding dimensions.
These settings control memory usage, model capacity, and training throughput.

```python
config.SAVE_EVERY_EPOCHS = 1
config.TRAIN_BATCH_SIZE = 1024
config.MAX_CONTEXTS = 200
config.MAX_TOKEN_VOCAB_SIZE = 1301136
config.DEFAULT_EMBEDDINGS_SIZE = 128
config.DROPOUT_KEEP_RATE = 0.75
```

--------------------------------

### Download Released Java-large Model for code2vec

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Downloads a released, non-trainable code2vec model. This model was used as a baseline in the code2seq paper and is ready for inference.

```bash
wget https://code2vec.s3.amazonaws.com/model/java-large-released-model.tar.gz
```

--------------------------------

### Download Java-med Dataset for code2vec

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Downloads the Java-med dataset, preprocessed for code2vec. This dataset comprises 1000 top-starred Java projects from GitHub, suitable for training.

```bash
wget https://s3.amazonaws.com/code2vec/data/java-med_data.tar.gz
```

--------------------------------

### Data Preprocessing Pipeline for Code2Vec

Source: https://context7.com/tech-srl/code2vec/llms.txt

A bash script and associated Python commands to preprocess raw Java source code into the format required for training code2vec models. This includes path extraction, vocabulary histogram creation, and final dataset file generation.

```bash
# Edit preprocess.sh to set your data directories
# TRAIN_DIR=my_train_dir
# VAL_DIR=my_val_dir
# TEST_DIR=my_test_dir
# DATASET_NAME=my_dataset

# Run the preprocessing pipeline
source preprocess.sh

# The script performs:
# 1. Extract paths from Java files using JavaExtractor
python3 JavaExtractor/extract.py --dir ${VAL_DIR} --max_path_length 8 --max_path_width 2 --num_threads 64 --jar JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar > dataset.val.raw.txt

# 2. Create histograms for vocabulary building
cat train.raw.txt | cut -d' ' -f1 | awk '{n[$0]++} END {for (i in n) print i,n[i]}' > histo.tgt.c2v
cat train.raw.txt | cut -d' ' -f2- | tr ' ' '\n' | cut -d',' -f1,3 | tr ',' '\n' | awk '{n[$0]++} END {for (i in n) print i,n[i]}' > histo.ori.c2v
cat train.raw.txt | cut -d' ' -f2- | tr ' ' '\n' | cut -d',' -f2 | awk '{n[$0]++} END {for (i in n) print i,n[i]}' > histo.path.c2v

# 3. Build vocabularies and create final dataset files
python3 preprocess.py --train_data train.raw.txt --test_data test.raw.txt --val_data val.raw.txt \
  --max_contexts 200 --word_vocab_size 1301136 --path_vocab_size 911417 --target_vocab_size 261245 \
  --word_histogram histo.ori.c2v --path_histogram histo.path.c2v --target_histogram histo.tgt.c2v \
  --output_name data/my_dataset/my_dataset
```

--------------------------------

### Interactive Predictor for Method Name Prediction

Source: https://context7.com/tech-srl/code2vec/llms.txt

Enables interactive method name prediction from Java source files. It loads a model, extracts AST paths, and displays predictions with probability scores and attention weights. Requires configuration and a Code2VecModel.

```python
from config import Config
from interactive_predict import InteractivePredictor

# Set up for interactive prediction
config = Config(set_defaults=True, load_from_args=True, verify=True)

# Load model (assuming --load and --predict flags are set)
if config.DL_FRAMEWORK == 'keras':
    from keras_model import Code2VecModel
else:
    from tensorflow_model import Code2VecModel

model = Code2VecModel(config)

# Create predictor and start interactive session
predictor = InteractivePredictor(config, model)
predictor.predict()
```

--------------------------------

### Vocabulary Management

Source: https://context7.com/tech-srl/code2vec/llms.txt

Manages token, path, and target vocabularies with TensorFlow lookup tables for efficient training and inference.
```APIDOC
## Vocabulary Management

### Description
The `Code2VecVocabs` class manages token, path, and target vocabularies with TensorFlow lookup tables for efficient training and inference.

### Initialization and Access
```python
from vocabularies import Code2VecVocabs, Vocab, VocabType
from config import Config

config = Config(set_defaults=True, load_from_args=True, verify=True)

# Vocabularies are automatically loaded/created during model initialization
vocabs = Code2VecVocabs(config)

# Access individual vocabularies
token_vocab = vocabs.token_vocab
path_vocab = vocabs.path_vocab
target_vocab = vocabs.target_vocab

print(f"Token vocab size: {token_vocab.size}")
print(f"Path vocab size: {path_vocab.size}")
print(f"Target vocab size: {target_vocab.size}")
```
```

--------------------------------

### Exporting Token and Target Embeddings

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Use the --save_w2v and --save_t2v flags to export token and target embedding matrices from a trained model into a text file formatted for word2vec.

```bash
python3 code2vec.py --load models/java14_model/saved_model_iter8.release --save_w2v models/java14_model/tokens.txt
python3 code2vec.py --load models/java14_model/saved_model_iter8.release --save_t2v models/java14_model/targets.txt
```

--------------------------------

### Release Trained Model for Inference

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Command to release a trained model, which creates an optimized version for inference that typically consumes significantly less disk space. This process is intended for models that no longer require further training.

```bash
python3 code2vec.py --load models/java14_model/saved_model_iter8 --release
```

--------------------------------

### Code2VecVocabs for Vocabulary Management

Source: https://context7.com/tech-srl/code2vec/llms.txt

Manages token, path, and target vocabularies for the code2vec model using TensorFlow lookup tables.
Vocabularies are loaded or created during model initialization, and individual vocabularies can be accessed.

```python
from vocabularies import Code2VecVocabs, Vocab, VocabType
from config import Config

config = Config(set_defaults=True, load_from_args=True, verify=True)

# Vocabularies are automatically loaded/created during model initialization
vocabs = Code2VecVocabs(config)

# Access individual vocabularies
token_vocab = vocabs.token_vocab
path_vocab = vocabs.path_vocab
target_vocab = vocabs.target_vocab

print(f"Token vocab size: {token_vocab.size}")
print(f"Path vocab size: {path_vocab.size}")
print(f"Target vocab size: {target_vocab.size}")
```

--------------------------------

### Code2VecModelBase: Abstract Model Interface

Source: https://context7.com/tech-srl/code2vec/llms.txt

Abstract base class defining the interface for code2vec model implementations. Provides methods for dynamic loading based on framework (TensorFlow/Keras), training, evaluation, and prediction.

```python
from config import Config
from vocabularies import VocabType


# Load model dynamically based on framework config
def load_model_dynamically(config: Config):
    assert config.DL_FRAMEWORK in {'tensorflow', 'keras'}
    if config.DL_FRAMEWORK == 'tensorflow':
        from tensorflow_model import Code2VecModel
    elif config.DL_FRAMEWORK == 'keras':
        from keras_model import Code2VecModel
    return Code2VecModel(config)


# Initialize and use the model
config = Config(set_defaults=True, load_from_args=True, verify=True)
model = load_model_dynamically(config)

# Train the model
if config.is_training:
    model.train()

# Evaluate on test set
if config.is_testing:
    results = model.evaluate()
    print(f"Top-k accuracy: {results.topk_acc}")
    print(f"Subtoken precision: {results.subtoken_precision}")
    print(f"Subtoken recall: {results.subtoken_recall}")
    print(f"Subtoken F1: {results.subtoken_f1}")
    print(f"Loss: {results.loss}")
```

--------------------------------

### Extractor Class for Java Path Extraction

Source: https://context7.com/tech-srl/code2vec/llms.txt

Provides a Python interface to extract Abstract Syntax Tree (AST) paths from Java source code. It converts code into path-context representations suitable for the model. Requires a configured Extractor object and a Java file.

```python
from extractor import Extractor
from config import Config

config = Config(set_defaults=True)
config.MAX_CONTEXTS = 200

# Initialize extractor with Java JAR path
extractor = Extractor(
    config=config,
    jar_path='JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar',
    max_path_length=8,
    max_path_width=2
)

# Extract paths from a Java file
try:
    predict_lines, hash_to_string_dict = extractor.extract_paths('Input.java')

    # predict_lines contains model input format:
    # "method_name token1,path_hash,token2 token3,path_hash2,token4 ..."
    for line in predict_lines:
        print(line)

    # hash_to_string_dict maps path hashes back to readable strings
    for hash_val, path_str in hash_to_string_dict.items():
        print(f"{hash_val} -> {path_str}")
except ValueError as e:
    print(f"Extraction error: {e}")

# The java_string_hashcode static method replicates Java's String.hashCode()
path = "MethodDeclaration|SimpleName|MethodInvocation"
hash_code = Extractor.java_string_hashcode(path)
print(f"Hash of '{path}': {hash_code}")
```

--------------------------------

### Inspecting Embeddings with Gensim

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Load exported word2vec format files into Python using the gensim library to perform similarity analysis and vector operations.
```python
from gensim.models import KeyedVectors as word2vec

vectors_text_path = 'models/java14_model/targets.txt'
model = word2vec.load_word2vec_format(vectors_text_path, binary=False)
model.most_similar(positive=['equals', 'to|lower'])
model.most_similar(positive=['download', 'send'], negative=['receive'])
```

--------------------------------

### InteractivePredictor Class

Source: https://context7.com/tech-srl/code2vec/llms.txt

Enables interactive method name prediction from Java source files, displaying predictions with probability scores and attention weights.

```APIDOC
## InteractivePredictor Class

### Description
The `InteractivePredictor` class enables interactive method name prediction from Java source files, displaying predictions with probability scores and attention weights.

### Initialization
```python
from config import Config
from interactive_predict import InteractivePredictor

config = Config(set_defaults=True, load_from_args=True, verify=True)

if config.DL_FRAMEWORK == 'keras':
    from keras_model import Code2VecModel
else:
    from tensorflow_model import Code2VecModel

model = Code2VecModel(config)
predictor = InteractivePredictor(config, model)
```

### Usage
```python
predictor.predict()
```

### Functionality
The `predict()` method prompts the user to modify `Input.java`, extracts AST paths, and displays predictions with probabilities and attention scores.

### Example Output Format
```
Original name: fooBar
(0.85) predicted: foo|bar
(0.10) predicted: process|data
(0.03) predicted: handle|request
Attention:
0.15 context: System,MethodInvocation->Name,println
0.12 context: void,Method->ReturnType,METHOD_NAME
```
```

--------------------------------

### Analyze Embeddings with Gensim

Source: https://context7.com/tech-srl/code2vec/llms.txt

Utilizes Gensim's KeyedVectors to load exported Code2Vec embeddings. It supports semantic similarity queries, arithmetic operations on method names, and analogy testing.
```python
from gensim.models import KeyedVectors as word2vec

# Load the exported token and target embeddings (word2vec text format)
token_vectors = word2vec.load_word2vec_format('models/java14m/tokens.txt', binary=False)
target_vectors = word2vec.load_word2vec_format('models/java14m/targets.txt', binary=False)

# Nearest neighbours of a single method name
similar_to_equals = target_vectors.most_similar(positive=['equals'], topn=5)

# Vector addition of method names
result = target_vectors.most_similar(positive=['equals', 'to|lower'], topn=3)

# Analogy: 'download' is to 'send' as ? is to 'receive'
result_analogy = target_vectors.most_similar(positive=['download', 'send'], negative=['receive'], topn=3)

# Cosine similarity between two tokens
similarity = token_vectors.similarity('String', 'Integer')
```

--------------------------------

### Exporting Code Vectors

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Use the --export_code_vectors flag to generate vector representations for specific code snippets. When used with --test, it saves to a file; with --predict, it outputs to the console.

```bash
python3 code2vec.py --load models/java14_model/saved_model_iter8.release --test data/java14m/java14m.test.c2v --export_code_vectors
python3 code2vec.py --load models/java14_model/saved_model_iter8.release --predict --export_code_vectors
```

--------------------------------

### Extractor Class

Source: https://context7.com/tech-srl/code2vec/llms.txt

Provides a Python interface to the Java AST path extraction tool, converting source code into path-context representations for the model.

```APIDOC
## Extractor Class

### Description
The `Extractor` class provides a Python interface to the Java AST path extraction tool, converting source code into path-context representations for the model.

### Initialization
```python
from extractor import Extractor
from config import Config

config = Config(set_defaults=True)
config.MAX_CONTEXTS = 200

extractor = Extractor(
    config=config,
    jar_path='JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar',
    max_path_length=8,
    max_path_width=2
)
```

### Path Extraction
```python
try:
    predict_lines, hash_to_string_dict = extractor.extract_paths('Input.java')
    for line in predict_lines:
        print(line)
    for hash_val, path_str in hash_to_string_dict.items():
        print(f"{hash_val} -> {path_str}")
except ValueError as e:
    print(f"Extraction error: {e}")
```

### Hash Calculation
```python
path = "MethodDeclaration|SimpleName|MethodInvocation"
hash_code = Extractor.java_string_hashcode(path)
print(f"Hash of '{path}': {hash_code}")
```

### Output Format (`predict_lines`)
```
method_name token1,path_hash,token2 token3,path_hash2,token4 ...
```
```

--------------------------------

### Handle Model Evaluation and Prediction Structures

Source: https://context7.com/tech-srl/code2vec/llms.txt

Defines how to interpret the ModelEvaluationResults and ModelPredictionResults named tuples. These structures hold metrics like accuracy, precision, recall, and attention weights.

```python
from model_base import ModelEvaluationResults, ModelPredictionResults

results = ModelEvaluationResults(
    topk_acc=[0.45, 0.52, 0.58, 0.62, 0.65],
    subtoken_precision=0.72,
    subtoken_recall=0.68,
    subtoken_f1=0.70,
    loss=2.15
)

prediction = ModelPredictionResults(
    original_name='processUserInput',
    topk_predicted_words=['process|user|input', 'handle|input', 'parse|input'],
    topk_predicted_words_scores=[0.75, 0.12, 0.08],
    attention_per_context={('user', 'MethodDeclaration|Name', 'input'): 0.15},
    code_vector=[0.1, -0.2, 0.3]
)
```

--------------------------------

### Save Word2Vec Embeddings

Source: https://context7.com/tech-srl/code2vec/llms.txt

Saves model embeddings in word2vec format for both tokens and targets. It also includes a step to clean up the model session.
```python
model.save_word2vec_format('tokens.txt', VocabType.Token)
model.save_word2vec_format('targets.txt', VocabType.Target)

# Clean up
model.close_session()
```

--------------------------------

### Model Embedding Saving

Source: https://context7.com/tech-srl/code2vec/llms.txt

Saves model embeddings in word2vec format for both tokens and targets. Also includes session cleanup.

```APIDOC
## Model Embedding Saving

### Description
Saves model embeddings in word2vec format for tokens and targets, and cleans up the model session.

### Method
```python
model.save_word2vec_format('tokens.txt', VocabType.Token)
model.save_word2vec_format('targets.txt', VocabType.Target)
model.close_session()
```
```
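
The exported `tokens.txt` and `targets.txt` files can also be inspected without gensim. The following is a minimal sketch, assuming the files follow the usual word2vec text layout (a header line holding the vocabulary size and the vector dimension, then one `word v1 v2 ...` line per entry). The helper names `load_word2vec_text` and `cosine` are hypothetical and are not part of code2vec.

```python
import math
from typing import Dict, List


def load_word2vec_text(path: str) -> Dict[str, List[float]]:
    """Read a word2vec text file into a {word: vector} dict (hypothetical helper)."""
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as f:
        # Assumed header line: "<vocab_size> <dimension>"
        _, dim = (int(x) for x in f.readline().split())
        for line in f:
            parts = line.split()
            if len(parts) != dim + 1:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Once a file such as `targets.txt` has been loaded this way, `cosine(vecs['equals'], vecs['to|lower'])` would compare two target names directly. For large vocabularies, gensim's `KeyedVectors` is likely the more practical choice.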