### Example Java Code Context Extraction

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

A sample Java method used to illustrate the context format that an extractor produces for code2vec. Each method becomes one space-delimited line: the first field is the target label, and every following field is a three-part context (token, path, token).

```java
void fooBar() {
	System.out.println("Hello World");
}
```
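The line format can be unpacked with a few lines of Python. This is a minimal sketch, assuming the layout shown for `predict_lines` in the Extractor snippets (a label followed by space-separated `token,path,token` triples). The helper `parse_context_line` and the sample line are hypothetical and are not part of code2vec.

```python
from typing import List, NamedTuple, Tuple


class ParsedExample(NamedTuple):
    label_subtokens: List[str]            # e.g. ["foo", "bar"]
    contexts: List[Tuple[str, str, str]]  # (left token, path, right token)


def parse_context_line(line: str) -> ParsedExample:
    """Split one extractor output line into its label and its contexts."""
    fields = line.strip().split(" ")
    label, raw_contexts = fields[0], fields[1:]
    contexts = []
    for raw in raw_contexts:
        if not raw:
            continue  # skip empty fields left by repeated spaces
        left, path, right = raw.split(",")
        contexts.append((left, path, right))
    return ParsedExample(label.split("|"), contexts)


# Hypothetical line for the fooBar method; the path values are placeholders.
example = parse_context_line("foo|bar system,101,out out,202,println")
print(example.label_subtokens)
print(example.contexts[0])
```

Keeping the label as a list of subtokens makes it easy to compare predictions subtoken by subtoken later on.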

--------------------------------

### Perform Manual Model Prediction

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Starts the interactive prediction mode to test the model on specific Java code snippets provided in Input.java.

```bash
python3 code2vec.py --load models/java14_model/saved_model_iter8.release --predict
```

--------------------------------

### Preprocess New Java Datasets

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Prepares a custom dataset for training by executing the preprocess script. Users must configure the script to point to their specific data directories.

```bash
source preprocess.sh
```

--------------------------------

### Train code2vec Model from Scratch

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Initiates the training process using the configured dataset. Hyper-parameters can be adjusted in config.py before execution.

```bash
source train.sh
```

--------------------------------

### Configure and Execute Training

Source: https://context7.com/tech-srl/code2vec/llms.txt

A template script for training a new model, allowing users to define dataset paths, model directories, and execution parameters.

```bash
type=java14m
dataset_name=java14m
data_dir=data/${dataset_name}
data=${data_dir}/${dataset_name}
test_data=${data_dir}/${dataset_name}.val.c2v
model_dir=models/${type}
mkdir -p ${model_dir}
set -e
python3 -u code2vec.py --data ${data} --test ${test_data} --save ${model_dir}/saved_model
```

--------------------------------

### Train code2vec Model with Keras and TensorBoard

Source: https://context7.com/tech-srl/code2vec/llms.txt

This command initiates the training of a code2vec model using the Keras backend. It specifies the data and test sets, the directory for saving the model, and enables TensorBoard for visualization of training progress. The verbose flag controls the logging level.

```bash
python3 -u code2vec.py --framework keras --data ${data} --test ${test_data} \
    --save ${model_dir}/saved_model --tensorboard --verbose 2
```

--------------------------------

### Evaluate Trained Model

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Runs the model against a test dataset to calculate performance metrics. Results are logged to log.txt.

```bash
python3 code2vec.py --load models/java14_model/saved_model_iter8.release --test data/java14m/java14m.test.c2v
```

--------------------------------

### Download Java-small Dataset for code2vec

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Downloads the Java-small dataset, which is preprocessed for training code2vec models. This dataset is derived from Allamanis et al. (ICML'2016) and split by-project.

```bash
wget https://s3.amazonaws.com/code2vec/data/java-small_data.tar.gz
```

--------------------------------

### Download Datasets and Models

Source: https://context7.com/tech-srl/code2vec/llms.txt

Provides shell commands to retrieve pre-trained models and various datasets for the Code2Vec framework from S3 storage.

```bash
wget https://s3.amazonaws.com/code2vec/data/java14m_data.tar.gz
tar -xvzf java14m_data.tar.gz
wget https://s3.amazonaws.com/code2vec/model/java14m_model.tar.gz
tar -xvzf java14m_model.tar.gz
wget https://s3.amazonaws.com/code2vec/model/java14m_model_trainable.tar.gz
tar -xvzf java14m_model_trainable.tar.gz
```

--------------------------------

### Manage Vocabulary Indices and Persistence

Source: https://context7.com/tech-srl/code2vec/llms.txt

Demonstrates how to map tokens to indices and vice versa, access special tokens, and persist vocabulary state to disk. This is essential for preparing data for model input.

```python
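from config import Config
from vocabularies import Code2VecVocabs, VocabType

config = Config(set_defaults=True, load_from_args=True, verify=True)
vocabs = Code2VecVocabs(config)
token_vocab = vocabs.get(VocabType.Token)  # same vocabulary as vocabs.token_vocab
oov = token_vocab.special_words.OOV  # the OOV string may depend on the config, so it is not hard-coded

word_index = token_vocab.word_to_index.get('System', token_vocab.word_to_index[oov])
print(f"Index of 'System': {word_index}")
word = token_vocab.index_to_word.get(100, oov)
print(f"Word at index 100: {word}")
print(f"OOV token: {oov}")
vocabs.save('path/to/dictionaries.bin')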
```

--------------------------------

### Download Pre-trained code2vec Models

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Downloads either a stripped model for inference or a full model for continued training. The stripped version is optimized for prediction tasks.

```bash
wget https://s3.amazonaws.com/code2vec/model/java14m_model.tar.gz
tar -xvzf java14m_model.tar.gz

wget https://s3.amazonaws.com/code2vec/model/java14m_model_trainable.tar.gz
tar -xvzf java14m_model_trainable.tar.gz
```

--------------------------------

### Train code2vec Model with Console/File Logging

Source: https://context7.com/tech-srl/code2vec/llms.txt

This command trains a code2vec model and demonstrates how to manage training logs. By default, logs are written to the console. The `--logs-path` argument can be used to redirect these logs to a specified file for persistent storage and later analysis.

```bash
python3 -u code2vec.py --data ${data} --test ${test_data} --save ${model_dir}/saved_model \
    --logs-path ${model_dir}/training.log
```

--------------------------------

### Download and Extract code2vec Datasets

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Downloads the preprocessed Java dataset from S3 and extracts the archive. This provides the necessary training, test, and validation files for the model.

```bash
wget https://s3.amazonaws.com/code2vec/data/java14m_data.tar.gz
tar -xvzf java14m_data.tar.gz
```

--------------------------------

### Download Java-large Dataset for code2vec

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Downloads the Java-large dataset, preprocessed for code2vec. This dataset includes 9500 top-starred Java projects created since January 2007.

```bash
wget https://s3.amazonaws.com/code2vec/data/java-large_data.tar.gz
```

--------------------------------

### Code2vec CLI: Train, Evaluate, Predict, and Export

Source: https://context7.com/tech-srl/code2vec/llms.txt

Command-line interface for code2vec. Supports training, evaluation, interactive prediction, model release, and exporting token/target embeddings and code vectors. Uses TensorFlow or Keras.

```bash
python3 code2vec.py --data data/java14m/java14m --test data/java14m/java14m.val.c2v --save models/java14m/saved_model
```

```bash
python3 code2vec.py --load models/java14m/saved_model_iter8.release --test data/java14m/java14m.test.c2v
```

```bash
python3 code2vec.py --load models/java14m/saved_model_iter8.release --predict
```

```bash
python3 code2vec.py --framework keras --load models/java14m/saved_model_iter8.release --predict
```

```bash
python3 code2vec.py --load models/java14m/saved_model_iter8 --release
```

```bash
python3 code2vec.py --load models/java14m/saved_model_iter8.release --save_w2v models/java14m/tokens.txt
```

```bash
python3 code2vec.py --load models/java14m/saved_model_iter8.release --save_t2v models/java14m/targets.txt
```

```bash
python3 code2vec.py --load models/java14m/saved_model_iter8.release --test data/java14m/java14m.test.c2v --export_code_vectors
```

--------------------------------

### Download Trainable Java-large Model for code2vec

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Downloads a trainable code2vec model that was trained on the Java-large dataset. This model can be used as a baseline or further fine-tuned.

```bash
wget https://code2vec.s3.amazonaws.com/model/java-large-model.tar.gz
```

--------------------------------

### Data Preprocessing Pipeline

Source: https://context7.com/tech-srl/code2vec/llms.txt

Scripts to convert raw Java source code into the format required for training, including path extraction and vocabulary creation.

````APIDOC
## Data Preprocessing Pipeline

### Description
The preprocessing scripts convert raw Java source code into the format required for training, including path extraction and vocabulary creation.

### Setup
Edit `preprocess.sh` to set your data directories:
```bash
TRAIN_DIR=my_train_dir
VAL_DIR=my_val_dir
TEST_DIR=my_test_dir
DATASET_NAME=my_dataset
```

### Running the Pipeline
```bash
source preprocess.sh
```

### Step 1: Extract Paths
```bash
python3 JavaExtractor/extract.py --dir ${VAL_DIR} --max_path_length 8 --max_path_width 2 --num_threads 64 --jar JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar > dataset.val.raw.txt
```

### Step 2: Create Histograms
```bash
# Target vocabulary histogram
cat train.raw.txt | cut -d' ' -f1 | awk '{n[$0]++} END {for (i in n) print i,n[i]}' > histo.tgt.c2v

# Original token vocabulary histogram
cat train.raw.txt | cut -d' ' -f2- | tr ' ' '\n' | cut -d',' -f1,3 | tr ',' '\n' | awk '{n[$0]++} END {for (i in n) print i,n[i]}' > histo.ori.c2v

# Path vocabulary histogram
cat train.raw.txt | cut -d' ' -f2- | tr ' ' '\n' | cut -d',' -f2 | awk '{n[$0]++} END {for (i in n) print i,n[i]}' > histo.path.c2v
```

### Step 3: Build Vocabularies and Create Dataset Files
```bash
python3 preprocess.py --train_data train.raw.txt --test_data test.raw.txt --val_data val.raw.txt \
  --max_contexts 200 --word_vocab_size 1301136 --path_vocab_size 911417 --target_vocab_size 261245 \
  --word_histogram histo.ori.c2v --path_histogram histo.path.c2v --target_histogram histo.tgt.c2v \
  --output_name data/my_dataset/my_dataset
```
````

--------------------------------

### Config Class: Managing Hyperparameters and Runtime Settings

Source: https://context7.com/tech-srl/code2vec/llms.txt

Python class for managing code2vec hyperparameters and runtime configuration. Loads defaults and command-line arguments, providing access to training, testing, and model settings.

```python
from config import Config

# Create config with defaults and load from command-line args
config = Config(set_defaults=True, load_from_args=True, verify=True)

# Access training hyperparameters
print(f"Batch size: {config.TRAIN_BATCH_SIZE}")  # Default: 1024
print(f"Epochs: {config.NUM_TRAIN_EPOCHS}")  # Default: 20
print(f"Max contexts: {config.MAX_CONTEXTS}")  # Default: 200
print(f"Embedding size: {config.DEFAULT_EMBEDDINGS_SIZE}")  # Default: 128
print(f"Dropout keep rate: {config.DROPOUT_KEEP_RATE}")  # Default: 0.75

# Check runtime mode
if config.is_training:
    print(f"Training data: {config.train_data_path}")
if config.is_testing:
    print(f"Test data: {config.TEST_DATA_PATH}")
if config.is_loading:
    print(f"Loading model from: {config.MODEL_LOAD_PATH}")

# Vocabulary size limits
print(f"Max token vocab: {config.MAX_TOKEN_VOCAB_SIZE}")  # Default: 1301136
print(f"Max target vocab: {config.MAX_TARGET_VOCAB_SIZE}")  # Default: 261245
print(f"Max path vocab: {config.MAX_PATH_VOCAB_SIZE}")  # Default: 911417

# Code vector size (computed from embeddings)
print(f"Code vector size: {config.context_vector_size}")  # PATH_EMBEDDINGS_SIZE + 2 * TOKEN_EMBEDDINGS_SIZE
```
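The comment on `context_vector_size` describes a simple sum: each context concatenates one path embedding with two token embeddings. A worked version of that arithmetic follows, assuming both embedding sizes stay at the documented default of 128. The constants are local stand-ins for illustration, not values imported from `config.py`.

```python
# Stand-in constants mirroring the documented default embedding size of 128.
TOKEN_EMBEDDINGS_SIZE = 128
PATH_EMBEDDINGS_SIZE = 128

# One context = source token + path + target token, concatenated.
context_vector_size = PATH_EMBEDDINGS_SIZE + 2 * TOKEN_EMBEDDINGS_SIZE
print(context_vector_size)  # 384
```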

--------------------------------

### Continue Training code2vec Model from Checkpoint

Source: https://context7.com/tech-srl/code2vec/llms.txt

This command allows for resuming training of a code2vec model from a previously saved checkpoint. It loads the model from the specified checkpoint path and continues saving subsequent checkpoints to the new save directory.

```bash
python3 -u code2vec.py --data ${data} --test ${test_data} \
    --load ${model_dir}/saved_model_iter5 --save ${model_dir}/saved_model
```

--------------------------------

### Configure Code2vec Training Parameters

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Defines essential configuration constants for the training process, including batch sizes, vocabulary limits, and embedding dimensions. These settings control memory usage, model capacity, and training throughput.

```python
config.SAVE_EVERY_EPOCHS = 1
config.TRAIN_BATCH_SIZE = 1024
config.MAX_CONTEXTS = 200
config.MAX_TOKEN_VOCAB_SIZE = 1301136
config.DEFAULT_EMBEDDINGS_SIZE = 128
config.DROPOUT_KEEP_RATE = 0.75
```
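To see why these constants affect memory usage, the size of the token embedding table can be estimated directly from them. This is a rough sketch under stated assumptions: one row per vocabulary entry, 32-bit floats, and no accounting for special tokens, the other vocabularies, or optimizer state.

```python
MAX_TOKEN_VOCAB_SIZE = 1301136
DEFAULT_EMBEDDINGS_SIZE = 128
BYTES_PER_FLOAT32 = 4  # assumption: embeddings are stored as 32-bit floats

# Rows (vocabulary entries) times columns (embedding dimensions).
token_table_params = MAX_TOKEN_VOCAB_SIZE * DEFAULT_EMBEDDINGS_SIZE
token_table_bytes = token_table_params * BYTES_PER_FLOAT32

print(f"{token_table_params:,} parameters")
print(f"{token_table_bytes / 2**20:.1f} MiB")
```

Halving either the vocabulary limit or the embedding size halves this estimate, which is the trade-off between model capacity and memory that the settings expose.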

--------------------------------

### Download Released Java-large Model for code2vec

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Downloads a released, non-trainable code2vec model. This model was used as a baseline in the code2seq paper and is ready for inference.

```bash
wget https://code2vec.s3.amazonaws.com/model/java-large-released-model.tar.gz
```

--------------------------------

### Download Java-med Dataset for code2vec

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Downloads the Java-med dataset, preprocessed for code2vec. This dataset comprises 1000 top-starred Java projects from GitHub, suitable for training.

```bash
wget https://s3.amazonaws.com/code2vec/data/java-med_data.tar.gz
```

--------------------------------

### Data Preprocessing Pipeline for Code2Vec

Source: https://context7.com/tech-srl/code2vec/llms.txt

A bash script and associated Python commands to preprocess raw Java source code into the format required for training code2vec models. This includes path extraction, vocabulary histogram creation, and final dataset file generation.

```bash
# Edit preprocess.sh to set your data directories
# TRAIN_DIR=my_train_dir
# VAL_DIR=my_val_dir
# TEST_DIR=my_test_dir
# DATASET_NAME=my_dataset

# Run the preprocessing pipeline
source preprocess.sh

# The script performs:
# 1. Extract paths from Java files using JavaExtractor
python3 JavaExtractor/extract.py --dir ${VAL_DIR} --max_path_length 8 --max_path_width 2 --num_threads 64 --jar JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar > dataset.val.raw.txt

# 2. Create histograms for vocabulary building
cat train.raw.txt | cut -d' ' -f1 | awk '{n[$0]++} END {for (i in n) print i,n[i]}' > histo.tgt.c2v
cat train.raw.txt | cut -d' ' -f2- | tr ' ' '\n' | cut -d',' -f1,3 | tr ',' '\n' | awk '{n[$0]++} END {for (i in n) print i,n[i]}' > histo.ori.c2v
cat train.raw.txt | cut -d' ' -f2- | tr ' ' '\n' | cut -d',' -f2 | awk '{n[$0]++} END {for (i in n) print i,n[i]}' > histo.path.c2v

# 3. Build vocabularies and create final dataset files
python3 preprocess.py --train_data train.raw.txt --test_data test.raw.txt --val_data val.raw.txt \
  --max_contexts 200 --word_vocab_size 1301136 --path_vocab_size 911417 --target_vocab_size 261245 \
  --word_histogram histo.ori.c2v --path_histogram histo.path.c2v --target_histogram histo.tgt.c2v \
  --output_name data/my_dataset/my_dataset
```
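The three histogram commands in step 2 count how often each target, token, and path occurs in the raw extractor output. The same counting can be sketched in Python. `build_histograms` and the sample lines are hypothetical, and unlike the shell pipeline this version simply skips contexts that do not have exactly three comma-separated parts.

```python
from collections import Counter
from typing import Iterable, Tuple


def build_histograms(lines: Iterable[str]) -> Tuple[Counter, Counter, Counter]:
    """Count targets, tokens and paths in raw extractor output lines."""
    targets, tokens, paths = Counter(), Counter(), Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        targets[fields[0]] += 1  # first field is the target label
        for context in fields[1:]:
            parts = context.split(",")
            if len(parts) != 3:
                continue  # ignore malformed contexts in this sketch
            left, path, right = parts
            tokens[left] += 1
            tokens[right] += 1
            paths[path] += 1
    return targets, tokens, paths


# Made-up raw lines; real paths are hashed integers produced by the extractor.
sample = ["get|name this,11,name name,22,string", "set|name name,22,string"]
targets, tokens, paths = build_histograms(sample)
print(tokens.most_common(2))
print(paths.most_common(1))
```

Writing each counter out as `key count` lines would give files in the same shape as the `awk` output above.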

--------------------------------

### Interactive Predictor for Method Name Prediction

Source: https://context7.com/tech-srl/code2vec/llms.txt

Enables interactive method name prediction from Java source files. It loads a model, extracts AST paths, and displays predictions with probability scores and attention weights. Requires configuration and a Code2VecModel.

```python
from config import Config
from interactive_predict import InteractivePredictor

# Set up for interactive prediction
config = Config(set_defaults=True, load_from_args=True, verify=True)

# Load model (assuming --load and --predict flags are set)
if config.DL_FRAMEWORK == 'keras':
    from keras_model import Code2VecModel
else:
    from tensorflow_model import Code2VecModel

model = Code2VecModel(config)

# Create predictor and start interactive session
predictor = InteractivePredictor(config, model)
predictor.predict()
```

--------------------------------

### Exporting Token and Target Embeddings

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Use the --save_w2v and --save_t2v flags to export token and target embedding matrices from a trained model into a text file formatted for word2vec.

```bash
python3 code2vec.py --load models/java14_model/saved_model_iter8.release --save_w2v models/java14_model/tokens.txt
python3 code2vec.py --load models/java14_model/saved_model_iter8.release --save_t2v models/java14_model/targets.txt
```
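The word2vec text format is plain text: a header line with the number of vectors and their dimension, then one line per word followed by its components. The reader below is a minimal sketch that assumes the exported files follow this standard layout. It runs on made-up vectors written to a temporary file, not on real code2vec output, and `read_word2vec_text` and `cosine` are illustrative helpers.

```python
import math
import os
import tempfile
from typing import Dict, List


def read_word2vec_text(path: str) -> Dict[str, List[float]]:
    """Read a word2vec text file: header 'count dim', then 'word v1 ... vdim'."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        count, dim = (int(x) for x in f.readline().split())
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            parts = line.rstrip("\n").split(" ")
            word, values = parts[0], [float(x) for x in parts[1:] if x]
            if len(values) != dim:
                raise ValueError(f"expected {dim} values for {word!r}")
            vectors[word] = values
    if len(vectors) != count:
        raise ValueError(f"header promised {count} vectors, found {len(vectors)}")
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy file with two 3-dimensional vectors (values are invented).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("2 3\nequals 1.0 0.0 0.0\nto|lower 1.0 1.0 0.0\n")
toy = read_word2vec_text(tmp.name)
os.remove(tmp.name)
print(round(cosine(toy["equals"], toy["to|lower"]), 3))
```

For real analysis a library such as gensim is the more practical choice; the sketch only shows what the exported file contains.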

--------------------------------

### Release Trained Model for Inference

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Releases a trained model, creating an optimized version for inference that typically consumes significantly less disk space. This is intended for models that no longer require further training.

```bash
python3 code2vec.py --load models/java14_model/saved_model_iter8 --release
```

--------------------------------

### Code2VecVocabs for Vocabulary Management

Source: https://context7.com/tech-srl/code2vec/llms.txt

Manages token, path, and target vocabularies for the code2vec model using TensorFlow lookup tables. Vocabularies are loaded or created during model initialization, and individual vocabularies can be accessed.

```python
from vocabularies import Code2VecVocabs, Vocab, VocabType
from config import Config

config = Config(set_defaults=True, load_from_args=True, verify=True)

# Vocabularies are automatically loaded/created during model initialization
vocabs = Code2VecVocabs(config)

# Access individual vocabularies
token_vocab = vocabs.token_vocab
path_vocab = vocabs.path_vocab
target_vocab = vocabs.target_vocab

print(f"Token vocab size: {token_vocab.size}")
print(f"Path vocab size: {path_vocab.size}")
print(f"Target vocab size: {target_vocab.size}")
```

--------------------------------

### Code2VecModelBase: Abstract Model Interface

Source: https://context7.com/tech-srl/code2vec/llms.txt

Abstract base class defining the interface for code2vec model implementations. Provides methods for dynamic loading based on framework (TensorFlow/Keras), training, evaluation, and prediction.

```python
from config import Config
from vocabularies import VocabType

# Load model dynamically based on framework config
def load_model_dynamically(config: Config):
    assert config.DL_FRAMEWORK in {'tensorflow', 'keras'}
    if config.DL_FRAMEWORK == 'tensorflow':
        from tensorflow_model import Code2VecModel
    elif config.DL_FRAMEWORK == 'keras':
        from keras_model import Code2VecModel
    return Code2VecModel(config)

# Initialize and use the model
config = Config(set_defaults=True, load_from_args=True, verify=True)
model = load_model_dynamically(config)

# Train the model
if config.is_training:
    model.train()

# Evaluate on test set
if config.is_testing:
    results = model.evaluate()
    print(f"Top-k accuracy: {results.topk_acc}")
    print(f"Subtoken precision: {results.subtoken_precision}")
    print(f"Subtoken recall: {results.subtoken_recall}")
    print(f"Subtoken F1: {results.subtoken_f1}")
    print(f"Loss: {results.loss}")

```

--------------------------------

### Extractor Class for Java Path Extraction

Source: https://context7.com/tech-srl/code2vec/llms.txt

Provides a Python interface to extract Abstract Syntax Tree (AST) paths from Java source code. It converts code into path-context representations suitable for the model. Requires a configured Extractor object and a Java file.

```python
from extractor import Extractor
from config import Config

config = Config(set_defaults=True)
config.MAX_CONTEXTS = 200

# Initialize extractor with Java JAR path
extractor = Extractor(
    config=config,
    jar_path='JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar',
    max_path_length=8,
    max_path_width=2
)

# Extract paths from a Java file
try:
    predict_lines, hash_to_string_dict = extractor.extract_paths('Input.java')

    # predict_lines contains model input format:
    # "method_name token1,path_hash,token2 token3,path_hash2,token4 ..."
    for line in predict_lines:
        print(line)

    # hash_to_string_dict maps path hashes back to readable strings
    for hash_val, path_str in hash_to_string_dict.items():
        print(f"{hash_val} -> {path_str}")

except ValueError as e:
    print(f"Extraction error: {e}")

# The java_string_hashcode static method replicates Java's String.hashCode()
path = "MethodDeclaration|SimpleName|MethodInvocation"
hash_code = Extractor.java_string_hashcode(path)
print(f"Hash of '{path}': {hash_code}")
```
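Java's `String.hashCode()` is defined as a polynomial in 31 over the string's characters, evaluated with 32-bit signed integer overflow. The standalone function below is a sketch of that definition for illustration. It is not the code2vec implementation, and it assumes the input contains only characters in the Basic Multilingual Plane, where Python code points and Java UTF-16 code units coincide.

```python
def java_string_hashcode(s: str) -> int:
    """Sketch of Java's String.hashCode() as a signed 32-bit integer."""
    h = 0
    for ch in s:
        # Keep only the low 32 bits, imitating Java int overflow.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed int.
    return h - 0x100000000 if h >= 0x80000000 else h


print(java_string_hashcode("hello"))  # 99162322
print(java_string_hashcode("MethodDeclaration|SimpleName|MethodInvocation"))
```

Because the result wraps around, long path strings can hash to negative numbers, so hashed paths should be treated as signed values.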

--------------------------------

### Inspecting Embeddings with Gensim

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Load exported word2vec format files into Python using the gensim library to perform similarity analysis and vector operations.

```python
from gensim.models import KeyedVectors as word2vec
vectors_text_path = 'models/java14_model/targets.txt'
model = word2vec.load_word2vec_format(vectors_text_path, binary=False)
model.most_similar(positive=['equals', 'to|lower'])
model.most_similar(positive=['download', 'send'], negative=['receive'])
```

--------------------------------

### InteractivePredictor Class

Source: https://context7.com/tech-srl/code2vec/llms.txt

Enables interactive method name prediction from Java source files, displaying predictions with probability scores and attention weights.

````APIDOC
## InteractivePredictor Class

### Description
The `InteractivePredictor` class enables interactive method name prediction from Java source files, displaying predictions with probability scores and attention weights.

### Initialization
```python
from config import Config
from interactive_predict import InteractivePredictor

config = Config(set_defaults=True, load_from_args=True, verify=True)

if config.DL_FRAMEWORK == 'keras':
    from keras_model import Code2VecModel
else:
    from tensorflow_model import Code2VecModel

model = Code2VecModel(config)
predictor = InteractivePredictor(config, model)
```

### Usage
```python
predictor.predict()
```

### Functionality
The `predict()` method prompts the user to modify `Input.java`, extracts AST paths, and displays predictions with probabilities and attention scores.

### Example Output Format
```
Original name:    fooBar
    (0.85) predicted: foo|bar
    (0.10) predicted: process|data
    (0.03) predicted: handle|request
Attention:
0.15    context: System,MethodInvocation->Name,println
0.12    context: void,Method->ReturnType,METHOD_NAME
```
````

--------------------------------

### Analyze Embeddings with Gensim

Source: https://context7.com/tech-srl/code2vec/llms.txt

Utilizes Gensim's KeyedVectors to load exported Code2Vec embeddings. It supports semantic similarity queries, arithmetic operations on method names, and analogy testing.

```python
from gensim.models import KeyedVectors as word2vec
token_vectors = word2vec.load_word2vec_format('models/java14m/tokens.txt', binary=False)
target_vectors = word2vec.load_word2vec_format('models/java14m/targets.txt', binary=False)
similar_to_equals = target_vectors.most_similar(positive=['equals'], topn=5)
result = target_vectors.most_similar(positive=['equals', 'to|lower'], topn=3)
result_analogy = target_vectors.most_similar(positive=['download', 'send'], negative=['receive'], topn=3)
similarity = token_vectors.similarity('String', 'Integer')
```

--------------------------------

### Exporting Code Vectors

Source: https://github.com/tech-srl/code2vec/blob/master/README.md

Use the --export_code_vectors flag to generate vector representations for specific code snippets. When used with --test, it saves to a file; with --predict, it outputs to the console.

```bash
python3 code2vec.py --load models/java14_model/saved_model_iter8.release --test data/java14m/java14m.test.c2v --export_code_vectors
python3 code2vec.py --load models/java14_model/saved_model_iter8.release --predict --export_code_vectors
```

--------------------------------

### Extractor Class

Source: https://context7.com/tech-srl/code2vec/llms.txt

Provides a Python interface to the Java AST path extraction tool, converting source code into path-context representations for the model.

````APIDOC
## Extractor Class

### Description
The `Extractor` class provides a Python interface to the Java AST path extraction tool, converting source code into path-context representations for the model.

### Initialization
```python
from extractor import Extractor
from config import Config

config = Config(set_defaults=True)
config.MAX_CONTEXTS = 200

extractor = Extractor(
    config=config,
    jar_path='JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar',
    max_path_length=8,
    max_path_width=2
)
```

### Path Extraction
```python
try:
    predict_lines, hash_to_string_dict = extractor.extract_paths('Input.java')
    for line in predict_lines:
        print(line)
    for hash_val, path_str in hash_to_string_dict.items():
        print(f"{hash_val} -> {path_str}")
except ValueError as e:
    print(f"Extraction error: {e}")
```

### Hash Calculation
```python
path = "MethodDeclaration|SimpleName|MethodInvocation"
hash_code = Extractor.java_string_hashcode(path)
print(f"Hash of '{path}': {hash_code}")
```

### Output Format (`predict_lines`)
```
method_name token1,path_hash,token2 token3,path_hash2,token4 ...
```
````

--------------------------------

### Handle Model Evaluation and Prediction Structures

Source: https://context7.com/tech-srl/code2vec/llms.txt

Defines how to interpret the ModelEvaluationResults and ModelPredictionResults named tuples. These structures hold metrics like accuracy, precision, recall, and attention weights.

```python
from model_base import ModelEvaluationResults, ModelPredictionResults
results = ModelEvaluationResults(topk_acc=[0.45, 0.52, 0.58, 0.62, 0.65], subtoken_precision=0.72, subtoken_recall=0.68, subtoken_f1=0.70, loss=2.15)
prediction = ModelPredictionResults(original_name='processUserInput', topk_predicted_words=['process|user|input', 'handle|input', 'parse|input'], topk_predicted_words_scores=[0.75, 0.12, 0.08], attention_per_context={('user', 'MethodDeclaration|Name', 'input'): 0.15}, code_vector=[0.1, -0.2, 0.3])
```
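The subtoken metrics compare the `|`-separated parts of the predicted name with those of the original name. The function below is one plausible way to compute them, a sketch assuming a predicted subtoken counts as correct whenever it appears anywhere in the original name. It is not taken from the code2vec source, and `subtoken_scores` is a hypothetical helper.

```python
from typing import Tuple


def subtoken_scores(original: str, predicted: str) -> Tuple[float, float, float]:
    """Precision, recall and F1 over '|'-separated subtokens of two names."""
    original_subtokens = original.split("|")
    predicted_subtokens = predicted.split("|")
    # Predicted subtokens that also occur in the original name.
    true_pos = sum(1 for s in predicted_subtokens if s in original_subtokens)
    false_pos = len(predicted_subtokens) - true_pos
    # Original subtokens that the prediction missed.
    false_neg = sum(1 for s in original_subtokens if s not in predicted_subtokens)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# 'handle|input' against 'process|user|input': one of two predicted subtokens
# is correct, and one of three original subtokens is recovered.
precision, recall, f1 = subtoken_scores("process|user|input", "handle|input")
print(precision, recall, f1)
```

A dataset-level score would accumulate the true positive, false positive, and false negative counts over all examples before dividing, rather than averaging per-example scores.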

--------------------------------

### Save Word2Vec Embeddings

Source: https://context7.com/tech-srl/code2vec/llms.txt

Saves model embeddings in word2vec format for both tokens and targets. It also includes a step to clean up the model session.

```python
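from vocabularies import VocabType

# `model` is assumed to be an already loaded Code2VecModel instance
model.save_word2vec_format('tokens.txt', VocabType.Token)
model.save_word2vec_format('targets.txt', VocabType.Target)

# Clean up
model.close_session()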
```
