### Preprocess Dataset with Python

Source: https://github.com/recommendation-algorithm/fibinet/blob/master/README.md

Run this command to preprocess a dataset for training. Replace `{dataset_name}` with `criteo` or `avazu`, and make sure the original dataset has been downloaded first.

```bash
python -u -m fibinet.preprocessing.{dataset_name}.{dataset_name}_process
```

--------------------------------

### Train Original FiBiNet (v1)

Source: https://context7.com/recommendation-algorithm/fibinet/llms.txt

Train the original FiBiNet model by passing `--version v1`. Ensure the configuration file and mode are set correctly.

```bash
python -u -m fibinet.run \
  --version v1 \
  --config ./config/criteo/config_dense.json \
  --mode train \
  --model_path fibinet_v1
```

--------------------------------

### Resume Training from Checkpoint

Source: https://context7.com/recommendation-algorithm/fibinet/llms.txt

Use the `retrain` mode to continue training from the latest saved checkpoint. This is useful for resuming interrupted training sessions or for fine-tuning.

```bash
python -u -m fibinet.run \
  --version ++ \
  --config ./config/criteo/config_dense.json \
  --mode retrain \
  --model_path fibinet_criteo
```

--------------------------------

### Loading Configuration and Dataset Files with DataLoader

Source: https://context7.com/recommendation-algorithm/fibinet/llms.txt

Use `DataLoader` to load JSON configurations and dataset files in various formats (.txt, .csv, .npy, .json). It can auto-detect formats and resolve file paths.
```python
from fibinet.common.data_loader import DataLoader

# Load a JSON config that defines feature schema and file paths
config = DataLoader.load_config_dict("./config/criteo/config_dense.json")
print(config["model"]["data_prefix"])
print(len(config["features"]))

# Resolve a list of partition names to absolute file paths
files = DataLoader.get_files(
    prefix="./data/criteo/k_fold/",
    paths=["part0", "part1", "part2"]
)
print(files[:2])

# Auto-detect format and load a dataset partition as a NumPy array
data = DataLoader.smart_load_data(files[0])
print(data.shape)
print(data[0, 0])
```

--------------------------------

### Train FiBiNet++ on Criteo Demo Dataset

Source: https://context7.com/recommendation-algorithm/fibinet/llms.txt

Use this command to train the default FiBiNet++ model on the Criteo dataset. Specify the training, validation, and test data paths, along with the training hyperparameters.

```bash
python -u -m fibinet.run \
  --version ++ \
  --config ./config/criteo/config_dense.json \
  --train_paths part0,part1,part2,part3,part4,part5,part6,part7 \
  --valid_paths part8 \
  --test_paths part9 \
  --embedding_size 10 \
  --epochs 5 \
  --batch_size 1024 \
  --learning_rate 0.0001 \
  --mode train \
  --model_path fibinet_criteo
```

--------------------------------

### Train FiBiNet++ Model

Source: https://github.com/recommendation-algorithm/fibinet/blob/master/README.md

Use this command to train a specific model version. Specify the model version (`v1`, `++`, or `custom`) and the configuration file path. The mode (`--mode`) can be `train`, `retrain`, or `test`.

```bash
python -u -m fibinet.run --version {version} --config {config_path}
```

--------------------------------

### Run Inference and Evaluation

Source: https://context7.com/recommendation-algorithm/fibinet/llms.txt

Execute the model in test mode to evaluate checkpoints from specified epochs. The `--restore_epochs` argument takes a comma-separated list of epochs to evaluate.
```bash
python -u -m fibinet.run \
  --version ++ \
  --config ./config/criteo/config_dense.json \
  --mode test \
  --restore_epochs 1,6,1 \
  --model_path fibinet_criteo
```

--------------------------------

### Define Input Features: SparseFeat, DenseFeat, VarLenSparseFeat

Source: https://context7.com/recommendation-algorithm/fibinet/llms.txt

Use these classes to describe the schema of input features for the data pipeline and model. Ensure the correct dtype and dimension are specified.

```python
from fibinet.model.components.inputs import SparseFeat, DenseFeat, VarLenSparseFeat

# Categorical field with a known vocabulary size of 305 categories
cat_feat = SparseFeat(
    name="C5",
    dimension=305,     # vocabulary size (number of unique values)
    use_hash=False,    # set True to apply tf.string_to_hash_bucket_fast
    dtype="int32",
    embedding=True     # whether to create an embedding lookup table
)

# Continuous numerical field
num_feat = DenseFeat(
    name="I1",
    dimension=1,       # number of scalar values in this field
    dtype="float32"
)

# Variable-length sparse field (e.g., multi-valued tags, browsing history)
seq_feat = VarLenSparseFeat(
    name="history_items",
    dimension=10000,   # vocabulary size
    maxlen=50,         # maximum sequence length (padded/truncated)
    combiner="mean",   # pooling: "mean", "sum", or "max"
    use_hash=False,
    dtype="int32",
    embedding=True
)

print(cat_feat)
# SparseFeat(name='C5', dimension=305, use_hash=False, dtype='int32', ...)
print(num_feat)
# DenseFeat(name='I1', dimension=1, dtype='float32', ...)
```

--------------------------------

### Criteo Dataset Preprocessing Pipeline

Source: https://context7.com/recommendation-algorithm/fibinet/llms.txt

This pipeline encodes sparse features, scales dense features, and creates k-fold splits for the Criteo dataset. Ensure you run this from the project root.
```python
# Preprocess Criteo dataset (run from project root)
# python -u -m fibinet.preprocessing.criteo.criteo_process

from fibinet.preprocessing.dense_process import DenseProcess
from fibinet.preprocessing.kfold_process import KFoldProcess
from fibinet.preprocessing.sparse_process import SparseProcess

# Step 1: encode categorical fields, drop rare values (< 10 occurrences)
sparse = SparseProcess(config_path="./config/criteo/config_template.json")
sparse.fit(min_occurrences=10)
sparse.transform()
# Writes updated config to e.g. ./config/criteo/config_sparse.json

# Step 2: scale dense/numerical fields
config_path = sparse.target_config_path
dense = DenseProcess(config_path=config_path)
dense.fit(dense.scale_multi_min_max)  # min-max normalization per field
dense.transform()
# Writes updated config to e.g. ./config/criteo/config_dense.json

# Step 3: create 10-fold train/valid/test splits
config_path = dense.target_config_path
kfold = KFoldProcess(config_path=config_path)
kfold.fit()
kfold.transform()
# Writes split files to ./data/criteo/k_fold/part{0..9}/train.txt
```

--------------------------------

### Construct FiBiNet++ Model with Keras

Source: https://context7.com/recommendation-algorithm/fibinet/llms.txt

Define feature columns and instantiate `FiBiNetModel` with the desired configuration parameters. The `get_model()` method builds and returns the Keras functional-API model, which the example then compiles with an optimizer, loss, and metrics.
```python
from fibinet.model.components.inputs import SparseFeat, DenseFeat
from fibinet.model.fibinet_model import FiBiNetModel
import tensorflow as tf

# Define feature schema
feature_columns = [
    SparseFeat(name="C1", dimension=1443, use_hash=False, dtype="int32", embedding=True),
    SparseFeat(name="C2", dimension=554, use_hash=False, dtype="int32", embedding=True),
    DenseFeat(name="I1", dimension=1, dtype="float32"),
    DenseFeat(name="I2", dimension=1, dtype="float32"),
]

# Instantiate FiBiNet++ model
fibinet = FiBiNetModel(
    params={},
    feature_columns=feature_columns,
    embedding_size=10,
    embedding_l2_reg=0.0,
    embedding_dropout=0.0,
    sparse_embedding_norm_type="bn",         # BatchNorm on sparse embeddings (++ default)
    dense_embedding_norm_type="layer_norm",  # LayerNorm on dense embeddings (++ default)
    senet_squeeze_mode="group_mean_max",     # Grouped mean+max squeeze (++ default)
    senet_squeeze_group_num=2,
    senet_excitation_mode="bit",             # Bit-level excitation (++ default)
    senet_activation="none",
    senet_use_skip_connection=True,
    senet_reweight_norm_type="ln",
    origin_bilinear_type="all_ip",           # Inner-product bilinear (++ default)
    origin_bilinear_dnn_units=[50],
    senet_bilinear_type="none",
    dnn_hidden_units=[400, 400, 400],
    dnn_activation="relu",
    dnn_dropout=0.0,
    enable_linear=False,
    seed=1024,
)

model = fibinet.get_model()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
model.compile(optimizer, loss=tf.keras.losses.BinaryCrossentropy(), metrics=["AUC", "binary_crossentropy"])
model.summary()
# Expected output: model with Input layers for each feature → Embedding/Hash →
# SENETLayer → BilinearInteraction → DNNLayer → Dense(1) → sigmoid → output (None, 1)
```

--------------------------------

### Implement SENet Layer for Feature Reweighting

Source: https://context7.com/recommendation-algorithm/fibinet/llms.txt

The `SENETLayer` applies the Squeeze-and-Excitation mechanism to reweight input embeddings.
Configure the squeeze/excitation modes and the reduction ratio based on model requirements.

```python
import tensorflow as tf
from fibinet.model.components.layers import SENETLayer

# Simulate a batch of 4 samples, 6 feature fields, embedding dim 10
embeddings = tf.random.normal([4, 6, 10])

# FiBiNet++ SENet configuration
senet = SENETLayer(
    senet_squeeze_mode="group_mean_max",  # concat(group_mean, group_max) squeeze
    senet_squeeze_group_num=2,
    senet_reduction_ratio=3.0,            # bottleneck ratio
    senet_excitation_mode="bit",          # per-bit (field × embedding) excitation
    senet_activation="none",              # no extra activation in excitation path
    senet_use_skip_connection=True,       # residual: output += input
    senet_reweight_norm_type="ln",        # layer-norm after reweighting
    seed=1024
)
reweighted = senet(embeddings)
print(reweighted.shape)  # (4, 6, 10), same shape as input

# Original FiBiNET v1 SENet configuration
senet_v1 = SENETLayer(
    senet_squeeze_mode="mean",
    senet_reduction_ratio=3.0,
    senet_excitation_mode="vector",
    senet_activation="relu",
    senet_use_skip_connection=False,
    senet_reweight_norm_type="none",
)
reweighted_v1 = senet_v1(embeddings)
print(reweighted_v1.shape)  # (4, 6, 10)
```

--------------------------------

### Restore Model Weights from Checkpoint

Source: https://context7.com/recommendation-algorithm/fibinet/llms.txt

Restore model weights from a specific epoch's checkpoint. Ensure the checkpoint path is correctly configured.

```python
# `runner` is assumed to be an existing FiBiNetRunner instance
# (see "Checkpoint Management with FiBiNetRunner")
model, latest_epoch = runner.restore_model_from_checkpoint(restore_epoch=3)
print(f"Restored model from epoch {latest_epoch}")
```

--------------------------------

### Build a Multi-Layer Perceptron Classifier: DNNLayer

Source: https://context7.com/recommendation-algorithm/fibinet/llms.txt

Use `DNNLayer` for the final classification stage, stacking dense layers with optional batch normalization and dropout. Configure hidden units and activation functions as needed.
```python
import tensorflow as tf
from fibinet.model.components.layers import DNNLayer

# Input: flattened bilinear interaction output
flat_input = tf.random.normal([4, 200])

dnn = DNNLayer(
    hidden_units=[400, 400, 400],
    activation="relu",
    l2_reg=0.0,
    dropout_rate=0.0,
    use_bn=False,
    seed=1024
)
output = dnn(flat_input, training=False)
print(output.shape)  # (4, 400): last hidden layer activations
```

--------------------------------

### Compute Pairwise Bilinear Interactions

Source: https://context7.com/recommendation-algorithm/fibinet/llms.txt

The `BilinearInteraction` layer computes feature interactions using various modes. Choose 'all_ip' for FiBiNET++'s default inner-product, or 'all'/'interaction' for element-wise products.

```python
import tensorflow as tf
from fibinet.model.components.layers import BilinearInteraction

embeddings = tf.random.normal([4, 6, 10])  # [batch, fields, embedding_size]

# FiBiNet++ inner-product mode with a small projection DNN
bilinear_ip = BilinearInteraction(
    bilinear_type="all_ip",   # shared W; inner product of W*v_i and v_j
    dnn_units=[50],           # optional output DNN to reduce dimension
    dnn_activation="linear",
    seed=1024
)
out_ip = bilinear_ip(embeddings)
print(out_ip.shape)  # (4, 1, 50): projected to 50 dims

# Original FiBiNET element-wise product mode
bilinear_all = BilinearInteraction(bilinear_type="all", seed=1024)
out_all = bilinear_all(embeddings)
# fields*(fields-1)/2 = 15 interaction pairs, each of size 10
print(out_all.shape)  # (4, 15, 10)

# Per-pair weight matrices
bilinear_interaction = BilinearInteraction(bilinear_type="interaction", seed=1024)
out_interaction = bilinear_interaction(embeddings)
print(out_interaction.shape)  # (4, 15, 10)
```

--------------------------------

### Generating Batches from Text Files with BatchGenerator

Source: https://context7.com/recommendation-algorithm/fibinet/llms.txt

This utility generates Keras-compatible (x, y) generator tuples from pre-processed text files.
It supports shuffling, remainder handling, and counting batches without loading all data.

```python
from fibinet.model.batch_generator import BatchGenerator
from fibinet.model.components.inputs import SparseFeat, DenseFeat

feature_columns = [
    DenseFeat("I1", 1, "float32"),
    SparseFeat("C1", 1443, dtype="int32"),
]
train_files = ["./data/criteo/k_fold/part0/train.txt"]

# Count the number of batches without loading data into memory
num_batches = BatchGenerator.get_txt_dataset_length(
    paths=train_files,
    batch_size=1024,
    drop_remainder=True
)
print(num_batches)

# Create an infinite generator (wraps around epoch boundaries)
gen = BatchGenerator.generate_arrays_from_file(
    paths=train_files,
    batch_size=1024,
    drop_remainder=True,
    features=feature_columns,
    shuffle=True
)
x_batch, y_batch = next(gen)
print(len(x_batch))
print(x_batch[0].shape)
print(y_batch[0].shape)
```

--------------------------------

### Checkpoint Management with FiBiNetRunner

Source: https://context7.com/recommendation-algorithm/fibinet/llms.txt

`FiBiNetRunner` provides utilities to create Keras `ModelCheckpoint` and `EarlyStopping` callbacks for managing model weights during training. Configure the callbacks to save weights only, set the save period, monitor validation loss, and restore the best weights.
```python
# Programmatic usage of FiBiNetRunner checkpoint utilities
import sys

# Set the CLI-style arguments before importing and constructing the runner
sys.argv = [
    "run",
    "--version", "++",
    "--config", "./config/criteo/config_dense.json",
    "--model_path", "fibinet_criteo",
    "--mode", "train",
]

from fibinet.run import FiBiNetRunner

runner = FiBiNetRunner()

# Create a ModelCheckpoint callback that saves every epoch
cp_cb = runner.create_checkpoint_callback(save_weights_only=True, period=1)
# Saves to: ./data/model/fibinet_criteo/checkpoint/cp-{epoch:04d}.ckpt

# Create an EarlyStopping callback watching validation loss
es_cb = runner.create_earlystopping_callback(
    monitor="val_loss",
    patience=3,
    restore_best_weights=True
)
```

--------------------------------

### Regularized DNN Layer with Batch Normalization and Dropout

Source: https://context7.com/recommendation-algorithm/fibinet/llms.txt

Use this configuration for larger datasets that require regularization. It combines L2 regularization, dropout, and batch normalization.

```python
import tensorflow as tf
from fibinet.model.components.layers import DNNLayer

# Same stand-in input as in the basic DNNLayer example
flat_input = tf.random.normal([4, 200])

dnn_regularized = DNNLayer(
    hidden_units=[512, 256, 128],
    activation="relu",
    l2_reg=1e-5,
    dropout_rate=0.3,
    use_bn=True,
    seed=42
)
output_reg = dnn_regularized(flat_input, training=True)
print(output_reg.shape)  # (4, 128)
```
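--------------------------------

The SENet and bilinear sections above only show how to call the layers. As a rough illustration of the arithmetic behind the v1-style settings (`senet_squeeze_mode="mean"`, `senet_excitation_mode="vector"`, `bilinear_type="all"`), here is a minimal NumPy sketch. It is not the library's implementation: `senet_reweight`, `bilinear_all`, and the random weight matrices are hypothetical names introduced for this example, biases and normalization are omitted, and the formulas follow the usual squeeze-and-excitation and shared-matrix bilinear descriptions rather than the fibinet source code.

```python
import numpy as np


def senet_reweight(emb, w1, w2):
    """Mean squeeze + two-layer excitation + per-field (vector-wise) reweighting.

    emb: [batch, fields, dim]; w1: [fields, reduced]; w2: [reduced, fields].
    Hypothetical helper for illustration only.
    """
    z = emb.mean(axis=2)                  # squeeze: one scalar per field -> [batch, fields]
    a = np.maximum(z @ w1, 0.0)           # excitation: bottleneck layer with ReLU
    a = np.maximum(a @ w2, 0.0)           # excitation: back to one weight per field
    return emb * a[:, :, None]            # scale each field embedding by its weight


def bilinear_all(emb, w):
    """Shared-matrix bilinear interaction: (v_i @ W) * v_j for every pair i < j."""
    num_fields = emb.shape[1]
    i_idx, j_idx = np.triu_indices(num_fields, k=1)  # all index pairs with i < j
    left = emb[:, i_idx, :] @ w                      # [batch, pairs, dim]
    return left * emb[:, j_idx, :]                   # element-wise product per pair


rng = np.random.default_rng(1024)
batch, fields, dim = 4, 6, 10
reduced = max(1, int(fields / 3.0))       # reduction ratio 3.0 -> 2 bottleneck units

emb = rng.normal(size=(batch, fields, dim))
w1 = rng.normal(size=(fields, reduced))   # assumed (random) excitation weights
w2 = rng.normal(size=(reduced, fields))
w = rng.normal(size=(dim, dim))           # assumed (random) shared bilinear matrix

reweighted = senet_reweight(emb, w1, w2)
pairs = bilinear_all(emb, w)
print(reweighted.shape)  # (4, 6, 10)
print(pairs.shape)       # (4, 15, 10)
```

The output shapes match those printed by `SENETLayer` and `BilinearInteraction(bilinear_type="all")` in the examples above: reweighting preserves `[batch, fields, dim]`, while six fields yield 6·5/2 = 15 interaction pairs.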