### Example Command Line for Demo Script

Source: https://code.accel-brain.com/Automatic-Summarization/index.html

An example of how to run the demo script with specific arguments: a Wikipedia URL, the Jaccard similarity filter, and a similarity limit of 0.3.

```bash
python demo/demo_similarity_filtering_japanese_web_page.py https://ja.wikipedia.org/wiki/%E5%BE%AA%E7%92%B0%E8%AB%96%E6%B3%95 Jaccard 0.3
```

--------------------------------

### Install pysummarization via pip

Source: https://code.accel-brain.com/Automatic-Summarization/README.html

Use this command to install the library in your Python environment.

```bash
pip install pysummarization
```

--------------------------------

### Example Usage: Reading PDF from Command Line

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/readablewebpdf/web_pdf_reading.html

This example shows how to use the `WebPDFReading` class. It takes a URL as a command-line argument, converts the PDF at that URL to text, and prints the first 300 characters.

```python
if __name__ == "__main__":
    # `WebPDFReading` is defined earlier in the same module.
    import sys

    url = sys.argv[1]
    text = WebPDFReading().url_to_text(url)
    print(text[:300])
```

--------------------------------

### View demo output

Source: https://code.accel-brain.com/Automatic-Summarization/README.html

Example output from the demo script: the sentences it extracts from the Japanese Wikipedia article on circular reasoning (循環論法), printed as-is.

```text
循環論法
出典: フリー百科事典『ウィキペディア(Wikipedia)』
移動先: 案内 、 検索
循環論法 (じゅんかんろんぽう、circular reasoning, circular logic, vicious circle [1] )とは、 ある命題の 証明 において、その命題を仮定した議論を用いること [1] 。
証明すべき結論を前提として用いる論法 [2] 。
ある用語の 定義 を与える表現の中にその用語自体が本質的に登場していること [1]
```

--------------------------------

### Setup Dataset Helper

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizabletoken/encoder_decoder.html

Excerpt of a helper method that computes the length (number of tokens) of each sentence in a list.
```python
def __setup_dataset(self, sentence_list, token_master_list):
    # Record the number of tokens in each sentence.
    sentence_len_list = [0] * len(sentence_list)
    for i in range(len(sentence_list)):
        sentence_len_list[i] = len(sentence_list[i])
```

--------------------------------

### Setup Vectorizer and Token Iterator

Source: https://code.accel-brain.com/Automatic-Summarization/README.html

Sets up `THotVectorizer` and `TokenIterator` to prepare training data for the re-seq2seq model. Requires `token_list` (the list of tokens produced by the tokenizer) to be defined; `token_arr` is built from it.

```python
import numpy as np
import mxnet as mx
from pysummarization.vectorizabletoken.t_hot_vectorizer import THotVectorizer
from pysummarization.iteratabledata.token_iterator import TokenIterator

# Build the token array before it is used by the vectorizer.
token_arr = np.array(token_list)

vectorizable_token = THotVectorizer(token_list=token_arr.tolist())
vector_list = vectorizable_token.vectorize(token_list=token_arr.tolist())
vector_arr = np.array(vector_list)

token_iterator = TokenIterator(
    vectorizable_token=vectorizable_token,
    token_arr=token_arr,
    epochs=300,
    batch_size=25,
    seq_len=5,
    test_size=0.3,
    norm_mode=None,
    # Use mx.cpu() if no GPU is available.
    ctx=mx.gpu()
)

for observed_arr, _, _, _ in token_iterator.generate_learned_samples():
    break
print(observed_arr.shape)  # (batch size, the length of series, dimension)
```

--------------------------------

### Initialize TfidfVectorizer

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizabletoken/tfidf_vectorizer.html

Initializes the TfidfVectorizer with a list of token lists. Requires `nltk` to be installed.

```python
# -*- coding: utf-8 -*-
import nltk
from pysummarization.vectorizable_token import VectorizableToken


class TfidfVectorizer(VectorizableToken):
    '''
    Vectorize token.
    '''

    # Document
    __collection = []

    def __init__(self, token_list_list):
        '''
        Initialize.

        Args:
            token_list_list:    The list of list of tokens.
        '''
        self.__collection = nltk.TextCollection(token_list_list)
```

--------------------------------

### Setup Dataset for Training

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizablesentence/lstm_rtrbm.html

Internal helper method that transforms a list of sentences into a NumPy array of one-hot token vectors with shape (number of sentences, `seq_len`, vocabulary size). Sentences shorter than `seq_len` are padded with all-zero vectors.
```python
def __setup_dataset(self, sentence_list, token_master_list, seq_len):
    # `np` is NumPy, imported at module level.
    sentence_len_list = [0] * len(sentence_list)
    for i in range(len(sentence_list)):
        sentence_len_list[i] = len(sentence_list[i])

    observed_list = [None] * len(sentence_list)
    for i in range(len(sentence_list)):
        arr_list = [None] * seq_len
        for j in range(seq_len):
            # One-hot vector over the vocabulary; it stays all zeros
            # when sentence `i` has fewer than `seq_len` tokens.
            arr = np.zeros(len(token_master_list))
            try:
                token = sentence_list[i][j]
                arr[token_master_list.index(token)] = 1
            except IndexError:
                pass
            finally:
                arr = arr.astype(np.float64)
            arr_list[j] = arr
        observed_list[i] = arr_list
    observed_arr = np.array(observed_list)
    return observed_arr
```

--------------------------------

### Setup Logging for Re-Seq2Seq

Source: https://code.accel-brain.com/Automatic-Summarization/README.html

Configures a logger to output debug messages for the re-seq2seq implementation.

```python
from logging import getLogger, StreamHandler, NullHandler, DEBUG, ERROR

logger = getLogger("accelbrainbase")
handler = StreamHandler()
handler.setLevel(DEBUG)
logger.setLevel(DEBUG)
logger.addHandler(handler)
```

--------------------------------

### Get and Set Computation Context (GPU/CPU)

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/abstractablesemantics/_mxnet/enc_dec_ad.html

Provides methods to get and set the computation context for MXNet operations, so you can choose whether to run on the GPU or the CPU.
```python
import mxnet as mx
from pysummarization.abstractablesemantics._mxnet.enc_dec_ad import EncDecAD

encdec_ad = EncDecAD()

# Get the current context
ctx = encdec_ad.ctx
print(f"Current context: {ctx}")

# Set the context to CPU
encdec_ad.ctx = mx.cpu()
print(f"Updated context: {encdec_ad.ctx}")

# Set the context back to GPU (if available)
encdec_ad.ctx = mx.gpu()
print(f"Updated context: {encdec_ad.ctx}")
```

--------------------------------

### Demo: Summarize Japanese Web Page

Source: https://code.accel-brain.com/Automatic-Summarization/README.html

Summarizes a Japanese Wikipedia page with the demo script. Here the `{URL}` argument is the URL of the Japanese Wikipedia article on automatic summarization (自動要約).

```bash
python demo/demo_summarization_japanese_web_page.py https://ja.wikipedia.org/wiki/%E8%87%AA%E5%8B%95%E8%A6%81%E7%B4%84
```

--------------------------------

### Import necessary libraries for DBM THot Vectorizer

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizabletoken/thotvectorizer/dbm_t_hot_vectorizer.html

Imports the modules required for vectorization, distance calculation, and Deep Boltzmann Machine functionality. Ensure `pysummarization` and `pydbm` are installed.

```python
# -*- coding: utf-8 -*-
import numpy as np
from pysummarization.vectorizabletoken.t_hot_vectorizer import THotVectorizer
from pysummarization.computable_distance import ComputableDistance
from pysummarization.computabledistance.euclid_distance import EuclidDistance
# `StackedAutoEncoder` is-a `DeepBoltzmannMachine`.
from pydbm.dbm.deepboltzmannmachine.stacked_auto_encoder import StackedAutoEncoder
# The `Concrete Builder` in Builder Pattern.
from pydbm.dbm.builders.dbm_multi_layer_builder import DBMMultiLayerBuilder
# Contrastive Divergence for function approximation.
from pydbm.approximation.contrastive_divergence import ContrastiveDivergence
```

--------------------------------

### Import LSTM, Loss, Optimizer, and Activation Functions

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/abstractablesemantics/enc_dec_ad.html

This code imports the classes needed to build encoder-decoder models: LSTM graphs, the mean squared error loss, the Nadam optimizer, and the logistic, tanh, and softmax activation functions. Ensure these libraries are installed.

```python
# -*- coding: utf-8 -*-
from logging import getLogger
import numpy as np
from pysummarization.abstractable_semantics import AbstractableSemantics
from pysummarization.vectorizable_token import VectorizableToken
# LSTM Graph which is-a `Synapse`.
from pydbm.synapse.recurrenttemporalgraph.lstm_graph import LSTMGraph as EncoderGraph
from pydbm.synapse.recurrenttemporalgraph.lstm_graph import LSTMGraph as DecoderGraph
# Loss function.
from pydbm.loss.mean_squared_error import MeanSquaredError
# Nadam (Adam with Nesterov momentum) as the optimizer.
from pydbm.optimization.optparams.nadam import Nadam as EncoderAdam
from pydbm.optimization.optparams.nadam import Nadam as DecoderAdam
# Verification.
from pydbm.verification.verificate_function_approximation import VerificateFunctionApproximation
# LSTM model.
from pydbm.rnn.lstm_model import LSTMModel
from pydbm.rnn.lstm_model import LSTMModel as Encoder
from pydbm.rnn.lstm_model import LSTMModel as Decoder
# Logistic Function as activation function.
from pydbm.activation.logistic_function import LogisticFunction
# Tanh Function as activation function.
from pydbm.activation.tanh_function import TanhFunction
# Softmax Function as activation function.
from pydbm.activation.softmax_function import SoftmaxFunction
```

--------------------------------

### Run English web page summarization demo

Source: https://code.accel-brain.com/Automatic-Summarization/README.html

Run the demo script from the command line, passing a URL as the argument.

```bash
python demo/demo_summarization_english_web_page.py {URL}
```

--------------------------------

### Get Token List

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizabletoken/skip_gram_vectorizer.html

Retrieves the list of tokens.

```APIDOC
## GET /token_list

### Description
Retrieves the list of tokens.

### Method
GET

### Endpoint
/token_list

### Response
#### Success Response (200)
- **token_list** (list) - The list of tokens.

#### Response Example
```json
{
  "token_list": ["token1", "token2", "token3"]
}
```
```

--------------------------------

### Get Vectorizable Token

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/iteratabledata/token_iterator.html

Retrieves the `vectorizable_token` attribute.

```python
def get_vectorizable_token(self):
    ''' getter '''
    return self.__vectorizable_token
```

--------------------------------

### Initialize Encoder-Decoder Controller

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/abstractablesemantics/_mxnet/enc_dec_ad.html

Part of the constructor: creates the `L2NormLoss`, builds a default `EncoderDecoderController` when none is passed (or type-checks the one that is passed), and initializes the logger.
```python
computable_loss = L2NormLoss()
self.__normal_prior_flag = normal_prior_flag

if encoder_decoder_controller is None:
    encoder_decoder_controller = self.__build_encoder_decoder_controller(
        computable_loss=computable_loss,
        hidden_neuron_count=hidden_neuron_count,
        output_neuron_count=output_neuron_count,
        dropout_rate=dropout_rate,
        batch_size=batch_size,
        learning_rate=learning_rate,
        attenuate_epoch=attenuate_epoch,
        learning_attenuate_rate=learning_attenuate_rate,
        seq_len=seq_len,
    )
else:
    if isinstance(encoder_decoder_controller, EncoderDecoderController) is False:
        raise TypeError()

self.__encoder_decoder_controller = encoder_decoder_controller

logger = getLogger("accelbrainbase")
self.__logger = logger
self.__logs_tuple_list = []
self.__computable_loss = computable_loss
```

--------------------------------

### Initialize Controller and Execute Learning

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizabletoken/encoder_decoder.html

Initializes the `EncoderDecoderController` and triggers the learning process.

```python
encoder_decoder_controller = EncoderDecoderController(
    encoder=encoder,
    decoder=decoder,
    epochs=epochs,
    batch_size=batch_size,
    learning_rate=learning_rate,
    learning_attenuate_rate=learning_attenuate_rate,
    attenuate_epoch=attenuate_epoch,
    test_size_rate=test_size_rate,
    computable_loss=MeanSquaredError(),
    verificatable_result=VerificateFunctionApproximation(),
    tol=0.0
)

# Learning: `observed_arr` is both the input and the target,
# so the model learns to reconstruct its input.
encoder_decoder_controller.learn(observed_arr, observed_arr)

self.__controller = encoder_decoder_controller
self.__token_master_list = token_master_list
```

--------------------------------

### GET /logs

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/abstractablesemantics/_mxnet/re_seq_2_seq.html

Retrieves the current logs array from the model.

```APIDOC
## GET /logs

### Description
Returns the logs as a numpy array.

### Method
GET

### Response
#### Success Response (200)
- **logs_arr** (array) - The array containing the logs of the model.
```

--------------------------------

### GET /generate_inferenced_samples

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/iteratabledata/token_iterator.html

Generates samples specifically for inference tasks.

```APIDOC
## generate_inferenced_samples

### Description
Draws and generates data batches for inference.

### Response
- **Returns** (Tuple) - Yields a tuple containing (None, None, test_observed_data, file_path).
```

--------------------------------

### Get Auto Encoder

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizabletoken/skip_gram_vectorizer.html

Retrieves the auto-encoder model used for feature extraction.

```APIDOC
## GET /auto_encoder

### Description
Retrieves the auto-encoder model.

### Method
GET

### Endpoint
/auto_encoder

### Response
#### Success Response (200)
- **auto_encoder** (object) - The auto-encoder model object.

#### Response Example
```json
{
  "auto_encoder": { ... auto-encoder model details ... }
}
```
```

--------------------------------

### Run demo batch program

Source: https://code.accel-brain.com/Automatic-Summarization/README.html

Command line usage for the demo script.

```bash
python demo/demo_similarity_filtering_japanese_web_page.py {URL} {SimilarityFilter} {SimilarityLimit}
```

```bash
python demo/demo_similarity_filtering_japanese_web_page.py https://ja.wikipedia.org/wiki/%E5%BE%AA%E7%92%B0%E8%AB%96%E6%B3%95 Jaccard 0.3
```

--------------------------------

### Get Token Array

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizabletoken/skip_gram_vectorizer.html

Retrieves the internal array of tokens used by the system.

```APIDOC
## GET /token_arr

### Description
Retrieves the internal array of tokens.

### Method
GET

### Endpoint
/token_arr

### Response
#### Success Response (200)
- **token_arr** (list) - The array of tokens.

#### Response Example
```json
{
  "token_arr": ["token1", "token2", "token3"]
}
```
```

--------------------------------

### Run Demo Script for Web Page Summarization

Source: https://code.accel-brain.com/Automatic-Summarization/index.html

Run the demo script that summarizes a Japanese web page with similarity filtering. The script takes three command-line arguments: the URL, the similarity filter type (for example, `Jaccard`), and the similarity limit.

```bash
python demo/demo_similarity_filtering_japanese_web_page.py {URL} {SimilarityFilter} {SimilarityLimit}
```

--------------------------------

### Instantiate NLP object

Source: https://code.accel-brain.com/Automatic-Summarization/README.html

Initialize the NLP base object and configure the MeCab tokenizer.

```python
from pysummarization.nlp_base import NlpBase
from pysummarization.tokenizabledoc.mecab_tokenizer import MeCabTokenizer

# The object of the NLP.
nlp_base = NlpBase()
# Set tokenizer. This is a Japanese tokenizer based on MeCab.
nlp_base.tokenizable_doc = MeCabTokenizer()
```

--------------------------------

### Get Controller Property

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizabletoken/encoder_decoder.html

A getter method for the `controller` property; it is used in the property definition.

```python
def get_controller(self):
    ''' getter '''
    return self.__controller
```

--------------------------------

### Initialize Similarity Filters

Source: https://code.accel-brain.com/Automatic-Summarization/index.html

Import and instantiate the classes that calculate similarity measures between sentences.
```python
from pysummarization.similarityfilter.dice import Dice

similarity_filter = Dice()
```

```python
from pysummarization.similarityfilter.jaccard import Jaccard

similarity_filter = Jaccard()
```

```python
from pysummarization.similarityfilter.simpson import Simpson

similarity_filter = Simpson()
```

--------------------------------

### Get Token List

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizabletoken/dbm_like_skip_gram_vectorizer.html

Getter method for the token list. This property is intended to be read-only.

```python
def get_token_list(self):
    ''' getter '''
    return self.__token_list
```

--------------------------------

### Configure Encoder and Decoder Models

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizabletoken/encoder_decoder.html

Sets up the optimizer parameters and initializes the Encoder and Decoder instances with their training configurations.

```python
encoder_opt_params = EncoderAdam()
encoder_opt_params.weight_limit = weight_limit
encoder_opt_params.dropout_rate = dropout_rate

encoder = Encoder(
    # Delegate `graph` to `LSTMModel`.
    graph=encoder_graph,
    # The number of epochs in mini-batch training.
    epochs=epochs,
    # The batch size.
    batch_size=batch_size,
    # Learning rate.
    learning_rate=learning_rate,
    # Attenuate the `learning_rate` by a factor of this value every `attenuate_epoch`.
    learning_attenuate_rate=learning_attenuate_rate,
    # Attenuate the `learning_rate` by a factor of `learning_attenuate_rate` every `attenuate_epoch`.
    attenuate_epoch=attenuate_epoch,
    # Maximum step `t` referred to in BPTT. If `0`, this class refers to all past data in BPTT.
    bptt_tau=bptt_tau,
    # Size of the test data set. If this value is `0`, validation is not executed.
    test_size_rate=test_size_rate,
    # Loss function.
    computable_loss=MeanSquaredError(),
    # Optimizer.
    opt_params=encoder_opt_params,
    # Verification function.
    verificatable_result=VerificateFunctionApproximation(),
    tol=0.0
)

decoder_opt_params = DecoderAdam()
decoder_opt_params.weight_limit = weight_limit
decoder_opt_params.dropout_rate = dropout_rate

decoder = Decoder(
    # Delegate `graph` to `LSTMModel`.
    graph=decoder_graph,
    # The number of epochs in mini-batch training.
    epochs=epochs,
    # The batch size.
    batch_size=batch_size,
    # Learning rate.
    learning_rate=learning_rate,
    # Attenuate the `learning_rate` by a factor of this value every `attenuate_epoch`.
    learning_attenuate_rate=learning_attenuate_rate,
    # Attenuate the `learning_rate` by a factor of `learning_attenuate_rate` every `attenuate_epoch`.
    attenuate_epoch=attenuate_epoch,
    # The length of sequences.
    seq_len=observed_arr.shape[1],
    # Maximum step `t` referred to in BPTT. If `0`, this class refers to all past data in BPTT.
    bptt_tau=bptt_tau,
    # Size of the test data set. If this value is `0`, validation is not executed.
    test_size_rate=test_size_rate,
    # Loss function.
    computable_loss=MeanSquaredError(),
    # Optimizer.
    opt_params=decoder_opt_params,
    # Verification function.
    verificatable_result=VerificateFunctionApproximation(),
    tol=0.0
)
```

--------------------------------

### Initialize Tokenizer and Vectorizer

Source: https://code.accel-brain.com/Automatic-Summarization/README.html

Initializes `NlpBase` and `SimpleTokenizer`, then builds the sentence list and the token list from `document`. Replace the placeholder `document` string with your own text.

```python
from pysummarization.nlp_base import NlpBase
from pysummarization.tokenizabledoc.simple_tokenizer import SimpleTokenizer

# `str` of your document.
document = "Your document."

nlp_base = NlpBase()
# Split sentences on periods and newlines.
nlp_base.delimiter_list = [".", "\n"]
tokenizable_doc = SimpleTokenizer()

sentence_list = nlp_base.listup_sentence(document)
token_list = tokenizable_doc.tokenize(document)
```

--------------------------------

### Get Token Array

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizabletoken/dbm_like_skip_gram_vectorizer.html

Getter method for the token array. This property is intended to be read-only.
```python
def get_token_arr(self):
    ''' getter '''
    return self.__token_arr
```

--------------------------------

### Initialize Decoder Model

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/abstractablesemantics/enc_dec_ad.html

Sets up the Decoder model with its graph, optimizer, and training parameters, including epochs, batch size, learning rate, sequence length, and validation settings.

```python
decoder = Decoder(
    # Delegate `graph` to `LSTMModel`.
    graph=decoder_graph,
    # The number of epochs in mini-batch training.
    epochs=100,
    # The batch size.
    batch_size=batch_size,
    # Learning rate.
    learning_rate=learning_rate,
    # Attenuate the `learning_rate` by a factor of this value every `attenuate_epoch`.
    learning_attenuate_rate=1.0,
    # Attenuate the `learning_rate` by a factor of `learning_attenuate_rate` every `attenuate_epoch`.
    attenuate_epoch=50,
    # The length of sequences.
    seq_len=seq_len,
    # Maximum step `t` referred to in BPTT. If `0`, this class refers to all past data in BPTT.
    bptt_tau=bptt_tau,
    # Size of the test data set. If this value is `0`, validation is not executed.
    test_size_rate=0.3,
    # Loss function.
    computable_loss=MeanSquaredError(),
    # Optimizer.
    opt_params=decoder_opt_params,
    # Verification function.
    verificatable_result=VerificateFunctionApproximation()
)
```

--------------------------------

### Context Management

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/abstractablesemantics/_mxnet/enc_dec_ad.html

Methods to get and set the execution context (GPU or CPU) for the model.

```APIDOC
## Context Management

### Description
Manages the MXNet execution context (e.g., mx.gpu() or mx.cpu()).

### Methods
- **get_ctx()**: Returns the current execution context.
- **set_ctx(value)**: Sets the execution context to the provided value.
```

--------------------------------

### GET /generate_learned_samples

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/iteratabledata/token_iterator.html

Generates learned samples for training and testing by yielding batches of data.

```APIDOC
## generate_learned_samples

### Description
Draws and generates data batches for training and testing cycles.

### Response
- **Returns** (Tuple) - Yields a tuple containing (training_observed, training_supervised, test_observed, test_supervised) as mxnet.ndarray objects.
```

--------------------------------

### Initialize THotVectorizer

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizabletoken/t_hot_vectorizer.html

Initializes the THotVectorizer with a list of all tokens; duplicates are removed to establish the vocabulary for encoding.

```python
# -*- coding: utf-8 -*-
import numpy as np
from pysummarization.vectorizable_token import VectorizableToken


class THotVectorizer(VectorizableToken):
    '''
    Vectorize token by t-hot Vectorizer.
    '''

    def __init__(self, token_list):
        '''
        Initialize.

        Args:
            token_list:    The list of all tokens.
        '''
        # Deduplicate the tokens to build the vocabulary.
        self.__token_arr = np.array(list(set(token_list)))
```

--------------------------------

### Summarize Japanese Web Page

Source: https://code.accel-brain.com/Automatic-Summarization/index.html

Run the batch program for Japanese web-page summarization by providing the target URL.

```bash
python demo/demo_summarization_japanese_web_page.py {URL}
```

```bash
python demo/demo_summarization_japanese_web_page.py https://ja.wikipedia.org/wiki/%E8%87%AA%E5%8B%95%E8%A6%81%E7%B4%84
```

--------------------------------

### Get Retrospective Encoder

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/abstractablesemantics/re_seq_2_seq.html

Returns the retrospective encoder object. This getter method is associated with the `retrospective_encoder` property.
```python
def get_retrospective_encoder(self):
    ''' getter '''
    return self.__retrospective_encoder

# `set_readonly` (defined elsewhere in the class) keeps the property read-only.
retrospective_encoder = property(get_retrospective_encoder, set_readonly)
```

--------------------------------

### Initialize Encoder-Decoder Controller

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/abstractablesemantics/re_seq_2_seq.html

Configures the encoder and decoder graphs with specific activation functions and optimization parameters, then initializes the controller. The excerpt begins partway through the method signature.

```python
    seq_len=8,
    bptt_tau=8,
    test_size_rate=0.3,
    tol=1e-10,
    tld=100.0
):
    encoder_graph = EncoderGraph()
    encoder_graph.observed_activating_function = LogisticFunction()
    encoder_graph.input_gate_activating_function = LogisticFunction()
    encoder_graph.forget_gate_activating_function = LogisticFunction()
    encoder_graph.output_gate_activating_function = LogisticFunction()
    encoder_graph.hidden_activating_function = LogisticFunction()
    encoder_graph.output_activating_function = LogisticFunction()
    encoder_graph.create_rnn_cells(
        input_neuron_count=input_neuron_count,
        hidden_neuron_count=hidden_neuron_count,
        output_neuron_count=1
    )
    encoder_opt_params = EncoderAdam()
    encoder_opt_params.weight_limit = weight_limit
    encoder_opt_params.dropout_rate = dropout_rate

    encoder = Encoder(
        graph=encoder_graph,
        epochs=100,
        batch_size=batch_size,
        learning_rate=learning_rate,
        learning_attenuate_rate=1.0,
        attenuate_epoch=50,
        bptt_tau=8,
        test_size_rate=0.3,
        computable_loss=MeanSquaredError(),
        opt_params=encoder_opt_params,
        verificatable_result=VerificateFunctionApproximation(),
        tol=tol,
        tld=tld
    )

    decoder_graph = DecoderGraph()
    decoder_graph.observed_activating_function = LogisticFunction()
    decoder_graph.input_gate_activating_function = LogisticFunction()
    decoder_graph.forget_gate_activating_function = LogisticFunction()
    decoder_graph.output_gate_activating_function = LogisticFunction()
    decoder_graph.hidden_activating_function = LogisticFunction()
    decoder_graph.output_activating_function = SoftmaxFunction()
    decoder_graph.create_rnn_cells(
        input_neuron_count=hidden_neuron_count,
        hidden_neuron_count=hidden_neuron_count,
        output_neuron_count=input_neuron_count
    )
    decoder_opt_params = DecoderAdam()
    decoder_opt_params.weight_limit = weight_limit
    decoder_opt_params.dropout_rate = dropout_rate

    decoder = Decoder(
        graph=decoder_graph,
        epochs=100,
        batch_size=batch_size,
        learning_rate=learning_rate,
        learning_attenuate_rate=1.0,
        attenuate_epoch=50,
        seq_len=seq_len,
        bptt_tau=bptt_tau,
        test_size_rate=0.3,
        computable_loss=MeanSquaredError(),
        opt_params=decoder_opt_params,
        verificatable_result=VerificateFunctionApproximation()
    )

    encoder_decoder_controller = EncoderDecoderController(
        encoder=encoder,
        decoder=decoder,
        epochs=epochs,
        batch_size=batch_size,
        learning_rate=learning_rate,
        learning_attenuate_rate=learning_attenuate_rate,
        attenuate_epoch=attenuate_epoch,
        test_size_rate=test_size_rate,
        computable_loss=MeanSquaredError(),
        verificatable_result=VerificateFunctionApproximation(),
        tol=tol,
        tld=tld
    )
    return encoder_decoder_controller
```

--------------------------------

### EncoderDecoderController Initialization and Build

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/abstractablesemantics/enc_dec_ad.html

This section describes how the EncoderDecoderController is initialized, including its parameters and the internal method that builds the encoder and decoder components.

```APIDOC
## EncoderDecoderController Initialization and Build

### Description
Initializes the EncoderDecoderController, which manages the encoder and decoder models for automatic summarization. It accepts an existing controller or, if none is given, builds a default one from the specified parameters.

### Method
Constructor

### Parameters
- **normal_prior_flag** (bool) - Optional - Flag for normal prior.
- **encoder_decoder_controller** (EncoderDecoderController) - Optional - An existing controller instance.
- **input_neuron_count** (int) - Optional - Number of neurons in the input layer. Defaults to 20.
- **hidden_neuron_count** (int) - Optional - Number of neurons in the hidden layer. Defaults to 20.
- **weight_limit** (float) - Optional - Limit for weights. Defaults to 1e+10.
- **dropout_rate** (float) - Optional - Dropout rate for regularization. Defaults to 0.5.
- **pre_learning_epochs** (int) - Optional - Number of epochs for pre-learning. Defaults to 1000.
- **batch_size** (int) - Optional - Batch size for training. Defaults to 20.
- **learning_rate** (float) - Optional - Initial learning rate. Defaults to 1e-05.
- **attenuate_epoch** (int) - Optional - Epoch interval for learning rate attenuation. Defaults to 50.
- **learning_attenuate_rate** (float) - Optional - Factor to attenuate the learning rate. Defaults to 1.0.
- **seq_len** (int) - Optional - Length of sequences in the Decoder with Attention model. Defaults to 8.
- **bptt_tau** (int) - Optional - Maximum step `t` in Backpropagation Through Time (BPTT). If 0, all past data is used. Defaults to 8.
- **test_size_rate** (float) - Optional - Size of the test dataset. If 0, validation is skipped. Defaults to 0.3.
- **tol** (float) - Optional - Tolerance for optimization convergence. Defaults to 1e-10.
- **tld** (float) - Optional - Tolerance for loss deviation. Defaults to 100.0.

### Internal Method: `__build_encoder_decoder_controller`

#### Description
This private method constructs the `EncoderGraph` and `DecoderGraph`, setting up activation functions, initialization strategies, and optimizer parameters for both the encoder and decoder components.

#### Parameters (used internally by `__build_encoder_decoder_controller`)
- **input_neuron_count** (int) - Number of neurons in the input layer.
- **hidden_neuron_count** (int) - Number of neurons in the hidden layer.
- **weight_limit** (float) - Limit for weights.
- **dropout_rate** (float) - Dropout rate for regularization.
- **epochs** (int) - Number of epochs for training (set to 100 for the encoder).
- **batch_size** (int) - Batch size for training.
- **learning_rate** (float) - Initial learning rate.
- **attenuate_epoch** (int) - Epoch interval for learning rate attenuation.
- **learning_attenuate_rate** (float) - Factor to attenuate the learning rate.
- **seq_len** (int) - Length of sequences in the Decoder with Attention model.
- **bptt_tau** (int) - Maximum step `t` in BPTT.
- **test_size_rate** (float) - Size of the test dataset.
- **tol** (float) - Tolerance for optimization convergence.
- **tld** (float) - Tolerance for loss deviation.

#### Encoder Configuration Details:
- **Activation Functions**: LogisticFunction for the observed, input gate, forget gate, output gate, hidden, and output layers.
- **Initialization**: Weights and biases are drawn from a Gaussian distribution, `np.random.normal(size=hoge) * 0.01`, where `hoge` stands for the matrix shape.
- **Optimizer**: `EncoderAdam` with the specified `weight_limit` and `dropout_rate`.
- **Loss Function**: `MeanSquaredError`.
- **Verification Function**: `VerificateFunctionApproximation`.

#### Decoder Configuration Details:
- **Activation Functions**: LogisticFunction for the observed, input gate, forget gate, output gate, and hidden layers; SoftmaxFunction for the output layer.

### Notes
- Besides scheduling learning-rate attenuation, `attenuate_epoch` also sets how often the weight matrices are constrained for regularization: once every `attenuate_epoch` epochs.
- If `bptt_tau` is 0, the model refers to all past data in BPTT.
- If `test_size_rate` is 0, validation is not executed.
```

--------------------------------

### Get Encoder-Decoder Controller

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/abstractablesemantics/re_seq_2_seq.html

Returns the encoder-decoder controller object. This getter method is associated with the `encoder_decoder_controller` property.
```python
def get_encoder_decoder_controller(self):
    ''' getter '''
    return self.__encoder_decoder_controller

encoder_decoder_controller = property(get_encoder_decoder_controller, set_readonly)
```

--------------------------------

### Get Token Array

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizabletoken/t_hot_vectorizer.html

Getter method that returns the internal array of unique tokens used for vectorization.

```python
def get_token_arr(self):
    ''' getter '''
    return self.__token_arr
```

--------------------------------

### EncoderDecoder Initialization

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizablesentence/encoder_decoder.html

Initializes the EncoderDecoder model and sets up logging.

```APIDOC
## EncoderDecoder()

### Description
Initializes the EncoderDecoder model and sets up the logger.

### Method
__init__

### Parameters
None

### Request Example
None

### Response
None
```

--------------------------------

### Configure Decoder Graph and Initialize RNN Cells

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/abstractablesemantics/enc_dec_ad.html

Sets the activation functions for the decoder's gates, hidden layer, and output layer, then initializes the RNN cells with the specified neuron counts. Weights and biases are initialized from a Gaussian distribution.

```python
decoder_graph.input_gate_activating_function = LogisticFunction()
decoder_graph.forget_gate_activating_function = LogisticFunction()
decoder_graph.output_gate_activating_function = LogisticFunction()
decoder_graph.hidden_activating_function = LogisticFunction()
decoder_graph.output_activating_function = SoftmaxFunction()

# Initialization strategy.
# This method initializes each weight matrix and bias from a Gaussian distribution: `np.random.normal(size=hoge) * 0.01`.
decoder_graph.create_rnn_cells( input_neuron_count=hidden_neuron_count, hidden_neuron_count=hidden_neuron_count, output_neuron_count=input_neuron_count ) ``` -------------------------------- ### Get Logs Array Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/abstractablesemantics/re_seq_2_seq.html Returns the collected training and testing logs as a NumPy array. This getter method is part of the logs_arr property. ```python def get_logs_arr(self): ''' getter ''' return np.array( self.__logs_tuple_list, ) logs_arr = property(get_logs_arr, set_readonly) ``` -------------------------------- ### Configure Encoder Optimizer Parameters Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizablesentence/encoder_decoder.html Sets up optimization parameters for the encoder, including weight limit and dropout rate, using the EncoderAdam optimizer. ```python encoder_opt_params = EncoderAdam() encoder_opt_params.weight_limit = weight_limit encoder_opt_params.dropout_rate = dropout_rate ``` -------------------------------- ### Calculate Sentence Similarity Source: https://code.accel-brain.com/Automatic-Summarization/README.html Call the 'calculate' method on an instantiated similarity filter object with tokenized sentence lists to get the similarity score. ```python # Tokenized sentences token_list_x = ["Dice", "coefficient", "is", "a", "similarity", "measure", "."] token_list_y = ["Jaccard", "coefficient", "is", "a", "similarity", "measure", "."] # 0.75 similarity_num = similarity_filter.calculate(token_list_x, token_list_y) ``` -------------------------------- ### EncDecAD Initialization Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/abstractablesemantics/_mxnet/enc_dec_ad.html Initializes the EncDecAD model with parameters for training, network architecture, and loss computation. 
```APIDOC
## EncDecAD Initialization

### Description
Initializes the LSTM-based Encoder/Decoder model for anomaly detection.

### Parameters
- **computable_loss** (ComputableLoss) - Optional - Loss function implementation.
- **normal_prior_flag** (bool) - Optional - If True, selects abstract sentences with low reconstruction error.
- **encoder_decoder_controller** (EncoderDecoderController) - Optional - Controller for the encoder-decoder architecture.
- **hidden_neuron_count** (int) - Optional - Number of units in hidden layers (default: 20).
- **output_neuron_count** (int) - Optional - Number of units in output layers (default: 20).
- **dropout_rate** (float) - Optional - Probability of dropout (default: 0.5).
- **epochs** (int) - Optional - Number of training epochs (default: 100).
- **batch_size** (int) - Optional - Batch size (default: 20).
- **learning_rate** (float) - Optional - Learning rate (default: 1e-05).
- **learning_attenuate_rate** (float) - Optional - Factor to attenuate learning rate (default: 1.0).
- **attenuate_epoch** (int) - Optional - Epoch interval for learning rate attenuation (default: 50).
- **seq_len** (int) - Optional - Sequence length for the decoder (default: 8).
```

--------------------------------

### Learn features from FeatureGenerator

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/abstractablesemantics/re_seq_2_seq.html

Validates the feature generator, optionally pre-trains the encoder-decoder controller, and then runs the main training loop with learning-rate attenuation, validation, and early stopping.

```python
def learn_generated(self, feature_generator):
    '''
    Learn features generated by `FeatureGenerator`.

    Args:
        feature_generator:    is-a `FeatureGenerator`.
    '''
    if isinstance(feature_generator, FeatureGenerator) is False:
        raise TypeError("The type of `feature_generator` must be `FeatureGenerator`.")

    # Pre-learning.
    if self.__pre_learning_epochs > 0:
        self.__encoder_decoder_controller.learn_generated(feature_generator)

    learning_rate = self.__learning_rate

    encoder_best_params_list = []
    decoder_best_params_list = []
    re_encoder_best_params_list = []
    try:
        self.__change_inferencing_mode(False)
        self.__memory_tuple_list = []
        early_stop_flag = False
        loss_list = []
        min_loss = None
        epoch = 0
        for batch_observed_arr, batch_target_arr, test_batch_observed_arr, test_batch_target_arr in feature_generator.generate():
            epoch += 1

            if ((epoch + 1) % self.__attenuate_epoch == 0):
                learning_rate = learning_rate * self.__learning_attenuate_rate

            try:
                _ = self.inference(batch_observed_arr)
                delta_arr, _, loss = self.compute_retrospective_loss()
                self.__encoder_decoder_controller.decoder.graph.cec_activity_arr = np.array([])
                self.__retrospective_encoder.graph.hidden_activity_arr = np.array([])
                self.__retrospective_encoder.graph.cec_activity_arr = np.array([])
            except FloatingPointError:
                if epoch > int(self.__epochs * 0.7):
                    self.__logger.debug(
                        "Underflow occurred when the parameters are being updated. Because of early stopping, this error is caught and the parameter is not updated."
                    )
                    early_stop_flag = True
                    break
                else:
                    self.__logger.debug(
                        "Underflow occurred when the parameters are being updated."
                    )
                    raise

            if self.__test_size_rate > 0:
                # The generator already yields a test mini-batch, so it is used directly
                # (`test_observed_arr` is not defined in this method).
                self.__change_inferencing_mode(True)
                _ = self.inference(test_batch_observed_arr)
                _, _, test_loss = self.compute_retrospective_loss()

                remember_flag = False
                if len(loss_list) > 0:
                    if abs(test_loss - (sum(loss_list) / len(loss_list))) > self.__tld:
                        remember_flag = True

                if remember_flag is True:
                    self.__remember_best_params(
                        encoder_best_params_list,
                        decoder_best_params_list,
                        re_encoder_best_params_list
                    )
                    # Re-try.
                    _ = self.inference(test_batch_observed_arr)
                    _, _, test_loss = self.compute_retrospective_loss()

                self.__change_inferencing_mode(False)
                self.__verificate_retrospective_loss(loss, test_loss)

            self.__encoder_decoder_controller.encoder.graph.hidden_activity_arr = np.array([])
            self.__encoder_decoder_controller.encoder.graph.cec_activity_arr = np.array([])
            self.__encoder_decoder_controller.decoder.graph.hidden_activity_arr = np.array([])
            self.__encoder_decoder_controller.decoder.graph.cec_activity_arr = np.array([])

            # Early stopping when the loss has stopped changing.
            if epoch > 1 and abs(loss - loss_list[-1]) < self.__tol:
                early_stop_flag = True
                break
            loss_list.append(loss)

    except KeyboardInterrupt:
        self.__logger.debug("Interrupt.")

    if early_stop_flag is True:
        self.__logger.debug("Early stopping.")
        early_stop_flag = False

    self.__remember_best_params(
        encoder_best_params_list,
        decoder_best_params_list,
        re_encoder_best_params_list
    )
    self.__change_inferencing_mode(True)
    self.__logger.debug("end. ")
```

--------------------------------

### Epochs Getter and Setter

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/iteratabledata/token_iterator.html

Provides methods to get and set the number of training epochs. The `epochs` property is defined using these getter and setter methods.

```python
def get_epochs(self):
    ''' getter '''
    return self.__epochs

def set_epochs(self, value):
    ''' setter '''
    self.__epochs = value

epochs = property(get_epochs, set_epochs)
```

--------------------------------

### Configuration Methods

Source: https://code.accel-brain.com/Automatic-Summarization/genindex.html

Methods for configuring various parameters for tokenization, vectorization, and filtering.

```APIDOC
## set_batch_size()

### Description
Sets the batch size for the TokenIterator.

## set_similarity_limit()

### Description
Sets the similarity limit for the SimilarityFilter.

## set_top_n()

### Description
Sets the top N parameter for the TopNRankAbstractor.
```

--------------------------------

### Get DBM Feature Point

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizabletoken/thotvectorizer/dbm_t_hot_vectorizer.html

Internal helper method that retrieves the t-hot encoded vector representation of a given token from the DBM's first layer.

```python
def __dbm_t_hot(self, token):
    # Position of the token in the vocabulary array.
    key = self.token_arr.tolist().index(token)
    # Row `key` of the feature points of the first DBM layer.
    arr = self.__dbm.get_feature_point(layer_number=0)[key]
    return arr
```

--------------------------------

### Import dependencies for re_seq_2_seq

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/abstractablesemantics/re_seq_2_seq.html

Imports the components used by the sequence-to-sequence model: LSTM graphs, the loss function, optimizers, verification, LSTM models, activation functions, and the encoder/decoder controller.

```python
# -*- coding: utf-8 -*-
from logging import getLogger
import numpy as np
from pysummarization.abstractable_semantics import AbstractableSemantics
from pysummarization.vectorizable_token import VectorizableToken

# LSTM Graph which is-a `Synapse`.
from pydbm.synapse.recurrenttemporalgraph.lstm_graph import LSTMGraph as EncoderGraph
from pydbm.synapse.recurrenttemporalgraph.lstm_graph import LSTMGraph as DecoderGraph
from pydbm.synapse.recurrenttemporalgraph.lstm_graph import LSTMGraph as ReEncoderGraph

# Loss function.
from pydbm.loss.mean_squared_error import MeanSquaredError

# Nadam optimizer, aliased as `*Adam` for the encoder, decoder, and re-encoder.
from pydbm.optimization.optparams.nadam import Nadam as EncoderAdam
from pydbm.optimization.optparams.nadam import Nadam as DecoderAdam
from pydbm.optimization.optparams.nadam import Nadam as ReEncoderAdam

# Verification.
from pydbm.verification.verificate_function_approximation import VerificateFunctionApproximation

# LSTM model.
from pydbm.rnn.lstm_model import LSTMModel
from pydbm.rnn.lstm_model import LSTMModel as Encoder
from pydbm.rnn.lstm_model import LSTMModel as Decoder
from pydbm.rnn.lstm_model import LSTMModel as ReEncoder

# Logistic Function as activation function.
from pydbm.activation.logistic_function import LogisticFunction
# Tanh Function as activation function.
from pydbm.activation.tanh_function import TanhFunction
# Softmax Function as activation function.
from pydbm.activation.softmax_function import SoftmaxFunction

# Encoder/Decoder
from pydbm.rnn.encoder_decoder_controller import EncoderDecoderController
```

--------------------------------

### Import SGD Optimizer

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizablesentence/lstm_rtrbm.html

Imports the stochastic gradient descent (SGD) optimizer used for model optimization.

```python
from pydbm.optimization.optparams.sgd import SGD
```

--------------------------------

### Batch Size Getter and Setter

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/iteratabledata/token_iterator.html

Provides methods to get and set the batch size for training. The `batch_size` property is defined using these getter and setter methods.

```python
def get_batch_size(self):
    ''' getter '''
    return self.__batch_size

def set_batch_size(self, value):
    ''' setter '''
    self.__batch_size = value

batch_size = property(get_batch_size, set_batch_size)
```

--------------------------------

### Initialize and Learn with StackedAutoEncoder

Source: https://code.accel-brain.com/Automatic-Summarization/_modules/pysummarization/vectorizabletoken/thotvectorizer/dbm_t_hot_vectorizer.html

Builds a `StackedAutoEncoder` if none is provided, configuring its activation and approximation functions, and then trains it on the t-hot vectors of the vocabulary. If the `dbm` argument is supplied, it must be an instance of `StackedAutoEncoder`.
```python
import numpy as np

from pydbm.activation.logistic_function import LogisticFunction
from pydbm.approximation.contrastive_divergence import ContrastiveDivergence
from pydbm.dbm.builders.dbm_multi_layer_builder import DBMMultiLayerBuilder
from pydbm.dbm.deepboltzmannmachine.stacked_auto_encoder import StackedAutoEncoder
from pysummarization.vectorizabletoken.t_hot_vectorizer import THotVectorizer


class DBMTHotVectorizer(THotVectorizer):
    '''
    Vectorize token by t-hot Vectorizer.

    This class outputs the dimension reduced vectors with Deep Boltzmann Machines
    as a Stacked Auto Encoder.
    '''

    # is-a `StackedAutoEncoder`.
    __dbm = None
    # is-a `ComputableDistance`.
    __computable_distance = None

    def pre_learn(
        self,
        hidden_n=100,
        training_count=1000,
        batch_size=10,
        learning_rate=1e-05,
        dbm=None
    ):
        '''
        Pre-learn the t-hot vectors with a Stacked Auto Encoder.

        Args:
            hidden_n:         Number of units in the hidden layer.
            training_count:   Training count (`k` in the CD method).
            batch_size:       Batch size in mini-batch training.
            learning_rate:    Learning rate.
            dbm:              is-a `StackedAutoEncoder`. If `None`, a default one is built.
        '''
        if dbm is not None and isinstance(dbm, StackedAutoEncoder) is False:
            raise TypeError("The type of `dbm` must be `StackedAutoEncoder`.")

        # t-hot vectors of the whole vocabulary, one row per token.
        vector_arr = np.array(super().vectorize(self.token_arr.tolist()))

        if dbm is None:
            # Setting objects for activation function.
            activation_list = [
                LogisticFunction(),
                LogisticFunction(),
                LogisticFunction()
            ]
            # Setting the object for function approximation.
            approximation_list = [ContrastiveDivergence(), ContrastiveDivergence()]

            dbm = StackedAutoEncoder(
                DBMMultiLayerBuilder(),
                [vector_arr.shape[1], hidden_n, vector_arr.shape[1]],
                activation_list,
                approximation_list,
                learning_rate  # Setting learning rate.
            )

        # Execute learning.
        dbm.learn(
            vector_arr,
            training_count=training_count,  # If approximation is the Contrastive Divergence, this parameter is `k` in the CD method.
            batch_size=batch_size,  # Batch size in mini-batch training.
            r_batch_size=-1,  # If `r_batch_size` > 0, `dbm.learn` performs a kind of recursive learning.
            sgd_flag=True
        )
        # Second pass with the whole vocabulary as a single batch.
        dbm.learn(
            vector_arr,
            training_count=1,
            batch_size=vector_arr.shape[0],
            r_batch_size=-1,
            sgd_flag=True
        )
        self.__dbm = dbm
```
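
The `0.75` returned in the "Calculate Sentence Similarity" example can be checked by hand. The sketch below is plain Python, not pysummarization's implementation; it assumes the filters compare the two token lists as sets, and `jaccard` and `dice` are hypothetical helper names. Under that assumption the two sentences share 6 of 8 distinct tokens, which gives the Jaccard value 0.75, while the Dice coefficient for the same pair is 12/14 (about 0.857).

```python
def jaccard(token_list_x, token_list_y):
    """Jaccard coefficient over token sets: |X & Y| / |X | Y| (hypothetical helper)."""
    x, y = set(token_list_x), set(token_list_y)
    if not x and not y:
        return 0.0  # Two empty sentences: define similarity as 0.
    return len(x & y) / len(x | y)


def dice(token_list_x, token_list_y):
    """Dice coefficient over token sets: 2|X & Y| / (|X| + |Y|) (hypothetical helper)."""
    x, y = set(token_list_x), set(token_list_y)
    if not x and not y:
        return 0.0  # Two empty sentences: define similarity as 0.
    return 2 * len(x & y) / (len(x) + len(y))


token_list_x = ["Dice", "coefficient", "is", "a", "similarity", "measure", "."]
token_list_y = ["Jaccard", "coefficient", "is", "a", "similarity", "measure", "."]

print(jaccard(token_list_x, token_list_y))         # 0.75
print(round(dice(token_list_x, token_list_y), 3))  # 0.857
```

Because Dice weights the overlap twice, it is always at least as large as Jaccard for the same pair, so a similarity limit such as the `0.3` passed to the demo script filters more aggressively under Dice.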