### PyTorchModel: Preparation and Setup

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_serialization/quick_start.rst

This snippet provides the initial setup for a PyTorchModel within the ADS framework. It imports necessary libraries including tempfile, torch, torchvision, and the ADS PyTorchModel class. This code is a starting point for further model preparation, saving, and deployment steps.

```python3
import tempfile
import torch
import torchvision
from ads.catalog.model import ModelCatalog
from ads.model.framework.pytorch_model import PyTorchModel
```

--------------------------------

### Manually Configure core-site.xml for Resource Principal (XML)

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/apachespark/setup-installation.rst

Provides an example of a manually configured `core-site.xml` file using resource principals for authentication. This file specifies the Object Storage endpoint and the custom authenticator class.

```xml
<configuration>
  <property>
    <name>fs.oci.client.hostname</name>
    <value>https://objectstorage.us-ashburn-1.oraclecloud.com</value>
  </property>
  <property>
    <name>fs.oci.client.custom.authenticator</name>
    <value>com.oracle.bmc.hdfs.auth.ResourcePrincipalsCustomAuthenticator</value>
  </property>
</configuration>
```

--------------------------------

### Install ADS with Multiple Modules

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Installs the oracle-ads package with multiple extra dependencies simultaneously. This allows for a customized installation based on specific project requirements.

```bash
$ python3 -m pip install "oracle-ads[notebook,viz,text]"
```

--------------------------------

### Install oracle-ads Base Package

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Installs the core oracle-ads Python package. This is the fundamental installation required to use the ADS SDK.
```bash
$ python3 -m pip install oracle-ads
```

--------------------------------

### Install ADS with Optuna Module

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Installs the oracle-ads package with the 'optuna' extra libraries for hyperparameter optimization tasks. It includes the Optuna library and visualization tools.

```bash
$ python3 -m pip install "oracle-ads[optuna]"
```

--------------------------------

### Configure core-site.xml with API Key (Bash)

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/apachespark/setup-installation.rst

Automates the configuration of `core-site.xml` for authentication with Object Storage using API keys. This command reads from your OCI configuration file to populate `core-site.xml`.

```bash
odsc core-site config -o
```

--------------------------------

### Configure core-site.xml with Resource Principal (Bash)

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/apachespark/setup-installation.rst

Automates the configuration of `core-site.xml` for authentication with Object Storage using resource principals. This command populates the necessary values to connect to Object Storage.

```bash
odsc core-site config -a resource_principal
```

--------------------------------

### Diagnose Infrastructure Setup using ADS CLI

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_training/distributed_training/developer/developer.rst

Runs a diagnosis of the infrastructure setup for distributed training on OCI Data Science. It starts a single-node 'jobrun' using the container image specified in the train.yaml file. The output is saved to a specified HTML file.
```bash
ads opctl check -f train.yaml --output infra_report.html
```

--------------------------------

### Install Feature Store Helm Chart

Source: https://github.com/oracle/accelerated-data-science/blob/main/ads/feature_store/docs/source/user_guides.setup.helm_chart.rst

Command to upgrade or install the Feature Store Helm chart. It requires the application name, Helm chart image path, Kubernetes namespace, path to the custom values.yaml file, and the marketplace version. The --wait flag ensures the command waits for resources to be ready.

```bash
helm upgrade oci:// --namespace --values --timeout 300s --wait -i --version
```

--------------------------------

### Install Oracle ADS using pip

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/operators/recommender_operator/quickstart.rst

Installs the oracle_ads Python library using pip. Ensure Python 3 is available and pip is configured.

```bash
python3 -m pip install "oracle_ads"
```

--------------------------------

### Install ADS with Text Module

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Installs the oracle-ads package with the 'text' extra libraries for natural language processing tasks. This includes libraries like wordcloud and spacy.

```bash
$ python3 -m pip install "oracle-ads[text]"
```

--------------------------------

### Install Multiple ADS SDK Extra Dependencies

Source: https://github.com/oracle/accelerated-data-science/blob/main/README.md

Demonstrates how to install multiple optional dependencies for the ADS SDK simultaneously. This example installs the 'notebook', 'viz', and 'text' modules in a single pip command.
```bash
python3 -m pip install 'oracle-ads[notebook,viz,text]'
```

--------------------------------

### Install ADS with ONNX Module

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Installs the oracle-ads package with the 'onnx' extra libraries for working with ONNX-compatible runtimes. This module facilitates model portability and performance optimization.

```bash
$ python3 -m pip install "oracle-ads[onnx]"
```

--------------------------------

### Install ADS with Torch Module

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Installs the oracle-ads package with the 'torch' extra libraries for PyTorch deep learning tasks. This includes PyTorch and visualization utilities.

```bash
$ python3 -m pip install "oracle-ads[torch]"
```

--------------------------------

### Configure Anomaly Detection Job YAML

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/operators/anomaly_detection_operator/quickstart.rst

Provides an example of the 'anomaly.yaml' configuration file. This file specifies data input, model type, and column mappings for the anomaly detection job. Users should update this file with their specific data details.

```yaml
kind: operator
type: anomaly
version: v1
spec:
  datetime_column:
    name: timestamp
  target_category_columns:
    - series_id
  input_data:
    url: https://raw.githubusercontent.com/oracle/accelerated-data-science/refs/heads/main/ads/opctl/operator/common/data/synthetic.csv
  model: autots
  target_column: target
```

--------------------------------

### Install ADS CLI

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Installs the ADS CLI along with its operational control (opctl) capabilities. This CLI is used for setting up development environments, building Docker images, and managing Data Science Jobs. It requires Python 3.8-3.10.
```shell
python3 -m pip install "oracle-ads[opctl]"
```

--------------------------------

### Install ADS with Viz Module

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Installs the oracle-ads package with the 'viz' extra libraries for data visualization tasks. This module includes popular plotting libraries like Bokeh and Seaborn.

```bash
$ python3 -m pip install "oracle-ads[viz]"
```

--------------------------------

### Install ADS with Boosted Module

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Installs the oracle-ads package with the 'boosted' extra libraries. This module is necessary for working with gradient boosting models, including XGBoost and LightGBM.

```bash
$ python3 -m pip install "oracle-ads[boosted]"
```

--------------------------------

### Install ADS Pipeline Extension (Magic Command)

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/pipeline/quick_start.rst

Loads the ADS pipeline extension into the current notebook kernel, enabling the use of ADS Magic Commands within a notebook environment. This is a prerequisite for using other pipeline magic commands.

```python
%load_ext ads.pipeline.extension
```

--------------------------------

### Install ADS with Spark Module

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Installs the oracle-ads package with the 'spark' extra libraries for Apache Spark tasks. This enables distributed data processing and analytics within ADS.

```bash
$ python3 -m pip install "oracle-ads[spark]"
```

--------------------------------

### Install ADS with Data Module

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Installs the oracle-ads package with the 'data' extra libraries for working with various data formats like Excel and Avro.
It includes libraries for file parsing and data handling.

```bash
$ python3 -m pip install "oracle-ads[data]"
```

--------------------------------

### Install ADS with Notebook Module

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Installs the oracle-ads package with the 'notebook' extra libraries for use within Oracle Cloud Infrastructure Data Science Notebook Sessions. It includes essential Jupyter and IPython libraries.

```bash
$ python3 -m pip install "oracle-ads[notebook]"
```

--------------------------------

### Run PII Operator Command

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/operators/pii_operator/getting_started.rst

This command executes the PII operator using a specified configuration file. It takes the path to the YAML configuration file as an argument. Ensure the `pii.yaml` file is correctly formatted and contains all necessary parameters before running.

```bash
ads operator run -f pii.yaml
```

--------------------------------

### Configure Recommender Operator YAML

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/operators/recommender_operator/quickstart.rst

Example configuration for the Recommender Operator, specifying input data files (users, items, interactions) and interaction parameters like top_k and column names. This YAML file is crucial for defining the recommender job's behavior.

```yaml
kind: operator
type: recommendation
version: v1
spec:
  user_data:
    url: users.csv
  item_data:
    url: items.csv
  interactions_data:
    url: interactions.csv
  top_k: 4
  user_column: user_id
  item_column: movie_id
  interaction_column: rating
```

--------------------------------

### Install Miniconda for Linux

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/README.md

Downloads and installs the latest Miniconda installer for Linux systems.
This script fetches the installer using curl and then executes it using bash. It's a prerequisite for setting up the conda environment.

```bash
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```

--------------------------------

### Run Default ADS SDK Unit Tests (Bash)

Source: https://github.com/oracle/accelerated-data-science/blob/main/README-development.md

Installs test dependencies from 'test-requirements.txt' and then executes the default setup unit tests for the ADS SDK using pytest. These tests verify core functionality without optional dependencies.

```bash
pip install -r test-requirements.txt
python3 -m pytest tests/unitary/default_setup
```

--------------------------------

### Install ADS SDK with PyTorch Libraries

Source: https://github.com/oracle/accelerated-data-science/blob/main/README.md

Installs the 'torch' module for the ADS SDK, which includes PyTorch and libraries from the 'viz' module for visualization. Installation is handled by pip.

```bash
python3 -m pip install 'oracle-ads[torch]'
```

--------------------------------

### Initialize ADS SDK and Set Debug Mode

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/quickstart/quickstart.rst

Imports the ADS SDK and provides a function to toggle debug mode for enhanced logging. Debug mode is useful for troubleshooting.

```python
import ads

# Turn debug mode on or off with:
ads.set_debug_mode(True)
```

--------------------------------

### Starting the AI Quick Actions API Server

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/aqua/apiserver.rst

This command demonstrates how to start the AI Quick Actions API server using Python. Ensure you are in the same directory as your .env file for the configuration to be applied.
```shell
python -m ads.aqua.server
```

--------------------------------

### Install ADS SDK with Visualization Libraries

Source: https://github.com/oracle/accelerated-data-science/blob/main/README.md

Installs the 'viz' module for the ADS SDK, which provides libraries for various visualization tasks. Key packages include bokeh, folium, and seaborn. Installation is performed using pip.

```bash
python3 -m pip install 'oracle-ads[viz]'
```

--------------------------------

### Train and Prepare LightGBM Model for Deployment

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_registration/frameworks/lightgbmmodel.rst

This example sets up training of a LightGBM model on a synthetic dataset generated with scikit-learn's make_classification. It imports the libraries needed for model training and metadata handling and splits the data into training and test sets. The training step itself is not included in this excerpt.

```python3
from ads.model.framework.lightgbm_model import LightGBMModel
from ads.common.model_metadata import UseCaseType
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import tempfile

seed = 42

# Create a classification dataset
X, y = make_classification(n_samples=10000, n_features=15, n_classes=2, flip_y=0.05)
# An integer test_size holds out that many samples (here 30), not a fraction
trainx, testx, trainy, testy = train_test_split(X, y, test_size=30, random_state=seed)

# Train LGBM model
```

--------------------------------

### Complete ML Workflow Integration Example

Source: https://context7.com/oracle/accelerated-data-science/llms.txt

Demonstrates a full ML workflow starting with authentication setup, followed by loading data from a feature store into a pandas DataFrame. This snippet sets the stage for subsequent model training and deployment steps.
```python
import ads
from ads.jobs import Job, DataScienceJob, PythonRuntime
from ads.pipeline import Pipeline, PipelineStep
from ads.model.framework.sklearn_model import SklearnModel
from ads.feature_store import FeatureStore, Dataset
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import tempfile

# 1. Authentication Setup
ads.set_auth("resource_principal")

# 2. Load data from Feature Store
feature_store = FeatureStore.from_id(
    "ocid1.featurestore.oc1.iad.xxxxx"
)
dataset = Dataset.from_id("ocid1.dataset.oc1.iad.xxxxx")
df = dataset.as_df()
```

--------------------------------

### Configure Application Environment and Probes (YAML)

Source: https://github.com/oracle/accelerated-data-science/blob/main/ads/feature_store/docs/source/user_guides.setup.helm_chart.rst

Sets environment variables for the application container and configures liveness and readiness probes to manage pod health and availability.

```yaml
applicationEnv:
  containerName: #Container name
livenessProbe:
  # Liveness probe details
  initialDelaySeconds:
  periodSeconds:
  timeoutSeconds:
  failureThreshold:
readinessProbe:
  # Readiness probe details
  initialDelaySeconds:
  periodSeconds:
  timeoutSeconds:
  failureThreshold:
```

--------------------------------

### Install ADS with TensorFlow Module

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Installs the oracle-ads package with the 'tensorflow' extra libraries for deep learning tasks. This includes TensorFlow and visualization utilities.

```bash
$ python3 -m pip install "oracle-ads[tensorflow]"
```

--------------------------------

### Fill spaCy Configuration File (Bash)

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/data_labeling/example.rst

This bash command initializes a new Python interpreter to run the spaCy module.
It takes a 'base_config.cfg' file and fills in the default values to create a complete 'config.cfg' file, which contains all hyperparameters and settings required for training.

```bash
!$CONDA_PREFIX/bin/python -m spacy init fill-config ~/base_config.cfg ~/config.cfg
```

--------------------------------

### Install ADS with Geo Module

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Installs the oracle-ads package with the 'geo' extra libraries for geospatial data analysis. This includes geopandas and visualization libraries.

```bash
$ python3 -m pip install "oracle-ads[geo]"
```

--------------------------------

### Install ADS SDK with Optuna Libraries

Source: https://github.com/oracle/accelerated-data-science/blob/main/README.md

Installs the 'optuna' module for the ADS SDK, which is used for hyperparameter optimization tasks. This module includes the optuna library and libraries from the 'viz' module. Installation is handled by pip.

```bash
python3 -m pip install 'oracle-ads[optuna]'
```

--------------------------------

### GenericModel Shortcut: Prepare, Save, Deploy

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_serialization/quick_start.rst

Demonstrates a shortcut for model management using `prepare_save_deploy` method of `GenericModel`. This method combines preparation, saving to the model catalog, and deployment into a single step, followed by prediction and deployment cleanup. This is useful for streamlining common model workflows.

```python
import tempfile
from ads.catalog.model import ModelCatalog
from ads.model.generic_model import GenericModel

class Toy:
    def predict(self, x):
        pass

# Note: The provided snippet is incomplete and requires further context to be fully functional.
```

--------------------------------

### Configure PII Operator with YAML

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/operators/pii_operator/getting_started.rst

This snippet shows the basic structure of a `pii.yaml` file used to configure the PII operator. It specifies input data path, target column, output directory, and detector configurations. Additional parameters like `show_sensitive_content` and `action` can be included for more advanced control.

```yaml
kind: operator
type: pii
version: v1
spec:
  input_data:
    url: mydata.csv
  target_column: target
  output_directory:
    url: result/
  detectors:
    - name: default.phone
      action: mask
```

--------------------------------

### Prepare Model Artifacts for LightGBM Deployment

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_registration/frameworks/lightgbmmodel.rst

This code prepares the necessary artifacts for deploying a trained LightGBM model using the `LightGBMModel` class. It requires the trained estimator, an artifact directory, sample data for inference, and the use case type. This step is crucial for packaging the model for production.

```python
from ads.common.model_metadata import UseCaseType
from ads.model.framework.lightgbm_model import LightGBMModel
import tempfile

# `model`, `trainx`, and `trainy` come from the training step
artifact_dir = tempfile.mkdtemp()
lightgbm_model = LightGBMModel(estimator=model, artifact_dir=artifact_dir)
lightgbm_model.prepare(
    inference_conda_env="generalml_p38_cpu_v1",
    training_conda_env="generalml_p38_cpu_v1",
    X_sample=trainx,
    y_sample=trainy,
    use_case_type=UseCaseType.BINARY_CLASSIFICATION,
)
```

--------------------------------

### Building and Running AI Quick Actions Docker Image

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/aqua/apiserver.rst

These commands show how to build a Docker image for the AI Quick Actions API server and then run it.
The 'docker run' command mounts the .env file and OCI configuration directory into the container and maps the API port.

```shell
TAG=1.0
docker build -t aqua:$TAG .
KEY_PATH=$HOME/.oci
docker run --rm -it --env-file .env \
  -v ~/.oci:/root/.oci \
  -v $KEY_PATH:$KEY_PATH \
  -p 8080:8080 aqua:$TAG
```

--------------------------------

### Install ADS SDK with ONNX Libraries

Source: https://github.com/oracle/accelerated-data-science/blob/main/README.md

Installs the 'onnx' module for the ADS SDK, providing support for ONNX-compatible runtimes and libraries for performance optimization and model portability. It includes onnx, onnxruntime, onnxmltools, skl2onnx, xgboost, lightgbm, and viz module libraries. Installation is done via pip.

```bash
python3 -m pip install 'oracle-ads[onnx]'
```

--------------------------------

### Deploy LightGBM Model with ADS

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_registration/quick_start.rst

This example illustrates how to use ADS to prepare, verify, and register a LightGBM model. It automates the creation of deployment artifacts and requires libraries such as ads, lightgbm, and sklearn.
```python
import tempfile

import ads
import lightgbm as lgb
from ads.model.framework.lightgbm_model import LightGBMModel
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

ads.set_auth(auth="resource_principal")

# Load dataset and prepare train and test split
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

# Train a LightGBM classifier model
train = lgb.Dataset(X_train, label=y_train)
param = {
    'objective': 'multiclass',
    'num_class': 3,
}
lightgbm_estimator = lgb.train(param, train)

# Instantiate ads.model.framework.lightgbm_model.LightGBMModel using the trained LGBM model
lightgbm_model = LightGBMModel(estimator=lightgbm_estimator, artifact_dir=tempfile.mkdtemp())

# Autogenerate score.py, serialized model, runtime.yaml, input_schema.json and output_schema.json
lightgbm_model.prepare(
    inference_conda_env="generalml_p38_cpu_v1",
    X_sample=X_train,
    y_sample=y_train,
)

# Verify generated artifacts
lightgbm_model.verify(X_test)

# Register LightGBM model
model_id = lightgbm_model.save(display_name="LightGBM Model")
```

--------------------------------

### Install ADS SDK with Text Processing Libraries

Source: https://github.com/oracle/accelerated-data-science/blob/main/README.md

Installs the 'text' module for the ADS SDK, providing libraries for text-related tasks, including wordcloud and spacy. Installation is done via pip.

```bash
python3 -m pip install 'oracle-ads[text]'
```

--------------------------------

### Install PyTorch Tensorboard Profiler Plugin

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_training/distributed_training/pytorch/creating.rst

Install the PyTorch TensorBoard plugin needed to visualize profiling data collected by the PyTorch Profiler. Run this command before viewing the logs in TensorBoard.
```bash
pip install torch-tb-profiler
```

--------------------------------

### Install ADS SDK with TensorFlow Libraries

Source: https://github.com/oracle/accelerated-data-science/blob/main/README.md

Installs the 'tensorflow' module for the ADS SDK, which includes TensorFlow and libraries from the 'viz' module for visualization. Installation is performed using pip.

```bash
python3 -m pip install 'oracle-ads[tensorflow]'
```

--------------------------------

### Download TensorFlow TensorBoard Notebook

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_training/tensorboard/tensorboard.rst

This command uses `wget` to download a Jupyter notebook that demonstrates running a TensorFlow experiment and setting up TensorBoard logs. The notebook is hosted on GitHub.

```shell
!wget https://raw.githubusercontent.com/mayoor/stats-ml-exps/master/tensorboard_tf.ipynb
```

--------------------------------

### Install PySpark Conda Environment using ADS CLI

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/apachespark/setup-installation.rst

Installs a PySpark conda environment from a published OCI URI. This command requires the OCI URI of the conda pack to be installed. Ensure the conda pack is published to your object storage.

```shell
ads conda install oci://mybucket@mynamespace/path/to/pyspark/env
```

--------------------------------

### PII Operator Configuration Example (YAML)

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/operators/pii_operator/pii.rst

An example pii.yaml file demonstrating all available parameters for configuring the PII operator. This includes settings for input data, output directory, reporting, and detectors.
```yaml
kind: operator
type: pii
version: v1
spec:
  output_directory:
    url: oci://my-bucket@my-tenancy/results
    name: mydata-out.csv
  report:
    report_filename: report.html
    show_rows: 10
    show_sensitive_content: true
  input_data:
    url: oci://my-bucket@my-tenancy/mydata.csv
  target_column: target
  detectors:
    - name: default.phone
      action: anonymize
```

--------------------------------

### Install ADS with BDS Module

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Installs the oracle-ads package with the 'bds' extra libraries for Oracle Big Data Service (BDS) use cases. This includes libraries for interacting with Impala, HDFS, and SQL.

```bash
$ python3 -m pip install "oracle-ads[bds]"
```

--------------------------------

### Install PySpark Conda Environment (Shell)

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/apachespark/setup-installation.rst

Installs a specified PySpark conda environment using the `odsc` command-line tool. This is a prerequisite for setting up the PySpark environment in an OCI Notebook Session.

```shell
odsc conda install -s pyspark30_p37_cpu_v5
```

--------------------------------

### Prepare and Deploy HuggingFace Pipeline Model

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_registration/quick_start.rst

This snippet demonstrates preparing a HuggingFace pipeline model by specifying inference environment and Python version, converting data to bytes, verifying the model, and deploying it. It also shows how to invoke the deployed model using both raw data and byte streams. Dependencies include 'requests', 'ads', and 'cloudpickle'.

```python
import cloudpickle
import requests
# ads.common.auth.default_signer returns a dict that contains the "signer" used below
from ads.common.auth import default_signer

# `zero_shot_image_classification_model` and `image` are defined earlier in the guide
conda_pack_path = "oci://bucket@namespace/path/to/conda/pack"
python_version = "3.8"  # Remember to update 3.x with your actual python version, e.g. 3.8
zero_shot_image_classification_model.prepare(
    inference_conda_env=conda_pack_path,
    inference_python_version=python_version,
    force_overwrite=True,
)

## Convert payload to bytes
data = {"images": image, "candidate_labels": ["animals", "humans", "landscape"]}
body = cloudpickle.dumps(data)  # convert image to bytes

# Verify generated artifacts
zero_shot_image_classification_model.verify(data=data)
zero_shot_image_classification_model.verify(data=body)

# Register HuggingFace Pipeline model
zero_shot_image_classification_model.save()

## Deploy
log_group_id = ""
log_id = ""
zero_shot_image_classification_model.deploy(
    deployment_bandwidth_mbps=100,
    wait_for_completion=False,
    deployment_log_group_id=log_group_id,
    deployment_access_log_id=log_id,
    deployment_predict_log_id=log_id,
)
zero_shot_image_classification_model.predict(data)
zero_shot_image_classification_model.predict(body)

### Invoke the model by sending bytes
auth = default_signer()['signer']
endpoint = zero_shot_image_classification_model.model_deployment.url + "/predict"
headers = {"Content-Type": "application/octet-stream"}
requests.post(endpoint, data=body, auth=auth, headers=headers).json()
```

--------------------------------

### Launch TensorBoard with Object Storage Logs (PyTorch Example)

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_training/tensorboard/tensorboard.rst

This command launches a TensorBoard session on a local workstation to view logs written to OCI object storage from a PyTorch experiment. It uses `OCIFS_IAM_TYPE=api_key` and specifies the log directory using an OCI URL.
```shell
OCIFS_IAM_TYPE=api_key tensorboard --logdir "oci://my-bucket@my-namespace/path/to/logs"
```

--------------------------------

### Initialize Recommender Operator Configuration

Source: https://github.com/oracle/accelerated-data-science/blob/main/ads/opctl/operator/lowcode/recommender/README.md

Generates starter configuration files for the Recommender Operator. This command creates a set of YAML files in the specified output directory, which can then be customized for different deployment environments.

```bash
ads operator init -t recommender --overwrite --output ~/recommender/
```

--------------------------------

### Run Anomaly Detection Job

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/operators/anomaly_detection_operator/quickstart.rst

Executes the anomaly detection job locally using the specified 'anomaly.yaml' configuration file. This command starts the analysis process.

```bash
ads operator run -f anomaly.yaml
```

--------------------------------

### Create DocBin and Save to Disk (Python)

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/data_labeling/example.rst

This Python snippet demonstrates how to initialize a spaCy DocBin object, process text data with annotations, and save the resulting DocBin object to a file. It requires the spaCy library and iterates through training data to create spaCy Doc objects, adding them to the DocBin before disk serialization.
```python3
import os

import spacy
from spacy.tokens import DocBin
from tqdm import tqdm

# `train_data` is assumed to be a list of (text, {"entities": [(start, end, label), ...]})
# pairs built in an earlier step of the guide.
nlp = spacy.blank("en")  # load a new spacy model
db = DocBin()  # create a DocBin object
for text, annot in tqdm(train_data):  # data in previous format
    doc = nlp.make_doc(text)  # create doc object from text
    ents = []
    for start, end, label in annot["entities"]:
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is not None:  # skip spans that do not align to token boundaries
            ents.append(span)
    doc.ents = ents  # label the text with the ents
    db.add(doc)

db.to_disk(os.path.join(os.path.expanduser("~"), "train.spacy"))  # save the docbin object
```

--------------------------------

### Install Conda Pack from Object Storage URI using ads opctl

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/opctl/localdev/condapack.rst

Installs a conda pack using its Object Storage URI. The installed pack can then be used within a Docker image, for example, in Visual Studio Code for local testing before deployment to OCI services.

```shell
ads opctl conda install -u "oci://mybucket@namespace/conda_environment/path/to/my/conda"
```

--------------------------------

### Full Local Container Configuration Example

Source: https://github.com/oracle/accelerated-data-science/blob/main/ads/opctl/operator/lowcode/recommender/README.md

A complete YAML configuration file for running the recommender operator in a local container. It specifies the image to use and the necessary volume mounts for OCI configuration, input data, and output directories.
```yaml
kind: operator.local
spec:
  image: recommender:v1
  volume:
  - /Users//.oci:/root/.oci
  - /Users//recommender/data:/etc/operator/data
  - /Users//recommender/result:/etc/operator/result
type: container
version: v1
```

--------------------------------

### Launch TensorBoard Session with Object Storage Logs

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_training/tensorboard/tensorboard.rst

This command launches a TensorBoard session on a local workstation, pointing to logs stored in an Oracle Cloud Infrastructure (OCI) Object Storage bucket. It requires setting the OCIFS_IAM_TYPE environment variable and specifies the log directory as an `oci://` URI.

```shell
export OCIFS_IAM_TYPE=api_key # If you are using resource principal, set resource_principal
tensorboard --logdir oci://my-bucket@my-namespace/path/to/logs
```

--------------------------------

### Publish Conda Pack using ADS CLI

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/apachespark/setup-installation.rst

Publishes a created conda pack, making it available for installation via OCI URIs. This command is essential for sharing or reusing custom environments.

```shell
ads publish -s pysparkenv
```

--------------------------------

### LightGBMModel: Create, Deploy, Predict, and Delete Model

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_serialization/quick_start.rst

This snippet shows the complete workflow for a LightGBM model using ADS. It covers training a LightGBM model, preparing it for deployment with ADS, verifying its predictions, saving it, deploying it, making predictions on test data, and finally removing the deployment and model artifacts. Dependencies include lightgbm, scikit-learn, and ADS.
```python3
import lightgbm as lgb
import tempfile
from ads.catalog.model import ModelCatalog
from ads.model.framework.lightgbm_model import LightGBMModel
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
train = lgb.Dataset(X_train, label=y_train)
param = {
    'objective': 'multiclass',
    'num_class': 3,
}
lightgbm_estimator = lgb.train(param, train)

lightgbm_model = LightGBMModel(estimator=lightgbm_estimator, artifact_dir=tempfile.mkdtemp())
lightgbm_model.prepare(inference_conda_env="generalml_p37_cpu_v1")
lightgbm_model.verify(X_test)
model_id = lightgbm_model.save()
model_deployment = lightgbm_model.deploy()
lightgbm_model.predict(X_test)
lightgbm_model.delete_deployment(wait_for_completion=True)
lightgbm_model.delete()
```

--------------------------------

### Install Model Introspection Test Dependencies

Source: https://github.com/oracle/accelerated-data-science/blob/main/ads/model/model_artifact_boilerplate/README.md

Installs necessary Python libraries (`pyyaml`, `requests`) for running model artifact introspection tests. This is a one-time setup operation required before executing the validation scripts.

```bash
python3 -m pip install --user -r artifact_introspection_test/requirements.txt
```

--------------------------------

### Prepare and Register GenericModel within Model Version Set Context

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_registration/quick_start.rst

This snippet demonstrates preparing and registering a custom `GenericModel` within a model version set context. It includes defining a custom model, instantiating `GenericModel`, preparing artifacts, verifying the model, and saving it to a specified or newly created model version set. Dependencies include 'ads' and 'tempfile'.
```python
import tempfile

import ads
from ads.model.generic_model import GenericModel

# Create custom framework model
class Toy:
    def predict(self, x):
        return x ** 2

model = Toy()

# Instantiate ads.model.generic_model.GenericModel using the trained Custom Model
generic_model = GenericModel(estimator=model, artifact_dir=tempfile.mkdtemp())
generic_model.summary_status()

# Within the context manager, you can save the model without specifying the
# ``model_version_set`` parameter because it's taken from the model context manager.
# If the model version set doesn't exist in the model catalog, the example creates a
# model version set named ``my_model_version_set``. If the model version set exists
# in the model catalog, the models are saved to that model version set.
with ads.model.experiment(name="my_model_version_set", create_if_not_exists=True):

    # Autogenerate score.py, pickled model, runtime.yaml, input_schema.json and output_schema.json
    generic_model.prepare(
        inference_conda_env="dbexp_p38_cpu_v1",
        model_file_name="toy_model.pkl",
        force_overwrite=True
    )

    # Check if the artifacts are generated correctly.
    # The verify method invokes the ``predict`` function defined inside ``score.py`` in the artifact_dir
    generic_model.verify([2])

    # Register the model
    model_id = generic_model.save(display_name="Custom Framework Model")
```

--------------------------------

### Initialize Oracle ADS SDK

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/index.rst

This snippet demonstrates the basic initialization of the Oracle Accelerated Data Science (ADS) SDK by importing the library and calling the 'hello()' function. This is typically the first step to ensure the SDK is correctly installed and accessible.
```python
import ads
ads.hello()
```

--------------------------------

### Install and activate forecast Conda environment in notebook

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/operators/forecast_operator/quickstart.rst

These commands are used within an Oracle Data Science (ODSC) notebook session to install a specific forecasting Conda environment and then activate it for use. This ensures the correct packages and dependencies are available.

```bash
odsc conda install -s forecast_v3
conda activate /home/datascience/forecast_v3
```

--------------------------------

### Complete PyTorch MNIST Elastic Example (train.py)

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_training/distributed_training/horovod/coding.rst

This is a comprehensive PyTorch script for distributed MNIST training using Horovod with elastic capabilities. It includes argument parsing, Horovod initialization, device configuration, optimizer setup, and data loading, inspired by Horovod's examples.
```python
# Script adapted from https://github.com/horovod/horovod/blob/master/examples/elastic/pytorch/pytorch_mnist_elastic.py
# ==============================================================================
import argparse
import os
from filelock import FileLock

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import torch.utils.data.distributed
import horovod.torch as hvd
from torch.utils.tensorboard import SummaryWriter

# Training settings
parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                    help='input batch size for training (default: 64)')
parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                    help='input batch size for testing (default: 1000)')
```

--------------------------------

### Get Warnings for a Specific Custom Feature Type Object

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/feature_type/ads_feature_type_warnings.rst

This Python code example shows how to obtain a list of warnings associated with a specific custom feature type. It first gets a handle to the feature type object using `feature_type_manager.feature_type_object()` and then retrieves the associated warnings as a Pandas DataFrame.

```python
from ads.feature_engineering import feature_type_manager, Tag

# Assuming a custom feature type named 'credit_card' is registered
credit_card_feature_type = feature_type_manager.feature_type_object('credit_card')
credit_card_feature_type.warning_registered()
```

--------------------------------

### Manage Hugging Face Pipelines with ADS

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_registration/quick_start.rst

This example demonstrates how to use a pre-trained Hugging Face pipeline for zero-shot image classification.
It shows downloading an image, initializing the pipeline, performing classification, and then wrapping the pipeline using ADS's HuggingFacePipelineModel.

```python3
from transformers import pipeline
from ads.model import HuggingFacePipelineModel

import tempfile
import PIL.Image
from ads.common.auth import default_signer
import requests
import cloudpickle

## download the image
image_url = "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png"
image = PIL.Image.open(requests.get(image_url, stream=True).raw)

## download the pretrained model
classifier = pipeline(model="openai/clip-vit-large-patch14")
classifier(
    images=image,
    candidate_labels=["animals", "humans", "landscape"],
)

## Initiate a HuggingFacePipelineModel instance
zero_shot_image_classification_model = HuggingFacePipelineModel(classifier, artifact_dir=tempfile.mkdtemp())
```

--------------------------------

### Upgrade oracle-ads SDK

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/cli/quickstart.rst

Upgrades the existing oracle-ads Python package to the latest version. This command is typically run within a data science conda environment.

```bash
$ python3 -m pip install oracle-ads --upgrade
```

--------------------------------

### Dockerfile for AI Quick Actions API Server

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/aqua/apiserver.rst

This Dockerfile defines the environment for building a Docker image of the AI Quick Actions API server. It installs the necessary oracle-ads package with aqua support and sets the command to run the server.
```docker
FROM ghcr.io/oracle/oraclelinux8-python:3.11-oracledb
RUN pip3 install "oracle-ads[aqua]"
CMD ["python3.11","-m","ads.aqua.server"]
```

--------------------------------

### Install fspyspark32 Plugin via ODSC

Source: https://github.com/oracle/accelerated-data-science/blob/main/ads/feature_store/docs/source/quickstart.rst

This command installs the `fspyspark32_p38_cpu_v3` plugin using the Oracle Data Science Cloud (ODSC) command-line interface within a notebook session's terminal. This plugin is essential for using Spark 3.2 with Python 3.8 CPU environments for data science tasks.

```shell
odsc conda install -s fspyspark32_p38_cpu_v3
```

--------------------------------

### Initialize a new forecast project using ADS CLI

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/operators/forecast_operator/quickstart.rst

Initializes a new forecast project using the Accelerated Data Science (ADS) Command Line Interface (CLI). This command creates a directory containing necessary configuration files for running forecasts on OCI Data Science Jobs.

```bash
ads operator init -t forecast --output my-forecast
```

--------------------------------

### Initialize Oracle AutoMLx Engine

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_training/automl/quick_start.rst

Initializes the Oracle AutoMLx engine. It supports 'dask' (default) or 'local' parallel engines. The example shows how to set the engine to 'local'.
```python3
import automl
from automl import init

init(engine='local')
# Example log output:
# [2023-01-12 05:48:31,814] [automl.xengine] Local ProcessPool execution (n_jobs=36)
```

--------------------------------

### Prepare and Deploy AutoML Model using GenericModel in Python

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_registration/frameworks/automlmodel.rst

This Python code demonstrates how to train an AutoML model using the `automl` library and then prepare it for deployment using the `GenericModel` class from ADS. It includes data loading, preprocessing, model training, and model preparation for deployment, specifying conda environments, use case type, and sample data for schema generation. The `prepare` method handles setting up the model artifact directory.

```python3
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

import ads
import automl
from automl import init
from ads.model import GenericModel
from ads.common.model_metadata import UseCaseType

dataset = fetch_openml(name='adult', as_frame=True)
df, y = dataset.data, dataset.target

# Several of the columns are incorrectly labeled as category type in the original dataset
numeric_columns = ['age', 'capitalgain', 'capitalloss', 'hoursperweek']
for col in df.columns:
    if col in numeric_columns:
        df[col] = df[col].astype(int)

X_train, X_test, y_train, y_test = train_test_split(df,
                                                    y.map({'>50K': 1, '<=50K': 0}).astype(int),
                                                    train_size=0.7,
                                                    random_state=0)

init(engine='local')
est = automl.Pipeline(task='classification')
est.fit(X_train, y_train)

ads.set_auth("resource_principal")
automl_model = GenericModel(estimator=est, artifact_dir="automl_model_artifact")
automl_model.prepare(inference_conda_env="automlx_p38_cpu_v1",
                     training_conda_env="automlx_p38_cpu_v1",
                     use_case_type=UseCaseType.BINARY_CLASSIFICATION,
                     X_sample=X_test,
                     force_overwrite=True)
```

--------------------------------

### View Recommender Results
Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/operators/recommender_operator/quickstart.rst

Commands to view the generated recommendations and performance report. Recommendations are typically saved in a CSV file, and performance is summarized in an HTML report.

```bash
vi results/recommendations.csv
open results/report.html
```

--------------------------------

### Initialize Anomaly Detection Job

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/operators/anomaly_detection_operator/quickstart.rst

Initializes a new anomaly detection job using the `ads` CLI. This creates a directory structure and configuration files for your job.

```bash
ads operator init -t anomaly
```

--------------------------------

### Get ADS SDK Version in Python

Source: https://github.com/oracle/accelerated-data-science/blob/main/tests/integration/fixtures/job_archive/test_notebook.ipynb

Retrieves and displays the installed version of the Oracle Accelerated Data Science (ADS) SDK. This is useful for ensuring compatibility and tracking the SDK's evolution.

```python
import ads
ads.__version__
```

--------------------------------

### Create Conda Pack using ADS CLI

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/apachespark/setup-installation.rst

Creates a conda pack based on a provided YAML configuration file. This command utilizes the specified YAML file to build the environment.

```shell
ads create -f pyspark.yaml
```

--------------------------------

### Activate Conda Environment and Upgrade Oracle-ADS (Shell)

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/apachespark/setup-installation.rst

Activates a specific conda environment and upgrades the `oracle-ads` library with optional components. This ensures you have the latest features and compatibility for your data science tasks.
```shell
conda activate /home/datascience/conda/pyspark30_p37_cpu_v5
pip install "oracle-ads[data_science, data, opctl]" --upgrade
```

--------------------------------

### Model Preparation, Saving, and Deployment

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_registration/_template/prepare_save_deploy.rst

The `.prepare_save_deploy()` method is a convenient shortcut for `.prepare()`, `.save()`, and `.deploy()`. It returns a `ModelDeployment` object and supports all frameworks. This method accepts various parameters to customize the deployment process, including environment configuration, model details, and deployment instance specifications.

```APIDOC
## POST /oracle/accelerated-data-science/prepare_save_deploy

### Description
Prepares, saves, and deploys a model. This method is a shortcut for `.prepare()`, `.save()`, and `.deploy()`. It returns a `ModelDeployment` object and is available for all frameworks.

### Method
POST

### Endpoint
`/oracle/accelerated-data-science/prepare_save_deploy`

### Parameters
#### Query Parameters
- **inference_conda_env** (str, optional) - The conda pack slug or object storage path for inference. Service pack slugs can only be used if the conda pack is a service pack.
- **inference_python_version** (str, optional) - The Python version for the deployment.
- **training_conda_env** (str, optional) - The conda pack slug or object storage path for training. Service pack slugs can only be used if the conda pack is a service pack.
- **training_python_version** (str, optional) - The Python version for training.
- **model_file_name** (str) - The name of the serialized model.
- **as_onnx** (bool, optional) - Whether to serialize the model as ONNX. Defaults to False.
- **initial_types** (list[Tuple], optional) - For SklearnModel, LightGBMModel, and XGBoostModel. A list of tuples, where each tuple contains a variable name and its type.
  See http://onnx.ai/sklearn-onnx/api_summary.html#id2 for details.
- **force_overwrite** (bool, optional) - Whether to overwrite existing files. Defaults to False.
- **namespace** (str, optional) - The namespace of the region. Used when providing slugs for conda environments.
- **use_case_type** (str) - The use case type of the model. Can be provided as a string or using the `UseCaseType` class.
- **X_sample** (Union[list, tuple, pd.Series, np.ndarray, pd.DataFrame], optional) - A sample of input data to generate the input schema.
- **y_sample** (Union[list, tuple, pd.Series, np.ndarray, pd.DataFrame], optional) - A sample of output data to generate the output schema.
- **training_script_path** (str, optional) - Path to the training script. Defaults to None.
- **training_id** (str, optional) - The OCID of the training job or notebook session. Defaults to environment variable value.
- **ignore_pending_changes** (bool, optional) - Whether to ignore pending Git changes. Defaults to False.
- **max_col_num** (int, optional) - Maximum number of columns for input schema generation. Defaults to `utils.DATA_SCHEMA_MAX_COL_NUM`.
- **model_display_name** (str, optional) - The display name for the model.
- **model_description** (str, optional) - The description for the model.
- **model_freeform_tags** (Dict(str, str), optional) - Freeform tags for the model.
- **model_defined_tags** (Dict(str, dict(str, object)), optional) - Defined tags for the model.
- **ignore_introspection** (bool, optional) - Whether to ignore model introspection results. If True, save ignores all introspection errors.
- **wait_for_completion** (bool, optional) - Whether to wait for deployment to complete. Defaults to True.
- **display_name** (str, optional) - The display name for the model (alias for `model_display_name`).
- **description** (str, optional) - The description for the model (alias for `model_description`).
- **deployment_instance_shape** (str, optional) - The shape of the instance for deployment.
- **deployment_instance_count** (int, optional) - The number of instances for deployment. Defaults to 1.
- **deployment_bandwidth_mbps** (int, optional) - Bandwidth limit on the load balancer in Mbps. Defaults to 10.
- **deployment_log_group_id** (str, optional) - The OCID of the logging group for access and predict logs.
- **deployment_access_log_id** (str, optional) - The OCID of the access log. See https://docs.oracle.com/iaas/data-science/using/model_dep_using_logging.htm.
- **deployment_predict_log_id** (str, optional) - The OCID of the predict log. See https://docs.oracle.com/iaas/data-science/using/model_dep_using_logging.htm.
- **deployment_memory_in_gbs** (float, optional) - Memory size in GBs for flexible shape instances. Defaults to None.
- **deployment_ocpus** (float, optional) - Number of OCPUs for deployment instances. Defaults to None.

### Request Example
```json
{
  "inference_conda_env": "oci://bucket/path/to/env",
  "inference_python_version": "3.8",
  "model_file_name": "my_model.pkl",
  "use_case_type": "classification",
  "deployment_instance_shape": "VM.Standard.E4.Flex",
  "deployment_ocpus": 1,
  "deployment_memory_in_gbs": 16
}
```

### Response
#### Success Response (200)
- **ModelDeployment** (object) - An object representing the deployed model.

#### Response Example
```json
{
  "id": "ocid1.modeldeployment.oc1.iad.exampleuniqueID",
  "displayName": "My Model Deployment",
  "lifecycleState": "ACTIVE",
  "timeCreated": "2023-10-27T10:00:00Z"
}
```
```

--------------------------------

### View Anomaly Detection Results

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/operators/anomaly_detection_operator/quickstart.rst

Opens the performance summary report ('report.html') for the anomaly detection job. Results are typically placed in a 'results' folder if not otherwise specified in the YAML configuration.
```bash
open results/report.html
```

--------------------------------

### Manage Spark Pipeline Models with ADS

Source: https://github.com/oracle/accelerated-data-science/blob/main/docs/source/user_guide/model_registration/quick_start.rst

Demonstrates the end-to-end process of training a Spark ML Pipeline, wrapping it with ADS's SparkPipelineModel, preparing it for inference, verifying predictions, and registering the model. It includes sample data creation and pipeline definition.

```python3
import os
import tempfile

import ads
from ads.model.framework.spark_model import SparkPipelineModel
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import HashingTF, Tokenizer
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()

# create data
training = spark.createDataFrame(
    [
        (0, "a b c d e spark", 1.0),
        (1, "b d", 0.0),
        (2, "spark f g h", 1.0),
        (3, "hadoop mapreduce", 0.0),
    ],
    ["id", "text", "label"],
)
test = spark.createDataFrame(
    [
        (4, "spark i j k"),
        (5, "l m n"),
        (6, "spark hadoop spark"),
        (7, "apache hadoop"),
    ],
    ["id", "text"],
)

# Train a Spark Pipeline model
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression(maxIter=10, regParam=0.001)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
model = pipeline.fit(training)

# Instantiate ads.model.framework.spark_model.SparkPipelineModel using the pre-trained Spark Pipeline Model
spark_model = SparkPipelineModel(estimator=model, artifact_dir=tempfile.mkdtemp())
spark_model.prepare(inference_conda_env="pyspark32_p38_cpu_v2",
                    X_sample=training,
                    force_overwrite=True)

# Verify generated artifacts
prediction = spark_model.verify(test)

# Register Spark model
spark_model.save(display_name="Spark Pipeline Model")
```
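
To make the feature stages of the pipeline above more concrete, here is a minimal pure-Python sketch of what `Tokenizer` and `HashingTF` compute conceptually: lowercase and split the text, then count tokens per hashed bucket. It is an illustration only and does not reproduce Spark's output. The helper names `tokenize` and `hashing_tf`, the use of `zlib.crc32`, and the 16-bucket feature space are assumptions chosen for readability; Spark uses its own hash function and a far larger default feature space.

```python
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (hypothetical stand-in for Tokenizer)."""
    return text.lower().split()


def hashing_tf(tokens, num_features=16):
    """Count tokens per hashed bucket and return a sparse {bucket_index: count} dict.

    zlib.crc32 and num_features=16 are illustrative assumptions, not Spark's settings.
    """
    counts = Counter()
    for token in tokens:
        # Hash the token to a fixed-size index space; distinct tokens may collide.
        bucket = zlib.crc32(token.encode("utf-8")) % num_features
        counts[bucket] += 1
    return dict(counts)


# One of the test sentences from the example above.
features = hashing_tf(tokenize("spark hadoop spark"))
print(features)
```

Because the bucket index depends only on the token, no vocabulary has to be fitted or stored, which is why the hashing stage needs no training step of its own before `pipeline.fit(training)`; the trade-off is that unrelated tokens can share a bucket.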