# ML.NET

ML.NET is Microsoft's cross-platform, open-source machine learning framework for .NET developers. It enables building, training, deploying, and consuming custom machine learning models in .NET applications without requiring prior expertise in data science or experience with Python/R. The framework supports data loading from files and databases, data transformations, and includes numerous ML algorithms for scenarios such as classification, forecasting, anomaly detection, and recommendation.

ML.NET's architecture centers on the `MLContext` class, which serves as the entry point for all operations, including data loading, pipeline construction, model training, and prediction. The framework uses a pipeline-based approach in which data transformations and trainers are chained together as estimators, fitted to produce transformers, and finally used for predictions. It also supports ONNX and TensorFlow model integration for extended deep learning capabilities.

## Installation

Add the ML.NET NuGet package to your .NET project.

```bash
# Via .NET CLI
dotnet add package Microsoft.ML

# Via NuGet Package Manager
Install-Package Microsoft.ML
```

## MLContext - Entry Point for All Operations

The MLContext is the starting point for all ML.NET operations, providing mechanisms for logging, exception tracking, and randomness, and serving as a catalog of available operations.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Create MLContext with optional seed for reproducibility
var mlContext = new MLContext(seed: 0);

// MLContext provides access to all ML.NET functionality:
// - mlContext.Data: Data loading and manipulation
// - mlContext.Model: Model saving, loading, and prediction engines
// - mlContext.Transforms: Feature engineering and data transformations
// - mlContext.BinaryClassification: Binary classification trainers and metrics
// - mlContext.MulticlassClassification: Multi-class classification trainers
// - mlContext.Regression: Regression trainers and metrics
// - mlContext.Clustering: Clustering trainers (K-Means)
// - mlContext.Ranking: Ranking trainers
// - mlContext.Recommendation: Recommendation trainers (Matrix Factorization)
// - mlContext.AnomalyDetection: Anomaly detection trainers
// - mlContext.Forecasting: Time series forecasting
```

## Loading Data from Text Files

TextLoader enables loading data from CSV, TSV, and other delimited text files into IDataView objects for ML.NET processing.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

// Method 1: Define columns explicitly with TextLoader
var loader = mlContext.Data.CreateTextLoader(new[]
{
    new TextLoader.Column("Label", DataKind.Boolean, 0),
    new TextLoader.Column("Workclass", DataKind.String, 1),
    new TextLoader.Column("Education", DataKind.String, 2),
    new TextLoader.Column("MaritalStatus", DataKind.String, 3)
}, hasHeader: true, separatorChar: '\t');

var data = loader.Load("data.tsv");

// Method 2: Load using a data class with attributes
public class SentimentData
{
    [LoadColumn(0)] public bool Label { get; set; }
    [LoadColumn(1)] public string SentimentText { get; set; }
}

var dataFromClass = mlContext.Data.LoadFromTextFile<SentimentData>(
    "sentiment.csv", hasHeader: true, separatorChar: ',');

// Load vector columns (multiple features as single vector)
var vectorLoader = mlContext.Data.CreateTextLoader(new[]
{
    // Load columns 0-10 as a single vector
    new TextLoader.Column("Features", DataKind.Single,
        new[] { new TextLoader.Range(0, 10) }),
    new TextLoader.Column("Target", DataKind.Single, 11)
}, separatorChar: ';');
```

## Loading Data from In-Memory Collections

LoadFromEnumerable converts C# collections into IDataView objects, useful for real-time scenarios and programmatic data generation.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

var mlContext = new MLContext();

// Define data class with vector type annotation
public class DataPoint
{
    public bool Label { get; set; }

    [VectorType(50)] // Fixed-size feature vector
    public float[] Features { get; set; }
}

// Generate sample data
var dataPoints = new List<DataPoint>();
var random = new Random(0);
for (int i = 0; i < 1000; i++)
{
    var label = random.NextDouble() > 0.5;
    dataPoints.Add(new DataPoint
    {
        Label = label,
        Features = Enumerable.Repeat(label, 50)
            .Select(x => x ?
                (float)random.NextDouble() : (float)random.NextDouble() + 0.03f)
            .ToArray()
    });
}

// Convert to IDataView
IDataView trainingData = mlContext.Data.LoadFromEnumerable(dataPoints);

// Cache data in memory for iterative algorithms
trainingData = mlContext.Data.Cache(trainingData);

// For dynamic vector sizes (unknown at compile time)
public class DynamicData
{
    public float[] Features { get; set; } // No VectorType attribute
}

int featureDimension = 10; // Known at runtime
var schema = SchemaDefinition.Create(typeof(DynamicData));
var itemType = ((VectorDataViewType)schema[0].ColumnType).ItemType;
schema[0].ColumnType = new VectorDataViewType(itemType, featureDimension);

var dynamicData = mlContext.Data.LoadFromEnumerable(
    new[] { new DynamicData { Features = new float[10] } }, schema);
```

## Binary Classification with SDCA Logistic Regression

Train models to predict between two outcomes using the Stochastic Dual Coordinate Ascent (SDCA) algorithm.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

var mlContext = new MLContext(seed: 0);

// Data classes
public class DataPoint
{
    public bool Label { get; set; }

    [VectorType(50)]
    public float[] Features { get; set; }
}

public class Prediction
{
    public bool Label { get; set; }
    public bool PredictedLabel { get; set; }
    public float Probability { get; set; }
    public float Score { get; set; }
}

// Generate training data
var dataPoints = Enumerable.Range(0, 1000).Select(i =>
{
    var random = new Random(i);
    var label = random.NextDouble() > 0.5;
    return new DataPoint
    {
        Label = label,
        Features = Enumerable.Range(0, 50)
            .Select(_ => label ?
                (float)random.NextDouble() : (float)random.NextDouble() + 0.03f)
            .ToArray()
    };
}).ToList();

var trainingData = mlContext.Data.LoadFromEnumerable(dataPoints);
trainingData = mlContext.Data.Cache(trainingData);

// Define and train the model
var pipeline = mlContext.BinaryClassification.Trainers
    .SdcaLogisticRegression(labelColumnName: "Label", featureColumnName: "Features");

var model = pipeline.Fit(trainingData);

// Make predictions
var testData = mlContext.Data.LoadFromEnumerable(dataPoints.Take(10));
var transformedData = model.Transform(testData);

var predictions = mlContext.Data.CreateEnumerable<Prediction>(
    transformedData, reuseRowObject: false).ToList();

foreach (var p in predictions.Take(5))
    Console.WriteLine($"Label: {p.Label}, Predicted: {p.PredictedLabel}, " +
        $"Probability: {p.Probability:F3}");

// Evaluate model
var metrics = mlContext.BinaryClassification.Evaluate(transformedData);
Console.WriteLine($"Accuracy: {metrics.Accuracy:F2}");
Console.WriteLine($"AUC: {metrics.AreaUnderRocCurve:F2}");
Console.WriteLine($"F1 Score: {metrics.F1Score:F2}");
Console.WriteLine(metrics.ConfusionMatrix.GetFormattedConfusionTable());
```

## Multi-Class Classification with SDCA Maximum Entropy

Classify data into three or more categories using the Maximum Entropy classifier.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

var mlContext = new MLContext(seed: 0);

public class DataPoint
{
    public uint Label { get; set; } // 1, 2, or 3

    [VectorType(20)]
    public float[] Features { get; set; }
}

public class Prediction
{
    public uint Label { get; set; }
    public uint PredictedLabel { get; set; }
    public float[] Score { get; set; }
}

// Generate data with 3 classes
var random = new Random(0);
var dataPoints = Enumerable.Range(0, 1000).Select(_ =>
{
    var label = (uint)random.Next(1, 4);
    return new DataPoint
    {
        Label = label,
        Features = Enumerable.Range(0, 20)
            .Select(__ => (float)(random.NextDouble() - 0.5) + label * 0.2f)
            .ToArray()
    };
}).ToList();

var trainingData = mlContext.Data.LoadFromEnumerable(dataPoints);
trainingData = mlContext.Data.Cache(trainingData);

// Pipeline: convert numeric labels to keys, then train
var pipeline = mlContext.Transforms.Conversion
    .MapValueToKey(nameof(DataPoint.Label))
    .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy());

var model = pipeline.Fit(trainingData);

// Evaluate
var predictions = model.Transform(trainingData);
var metrics = mlContext.MulticlassClassification.Evaluate(predictions);
Console.WriteLine($"Micro Accuracy: {metrics.MicroAccuracy:F2}");
Console.WriteLine($"Macro Accuracy: {metrics.MacroAccuracy:F2}");
Console.WriteLine($"Log Loss: {metrics.LogLoss:F2}");
Console.WriteLine(metrics.ConfusionMatrix.GetFormattedConfusionTable());
```

## Regression with FastTree

Predict continuous numeric values using gradient-boosted decision trees.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

// Requires: dotnet add package Microsoft.ML.FastTree

var mlContext = new MLContext(seed: 0);

public class DataPoint
{
    public float Label { get; set; }

    [VectorType(50)]
    public float[] Features { get; set; }
}

public class Prediction
{
    public float Label { get; set; }
    public float Score { get; set; }
}

// Generate correlated features and labels
var random = new Random(0);
var dataPoints = Enumerable.Range(0, 1000).Select(_ =>
{
    float label = (float)random.NextDouble();
    return new DataPoint
    {
        Label = label,
        Features = Enumerable.Range(0, 50)
            .Select(__ => label + (float)random.NextDouble())
            .ToArray()
    };
}).ToList();

var trainingData = mlContext.Data.LoadFromEnumerable(dataPoints);

// Train FastTree regression model
var pipeline = mlContext.Regression.Trainers.FastTree(
    labelColumnName: nameof(DataPoint.Label),
    featureColumnName: nameof(DataPoint.Features),
    numberOfLeaves: 20,
    numberOfTrees: 100,
    minimumExampleCountPerLeaf: 10,
    learningRate: 0.2);

var model = pipeline.Fit(trainingData);

// Evaluate
var predictions = model.Transform(trainingData);
var metrics = mlContext.Regression.Evaluate(predictions,
    labelColumnName: nameof(DataPoint.Label));
Console.WriteLine($"Mean Absolute Error: {metrics.MeanAbsoluteError:F3}");
Console.WriteLine($"Mean Squared Error: {metrics.MeanSquaredError:F3}");
Console.WriteLine($"Root Mean Squared Error: {metrics.RootMeanSquaredError:F3}");
Console.WriteLine($"R-Squared: {metrics.RSquared:F3}");
```

## Clustering with K-Means

Group data into clusters based on feature similarity.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

// Requires: dotnet add package Microsoft.ML.Mkl.Components

var mlContext = new MLContext(seed: 0);

public class DataPoint
{
    [KeyType(2)]
    public uint Label { get; set; } // For evaluation only

    [VectorType(50)]
    public float[] Features { get; set; }
}

public class Prediction
{
    public uint Label { get; set; }
    public uint PredictedLabel { get; set; }
    public float[] Score { get; set; } // Distances to cluster centroids
}

// Generate two clusters
var random = new Random(0);
var dataPoints = Enumerable.Range(0, 1000).Select(i =>
{
    int cluster = i < 500 ? 0 : 1;
    return new DataPoint
    {
        Label = (uint)cluster,
        Features = Enumerable.Range(0, 50)
            .Select(_ => cluster == 0
                ? (float)random.NextDouble() + 0.1f
                : (float)random.NextDouble() - 0.1f)
            .ToArray()
    };
}).ToList();

var trainingData = mlContext.Data.LoadFromEnumerable(dataPoints);

// Train K-Means with 2 clusters
var pipeline = mlContext.Clustering.Trainers.KMeans(
    featureColumnName: "Features", numberOfClusters: 2);

var model = pipeline.Fit(trainingData);

// Get cluster centroids
var modelParams = model.Model;
VBuffer<float>[] centroids = default;
modelParams.GetClusterCentroids(ref centroids, out int k);
Console.WriteLine($"Number of clusters: {k}");

// Evaluate
var predictions = model.Transform(trainingData);
var metrics = mlContext.Clustering.Evaluate(predictions,
    "Label", "Score", "Features");
Console.WriteLine($"Normalized Mutual Information: {metrics.NormalizedMutualInformation:F2}");
Console.WriteLine($"Average Distance: {metrics.AverageDistance:F2}");
Console.WriteLine($"Davies Bouldin Index: {metrics.DaviesBouldinIndex:F2}");
```

## Recommendation with Matrix Factorization

Build recommendation systems using collaborative filtering.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

// Requires: dotnet add package Microsoft.ML.Recommender

var mlContext = new MLContext(seed: 0);

public class RatingData
{
    [KeyType(100)]
    public uint UserId { get; set; }

    [KeyType(100)]
    public uint MovieId { get; set; }

    public float Rating { get; set; }
}

public class RatingPrediction
{
    public uint UserId { get; set; }
    public uint MovieId { get; set; }
    public float Rating { get; set; }
    public float Score { get; set; } // Predicted rating
}

// Sample user-movie ratings
var ratings = new List<RatingData>
{
    new RatingData { UserId = 1, MovieId = 1, Rating = 5 },
    new RatingData { UserId = 1, MovieId = 2, Rating = 3 },
    new RatingData { UserId = 1, MovieId = 3, Rating = 4 },
    new RatingData { UserId = 2, MovieId = 1, Rating = 4 },
    new RatingData { UserId = 2, MovieId = 2, Rating = 2 },
    new RatingData { UserId = 2, MovieId = 4, Rating = 5 },
    new RatingData { UserId = 3, MovieId = 1, Rating = 3 },
    new RatingData { UserId = 3, MovieId = 3, Rating = 5 },
    new RatingData { UserId = 3, MovieId = 4, Rating = 4 },
};

var trainingData = mlContext.Data.LoadFromEnumerable(ratings);

// Train matrix factorization model
var pipeline = mlContext.Recommendation().Trainers.MatrixFactorization(
    labelColumnName: nameof(RatingData.Rating),
    matrixColumnIndexColumnName: nameof(RatingData.UserId),
    matrixRowIndexColumnName: nameof(RatingData.MovieId),
    numberOfIterations: 20,
    approximationRank: 10,
    learningRate: 0.1);

var model = pipeline.Fit(trainingData);

// Predict rating for user 1, movie 4
var predictionEngine = mlContext.Model
    .CreatePredictionEngine<RatingData, RatingPrediction>(model);

var prediction = predictionEngine.Predict(
    new RatingData { UserId = 1, MovieId = 4 });
Console.WriteLine($"Predicted rating for User 1, Movie 4: {prediction.Score:F2}");
```

## Text Featurization

Transform text data into numeric feature vectors for machine learning.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms.Text;
using System;
using System.Collections.Generic;

var mlContext = new MLContext();

public class SentimentData
{
    public bool Sentiment { get; set; }
    public string SentimentText { get; set; }
}

var data = new List<SentimentData>
{
    new SentimentData { Sentiment = true,
        SentimentText = "Best game I've ever played." },
    new SentimentData { Sentiment = false,
        SentimentText = "Terrible experience, would not recommend." },
    new SentimentData { Sentiment = true,
        SentimentText = "Amazing quality and fast delivery!" }
};

var dataView = mlContext.Data.LoadFromEnumerable(data);

// Simple text featurization (one-stop shop)
var simplePipeline = mlContext.Transforms.Text
    .FeaturizeText("Features", "SentimentText");

// Advanced text featurization with options
var advancedPipeline = mlContext.Transforms.Text.FeaturizeText("Features",
    new TextFeaturizingEstimator.Options
    {
        KeepPunctuations = false,
        KeepNumbers = false,
        OutputTokensColumnName = "Tokens",
        StopWordsRemoverOptions = new StopWordsRemovingEstimator.Options
        {
            Language = TextFeaturizingEstimator.Language.English
        }
    }, "SentimentText");

// Individual NLP operations
var customPipeline = mlContext.Transforms.Text
    .NormalizeText("NormalizedText", "SentimentText",
        caseMode: TextNormalizingEstimator.CaseMode.Lower,
        keepPunctuations: false)
    .Append(mlContext.Transforms.Text.TokenizeIntoWords("Tokens", "NormalizedText"))
    .Append(mlContext.Transforms.Text.RemoveDefaultStopWords("CleanTokens", "Tokens"))
    .Append(mlContext.Transforms.Text.ProduceNgrams("Ngrams", "CleanTokens",
        ngramLength: 2, useAllLengths: true));

// Train sentiment classifier
var classificationPipeline = simplePipeline
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
        labelColumnName: "Sentiment", featureColumnName: "Features"));

var model = classificationPipeline.Fit(dataView);
```

## Time Series Forecasting

Predict future values based on historical time series data using
Singular Spectrum Analysis (SSA).

```csharp
using Microsoft.ML;
using Microsoft.ML.Transforms.TimeSeries;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Requires: dotnet add package Microsoft.ML.TimeSeries

var mlContext = new MLContext();

public class TimeSeriesData
{
    public float Value { get; set; }
    public TimeSeriesData(float value) => Value = value;
}

public class ForecastResult
{
    public float[] Forecast { get; set; }
    public float[] LowerBound { get; set; }
    public float[] UpperBound { get; set; }
}

// Generate periodic time series data
var data = new List<TimeSeriesData>();
for (int i = 0; i < 30; i++)
    data.Add(new TimeSeriesData(i % 5)); // Repeating pattern: 0,1,2,3,4

var dataView = mlContext.Data.LoadFromEnumerable(data);

// Configure SSA forecasting model
var forecastingPipeline = mlContext.Forecasting.ForecastBySsa(
    outputColumnName: nameof(ForecastResult.Forecast),
    inputColumnName: nameof(TimeSeriesData.Value),
    windowSize: 5,         // Size of window for SSA
    seriesLength: 11,      // Length of series to analyze
    trainSize: data.Count, // Number of points to train on
    horizon: 5,            // Number of values to forecast
    confidenceLevel: 0.95f,
    confidenceLowerBoundColumn: nameof(ForecastResult.LowerBound),
    confidenceUpperBoundColumn: nameof(ForecastResult.UpperBound));

var model = forecastingPipeline.Fit(dataView);

// Create forecasting engine
var forecastEngine = model
    .CreateTimeSeriesEngine<TimeSeriesData, ForecastResult>(mlContext);

// Forecast next 5 values
var forecast = forecastEngine.Predict();
Console.WriteLine("Forecasted values: [{0}]",
    string.Join(", ", forecast.Forecast.Select(f => f.ToString("F2"))));

// Update model with new observations
forecastEngine.Predict(new TimeSeriesData(0));
forecastEngine.Predict(new TimeSeriesData(1));

// Save model checkpoint
forecastEngine.CheckPoint(mlContext, "forecast_model.zip");

// Load and continue forecasting
using (var stream = File.OpenRead("forecast_model.zip"))
{
    var loadedModel = mlContext.Model.Load(stream, out _);
    var loadedEngine =
        loadedModel.CreateTimeSeriesEngine<TimeSeriesData, ForecastResult>(mlContext);
    var newForecast = loadedEngine.Predict();
}
```

## Saving and Loading Models

Persist trained models to disk and load them for later use or deployment.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.IO;

var mlContext = new MLContext();

public class InputData
{
    public float Feature1 { get; set; }
    public float Feature2 { get; set; }
    public bool Label { get; set; }
}

public class OutputPrediction
{
    public bool PredictedLabel { get; set; }
    public float Probability { get; set; }
    public float Score { get; set; }
}

// Sample training data
var data = new[]
{
    new InputData { Feature1 = 1.0f, Feature2 = 2.0f, Label = true },
    new InputData { Feature1 = 3.0f, Feature2 = 1.0f, Label = false },
    new InputData { Feature1 = 2.0f, Feature2 = 3.0f, Label = true },
};

var dataView = mlContext.Data.LoadFromEnumerable(data);

// Build and train pipeline
var pipeline = mlContext.Transforms.Concatenate("Features", "Feature1", "Feature2")
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());

var model = pipeline.Fit(dataView);

// Save model to file
string modelPath = "model.zip";
mlContext.Model.Save(model, dataView.Schema, modelPath);
Console.WriteLine($"Model saved to: {modelPath}");

// Load model from file
ITransformer loadedModel;
DataViewSchema modelSchema;
using (var stream = File.OpenRead(modelPath))
{
    loadedModel = mlContext.Model.Load(stream, out modelSchema);
}

// Create prediction engine from loaded model
var predictionEngine = mlContext.Model
    .CreatePredictionEngine<InputData, OutputPrediction>(loadedModel);

// Make predictions
var prediction = predictionEngine.Predict(
    new InputData { Feature1 = 2.5f, Feature2 = 2.5f });
Console.WriteLine($"Predicted: {prediction.PredictedLabel}, " +
    $"Probability: {prediction.Probability:F3}");
```

## PredictionEngine for Single Predictions

Create efficient prediction engines for real-time single-row predictions.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Linq;

var mlContext = new MLContext();

public class IrisInput
{
    [ColumnName("Label")]
    public string Species { get; set; }
    public float SepalLength { get; set; }
    public float SepalWidth { get; set; }
    public float PetalLength { get; set; }
    public float PetalWidth { get; set; }
}

public class IrisPrediction
{
    public uint PredictedLabel { get; set; }
    public float[] Score { get; set; }
}

// Assume model is already trained and loaded
// var model = mlContext.Model.Load("iris_model.zip", out _);

// For demonstration, create a simple pipeline
var sampleData = new[]
{
    new IrisInput { Species = "setosa", SepalLength = 5.1f, SepalWidth = 3.5f,
        PetalLength = 1.4f, PetalWidth = 0.2f },
    new IrisInput { Species = "versicolor", SepalLength = 7.0f, SepalWidth = 3.2f,
        PetalLength = 4.7f, PetalWidth = 1.4f },
};

var dataView = mlContext.Data.LoadFromEnumerable(sampleData);

var pipeline = mlContext.Transforms.Concatenate("Features",
        "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
    .Append(mlContext.Transforms.Conversion.MapValueToKey("Label"))
    .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy());

var model = pipeline.Fit(dataView);

// Create prediction engine (expensive - create once, reuse many times)
var predictionEngine = mlContext.Model
    .CreatePredictionEngine<IrisInput, IrisPrediction>(model);

// Make single predictions (fast after engine creation)
var newSample = new IrisInput
{
    SepalLength = 5.9f, SepalWidth = 3.0f,
    PetalLength = 4.2f, PetalWidth = 1.5f
};

var prediction = predictionEngine.Predict(newSample);
Console.WriteLine($"Predicted class: {prediction.PredictedLabel}");
Console.WriteLine($"Scores: [{string.Join(", ", prediction.Score.Select(s => s.ToString("F3")))}]");

// Note: PredictionEngine is NOT thread-safe.
// For multi-threaded scenarios, create one engine per thread
// or use object pooling.
```

## Cross-Validation

Evaluate model performance using k-fold cross-validation to get more reliable metrics.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

var mlContext = new MLContext(seed: 0);

public class DataPoint
{
    public uint Label { get; set; }

    [VectorType(4)]
    public float[] Features { get; set; }
}

// Generate sample data
var random = new Random(0);
var data = Enumerable.Range(0, 150).Select(_ =>
{
    var label = (uint)random.Next(0, 3);
    return new DataPoint
    {
        Label = label,
        Features = new[]
        {
            (float)random.NextDouble() + label * 0.3f,
            (float)random.NextDouble() + label * 0.2f,
            (float)random.NextDouble() + label * 0.3f,
            (float)random.NextDouble() + label * 0.2f
        }
    };
}).ToList();

var dataView = mlContext.Data.LoadFromEnumerable(data);

// Define pipeline
var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label")
    .AppendCacheCheckpoint(mlContext)
    .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy());

// Perform 5-fold cross-validation
var cvResults = mlContext.MulticlassClassification.CrossValidate(
    dataView, pipeline, numberOfFolds: 5);

// Analyze results from all folds
var microAccuracies = cvResults.Select(r => r.Metrics.MicroAccuracy);
var macroAccuracies = cvResults.Select(r => r.Metrics.MacroAccuracy);

Console.WriteLine($"Average Micro Accuracy: {microAccuracies.Average():F3}");
Console.WriteLine($"Std Dev Micro Accuracy: {StandardDeviation(microAccuracies):F3}");
Console.WriteLine($"Average Macro Accuracy: {macroAccuracies.Average():F3}");

// Select best model
var bestRun = cvResults.OrderByDescending(r => r.Metrics.MicroAccuracy).First();
var bestModel = bestRun.Model;

static double StandardDeviation(IEnumerable<double> values)
{
    var avg = values.Average();
    var sum = values.Sum(v => Math.Pow(v - avg, 2));
    return Math.Sqrt(sum / values.Count());
}
```

## Train-Test Split

Split data into training and testing sets for model evaluation.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Linq;

var mlContext = new MLContext(seed: 0);

public class DataPoint
{
    public bool Label { get; set; }

    [VectorType(10)]
    public float[] Features { get; set; }
}

// Generate sample data
var random = new Random(0);
var data = Enumerable.Range(0, 1000).Select(_ =>
{
    var label = random.NextDouble() > 0.5;
    return new DataPoint
    {
        Label = label,
        Features = Enumerable.Range(0, 10)
            .Select(__ => (float)random.NextDouble() + (label ? 0.1f : 0f))
            .ToArray()
    };
}).ToList();

var dataView = mlContext.Data.LoadFromEnumerable(data);

// Split: 80% training, 20% testing
var split = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);
Console.WriteLine($"Training set size: ~{(int)(data.Count * 0.8)}");
Console.WriteLine($"Test set size: ~{(int)(data.Count * 0.2)}");

// Train on training set
var pipeline = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression();
var model = pipeline.Fit(split.TrainSet);

// Evaluate on test set (never seen during training)
var predictions = model.Transform(split.TestSet);
var metrics = mlContext.BinaryClassification.Evaluate(predictions);

Console.WriteLine($"Test Accuracy: {metrics.Accuracy:F3}");
Console.WriteLine($"Test AUC: {metrics.AreaUnderRocCurve:F3}");
Console.WriteLine($"Test F1: {metrics.F1Score:F3}");
```

## Custom Transformations with CustomMapping

Define custom data transformation logic using C# lambda expressions.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;
using System;
using System.Collections.Generic;

var mlContext = new MLContext();

public class InputData
{
    public float Income { get; set; }
    public int Age { get; set; }
    public string Category { get; set; }
}

public class TransformedData
{
    public bool IsHighIncome { get; set; }
    public string AgeGroup { get; set; }
    public float NormalizedIncome { get; set; }
}

var data = new List<InputData>
{
    new InputData { Income = 75000, Age = 35, Category = "A" },
    new InputData { Income = 45000, Age = 28, Category = "B" },
    new InputData { Income = 120000, Age = 52, Category = "A" },
};

var dataView = mlContext.Data.LoadFromEnumerable(data);

// Define custom mapping function
Action<InputData, TransformedData> mapping = (input, output) =>
{
    output.IsHighIncome = input.Income > 50000;
    output.AgeGroup = input.Age < 30 ? "Young"
        : input.Age < 50 ? "Middle" : "Senior";
    output.NormalizedIncome = input.Income / 100000f;
};

// Create custom mapping transformer
var customPipeline = mlContext.Transforms.CustomMapping(mapping, contractName: null);
var transformedData = customPipeline.Fit(dataView).Transform(dataView);

var results = mlContext.Data.CreateEnumerable<TransformedData>(
    transformedData, reuseRowObject: false);

foreach (var item in results)
{
    Console.WriteLine($"High Income: {item.IsHighIncome}, " +
        $"Age Group: {item.AgeGroup}, " +
        $"Normalized: {item.NormalizedIncome:F2}");
}

// For saveable custom mappings, use CustomMappingFactory
[CustomMappingFactoryAttribute("IncomeMapper")]
public class IncomeMapper : CustomMappingFactory<InputData, TransformedData>
{
    public override Action<InputData, TransformedData> GetMapping()
    {
        return (input, output) =>
        {
            output.IsHighIncome = input.Income > 50000;
            output.NormalizedIncome = input.Income / 100000f;
        };
    }
}
```

## Data Transformation Pipelines

Chain multiple transformations for feature engineering.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;

// Requires: dotnet add package Microsoft.ML.FastTree

var mlContext = new MLContext();

public class RawData
{
    public float Age { get; set; }
    public float Income { get; set; }
    public string Education { get; set; }
    public string Occupation { get; set; }
    public bool Label { get; set; }
}

public class Prediction
{
    public bool PredictedLabel { get; set; }
    public float Probability { get; set; }
}

var data = new List<RawData>
{
    new RawData { Age = 35, Income = 75000, Education = "Bachelor",
        Occupation = "Engineer", Label = true },
    new RawData { Age = 28, Income = 45000, Education = "Master",
        Occupation = "Teacher", Label = false },
    new RawData { Age = 52, Income = 120000, Education = "PhD",
        Occupation = "Manager", Label = true },
};

var dataView = mlContext.Data.LoadFromEnumerable(data);

// Build comprehensive transformation pipeline
var pipeline = mlContext.Transforms
    // Normalize numeric features
    .NormalizeMinMax("NormalizedAge", "Age")
    .Append(mlContext.Transforms.NormalizeMinMax("NormalizedIncome", "Income"))
    // One-hot encode categorical features
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("EducationEncoded", "Education"))
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("OccupationEncoded", "Occupation"))
    // Concatenate all features into single vector
    .Append(mlContext.Transforms.Concatenate("Features",
        "NormalizedAge", "NormalizedIncome", "EducationEncoded", "OccupationEncoded"))
    // Cache for iterative training
    .AppendCacheCheckpoint(mlContext)
    // Add trainer
    .Append(mlContext.BinaryClassification.Trainers.FastTree());

// Train end-to-end model
var model = pipeline.Fit(dataView);

// The model includes all transformations - just pass raw data for predictions
var predictionEngine = mlContext.Model.CreatePredictionEngine<RawData, Prediction>(model);

var newPerson = new RawData
{
    Age = 40, Income = 85000,
    Education = "Master", Occupation = "Engineer"
};

var result = predictionEngine.Predict(newPerson);
Console.WriteLine($"Prediction: {result.PredictedLabel}, Probability: {result.Probability:F3}");
```

## Summary

ML.NET provides a comprehensive framework for building machine learning solutions entirely within the .NET ecosystem. It excels in scenarios requiring integration with existing .NET applications, real-time predictions in web services built on ASP.NET Core, batch processing in data pipelines, and edge deployments where Python dependencies are impractical. Key strengths include its strongly typed pipeline approach, straightforward model serialization, and the ability to consume pre-trained TensorFlow and ONNX models.

Common integration patterns include creating shared projects for training pipelines, deploying trained models as NuGet packages or embedded resources, pooling PredictionEngine instances for high-throughput web APIs, and leveraging ML.NET's AutoML capabilities for automated model selection. The framework supports both traditional machine learning tasks (classification, regression, clustering, recommendation) and specialized scenarios such as time series forecasting and anomaly detection. For production deployments, models can be serialized to .zip files and loaded in any .NET application, enabling a clean separation between training environments and inference services.
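The PredictionEngine pooling pattern mentioned above is provided by the `Microsoft.Extensions.ML` package for ASP.NET Core. The following is a minimal sketch, not a definitive implementation: it assumes a model already saved to `model.zip`, and `InputData`/`OutputPrediction` stand in for your own input and prediction classes (such as those in the saving/loading example); the model name `"MyModel"` and route are arbitrary.

```csharp
// Sketch: requires dotnet add package Microsoft.Extensions.ML
// and a trained model saved at "model.zip".
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.ML;

var builder = WebApplication.CreateBuilder(args);

// Register a thread-safe pool of prediction engines;
// watchForChanges reloads the pool when the model file is updated.
builder.Services.AddPredictionEnginePool<InputData, OutputPrediction>()
    .FromFile(modelName: "MyModel", filePath: "model.zip", watchForChanges: true);

var app = builder.Build();

// The pool is injected per request; Predict borrows an engine,
// runs the single-row prediction, and returns the engine to the pool.
app.MapPost("/predict",
    (PredictionEnginePool<InputData, OutputPrediction> pool, InputData input) =>
        pool.Predict(modelName: "MyModel", example: input));

app.Run();
```

Unlike a raw PredictionEngine, the pool is safe to share across concurrent requests, which is why it is the recommended pattern for web APIs.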