# ML.NET

ML.NET is Microsoft's cross-platform, open-source machine learning framework for .NET developers. It enables building, training, deploying, and consuming custom machine learning models in .NET applications without requiring prior expertise in data science or experience with Python/R. The framework supports data loading from files and databases, data transformations, and includes numerous ML algorithms for scenarios such as classification, forecasting, anomaly detection, and recommendation.

ML.NET's architecture centers on the `MLContext` class, which serves as the entry point for all operations, including data loading, pipeline construction, model training, and prediction. The framework uses a pipeline-based approach in which data transformations and trainers are chained together as estimators, fitted to produce transformers, and finally used for predictions. It also supports ONNX and TensorFlow model integration for extended deep learning capabilities.

## Installation

Add the ML.NET NuGet package to your .NET project.

```bash
# Via .NET CLI
dotnet add package Microsoft.ML

# Via NuGet Package Manager
Install-Package Microsoft.ML
```

## MLContext - Entry Point for All Operations

The MLContext is the starting point for all ML.NET operations, providing mechanisms for logging, exception tracking, and randomness, and serving as a catalog of available operations.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Create MLContext with optional seed for reproducibility
var mlContext = new MLContext(seed: 0);

// MLContext provides access to all ML.NET functionality:
// - mlContext.Data: Data loading and manipulation
// - mlContext.Model: Model saving, loading, and prediction engines
// - mlContext.Transforms: Feature engineering and data transformations
// - mlContext.BinaryClassification: Binary classification trainers and metrics
// - mlContext.MulticlassClassification: Multi-class classification trainers
// - mlContext.Regression: Regression trainers and metrics
// - mlContext.Clustering: Clustering trainers (K-Means)
// - mlContext.Ranking: Ranking trainers
// - mlContext.Recommendation: Recommendation trainers (Matrix Factorization)
// - mlContext.AnomalyDetection: Anomaly detection trainers
// - mlContext.Forecasting: Time series forecasting
```

## Loading Data from Text Files

TextLoader enables loading data from CSV, TSV, and other delimited text files into IDataView objects for ML.NET processing.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

// Method 1: Define columns explicitly with TextLoader
var loader = mlContext.Data.CreateTextLoader(new[]
{
    new TextLoader.Column("Label", DataKind.Boolean, 0),
    new TextLoader.Column("Workclass", DataKind.String, 1),
    new TextLoader.Column("Education", DataKind.String, 2),
    new TextLoader.Column("MaritalStatus", DataKind.String, 3)
}, hasHeader: true, separatorChar: '\t');

var data = loader.Load("data.tsv");

// Method 2: Load using a data class with attributes
public class SentimentData
{
    [LoadColumn(0)] public bool Label { get; set; }
    [LoadColumn(1)] public string SentimentText { get; set; }
}

var dataFromClass = mlContext.Data.LoadFromTextFile<SentimentData>(
    "sentiment.csv", hasHeader: true, separatorChar: ',');

// Load vector columns (multiple features as single vector)
var vectorLoader = mlContext.Data.CreateTextLoader(new[]
{
    // Load columns 0-10 as a single vector
    new TextLoader.Column("Features", DataKind.Single,
        new[] { new TextLoader.Range(0, 10) }),
    new TextLoader.Column("Target", DataKind.Single, 11)
}, separatorChar: ';');
```

## Loading Data from In-Memory Collections

LoadFromEnumerable converts C# collections into IDataView objects, useful for real-time scenarios and programmatic data generation.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

var mlContext = new MLContext();

// Define data class with vector type annotation
public class DataPoint
{
    public bool Label { get; set; }

    [VectorType(50)] // Fixed-size feature vector
    public float[] Features { get; set; }
}

// Generate sample data
var dataPoints = new List<DataPoint>();
var random = new Random(0);
for (int i = 0; i < 1000; i++)
{
    var label = random.NextDouble() > 0.5;
    dataPoints.Add(new DataPoint
    {
        Label = label,
        Features = Enumerable.Repeat(label, 50)
            .Select(x => x ?
                (float)random.NextDouble() : (float)random.NextDouble() + 0.03f)
            .ToArray()
    });
}

// Convert to IDataView
IDataView trainingData = mlContext.Data.LoadFromEnumerable(dataPoints);

// Cache data in memory for iterative algorithms
trainingData = mlContext.Data.Cache(trainingData);

// For dynamic vector sizes (unknown at compile time)
public class DynamicData
{
    public float[] Features { get; set; } // No VectorType attribute
}

int featureDimension = 10; // Known at runtime
var schema = SchemaDefinition.Create(typeof(DynamicData));
var itemType = ((VectorDataViewType)schema[0].ColumnType).ItemType;
schema[0].ColumnType = new VectorDataViewType(itemType, featureDimension);

var dynamicData = mlContext.Data.LoadFromEnumerable(
    new[] { new DynamicData { Features = new float[10] } }, schema);
```

## Binary Classification with SDCA Logistic Regression

Train models to predict between two outcomes using the Stochastic Dual Coordinate Ascent (SDCA) algorithm.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

var mlContext = new MLContext(seed: 0);

// Data classes
public class DataPoint
{
    public bool Label { get; set; }

    [VectorType(50)]
    public float[] Features { get; set; }
}

public class Prediction
{
    public bool Label { get; set; }
    public bool PredictedLabel { get; set; }
    public float Probability { get; set; }
    public float Score { get; set; }
}

// Generate training data
var dataPoints = Enumerable.Range(0, 1000).Select(i =>
{
    var random = new Random(i);
    var label = random.NextDouble() > 0.5;
    return new DataPoint
    {
        Label = label,
        Features = Enumerable.Range(0, 50)
            .Select(_ => label ?
                (float)random.NextDouble() : (float)random.NextDouble() + 0.03f)
            .ToArray()
    };
}).ToList();

var trainingData = mlContext.Data.LoadFromEnumerable(dataPoints);
trainingData = mlContext.Data.Cache(trainingData);

// Define and train the model
var pipeline = mlContext.BinaryClassification.Trainers
    .SdcaLogisticRegression(labelColumnName: "Label", featureColumnName: "Features");

var model = pipeline.Fit(trainingData);

// Make predictions
var testData = mlContext.Data.LoadFromEnumerable(dataPoints.Take(10));
var transformedData = model.Transform(testData);

var predictions = mlContext.Data.CreateEnumerable<Prediction>(
    transformedData, reuseRowObject: false).ToList();

foreach (var p in predictions.Take(5))
    Console.WriteLine($"Label: {p.Label}, Predicted: {p.PredictedLabel}, " +
        $"Probability: {p.Probability:F3}");

// Evaluate model
var metrics = mlContext.BinaryClassification.Evaluate(transformedData);
Console.WriteLine($"Accuracy: {metrics.Accuracy:F2}");
Console.WriteLine($"AUC: {metrics.AreaUnderRocCurve:F2}");
Console.WriteLine($"F1 Score: {metrics.F1Score:F2}");
Console.WriteLine(metrics.ConfusionMatrix.GetFormattedConfusionTable());
```

## Multi-Class Classification with SDCA Maximum Entropy

Classify data into three or more categories using the Maximum Entropy classifier.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

var mlContext = new MLContext(seed: 0);

public class DataPoint
{
    public uint Label { get; set; } // 1, 2, or 3

    [VectorType(20)]
    public float[] Features { get; set; }
}

public class Prediction
{
    public uint Label { get; set; }
    public uint PredictedLabel { get; set; }
    public float[] Score { get; set; }
}

// Generate data with 3 classes
var random = new Random(0);
var dataPoints = Enumerable.Range(0, 1000).Select(_ =>
{
    var label = (uint)random.Next(1, 4);
    return new DataPoint
    {
        Label = label,
        Features = Enumerable.Range(0, 20)
            .Select(__ => (float)(random.NextDouble() - 0.5) + label * 0.2f)
            .ToArray()
    };
}).ToList();

var trainingData = mlContext.Data.LoadFromEnumerable(dataPoints);
trainingData = mlContext.Data.Cache(trainingData);

// Pipeline: convert numeric labels to keys, then train
var pipeline = mlContext.Transforms.Conversion
    .MapValueToKey(nameof(DataPoint.Label))
    .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy());

var model = pipeline.Fit(trainingData);

// Evaluate
var predictions = model.Transform(trainingData);
var metrics = mlContext.MulticlassClassification.Evaluate(predictions);
Console.WriteLine($"Micro Accuracy: {metrics.MicroAccuracy:F2}");
Console.WriteLine($"Macro Accuracy: {metrics.MacroAccuracy:F2}");
Console.WriteLine($"Log Loss: {metrics.LogLoss:F2}");
Console.WriteLine(metrics.ConfusionMatrix.GetFormattedConfusionTable());
```

## Regression with FastTree

Predict continuous numeric values using gradient-boosted decision trees.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

// Requires: dotnet add package Microsoft.ML.FastTree

var mlContext = new MLContext(seed: 0);

public class DataPoint
{
    public float Label { get; set; }

    [VectorType(50)]
    public float[] Features { get; set; }
}

public class Prediction
{
    public float Label { get; set; }
    public float Score { get; set; }
}

// Generate correlated features and labels
var random = new Random(0);
var dataPoints = Enumerable.Range(0, 1000).Select(_ =>
{
    float label = (float)random.NextDouble();
    return new DataPoint
    {
        Label = label,
        Features = Enumerable.Range(0, 50)
            .Select(__ => label + (float)random.NextDouble())
            .ToArray()
    };
}).ToList();

var trainingData = mlContext.Data.LoadFromEnumerable(dataPoints);

// Train FastTree regression model
var pipeline = mlContext.Regression.Trainers.FastTree(
    labelColumnName: nameof(DataPoint.Label),
    featureColumnName: nameof(DataPoint.Features),
    numberOfLeaves: 20,
    numberOfTrees: 100,
    minimumExampleCountPerLeaf: 10,
    learningRate: 0.2);

var model = pipeline.Fit(trainingData);

// Evaluate
var predictions = model.Transform(trainingData);
var metrics = mlContext.Regression.Evaluate(predictions,
    labelColumnName: nameof(DataPoint.Label));
Console.WriteLine($"Mean Absolute Error: {metrics.MeanAbsoluteError:F3}");
Console.WriteLine($"Mean Squared Error: {metrics.MeanSquaredError:F3}");
Console.WriteLine($"Root Mean Squared Error: {metrics.RootMeanSquaredError:F3}");
Console.WriteLine($"R-Squared: {metrics.RSquared:F3}");
```

## Clustering with K-Means

Group data into clusters based on feature similarity.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

// Requires: dotnet add package Microsoft.ML.Mkl.Components

var mlContext = new MLContext(seed: 0);

public class DataPoint
{
    [KeyType(2)]
    public uint Label { get; set; } // For evaluation only

    [VectorType(50)]
    public float[] Features { get; set; }
}

public class Prediction
{
    public uint Label { get; set; }
    public uint PredictedLabel { get; set; }
    public float[] Score { get; set; } // Distances to cluster centroids
}

// Generate two clusters
var random = new Random(0);
var dataPoints = Enumerable.Range(0, 1000).Select(i =>
{
    int cluster = i < 500 ? 0 : 1;
    return new DataPoint
    {
        Label = (uint)cluster,
        Features = Enumerable.Range(0, 50)
            .Select(_ => cluster == 0
                ? (float)random.NextDouble() + 0.1f
                : (float)random.NextDouble() - 0.1f)
            .ToArray()
    };
}).ToList();

var trainingData = mlContext.Data.LoadFromEnumerable(dataPoints);

// Train K-Means with 2 clusters
var pipeline = mlContext.Clustering.Trainers.KMeans(
    featureColumnName: "Features", numberOfClusters: 2);

var model = pipeline.Fit(trainingData);

// Get cluster centroids
var modelParams = model.Model;
VBuffer<float>[] centroids = default;
modelParams.GetClusterCentroids(ref centroids, out int k);
Console.WriteLine($"Number of clusters: {k}");

// Evaluate
var predictions = model.Transform(trainingData);
var metrics = mlContext.Clustering.Evaluate(predictions,
    "Label", "Score", "Features");
Console.WriteLine($"Normalized Mutual Information: {metrics.NormalizedMutualInformation:F2}");
Console.WriteLine($"Average Distance: {metrics.AverageDistance:F2}");
Console.WriteLine($"Davies Bouldin Index: {metrics.DaviesBouldinIndex:F2}");
```

## Recommendation with Matrix Factorization

Build recommendation systems using collaborative filtering.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

// Requires: dotnet add package Microsoft.ML.Recommender

var mlContext = new MLContext(seed: 0);

public class RatingData
{
    [KeyType(100)]
    public uint UserId { get; set; }

    [KeyType(100)]
    public uint MovieId { get; set; }

    public float Rating { get; set; }
}

public class RatingPrediction
{
    public uint UserId { get; set; }
    public uint MovieId { get; set; }
    public float Rating { get; set; }
    public float Score { get; set; } // Predicted rating
}

// Sample user-movie ratings
var ratings = new List<RatingData>
{
    new RatingData { UserId = 1, MovieId = 1, Rating = 5 },
    new RatingData { UserId = 1, MovieId = 2, Rating = 3 },
    new RatingData { UserId = 1, MovieId = 3, Rating = 4 },
    new RatingData { UserId = 2, MovieId = 1, Rating = 4 },
    new RatingData { UserId = 2, MovieId = 2, Rating = 2 },
    new RatingData { UserId = 2, MovieId = 4, Rating = 5 },
    new RatingData { UserId = 3, MovieId = 1, Rating = 3 },
    new RatingData { UserId = 3, MovieId = 3, Rating = 5 },
    new RatingData { UserId = 3, MovieId = 4, Rating = 4 },
};

var trainingData = mlContext.Data.LoadFromEnumerable(ratings);

// Train matrix factorization model
var pipeline = mlContext.Recommendation().Trainers.MatrixFactorization(
    labelColumnName: nameof(RatingData.Rating),
    matrixColumnIndexColumnName: nameof(RatingData.UserId),
    matrixRowIndexColumnName: nameof(RatingData.MovieId),
    numberOfIterations: 20,
    approximationRank: 10,
    learningRate: 0.1);

var model = pipeline.Fit(trainingData);

// Predict rating for user 1, movie 4
var predictionEngine = mlContext.Model
    .CreatePredictionEngine<RatingData, RatingPrediction>(model);

var prediction = predictionEngine.Predict(
    new RatingData { UserId = 1, MovieId = 4 });
Console.WriteLine($"Predicted rating for User 1, Movie 4: {prediction.Score:F2}");
```

## Text Featurization

Transform text data into numeric feature vectors for machine learning.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms.Text;
using System;
using System.Collections.Generic;

var mlContext = new MLContext();

public class SentimentData
{
    public bool Sentiment { get; set; }
    public string SentimentText { get; set; }
}

var data = new List<SentimentData>
{
    new SentimentData { Sentiment = true,
        SentimentText = "Best game I've ever played." },
    new SentimentData { Sentiment = false,
        SentimentText = "Terrible experience, would not recommend." },
    new SentimentData { Sentiment = true,
        SentimentText = "Amazing quality and fast delivery!" }
};

var dataView = mlContext.Data.LoadFromEnumerable(data);

// Simple text featurization (one-stop shop)
var simplePipeline = mlContext.Transforms.Text
    .FeaturizeText("Features", "SentimentText");

// Advanced text featurization with options
var advancedPipeline = mlContext.Transforms.Text.FeaturizeText("Features",
    new TextFeaturizingEstimator.Options
    {
        KeepPunctuations = false,
        KeepNumbers = false,
        OutputTokensColumnName = "Tokens",
        StopWordsRemoverOptions = new StopWordsRemovingEstimator.Options
        {
            Language = TextFeaturizingEstimator.Language.English
        }
    }, "SentimentText");

// Individual NLP operations
var customPipeline = mlContext.Transforms.Text
    .NormalizeText("NormalizedText", "SentimentText",
        caseMode: TextNormalizingEstimator.CaseMode.Lower,
        keepPunctuations: false)
    .Append(mlContext.Transforms.Text.TokenizeIntoWords("Tokens", "NormalizedText"))
    .Append(mlContext.Transforms.Text.RemoveDefaultStopWords("CleanTokens", "Tokens"))
    .Append(mlContext.Transforms.Text.ProduceNgrams("Ngrams", "CleanTokens",
        ngramLength: 2, useAllLengths: true));

// Train sentiment classifier
var classificationPipeline = simplePipeline
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
        labelColumnName: "Sentiment", featureColumnName: "Features"));

var model = classificationPipeline.Fit(dataView);
```

## Time Series Forecasting

Predict future values based on historical time series data using
Singular Spectrum Analysis (SSA).

```csharp
using Microsoft.ML;
using Microsoft.ML.Transforms.TimeSeries;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Requires: dotnet add package Microsoft.ML.TimeSeries

var mlContext = new MLContext();

public class TimeSeriesData
{
    public float Value { get; set; }
    public TimeSeriesData(float value) => Value = value;
}

public class ForecastResult
{
    public float[] Forecast { get; set; }
    public float[] LowerBound { get; set; }
    public float[] UpperBound { get; set; }
}

// Generate periodic time series data
var data = new List<TimeSeriesData>();
for (int i = 0; i < 30; i++)
    data.Add(new TimeSeriesData(i % 5)); // Repeating pattern: 0,1,2,3,4

var dataView = mlContext.Data.LoadFromEnumerable(data);

// Configure SSA forecasting model
var forecastingPipeline = mlContext.Forecasting.ForecastBySsa(
    outputColumnName: nameof(ForecastResult.Forecast),
    inputColumnName: nameof(TimeSeriesData.Value),
    windowSize: 5,         // Size of window for SSA
    seriesLength: 11,      // Length of series to analyze
    trainSize: data.Count, // Number of points to train on
    horizon: 5,            // Number of values to forecast
    confidenceLevel: 0.95f,
    confidenceLowerBoundColumn: nameof(ForecastResult.LowerBound),
    confidenceUpperBoundColumn: nameof(ForecastResult.UpperBound));

var model = forecastingPipeline.Fit(dataView);

// Create forecasting engine
var forecastEngine = model
    .CreateTimeSeriesEngine<TimeSeriesData, ForecastResult>(mlContext);

// Forecast next 5 values
var forecast = forecastEngine.Predict();
Console.WriteLine("Forecasted values: [{0}]",
    string.Join(", ", forecast.Forecast.Select(f => f.ToString("F2"))));

// Update model with new observations
forecastEngine.Predict(new TimeSeriesData(0));
forecastEngine.Predict(new TimeSeriesData(1));

// Save model checkpoint
forecastEngine.CheckPoint(mlContext, "forecast_model.zip");

// Load and continue forecasting
using (var stream = File.OpenRead("forecast_model.zip"))
{
    var loadedModel = mlContext.Model.Load(stream, out _);
    var loadedEngine =
        loadedModel.CreateTimeSeriesEngine<TimeSeriesData, ForecastResult>(mlContext);
    var newForecast = loadedEngine.Predict();
}
```

## Saving and Loading Models

Persist trained models to disk and load them for later use or deployment.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.IO;

var mlContext = new MLContext();

public class InputData
{
    public float Feature1 { get; set; }
    public float Feature2 { get; set; }
    public bool Label { get; set; }
}

public class OutputPrediction
{
    public bool PredictedLabel { get; set; }
    public float Probability { get; set; }
    public float Score { get; set; }
}

// Sample training data
var data = new[]
{
    new InputData { Feature1 = 1.0f, Feature2 = 2.0f, Label = true },
    new InputData { Feature1 = 3.0f, Feature2 = 1.0f, Label = false },
    new InputData { Feature1 = 2.0f, Feature2 = 3.0f, Label = true },
};

var dataView = mlContext.Data.LoadFromEnumerable(data);

// Build and train pipeline
var pipeline = mlContext.Transforms.Concatenate("Features", "Feature1", "Feature2")
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());

var model = pipeline.Fit(dataView);

// Save model to file
string modelPath = "model.zip";
mlContext.Model.Save(model, dataView.Schema, modelPath);
Console.WriteLine($"Model saved to: {modelPath}");

// Load model from file
ITransformer loadedModel;
DataViewSchema modelSchema;
using (var stream = File.OpenRead(modelPath))
{
    loadedModel = mlContext.Model.Load(stream, out modelSchema);
}

// Create prediction engine from loaded model
var predictionEngine = mlContext.Model
    .CreatePredictionEngine<InputData, OutputPrediction>(loadedModel);

// Make predictions
var prediction = predictionEngine.Predict(
    new InputData { Feature1 = 2.5f, Feature2 = 2.5f });
Console.WriteLine($"Predicted: {prediction.PredictedLabel}, " +
    $"Probability: {prediction.Probability:F3}");
```

## PredictionEngine for Single Predictions

Create efficient prediction engines for real-time single-row predictions.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Linq;

var mlContext = new MLContext();

public class IrisInput
{
    [ColumnName("Label")]
    public string Species { get; set; }
    public float SepalLength { get; set; }
    public float SepalWidth { get; set; }
    public float PetalLength { get; set; }
    public float PetalWidth { get; set; }
}

public class IrisPrediction
{
    public uint PredictedLabel { get; set; }
    public float[] Score { get; set; }
}

// Assume model is already trained and loaded
// var model = mlContext.Model.Load("iris_model.zip", out _);

// For demonstration, create a simple pipeline
var sampleData = new[]
{
    new IrisInput { Species = "setosa", SepalLength = 5.1f, SepalWidth = 3.5f,
        PetalLength = 1.4f, PetalWidth = 0.2f },
    new IrisInput { Species = "versicolor", SepalLength = 7.0f, SepalWidth = 3.2f,
        PetalLength = 4.7f, PetalWidth = 1.4f },
};

var dataView = mlContext.Data.LoadFromEnumerable(sampleData);

var pipeline = mlContext.Transforms.Concatenate("Features",
        "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
    .Append(mlContext.Transforms.Conversion.MapValueToKey("Label"))
    .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy());

var model = pipeline.Fit(dataView);

// Create prediction engine (expensive - create once, reuse many times)
var predictionEngine = mlContext.Model
    .CreatePredictionEngine<IrisInput, IrisPrediction>(model);

// Make single predictions (fast after engine creation)
var newSample = new IrisInput
{
    SepalLength = 5.9f, SepalWidth = 3.0f,
    PetalLength = 4.2f, PetalWidth = 1.5f
};

var prediction = predictionEngine.Predict(newSample);
Console.WriteLine($"Predicted class: {prediction.PredictedLabel}");
Console.WriteLine($"Scores: [{string.Join(", ", prediction.Score.Select(s => s.ToString("F3")))}]");

// Note: PredictionEngine is NOT thread-safe.
// For multi-threaded scenarios, create one engine per thread
// or use object pooling.
```

## Cross-Validation

Evaluate model performance using k-fold cross-validation to get more reliable metrics.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;
using System.Linq;

var mlContext = new MLContext(seed: 0);

public class DataPoint
{
    public uint Label { get; set; }

    [VectorType(4)]
    public float[] Features { get; set; }
}

// Generate sample data
var random = new Random(0);
var data = Enumerable.Range(0, 150).Select(_ =>
{
    var label = (uint)random.Next(0, 3);
    return new DataPoint
    {
        Label = label,
        Features = new[]
        {
            (float)random.NextDouble() + label * 0.3f,
            (float)random.NextDouble() + label * 0.2f,
            (float)random.NextDouble() + label * 0.3f,
            (float)random.NextDouble() + label * 0.2f
        }
    };
}).ToList();

var dataView = mlContext.Data.LoadFromEnumerable(data);

// Define pipeline
var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label")
    .AppendCacheCheckpoint(mlContext)
    .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy());

// Perform 5-fold cross-validation
var cvResults = mlContext.MulticlassClassification.CrossValidate(
    dataView, pipeline, numberOfFolds: 5);

// Analyze results from all folds
var microAccuracies = cvResults.Select(r => r.Metrics.MicroAccuracy);
var macroAccuracies = cvResults.Select(r => r.Metrics.MacroAccuracy);

Console.WriteLine($"Average Micro Accuracy: {microAccuracies.Average():F3}");
Console.WriteLine($"Std Dev Micro Accuracy: {StandardDeviation(microAccuracies):F3}");
Console.WriteLine($"Average Macro Accuracy: {macroAccuracies.Average():F3}");

// Select best model
var bestRun = cvResults.OrderByDescending(r => r.Metrics.MicroAccuracy).First();
var bestModel = bestRun.Model;

static double StandardDeviation(IEnumerable<double> values)
{
    var avg = values.Average();
    var sum = values.Sum(v => Math.Pow(v - avg, 2));
    return Math.Sqrt(sum / values.Count());
}
```

## Train-Test Split

Split data into training and testing sets for model evaluation.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Linq;

var mlContext = new MLContext(seed: 0);

public class DataPoint
{
    public bool Label { get; set; }

    [VectorType(10)]
    public float[] Features { get; set; }
}

// Generate sample data
var random = new Random(0);
var data = Enumerable.Range(0, 1000).Select(_ =>
{
    var label = random.NextDouble() > 0.5;
    return new DataPoint
    {
        Label = label,
        Features = Enumerable.Range(0, 10)
            .Select(__ => (float)random.NextDouble() + (label ? 0.1f : 0f))
            .ToArray()
    };
}).ToList();

var dataView = mlContext.Data.LoadFromEnumerable(data);

// Split: 80% training, 20% testing
var split = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);
Console.WriteLine($"Training set size: ~{(int)(data.Count * 0.8)}");
Console.WriteLine($"Test set size: ~{(int)(data.Count * 0.2)}");

// Train on training set
var pipeline = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression();
var model = pipeline.Fit(split.TrainSet);

// Evaluate on test set (never seen during training)
var predictions = model.Transform(split.TestSet);
var metrics = mlContext.BinaryClassification.Evaluate(predictions);

Console.WriteLine($"Test Accuracy: {metrics.Accuracy:F3}");
Console.WriteLine($"Test AUC: {metrics.AreaUnderRocCurve:F3}");
Console.WriteLine($"Test F1: {metrics.F1Score:F3}");
```

## Custom Transformations with CustomMapping

Define custom data transformation logic using C# lambda expressions.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;
using System;
using System.Collections.Generic;

var mlContext = new MLContext();

public class InputData
{
    public float Income { get; set; }
    public int Age { get; set; }
    public string Category { get; set; }
}

public class TransformedData
{
    public bool IsHighIncome { get; set; }
    public string AgeGroup { get; set; }
    public float NormalizedIncome { get; set; }
}

var data = new List<InputData>
{
    new InputData { Income = 75000, Age = 35, Category = "A" },
    new InputData { Income = 45000, Age = 28, Category = "B" },
    new InputData { Income = 120000, Age = 52, Category = "A" },
};

var dataView = mlContext.Data.LoadFromEnumerable(data);

// Define custom mapping function
Action<InputData, TransformedData> mapping = (input, output) =>
{
    output.IsHighIncome = input.Income > 50000;
    output.AgeGroup = input.Age < 30 ? "Young"
        : input.Age < 50 ? "Middle" : "Senior";
    output.NormalizedIncome = input.Income / 100000f;
};

// Create custom mapping transformer
var customPipeline = mlContext.Transforms.CustomMapping(mapping, contractName: null);
var transformedData = customPipeline.Fit(dataView).Transform(dataView);

var results = mlContext.Data.CreateEnumerable<TransformedData>(
    transformedData, reuseRowObject: false);

foreach (var item in results)
{
    Console.WriteLine($"High Income: {item.IsHighIncome}, " +
        $"Age Group: {item.AgeGroup}, " +
        $"Normalized: {item.NormalizedIncome:F2}");
}

// For saveable custom mappings, use CustomMappingFactory
[CustomMappingFactoryAttribute("IncomeMapper")]
public class IncomeMapper : CustomMappingFactory<InputData, TransformedData>
{
    public override Action<InputData, TransformedData> GetMapping()
    {
        return (input, output) =>
        {
            output.IsHighIncome = input.Income > 50000;
            output.NormalizedIncome = input.Income / 100000f;
        };
    }
}
```

## Data Transformation Pipelines

Chain multiple transformations for feature engineering.
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Collections.Generic;

// Requires: dotnet add package Microsoft.ML.FastTree

var mlContext = new MLContext();

public class RawData
{
    public float Age { get; set; }
    public float Income { get; set; }
    public string Education { get; set; }
    public string Occupation { get; set; }
    public bool Label { get; set; }
}

public class Prediction
{
    public bool PredictedLabel { get; set; }
    public float Probability { get; set; }
}

var data = new List<RawData>
{
    new RawData { Age = 35, Income = 75000, Education = "Bachelor",
        Occupation = "Engineer", Label = true },
    new RawData { Age = 28, Income = 45000, Education = "Master",
        Occupation = "Teacher", Label = false },
    new RawData { Age = 52, Income = 120000, Education = "PhD",
        Occupation = "Manager", Label = true },
};

var dataView = mlContext.Data.LoadFromEnumerable(data);

// Build comprehensive transformation pipeline
var pipeline = mlContext.Transforms
    // Normalize numeric features
    .NormalizeMinMax("NormalizedAge", "Age")
    .Append(mlContext.Transforms.NormalizeMinMax("NormalizedIncome", "Income"))
    // One-hot encode categorical features
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("EducationEncoded", "Education"))
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("OccupationEncoded", "Occupation"))
    // Concatenate all features into single vector
    .Append(mlContext.Transforms.Concatenate("Features",
        "NormalizedAge", "NormalizedIncome", "EducationEncoded", "OccupationEncoded"))
    // Cache for iterative training
    .AppendCacheCheckpoint(mlContext)
    // Add trainer
    .Append(mlContext.BinaryClassification.Trainers.FastTree());

// Train end-to-end model
var model = pipeline.Fit(dataView);

// The model includes all transformations - just pass raw data for predictions
var predictionEngine = mlContext.Model.CreatePredictionEngine<RawData, Prediction>(model);

var newPerson = new RawData
{
    Age = 40, Income = 85000,
    Education = "Master", Occupation = "Engineer"
};

var result = predictionEngine.Predict(newPerson);
Console.WriteLine($"Prediction: {result.PredictedLabel}, Probability: {result.Probability:F3}");
```

## Summary

ML.NET provides a comprehensive framework for building machine learning solutions entirely within the .NET ecosystem. It excels in scenarios requiring integration with existing .NET applications, real-time predictions in web services built on ASP.NET Core, batch processing in data pipelines, and edge deployments where Python dependencies are impractical. Key strengths include its strongly typed pipeline approach, straightforward model serialization, and the ability to consume pre-trained TensorFlow and ONNX models.

Common integration patterns include creating shared projects for training pipelines, deploying trained models as NuGet packages or embedded resources, pooling PredictionEngine instances for high-throughput web APIs, and leveraging ML.NET's AutoML capabilities for automated model selection. The framework supports both traditional machine learning tasks (classification, regression, clustering, recommendation) and specialized scenarios such as time series forecasting and anomaly detection. For production deployments, models can be serialized to .zip files and loaded in any .NET application, enabling a clean separation between training environments and inference services.
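The PredictionEngine pooling pattern mentioned above is provided by the `Microsoft.Extensions.ML` package for ASP.NET Core. The following is a minimal sketch, not a definitive implementation: it assumes a model already saved to `model.zip`, and `InputData`/`OutputPrediction` stand in for your own input and prediction classes (such as those in the saving/loading example); the model name `"MyModel"` and route are arbitrary.

```csharp
// Sketch: requires dotnet add package Microsoft.Extensions.ML
// and a trained model saved at "model.zip".
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.ML;

var builder = WebApplication.CreateBuilder(args);

// Register a thread-safe pool of prediction engines;
// watchForChanges reloads the pool when the model file is updated.
builder.Services.AddPredictionEnginePool<InputData, OutputPrediction>()
    .FromFile(modelName: "MyModel", filePath: "model.zip", watchForChanges: true);

var app = builder.Build();

// The pool is injected per request; Predict borrows an engine,
// runs the single-row prediction, and returns the engine to the pool.
app.MapPost("/predict",
    (PredictionEnginePool<InputData, OutputPrediction> pool, InputData input) =>
        pool.Predict(modelName: "MyModel", example: input));

app.Run();
```

Unlike a raw PredictionEngine, the pool is safe to share across concurrent requests, which is why it is the recommended pattern for web APIs.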