ML.NET: Machine Learning Library from Microsoft

Today I would like to share my thoughts with you about a new machine learning library from Microsoft.

I came across this library a few weeks ago. After reading the first few pages of the documentation, I was amazed. The effort to create the library must have been tremendous. I am not a data scientist, but I am interested in maths and statistics, and I feel that ML.NET is likely to be a pretty good tool in my software toolbox. So let us dive together into this brand-new library from Microsoft.

The library's goal is to provide an open-source, cross-platform machine learning framework that lets you write your code in a .NET environment.

Last year, I devoted some time to studying machine learning in Python to make myself familiar with the basics. It was more about getting the gist than a serious in-depth study. So I knew what machine learning looked like in the Python environment, and I was keen to find out how it could be done in C#.

As the documentation says:

ML.NET has been designed as an extensible platform. Therefore you can consume other popular ML frameworks (TensorFlow, ONNX, Infer.NET, and more).

This means you have access to even more machine learning scenarios, such as image classification and object detection, beyond the set of models that ML.NET provides itself.

To be honest, I love Microsoft because they are usually super developer-friendly. In the past, they made a few bad decisions, like creating Silverlight (and other technologies) only to kill it a few years later. And in the era of Steve Ballmer, it was not the type of company you would look up to. Fortunately, that is in the past. Nowadays, they are exciting to follow, whether you look at their services, open-source software, or development tools. ML.NET is no exception; it is an excellent piece of software with detailed documentation. On top of that, there are also tools for complete beginners, as I will show later in this blog post.

The rest of this blog post describes what we can accomplish with the library. Because machine learning has crucial steps you cannot avoid, let's look at what ML.NET offers for each of them.

Preparing data

First of all, you need to prepare your data. Machine learning depends heavily on correct data, and good data is critical to building an accurate model. I have seen many data scientists "complaining" about the laborious cleaning and adjusting of their raw data. Having prepared some models in the past, I have to say they are dead right.

In ML.NET, this step is called data transformation.

As Microsoft puts it:

Data transformations are used to:

  • prepare data for model training
  • apply an imported model in TensorFlow or ONNX format
  • post-process data after it has been passed through a model

Microsoft therefore provides a set of functions to tackle this. On top of that, you can use the full power of LINQ or other C# libraries for data processing. I believe that, for seasoned C# developers, this kind of work can be done pretty easily and elegantly.
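To illustrate the LINQ side of this, here is a minimal sketch in plain C# (no ML.NET needed) that drops incomplete rows, normalizes the text and removes duplicates before anything reaches a model. The RawReview record, the DataCleaning class and the cleaning rules are hypothetical, chosen only for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var raw = new List<RawReview>
{
    new("  Great product ", 5),
    new("great product", 5),
    new("", 3),
    new("Terrible support", null),
    new("Works as expected", 4),
};

foreach (var r in DataCleaning.Clean(raw))
    Console.WriteLine($"{r.Text} -> {r.Rating}");

// Hypothetical raw row: free text plus an optional star rating.
public record RawReview(string Text, int? Rating);

public static class DataCleaning
{
    // Drops rows with empty text or a missing rating, trims and lower-cases the text,
    // and keeps only the first occurrence of each distinct (normalized) text.
    public static List<RawReview> Clean(IEnumerable<RawReview> rows) =>
        rows
            .Where(r => !string.IsNullOrWhiteSpace(r.Text) && r.Rating.HasValue)
            .Select(r => r with { Text = r.Text.Trim().ToLowerInvariant() })
            .GroupBy(r => r.Text)
            .Select(g => g.First())
            .ToList();
}
```

The cleaned list can then be handed to ML.NET, for example through mlContext.Data.LoadFromEnumerable.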

You can learn about the supported data transformations in a short article. It introduces mapping functions for various data sources, classic normalization and scaling functions, as well as classes for type conversion and text or image transformation. Everything is thoroughly documented.
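A minimal sketch of such a transformation pipeline, assuming only the Microsoft.ML NuGet package. The HouseData class and its values are made up: the pipeline concatenates two numeric columns into a feature vector, normalizes it with NormalizeMinMax, and maps a string category to a key type with MapValueToKey.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

var mlContext = new MLContext(seed: 0);

// Hypothetical in-memory data; in practice it might come from LoadFromTextFile or a database.
var rows = new List<HouseData>
{
    new() { Size = 50f,  Rooms = 2f, Category = "small" },
    new() { Size = 120f, Rooms = 4f, Category = "large" },
    new() { Size = 80f,  Rooms = 3f, Category = "medium" },
};
IDataView data = mlContext.Data.LoadFromEnumerable(rows);

// Each step is an estimator; Append chains them into a single pipeline.
var pipeline = mlContext.Transforms.Concatenate("Features", nameof(HouseData.Size), nameof(HouseData.Rooms))
    .Append(mlContext.Transforms.NormalizeMinMax("Features"))
    .Append(mlContext.Transforms.Conversion.MapValueToKey("CategoryKey", nameof(HouseData.Category)));

// Fit learns the normalization statistics and the key dictionary; Transform applies them.
IDataView transformed = pipeline.Fit(data).Transform(data);

// Read the normalized feature vectors back for inspection.
float[][] features = transformed.GetColumn<float[]>("Features").ToArray();
foreach (var f in features)
    Console.WriteLine(string.Join(", ", f));

public class HouseData
{
    public float Size { get; set; }
    public float Rooms { get; set; }
    public string Category { get; set; } = "";
}
```

Nothing is computed until Fit and Transform are called, which is why the whole chain can be built up front and reused for training and prediction.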

The core of machine learning lies in an algorithm that can create a mathematical model that can be used for predictions. ML.NET does not let you down.

There are many algorithms to take advantage of. There are 12 binary classification trainers, which are helpful whenever we want to decide which of two possible states something falls into, such as whether a comment on Facebook is positive or negative (sentiment analysis) or a diagnosis in a medical environment.

There are also seven multiclass classification trainers, as well as trainers for regression, clustering, anomaly detection, ranking, and even recommendation. I believe this is entirely sufficient for common machine learning tasks.

Moreover, if you are a beginner, you might wish to read the article on how to choose the right algorithm for your task. For instance, you will easily find out that "Averaged perceptron" is good for text classification. Besides, each entry links to the class that implements the algorithm, together with an example. As a result, it is easy to dive deep into the topic without leaving the documentation.
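As a sketch of that advice in practice, here is a tiny sentiment classifier that featurizes text with FeaturizeText and trains the averaged perceptron binary trainer. It assumes the Microsoft.ML NuGet package; the Comment and CommentPrediction classes and the six training sentences are hypothetical, and a real model would need far more data.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

var mlContext = new MLContext(seed: 1);

// Tiny, made-up training set (true = positive comment).
var samples = new List<Comment>
{
    new() { Text = "I love this, great job", IsPositive = true },
    new() { Text = "Awesome and very helpful", IsPositive = true },
    new() { Text = "Great post, thanks", IsPositive = true },
    new() { Text = "This is terrible and useless", IsPositive = false },
    new() { Text = "Awful, I hate it", IsPositive = false },
    new() { Text = "Useless and boring", IsPositive = false },
};
IDataView trainData = mlContext.Data.LoadFromEnumerable(samples);

// Turn raw text into a numeric feature vector, then train a linear binary classifier.
var pipeline = mlContext.Transforms.Text.FeaturizeText("Features", nameof(Comment.Text))
    .Append(mlContext.BinaryClassification.Trainers.AveragedPerceptron(
        labelColumnName: nameof(Comment.IsPositive), featureColumnName: "Features"));

ITransformer model = pipeline.Fit(trainData);

// A prediction engine maps one input object to one output object.
var engine = mlContext.Model.CreatePredictionEngine<Comment, CommentPrediction>(model);
CommentPrediction p = engine.Predict(new Comment { Text = "great and helpful" });
Console.WriteLine($"Positive: {p.PredictedLabel} (score {p.Score:F2})");

public class Comment
{
    public string Text { get; set; } = "";
    public bool IsPositive { get; set; }
}

public class CommentPrediction
{
    // Output columns produced by the binary classification trainer.
    public bool PredictedLabel { get; set; }
    public float Score { get; set; }
}
```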

I would recommend that you take some time to make yourself familiar with the different types of algorithms. Just read and try to understand what each algorithm is good for, so that you get a high-level overview of the tasks ML.NET is able to solve.

It is worth adding that the ML.NET documentation is nicely structured. For example, the page about SdcaRegressionTrainer shows a simple table telling you whether the algorithm requires normalization, whether it needs an additional NuGet package, and what type of algorithm it is, followed by a basic introduction to the algorithm. Microsoft has done great work here.
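To see how such a table translates into code, here is a minimal sketch that normalizes the features before training an SDCA regression model and then evaluates it. It assumes the Microsoft.ML NuGet package; the Flat class and the synthetic linear data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

var mlContext = new MLContext(seed: 0);

// Synthetic data roughly following Price = 2 * Size + 10.
var rows = new List<Flat>();
for (int i = 1; i <= 50; i++)
    rows.Add(new Flat { Size = 20f + i, Price = 2f * (20f + i) + 10f });
IDataView data = mlContext.Data.LoadFromEnumerable(rows);

// SDCA is sensitive to feature scale, so normalize the feature vector before the trainer.
var pipeline = mlContext.Transforms.Concatenate("Features", nameof(Flat.Size))
    .Append(mlContext.Transforms.NormalizeMinMax("Features"))
    .Append(mlContext.Regression.Trainers.Sdca(
        labelColumnName: nameof(Flat.Price), featureColumnName: "Features"));

ITransformer model = pipeline.Fit(data);

// Evaluated on the training data only to keep the sketch short; use a held-out split in practice.
var metrics = mlContext.Regression.Evaluate(model.Transform(data), labelColumnName: nameof(Flat.Price));
Console.WriteLine($"R^2: {metrics.RSquared:F3}, RMSE: {metrics.RootMeanSquaredError:F3}");

public class Flat
{
    public float Size { get; set; }
    public float Price { get; set; }
}
```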

Another important aspect is consuming your trained model.

Training and polishing your model may be really time-consuming, both for you and for your computer. Moreover, it is important to realize that a trained model lives in memory and is accessible only during the application's lifetime. Once the application stops running, the model is lost unless it has been saved somewhere, locally or remotely.

ML.NET consumes models through prediction engines. An engine takes your model and returns a prediction for the given input.

There are several approaches to consuming a model. ML.NET provides supporting infrastructure classes and functions you can use for storing and deploying models.

We could divide this infrastructure into two kinds:

Low-level API
ML.NET provides a low-level API that lets you quickly save your serialized model to disk and load it back. Thus you can train your model on your local computer and then simply deploy it to your favorite web or desktop application, where you load the model manually into a prediction engine and start using it.

To learn about the low-level API, please refer to this tutorial.
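A minimal sketch of that save/load round trip, assuming the Microsoft.ML NuGet package. The Input and Output classes, the toy data and the file name model.zip are hypothetical; a trivial model is trained first only so the sketch runs on its own.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

var mlContext = new MLContext(seed: 0);

// Train a trivial model so the sketch is self-contained (hypothetical data).
var rows = new List<Input>();
for (int i = 0; i < 20; i++)
    rows.Add(new Input { X = i, Label = 3f * i + 1f });
IDataView data = mlContext.Data.LoadFromEnumerable(rows);

var pipeline = mlContext.Transforms.Concatenate("Features", nameof(Input.X))
    .Append(mlContext.Transforms.NormalizeMinMax("Features"))
    .Append(mlContext.Regression.Trainers.Sdca());
ITransformer trained = pipeline.Fit(data);

// Serialize the model together with its input schema to a .zip file.
const string modelPath = "model.zip";
mlContext.Model.Save(trained, data.Schema, modelPath);

// Later, possibly in another application: load the model and wrap it in a prediction engine.
ITransformer loaded = mlContext.Model.Load(modelPath, out DataViewSchema inputSchema);

// Note: a PredictionEngine is not thread-safe, so do not share one instance across threads.
var engine = mlContext.Model.CreatePredictionEngine<Input, Output>(loaded);
Output result = engine.Predict(new Input { X = 5f });
Console.WriteLine($"Predicted: {result.Score:F2}");

public class Input
{
    public float X { get; set; }
    public float Label { get; set; }
}

public class Output
{
    // Regression trainers write their prediction to the "Score" column.
    public float Score { get; set; }
}
```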

Pooling API
However, the approach mentioned above is a little naive, and it is neither suitable nor recommended for production. ML.NET offers a more mature approach.

Creating an engine takes time and resources, and a single PredictionEngine is not thread-safe. For these reasons, the right approach is to reuse engine instances from a pool, just as we do with threads (ThreadPool) or database connections (connection pooling).

ML.NET offers a robust yet simple infrastructure for this. The PredictionEnginePool service (from the Microsoft.Extensions.ML package) manages a pool of engines and provides a mechanism to reload an updated model easily, without taking your application down.

You will understand it better after watching the official Microsoft tutorial.
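Here is a minimal console sketch of the pooling approach, assuming the Microsoft.ML, Microsoft.Extensions.ML and Microsoft.Extensions.DependencyInjection packages. The model name "PriceModel", the file model.zip and the Input/Output classes are hypothetical; a trivial model is saved first so the sketch runs on its own. In an ASP.NET Core app, the same AddPredictionEnginePool registration would normally go into Program.cs and the pool would be injected into controllers.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.ML;
using Microsoft.ML;

// 1) Produce a model file so the sketch is self-contained (hypothetical data and file name).
var mlContext = new MLContext(seed: 0);
var rows = new List<Input>();
for (int i = 0; i < 20; i++)
    rows.Add(new Input { X = i, Label = 3f * i + 1f });
IDataView data = mlContext.Data.LoadFromEnumerable(rows);
ITransformer trained = mlContext.Transforms.Concatenate("Features", nameof(Input.X))
    .Append(mlContext.Transforms.NormalizeMinMax("Features"))
    .Append(mlContext.Regression.Trainers.Sdca())
    .Fit(data);
mlContext.Model.Save(trained, data.Schema, "model.zip");

// 2) Register a pool of prediction engines for one named model.
//    watchForChanges: true reloads the model when the file is replaced, without a restart.
var services = new ServiceCollection();
services.AddPredictionEnginePool<Input, Output>()
    .FromFile(modelName: "PriceModel", filePath: "model.zip", watchForChanges: true);

using ServiceProvider provider = services.BuildServiceProvider();
var pool = provider.GetRequiredService<PredictionEnginePool<Input, Output>>();

// 3) Predict takes an engine from the pool and returns it afterwards,
//    so the pool can be used from many threads, unlike a single PredictionEngine.
Output result = pool.Predict("PriceModel", new Input { X = 5f });
Console.WriteLine($"Predicted: {result.Score:F2}");

public class Input
{
    public float X { get; set; }
    public float Label { get; set; }
}

public class Output
{
    public float Score { get; set; }
}
```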

What I find super exciting is that you can use pre-trained models, such as TensorFlow models, to empower your application.

In other words, someone can train a neural network in Python, and you can easily use it in C#. Microsoft even prepared helper classes to support this. Personally, I find TensorFlow a bit of Greek, but with ML.NET's help, consuming these models seems easier than you might expect.

There are two ways to use these models.

A) Run a pre-trained TensorFlow model
This is the more straightforward option. You can load a frozen TensorFlow model (a serialized model on disk) and make predictions with it from your C# code. Look at the following example.
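A rough sketch of scoring a frozen graph with ML.NET, assuming the Microsoft.ML, Microsoft.ML.TensorFlow and SciSharp.TensorFlow.Redist packages. The file name frozen_model.pb, the tensor names "input" and "output", and the input size of 4 are all hypothetical; you would replace them with the names and shapes of your own model.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

// Hypothetical frozen graph; inspect your model to find its real tensor names.
var tfModel = mlContext.Model.LoadTensorFlowModel("frozen_model.pb");

// Feed the "input" column into the graph and read back the "output" tensor.
// addBatchDimensionInput: true assumes the graph expects a leading batch dimension.
var pipeline = tfModel.ScoreTensorFlowModel(
    outputColumnName: "output", inputColumnName: "input", addBatchDimensionInput: true);

// Nothing is trained here; fitting on an empty view just produces a transformer for this schema.
IDataView empty = mlContext.Data.LoadFromEnumerable(new List<TfInput>());
ITransformer model = pipeline.Fit(empty);

var engine = mlContext.Model.CreatePredictionEngine<TfInput, TfOutput>(model);
TfOutput prediction = engine.Predict(new TfInput { Features = new float[] { 0.1f, 0.2f, 0.3f, 0.4f } });
Console.WriteLine(string.Join(", ", prediction.Scores));

public class TfInput
{
    // Column name must match the graph's input tensor; the size 4 is an assumption.
    [ColumnName("input"), VectorType(4)]
    public float[] Features { get; set; } = Array.Empty<float>();
}

public class TfOutput
{
    // Column name must match the graph's output tensor.
    [ColumnName("output")]
    public float[] Scores { get; set; } = Array.Empty<float>();
}
```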

B) Transfer learning
Transfer learning is the process of taking the knowledge gained while solving one problem and applying it to a different but related problem.

You reuse part of an already trained TensorFlow model and, on top of it, add other functionality or a new algorithm. This approach is very powerful, but also much more complicated than just running a pre-trained TensorFlow model. Watch this nice example.
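A sketch of transfer learning in the style of Microsoft's Inception-based image classification tutorial, assuming the Microsoft.ML, Microsoft.ML.ImageAnalytics, Microsoft.ML.TensorFlow and SciSharp.TensorFlow.Redist packages. The folder and file paths are hypothetical, the image size and mean follow the tutorial's Inception settings, and the pre-trained network is used as a fixed feature extractor while only a new classifier is trained on top.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 1);

// Hypothetical locations: an image folder, a TSV of "file<TAB>label" rows, and the Inception graph.
const string imagesFolder = "assets/images";
const string trainTsv = "assets/images/tags.tsv";
const string inceptionModel = "assets/inception/tensorflow_inception_graph.pb";

IDataView trainData = mlContext.Data.LoadFromTextFile<ImageData>(trainTsv, hasHeader: false);

var pipeline = mlContext.Transforms.LoadImages("input", imagesFolder, nameof(ImageData.ImagePath))
    // Inception settings from the tutorial: 224x224 images, pixel values offset by a mean of 117.
    .Append(mlContext.Transforms.ResizeImages("input", imageWidth: 224, imageHeight: 224, inputColumnName: "input"))
    .Append(mlContext.Transforms.ExtractPixels("input", interleavePixelColors: true, offsetImage: 117))
    // Reuse the pre-trained network: read the layer just before its final softmax.
    .Append(mlContext.Model.LoadTensorFlowModel(inceptionModel)
        .ScoreTensorFlowModel(outputColumnNames: new[] { "softmax2_pre_activation" },
                              inputColumnNames: new[] { "input" }, addBatchDimensionInput: true))
    // Train only a new, small classifier on top of those features.
    .Append(mlContext.Transforms.Conversion.MapValueToKey("LabelKey", nameof(ImageData.Label)))
    .Append(mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(
        labelColumnName: "LabelKey", featureColumnName: "softmax2_pre_activation"))
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabelValue", "PredictedLabel"));

ITransformer model = pipeline.Fit(trainData);

// Scoring the training set only to keep the sketch short.
var predictions = mlContext.Data.CreateEnumerable<ImagePrediction>(model.Transform(trainData), reuseRowObject: false);
foreach (var p in predictions.Take(5))
    Console.WriteLine($"{p.ImagePath}: {p.PredictedLabelValue} ({p.Score.Max():P1})");

public class ImageData
{
    [LoadColumn(0)] public string ImagePath { get; set; } = "";
    [LoadColumn(1)] public string Label { get; set; } = "";
}

public class ImagePrediction
{
    public string ImagePath { get; set; } = "";
    public string PredictedLabelValue { get; set; } = "";
    public float[] Score { get; set; } = Array.Empty<float>();
}
```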

There is also a dark side to the project. ML.NET is different from other "normal" software development libraries, so be prepared for some terms and techniques that are not idiomatic .NET. The dynamic nature of Python occasionally leaks into your code; in fact, it happens far more often than I expected.

For instance (taken from this tutorial):

.Append(mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(
    labelColumnName: "LabelKey", featureColumnName: "softmax2_pre_activation"))

From my point of view, this looks much closer to Python code than to C#: columns are referenced by string names such as "LabelKey", so a typo is caught only at runtime, not by the compiler. I had difficulties adapting to Python when I learned basic machine learning techniques, and this code reminds me of those dark times :) Still, the friendly C# environment is calming, even when you have to cope with the LbfgsMaximumEntropy trainer and its arguments :)

1) Visual Studio extension

If you are an absolute beginner, you can use the machine learning extension for Visual Studio (ML.NET Model Builder). Feel free to download it here.

It is beneficial for people with absolutely no experience in machine learning. I see it as an educational or introductory tool. I think after one or two projects, you will forget about its existence, or you might just use it as a bootstrapping tool.

2) AutoML

Another great feature is AutoML, a mechanism that advises you on which algorithm you can or should use.

It iterates over many candidates to find the best-performing combination of algorithms and features. It is like machine learning for machine learning :)
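A minimal sketch of running an AutoML experiment from code, assuming the Microsoft.ML and Microsoft.ML.AutoML packages. The Row class and the synthetic data (label is true when A + B > 1) are hypothetical, and the 30-second budget is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.AutoML;

var mlContext = new MLContext(seed: 0);

// Synthetic data: two numeric features and a boolean label.
var rows = new List<Row>();
var rng = new Random(42);
for (int i = 0; i < 200; i++)
{
    float a = (float)rng.NextDouble(), b = (float)rng.NextDouble();
    rows.Add(new Row { A = a, B = b, Label = a + b > 1f });
}
IDataView data = mlContext.Data.LoadFromEnumerable(rows);
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

// Let AutoML try different trainers and settings for up to 30 seconds.
var experiment = mlContext.Auto().CreateBinaryClassificationExperiment(maxExperimentTimeInSeconds: 30);
var result = experiment.Execute(split.TrainSet, split.TestSet, labelColumnName: nameof(Row.Label));

// The best run tells you which trainer won and how it scored on the validation set.
Console.WriteLine($"Best trainer: {result.BestRun.TrainerName}");
Console.WriteLine($"Validation accuracy: {result.BestRun.ValidationMetrics.Accuracy:P1}");

public class Row
{
    public float A { get; set; }
    public float B { get; set; }
    public bool Label { get; set; }
}
```

The winning pipeline is also available as result.BestRun.Model, so it can be saved and consumed like any other trained model.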

3) Performance

ML.NET is a truly excellent library. Some authors have even published a white paper on its performance.

You can check it here. It is not extremely technical, but it requires general knowledge of machine learning to understand fully.

4) Documentation

As I have mentioned, the documentation also contains general machine learning tips and tricks, as well as a very gentle introduction to machine learning.

Check out these essential tips here, with links to more detailed articles.
