Basics

Bootstrapping

There is a story about a man who got stuck in quicksand and would have suffered a horrible death, but managed to pull himself out of the sand by his own bootstraps. In German folklore, Baron von Münchhausen tells a similar story about pulling himself out of a swamp by his own hair. Obviously, all of this is nonsense, but it gave its name to a statistical method, bootstrapping, that seems just as absurd but actually works and is extremely useful in many situations that are otherwise difficult to handle.
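To make the idea concrete, here is a minimal sketch of a percentile bootstrap in plain Python. It resamples the observed data with replacement many times, recomputes a statistic (here the mean) on each resample, and reads a confidence interval off the spread of those estimates. The function name `bootstrap_ci`, the sample values, and the parameter choices (10,000 resamples, a 95% interval) are illustrative assumptions, not taken from the notebook.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` computed on `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Draw resamples of the same size as the data, with replacement,
    # and recompute the statistic on each one.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    # The alpha/2 and 1 - alpha/2 empirical quantiles bound the interval.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements (illustrative numbers only).
sample = [2.1, 3.4, 1.9, 4.2, 2.8, 3.9, 2.5, 3.1, 4.8, 2.2]
low, high = bootstrap_ci(sample)
print(f"mean = {statistics.mean(sample):.2f}, 95% CI approx. [{low:.2f}, {high:.2f}]")
```

No distributional assumption is needed: the spread of the resampled means stands in for the unknown sampling distribution of the mean.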

Data Science - Toxicological Predictions

[Download this notebook](05 - Data Science.ipynb)

In this lesson you'll learn:
- how to train an SVM.
- why it is necessary to scale variables.
- how to train a random forest model.
- about Y-scrambling and why train/test splits are necessary.
- how to split data into a training set and a test set.

Today you will learn some basics of data science and machine learning, which will also be relevant for training neural networks. As an example, we will build models that predict whether a molecule is of toxicological concern.
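Scaling matters for models such as SVMs because they are sensitive to the magnitude of each feature: a descriptor in the hundreds (for example molecular weight) would otherwise dominate one in single digits (for example logP). The plain-Python sketch below, with made-up descriptor values, splits the data first and then estimates the scaling parameters on the training set only, so no information from the test set leaks into training. The helper names are hypothetical; scikit-learn offers `train_test_split` and `StandardScaler` for the same purpose.

```python
import random
import statistics


def train_test_split(X, y, test_size=0.2, seed=42):
    """Shuffle the sample indices and split X and y into training and test sets."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)


def fit_standardizer(X_train):
    """Estimate per-feature mean and standard deviation from the training set only."""
    columns = list(zip(*X_train))
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 for a constant feature to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(X, means, stds):
    """Apply z-score scaling (x - mean) / std using previously fitted parameters."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


# Hypothetical descriptors per molecule: [molecular weight, logP]; label 1 = toxic concern.
X = [[180.2, 1.2], [250.5, 2.8], [320.1, 3.5], [150.3, 0.4], [410.7, 4.9],
     [275.0, 2.1], [198.6, 1.7], [365.2, 4.0], [222.9, 2.4], [300.4, 3.1]]
y = [0, 0, 1, 0, 1, 0, 0, 1, 0, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y)
means, stds = fit_standardizer(X_train)
X_train_scaled = standardize(X_train, means, stds)
X_test_scaled = standardize(X_test, means, stds)  # reuse the training statistics
```

Fitting the scaler on all ten molecules before splitting would let the test set influence the scaling, which makes the final test score slightly optimistic.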

Introduction to Cheminformatics Using RDKit

[Download this notebook](03 - Cheminformatics.ipynb)

In this lesson you'll learn:
- how to read SMILES strings using RDKit.
- how to manipulate and visualize molecules.
- how to calculate molecular descriptors.
- how to calculate the similarity of molecules using fingerprints.

Today's notebook is about using Python for cheminformatics. As a case study, you will look for an alternative to Sorafenib, a kinase inhibitor used to treat advanced kidney cancer as well as certain liver and thyroid cancers.
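Fingerprint similarity is commonly measured with the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number of bits set in either. In RDKit you would compute fingerprints for real molecules (for example Morgan fingerprints) and call `DataStructs.TanimotoSimilarity`; the plain-Python sketch below only illustrates the formula, using hypothetical fingerprints written as sets of on-bit positions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    shared = len(fp_a & fp_b)
    # |A ∩ B| / |A ∪ B|, with |A ∪ B| = |A| + |B| - |A ∩ B|
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints (on-bit positions), not computed from real molecules.
query = {1, 5, 9, 42, 77, 128}
candidate = {1, 5, 9, 42, 300}
print(round(tanimoto(query, candidate), 3))  # 0.571
```

A value of 1.0 means identical fingerprints and 0.0 means no shared bits, so candidates for a Sorafenib alternative could be ranked by their similarity to the query.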

Introduction to Statistics

[Download this notebook](04 - Linear Regression.ipynb)

Today we are going to look at some basics of statistics. Statistics helps us describe and explain data in a simple way. In this lesson you'll learn:
- how to calculate the mean, variance, and standard deviation in Python.
- the difference between regression and classification.
- how linear regression works and what its coefficients mean.
- about the mean squared error (MSE) and the loss function.
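All of these quantities can be computed with Python's standard library alone. This sketch, on a small made-up dataset, computes the sample mean, variance, and standard deviation, fits a simple linear regression `y ≈ slope * x + intercept` by ordinary least squares, and evaluates the fit with the mean squared error, which is exactly the loss that least squares minimizes. The data and variable names are illustrative assumptions.

```python
import statistics

# Made-up data that roughly follows y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mean_x = statistics.mean(x)
var_x = statistics.variance(x)  # sample variance, divides by n - 1
std_x = statistics.stdev(x)     # square root of the sample variance

# Ordinary least squares: slope = cov(x, y) / var(x), line passes through the means.
mean_y = statistics.mean(y)
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

# Mean squared error of the fitted line on the same data.
predictions = [slope * xi + intercept for xi in x]
mse = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions)) / len(y)

print(f"slope={slope:.2f}, intercept={intercept:.2f}, MSE={mse:.4f}")
# slope=1.99, intercept=0.05, MSE=0.0214
```

On Python 3.10+, `statistics.linear_regression(x, y)` returns the same slope and intercept. Note that NumPy's `np.var` divides by `n` by default, so it differs from `statistics.variance` unless you pass `ddof=1`.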

K-Fold Cross Validation

You may already be familiar with the idea of splitting data into training and test data: you train your model only on the training data and then evaluate it on the unseen test data to see how well it handles completely new data. Often there is also a validation set, which the machine learning engineer can use while developing the model (for example, to tune hyperparameters) but which the model never sees during training.
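K-fold cross-validation extends this idea: the data is cut into k folds, each fold serves once as the validation set while the model is trained on the remaining k - 1 folds, and the k scores are then averaged. Below is a minimal plain-Python sketch of the index bookkeeping, assuming no shuffling. The function name `k_fold_indices` is hypothetical; scikit-learn's `KFold` provides similar splitting.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of the k folds, without shuffling."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first n_samples % k folds get one extra sample, so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


for train, validation in k_fold_indices(10, k=3):
    print(f"train on {len(train)} samples, validate on {validation}")
# train on 6 samples, validate on [0, 1, 2, 3]
# train on 7 samples, validate on [4, 5, 6]
# train on 7 samples, validate on [7, 8, 9]
```

Every sample lands in exactly one validation fold, so the averaged score uses all of the data for evaluation without ever testing a model on samples it was trained on.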