Standard deviation

A standard deviation (or σ) is a statistical measure of how dispersed the data is in relation to the mean. It is frequently used in machine learning data preparation and in machine learning model training.
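As a small worked example of the definition above, the sketch below computes the standard deviation of a list of numbers with Python's standard `statistics` module. The sample values are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
import statistics

# Hypothetical sample values, chosen for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)

# Population standard deviation: square root of the average
# squared deviation from the mean (divides by n).
sigma = statistics.pstdev(data)

# Sample standard deviation: same idea, but divides by n - 1.
s = statistics.stdev(data)

print(mean, sigma, s)
```

Whether the population or the sample form is appropriate depends on whether the data is the whole population or a sample drawn from it.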

standardization

In machine learning, standardization is a feature engineering technique by which the dataset features are re-scaled to achieve zero mean (μ=0) and unit standard deviation (σ=1). Each x value in the dataset gets a corresponding standardized value x', which is calculated as x' = (x - μ) / σ, where μ is the mean of the x variable and σ is ...
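A minimal sketch of standardization, assuming the usual z-score formula x' = (x - μ) / σ and assuming that σ is the population standard deviation of the feature. The function name `standardize` and the sample feature values are hypothetical.

```python
import statistics

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # assumption: population form (divide by n)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature (sigma is 0)")
    return [(x - mu) / sigma for x in values]

# Hypothetical feature column, for illustration only.
feature = [2, 4, 4, 4, 5, 5, 7, 9]
scaled = standardize(feature)
print(scaled)
```

After the transformation the feature has a mean of 0 and a standard deviation of 1, which is the property the definition describes.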

Stationarity

Stationarity in machine learning is a property of a time series used in forecasting, by which the statistical attributes of the variable, such as mean, variance and covariance, remain constant instead of varying over time.

stemming

Stemming in machine learning and natural language processing is the process of removing the affixes of a word in order to retrieve the word stem. It is a common preprocessing step when training an ML model on text written in a human natural language.
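To make the idea concrete, here is a deliberately naive sketch that strips a few common English suffixes. It is not a real stemming algorithm: the suffix list and the `naive_stem` function are hypothetical, prefixes are ignored, and production stemmers (for example the Porter stemmer) apply many more rules.

```python
# Hypothetical suffix list for illustration; longer suffixes are tried first.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

stems = [naive_stem(w) for w in ["Training", "trained", "trains", "train"]]
print(stems)
```

All four inflected forms map to the same stem, so a model sees them as one token instead of four.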

Stochastic

Stochastic in data science and machine learning refers to a property by which a randomly determined process cannot perfectly estimate individual events or data points but can demonstrate a general pattern common to the entire set of data. In data science and physics, the term stochastic refers to events which occur without a formally set ...

stop word

A stop word in machine learning text processing refers to any word which carries little or no content, such as simple and common words (and, to, so, by, etc.).

Stop word

A stop word is a word in a text document which is very common and is therefore typically removed when the text is processed. Stop words are usually not included in the training set of machine learning models in natural language processing scenarios.
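Stop-word removal can be sketched as a simple filter over the tokens of a text. The stop-word list, the whitespace tokenization and the function name `remove_stop_words` below are assumptions made for illustration; NLP libraries ship much larger, language-specific lists.

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "by", "is", "of", "so", "the", "to"}

def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]

filtered = remove_stop_words(
    "The model is trained by the team and deployed to production"
)
print(filtered)
```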

stratified cross validation

Stratified cross validation is a validation technique used when splitting the ML dataset into k subsets (folds), of which k-1 are used as training folds and one (1) is used as the test fold. This process is repeated k times, so that each fold serves as the test fold once. Stratified cross validation uses stratified sampling in the dataset, in order to ...

stratified k-fold cross-validation

Stratified k-fold cross-validation is a k-fold cross-validation method in which each fold keeps approximately the same class proportions as the full dataset. It is particularly useful for datasets which exhibit class imbalance.
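The sketch below shows one simple way to build stratified folds: group the sample indices by class, then deal each class's indices round-robin across the k folds. The function name `stratified_k_fold` and the labels are hypothetical, and the sketch omits shuffling, which a practical implementation would normally do first.

```python
from collections import defaultdict

def stratified_k_fold(labels, k):
    """Assign each sample index to one of k folds, class by class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the indices of this class round-robin across the folds,
        # so every fold receives a similar share of the class.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

# Hypothetical imbalanced dataset: 8 negative and 4 positive samples.
labels = ["neg"] * 8 + ["pos"] * 4
folds = stratified_k_fold(labels, k=4)

# Each fold is used once as the test fold; the rest form the training set.
splits = []
for test_fold in folds:
    train = [i for fold in folds if fold is not test_fold for i in fold]
    splits.append((train, test_fold))

print(folds)
```

With these labels every fold contains two negative samples and one positive sample, matching the 2:1 ratio of the full dataset.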

stride

Stride in Convolutional Neural Networks (CNN) is the distance, in pixels, by which a filter moves between successive positions as it scans an image during a convolution.
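The effect of stride is easiest to see in one dimension. The sketch below is an assumption-laden simplification: it uses a 1D signal instead of a 2D image, applies no padding, and the function name `conv1d` is hypothetical. With input length n, filter length k and stride s, the output length is floor((n - k) / s) + 1.

```python
def conv1d(signal, kernel, stride=1):
    """Valid (no padding) 1D cross-correlation with a configurable stride."""
    n, k = len(signal), len(kernel)
    out_len = (n - k) // stride + 1
    return [
        # The filter starts at position i * stride, so a larger stride
        # skips more input positions between successive outputs.
        sum(signal[i * stride + j] * kernel[j] for j in range(k))
        for i in range(out_len)
    ]

# Hypothetical input and filter, for illustration only.
signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

out_stride1 = conv1d(signal, kernel, stride=1)
out_stride2 = conv1d(signal, kernel, stride=2)
print(len(out_stride1), len(out_stride2))
```

Doubling the stride roughly halves the number of output positions, which is why larger strides are used to downsample feature maps.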

Strong Authentication

Strong authentication (SA) assumes the usage of multi-factor authentication (MFA) as a baseline, but goes beyond that with other authentication means. Strong authentication employs National Institute of Standards and Technology (NIST) assurance level-2 or assurance level-3. More details about strong authentication can be found at: https://www.yubico.com/resources/glossary/strong-authentication/.

structured data

Data found in data sources (virtual machines, virtual containers, storage accounts, databases, data warehouses, data lakes, data marts and data hubs) can be classified into three (3) major categories with regard to the level of structure they present. Unstructured data, i.e. data which is in a format that makes it difficult to search, filter, or ...