The goal of normalization is to transform features to be on a similar scale. This improves the performance and training stability of the model.
Normalization Techniques at a Glance
Four common normalization techniques may be useful:
- scaling to a range
- clipping
- log scaling
- z-score
The following charts show the effect of each normalization technique on the distribution of the raw feature (price) on the left. The charts are based on the data set from the 1985 Ward's Automotive Yearbook that is part of the UCI Machine Learning Repository under Automobile Data Set.
Figure 1. Summary of normalization techniques.
Scaling to a range
Recall from MLCC that scaling means converting floating-point feature values from their natural range (for example, 100 to 900) into a standard range, usually 0 to 1 (or sometimes -1 to +1). Use the following simple formula to scale to a range:
\[ x' = (x - x_{min}) / (x_{max} - x_{min}) \]
Scaling to a range is a good choice when both of the following conditions are met:
- You know the approximate upper and lower bounds on your data with few or no outliers.
- Your data is approximately uniformly distributed across that range.
A good example is age. Most age values fall between 0 and 90, and every part of the range has a substantial number of people.
In contrast, you would not use scaling on income, because only a few people have very high incomes. The upper bound of the linear scale for income would be very high, and most people would be squeezed into a small part of the scale.
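The scaling formula above can be sketched in a few lines of NumPy. This is a minimal illustration; the `scale_to_range` helper name and the sample age values are my own, not from the source.

```python
import numpy as np

def scale_to_range(x: np.ndarray) -> np.ndarray:
    """Linearly rescale values into [0, 1] using the observed min and max."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

# Illustrative age values: the minimum maps to 0.0, the maximum to 1.0.
ages = np.array([2.0, 18.0, 45.0, 90.0])
print(scale_to_range(ages))
```

Note that this uses the observed min and max of the sample; in practice you would compute those bounds from the training set and reuse them at serving time.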
Feature Clipping
If your data set contains extreme outliers, you might try feature clipping, which caps all feature values above (or below) a certain value to a fixed value. For example, you could clip all temperature values above 40 to be exactly 40.
You may apply feature clipping before or after other normalizations.
Formula: if x > max, then x' = max; if x < min, then x' = min.
Figure 2. Comparing a raw distribution and its clipped version.
Another simple clipping strategy is to clip by z-score to ±Nσ (for example, limit to ±3σ), where σ is the standard deviation.
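Both clipping strategies can be sketched with `np.clip`. This is a minimal illustration; the helper names and the sample temperature values are assumptions, not from the source.

```python
import numpy as np

def clip_by_value(x: np.ndarray, min_val, max_val) -> np.ndarray:
    """Cap values at fixed bounds; pass None to leave a side unbounded."""
    return np.clip(x, min_val, max_val)

def clip_by_zscore(x: np.ndarray, n_sigma: float = 3.0) -> np.ndarray:
    """Cap values at mean +/- n_sigma standard deviations."""
    mu, sigma = x.mean(), x.std()
    return np.clip(x, mu - n_sigma * sigma, mu + n_sigma * sigma)

# Illustrative temperatures: 47 gets capped to exactly 40.
temps = np.array([12.0, 25.0, 31.0, 47.0])
print(clip_by_value(temps, None, 40.0))
```

Clipping only alters the outlying values themselves; values inside the bounds pass through unchanged, which is why it combines well with other normalizations.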
Log Scaling
Log scaling computes the log of your values to compress a wide range to a narrow range.
\[ x' = \log(x) \]
Log scaling is helpful when a handful of your values have many points, while most other values have few points. This data distribution is known as the power law distribution. Movie ratings are a good example. In the chart below, most movies have very few ratings (the data in the tail), while a few have lots of ratings (the data in the head). Log scaling changes the distribution, helping to improve linear model performance.
Figure 3. Comparing a raw distribution to its log.
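Log scaling can be sketched as follows. As an assumption on my part, this sketch uses `log1p` (log of 1 + x) rather than a bare `log` so that zero counts do not produce -infinity; the sample rating counts are also illustrative.

```python
import numpy as np

def log_scale(x: np.ndarray) -> np.ndarray:
    """Compress a wide range of non-negative values with a log transform.

    log1p(x) = log(1 + x) maps 0 to 0 and avoids log(0) for zero counts.
    """
    return np.log1p(x)

# Illustrative rating counts spanning six orders of magnitude.
rating_counts = np.array([0.0, 10.0, 1_000.0, 1_000_000.0])
print(log_scale(rating_counts))
```

After the transform, the head of the power-law distribution is pulled in dramatically, so a linear model no longer has to fit values spread across several orders of magnitude.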
Z-Score
Z-score is a variation of scaling that represents the number of standard deviations away from the mean. You would use z-score to ensure your feature distributions have mean = 0 and std = 1. It's useful when there are a few outliers, but not so extreme that you need clipping.
The formula for calculating the z-score of a point, x, is as follows:
\[ x' = (x - \mu) / \sigma \]
Figure 4. Comparing a raw distribution to its z-score distribution.
Notice that z-score squeezes raw values that have a range of ~40000 down into a range from roughly -1 to +4.
Suppose you're not sure whether the outliers truly are extreme. In this case, start with z-score unless you have feature values that you don't want the model to learn; for example, the values are the result of measurement error or a quirk.
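The z-score formula can be sketched directly from its definition. The `z_score` helper name and the sample price values are illustrative, not from the source.

```python
import numpy as np

def z_score(x: np.ndarray) -> np.ndarray:
    """Normalize values to the number of standard deviations from the mean."""
    return (x - x.mean()) / x.std()

# Illustrative prices: after the transform, mean ~= 0 and std ~= 1.
prices = np.array([5_000.0, 8_000.0, 12_000.0, 45_000.0])
z = z_score(prices)
print(z)
```

As with scaling to a range, the mean and standard deviation should be computed once on the training data and then reused, so that the same raw value always maps to the same z-score.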
Summary
| Normalization Technique | Formula | When to Use |
|---|---|---|
| Linear Scaling | $$ x' = (x - x_{min}) / (x_{max} - x_{min}) $$ | When the feature is more-or-less uniformly distributed across a fixed range. |
| Clipping | if x > max, then x' = max; if x < min, then x' = min | When the feature contains some extreme outliers. |
| Log Scaling | $$ x' = \log(x) $$ | When the feature conforms to the power law. |
| Z-score | $$ x' = (x - \mu) / \sigma $$ | When the feature distribution does not contain extreme outliers. |