Normalization  |  Machine Learning  |  Google for Developers (2024)

The goal of normalization is to transform features to be on a similarscale. This improves the performance and training stability of the model.

Normalization Techniques at a Glance

Four common normalization techniques may be useful:

  • scaling to a range
  • clipping
  • log scaling
  • z-score

The following charts show the effect of each normalization technique on thedistribution of the raw feature (price) on the left.The charts are based on the data set from 1985 Ward's Automotive Yearbook thatis part of the UCI Machine Learning Repository under Automobile DataSet.

Normalization | Machine Learning | Google for Developers (1)

Figure 1. Summary of normalization techniques.

Scaling to a range

Recall from MLCCthat scalingmeans converting floating-point feature values from their natural range (forexample, 100 to 900) into a standard range—usually 0 and 1 (or sometimes -1 to+1). Use the following simple formula to scale to a range:

\[ x' = (x - x_{min}) / (x_{max} - x_{min}) \]

Scaling to a range is a good choice when both of the following conditions aremet:

  • You know the approximate upper and lower bounds on your data withfew or no outliers.
  • Your data is approximately uniformly distributed across that range.

A good example is age. Most age values falls between 0 and 90, and every part ofthe range has a substantial number of people.

In contrast, you would not use scaling on income, because only a few peoplehave very high incomes. The upper bound of the linear scale for income would bevery high, and most people would be squeezed into a small part of the scale.

Feature Clipping

If your data set contains extreme outliers, you might try featureclipping, which caps all feature values above (or below) a certainvalue to fixed value. For example, you could clip all temperature valuesabove 40 to be exactly 40.

You may apply feature clipping before or after other normalizations.

Formula: Set min/max values to avoid outliers.

Normalization | Machine Learning | Google for Developers (2)

Figure 2. Comparing a raw distribution and its clipped version.

Another simple clipping strategy is to clip by z-score to +-Nσ (for example, limit to+-3σ). Note that σ is the standard deviation.

Log Scaling

Log scaling computes the log of your values to compress a wide range to a narrowrange.

\[ x' = log(x) \]

Log scaling is helpful when a handful of your values have many points, whilemost other values have few points. This data distribution is known as the powerlaw distribution. Movie ratings are a good example. In the chart below, mostmovies have very few ratings (the data in the tail), while a few have lots ofratings (the data in the head). Log scaling changes the distribution, helping toimprove linear model performance.

Normalization | Machine Learning | Google for Developers (3)

Figure 3. Comparing a raw distribution to its log.

Z-Score

Z-score is a variation of scaling that represents the number of standarddeviations away from the mean. You would use z-score to ensure your featuredistributions have mean = 0 and std = 1. It’s useful when there are a fewoutliers, but not so extreme that you need clipping.

The formula for calculating the z-score of a point, x, is as follows:

\[ x' = (x - μ) / σ \]

Normalization | Machine Learning | Google for Developers (4)

Figure 4. Comparing a raw distribution to its z-score distribution.

Notice that z-score squeezes raw values that have a range of ~40000down into a range from roughly -1 to +4.

Suppose you're not sure whether the outliers truly are extreme.In this case, start with z-score unless you have feature values thatyou don't want the model to learn; for example, the values arethe result of measurement error or a quirk.

Summary

Normalization TechniqueFormulaWhen to Use
Linear Scaling $$ x' = (x - x_{min}) / (x_{max} - x_{min}) $$ When the feature is more-or-less uniformly distributed across a fixed range.
Clipping if x > max, then x' = max. if x < min, then x' = min When the feature contains some extreme outliers.
Log Scaling x' = log(x) When the feature conforms to the power law.
Z-score x' = (x - μ) / σ When the feature distribution does not contain extreme outliers.
Normalization  |  Machine Learning  |  Google for Developers (2024)
Top Articles
Short Term Mutual Funds vs Long Term Mutual Funds
How to pay off credit card debt
NOAA: National Oceanic &amp; Atmospheric Administration hiring NOAA Commissioned Officer: Inter-Service Transfer in Spokane Valley, WA | LinkedIn
Chatiw.ib
Dollywood's Smoky Mountain Christmas - Pigeon Forge, TN
Jonathan Freeman : "Double homicide in Rowan County leads to arrest" - Bgrnd Search
Unlocking the Enigmatic Tonicamille: A Journey from Small Town to Social Media Stardom
Grand Park Baseball Tournaments
104 Presidential Ct Lafayette La 70503
Aces Fmc Charting
RBT Exam: What to Expect
7440 Dean Martin Dr Suite 204 Directions
Seattle Rpz
Bowie Tx Craigslist
Guilford County | NCpedia
Nj State Police Private Detective Unit
Burn Ban Map Oklahoma
Craigslist List Albuquerque: Your Ultimate Guide to Buying, Selling, and Finding Everything - First Republic Craigslist
Samantha Lyne Wikipedia
Troy Bilt Mower Carburetor Diagram
3S Bivy Cover 2D Gen
Craigslist Missoula Atv
Beryl forecast to become an 'extremely dangerous' Category 4 hurricane
Lista trofeów | Jedi Upadły Zakon / Fallen Order - Star Wars Jedi Fallen Order - poradnik do gry | GRYOnline.pl
Sussur Bloom locations and uses in Baldur's Gate 3
Fsga Golf
Rimworld Prison Break
Jcp Meevo Com
Random Bibleizer
Temu Seat Covers
By.association.only - Watsonville - Book Online - Prices, Reviews, Photos
What is Software Defined Networking (SDN)? - GeeksforGeeks
Sam's Club Gas Price Hilliard
Earthy Fuel Crossword
Moonrise Time Tonight Near Me
Weekly Math Review Q4 3
Andhra Jyothi Telugu News Paper
When His Eyes Opened Chapter 2048
Hebrew Bible: Torah, Prophets and Writings | My Jewish Learning
Weather Underground Bonita Springs
Froedtert Billing Phone Number
Gifford Christmas Craft Show 2022
Registrar Lls
Best Restaurants Minocqua
Nid Lcms
Weather In Allentown-Bethlehem-Easton Metropolitan Area 10 Days
ACTUALIZACIÓN #8.1.0 DE BATTLEFIELD 2042
Payrollservers.us Webclock
Gabrielle Abbate Obituary
Msatlantathickdream
Koniec veľkorysých plánov. Prestížna LEAF Academy mení adresu, masívny kampus nepostaví
Guidance | GreenStar™ 3 2630 Display
Latest Posts
Article information

Author: Neely Ledner

Last Updated:

Views: 5908

Rating: 4.1 / 5 (62 voted)

Reviews: 93% of readers found this page helpful

Author information

Name: Neely Ledner

Birthday: 1998-06-09

Address: 443 Barrows Terrace, New Jodyberg, CO 57462-5329

Phone: +2433516856029

Job: Central Legal Facilitator

Hobby: Backpacking, Jogging, Magic, Driving, Macrame, Embroidery, Foraging

Introduction: My name is Neely Ledner, I am a bright, determined, beautiful, adventurous, adventurous, spotless, calm person who loves writing and wants to share my knowledge and understanding with you.