NumPy Softmax Function in Python

Softmax converts a list of scores into probability-like values that add up to one. It is common in classification models because the largest score becomes the largest probability, while every output remains positive.

NumPy does not provide a single top-level softmax function, but it gives the pieces needed to implement it clearly: exponentials, sums, maximum values, and axis-aware broadcasting. The official numpy.exp documentation covers exponentials, numpy.sum covers reduction, and scipy.special.softmax provides a ready-made reference implementation.

Related PythonPool guides cover NumPy log, NumPy divide, NumPy magnitude, and Python sum.

Basic Softmax Formula

The direct formula exponentiates each score and divides by the total of all exponentials.

import numpy as np

scores = np.array([1.0, 2.0, 3.0])
exp_scores = np.exp(scores)
probabilities = exp_scores / np.sum(exp_scores)

print(probabilities)
print(probabilities.sum())

The output values are positive and add up to one. Higher input scores receive larger output values.

This version is fine for small examples, but it can overflow when scores are large. A stable implementation is better for real numeric work.
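To see the overflow concretely, the sketch below applies the direct formula to large scores. The scores are illustrative, and np.errstate is used only to silence the runtime warnings NumPy would otherwise emit.

```python
import numpy as np

# Illustrative large scores; float64 exp() overflows for inputs above roughly 709.
scores = np.array([1000.0, 1001.0, 1002.0])

# Silence the overflow and invalid-value warnings so the result is easy to read.
with np.errstate(over="ignore", invalid="ignore"):
    exp_scores = np.exp(scores)                            # every entry overflows to inf
    naive_probabilities = exp_scores / np.sum(exp_scores)  # inf / inf gives nan

print(exp_scores)
print(naive_probabilities)
```

Every output is nan, so the direct formula gives no usable probabilities for these scores.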

Use A Numerically Stable Softmax

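Subtract the maximum score before exponentiating. Shifting every score by the same constant multiplies the numerator and the denominator by the same factor, which cancels, so the final probabilities do not change. The exponentials, however, stay small.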

import numpy as np

scores = np.array([1000.0, 1001.0, 1002.0])
shifted = scores - np.max(scores)
exp_scores = np.exp(shifted)
probabilities = exp_scores / np.sum(exp_scores)

print(probabilities)

The largest shifted score becomes zero, so its exponential is 1 instead of an enormous number. Every other shifted score is negative, so all exponentials fall between 0 and 1 and cannot overflow.

Use this stable pattern as the default whenever you write softmax yourself.
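As a quick sanity check, the sketch below compares the direct formula with the shifted one on scores small enough that neither overflows. The variable names are illustrative.

```python
import numpy as np

scores = np.array([1.0, 2.0, 3.0])

# Direct formula: safe here only because the scores are small.
direct = np.exp(scores) / np.sum(np.exp(scores))

# Stable formula: shift so the largest score becomes zero.
shifted = scores - np.max(scores)
stable = np.exp(shifted) / np.sum(np.exp(shifted))

print(shifted)
print(np.allclose(direct, stable))
```

The two results agree to floating-point precision, which is what the cancellation argument predicts.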

Wrap Softmax In A Function

A helper function keeps the formula reusable and makes axis handling easier to add later.

import numpy as np

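def softmax(values):
    """Return the stable softmax of a one-dimensional sequence of scores."""
    values = np.asarray(values, dtype=float)
    # Shift so the largest score is zero; this prevents overflow in exp().
    shifted = values - np.max(values)
    exp_values = np.exp(shifted)
    return exp_values / np.sum(exp_values)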

print(softmax([2.0, 1.0, 0.1]))

np.asarray() accepts lists, tuples, and arrays. The float conversion ensures integer input is handled as floating point and makes the output dtype predictable.

For one-dimensional scores, this helper is usually enough.

Apply Softmax Row By Row

Model outputs often arrive as a two-dimensional array, with one row per example and one column per class. Use axis=1 to normalize each row.

import numpy as np

logits = np.array([
    [2.0, 1.0, 0.1],
    [0.5, 1.5, 3.0],
])

shifted = logits - np.max(logits, axis=1, keepdims=True)
exp_values = np.exp(shifted)
probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)

print(probabilities)
print(probabilities.sum(axis=1))

keepdims=True preserves the two-dimensional shape during subtraction and division, so broadcasting lines up with each row.

Always choose the axis intentionally. Normalizing across the wrong axis still produces values that sum to one, but down each column instead of across each row.
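One way to make the axis choice explicit is to fold it into a helper. The sketch below is a minimal version under that assumption; the name softmax_along_axis and its default of axis=-1 are illustrative choices, not part of NumPy.

```python
import numpy as np

def softmax_along_axis(values, axis=-1):
    """Stable softmax along `axis` (hypothetical helper, not a NumPy function)."""
    values = np.asarray(values, dtype=float)
    # keepdims=True keeps the reduced axis so broadcasting lines up.
    shifted = values - np.max(values, axis=axis, keepdims=True)
    exp_values = np.exp(shifted)
    return exp_values / np.sum(exp_values, axis=axis, keepdims=True)

logits = np.array([
    [2.0, 1.0, 0.1],
    [0.5, 1.5, 3.0],
])

per_row = softmax_along_axis(logits, axis=1)     # one distribution per example
per_column = softmax_along_axis(logits, axis=0)  # normalizes across examples instead

print(per_row.sum(axis=1))     # each row sums to one
print(per_column.sum(axis=0))  # each column sums to one
print(per_column.sum(axis=1))  # the rows no longer sum to one
```

Both calls run without error, which is why a wrong axis is easy to miss: only the direction of the sums reveals it.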

Compare With SciPy Softmax

If SciPy is already available, scipy.special.softmax() can calculate softmax directly.

import numpy as np
from scipy.special import softmax

logits = np.array([[2.0, 1.0, 0.1], [0.5, 1.5, 3.0]])
probabilities = softmax(logits, axis=1)

print(probabilities)
print(probabilities.sum(axis=1))

This is concise and handles the stable calculation internally. It is a good reference when checking a custom NumPy implementation.

If your project only uses NumPy, the manual stable formula avoids adding a new dependency.
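A cross-check between the two approaches might look like the sketch below. It treats SciPy as optional and skips the comparison when the import fails; the alias scipy_softmax is just an illustrative name.

```python
import numpy as np

try:
    from scipy.special import softmax as scipy_softmax
except ImportError:
    # SciPy is optional for this check.
    scipy_softmax = None

logits = np.array([[2.0, 1.0, 0.1], [0.5, 1.5, 3.0]])

# Manual stable softmax, normalized row by row.
shifted = logits - np.max(logits, axis=1, keepdims=True)
manual = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)

if scipy_softmax is not None:
    print(np.allclose(manual, scipy_softmax(logits, axis=1)))
else:
    print("SciPy not installed; skipping comparison")
```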

Choose The Predicted Class

After softmax, argmax() returns the index of the largest probability. This is a common final step in classification demos.

import numpy as np

labels = np.array(["cat", "dog", "horse"])
probabilities = np.array([0.14, 0.79, 0.07])

index = np.argmax(probabilities)

print(labels[index])
print(probabilities[index])

Softmax output is useful for ranking classes, but it should not be treated as perfect confidence. Calibration depends on the model and training process.

For reporting, show both the predicted label and the probability so readers can see how decisive the result is.
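For a batch of examples, the same idea can be applied per row. The sketch below uses made-up probabilities and reports each predicted label alongside its probability.

```python
import numpy as np

labels = np.array(["cat", "dog", "horse"])

# Illustrative probabilities, one row per example.
probabilities = np.array([
    [0.14, 0.79, 0.07],
    [0.40, 0.35, 0.25],
])

# Index of the largest probability in each row.
indices = np.argmax(probabilities, axis=1)

# Pick the winning probability from each row with integer indexing.
top_probabilities = probabilities[np.arange(len(probabilities)), indices]

for label, probability in zip(labels[indices], top_probabilities):
    print(f"{label}: {probability:.2f}")
```

The second row shows why the probability is worth reporting: the winning class leads by only a small margin.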

Softmax Versus Normalization

Softmax is not the same as dividing raw values by their sum. Softmax first applies an exponential transformation, which increases the gap between larger and smaller scores. A score that is only a little larger can become much more prominent after exponentiation.

This behavior is useful for classification logits, where the model outputs relative scores rather than already normalized values. If your data already contains counts, percentages, or positive weights, ordinary normalization may be more appropriate than softmax.

Softmax also preserves order: the largest input gets the largest output. It changes the scale, not the ranking. That makes it useful when you need a probability-shaped distribution while keeping the model’s preference order intact.
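The difference is easy to see side by side. The sketch below applies plain sum normalization and softmax to the same illustrative scores, then checks that softmax leaves the ranking unchanged.

```python
import numpy as np

scores = np.array([1.0, 2.0, 3.0])

# Plain normalization: divide raw values by their sum.
plain = scores / np.sum(scores)

# Stable softmax: exponentiate shifted scores, then normalize.
shifted = scores - np.max(scores)
soft = np.exp(shifted) / np.sum(np.exp(shifted))

print(plain)
print(soft)

# The ordering of the inputs and of the softmax outputs is identical.
print(np.array_equal(np.argsort(scores), np.argsort(soft)))
```

Both results sum to one, but softmax assigns a larger share to the top score than plain normalization does.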

Temperature And Sharpness

Some machine-learning workflows divide logits by a temperature before softmax. A temperature below 1 makes the largest class more dominant. A temperature above 1 spreads probability mass more evenly across classes. A temperature of exactly 1 leaves the ordinary softmax unchanged.

Temperature is useful for experimentation and model calibration, but it should be chosen deliberately. Changing it can make predictions look more or less confident without changing the underlying model scores.
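A minimal sketch of temperature scaling follows, assuming one-dimensional logits. The helper name softmax_with_temperature and the temperatures tried are illustrative.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Stable softmax of 1-D logits divided by `temperature` (hypothetical helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = np.asarray(logits, dtype=float) / temperature
    shifted = scaled - np.max(scaled)
    exp_values = np.exp(shifted)
    return exp_values / np.sum(exp_values)

logits = [2.0, 1.0, 0.1]

# Lower temperatures sharpen the distribution; higher ones flatten it.
for temperature in (0.5, 1.0, 2.0):
    print(temperature, softmax_with_temperature(logits, temperature))
```

The logits are the same in every call; only the apparent confidence of the output changes.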

Common Softmax Mistakes

Do not exponentiate large scores without subtracting the maximum first. On overflow np.exp returns inf, and dividing inf by inf in the normalization step yields nan.

Do not forget the axis for batch arrays. A one-dimensional helper still runs on a two-dimensional array, but without an axis argument it normalizes over every element at once, so individual rows no longer sum to one.

Do not round probabilities before checking that they sum to one. Rounding can make a correct result look slightly off.
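The rounding effect can be reproduced with three equal scores, as in the sketch below. The scores and the two-decimal rounding are illustrative.

```python
import numpy as np

scores = np.array([1.0, 1.0, 1.0])

# Stable softmax; each probability is one third.
exp_values = np.exp(scores - np.max(scores))
probabilities = exp_values / np.sum(exp_values)

# Round for display only, after any checks.
rounded = np.round(probabilities, 2)

print(np.isclose(probabilities.sum(), 1.0))
print(rounded.sum())
```

The unrounded values pass the sum check, while the rounded ones total about 0.99.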

The practical default is a stable NumPy implementation with keepdims=True for arrays, or SciPy’s softmax() when SciPy is already part of the project.