Please Remember...

This is a fictional example and not real documentation. The purpose is to demonstrate my technical writing.

Important to Know

  • Quant models don’t prove guilt. They highlight trades that look unusual based on known patterns.
  • Further investigation is always needed. The model is a tool to guide your attention, not make final judgments.


Tutorial: Using the Logistic Regression Formula with Sample Data

How it looks in real life

Scenario

You (as a customer) have sent Company-X a file with post-trade information for analysis. After Company-X applied the logistic regression formula, one trade stood out because its unusual characteristics triggered red flags.

Let’s walk through how this unfolded:

Example trade

You sent Company-X:

  • Trade ID: 123456
  • Trade size: £250,000
  • Time of day: 02:00 (2 AM)
  • Trader: Jo Bloggs

Company-X has enriched the trade with:

  • Trader risk score: 8.5
  • Deviation from typical behaviour: 1.2 standard deviations
  • Frequency: 15 trades in the past hour

Where do the enrichment values come from?

  • Based on historical trading data, Company-X has calculated that trader Jo Bloggs consistently shows risky trading behaviour. As a result, Jo Bloggs has a risk score of 8.5 out of 10, where 10 is the riskiest level.
  • Company-X has also analysed Jo Bloggs’ typical trading patterns. This particular trade sits 1.2 standard deviations away from their usual behaviour. That means it’s somewhat unusual for this trader, but not extremely rare.
  • The frequency value is also calculated internally. Company-X's systems looked at all trades made by Jo Bloggs in the past hour and found 15, which may indicate a trading spike.

Model Setup

Now that Company-X has both the trade information you sent and its own internal enrichments, we can look at how its model processes them.

In Company-X's model:

  • y = 1 means the trade is fraudulent
  • y = 0 means the trade is not fraudulent

The model uses several features (input variables) and combines them using a set of learned weights, called β coefficients. These were derived from training the model on a large dataset of past trading behaviour.
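
As a reminder, the weighted combination described above is usually written in the general form below. The subscripts are notation assumed here for illustration: β_0 stands for the intercept, and each β_i is the coefficient paired with feature value x_i, for n features in total.

\[ P(y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n)}} \]

The expression in the exponent is the model's score (the log-odds of fraud), and the surrounding fraction squeezes that score into a probability between 0 and 1.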

Here’s how each input contributes:

Feature | x-value | β (coefficient)
Intercept (baseline risk) | n/a | -3.0
Trade size (£250k) | 250 | 0.01
Frequency (15 trades/hour) | 15 | 0.08
Time of day (2 AM = 2) | 2 | 0.5
Trader risk score (8.5) | 8.5 | 0.4
Deviation (1.2 std devs) | 1.2 | 1.0
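
To make the table concrete, here is a minimal Python sketch that multiplies each x-value by its β coefficient and adds the intercept. The numbers are copied from the table; the dictionary layout, feature keys, and function names are assumptions made for illustration, not Company-X's actual code.

```python
# Coefficients and feature values copied from the table in this tutorial.
# The data layout and all names below are illustrative assumptions.
INTERCEPT = -3.0

FEATURES = {
    # name: (x-value, beta coefficient)
    "trade_size_gbp_thousands": (250, 0.01),
    "frequency_trades_per_hour": (15, 0.08),
    "time_of_day_hour": (2, 0.5),
    "trader_risk_score": (8.5, 0.4),
    "deviation_std_devs": (1.2, 1.0),
}


def contributions(features):
    """Return each feature's contribution (beta * x) to the model score."""
    return {name: x * beta for name, (x, beta) in features.items()}


def log_odds(intercept, features):
    """Add the intercept to the sum of all feature contributions."""
    return intercept + sum(contributions(features).values())


if __name__ == "__main__":
    for name, value in contributions(FEATURES).items():
        print(f"{name}: {value:.2f}")
    print(f"score (log-odds): {log_odds(INTERCEPT, FEATURES):.2f}")
```

In this example the trader risk score contributes the most (3.4) and the total score is 6.3, which is the value that appears in the exponent of the worked formula.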

How β coefficients are learned

Each β coefficient describes how strongly, and in which direction, its feature moves the fraud score: the larger a positive β, the more each additional unit of that feature pushes the prediction towards fraud. These values come from training the logistic regression model on historical labelled data, where the outcome (fraud or not) is known. The model “learns” these weights by finding the values that best fit the past data.
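
The idea of "learning" weights can be sketched with a toy example. This tutorial does not describe how Company-X actually trains its model, so the code below is only an assumption-laden illustration: it fits one coefficient and an intercept by plain gradient descent on a tiny, made-up set of labelled risk scores. The data, learning rate, and function names are all hypothetical.

```python
import math


def sigmoid(score):
    """Convert a score (log-odds) into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


def log_loss(beta0, beta1, xs, ys):
    """Average log loss: lower means the probabilities fit the labels better."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(beta0 + beta1 * x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def fit(xs, ys, learning_rate=0.05, steps=5000):
    """Fit an intercept and one coefficient by batch gradient descent."""
    beta0, beta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(beta0 + beta1 * x) - y  # prediction minus label
            grad0 += error
            grad1 += error * x
        beta0 -= learning_rate * grad0 / n
        beta1 -= learning_rate * grad1 / n
    return beta0, beta1


# Made-up toy data (NOT real trades): a risk score and a known outcome.
TOY_RISK_SCORES = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
TOY_LABELS = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]  # 1 = fraud, 0 = not fraud

if __name__ == "__main__":
    b0, b1 = fit(TOY_RISK_SCORES, TOY_LABELS)
    print(f"intercept = {b0:.2f}, coefficient = {b1:.2f}")
    print(f"loss before: {log_loss(0.0, 0.0, TOY_RISK_SCORES, TOY_LABELS):.3f}")
    print(f"loss after:  {log_loss(b0, b1, TOY_RISK_SCORES, TOY_LABELS):.3f}")
```

Because higher toy risk scores are labelled as fraud more often, the fitted coefficient comes out positive. A production model would typically be trained with an established library on far more data and with several features at once.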

Scaling

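Trade size is scaled in units of £1,000 to keep the β coefficient manageable, so the £250,000 trade enters the model as x = 250. Using raw pounds instead would require a coefficient of 0.00001 to have the same effect.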

Intercept (baseline risk)

The intercept reflects the model's starting point: the score before any feature contributions are added, that is, when every feature value is zero.
On its own, a value of -3.0 corresponds to a baseline probability of about 4.7%, so the model leans strongly towards non-fraudulent unless other risk factors push the probability higher.

This number comes from training the model on historical labelled data, and it can change as the model is retrained with new information over time. As the model adds each feature's contribution, the score adjusts, moving the probability closer to 1 (fraudulent) or leaving it near 0 (not fraudulent).
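
The effect of the intercept, and of each feature added on top of it, can be sketched as follows. The values come from this tutorial; the order in which the features are added and the names used are assumptions for illustration only (the final probability does not depend on the order).

```python
import math


def sigmoid(score):
    """Convert a model score (log-odds) into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


INTERCEPT = -3.0

# (label, x-value, beta), using the feature values from this tutorial.
STEPS = [
    ("trade size", 250, 0.01),
    ("frequency", 15, 0.08),
    ("time of day", 2, 0.5),
    ("trader risk score", 8.5, 0.4),
    ("deviation", 1.2, 1.0),
]


def running_probabilities(intercept, steps):
    """Return (label, probability) pairs after each contribution is added."""
    score = intercept
    results = [("intercept only", sigmoid(score))]
    for label, x, beta in steps:
        score += beta * x
        results.append((f"+ {label}", sigmoid(score)))
    return results


if __name__ == "__main__":
    for label, probability in running_probabilities(INTERCEPT, STEPS):
        print(f"{label:<20} {probability:.4f}")
```

The first row is the baseline of roughly 0.047. Because every contribution in this example is positive, each later row is higher than the one before, ending at roughly 0.998.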

Plugging everything into the formula

Let’s plug in the numbers and see what the model sees.

  • First, all the values plugged in:
\[ P(y = 1 \mid X) = \frac{1}{1 + e^{-(-3.0 + 0.01 \times 250 + 0.08 \times 15 + 0.5 \times 2 + 0.4 \times 8.5 + 1.0 \times 1.2)}} \]
  • Now we calculate the expression inside the exponent:
\[ P(y = 1 \mid X) = \frac{1}{1 + e^{-(-3.0 + 2.5 + 1.2 + 1.0 + 3.4 + 1.2)}} \]
  • Which simplifies to:
\[ P(y = 1 \mid X) = \frac{1}{1 + e^{-6.3}} \]

And finally, the numerical result:

\[ P(y = 1 \mid X) \approx \frac{1}{1 + e^{-6.3}} \]
\[ P(y = 1 \mid X) \approx \frac{1}{1 + 0.0018} \]
\[ P(y = 1 \mid X) \approx 0.9982 \]

Expressed as a percentage:

0.9982 × 100 ≈ 99.82%

Estimated probability of fraud: ~99.82%
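
The whole calculation can be reproduced end to end with a short Python sketch that starts from the raw trade values, including the scaling of trade size to units of £1,000. The coefficients come from this tutorial; the function name, parameter names, and dictionary keys are hypothetical and do not represent Company-X's actual interface.

```python
import math

# Coefficients from this tutorial. All names are illustrative assumptions.
INTERCEPT = -3.0
BETAS = {
    "trade_size_k": 0.01,
    "frequency": 0.08,
    "hour": 0.5,
    "risk_score": 0.4,
    "deviation": 1.0,
}


def fraud_probability(trade_size_gbp, hour, risk_score, deviation, frequency):
    """Apply the logistic regression formula to one enriched trade."""
    x = {
        "trade_size_k": trade_size_gbp / 1000,  # scaled to units of £1,000
        "frequency": frequency,
        "hour": hour,
        "risk_score": risk_score,
        "deviation": deviation,
    }
    score = INTERCEPT + sum(BETAS[name] * x[name] for name in BETAS)
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    p = fraud_probability(
        trade_size_gbp=250_000, hour=2, risk_score=8.5, deviation=1.2, frequency=15
    )
    print(f"exp(-6.3) = {math.exp(-6.3):.4f}")  # prints 0.0018
    print(f"P(fraud)  = {p:.4f} ({p * 100:.2f}%)")  # prints 0.9982 (99.82%)
```

Changing a single argument, for example a lower risk score, is a quick way to see how sensitive the estimate is to each input.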


Next steps for you

Company-X sends you a file with all the trades they've flagged as possibly fraudulent.

This means you need to investigate further by following your internal policies.

Look at things like:

  • Was this trade properly signed off?
  • Was it part of a larger risk avoidance strategy?
  • Are there any unusual patterns or connections to other trades?
  • Is there any additional context that could explain the behaviour?
