
AI × Quant Trader Series — Day 7

The Swiss Army Knife of Linear Models: Lasso Regression

Reading time: ~15 minutes
Prerequisites: basic linear algebra, Python, NumPy
Focus: engineering intuition, quant usage (not ML hype)


Part 1: Introduction to Regularized Linear Models

We now move from data processing to one of the most important modeling tools in quantitative trading and applied machine learning: regularized linear models.

In real-world financial modeling, the main difficulty is rarely computation. Instead, it is almost always structure:

  • Too many features
  • Strong multicollinearity
  • Limited samples
  • High noise-to-signal ratio

A plain linear regression model can fit the data extremely well in-sample, yet fail catastrophically out-of-sample.

This is where Lasso regression becomes indispensable.


Part 2: From Linear Regression to Lasso

2.1 Ordinary Least Squares (OLS)

The objective function of ordinary least squares is:

\[ \min_{\beta} \sum_{i=1}^{n}(y_i - X_i \beta)^2 \]

OLS attempts to minimize prediction error only.
It places no constraint on model complexity.

As a result:

  • Coefficients become unstable when features are correlated (see the sketch after this list)
  • Noise features receive non-zero weights
  • Overfitting is almost guaranteed in high-dimensional settings
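
A minimal sketch of the first failure mode, using synthetic data (the variable names are illustrative, not taken from a real feature set): two nearly identical features force OLS to split their weight in an arbitrary, unstable way.

import numpy as np

np.random.seed(0)
x1 = np.random.randn(200)
x2 = x1 + 0.01 * np.random.randn(200)        # nearly collinear copy of x1
X_demo = np.column_stack([x1, x2])
y_demo = x1 + 0.5 * np.random.randn(200)     # only x1 carries signal

# OLS via least squares: the two coefficients roughly sum to 1,
# but each one individually can be large and of either sign
beta_ols, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)
print(beta_ols)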

2.2 Why Regularization Is Necessary

In quantitative finance, feature sets often include:

  • Dozens of technical indicators
  • Overlapping factors
  • Lagged signals

Many of these features carry redundant or spurious information.

Regularization explicitly penalizes complexity, forcing the model to prefer simpler and more stable solutions.


Part 3: Lasso Regression — Core Idea

3.1 Objective Function

Lasso (Least Absolute Shrinkage and Selection Operator) modifies OLS by adding an L1 penalty:

\[ \min_{\beta} \sum_{i=1}^{n}(y_i - X_i \beta)^2 + \lambda \sum_{j} |\beta_j| \]

Where:

  • The first term measures fit quality
  • The second term penalizes coefficient magnitude
  • \(\lambda\) controls the strength of regularization
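
A practical note for the implementation in Part 5: scikit-learn's Lasso scales the squared-error term by \(1/(2n)\), so its \(\alpha\) plays the role of \(\lambda\) above only up to that constant factor:

\[ \min_{\beta} \frac{1}{2n}\sum_{i=1}^{n}(y_i - X_i \beta)^2 + \alpha \sum_{j} |\beta_j| \]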

3.2 What Makes Lasso Different

Unlike Ridge regression (L2 regularization), Lasso drives some coefficients exactly to zero.

This leads to:

  • Automatic feature selection
  • Sparse models
  • Improved interpretability

From an engineering perspective:

Lasso is not just a regression model — it is a structured filter.


Part 4: Intuition — Why Lasso Produces Sparsity

The L1 penalty is equivalent to constraining the coefficients to an L1 ball, a diamond-shaped region whose corners sit on the coordinate axes.
When the loss is minimized under this constraint, the solution frequently lands on one of those corners, where some coefficients are exactly zero.

The practical consequence is simple:

Unimportant features are dropped entirely.

This behavior is extremely valuable in quant trading, where fewer signals often outperform noisy combinations.
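
To make the mechanism concrete: when the columns of X are orthonormal, minimizing the objective from Part 3.1 reduces, coordinate by coordinate, to soft-thresholding the OLS estimates:

\[ \hat{\beta}_j^{\text{lasso}} = \operatorname{sign}\!\left(\hat{\beta}_j^{\text{OLS}}\right) \max\!\left(|\hat{\beta}_j^{\text{OLS}}| - \tfrac{\lambda}{2},\; 0\right) \]

Any coefficient whose OLS magnitude falls below \(\lambda/2\) is set exactly to zero; larger coefficients are shrunk but survive.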


Part 5: Implementing Lasso in Python

We now implement Lasso using scikit-learn.

Imports

import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

5.1 Generate Example Data

import numpy as np

np.random.seed(42)

# 100 samples, 10 candidate features; only features 0, 3 and 7 carry true signal
X = np.random.randn(100, 10)
true_beta = np.array([3, 0, 0, 1.5, 0, 0, 0, 2, 0, 0])
y = X @ true_beta + np.random.randn(100) * 0.5   # targets with Gaussian noise

5.2 Standardize Features

The L1 penalty acts on raw coefficient magnitudes, so features must be on comparable scales before fitting; otherwise the penalty disproportionately shrinks features measured in large units.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

5.3 Fit the Lasso Model

from sklearn.linear_model import Lasso
import pandas as pd

lasso = Lasso(alpha=0.1)
lasso.fit(X_scaled, y)
pd.Series(lasso.coef_)
Output:
0    2.85
1    0.00
2    0.00
3    1.42
4    0.00
5    0.00
6    0.00
7    1.95
8    0.00
9    0.00
dtype: float64

The three true signal features (indices 0, 3, and 7) retain non-zero weights, while every noise feature is driven exactly to zero.
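
In a real pipeline it helps to attach feature names so the surviving signals are readable at a glance; the names below are placeholders:

feature_names = [f"feature_{i}" for i in range(X.shape[1])]
coefs = pd.Series(lasso.coef_, index=feature_names)
print(coefs[coefs != 0])   # only the selected features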

Part 6: The Role of Alpha (λ)

6.1 Effect of Regularization Strength

  • Small α → weak regularization → risk of overfitting
  • Large α → aggressive shrinkage → risk of underfitting

The loop below counts how many coefficients survive at each α:

for a in [0.01, 0.1, 1.0]:
    model = Lasso(alpha=a)
    model.fit(X_scaled, y)
    print(a, (model.coef_ != 0).sum())
Output:
0.01 7
0.1  3
1.0  0

6.2 Selecting α with Cross-Validation

Rather than hand-tuning α, LassoCV chooses it by k-fold cross-validation:

from sklearn.linear_model import LassoCV

lasso_cv = LassoCV(cv=5)
lasso_cv.fit(X_scaled, y)

lasso_cv.alpha_   # selected regularization strength
lasso_cv.coef_    # coefficients at the selected alpha

Selecting α by cross-validation reduces dependence on any single sample window, which improves robustness across different market regimes.

Part 7: Limitations of Lasso

Lasso is not universally optimal:

  • When features are highly correlated, Lasso tends to select one arbitrarily and drop the rest, so the selected set is unstable
  • It cannot capture non-linear relationships or interactions on its own
  • The squared-error loss makes it sensitive to outliers

Common remedies include:

  • Elastic Net (L1 + L2), sketched below
  • PCA + Lasso
  • Lasso for feature selection followed by non-linear models
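
A minimal sketch of the first remedy, reusing X_scaled and y from Part 5; the l1_ratio grid here is illustrative, not a recommendation:

from sklearn.linear_model import ElasticNetCV

# Elastic Net mixes the L1 and L2 penalties; l1_ratio controls the mix
# (1.0 is pure Lasso, values near 0 behave like Ridge)
enet = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5)
enet.fit(X_scaled, y)

enet.alpha_      # selected penalty strength
enet.l1_ratio_   # selected L1/L2 mix

The L2 component spreads weight across correlated features instead of arbitrarily keeping one, which directly addresses the first limitation above.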