How Investment Analysts Reduce Complexity in 2025, but Not Optimally
- Samuel Fernández Lorenzo

- Sep 16
- 5 min read
Updated: Oct 19
Financial analysts and investment managers constantly face complex, high-dimensional data sets (and if you're reading this, you've probably experienced a similar problem). To handle this complexity, they often resort to dimensionality reduction techniques, but frequently confuse methods that have fundamentally different purposes. Let's examine this.
The Analyst's Dilemma: Simplify Without Losing Key Information
In this article, I'll analyze two common approaches: principal component analysis (PCA) and variable selection. Do you know which one is appropriate for your objective? Confusing these methods can lead to models that don't explain anything to your clients or fail to capture the correct market signals. Below, we break down both approaches, their critical differences, and how to choose the right one.
Variable Selection: Maintain Clarity
Through variable selection, an analyst simply selects a relevant subset of indicators from a larger set of variables, eliminating those with little predictive or explanatory value. For a financial analyst, these might be things like the price/earnings ratio, momentum, or volatility, analogous to how a doctor looks at metrics like glucose, cholesterol, or blood pressure.
This method has several advantages:
Produces simpler and more interpretable models: Uses variables that analysts and regulators recognize, facilitating communication.
Reduces overfitting: Prevents models from confusing market "noise" with real signals.
Increases efficiency: Crucial for real-time analysis and algorithmic execution.
Eliminates redundancy and multicollinearity: We discard variables that provide duplicate information.
Improves visualization: Makes it easier to identify patterns in charts or dashboards.
Discovers key drivers: Reveals the factors that truly drive performance.
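To make the redundancy point concrete, here is a minimal sketch of one simple form of variable selection: a greedy filter that drops any indicator too strongly correlated with one already kept. The data is synthetic, and the indicator names, the 0.9 threshold, and the helper `drop_redundant` are illustrative assumptions of mine, not part of any specific library.

```python
import numpy as np

def drop_redundant(X, names, threshold=0.9):
    """Greedy filter: keep a column only if its absolute correlation
    with every column kept so far is at or below `threshold`."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Synthetic indicators (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
n = 500
pe_ratio = rng.normal(size=n)
momentum = rng.normal(size=n)
volatility = rng.normal(size=n)
# A near-duplicate of momentum, e.g. the same signal from another data vendor
momentum_dup = momentum + 0.05 * rng.normal(size=n)

X = np.column_stack([pe_ratio, momentum, volatility, momentum_dup])
names = ["pe_ratio", "momentum", "volatility", "momentum_dup"]

selected = drop_redundant(X, names)
print(selected)
```

Note that whatever survives the filter is still one of the original indicators, with its original name and meaning. Real variable selection usually also scores variables against the target you want to explain; this sketch only handles redundancy.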
PCA: Mathematical Power, Yes, but at a Cost
Principal Component Analysis (PCA) reduces dimensionality by mathematically transforming your original variables into new "principal components" (PCs) that are linear combinations of the original variables. These components are numerically elegant because:
They are orthogonal to each other, eliminating multicollinearity.
They are ordered by the amount of variance they capture.
They represent the market with fewer dimensions.
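These three properties can be checked numerically. The sketch below runs PCA by eigendecomposition of the covariance matrix on synthetic data; the two hidden factors and the five made-up "indicators" are assumptions chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
# Five correlated indicators driven by two hidden factors plus noise (synthetic)
factors = rng.normal(size=(n, 2))
mixing = rng.normal(size=(2, 5))
X = factors @ mixing + 0.1 * rng.normal(size=(n, 5))

# Center the data, then eigendecompose its covariance matrix
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending order
order = np.argsort(eigvals)[::-1]                            # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Each principal component is a linear combination of ALL original variables
scores = Xc @ eigvecs
print("PC1 loadings:", np.round(eigvecs[:, 0], 2))

# Property 1: components are uncorrelated (off-diagonal covariances vanish)
score_cov = np.cov(scores, rowvar=False)
off_diag = score_cov - np.diag(np.diag(score_cov))
print("max off-diagonal covariance:", np.abs(off_diag).max())

# Properties 2 and 3: ordered by variance, and a few components capture most of it
explained = eigvals / eigvals.sum()
print("explained variance ratio:", np.round(explained, 3))
```

The printed loadings are the weights of every original variable in PC1, which is where the interpretability cost discussed next comes from.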

But here's the problem: PCA creates synthetic variables that have no direct interpretation. Compared to variable selection, this is a critical disadvantage: you sacrifice interpretability.
Imagine you're a manager explaining your strategy to an institutional client. If you use variable selection, you could indicate that your model is based on things like the price/earnings ratio, volatility, or price trends. An investor will perfectly understand these factors and can interpret the model. In contrast, if you apply PCA, you'll end up with variables like "PC1 = 0.4 × P/E + 0.7 × momentum - 0.3 × volatility + ..." How do you explain that to a client or an investment committee? These components, although mathematically optimal, lack intuitive financial meaning and make it difficult to justify investment decisions.
Similarly, imagine now that you're working with medical biomarkers to predict diseases. By applying variable selection, you could identify that glucose levels, cholesterol, and blood pressure are the most relevant indicators. In contrast, using PCA you'll end up with components like "PC1 = 0.4 × glucose + 0.7 × blood_pressure - 0.3 × cholesterol + ..." How do you explain that to a patient or even to other medical professionals?
The Sandwich Analogy for Understanding PCA
Imagine you have 20 ingredients for a sandwich (ham, cheese, lettuce, tomato, avocado, etc.), but of course, you can't use them all at once... What do you do?
Variable selection: You choose the best ingredients (perhaps ham and cheese) and eliminate the unnecessary ones (like pickles). You maintain the original essence.
PCA: You mix all the ingredients into a few "super-ingredients," such as a "ham-cheese paste" (PC1), a "vegetable mix" (PC2), and a "special sauce" (PC3). You simplify, but lose the ability to identify the original ingredients.
Note that to prepare these super-ingredients, you still need ALL the original ingredients, so if someone asks you to prepare that sandwich for them, you'll still need to buy all 20 original ingredients.
PCA in Index Replication: Solution or Problem?
To better understand the practical limitations, let's consider a common financial application of PCA: stock index replication (index tracking).
An index like the S&P 500 contains roughly 500 stocks, making it expensive to replicate completely. Using PCA, quantitative analysts identify the principal components that capture most of the variability in the index, and then try to build a smaller portfolio that replicates its behavior.
However, this is not so straightforward, because we still have to decide which instruments enter the portfolio: a principal component is a synthetic variable, not something you can buy! One method that uses PCA to support the selection of financial instruments would be as follows:
Step 1: Apply PCA to the historical returns of all stocks in the index
Step 2: For each principal component, select the instrument that has the highest correlation with that component (in absolute value, since the sign of a component is arbitrary), which will serve as its "representative."
Step 3: Retain only the principal components that explain a significant percentage of the total variance (for example, those that cover 90%).
Step 4: Use only the selected instruments to build an optimized portfolio that attempts to replicate the behavior of the complete index.
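The steps above can be sketched in a few lines of NumPy. Everything here is an assumption made for illustration: the returns are synthetic (one market factor, three sector factors, noise), the "index" is an equal-weighted average, and Step 4 is reduced to an unconstrained least-squares fit, whereas a real portfolio would need weights that are non-negative and sum to one.

```python
import numpy as np

rng = np.random.default_rng(2)
n_days, n_stocks = 750, 30

# Synthetic daily returns (hypothetical data)
market = rng.normal(0, 0.01, size=(n_days, 1))
sectors = rng.normal(0, 0.007, size=(n_days, 3))
sector_of = np.arange(n_stocks) % 3
beta = rng.uniform(0.5, 1.5, size=n_stocks)
noise = rng.normal(0, 0.005, size=(n_days, n_stocks))
returns = market * beta + sectors[:, sector_of] + noise
index_returns = returns.mean(axis=1)  # toy equal-weighted index

# Step 1: PCA on the historical returns
Xc = returns - returns.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Xc @ eigvecs

# Step 3: retain the components that together cover 90% of the variance
# (done before Step 2 here; the selected instruments are the same either way)
cumulative = np.cumsum(eigvals) / eigvals.sum()
n_keep = int(np.searchsorted(cumulative, 0.90) + 1)

# Step 2: one representative stock per retained component
selected = []
for k in range(n_keep):
    corr = np.array([abs(np.corrcoef(returns[:, j], scores[:, k])[0, 1])
                     for j in range(n_stocks)])
    if selected:
        corr[selected] = -1.0  # assumption: never pick the same stock twice
    selected.append(int(np.argmax(corr)))

# Step 4: fit weights on the selected stocks only (plain least squares)
weights, *_ = np.linalg.lstsq(returns[:, selected], index_returns, rcond=None)
tracking_error = float(np.std(returns[:, selected] @ weights - index_returns))

print("selected stocks:", selected)
print("in-sample tracking error:", tracking_error)
```

Notice that tracking error only appears in the very last step, after the stocks have already been chosen on variance grounds.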
This method presents at least three important limitations that deserve critical consideration:
Does not incorporate the tracking error minimization objective: PCA focuses on capturing variance, not on specifically minimizing the difference between the index performance and the replicating portfolio, which is the true objective of index tracking.
Linearity assumption: PCA assumes linear relationships between variables, when financial markets often exhibit non-linear behaviors, especially during stress periods.
Ignores transaction costs: Because it selects assets on purely statistical criteria, PCA has no way to account for transaction costs, liquidity, and other practical constraints of the real financial world.
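The first limitation is easiest to see by writing the two objectives side by side. The notation is mine and is only one common way to state them: $\Sigma$ is the covariance matrix of asset returns, $v$ a unit-length direction, $w$ the portfolio weights, $r_t$ the vector of asset returns on day $t$, and $r_{I,t}$ the index return on that day.

```latex
\text{PCA:}\quad \max_{\lVert v \rVert = 1} \; v^{\top} \Sigma \, v
\qquad \text{vs.} \qquad
\text{Index tracking:}\quad \min_{w} \; \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( w^{\top} r_t - r_{I,t} \right)^{2}}
```

The index return $r_{I,t}$ never appears in the PCA objective, so nothing in the PCA step pushes the solution toward low tracking error.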
New Frontiers: Advanced Optimization for Asset Selection
The fundamental challenge for portfolio managers is that the number of possible asset combinations grows combinatorially with the size of the investment universe. For a basket of just 20 stocks chosen from the NASDAQ 100, there are already more than 5 × 10^20 possible selections (hundreds of quintillions), and that is before deciding any weights!
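The size of this search space is easy to check. The sketch below counts the ways to choose 20 stocks out of 100, taking the nominal 100 constituents of the index as the universe and ignoring weights entirely.

```python
import math

n_universe = 100  # nominal number of NASDAQ 100 constituents
basket_size = 20

# Number of distinct 20-stock baskets: the binomial coefficient C(100, 20)
n_baskets = math.comb(n_universe, basket_size)
print(f"{n_baskets:.3e}")

# Growth with universe size for the same basket size
for n in (30, 50, 100):
    print(n, math.comb(n, basket_size))
```

Exhaustively evaluating every basket is out of the question at this scale, which is why heuristic search methods become attractive.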
Cutting-edge quantitative investment firms are now working with startups like Inspiration-Q to apply combinatorial optimization techniques inspired by statistical physics. These techniques, which mimic the cooling process of materials, can efficiently explore the investment solution space, avoiding local optima through probabilistic transitions controlled by a gradually decreasing "temperature."
These methods excel at building baskets that balance profitability and risk, where metrics like the Sharpe ratio or tracking error can serve as the optimization objective, significantly improving the asset selection process compared to conventional numerical shortcuts like PCA.
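To give a feel for the idea, here is a minimal sketch of textbook simulated annealing applied to asset selection, with tracking error as the objective. It is a generic illustration, not Inspiration-Q's method or anyone's production algorithm. The synthetic returns, the equal-weighted basket, the cooling schedule, and the function names `tracking_error` and `anneal_subset` are all assumptions of mine.

```python
import numpy as np

def tracking_error(returns, index_returns, subset):
    """In-sample tracking error of an equal-weighted basket (a simplification)."""
    basket = returns[:, subset].mean(axis=1)
    return float(np.std(basket - index_returns))

def anneal_subset(returns, index_returns, k, n_iter=3000,
                  t_start=1e-3, t_end=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    n = returns.shape[1]
    current = list(rng.choice(n, size=k, replace=False))
    cur_cost = tracking_error(returns, index_returns, current)
    best, best_cost = list(current), cur_cost
    for i in range(n_iter):
        # Geometric cooling: the temperature decays from t_start to t_end
        temp = t_start * (t_end / t_start) ** (i / (n_iter - 1))
        # Proposal: swap one held asset for one that is not held
        outside = [j for j in range(n) if j not in current]
        candidate = list(current)
        candidate[rng.integers(k)] = outside[rng.integers(len(outside))]
        cand_cost = tracking_error(returns, index_returns, candidate)
        # Metropolis rule: always accept improvements, and accept a worse
        # basket with a probability that shrinks as the temperature drops
        if cand_cost < cur_cost or rng.random() < np.exp((cur_cost - cand_cost) / temp):
            current, cur_cost = candidate, cand_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
    return sorted(int(j) for j in best), best_cost

# Synthetic universe of 40 stocks and a toy equal-weighted index (hypothetical data)
rng = np.random.default_rng(3)
n_days, n_stocks, k = 500, 40, 8
market = rng.normal(0, 0.01, size=(n_days, 1))
beta = rng.uniform(0.5, 1.5, size=n_stocks)
returns = market * beta + rng.normal(0, 0.008, size=(n_days, n_stocks))
index_returns = returns.mean(axis=1)

best, best_cost = anneal_subset(returns, index_returns, k)
print("selected stocks:", best)
print("tracking error:", best_cost)
```

Unlike the PCA recipe, the quantity being optimized here is the tracking error itself, and the objective function could just as well include a penalty for transaction costs or illiquid names.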
Conclusion: Choose the Right Tool
Use PCA when you need to compress data, eliminate correlations, or visualize market structures, but don't mind sacrificing interpretability. And let's clarify something important: PCA is not really a variable selection method.
Opt for variable selection when you seek clarity, explainability, and models that connect with clients or regulators.
The key is understanding the strengths and limitations of each method to apply the correct one according to your objective.
Have you used PCA or variable selection in your financial analyses? What results did you get? Share your experience in the comments!


