From Swirling Fluids to Hidden Patterns: Unveiling Insights with Singular Value Decomposition

Shubham Goswami · Published in Level Up Coding · 11 min read · May 5, 2024


Fluid simulations generate a mountain of data, but how do you extract the gems of information hidden within? Enter Singular Value Decomposition (SVD)! This powerful tool acts like a decoder ring for complex datasets. In this blog, we’ll delve into the world of SVD, exploring how it tackles challenges in fluid dynamics, data analysis, and machine learning. We’ll uncover how SVD helps researchers sift through the noise and discover the key patterns driving fluid behavior.

Figure: Singular Value Decomposition (image from math3ma)

Data often holds hidden secrets, waiting to be revealed. This rings true across various domains, including Computational Fluid Dynamics (CFD). Consider a highly turbulent flow — comprehending its intricate patterns necessitates the application of statistical methods to unveil inherent flow dynamics like quasi-periodic vortex shedding. This principle extends to big data analytics. For instance, imagine seeking patterns in gene sequences to deepen our understanding of molecular life or analyzing senate roll call votes to decipher political shifts. Even in Natural Language Processing (NLP) data mining, unraveling complex patterns demands sophisticated approaches, often overwhelming for newcomers to machine learning. Amidst this complexity, a simple yet powerful tool emerges: Singular Value Decomposition (SVD).

Singular Value Decomposition, a matrix decomposition method in linear algebra, finds wide-ranging applications in Data Science, Data Analysis, Fluid Dynamics, and Machine Learning. It forms the bedrock of various dimensionality reduction algorithms, illuminating the substructure of high-dimensional datasets with clarity. By reducing complex data to a lower-dimensional space, SVD unveils dominant sub-structures while organizing data based on variability.

Figure: Substructures of the universe

In fluid dynamics, SVD serves as the cornerstone of numerous Modal Decomposition algorithms, including Proper Orthogonal Decomposition, Dynamic Mode Decomposition, and Spectral Proper Orthogonal Decomposition. Therefore, instead of retracing fundamental concepts, this article marks the inception of a series dedicated to Modal Decomposition methods, commencing with a comprehensive exploration of Singular Value Decomposition.

Let's begin!

SVD : Overview

Singular Value Decomposition (SVD) stands as one of the most crucial matrix factorization methods, serving as the cornerstone for a myriad of data analysis techniques. Offering a numerically stable matrix decomposition, SVD boasts versatility across various applications and is guaranteed to exist — an assurance particularly absent in other methods. In essence, SVD extends the concept of Fast Fourier Transform (FFT) by providing a basis tailored to specific data, unlike FFT’s generic basis, which may lack relevance to the application at hand.

Moreover, grappling with high dimensionality presents a common challenge in processing large datasets sourced from intricate systems, such as neural recordings from the brain or fluid velocity measurements from experiments or simulations. In such scenarios, data often reveals dominant patterns, indicative of a low-dimensional manifold. Complex fluid systems like the Earth’s atmosphere or turbulent wakes behind vehicles exemplify this phenomenon, showcasing low-dimensional structures underlying high-dimensional state-spaces. Despite the necessity for millions or even billions of degrees of freedom in high-fidelity fluid simulations, discernible coherent structures — such as periodic vortex shedding or hurricanes — persist. Singular Value Decomposition offers a systematic approach to delineating a low-dimensional approximation to high-dimensional data, spotlighting dominant structures or patterns with precision.

Crucially, SVD is inherently data-driven, eschewing additional filters or intuitions, and relies solely on insights derived from the data itself. Furthermore, SVD furnishes a hierarchical representation of data, organizing correlations from the most dominant to the least dominant patterns. Unlike eigenvalue decomposition, which is confined to square matrices, SVD transcends such limitations, offering a universal solution applicable to matrices of any size. This remarkable combination of attributes solidifies SVD’s status as a cornerstone technique in data analysis, offering unparalleled insights into complex datasets across diverse fields.

SVD : The Math

What is this Data?

Data embodies information pertinent to its domain. In the realm of fluid dynamics, data manifests as velocity or pressure; in political roll-calls, it comprises polling information like yes or no votes; in educational settings, it encompasses attendance records or the distribution of students by gender or age. Across various domains, data often organically organizes into large matrices or arrays.

Consider a time-series dataset originating from an experiment or simulation — such as the flow around a cylinder in one dimension — arranged into a matrix where each column encapsulates measurements at a distinct time. In multidimensional scenarios, like three-dimensional fluid flow data, the dataset may be reshaped or flattened into a high-dimensional column vector, forming the columns of a vast matrix with rows delineating the temporal evolution of the data.

Let’s denote this matrix as X, where columns x1, x2, …, xn represent measurements sourced from simulations or experiments. These columns could signify the evolving state of a physical system over time, like fluid flow around a cylinder, neural measurements, or senate polling data over a year. Within each column, the state dimension, denoted as m, is often immensely large — potentially spanning millions or billions of degrees of freedom in numerical simulations.

For the purposes of this article, let's envision X as a time-series dataset. The columns of X are termed "snapshots," with n denoting the total number of snapshots captured and m denoting the state dimension of each snapshot, so each row traces a single measurement location through time. In most complex systems, m >> n, resulting in a tall-skinny matrix. This structure underscores the challenges inherent in handling large-scale, multidimensional datasets across various fields of study.
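As a rough sketch of this bookkeeping, here is one way to assemble such a matrix in Python. The grid size, snapshot count, and random placeholder fields below are purely illustrative assumptions, not data from any particular simulation:

```python
import numpy as np

# Hypothetical setup: 20 snapshots of a field sampled on a 100 x 50 grid.
# Grid size, snapshot count, and the random placeholder fields are all
# illustrative assumptions.
grid_shape = (100, 50)
n_snapshots = 20

rng = np.random.default_rng(0)
snapshots = [rng.standard_normal(grid_shape) for _ in range(n_snapshots)]

# Flatten each 2-D field into a column vector and stack the columns.
X = np.column_stack([s.ravel() for s in snapshots])

print(X.shape)  # (5000, 20): m = 5000 rows >> n = 20 columns, tall-skinny
```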

Eigen decomposition

The foundation of Singular Value Decomposition (SVD) lies in the Eigen decomposition of a matrix. Assuming familiarity with basic mathematical concepts like points, lines, spaces, vectors, matrices, and tensors, let’s delve into understanding Eigen decomposition.

Eigen decomposition is an operation that disassembles a matrix into its constituent eigenvalues and eigenvectors. Let's consider M as a square matrix with dimensions n x n. An eigenvalue λ and its eigenvector v satisfy the linear equation:

M v = λ v

Here, λ is a scalar eigenvalue and v is the corresponding eigenvector (collected over all eigenpairs, these form a diagonal matrix of eigenvalues and a matrix of eigenvectors). Solving for eigenvalues and eigenvectors involves treating this as a system of linear equations and determining the values of the variables comprising the components of the eigenvector.

The linear system of equations can be rearranged as:

(M − λI) v = 0

Here, I represents the identity matrix, a square matrix with entries on the diagonal equal to 1 and all other entries equal to zero.

Next, applying the necessary condition for this system to have a non-trivial solution:

det(M − λI) = 0

We can solve this to obtain the matrices of eigenvalues and eigenvectors, such that:

M = V Λ V⁻¹

where the columns of V are the eigenvectors of M and Λ is the diagonal matrix of its eigenvalues.

Let's illustrate this process with an example. Consider a matrix M defined as:

M = [[2, 1],
     [1, 2]]

Writing the eigenvalue equation M v = λ v for this matrix yields:

[[2, 1], [1, 2]] [v1, v2]ᵀ = λ [v1, v2]ᵀ

We can solve for λ, v1, and v2 by forming the system of equations:

2 v1 + v2 = λ v1
v1 + 2 v2 = λ v2

Rearranging the equations, we get:

(2 − λ) v1 + v2 = 0
v1 + (2 − λ) v2 = 0

The necessary condition for a non-trivial solution requires the determinant of the coefficient matrix to be zero:

det [[2 − λ, 1], [1, 2 − λ]] = 0

Expanding this determinant gives (2 − λ)² − 1 = 0, whose roots are the eigenvalues:

λ = 3 and λ = 1

Eigenvectors corresponding to these eigenvalues can be found by substituting λ back into the equations and solving for v1 and v2. For example, finding the eigenvector corresponding to λ = 3 involves:

(2 − 3) v1 + v2 = −v1 + v2 = 0
v1 + (2 − 3) v2 = v1 − v2 = 0

There are infinitely many solutions to these equations; the only restriction is that not all components of an eigenvector can be zero. Choosing v1 = 1 gives v2 = 1, so an eigenvector corresponding to λ = 3 is [1, 1]. Similarly, for λ = 1, choosing v1 = 1 gives v2 = −1, and the eigenvector is [1, −1].
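As a quick sanity check of this example in NumPy (note that eig returns unit-norm eigenvectors, so [1, 1] and [1, −1] appear scaled by 1/√2, possibly with flipped signs):

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Columns of vecs are the eigenvectors, paired with the entries of vals.
vals, vecs = np.linalg.eig(M)
print(vals)   # expect 3 and 1 (order may vary)
print(vecs)   # columns proportional to [1, 1] and [1, -1]
```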

This elucidates the process of Eigen decomposition, offering a foundational understanding crucial for comprehending Singular Value Decomposition.

Singular Value Decomposition

Similar to eigen decomposition, Singular Value Decomposition (SVD) is rooted in a fundamental theorem from linear algebra, asserting that a rectangular matrix M can be deconstructed into the product of three matrices: an orthogonal matrix U, a diagonal matrix S, and the transpose of another orthogonal matrix V. The theorem is typically presented as follows:

M = U S Vᵀ

In this formulation, the orthogonal matrices U and V adhere to the identities:

Uᵀ U = U Uᵀ = I and Vᵀ V = V Vᵀ = I

The columns of U represent orthonormal eigenvectors of M Mᵀ, while the columns of V denote orthonormal eigenvectors of Mᵀ M. The matrix S is diagonal, housing the singular values: the square roots of the non-zero eigenvalues shared by M Mᵀ and Mᵀ M, arranged in descending order. The eigenvectors are placed as column vectors within U and V, ordered by the magnitudes of the corresponding eigenvalues. Specifically, the eigenvector linked to the largest eigenvalue occupies the first column, followed by the eigenvector associated with the next largest eigenvalue in the second column, and so forth until the smallest eigenvalue's eigenvector fills the final column.

Let's elucidate this concept further using an example. Consider the 4 x 4 matrix M defined in the code below.

Now, of course, we could grind through the eigen decompositions of M Mᵀ and Mᵀ M by hand. However, this time, let's leverage Python for the task!

Fire up a Jupyter notebook or a Python script file, as per your preference, and follow along with the subsequent steps:

```python
### Import modules
import numpy as np
from scipy.linalg import svd

### Matrix M
M = np.array([[0.9501, 0.8913, 0.8214, 0.9218],
              [0.2311, 0.7621, 0.4447, 0.7382],
              [0.6068, 0.4565, 0.6154, 0.1763],
              [0.4860, 0.0185, 0.7919, 0.4057]])
print(M)

### Output
# [[0.9501 0.8913 0.8214 0.9218]
#  [0.2311 0.7621 0.4447 0.7382]
#  [0.6068 0.4565 0.6154 0.1763]
#  [0.486  0.0185 0.7919 0.4057]]

### Perform Singular Value Decomposition
U, S, VT = svd(M)

### Print the matrices
print('U = ', U)
print('S = ', S)
print('VT = ', VT)
```

After executing the previous commands, the output should appear as follows:
```python
U =  [[-0.73014043 -0.12412396  0.1898962   0.64453675]
 [-0.44134984 -0.63344244 -0.37882904 -0.51034259]
 [-0.38091453  0.3253653   0.65765711 -0.56260881]
 [-0.35638377  0.69100024 -0.62282832 -0.08714441]]
S =  [2.44784832 0.67159729 0.36455585 0.19273877]
VT =  [[-0.49024435 -0.47699339 -0.53624267 -0.49455189]
 [ 0.40044691 -0.64334001  0.54167226 -0.36379536]
 [ 0.51911046  0.46425538 -0.27699488 -0.66201595]
 [ 0.57430574 -0.37822993 -0.58507161  0.42989097]]
```

Now, at this point, you can conduct further checks. For instance, you can verify that the orthogonal matrix U adheres to the identities provided earlier by executing `print(U @ U.T)`. The expected output should be:

```python
[[ 1.00000000e+00  1.06034385e-16  2.55328011e-16  1.07556976e-16]
 [ 1.06034385e-16  1.00000000e+00 -4.40759880e-17  1.46502920e-16]
 [ 2.55328011e-16 -4.40759880e-17  1.00000000e+00 -2.51669240e-16]
 [ 1.07556976e-16  1.46502920e-16 -2.51669240e-16  1.00000000e+00]]
```

Next, let's conduct another test: verify that multiplying U, S, and VT reproduces M:

```python
U @ np.diag(S) @ VT
```

The output of this operation should be:

```python
array([[0.9501, 0.8913, 0.8214, 0.9218],
       [0.2311, 0.7621, 0.4447, 0.7382],
       [0.6068, 0.4565, 0.6154, 0.1763],
       [0.486 , 0.0185, 0.7919, 0.4057]])
```
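One more optional check ties this back to the eigen decomposition discussion above: the singular values should equal the square roots of the eigenvalues of Mᵀ M. A minimal sketch, continuing from the same session:

```python
# Eigenvalues of the symmetric matrix M^T M, returned in ascending order.
eigvals = np.linalg.eigvalsh(M.T @ M)

# Reverse to descending order and take square roots; should match S.
print(np.sqrt(eigvals[::-1]))
print(S)
```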

Reduced order representation

The most defining and useful property of Singular Value Decomposition is that it provides an optimal low-rank approximation of a complex data matrix X. In fact, SVD offers a whole hierarchy of low-rank approximations, since a rank-r approximation is obtained by retaining the leading r singular values and vectors while discarding the rest. This principle, known as the Eckart-Young theorem, can be summarized as follows:

The optimal rank-r approximation to X, in a least-squares sense, is given by the truncated SVD approximation X_Hat such that:

X_Hat = U_r S_r V_rᵀ, with ‖X − X_Hat‖_F ≤ ‖X − B‖_F for every matrix B of rank r

Here, U_r and V_r consist of the first r columns of U and V, and S_r contains the leading r singular values.

This equation underscores the significance of the truncated SVD approximation X_Hat, indicating that for a given rank r, there exists no superior approximation of X. Thus, high-dimensional data can be effectively described by capturing only a few dominant patterns, facilitating efficient data analysis and interpretation.
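A minimal sketch of this truncation in NumPy (the placeholder data matrix and the rank r below are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))  # placeholder data matrix

U, S, VT = np.linalg.svd(X, full_matrices=False)

r = 5  # illustrative truncation rank
X_hat = U[:, :r] @ np.diag(S[:r]) @ VT[:r, :]

# By Eckart-Young, the Frobenius error of the rank-r truncation equals the
# energy in the discarded singular values.
print(np.linalg.norm(X - X_hat, 'fro'))
print(np.sqrt(np.sum(S[r:] ** 2)))  # the two printed numbers should agree
```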

SVD : Take-aways

Singular Value Decomposition (SVD) offers a versatile framework that can be interpreted from three interrelated perspectives, each shedding light on its utility in data analysis. Firstly, it serves as a method for transforming correlated variables into a set of uncorrelated ones, thereby revealing the intricate relationships among the original data items. Secondly, SVD aids in identifying and prioritizing the dimensions along which data points exhibit the most variation, facilitating a nuanced understanding of the underlying data structure. Thirdly, leveraging this understanding of variation, SVD enables the generation of optimal approximations of the original data using fewer dimensions, effectively serving as a tool for data reduction.

Figure: (left) best-fit line; (right) un-correlated data

To illustrate these concepts, consider a scenario with 2-dimensional data points (Figure on the left). The regression lines depicted in the figures highlight the best approximations of the original data using 1-dimensional objects (lines). These regression lines minimize the distance between each original point and the line, capturing as much variation as possible along the respective dimensions. By generating perpendicular lines from each point to the regression line (figure on the right), a reduced representation of the original data is obtained, preserving the key variations inherent in the dataset. Additionally, these regression lines can be utilized to derive a set of uncorrelated data points, revealing subgroupings in the original data that may not be immediately apparent.
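To make this picture concrete, here is a small sketch on synthetic 2-D data (the points and noise level are invented for demonstration): the leading right-singular vector of the mean-centered data gives the best-fit direction, and projecting onto it yields the reduced 1-D representation.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic correlated 2-D points: y roughly 0.8 * x plus small noise.
x = rng.standard_normal(200)
pts = np.column_stack([x, 0.8 * x + 0.1 * rng.standard_normal(200)])

centered = pts - pts.mean(axis=0)
U, S, VT = np.linalg.svd(centered, full_matrices=False)

direction = VT[0]              # best-fit direction (up to sign)
coords = centered @ direction  # 1-D reduced representation of each point
rank1 = np.outer(coords, direction) + pts.mean(axis=0)  # projections in 2-D

print(direction)  # roughly proportional to [1, 0.8], up to sign
```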

In essence, SVD encapsulates the meaningful information encoded in the singular vectors (the columns of U and V), while the singular values (S) quantify the importance of that information. For instance, in the context of a recommender system, a truncated SVD of a matrix representing people and their movie choices can offer valuable insights. By retaining only the largest singular values, the diagonal matrix S operates within a low-dimensional feature space, enabling efficient compression and extraction of essential information from the data. This approach has been exemplified in recommendation systems like Netflix's, which utilize SVD to enhance content recommendations for users.
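As a toy sketch of that recommender idea (the tiny ratings matrix and the rank below are invented for illustration, not Netflix's actual pipeline):

```python
import numpy as np

# Hypothetical people-by-movies ratings matrix (rows: users, cols: movies);
# the numbers are invented, and zeros mark unseen movies.
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

U, S, VT = np.linalg.svd(R, full_matrices=False)

k = 2  # keep the two dominant "taste" factors
R_hat = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

# R_hat assigns a smoothed score to every user/movie pair; unseen entries
# inherit estimates from users with similar rating patterns.
print(np.round(R_hat, 2))
```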

Take-Away: SVD serves as a powerful analytical tool with diverse applications in data analysis, facilitating dimensionality reduction, pattern identification, and information extraction from complex datasets.

In the next post, we’ll delve deeper into Singular Value Decomposition (SVD) by exploring real-world examples that showcase its practical applications across various domains. From image compression and recommendation systems to fluid dynamics simulations and data analysis, we’ll demonstrate how SVD empowers analysts and researchers to extract valuable insights from complex datasets.

Through a series of hands-on examples and case studies, readers will gain a comprehensive understanding of how SVD can be implemented to address diverse challenges in data science, engineering, and beyond. Whether it’s reducing the dimensionality of high-dimensional data, uncovering hidden patterns, or optimizing computational processes, SVD offers a powerful toolkit for tackling a wide range of analytical tasks.
