Examples of Linear Algebra in Machine Learning

Linear algebra is a sub-field of mathematics concerned with vectors, matrices and linear transforms. It is a key foundation to the field of machine learning, from the notation used to describe the operation of algorithms to the implementation of algorithms in code. Although linear algebra is integral to the field of machine learning, the tight relationship is often left unexplained or explained using abstract concepts such as vector spaces or specific matrix operations. In this chapter, you will discover 10 common examples of machine learning that you may be familiar with that use, require and are really best understood using linear algebra. After reading this chapter, you will know:

  • The use of linear algebra structures when working with data such as tabular datasets and images.
  • Linear algebra concepts when working with data preparation such as one hot encoding and dimensionality reduction.
  • The ingrained use of linear algebra notation and methods in sub-fields such as deep learning, natural language processing and recommender systems.

1.1 Overview

 In this chapter, we will review 10 obvious and concrete examples of linear algebra in machine learning. I tried to pick examples that you may be familiar with or have even worked with before. They are:

  • Dataset and Data Files
  • Images and Photographs
  • One Hot Encoding
  • Linear Regression
  • Regularization
  • Principal Component Analysis
  • Singular-Value Decomposition
  • Latent Semantic Analysis
  • Recommender Systems
  • Deep Learning

1.2 Dataset and Data Files

In machine learning, you fit a model on a dataset. This is a table-like set of numbers where each row represents an observation and each column represents a feature of the observation. For example, below is a snippet of the Iris flowers dataset:

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa

This data is in fact a matrix, a key data structure in linear algebra. Further, when you split the data into inputs and outputs to fit a supervised machine learning model, such as the measurements and the flower species, you have a matrix (X) and a vector (y). The vector is another key data structure in linear algebra. Each row has the same length, i.e. the same number of columns; therefore we can say that the data is vectorized: rows can be provided to a model one at a time or in batches, and the model can be pre-configured to expect rows of a fixed width.
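As a minimal sketch of this idea, assuming NumPy and typing the five rows of the snippet above directly into the code, the data can be held as a matrix and split into an input matrix X and an output vector y:

import numpy as np

# the measurements from the snippet above as a matrix (one row per observation)
data = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
])
# the matching flower species as a vector (one label per observation)
labels = np.array(['Iris-setosa'] * 5)

X = data    # input matrix, shape (5, 4)
y = labels  # output vector, shape (5,)
print(X.shape, y.shape)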

1.3 Images and Photographs

Perhaps you are more used to working with images or photographs in computer vision applications. Each image that you work with is itself a table structure with a width and height and one pixel value in each cell for black and white images or 3 pixel values in each cell for a color image. A photo is yet another example of a matrix from linear algebra. Operations on the image, such as cropping, scaling, shearing and so on are all described using the notation and operations of linear algebra.
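As a minimal sketch, assuming NumPy and hypothetical placeholder arrays rather than a real photo, a grayscale image can be held as a matrix, a color image as a three-dimensional array, and simple image operations become array operations:

import numpy as np

gray = np.zeros((32, 32))      # 32x32 black and white image: one value per pixel
color = np.zeros((32, 32, 3))  # 32x32 color image: three channel values per pixel

cropped = gray[8:24, 8:24]     # cropping is slicing the matrix
brighter = color * 1.2         # scaling pixel intensities is elementwise multiplication
print(gray.shape, color.shape, cropped.shape, brighter.shape)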

1.4 One Hot Encoding

Sometimes you work with categorical data in machine learning. Perhaps the class labels for classification problems, or perhaps categorical input variables. It is common to encode categorical variables to make them easier to work with and learn by some techniques. A popular encoding for categorical variables is the one hot encoding. A one hot encoding is where a table is created to represent the variable with one column for each category and a row for each example in the dataset. A check or one-value is added in the column for the categorical value for a given row, and a zero-value is added to all other columns. For example, the color variable with the 3 rows:

red
green
blue
...

Might be encoded as:

red, green, blue
1,   0,     0
0,   1,     0
0,   0,     1
...

Each row is encoded as a binary vector, a vector with zero or one values, and this is an example of a sparse representation, a whole sub-field of linear algebra.
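As a minimal sketch of the encoding above, assuming NumPy (in practice a library routine such as scikit-learn's OneHotEncoder or pandas.get_dummies would normally be used):

import numpy as np

categories = ['red', 'green', 'blue']
values = ['red', 'green', 'blue', 'green']

# one row per example, one column per category
encoded = np.zeros((len(values), len(categories)), dtype=int)
for i, value in enumerate(values):
    encoded[i, categories.index(value)] = 1
print(encoded)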

1.5 Linear Regression

Linear regression is an old method from statistics for describing the relationships between variables. It is often used in machine learning for predicting numerical values in simpler regression problems. There are many ways to describe and solve the linear regression problem, i.e. finding a set of coefficients that, when multiplied by each of the input variables and added together, results in the best prediction of the output variable. If you have used a machine learning tool or library, the most common way of solving linear regression is via a least squares optimization that is solved using matrix factorization methods from linear algebra, such as an LU decomposition or a singular-value decomposition or SVD. Even the common way of summarizing the linear regression equation uses linear algebra notation:

                                                y = A · b                                                                         (1.1)

Where y is the output variable, A is the dataset, and b are the model coefficients.
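As a minimal sketch with a small hypothetical dataset, assuming NumPy, the coefficients b can be found by least squares; NumPy's lstsq routine solves the problem with an SVD-based method under the covers:

import numpy as np

# dataset A with a column of ones for the intercept, and the output variable y
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([6.0, 8.0, 10.0, 12.0])

b, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
print(b)         # estimated coefficients, approximately [4. 2.]
print(A.dot(b))  # predictions from y = A . b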

1.6 Regularization

In applied machine learning, we often seek the simplest possible models that achieve the best skill on our problem. Simpler models are often better at generalizing from specific examples to unseen data. In many methods that involve coefficients, such as regression methods and artificial neural networks, simpler models are often characterized by models that have smaller coefficient values. A technique that is often used to encourage a model to minimize the size of coefficients while it is being fit on data is called regularization. Common implementations include the L2 and L1 forms of regularization. Both of these forms of regularization are in fact a measure of the magnitude or length of the coefficients as a vector and are methods lifted directly from linear algebra called the vector norm.
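As a minimal sketch, assuming NumPy and a hypothetical coefficient vector, the penalties behind L1 and L2 regularization are just vector norms:

import numpy as np

b = np.array([0.5, -1.2, 3.0, 0.0])  # hypothetical model coefficients

l1 = np.linalg.norm(b, ord=1)  # L1 norm: sum of the absolute coefficient values
l2 = np.linalg.norm(b, ord=2)  # L2 norm: square root of the sum of squared values
print(l1, l2)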

1.7 Principal Component Analysis

Often a dataset has many columns, perhaps tens, hundreds, thousands or more. Modeling data with many features is challenging, and models built from data that include irrelevant features are often less skillful than models trained from the most relevant data. It is hard to know which features of the data are relevant and which are not. Methods for automatically reducing the number of columns of a dataset are called dimensionality reduction, and perhaps the most popular method is called principal component analysis, or PCA for short. This method is used in machine learning to create projections of high-dimensional data for both visualization and for training models. The core of the PCA method is a matrix factorization method from linear algebra. The eigendecomposition can be used and more robust implementations may use the singular-value decomposition or SVD.
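As a minimal sketch of the eigendecomposition route, assuming NumPy and a small hypothetical dataset (library implementations such as scikit-learn's PCA typically use the SVD instead):

import numpy as np

X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 1.9],
              [2.2, 2.9, 0.3],
              [1.9, 2.2, 0.8],
              [3.1, 3.0, 0.1]])

centered = X - X.mean(axis=0)            # center each column
cov = np.cov(centered, rowvar=False)     # covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)   # eigendecomposition of a symmetric matrix

order = np.argsort(eigvals)[::-1]        # sort components by decreasing variance
components = eigvecs[:, order[:2]]       # keep the top 2 principal components
projected = centered.dot(components)     # project the data into 2 dimensions
print(projected.shape)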

1.8 Singular-Value Decomposition

Another popular dimensionality reduction method is the singular-value decomposition method or SVD for short. As mentioned and as the name of the method suggests, it is a matrix factorization method from the field of linear algebra. It has wide use in linear algebra and can be used directly in applications such as feature selection, visualization, noise reduction and more. We will see two more cases below of using the SVD in machine learning.
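As a minimal sketch, assuming NumPy and a small hypothetical matrix, the SVD factorizes a matrix into U, the singular values s, and V transpose, and keeping only the largest singular values gives a low-rank approximation of the original data:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])  # reconstruction from the largest singular value
print(s)      # singular values, largest first
print(rank1)  # best rank-1 approximation of A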

 

1.9 Latent Semantic Analysis

In the sub-field of machine learning for working with text data called natural language processing, it is common to represent documents as large matrices of word occurrences. For example, the columns of the matrix may be the known words in the vocabulary and the rows may be sentences, paragraphs, pages or documents of text, with cells in the matrix marked as the count or frequency of the number of times the word occurred. This is a sparse matrix representation of the text. Matrix factorization methods such as the singular-value decomposition can be applied to this sparse matrix, which has the effect of distilling the representation down to its most relevant essence. Documents processed in this way are much easier to compare, query and use as the basis for a supervised machine learning model. This form of data preparation is called Latent Semantic Analysis, or LSA for short, and is also known by the name Latent Semantic Indexing, or LSI.
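As a minimal sketch of the idea, assuming NumPy and a tiny hypothetical word-count matrix (a real pipeline would build a large sparse matrix, for example with scikit-learn's CountVectorizer, and reduce it with TruncatedSVD):

import numpy as np

# rows are documents, columns are counts for the hypothetical vocabulary
# ['linear', 'algebra', 'movie', 'review']
counts = np.array([[3, 2, 0, 0],
                   [2, 3, 0, 1],
                   [0, 0, 4, 3],
                   [0, 1, 3, 2]], dtype=float)

U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
doc_vectors = U[:, :k] * s[:k]  # each document distilled to k latent dimensions
print(doc_vectors)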

1.10 Recommender Systems

Predictive modeling problems that involve the recommendation of products are called recommender systems, a sub-field of machine learning. Examples include the recommendation of books based on previous purchases and purchases by customers like you on Amazon, and the recommendation of movies and TV shows to watch based on your viewing history and viewing history of subscribers like you on Netflix. The development of recommender systems is primarily concerned with linear algebra methods. A simple example is in the calculation of the similarity between sparse customer behavior vectors using distance measures such as Euclidean distance or dot products. Matrix factorization methods like the singular-value decomposition are used widely in recommender systems to distill item and user data to their essence for querying and searching and comparison.
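As a minimal sketch, assuming NumPy and hypothetical purchase-count vectors for three customers, similarity between customer behavior vectors can be measured with dot products and vector norms:

import numpy as np

alice = np.array([1, 0, 3, 0, 2])  # counts of items purchased by each customer
bob   = np.array([0, 0, 2, 1, 2])
carol = np.array([4, 1, 0, 0, 0])

def cosine(u, v):
    # dot product of the two vectors divided by the product of their L2 norms
    return u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(alice, bob))    # higher value: more similar behavior
print(cosine(alice, carol))  # lower value: less similar behavior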

1.11 Deep Learning

Artificial neural networks are nonlinear machine learning algorithms that are inspired by elements of the information processing in the brain and have proven effective at a range of problems, not least predictive modeling. Deep learning is the recent resurgence in the use of artificial neural networks with newer methods and faster hardware that allow for the development and training of larger and deeper (more layers) networks on very large datasets. Deep learning methods routinely achieve state-of-the-art results on a range of challenging problems such as machine translation, photo captioning, speech recognition and much more. At their core, the execution of neural networks involves linear algebra data structures multiplied and added together. Scaled up to multiple dimensions, deep learning methods work with vectors, matrices and even tensors of inputs and coefficients, where a tensor is a matrix with more than two dimensions. Linear algebra is central to the description of deep learning methods, from the matrix notation used to describe them to the implementation of deep learning methods such as Google's TensorFlow Python library that has the word "tensor" in its name.
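As a minimal sketch, assuming NumPy and hypothetical weights and inputs, the forward pass of a single fully connected layer is a matrix-vector multiplication plus a bias vector followed by a nonlinearity; deep learning libraries generalize this to batches and higher-dimensional tensors:

import numpy as np

x = np.array([0.5, -1.0, 2.0])          # input vector with 3 features
W = np.array([[0.2, 0.8, -0.5],
              [1.0, -0.3, 0.1]])         # weight matrix: 2 outputs by 3 inputs
b = np.array([0.1, -0.2])                # bias vector

z = W.dot(x) + b                          # the linear algebra at the core of the layer
activation = np.maximum(0.0, z)           # ReLU nonlinearity
print(activation)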

1.12 Summary

In this chapter, you discovered 10 common examples of machine learning that you may be familiar with that use and require linear algebra. Specifically, you learned:

  • The use of linear algebra structures when working with data such as tabular datasets and images.
  • Linear algebra concepts when working with data preparation such as one hot encoding and dimensionality reduction.
  • The ingrained use of linear algebra notation and methods in sub-fields such as deep learning, natural language processing and recommender systems.
Linear algebra is a pillar of machine learning. You cannot develop a deep understanding and application of machine learning without it. In this new laser-focused Ebook written in the friendly Machine Learning Mastery style that you're used to, you will finally cut through the equations, Greek letters, and confusion, and discover the topics in linear algebra that you need to know. Using clear explanations, standard Python libraries, and step-by-step tutorial lessons, you will discover what linear algebra is, the importance of linear algebra to machine learning, vector and matrix operations, matrix factorization, principal component analysis, and much more.

This book was designed to be a crash course in linear algebra for machine learning practitioners, ideally those with a background as a developer. It was designed around major data structures, operations, and techniques in linear algebra that are directly relevant to machine learning algorithms. There are a lot of things you could learn about linear algebra, from theory to abstract concepts to APIs. My goal is to take you straight to developing an intuition for the elements you must understand with laser-focused tutorials.

I designed the tutorials to focus on how to get things done with linear algebra. They give you the tools to both rapidly understand and apply each technique or operation. Each tutorial is designed to take you about one hour to read through and complete, excluding the extensions and further reading. You can choose to work through the lessons one per day, one per week, or at your own pace. I think momentum is critically important, and this book is intended to be read and used, not to sit idle. I would recommend picking a schedule and sticking to it.
