Introducing: Machine Learning in R (Reposted)

Machine learning is a branch of computer science that studies the design of algorithms that can learn. Typical machine learning tasks are concept learning, function learning or “predictive modeling”, clustering, and finding predictive patterns. These tasks are learned from available data, for example data that was observed through experience or given as instructions. The hope is that incorporating this experience into its tasks will gradually improve the learning. The ultimate goal is for the learning to become automatic, so that humans like ourselves no longer need to intervene.

Machine learning has close ties with Knowledge Discovery, Data Mining, Artificial Intelligence and Statistics. Typical applications of machine learning can be classified into scientific knowledge discovery and more commercial applications, ranging from the “Robot Scientist” to anti-spam filtering and recommender systems.

This small tutorial is meant to introduce you to the basics of machine learning in R: it will show you how to use R to work with the well-known machine learning algorithm called “KNN” or k-nearest neighbors.

Using R For k-Nearest Neighbors (KNN)

The KNN or k-nearest neighbors algorithm is one of the simplest machine learning algorithms and is an example of instance-based learning, where new data are classified based on stored, labeled instances. More specifically, the distance between the stored data and the new instance is calculated by means of some kind of similarity measure. This is typically a distance measure such as the Euclidean or the Manhattan distance, or a similarity measure such as cosine similarity. In other words, for any new data point that you input into the system, its similarity to the data that was already in the system is calculated. You then use these similarity values to perform predictive modeling. Predictive modeling is either classification, assigning a label or a class to the new instance, or regression, assigning a value to the new instance. Whether you classify or assign a value to the new instance depends, of course, on how you compose your model with KNN.
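To make these measures concrete, here is a minimal sketch in R that computes all three for two hypothetical 4-dimensional points (the values resemble the first two rows of the built-in iris data). Note the opposite directions: the two distances shrink as points become more alike, while cosine similarity grows towards 1.

```r
# Two hypothetical feature vectors (sepal length, sepal width, petal length, petal width)
a <- c(5.1, 3.5, 1.4, 0.2)
b <- c(4.9, 3.0, 1.4, 0.2)

euclidean  <- sqrt(sum((a - b)^2))   # straight-line distance
manhattan  <- sum(abs(a - b))        # sum of absolute coordinate differences
cosine_sim <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))  # cosine of the angle

# Base R's dist() returns the same two distances
dist(rbind(a, b), method = "euclidean")
dist(rbind(a, b), method = "manhattan")
```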

The k-nearest neighbor algorithm adds the following to this basic approach: after the distance of the new point to all stored data points has been calculated, the distance values are sorted and the k nearest neighbors are determined. The labels of these neighbors are gathered, and a majority vote or weighted vote is used for classification. In other words, the more of the k nearest neighbors share a certain class (or the higher their combined weight), the more likely it is that the new instance will receive that class. In the case of regression, the value that will be assigned to the new data point is the mean of the values of its k nearest neighbors.
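The steps above (compute distances, sort, take the k nearest, then vote or average) can be sketched from scratch in a few lines of R. This is an illustration only: the helper knn_predict() is hypothetical and uses plain Euclidean distance and an unweighted vote; the tutorial itself uses the knn() function from the class package later on.

```r
# Hypothetical helper: predict one new point with k-nearest neighbors
knn_predict <- function(train_x, train_y, new_x, k = 3) {
  # Euclidean distance from the new point to every stored instance
  diffs <- sweep(as.matrix(train_x), 2, new_x)   # subtract new_x from each row
  d <- sqrt(rowSums(diffs^2))
  nn <- order(d)[seq_len(k)]                     # indices of the k nearest rows
  neighbours <- train_y[nn]
  if (is.numeric(neighbours)) {
    mean(neighbours)                             # regression: mean of neighbor values
  } else {
    votes <- table(neighbours)                   # classification: majority vote
    names(votes)[which.max(votes)]               # a tie goes to the first class listed
  }
}

# Classify flower 51 (a versicolor) using the other 149 flowers
cls_pred <- knn_predict(iris[-51, 1:4], iris$Species[-51], unlist(iris[51, 1:4]), k = 3)

# Regression: predict Petal.Width of flower 51 from the other three measurements
reg_pred <- knn_predict(iris[-51, 1:3], iris$Petal.Width[-51], unlist(iris[51, 1:3]), k = 3)

cls_pred
reg_pred
```

Breaking ties by taking the first class keeps the sketch deterministic; class::knn() breaks ties at random instead.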



Step One. Get Your Data

Machine learning typically starts from observed data. You can take your own data set or browse through other sources to find one.

Built-in Datasets of R

This tutorial makes use of the Iris data set, which is well-known in the area of machine learning. This dataset is built into R, so you can take a look at this dataset by typing the following into your console:

iris

UC Irvine Machine Learning Repository

If you want to download the data set instead of using the one that is built into R, you can go to the UC Irvine Machine Learning Repository and look up the Iris data set.

Tip: not only check out the data folder of the Iris data set, but also take a look at the data description page!

Then, load in the data set with the following command:

iris <- read.csv(url("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"), header = FALSE, stringsAsFactors = TRUE)

The command reads the .csv or “Comma Separated Value” file from the website. The header argument is set to FALSE because the Iris data set from this source does not contain the attribute names in its first line. The stringsAsFactors argument makes sure that the column with the species names is read in as a factor, which is no longer the default since R 4.0.0.

Instead of the attribute names, you will see generic column names such as “V1” or “V2”, which R assigns by default. To make working with the data set easier, it is a good idea to set the names yourself: you can do this with the function names(), which gets or sets the names of an object. Concatenate the names of the attributes with c(), in the order in which you want them to appear. For the Iris data set, you can use the following R command:

names(iris) <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species") 

Step Two. Know Your Data

Now that you have loaded the Iris data set into R, you should try to get a thorough understanding of what your data is about. Just looking at or reading about your data is certainly not enough to get started!

Initial Overview Of The Data Set

First, you can already try to get an idea of your data by making some graphs, such as histograms or boxplots. In this case, however, scatter plots can give you a great idea of what you're dealing with: it can be interesting to see how much one variable is affected by another. In other words, you want to see if there is any correlation between two variables.

You can make scatterplots with the ggvis package, for example. You first need to load the ggvis package:

library(ggvis)
iris %>% ggvis(~Sepal.Length, ~Sepal.Width, fill = ~Species) %>% layer_points()


You see that there is a high correlation between the sepal length and the sepal width of the Setosa iris flowers, while the correlation is somewhat less high for the Virginica and Versicolor flowers.

The scatter plot that maps the petal length and the petal width tells a similar story:

iris %>% ggvis(~Petal.Length, ~Petal.Width, fill = ~Species) %>% layer_points()


You see that this graph indicates a positive correlation between the petal length and the petal width for all the different species in the Iris data set.
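The scatter plots let you judge correlation by eye. As a numeric cross-check, a small sketch like the following computes the Pearson correlation within each species with cor(). It assumes R's built-in iris data, whose species labels read setosa, versicolor and virginica rather than the Iris-setosa style labels of the UCI file.

```r
# Within-species Pearson correlations on R's built-in iris data
by_species <- split(iris, iris$Species)

sepal_cor <- sapply(by_species, function(d) cor(d$Sepal.Length, d$Sepal.Width))
petal_cor <- sapply(by_species, function(d) cor(d$Petal.Length, d$Petal.Width))

round(sepal_cor, 2)   # sepal length vs. sepal width, per species
round(petal_cor, 2)   # petal length vs. petal width, per species
```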


Tip: Are you curious about ggvis, graphs or histograms in particular? Check out our tutorial on histograms and/or our course on ggvis.

After a general visualized overview of the data, you can also view the data set by entering

iris

However, as you will see from the result of this command, this really isn't the best way to inspect your data set thoroughly: the data set takes up a lot of space in the console, which makes it hard to form a clear idea about your data. It is therefore a better idea to inspect the data set by executing

head(iris)

or

str(iris)

Tip: try both of these commands to see the difference!

Note that the last command will help you to clearly distinguish the data type num and the three levels of the Species attribute, which is a factor. This is very convenient, since many R machine learning classifiers require that the target feature is coded as a factor.

Remember: factor variables represent categorical variables in R. They can thus take on a limited number of different values.

A quick look at the Species attribute through table() tells you that the division of the species of flowers is 50-50-50:

table(iris$Species) 

If you want to check the percentage breakdown of the Species attribute, you can ask for a table of proportions:

round(prop.table(table(iris$Species)) * 100, digits = 1)

Note that the round() function rounds the values of its first argument, prop.table(table(iris$Species)) * 100, to the specified number of digits, which here is one digit after the decimal point. You can easily adjust this by changing the value of the digits argument.

Profound Understanding Of Your Data

Let's not stop at this high-level overview of the data! R gives you the opportunity to go more in-depth with the summary() function. For the numeric attributes of the Iris data set, this gives you the minimum value, first quartile, median, mean, third quartile and maximum value. For the class variable, the count of each factor level is returned:

summary(iris) 
##   Sepal.Length   Sepal.Width    Petal.Length   Petal.Width 
##  Min.   :4.30   Min.   :2.00   Min.   :1.00   Min.   :0.1  
##  1st Qu.:5.10   1st Qu.:2.80   1st Qu.:1.60   1st Qu.:0.3  
##  Median :5.80   Median :3.00   Median :4.35   Median :1.3  
##  Mean   :5.84   Mean   :3.05   Mean   :3.76   Mean   :1.2  
##  3rd Qu.:6.40   3rd Qu.:3.30   3rd Qu.:5.10   3rd Qu.:1.8  
##  Max.   :7.90   Max.   :4.40   Max.   :6.90   Max.   :2.5  
##             Species  
##  Iris-setosa    :50  
##  Iris-versicolor:50  
##  Iris-virginica :50  

You can also refine your summary overview by adding specific attributes to the command that was presented above:

summary(iris[c("Petal.Width", "Sepal.Width")])

As you can see, the c() function is added to the original command: the names of the columns petal width and sepal width are concatenated, and a summary is then asked of just these two columns of the Iris data set.

Step Three. Where to Go Now

After you have acquired a good understanding of your data, you have to decide on the use cases that would be relevant for your data set. In other words, you think about what your data set might teach you or what you think you can learn from your data. From there on, you can think about what kind of algorithms you would be able to apply to your data set in order to get the results that you think you can obtain.

Tip: keep in mind that the more familiar you are with your data, the easier it will be to assess the use cases for your specific data set. The same also holds for finding the appropriate machine learning algorithm.

For this tutorial, the Iris data set will be used for classification, which is an example of predictive modeling. The last attribute of the data set, Species, will be the target variable or the variable that we want to predict in this example.

Note that you can also take one of the numerical attributes as the target variable if you want to use KNN to do regression.

Step Four. Prepare Your Workspace

Many of the algorithms used in machine learning are not incorporated by default into R. You will most probably need to download the packages that you want to use when you want to get started with machine learning.

Tip: Got an idea of which learning algorithm you may use, but not of which package you want or need? You can find a pretty complete overview of all the packages that are used in R right here.

To illustrate the KNN algorithm, this tutorial works with the package class. You can load it by typing

library(class)

If you don't have this package yet, you can quickly and easily install it by typing

install.packages("class")

If you're not sure whether you have this package, you can run the following command to find out:

"class" %in% rownames(installed.packages())

Step Five. Prepare Your Data

Normalization

As a part of your data preparation, you might need to normalize your data so that it is consistent. For this introductory tutorial, just remember that normalization keeps features with large ranges from dominating the distance computations that KNN relies on. There are two types of normalization:

  • example normalization is the adjustment of each example individually,
  • feature normalization indicates that you adjust each feature in the same way across all examples.

So when do you need to normalize your data set? In short: when you suspect that the data is not consistent. You can easily check this by going through the results of the summary() function. Look at the minimum and maximum values of all the (numerical) attributes. If one attribute has a much wider range of values than the others, you will need to normalize your data set, because otherwise the distance will be dominated by this feature. For example, if your data set has just two attributes, X and Y, and X has values that range from 1 to 1000, while Y has values that only go from 1 to 100, then Y's influence on the distance function will usually be overpowered by X's influence. When you normalize, you adjust the range of all features, so that differences along features with larger ranges are not over-emphasized.
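The X/Y example can be checked numerically. The sketch below uses two hypothetical points and min-max scaling with the assumed ranges 1 to 1000 for X and 1 to 100 for Y; it shows how much of the squared Euclidean distance comes from X before and after scaling.

```r
# Two hypothetical points: X lives on a 1-1000 scale, Y on a 1-100 scale
p <- c(X = 100, Y = 10)
q <- c(X = 900, Y = 90)

# Raw Euclidean distance: almost all of it comes from X
raw_sq <- (p - q)^2
sqrt(sum(raw_sq))
raw_sq / sum(raw_sq)        # share of the squared distance per feature

# Min-max scale each feature with its assumed range, then recompute
lo <- c(X = 1, Y = 1)
hi <- c(X = 1000, Y = 100)
p_s <- (p - lo) / (hi - lo)
q_s <- (q - lo) / (hi - lo)
scaled_sq <- (p_s - q_s)^2
scaled_sq / sum(scaled_sq)  # X and Y now contribute roughly equally
```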

Tip: go back to the result of summary(iris) and try to figure out if normalization is necessary.

The Iris data set doesn't need to be normalized: the Sepal.Length attribute has values that go from 4.3 to 7.9 and Sepal.Width contains values from 2 to 4.4, while Petal.Length's values range from 1 to 6.9 and Petal.Width goes from 0.1 to 2.5. All values of all attributes are contained within the range of 0.1 and 7.9, which you can consider acceptable.
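Instead of reading these ranges off the summary() output, you could pull them out directly, for example with range() applied to each numeric column. A minimal sketch, assuming R's built-in iris data:

```r
# Minimum and maximum of each numeric attribute
ranges <- sapply(iris[1:4], range)
rownames(ranges) <- c("min", "max")
ranges

# Spread (max - min) per attribute
ranges["max", ] - ranges["min", ]
```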

Nevertheless, it's still a good idea to study normalization and its effect, especially if you're new to machine learning. You can perform feature normalization, for example, by first making your own normalize function:

normalize <- function(x) {
  num <- x - min(x)          # shift so that the minimum becomes 0
  denom <- max(x) - min(x)   # range of the feature
  return(num / denom)        # rescale to the interval [0, 1]
}
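To see what this min-max formula does, you can try it on a short vector. The sketch below uses an equivalent one-line version of normalize(). Note one edge case: a constant column has max(x) - min(x) = 0, so every value becomes NaN (0/0).

```r
# Equivalent one-line version of the normalize function above
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

normalize(c(1, 2, 3, 5))   # returns 0.00 0.25 0.50 1.00
normalize(c(2, 2, 2))      # returns NaN NaN NaN: a constant column has zero range
```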

You can then use this function in another command: lapply() applies normalize to each column of the data set that you give it and returns a list of the same length, in which each element is the normalized column. as.data.frame() then turns that list back into a data frame:

YourNormalizedDataSet <- as.data.frame(lapply(YourDataSet, normalize))

For the Iris data set, you apply the normalize function to the four numerical attributes (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) and put the results in a data frame:

iris_norm <- as.data.frame(lapply(iris[1:4], normalize))

To more thoroughly illustrate the effect of normalization on the data set, compare the following result to the summary of the Iris data set that was given in step two:

summary(iris_norm)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width    
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.0000  
##  1st Qu.:0.222   1st Qu.:0.333   1st Qu.:0.102   1st Qu.:0.0833  
##  Median :0.417   Median :0.417   Median :0.568   Median :0.5000  
##  Mean   :0.429   Mean   :0.439   Mean   :0.468   Mean   :0.4578  
##  3rd Qu.:0.583   3rd Qu.:0.542   3rd Qu.:0.695   3rd Qu.:0.7083  
##  Max.   :1.000   Max.   :1.000   Max.   :1.000   Max.   :1.0000  

Training And Test Sets

In order to assess your model's performance later, you will need to divide the data set into two parts: a training set and a test set. The first is used to train the system, while the second is used to evaluate the learned or trained system. In practice, the training and test sets are disjoint: a common choice is to take 2/3 of your original data set as the training set, while the remaining 1/3 composes the test set.

One last look at the data set teaches you that the rows are ordered by species. If you performed the division on the data set as is, taking the first 2/3 of the rows as the training set, you would get a training set with only “Setosa” and “Versicolor” flowers and none of “Virginica”. The model would therefore classify all unknown instances as either “Setosa” or “Versicolor”, as it would not be aware of the presence of a third species of flowers in the data. In short, you would get incorrect predictions for the test set.
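You can confirm this ordering with two quick tables. A minimal sketch on R's built-in iris data, which stores the species in consecutive blocks of 50 rows:

```r
# The first 100 rows (2/3 of the data) contain no virginica at all
table(iris$Species[1:100])

# The last 50 rows are all virginica
table(iris$Species[101:150])
```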

You thus need to make sure that all three species are present in the training set. What's more, the number of instances of each species should be in more or less the same ratio as in your original data set.

To make your training and test sets, you first set a seed: the number that R's random number generator starts from. The major advantage of setting a seed is that you get the same sequence of random numbers whenever you supply the same seed.

set.seed(1234)

Then, you want to make sure that your Iris data set is shuffled and that you have roughly the same ratio between species in your training and test sets. You use the sample() function to take a sample whose size is the number of rows of the Iris data set, or 150. You sample with replacement: you choose from a vector of 2 elements and assign either 1 or 2 to each of the 150 rows of the Iris data set. The assignment of the elements is subject to probability weights of 0.67 and 0.33.

ind <- sample(2, nrow(iris), replace=TRUE, prob=c(0.67, 0.33))

Note that the replace argument is set to TRUE: this means that after you assign a 1 or a 2 to a certain row, both values remain available, so for each of the next rows you can again assign either a 1 or a 2. Each row is therefore drawn independently with the same probabilities, and the prob argument specifies those probability weights.

Remember that you want your training set to be 2/3 of your original data set: that is why you assign “1” with a probability of 0.67 and “2” with a probability of 0.33 to the 150 sample rows.

You can then use the sample that is stored in the variable ind to define your training and test sets:

iris.training <- iris[ind==1, 1:4]
iris.test <- iris[ind==2, 1:4] 
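Because each row is assigned independently, the split sizes and species ratios are only approximately 2/3 and 1/3. A quick way to check the split you actually got, sketched on R's built-in iris data (the exact counts for set.seed(1234) may differ between R versions, since sample() changed its default algorithm in R 3.6.0):

```r
set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.67, 0.33))

table(ind)                      # sizes of the training (1) and test (2) sets
table(iris$Species[ind == 1])   # species counts in the training set
table(iris$Species[ind == 2])   # species counts in the test set
```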

Note that, in addition to the 2/3 and 1/3 proportions specified above, you don't use all attributes to form the training and test sets. Specifically, you only take Sepal.Length, Sepal.Width, Petal.Length and Petal.Width. This is because you actually want to predict the fifth attribute, Species: it is your target variable. However, the KNN algorithm still needs the known species of the training instances, since those are the labels that the neighbors vote with. You therefore store the class labels in factor vectors and divide them over the training and test sets:

iris.trainLabels <- iris[ind==1, 5]
iris.testLabels <- iris[ind==2, 5]
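If you want the species ratios to be preserved exactly rather than approximately, one alternative (a stratified split, which is not what the tutorial uses) is to draw 2/3 of the rows within each species separately. A minimal sketch on R's built-in iris data, reusing the tutorial's variable names:

```r
set.seed(1234)

# Stratified split: draw round(2/3 * n) training rows from each species
rows_by_species <- split(seq_len(nrow(iris)), iris$Species)
train_idx <- unlist(lapply(rows_by_species, function(rows) {
  sample(rows, size = round(length(rows) * 2 / 3))
}))

iris.training    <- iris[train_idx, 1:4]
iris.test        <- iris[-train_idx, 1:4]
iris.trainLabels <- iris[train_idx, 5]
iris.testLabels  <- iris[-train_idx, 5]

table(iris.trainLabels)   # 33 flowers of each species
```

With 50 flowers per species this gives 99 training rows and 51 test rows, with every species represented in exactly the same proportion in both sets.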

Step Six. The Actual KNN Model

Building Your Classifier

After all these preparation steps, you have made sure that all your known (training) data is stored. No actual model or learning was performed up until this moment. Now, you want to find the k nearest neighbors in your training set for each instance of the test set.

An easy way to do this, and to classify each test instance based on those neighbors, is the knn() function, which uses the Euclidean distance measure to find the k nearest neighbors of your new, unknown instance. Here, the k parameter is one that you set yourself. As mentioned before, new instances are classified by looking at the votes of their neighbors; knn() uses a simple majority vote. The class that receives the most votes wins, and the unknown instance receives that label. If there is a tie between classes, it is broken at random.

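Note: the k parameter is often an odd number to avoid ties in the voting. With more than two classes, as in the Iris data, an odd k reduces but does not eliminate the chance of a tie.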

To build your classifier, you need to take the knn() function and simply add some arguments to it, just like in this example:

iris_pred <- knn(train = iris.training, test = iris.test, cl = iris.trainLabels, k=3)

You store in iris_pred the result of the knn() function, which takes as arguments the training set, the test set, the training labels and the number of neighbors you want to find. The result of this function is a factor vector with the predicted classes for each row of the test data.

Note that you don't pass in the test labels: these will be used afterwards to see whether your model is good at predicting the actual classes of your instances!
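If you are curious how close the votes were, knn() from the class package accepts prob = TRUE, which attaches the proportion of votes for the winning class as the "prob" attribute of the result. A minimal sketch on R's built-in iris data, rebuilding the split from the steps above:

```r
library(class)

set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.67, 0.33))
iris.training    <- iris[ind == 1, 1:4]
iris.test        <- iris[ind == 2, 1:4]
iris.trainLabels <- iris[ind == 1, 5]

# prob = TRUE stores the share of neighbor votes won by the predicted class
iris_pred_prob <- knn(train = iris.training, test = iris.test,
                      cl = iris.trainLabels, k = 3, prob = TRUE)
vote_share <- attr(iris_pred_prob, "prob")

# Test instances whose 3 neighbors did not all agree
which(vote_share < 1)
```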

You can retrieve the result of the knn() function by typing in the following command:

iris_pred
##  [1] Iris-setosa     Iris-setosa     Iris-setosa     Iris-setosa    
##  [5] Iris-setosa     Iris-setosa     Iris-setosa     Iris-setosa    
##  [9] Iris-setosa     Iris-setosa     Iris-setosa     Iris-setosa    
## [13] Iris-versicolor Iris-versicolor Iris-versicolor Iris-versicolor
## [17] Iris-versicolor Iris-versicolor Iris-versicolor Iris-versicolor
## [21] Iris-versicolor Iris-versicolor Iris-versicolor Iris-versicolor
## [25] Iris-virginica  Iris-virginica  Iris-virginica  Iris-virginica 
## [29] Iris-versicolor Iris-virginica  Iris-virginica  Iris-virginica 
## [33] Iris-virginica  Iris-virginica  Iris-virginica  Iris-virginica 
## [37] Iris-virginica  Iris-virginica  Iris-virginica  Iris-virginica 
## Levels: Iris-setosa Iris-versicolor Iris-virginica


Step Seven. Evaluation of Your Model

An essential next step in machine learning is the evaluation of your model's performance. In other words, you want to analyze the degree of correctness of the model's predictions. For a first, rough check, you can simply compare the results of iris_pred to the test labels that you defined earlier:

##    Predicted Species Observed Species
## 1        Iris-setosa      Iris-setosa
## 2        Iris-setosa      Iris-setosa
## 3        Iris-setosa      Iris-setosa
## 4        Iris-setosa      Iris-setosa
## 5        Iris-setosa      Iris-setosa
## 6        Iris-setosa      Iris-setosa
## 7        Iris-setosa      Iris-setosa
## 8        Iris-setosa      Iris-setosa
## 9        Iris-setosa      Iris-setosa
## 10       Iris-setosa      Iris-setosa
## 11       Iris-setosa      Iris-setosa
## 12       Iris-setosa      Iris-setosa
## 13   Iris-versicolor  Iris-versicolor
## 14   Iris-versicolor  Iris-versicolor
## 15   Iris-versicolor  Iris-versicolor
## 16   Iris-versicolor  Iris-versicolor
## 17   Iris-versicolor  Iris-versicolor
## 18   Iris-versicolor  Iris-versicolor
## 19   Iris-versicolor  Iris-versicolor
## 20   Iris-versicolor  Iris-versicolor
## 21   Iris-versicolor  Iris-versicolor
## 22   Iris-versicolor  Iris-versicolor
## 23   Iris-versicolor  Iris-versicolor
## 24   Iris-versicolor  Iris-versicolor
## 25    Iris-virginica   Iris-virginica
## 26    Iris-virginica   Iris-virginica
## 27    Iris-virginica   Iris-virginica
## 28    Iris-virginica   Iris-virginica
## 29   Iris-versicolor   Iris-virginica
## 30    Iris-virginica   Iris-virginica
## 31    Iris-virginica   Iris-virginica
## 32    Iris-virginica   Iris-virginica
## 33    Iris-virginica   Iris-virginica
## 34    Iris-virginica   Iris-virginica
## 35    Iris-virginica   Iris-virginica
## 36    Iris-virginica   Iris-virginica
## 37    Iris-virginica   Iris-virginica
## 38    Iris-virginica   Iris-virginica
## 39    Iris-virginica   Iris-virginica
## 40    Iris-virginica   Iris-virginica
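The command behind this comparison table is not shown in the original post. A minimal sketch that builds a similar table follows; it rebuilds the objects from the earlier steps on R's built-in iris data, so the labels read setosa rather than Iris-setosa, and the random split may not match the one shown above.

```r
library(class)

set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.67, 0.33))
iris.training    <- iris[ind == 1, 1:4]
iris.test        <- iris[ind == 2, 1:4]
iris.trainLabels <- iris[ind == 1, 5]
iris.testLabels  <- iris[ind == 2, 5]
iris_pred <- knn(train = iris.training, test = iris.test, cl = iris.trainLabels, k = 3)

# Put predictions and true labels side by side; names are set afterwards
# so that the spaces in the column names are kept as-is
comparison <- data.frame(iris_pred, iris.testLabels)
names(comparison) <- c("Predicted Species", "Observed Species")
comparison
```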

You see that the model makes reasonably accurate predictions, with the exception of one wrong classification in row 29, where “Versicolor” was predicted while the test label is “Virginica”.
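Rather than scanning the rows by eye, you can let R count the errors. A minimal sketch in base R on the built-in iris data (rebuilding the split and predictions, so the exact numbers may differ from the table above): the accuracy is the share of matching labels, and a two-way table() gives a plain confusion matrix.

```r
library(class)

set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.67, 0.33))
iris_pred <- knn(iris[ind == 1, 1:4], iris[ind == 2, 1:4], cl = iris[ind == 1, 5], k = 3)
iris.testLabels <- iris[ind == 2, 5]

accuracy <- mean(iris_pred == iris.testLabels)   # share of correct predictions
accuracy
which(iris_pred != iris.testLabels)              # positions of misclassified rows

# Base-R confusion matrix: observed classes in rows, predictions in columns
confusion <- table(Observed = iris.testLabels, Predicted = iris_pred)
confusion
```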

This is already some indication of your model's performance, but you might want to go even deeper into your analysis. For this purpose, you can install the package gmodels:

install.packages("gmodels")

Once the package is installed, you can load it by entering

library(gmodels)

Then you can make a cross tabulation or a contingency table. This type of table is often used to understand the relationship between two variables. In this case, you want to understand how the classes of your test data, stored in iris.testLabels, relate to the predictions of your model, stored in iris_pred:

CrossTable(x = iris.testLabels, y = iris_pred, prop.chisq=FALSE)


Note that the last argument prop.chisq indicates whether or not the chi-square contribution of each cell is included. The chi-square statistic is the sum of the contributions from each of the individual cells and is used to decide whether the difference between the observed and the expected values is significant.
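For reference, the standard Pearson chi-square statistic that these per-cell contributions add up to can be written as follows. Here O_ij and E_ij are the observed and expected counts in row i and column j of the table, R_i and C_j are the row and column totals, and N is the total number of test instances (this notation is introduced here, not taken from the original). Each term of the sum is the contribution shown in a cell when prop.chisq = TRUE.

```latex
\chi^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N}
```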

From this table, you can derive the number of correct and incorrect predictions: one instance from the test set was labeled Versicolor by the model, while it was actually a flower of the species Virginica. You can see this in the “Iris-virginica” row of iris.testLabels, in the “Iris-versicolor” column of iris_pred. In all other cases, correct predictions were made. You can conclude that the model's performance is good enough for this example and that you don't need to improve the model!

Move On To Big Data

This tutorial was mainly concerned with performing the basic machine learning algorithm KNN with the help of R. The Iris data set that was used was small and easy to overview, but you can do so much more! If you have experimented enough with the basics presented in this tutorial and with other machine learning algorithms, you might want to go further into R and data analysis. DataCamp can help you take this step.

 

Source: http://blog.datacamp.com/machine-learning-in-r/

Reposted from: https://www.cnblogs.com/payton/p/4370917.html
