Python or R: Which Is the Better Choice for Data Science?

Data science is set to reshape many industries in the coming years. A recurring question among data scientists is which programming language matters most for the field. Many languages are used in data science, including R, C++, and Python.

In this blog post, we discuss two important programming languages, namely Python and R. This will help you choose the best-fit language for your next data science project.

Python is an open-source, flexible, object-oriented, and easy-to-use programming language. It has a large community and a rich set of libraries and tools. It is the first choice of many data scientists.

R, on the other hand, is a very useful programming language for statistical computation and data science. It offers a wide range of techniques, such as linear and nonlinear modeling, clustering, time-series analysis, classical statistical tests, and classification.


Features of Python

  • Dynamically typed: variable types are determined at runtime, so no type declarations are needed.
  • More readable, and typically needs less code for the same task than many other programming languages.
  • Strongly typed: values are not implicitly converted between unrelated types, so developers must cast explicitly.
  • Interpreted: programs run without a separate compilation step.
  • Flexible and portable: it runs on all major platforms, scales well, and integrates easily with third-party software.
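The difference between dynamic typing and strong typing is easy to mix up, so here is a minimal sketch of both. The function name add_count is purely illustrative and does not come from any library.

```python
# Dynamic typing: the same name can be rebound to values of different types.
value = 42
print(type(value).__name__)   # int
value = "forty-two"
print(type(value).__name__)   # str


# Strong typing: unrelated types are not mixed implicitly.
def add_count(label, count):
    """Join a text label and an integer count into one string."""
    try:
        return label + count          # str + int raises TypeError
    except TypeError:
        return label + str(count)     # an explicit conversion is required


print(add_count("items: ", 3))        # items: 3
```

The try/except is only there to show the failure; in real code you would call str(count) directly.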

R features for data science apps

  • Vectorized operations: a single expression can operate on an entire vector
  • A language designed for statistics
  • Interpreted: code runs without a separate compilation step
  • Built-in support for data science tasks

Below, I compare the two languages across several areas that matter for data science.

1) Data structures

When it comes to data structures, binary trees can be implemented easily in Python. In R, the same structure has to be built with the list class, which is slow.

A binary tree implementation in Python is shown below:

First, create a Node class that stores a value. Instantiating it creates a tree consisting of a single root node.

class Node:
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    def PrintTree(self):
        print(self.data)

root = Node(10)
root.PrintTree()

Output: 10

Next, to insert values into the tree, we add an insert method to the same Node class.

class Node:
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    def insert(self, data):
        # An empty node (data is None) simply takes the new value
        if self.data is None:
            self.data = data
        # Smaller values go into the left subtree
        elif data < self.data:
            if self.left is None:
                self.left = Node(data)
            else:
                self.left.insert(data)
        # Larger values go into the right subtree; duplicates are ignored
        elif data > self.data:
            if self.right is None:
                self.right = Node(data)
            else:
                self.right.insert(data)

    # Print the tree in order: left subtree, node, right subtree
    def PrintTree(self):
        if self.left:
            self.left.PrintTree()
        print(self.data, end=" ")
        if self.right:
            self.right.PrintTree()

# Use the insert method to add nodes
root = Node(12)
root.insert(6)
root.insert(14)
root.insert(3)

root.PrintTree()

Output: 3 6 12 14
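Printing from inside the tree makes the result hard to reuse. As a hedged variation on the class above, the sketch below returns the values instead of printing them and adds a lookup. The method names in_order and contains are my own, not part of the original example.

```python
class Node:
    """A binary search tree node with the same shape as the class above."""

    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    def insert(self, data):
        """Insert data into the subtree, ignoring duplicates."""
        if data < self.data:
            if self.left is None:
                self.left = Node(data)
            else:
                self.left.insert(data)
        elif data > self.data:
            if self.right is None:
                self.right = Node(data)
            else:
                self.right.insert(data)

    def in_order(self):
        """Yield values in ascending order (left, node, right)."""
        if self.left:
            yield from self.left.in_order()
        yield self.data
        if self.right:
            yield from self.right.in_order()

    def contains(self, data):
        """Return True if data is stored in this subtree."""
        if data == self.data:
            return True
        child = self.left if data < self.data else self.right
        return child is not None and child.contains(data)


root = Node(12)
for value in (6, 14, 3):
    root.insert(value)

print(list(root.in_order()))   # [3, 6, 12, 14]
print(root.contains(14))       # True
print(root.contains(7))        # False
```

Because in_order is a generator, callers can collect the values into a list, sum them, or stop early without changing the tree code.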

Winning language:

Python

2) Programming language unity

Python's move from 2.7 to 3.x is not expected to cause lasting disruption. R, by contrast, is splitting into two dialects, base R and the Tidyverse promoted by RStudio, and that split has a considerable impact.

Winning language:

Python

3) Metaprogramming & OOP facts

Python has a single OOP paradigm, while in R you can, for instance, print a function's source to the terminal at any time. R's metaprogramming features, i.e. code that produces code, are remarkably powerful, which is why many computer scientists favor it. Although functions are objects in both languages, R takes this more seriously than Python does.

As a functional programming language, R provides good tools for well-structured code generation. Here is a simple function that takes a vector as an argument and returns the number of elements greater than a threshold (threshold is looked up in the enclosing environment).

myFun <- function(vec) {
  numElements <- length(which(vec > threshold))
  numElements
}

To support several threshold values, we write a function that generates these functions instead of rewriting each one by hand. The function below produces any number of myFun-style functions:

genMyFuns <- function(thresholds) {
  print("Generating functions:")
  for (i in seq_along(thresholds)) {
    fName <- paste("myFun.", i, sep = "")
    print(fName)
    # Substitute the i-th threshold for tt, then evaluate to build the function
    fun <- eval(substitute(
      function(vec) {
        numElements <- length(which(vec > tt))
        numElements
      },
      list(tt = thresholds[i])
    ))
    # Define the generated function in the caller's environment
    assign(fName, fun, envir = parent.frame())
  }
}

The following R CLI session shows a numeric example:

>  genMyFuns(c(7, 9, 10))
[1] "Generating functions:"
[1] "myFun.1"
[1] "myFun.2"
[1] "myFun.3"
>  myFun.1(1:20)
[1] 13
>  myFun.2(1:20)
[1] 11
>  myFun.3(1:20)
[1] 10
>
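For comparison, Python can build a similar family of functions with closures. Note that this is a different technique from R's substitute-based code generation: no code is rewritten, each function simply captures its own threshold. The names make_counter and gen_counters are hypothetical and chosen only for this sketch.

```python
def make_counter(threshold):
    """Return a function that counts the elements greater than threshold."""
    def count_above(values):
        return sum(1 for v in values if v > threshold)
    # Give each generated function a distinguishable name, like myFun.1, myFun.2
    count_above.__name__ = f"count_above_{threshold}"
    return count_above


def gen_counters(thresholds):
    """Build one counting function per threshold, keyed by threshold."""
    return {t: make_counter(t) for t in thresholds}


counters = gen_counters([7, 9, 10])
data = range(1, 21)   # same input as 1:20 in the R session
print([counters[t](data) for t in (7, 9, 10)])   # [13, 11, 10]
```

The functions are returned in a dictionary rather than injected into the caller's namespace, which is the more common Python style.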

Winning language:

R

4) Interface to C/C++

R has stronger tools for interfacing with C/C++ than Python. Rcpp is a powerful interface to C/C++, and R's newer ALTREP framework can further improve performance and usability. Python, on the other hand, has tools such as SWIG, which are less powerful but serve the same purpose. Python variants such as Cython and PyPy can in some cases remove the need for an explicit C/C++ interface altogether.

Winning language:

R

5) Parallel computation

Neither language provides good built-in support for multicore computation. R ships with the parallel package, which is not a good workaround, and neither is Python's multiprocessing package. Python has better interfaces to GPUs. External libraries for cluster computation, however, are good in both languages.

Winning language:

Neither

6) Statistical issues

R was written by statisticians for statisticians, so statistical correctness is built into its design. Python practitioners, on the other hand, mostly work in machine learning and, by this argument, often pay less attention to statistical issues.

R is related to the S statistical language, which is commercially available as S-PLUS. R provides numerous statistics functions, such as sd(variable), median(variable), min(variable), mean(variable), quantile(variable, level), length(variable), and var(variable). A t-test is used to determine whether two sample means differ significantly. An example of a t-test is shown below:

> t.test(var1, var2)

        Welch Two Sample t-test

data:  var1 and var2
t = 4.0369, df = 22.343, p-value = 0.0005376
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 2.238967 6.961033
sample estimates:
mean of x mean of y
 8.733333  4.133333

>

The classic (equal-variance) version of the t-test can be run as shown below:

> t.test(var1, var2, var.equal=TRUE)

        Two Sample t-test

data:  var1 and var2
t = 4.0369, df = 28, p-value = 0.0003806
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 2.265883 6.934117
sample estimates:
mean of x mean of y
 8.733333  4.133333

>

To run a t-test on paired data, use the paired argument:

> t.test(var1, var2, paired=TRUE)

        Paired t-test

data:  var1 and var2
t = 4.3246, df = 14, p-value = 0.0006995
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 2.318620 6.881380
sample estimates:
mean of the differences
                    4.6

>
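For comparison, Python's standard library covers the descriptive statistics but has no built-in t-test. The sketch below computes the Welch t statistic and its degrees of freedom by hand with the statistics module. The function name welch_t and the sample data are hypothetical (they are not the var1/var2 samples above), and the p-value is left out because it needs a t-distribution, which in practice would come from a third-party library.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test (unequal variances)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Sample variance (n - 1 denominator) divided by the sample size
    v_a = statistics.variance(sample_a) / n_a
    v_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(v_a + v_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical data for illustration only
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]

t, df = welch_t(a, b)
print(round(t, 4), round(df, 4))   # -1.8974 5.8824
```

The fractional degrees of freedom are characteristic of the Welch test, which matches the non-integer df = 22.343 in the first R output.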

Winning language:

R

7) AI & ML

Python gained enormous importance with the rise of machine learning and artificial intelligence. It offers many finely tuned libraries for image recognition, covering models such as AlexNet. R versions of these could be developed without much difficulty: much of the power of the Python libraries comes from certain image-smoothing operations, which could also be implemented through R's Keras wrapper, and a pure-R version of TensorFlow could in principle be developed as well. Meanwhile, R's package availability for gradient boosting and random forests is outstanding.

Winning language:

Python

8) Presence of libraries

At the time of writing, the Comprehensive R Archive Network (CRAN) has over 12,000 packages, while the Python Package Index (PyPI) has over 183,000. Compared with CRAN, however, PyPI is thin on data science.

Winning language:

Tie between the two

9) Learning curve

Becoming proficient in Python for data science means learning a lot of material, including Pandas, NumPy, matplotlib, and matrix types, whereas matrix types and basic graphics are already built into R. A novice can start doing simple data analysis in R within minutes. Python libraries can also be tricky to configure, while R packages work out of the box.

Winning language:

R

10) Elegance

This last comparison factor is also the most subjective. Python is more elegant than R because it greatly reduces the use of parentheses and braces, which makes code sleeker to write and read.

Winning language:

Python

Final note:

The two languages are in close competition in the world of data science. On some points Python wins; on others R comes out ahead. The final choice between them therefore depends on the following factors:

  • The amount of time you can invest
  • Your project requirements
  • Your business objectives

Thank you for taking the time to read this post. Your feedback is welcome.

Translated from: https://habr.com/en/post/482500/
