R in Business Intelligence

1 篇文章 0 订阅
January 18, 2015
By Jan Górecki - R

(This article was first published on  Jan Gorecki - R, and kindly contributed to R-bloggers)     

Business Intelligence (BI) can be simply described as extracting useful informations from the data. This is quite a broad process as the source data structure (and quality) can vary, as well the useful information structure can vary. More technically process of such transformation can be described as ETL (extract, transform, load), plus presentation of the useful information

The idea to review usefulness of R in those processes came after I saw Wes McKinney's slides about python pandas and BI: Practical Medium Data Analytics with Python (10 Things I Hate About pandas, PyData NYC 2013).
I recommend to preview the slides, just for better understanding where pandas is (been in 2013) in BI. Python pandas is presented more or less as BI ready processing engine.

R in BI

In my opinion R is fully capable (even more than pandas) to serve as engine for BI related processes.
R has naturally (developed for decades) broad range of statistical tools available (multiple repositories with thousands of packages). I will skip this enormous feature of R and just focus on simple BI case of extraction, transformation, loading and presentation.
Below are listed packages which directly address the steps in basic BI process.

Extraction

  • DBI - native database drivers for multiple vendors, top performance.
  • RODBC - ODBC database driver connection.
  • RJDBC - JDBC database driver connection.
  • data.table's fread - very fast csv files reader.
  • tons other packages to support different format of data (e.g.: xlsx, xml, json, sas, spss, stata).

Good overview of R database interfaces available on Burns Statistics.

Transformation

  • data.table - powerful data transformation tool, uses from[ where, select|update, group by ][...] syntax.
  • dplyr - also powerful, but less scalable, data transformation tool, uses from %>% where %>% group by %>% select %>% ... syntax. Pivot and unpivot (cast and melt) are located in tidyr package.

Loading

  • DBI - native database drivers for multiple vendors, top performance.
  • RODBC - ODBC database driver connection.
  • RJDBC - JDBC database driver connection.

Presentation

Presentation of useful information is totallly different task than ETL process, it can be easily outsourced to any BI dashboard tool by simply populating the data structure expected by particular tool. Yet when using R you don't even need to push prepared data to external presentation tool. You can produce a web application dashboard directly from R.

  • shiny - Web Application Framework for R.
  • opencpu - HTTP API to R.
  • httpuv - HTTP and WebSocket server library, also the core of shiny package.
  • Rook - web server interface for R.

Using mentioned packages you are capable to host interactive web applications. Those can generate interactive plots, interactivly query the data. All is working on engine of R session, can execute R functions so it can also utilize huge plotting and statistical analysis capabilities available in any R packages. As of now I haven't heard of any open source end-to-end dashboard product which could allow to simply populate data marts and query against them in shiny, but as you can see in Extras section below it is not so hard to build a fully customized one. Yet there exists some big dashboard products for R, e.g. Mango Navigator, but I'm not sure how well they can cover BI needs.

Extras

Packages mentioned above are core "engines" for each of the section, there are multiple packages which simply wraps functions in those packages, unify or simplify its usage. Those are listed below, surely not all which exists:

  • ETLUtils - read (only) data from multiple database. Unify DBI, RODBC, RJDBC calls.
  • db.r - database schema explorer.
  • dplyr - it can use DBI database connections behind the scene.
  • sqldf - helper to operate on R data.frame using SQL statements, not so efficient as data.table or dplyr, yet may be quite useful for SQL folks.
  • dwtools - data warehouse related functions. Database queries details logging (statement, nrow in, nrow out, timing in ns, etc.), processing details logging. Unify DBI, RODBC calls. (self-promotion)
  • shinyBI - shiny application as package, can perform interactive aggregation on user provided dataset (using data.table as backend for top performance), and interactive plotting of the aggregated data. (self-promotion)

The sentence from pandas slides "Stop believing in the 'one tool to rule them all'" is very true, but if we consider R as container for tools, which are the packages, then we might end up with one container to rule them all.

Crucial factors for business to adopt R

Performance and scalability

Extraction / Loading

I doubt if DBI performance of extraction or loading can be easily beaten. Still the most common interface for ETL tools is ODBC connection which is dramatically slower than native drivers available via DBI. Similarly data.table's fread (fast read for csv files) can be hard to beat.
Detailed benchmark of DBI-compliant package ROracle is available at Oracle blogs:

Transformation

This may suprise you but R beats python pandas in terms of performance. There are benchmarks in the web which may state the opposite, but in fact they are old, or they were made against the bug in the opposite tool.
Linking the recent benchmarks of R vs pandas:

Presentation

Measuring performance of web apps seems to be totally different story which I will not cover. Important factor is that within you web application you can make all the computation in any R package you prefer as the web application is already working in R. Good example is shinyBI shiny app mentioned above in Extras which uses data.table backend for best performance.

Support and licensing

Open source is often considered as problem in terms of support and licensing.
On the enterprise level it is usually necessary to implement the solutions for which support is already contracted.
It is very resonable, but people should be aware of very good alternatives.
This gap has been filled by companies like Revolution Analytics or RStudio.
Besides of big R companies there are other ways to get (even more) effective support like contracting support from:

  • packages authors, contributors
  • stackoverflow top answerers for a tag
  • tons of R consultancy companies

In terms of licensing, there are many products which are available under dual license. If there is no such for a product you are interested in, then you can try to integrate to the current license, it is often possible, or ask the author for a dual license version.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
智慧校园信息化系统解决方案旨在通过先进的信息技术,实现教育的全方位创新和优质资源的普及共享。该方案依据国家和地方政策背景,如教育部《教育信息化“十三五”规划》和《教育信息化十年发展规划》,以信息技术的革命性影响为指导,推进教育信息化建设,实现教育思想和方法的创新。 技术发展为智慧校园建设提供了强有力的支撑。方案涵盖了互连互通、优质资源共享、宽带网络、移动APP、电子书包、电子教学白板、3D打印、VR虚拟教学等技术应用,以及大数据和云计算技术,提升了教学数据记录和分析水平。此外,教育资源公共服务平台、教育管理公共服务平台等平台建设,进一步提高了教学、管控的效率。 智慧校园系统由智慧教学、智慧管控和智慧办公三大部分组成,各自具有丰富的应用场景。智慧教学包括微课、公开课、精品课等教学资源的整合和共享,支持在线编辑、录播资源、教学分析等功能。智慧管控则通过平安校园、可视对讲、紧急求助、视频监控等手段,保障校园安全。智慧办公则利用远程视讯、无纸化会议、数字会议等技术,提高行政效率和会议质量。 教育录播系统作为智慧校园的重要组成部分,提供了一套满足学校和教育局需求的解决方案。它包括标准课室、微格课室、精品课室等,通过自动五机位方案、高保真音频采集、一键式录课等功能,实现了优质教学资源的录制和共享。此外,录播系统还包括互动教学、录播班班通、教育中控、校园广播等应用,促进了教育资源的均衡化发展。 智慧办公的另一重点是无纸化会议和数字会议系统的建设,它们通过高效的文件管理、会议文件保密处理、本地会议的音频传输和摄像跟踪等功能,实现了会议的高效化和集中管控。这些系统不仅提高了会议的效率和质量,还通过一键管控、无线管控等设计,简化了操作流程,使得会议更加便捷和环保。 总之,智慧校园信息化系统解决方案通过整合先进的信息技术和教学资源,不仅提升了教育质量和管理效率,还为实现教育均衡化和资源共享提供了有力支持,推动了教育现代化的进程。
智慧校园信息化系统解决方案旨在通过先进的信息技术,实现教育的全方位创新和优质资源的普及共享。该方案依据国家和地方政策背景,如教育部《教育信息化“十三五”规划》和《教育信息化十年发展规划》,以信息技术的革命性影响为指导,推进教育信息化建设,实现教育思想和方法的创新。 技术发展为智慧校园建设提供了强有力的支撑。方案涵盖了互连互通、优质资源共享、宽带网络、移动APP、电子书包、电子教学白板、3D打印、VR虚拟教学等技术应用,以及大数据和云计算技术,提升了教学数据记录和分析水平。此外,教育资源公共服务平台、教育管理公共服务平台等平台建设,进一步提高了教学、管控的效率。 智慧校园系统由智慧教学、智慧管控和智慧办公三大部分组成,各自具有丰富的应用场景。智慧教学包括微课、公开课、精品课等教学资源的整合和共享,支持在线编辑、录播资源、教学分析等功能。智慧管控则通过平安校园、可视对讲、紧急求助、视频监控等手段,保障校园安全。智慧办公则利用远程视讯、无纸化会议、数字会议等技术,提高行政效率和会议质量。 教育录播系统作为智慧校园的重要组成部分,提供了一套满足学校和教育局需求的解决方案。它包括标准课室、微格课室、精品课室等,通过自动五机位方案、高保真音频采集、一键式录课等功能,实现了优质教学资源的录制和共享。此外,录播系统还包括互动教学、录播班班通、教育中控、校园广播等应用,促进了教育资源的均衡化发展。 智慧办公的另一重点是无纸化会议和数字会议系统的建设,它们通过高效的文件管理、会议文件保密处理、本地会议的音频传输和摄像跟踪等功能,实现了会议的高效化和集中管控。这些系统不仅提高了会议的效率和质量,还通过一键管控、无线管控等设计,简化了操作流程,使得会议更加便捷和环保。 总之,智慧校园信息化系统解决方案通过整合先进的信息技术和教学资源,不仅提升了教育质量和管理效率,还为实现教育均衡化和资源共享提供了有力支持,推动了教育现代化的进程。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值