[Getting and Cleaning data] Week 2

For more detail, down the html file here.

Week 2

Reading data from MySQL

What is MySQL? SQL is short for Structured Query Language and MySQL is the world’s most popular database.(Further information can be found in wiki page and mysql page)

  • Free and widely used open source database software
  • Widely used in internet bases application
  • Data are structured in
    • Databases
    • Tables within databases
    • Fields with tables
  • Each row is called a record

And why using it is the important point to keep in mind here is that as a data scientist what role that you will have is likely to collect data from a database, and maybe later you’re going to put some data back in it. But usually, the basic data collection has already been formed before you get there, so you usually be handed a database and trying having to get data out of it.

Now we will focus on how to access MySQL database using R. Firstly you need to install R package RMySQL. The instruction can be found in my blog How to install RMySQL package on Windows. FOn a Mac, general way install.packages("RMySQL") is OK. Then we will access the database and collect some information about it.

  • step 1: connect a database using dbConnect function.
library(RMySQL)
ucscDb <- dbConnect(MySQL(), user="genome", host="genome-mysql.cse.ucsc.edu")
  • step 2: apply a query to the database using dbGetQuery function. (Here the result contains all the databases in this sever.)
result <- dbGetQuery(ucscDb, "show databases;")
  • step 3: disconnect the connection using dbDisconnect function.(It is very important that whenever you’ve done analysis data or collecting data from MySQL server that you disconnect from the server.)
dbDisconnect(ucscDb)

Or you can use dbSendQuery function in step 2. The difference for dbGetQuery and dbSendQuery is that

1 dbSendQuery only submits and synchronously executes the SQL statement to the database engine. It does not extracts any records — for that you need to use the function dbFetch, and then you must call dbClearResult when you finish fetching the records you need.
2 dbGetQuery comes with a default implementation that calls dbSendQuery, then if dbHasCompleted is TRUE, it uses fetch to return the results. on.exit is used to ensure the result set is always freed by dbClearResult. Subclasses should override this method only if they provide some sort of performance optimisation.

The above codes gives all databases in the connection. If we want to focus on a specific database, we need to use the argument dbname for the name of database.

  • step 1: connect a specific database using dbConnect function with argument dbname .
hg19 <- dbConnect(MySQL(), user="genome", dbname="hg19&#
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值