7 Steps for Learning Data Mining and Data Science

How to learn data mining and data science? I outline seven steps and point you to resources for becoming a data scientist.

By Gregory Piatetsky, Oct 10, 2013.

I am frequently asked how to learn Data Mining and Data Science.

Here is my summary. Let me know what I missed and add your comments below.

You can best learn data mining and data science by doing, so start analyzing data as soon as you can! However, don't forget to learn the theory, since you need a good statistical and machine learning foundation to understand what you are doing and to find real nuggets of value in the noise of Big Data.

Here are 7 steps for learning data mining and data science. Although they are numbered, you can do them in parallel or in a different order.

  1. Languages: Learn R, Python, and SQL
  2. Tools: Learn how to use data mining and visualization tools
  3. Textbooks: Read introductory textbooks to understand the fundamentals
  4. Education: Watch webinars, take courses, and consider a certificate or a degree in data science
  5. Data: Check available data resources and find something to analyze
  6. Competitions: Participate in data mining competitions
  7. Interact: Connect with other data scientists via social networks, groups, and meetings

Also, don't forget to subscribe to the KDnuggets News bi-weekly email and follow @kdnuggets (voted Top Big Data Twitter) for the latest news on Analytics, Big Data, Data Mining, and Data Science.

Here I use Data Mining and Data Science interchangeably. See my presentation Analytics Industry Overview, where I look at the evolution and popularity of terms such as Statistics, Knowledge Discovery, Data Mining, Predictive Analytics, Data Science, and Big Data.

1. Learning Languages

 
A recent KDnuggets poll found that the most popular languages for data mining are R, Python, and SQL.
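As a small illustration of how two of these languages fit together, the sketch below uses Python's standard-library sqlite3 module to run a SQL aggregate over an in-memory table. The table name, column names, helper function, and rows are all invented for illustration; they are not from any real dataset.

```python
import sqlite3

# Hypothetical toy rows (invented for illustration, not real data):
# (customer, product, amount)
ROWS = [
    ("alice", "book", 12.5),
    ("alice", "pen", 2.5),
    ("bob", "book", 10.0),
    ("carol", "lamp", 30.0),
    ("bob", "lamp", 20.0),
]


def total_by_customer(rows):
    """Load rows into an in-memory SQLite table and total the amount per customer."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE purchases (customer TEXT, product TEXT, amount REAL)"
        )
        # Parameterized inserts keep the data separate from the SQL text.
        conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT customer, SUM(amount) AS total "
            "FROM purchases GROUP BY customer "
            "ORDER BY total DESC, customer"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for customer, total in total_by_customer(ROWS):
        print(f"{customer}: {total:.2f}")
```

The same GROUP BY query would work against most SQL databases; SQLite is used here only because it ships with Python and needs no setup.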

There are many resources for each, for example

2. Tools: Data Mining, Data Science, and Visualization Software

 
There are many data mining tools for different tasks, but it is best to learn using a data mining suite which supports the entire process of data analysis.

You can start with open source (free) tools such as KNIME, RapidMiner, and Weka.

However, for many analytics jobs you need to know SAS, the leading commercial tool, which is widely used.

Other popular analytics and data mining software includes MATLAB, StatSoft STATISTICA, Microsoft SQL Server, Tableau, IBM SPSS Modeler, and Rattle.

Visualization is an essential part of any data analysis. Learn how to use Microsoft Excel (good for many simpler tasks), R graphics (especially ggplot2), and Tableau, an excellent package for visualization. Other good visualization tools include TIBCO Spotfire and Miner3D.


3. Textbooks

 
There are many data mining and data science textbooks available, but you can check these


4. Education: Webinars, Courses, Certificates, and Degrees

 
You can start by watching some of the many free webinars and webcasts on the latest topics in Analytics, Big Data, Data Mining, and Data Science.

There are also many online courses, short and long, many of them free - see KDnuggets online education directory.

Check in particular these courses:

Finally, consider getting a certificate in Data Mining and Data Science, or an advanced degree such as an MS in Data Science. See the KDnuggets directory for Education in Analytics, Data Mining, and Data Science.

5. Data

 
You will need data to analyze - see KDnuggets directory of Datasets for Data Mining, including


6. Competitions

 
Again, you learn best by doing, so participate in Kaggle competitions. Start with beginner competitions, such as Predicting Titanic Survival using Machine Learning.
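A common first step in a beginner competition is to set a simple baseline before trying real models. The sketch below shows one such baseline, a per-group majority-label predictor scored by accuracy. The records, group names, and function names are invented for illustration; it does not use the actual Kaggle data or any Kaggle API.

```python
from collections import Counter, defaultdict

# Hypothetical toy records (invented, NOT the Kaggle Titanic data):
# (group, label) where label 1 = positive outcome, 0 = negative outcome.
TRAIN = [
    ("a", 1), ("a", 1), ("a", 0),
    ("b", 0), ("b", 0), ("b", 1), ("b", 0),
]
TEST = [("a", 1), ("a", 0), ("b", 0), ("c", 0)]


def fit_group_majority(records):
    """Learn the most common label per group, plus an overall fallback label."""
    per_group = defaultdict(Counter)
    overall = Counter()
    for group, label in records:
        per_group[group][label] += 1
        overall[label] += 1
    model = {g: counts.most_common(1)[0][0] for g, counts in per_group.items()}
    fallback = overall.most_common(1)[0][0]
    return model, fallback


def predict(model, fallback, group):
    """Predict the learned majority label, or the fallback for unseen groups."""
    return model.get(group, fallback)


def accuracy(model, fallback, records):
    """Fraction of records whose predicted label matches the true label."""
    correct = sum(predict(model, fallback, g) == label for g, label in records)
    return correct / len(records)


if __name__ == "__main__":
    model, fallback = fit_group_majority(TRAIN)
    print(f"held-out accuracy: {accuracy(model, fallback, TEST):.2f}")
```

Any model you build afterwards should beat this kind of baseline on held-out data; if it does not, the extra complexity is not yet paying off.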


7. Interact: Meetings, Groups, and Social Networks

 
You can join many peer groups - see Top 30 LinkedIn Groups for Analytics, Big Data, Data Mining, and Data Science.

AnalyticBridge is an active community for Analytics and Data Science.

You can attend some of the many Meetings and Conferences on Analytics, Big Data, Data Mining, Data Science, & Knowledge Discovery.

Also, consider joining ACM SIGKDD, which organizes the annual KDD conference - the leading research conference in the field.

More ...

 
Check also other answers:

Reposted from: http://www.kdnuggets.com/2013/10/7-steps-learning-data-mining-data-science.html