Journey from a Python noob to a Kaggler on Python (From Python Data Analysis Novice to Expert)


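Introduction: This article is so comprehensive that I could not resist bookmarking it first; I will publish a Chinese translation in a couple of days.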


Original link:

Journey from a Python noob to a Kaggler on Python

So, you want to become a data scientist, or maybe you already are one and want to expand your toolkit. You have landed at the right place. The aim of this page is to provide a comprehensive learning path for people who are new to Python for data analysis. It gives an overview of the steps you need to take to learn to use Python for data analysis. If you already have some background, or don’t need all the components, feel free to adapt the path to your own needs and let us know what you changed.


Step 0: Warming up

Before starting your journey, the first question to answer is:

Why use Python?

or

How would Python be useful?

Watch the first 30 minutes of this talk by Jeremy, founder of DataRobot, at PyCon 2014, Ukraine, to get an idea of how useful Python can be.

 

Step 1: Setting up your machine

Now that you have made up your mind, it is time to set up your machine. The easiest way to proceed is to just download Anaconda from Continuum.io. It comes packaged with most of the things you will ever need. The major downside of taking this route is that you will need to wait for Continuum to update their packages, even when an update to the underlying libraries is already available. If you are a beginner, that should hardly matter.

If you face any challenges in installing, you can find more detailed instructions for various operating systems here.

 

Step 2: Learn the basics of Python language

You should start by understanding the basics of the language, its libraries and its data structures. The Python track from Codecademy is one of the best places to start your journey. By the end of this course, you should be comfortable writing small scripts in Python and should also understand classes and objects.

Specifically learn: Lists, Tuples, Dictionaries, List comprehensions, Dictionary comprehensions 
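As a quick illustration of the structures listed above, here is a minimal sketch. The sample data and variable names are invented for this example and are not taken from any of the courses mentioned.

```python
# Illustrative data only; names and values are invented for this sketch.
scores = [72, 88, 95, 61]            # list: ordered and mutable
point = (3, 4)                       # tuple: ordered and immutable
ages = {"ann": 31, "bob": 27}        # dictionary: key -> value mapping

# List comprehension: keep only the scores of 70 or more.
passed = [s for s in scores if s >= 70]

# Dictionary comprehension: map each name to its length.
name_lengths = {name: len(name) for name in ages}

# Tuple unpacking assigns each element to its own variable.
x, y = point

print(passed)
print(name_lengths)
print(x + y)
```

Comprehensions are usually preferred over an explicit loop with `append` when the goal is simply to build a new list or dictionary from an existing one.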

Assignment: Solve the Python tutorial questions on HackerRank. These should get your brain thinking in terms of Python scripting.

Alternate resources: If interactive coding is not your style of learning, you can also look at The Google Class for Python. It is a two-day class series and also covers some of the parts discussed later.

 

Step 3: Learn Regular Expressions in Python

You will need to use regular expressions a lot for data cleansing, especially if you are working with text data. The best way to learn regular expressions is to go through the Google class and keep this cheat sheet handy.
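To give a flavour of regular expressions in data cleansing, here is a small sketch using Python's standard `re` module. The input string and the patterns are assumptions made up for illustration; in particular, the e-mail pattern is deliberately simple and will not cover every valid address.

```python
import re

# Hypothetical messy input, invented for illustration.
raw = "  Contact: John.Doe@Example.com ; phone: (555) 123-4567  "

# Collapse runs of whitespace into one space and trim the ends.
cleaned = re.sub(r"\s+", " ", raw).strip()

# A deliberately simple e-mail pattern; real addresses can be more complex.
email_match = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", cleaned)
email = email_match.group(0).lower() if email_match else None

# Find the phone number, then keep only its digits.
phone_match = re.search(r"\(\d{3}\)\s*\d{3}-\d{4}", cleaned)
phone_digits = re.sub(r"\D", "", phone_match.group(0)) if phone_match else None

print(cleaned)
print(email)
print(phone_digits)
```

Checking the match object before calling `group()` avoids an `AttributeError` on rows where the pattern is absent, which is common in real, messy data.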

Assignment: Do the baby names exercise.

If you still need more practice, follow this tutorial for text cleaning. It will challenge you on various steps involved in data wrangling.

Step 4: Learn Scientific libraries in Python – NumPy, SciPy, Matplotlib and Pandas

This is where the fun begins! Here is a brief introduction to various libraries. Let’s start practicing some common operations.

  • Practice the NumPy tutorial thoroughly, especially NumPy arrays. This will form a good foundation for things to come.
  • Next, look at the SciPy tutorials. Go through the introduction and the basics, and do the remaining ones based on your needs.
  • If you guessed Matplotlib tutorials next, you are wrong! They are too comprehensive for our needs here. Instead, look at this IPython notebook up to line 68 (i.e. up to the animations).
  • Finally, let us look at Pandas. Pandas provides DataFrame functionality (like R) for Python. This is also where you should spend a good amount of time practicing. Pandas will become your most effective tool for all mid-size data analysis. Start with a short introduction, 10 minutes to pandas. Then move on to a more detailed tutorial on pandas.
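As a taste of the NumPy array operations mentioned in the first bullet, here is a minimal sketch. It assumes NumPy is installed (it ships with Anaconda); the sample values are invented for illustration.

```python
import numpy as np

# Invented sample values for illustration.
heights_cm = np.array([160.0, 172.5, 181.0, 168.0])

# Vectorized arithmetic applies to every element without a Python loop.
heights_m = heights_cm / 100.0

# A boolean mask selects the elements that satisfy a condition.
tall = heights_cm[heights_cm > 170]

# reshape turns the flat array into a 2 x 2 matrix; axis=0 averages each column.
grid = heights_cm.reshape(2, 2)
column_means = grid.mean(axis=0)

print(heights_m)
print(tall)
print(column_means)
```

Pandas builds on the same ideas: a DataFrame column behaves much like a NumPy array, so vectorized arithmetic and boolean masks carry over directly.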

You can also look at Exploratory Data Analysis with Pandas and Data munging with Pandas.

Additional Resources:

  • If you need a book on Pandas and NumPy, try “Python for Data Analysis” by Wes McKinney.
  • There are a lot of tutorials in the Pandas documentation. You can have a look at them here.

Assignment: Solve this assignment from the CS109 course at Harvard.

 

Step 5: Effective Data Visualization

Go through this lecture from CS109. You can ignore the initial 2 minutes, but what follows after that is awesome! Follow this lecture up with this assignment.

 

Step 6: Learn Scikit-learn and Machine Learning

Now, we come to the meat of this entire process. Scikit-learn is the most useful Python library for machine learning. Here is a brief overview of the library. Go through lecture 10 to lecture 18 of the CS109 course from Harvard. You will get an overview of machine learning, supervised learning algorithms such as regression, decision trees and ensemble modeling, and unsupervised learning algorithms such as clustering. Follow up the individual lectures with the assignments from those lectures.

 

Additional Resources:

Assignment: Try out this challenge on Kaggle.

 

Step 7: Practice, practice and practice

Congratulations, you made it!

You now have all the technical skills you need. From here on it is a matter of practice, and what better place to practice than competing with fellow data scientists on Kaggle? Go, dive into one of the live competitions currently running on Kaggle and put everything you have learnt to the test!

 

Step 8: Deep Learning

Now that you have learnt most machine learning techniques, it is time to give Deep Learning a shot. There is a good chance that you already know what Deep Learning is, but if you still need a brief intro, here it is.

I am myself new to deep learning, so please take these suggestions with a pinch of salt. The most comprehensive resource is deeplearning.net. You will find everything here – lectures, datasets, challenges, tutorials. You can also give the course from Geoff Hinton a try to understand the basics of Neural Networks.

 

P.S. In case you need to use Big Data libraries, give Pydoop and PyMongo a try. They are not included here because the Big Data learning path is an entire topic in itself.

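### Answer 1:

Python can be used to process JSON files. It has a dedicated module for JSON-formatted data, the json module. With it, you can easily convert JSON data into Python data structures and convert Python data structures back into JSON data, which makes the data easy to process and analyze.

### Answer 2:

Python makes it easy to work with JSON files. First, import the json module. Then use the json.load() function to load a JSON file as a Python dictionary or list. Next, you can operate on that object, for example by adding, deleting or modifying dictionary key-value pairs, or by iterating over list elements. Once you are done, use the json.dump() function to convert the Python object to JSON and save it to a file.

Here is a simple example that shows how to process a JSON file with Python:

```python
import json

# Load data from the JSON file
with open('data.json') as file:
    data = json.load(file)

# Add a new key-value pair
data['name'] = 'John'

# Delete a key-value pair (raises KeyError if 'age' is not present)
del data['age']

# Modify a key-value pair
data['gender'] = 'Male'

# Iterate over the elements of a list
for item in data['hobbies']:
    print(item)

# Save the modified data to a JSON file
with open('updated_data.json', 'w') as file:
    json.dump(data, file)
```

The code above first uses json.load() to load a JSON file named "data.json", converts it into a Python dictionary or list, and assigns it to the variable "data". It then performs some operations on "data": adding, deleting and modifying key-value pairs, and iterating over the elements of a list. Finally, it uses json.dump() to save the modified data to a JSON file named "updated_data.json".

Python's json module also provides further functions for handling JSON: for example, json.dumps() converts a Python object to a JSON string, and json.loads() converts a JSON string to a Python object. This answer does not cover the module in detail, so read the official documentation to understand the full range of what Python can do with JSON files.

### Answer 3:

Python can use the built-in json library to process JSON files.

To process a JSON file, first use a function from the `json` module to load the file into a Python data structure. The `json.load()` function loads a JSON file. For example:

```
import json

# Open the JSON file
with open('data.json') as f:
    # Load the JSON data
    data = json.load(f)

print(data)
```

Once the JSON file has been loaded into the `data` variable, you can access and manipulate the JSON data just like any other Python object.

To write a Python object to a JSON file, use the `json.dump()` function. For example:

```
import json

# Create a Python object
data = {'name': 'John', 'age': 30, 'city': 'New York'}

# Write to the JSON file
with open('data.json', 'w') as f:
    # Convert the Python object to JSON and write it to the file
    json.dump(data, f)
```

Pass the Python object to be written as the first argument to `json.dump()` and the target file object as the second.

With these functions, Python can conveniently read and write JSON files, which makes processing JSON data simpler and more flexible.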
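The `json.dumps()` and `json.loads()` functions mentioned above work on strings rather than files. A minimal round-trip sketch, with a sample record invented for illustration, might look like this:

```python
import json

# Invented sample record for illustration.
record = {"name": "John", "age": 30, "hobbies": ["chess", "hiking"]}

# dumps: Python object -> JSON string (sort_keys makes the key order deterministic).
text = json.dumps(record, sort_keys=True)

# loads: JSON string -> Python object.
restored = json.loads(text)

print(text)
print(restored == record)
```

The string variants are handy when the JSON comes from somewhere other than a file, such as a web response or a database column.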
