Table of contents
0. My Own GitHub
https://github.com/Chase-Yi/Clean-IT
1. Python in Google Colab
⭕ The 3 environments used by Google Colab :
● Linux virtual machine hosted by Google
● Google Drive
● Local PC
⭕ Nice features of Colab :
● Enable GPU
● Manage sessions
● Easy to visualize work and try code without setup
⭕ Pros :
● Easy to launch
● Free access to GPU
● Collaborative
● Free
⭕ Cons :
● Limited use of GPU
● Long sessions are killed
● Data is lost if not saved on Drive/local disk
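Because files on the Colab virtual machine disappear when the session ends, a notebook often begins by choosing a folder that outlives the session. The sketch below is one possible way to do that. `in_colab` and `output_dir` are hypothetical helper names, and the `/content/drive` mount point with its `MyDrive` folder is assumed to be the usual Colab location rather than something stated in these notes.

```python
import sys
from pathlib import Path


def in_colab() -> bool:
    """Return True when the google.colab package is loaded, i.e. inside Colab."""
    return "google.colab" in sys.modules


def output_dir() -> Path:
    """Pick a folder that survives the session: Drive in Colab, the working directory locally."""
    if in_colab():
        # Only importable inside Colab; mounting asks for authorization in the browser.
        from google.colab import drive
        drive.mount("/content/drive")
        return Path("/content/drive/MyDrive")  # assumed default Drive folder
    return Path.cwd()


print("Running in Colab:", in_colab())
```

Calling `output_dir()` on a local PC simply returns the current working directory, so the same notebook can run in both places.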
⭕ Try out Colab yourself
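# import the libraries used in this exercise
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# load the data (a local Windows path only works on a local PC; in Colab, upload the file or read it from Google Drive)
data = pd.read_csv("C:/Users/.../Desktop/colab_exercise_data.csv")
# inspect the first and last rows, the dimensions and the column types
print(data.head(10))
print(data.tail())
print(data.shape)
data.info()  # info() prints its summary itself and returns None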
# draw scatter plot
plt.scatter(x=data['Profit'], y=data['Cost'])
# draw pie chart
data['Age_Group'].value_counts().plot(kind='pie', figsize=(6,6))
# draw bar plot
ax = data['Country'].value_counts().plot(kind='bar', figsize=(14,6))
# draw distribution plot (kind="kde" draws a smoothed density curve; use kind="hist" for a histogram)
sns.displot(data['Revenue'], kind="kde")
# show correlation matrix
corr = data.select_dtypes(include='number').corr()
corr
# show heat map
fig = plt.figure(figsize=(8,8), dpi=400)
sns.heatmap(corr,cmap='RdBu',annot=True)
# show density plot
ax = data['Unit_Cost'].plot(kind='density', figsize=(14,6))
ax.axvline(data['Unit_Cost'].mean(), color='red')
ax.axvline(data['Unit_Cost'].median(), color='green')
# show pair plots
sns.pairplot(data.iloc[:, 12:15],kind="reg",diag_kind="kde")
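To try the inspection and correlation steps without the exercise file, a tiny stand-in table is enough. The sketch below reuses a few of the column names from the commands above, but every value in it is made up for illustration, and `sample` is a hypothetical variable name.

```python
import pandas as pd

# Made-up rows that reuse some column names from the exercise file.
sample = pd.DataFrame({
    "Country": ["France", "France", "Germany", "Canada"],
    "Age_Group": ["Youth", "Adults", "Adults", "Seniors"],
    "Unit_Cost": [10.0, 20.0, 30.0, 40.0],
    "Cost": [100.0, 200.0, 300.0, 400.0],
    "Revenue": [150.0, 290.0, 460.0, 580.0],
})
sample["Profit"] = sample["Revenue"] - sample["Cost"]

# Keep only numeric columns before computing correlations.
numeric = sample.select_dtypes(include="number")
corr = numeric.corr()

# Count rows per category, as the bar and pie charts do before plotting.
country_counts = sample["Country"].value_counts()

print(corr.round(2))
print(country_counts)
```

Text columns such as `Country` are left out of `corr` because a correlation is only defined for numeric data.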
2. Python local setup
⭕ Download Anaconda here : https://www.anaconda.com/products/distribution
⭕ conda --version to check local conda version
⭕ python --version to check local python version
⭕ conda list to check installed packages
⭕ Check available conda packages and installation here : https://anaconda.org/anaconda/repo
⭕ pip is already included in the Windows version of Anaconda
⭕ pip list to list the packages installed in the current environment
⭕ pip freeze > requirements.txt to save the current environment's packages with pip
⭕ conda list --export > requirements.txt to save the current environment's packages with conda
⭕ pip install package_name to install a package with pip (e.g. pip install numpy)
⭕ conda install --channel anaconda package_name to install a package with conda
(e.g. conda install --channel anaconda numpy)
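The same information that `python --version` and `pip list` print can also be read from inside Python, which is handy in a notebook. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); `freeze_lines` is a hypothetical helper name, and its output only approximates what `pip freeze` writes.

```python
import sys
from importlib import metadata


def freeze_lines() -> list:
    """Return sorted 'name==version' strings for packages visible to this interpreter."""
    lines = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip distributions whose metadata is missing or broken
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


print("Python", sys.version.split()[0])
for line in freeze_lines()[:10]:  # show only the first few entries
    print(line)
```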
3. Python virtual environment
⭕ A virtual environment is a Python environment such that the Python interpreter, libraries and scripts installed into it are all isolated from those installed in other virtual environments and from the “system” Python
⭕ Assign virtual environment to a project :
● Create the project directory
● conda create -n envname to create virtual environment
● conda activate envname to activate virtual environment
● conda deactivate to deactivate virtual environment
● conda env list to check all available virtual environments
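After activating an environment, it can be useful to confirm from Python which interpreter is actually running. The sketch below is one way to check; `environment_summary` is a hypothetical helper name. It relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation and on the `sys.prefix` / `sys.base_prefix` comparison that identifies environments created with `venv`.

```python
import os
import sys


def environment_summary() -> dict:
    """Collect a few facts that identify the active Python environment."""
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        # venv/virtualenv environments make sys.prefix differ from sys.base_prefix
        "in_venv": sys.prefix != sys.base_prefix,
        # conda sets CONDA_DEFAULT_ENV when an environment is activated
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in environment_summary().items():
    print(f"{key}: {value}")
```

If `conda_env` shows `base` or `None` when a project environment was expected, the activation step was probably skipped in that terminal.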
⭕ Problem :
● create a virtual environment with Python 3.7 :
conda create --name p37 python=3.7
● install version 2.2 of the flask package :
conda activate p37
conda install -c anaconda flask=2.2
● generate the dependencies file :
pip freeze > requirements.txt
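# or (conda format: restore it with conda create --name envname --file requirements.txt, not with pip install -r)
conda list --export > requirements.txt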
● go back to the base and delete the virtual environment :
conda deactivate
conda env remove --name p37
conda env list
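The two commands that generate requirements.txt do not write the same format: `pip freeze` writes `name==version`, while `conda list --export` writes `name=version=build`. The sketch below shows one way to read either format; `parse_requirement_line` is a hypothetical helper, and the version and build strings in the usage lines are made up for illustration.

```python
def parse_requirement_line(line: str):
    """Return (name, version) for a pinned line, or None for anything else."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment (conda writes a comment header)
    if " @ " in line:
        return None  # pip direct reference such as "name @ file:///...", no version pin
    if "==" in line:
        name, version = line.split("==", 1)  # pip freeze style
        return name, version
    parts = line.split("=")  # conda export style: name=version=build
    if len(parts) >= 2:
        return parts[0], parts[1]
    return None


# Illustrative lines only; real files will contain different versions and builds.
print(parse_requirement_line("flask==2.2.2"))
print(parse_requirement_line("flask=2.2.2=py37_0"))
```

Both calls return the same `(name, version)` pair, which makes it easy to compare a pip file with a conda file for the same environment.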
4. Python IDEs
⭕ IDE = Integrated Development Environment
⭕ Popular Python IDEs: PyCharm, Spyder
5. R local setup
⭕ Install R here : https://cran.r-project.org/bin/windows/base/
⭕ Install RStudio here : https://www.rstudio.com/products/rstudio/download/#download
⭕ R is a programming language (like Python) !
⭕ RStudio is an IDE to write R code (like Spyder) !
⭕ As with Python, install the packages you want to use
6. GitHub
⭕ A code hosting platform that lets you version your code and collaborate on projects with others
⭕ Create a GitHub account here : https://github.com/
⭕ Markdown cheat sheet here : Basic writing and formatting syntax - GitHub Docs
⭕ pip install -r requirements.txt installs a project's dependencies locally from its requirements.txt file
⭕ pip freeze > requirements.txt writes every package installed in the current environment, with its version, to the requirements.txt file
7. Git locally
⭕ Git is software for tracking changes in any set of files
⭕ Install Git for Windows here : Git for Windows
⭕ During installation, override the default branch name to “main” (the same default as GitHub)
⭕ Configure global variables (name and email) :
● git config --global user.name "<your_name>"
● git config --global user.email "<your_email>"
● git config --list
⭕ Git process :
● Modify file -> Stage file -> Commit file
⭕ To undo your last commit while keeping its changes staged :
● git reset --soft HEAD~1
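The modify, stage, commit cycle and the soft reset can be tried safely in a throwaway repository. The sketch below drives the `git` command line from Python in a temporary folder. It assumes Git is installed and on the PATH; the `git` and `demo_soft_reset` helpers, the file name and the demo identity are all hypothetical. Two commits are made because `HEAD~1` needs a parent commit to go back to.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run one git command inside repo and return its standard output."""
    base = ["git", "-c", "user.name=Demo", "-c", "user.email=demo@example.com",
            "-c", "commit.gpgsign=false"]
    result = subprocess.run(base + list(args), cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def demo_soft_reset():
    """Commit twice, undo the last commit, and report what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init")
        notes = repo / "notes.txt"
        notes.write_text("first\n")            # modify
        git(repo, "add", "notes.txt")          # stage
        git(repo, "commit", "-m", "first")     # commit
        notes.write_text("first\nsecond\n")
        git(repo, "add", "notes.txt")
        git(repo, "commit", "-m", "second")
        before = int(git(repo, "rev-list", "--count", "HEAD"))
        git(repo, "reset", "--soft", "HEAD~1")  # undo the last commit only
        after = int(git(repo, "rev-list", "--count", "HEAD"))
        staged = git(repo, "diff", "--cached", "--name-only")
        return before, after, staged
```

Calling `demo_soft_reset()` returns the commit count before and after the reset, plus the files still staged, which shows that the work from the undone commit is not lost.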
⭕ Git branches :
● By default, your repository has one branch named “main”
● You can create a branch off main (make a copy of it) to work on without affecting the main branch in production
● If another person makes changes to main while you work, you can pull in those changes
⭕ To create a new branch :
● git branch <branch_name>
⭕ To switch to the new branch :
● git checkout <branch_name>
⭕ To delete a branch that has already been merged :
● git branch -d <branch_name>
⭕ To get a graphical view of all branches and commits :
● git log --graph --oneline --all
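The branch commands above can be rehearsed in a temporary repository before using them on real work. The sketch below is one possible walk-through, assuming Git is installed and on the PATH; the `git` and `demo_branches` helpers, the `feature` branch name and the file contents are hypothetical. The branch is merged into main before deletion because `git branch -d` refuses to delete unmerged work.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run one git command inside repo and return its standard output."""
    base = ["git", "-c", "user.name=Demo", "-c", "user.email=demo@example.com",
            "-c", "commit.gpgsign=false"]
    result = subprocess.run(base + list(args), cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def demo_branches():
    """Create a branch, commit on it, merge it back, then delete it."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init")
        # Name the first branch "main" whatever the local default is.
        git(repo, "symbolic-ref", "HEAD", "refs/heads/main")
        app = repo / "app.txt"
        app.write_text("v1\n")
        git(repo, "add", "app.txt")
        git(repo, "commit", "-m", "initial commit")
        git(repo, "branch", "feature")          # create the branch
        git(repo, "checkout", "feature")        # switch to it
        app.write_text("v1\nfeature work\n")
        git(repo, "commit", "-am", "work on feature")
        git(repo, "checkout", "main")
        main_before_merge = app.read_text()     # main is untouched so far
        git(repo, "merge", "feature")           # bring the work into main
        git(repo, "branch", "-d", "feature")    # safe now that it is merged
        branches = git(repo, "branch", "--format=%(refname:short)").splitlines()
        return main_before_merge, branches
```

The first returned value shows that main did not change while work happened on the branch, and the second lists the branches left after cleanup.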
8. Additional Reading
⭕ Article on using Conda effectively :
https://towardsdatascience.com/conda-essential-concepts-and-tricks-e478ed53b5b