Python Week 1 + Week 2
Assets (Wk1): Slides, Data Set, Quizzes
Assets (Wk2): Slides, Data Set, Quizzes
Try this: It’s a Zen Monk among Programming Languages
import this
ML Definition
Basics:
Lists [Guessing Number] [link]
Tuples
Dictionaries [Chatbot example]
Give examples of the above
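A minimal sketch of the three basics above, assuming a number-guessing game for lists and a keyword-lookup chatbot for dictionaries. The function names, the sample replies and the trip tuple are illustrative assumptions, not the course's actual notebooks.

```python
# Lists: ordered and mutable, handy for collecting hints as the game runs.
def play_guessing_game(secret, guesses):
    """Return the hint given for each guess, stopping once the secret is found."""
    hints = []
    for guess in guesses:
        if guess < secret:
            hints.append("too low")
        elif guess > secret:
            hints.append("too high")
        else:
            hints.append("correct")
            break
    return hints

# Tuples: ordered and immutable, good for fixed records.
trip = ("Home", "Office", 9.5)   # (start, stop, miles), made-up values
start, stop, miles = trip        # tuple unpacking

# Dictionaries: key-value lookups, enough for a toy chatbot.
REPLIES = {
    "hello": "Hi there!",
    "bye": "See you next week.",
}

def chatbot_reply(message):
    """Reply to the first known keyword found in the message."""
    for word in message.lower().split():
        if word in REPLIES:
            return REPLIES[word]
    return "Sorry, I did not understand that."

print(play_guessing_game(7, [3, 9, 7]))
print(chatbot_reply("Hello bot"))
```

In a live session the guesses would come from input() and the secret from random.randint; they are passed in as arguments here so the example stays repeatable.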
Introduction to Numpy [ Connect to Cheat Sheet]
Introduction to Pandas [ Connect to Cheat Sheet]
Try this — Automated Pandas App [link]
Case Study (Uber Rides)
Uber Drive Data Analysis -
Data Set [link]
Data on a driver’s Uber trips is available for the year 2016.
Your manager wants you to explore this data and report useful insights about the trip behaviour of an Uber driver.
Dataset -
The dataset contains Start Date, End Date, Start Location, End Location, Miles Driven and Purpose of drive (Business, Personal, Meals, Errands, Meetings, Customer Support etc.)
Qs
1. Import the libraries
2. Get the data and observe it
3. Check for missing values; either remove them or fill them
4. Get a summary of the data using a Python function
5. Explore the data parameter by parameter
Here we have information on location (start and stop), time (start and stop), category and purpose of the trip, and miles covered.
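The five steps above can be sketched with pandas. The column names (START_DATE, END_DATE, CATEGORY, MILES, PURPOSE) and the three sample rows are assumptions made up for illustration; with the real file, the inline sample would be replaced by pd.read_csv on the downloaded dataset.

```python
# 1. Import the libraries
import pandas as pd

# 2. Get the data and observe it.
# Tiny made-up sample standing in for the real 2016 trip file.
trips = pd.DataFrame({
    "START_DATE": ["2016-01-01 21:11", "2016-01-02 01:25", "2016-01-02 20:25"],
    "END_DATE": ["2016-01-01 21:17", "2016-01-02 01:37", "2016-01-02 20:38"],
    "CATEGORY": ["Business", "Business", "Personal"],
    "MILES": [5.1, 5.0, 4.8],
    "PURPOSE": ["Meals", None, "Errands"],
})
print(trips.head())

# 3. Check missing values, then fill them (dropna() would remove them instead).
missing_per_column = trips.isnull().sum()
trips["PURPOSE"] = trips["PURPOSE"].fillna("Unknown")

# 4. Get a summary of the data.
print(trips.describe())

# 5. Explore parameter by parameter: trip duration and miles per purpose.
trips["START_DATE"] = pd.to_datetime(trips["START_DATE"])
trips["END_DATE"] = pd.to_datetime(trips["END_DATE"])
trips["MINUTES"] = (trips["END_DATE"] - trips["START_DATE"]).dt.total_seconds() / 60
miles_by_purpose = trips.groupby("PURPOSE")["MILES"].sum()
print(miles_by_purpose)
```

Filling missing purposes with "Unknown" keeps every trip in the mileage totals; dropping the rows is the simpler choice when the purpose column is the focus of the analysis.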
Steps I would Recommend
1. Start with Excel
2. Move on to Power BI
3. Take up a basic Stats class
4. Learn R
5. Apply whatever you have learnt in steps 1, 2 and 3 using R
6. Learn Python (including Data Science libraries)
7. Do some competitions on Kaggle
8. Read subject-related books
9. Try to learn ML concepts
10. Try to teach your machine to Classify/Cluster and Predict
11. Update your resume as Data Scientist and attend interviews
12. If you fail at step 11, repeat from step 7
Why Python is my personal choice
- It’s free
- XKCD
- Turtles 🐢
- PyGame
- Tkinter ( for desktop apps)
- Flask web application
- ML Libraries
- Black Hole
- No semicolon business
Week 2
Introduction to Matplotlib [Cheat Sheet]
- XKCD [Notebook in Kaggle]
- Visualization tools
Introduction to Seaborn [Cheat Sheet]
- Seaborn Datasets
- Visualization tools
Introduction to Plotly [Cheat Sheet]
- Interactive Charts
Case Study:
Honey production data set-
This dataset provides insight into honey production supply and demand in America by state from 1998 to 2012.
Dataset [Github link]
The dataset contains numcol, yieldprod, totalprod, stocks, priceperlb, prodvalue, and other useful information. Certain states are excluded every year (e.g. CT) to avoid disclosing data for individual operations.
For Reference: https://www.kaggle.com/arthurpaulino/honey-production/data
Qs
• Import the pandas, numpy, seaborn and matplotlib.pyplot packages
• Get the data
• Explore the data for non-null and extreme values
• How many states are included in the dataset?
• Which states are included in this dataset?
• Calculate the average production for each state across all years
• How many years of data are provided in the dataset, and what are the starting and ending years?
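A hedged sketch of how these questions might be answered with pandas. The "state" and "year" column names and the four sample rows are assumptions for illustration; the real file should be loaded with pd.read_csv instead. Only pandas is imported because the sketch does no plotting.

```python
import pandas as pd

# Made-up rows standing in for the real file; "state" and "year" are assumed column names.
honey = pd.DataFrame({
    "state": ["AL", "AL", "AZ", "AZ"],
    "year": [1998, 1999, 1998, 1999],
    "totalprod": [1000.0, 1200.0, 3000.0, 2000.0],
    "priceperlb": [0.72, 0.56, 0.64, 0.62],
})

# Non-null counts per column, then min/max to spot extreme values.
honey.info()
print(honey.describe())

# How many states are included, and which ones?
states = sorted(honey["state"].unique())
n_states = len(states)

# Average production for each state across all years.
avg_prod = honey.groupby("state")["totalprod"].mean()

# How many years are covered, and what are the first and last years?
n_years = honey["year"].nunique()
first_year, last_year = honey["year"].min(), honey["year"].max()

print(n_states, states)
print(avg_prod)
print(n_years, first_year, last_year)
```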
Online Resources:
Trinket.io (for automation)
pythontutor.com (for basic input syntax test)
Colab Research (Google’s Python Notebook)
Azure Notebooks (Microsoft’s Python Notebook)
Jupyter.org (bit.do/pythontinker)
R Notebooks (Python 2 Notebook)
Rextester (Python IDE)
Repl.it (Python IDE with updated modules)
HackerEarth (for Practice)
CodeSkulptor (for CGI effects)
HackerRank (for Practice)
Blockly (for simpler understanding)
W3Schools.org (Python Module)
Tutorials Point (Python 3)
Data Camp (bit.do/skynetcode)
Exercises:
Calculate BMI
Check if a name is Male/Female
If you throw two dice, which total is your best bet?
Create a Password Generator
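One possible way to approach three of these exercises using only the standard library. The function names are hypothetical, the dice exercise is assumed to mean two fair six-sided dice, and the BMI formula is the usual weight in kilograms divided by height in metres squared. The name exercise is left out because it needs a names dataset.

```python
import secrets
import string
from collections import Counter
from itertools import product

def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

def best_bet_two_dice():
    """Return (total, ways): the most likely total of two fair dice and how
    many of the 36 equally likely outcomes produce it."""
    totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return totals.most_common(1)[0]

def generate_password(length=12):
    """Build a random password from letters, digits and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(round(bmi(70, 1.75), 1))   # 22.9
print(best_bet_two_dice())       # (7, 6)
print(generate_password())       # different on every run
```

The secrets module is used instead of random because it is designed for security-sensitive values such as passwords.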
Books:
Think Python [link]
Python Data Science Handbook by Jake VanderPlas [link]
Discussion:
Learn by Trial and Error -Or- Academic Approach
True Story: Most of your job is cleaning and debugging
Do I need to be a Statistician, a PhD, an Applied Mathematician or a Pro Programmer?
Wait for the Eureka moments; do not rush yourself
Topics to meditate on:
1. For univariate distributions:
sns.histplot(auto['normalized_losses'], kde=True)  # replaces sns.distplot, deprecated since seaborn 0.11
2. For bivariate distributions:
sns.jointplot(x=auto['engine_size'], y=auto['horsepower'])
3. For multivariate distributions:
sns.pairplot(auto[['normalized_losses', 'engine_size', 'horsepower']])
Action Spots:
- Keep .ipynb files in Azure Notebooks
- Keep the data set in Gitlab
Qs for Poll
- Stats/Programming/Business Domain
- Excel
Meme World
Confusion Matrix
Interview
TRIO
[Next Week Meme]