Python Week 1 + Week 2

Jun 1, 2019

Assets (Wk1): Slides, Data Set, Quizzes

Assets (Wk2): Slides, Data Set, Quizzes

Try this: It’s a Zen Monk among Programming Languages

import this

ML Definition

Basics:

Lists [Guessing Number] [link]
Tuples
Dictionaries [Chatbot example]

Give examples of the above
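As one possible set of examples for the three data types above, here is a minimal sketch. The function names (`play_guessing_game`, `chatbot`) and the reply texts are made up for illustration; they are not the course's own guessing-number or chatbot code.

```python
import random

# List: mutable and ordered; here it records every guess in a guessing game.
def play_guessing_game(secret, guesses):
    """Return the guesses made up to and including the correct one."""
    history = []                   # start with an empty list
    for guess in guesses:
        history.append(guess)      # lists grow in place
        if guess == secret:
            break
    return history

# Tuple: immutable and ordered; handy for a fixed record such as a (low, high) range.
guess_range = (1, 10)
low, high = guess_range            # tuple unpacking
secret_number = random.randint(low, high)

# Dictionary: key -> value lookup; here a tiny rule-based chatbot (hypothetical replies).
replies = {
    "hi": "Hello!",
    "how are you": "I'm a dictionary, so I'm well organised.",
    "bye": "Goodbye!",
}

def chatbot(message):
    # .get() returns a fallback instead of raising KeyError for unknown keys
    return replies.get(message.lower().strip(), "Sorry, I don't understand.")

print(play_guessing_game(7, [3, 9, 7, 5]))
print(chatbot("Hi"))
```

The list is the right fit where the collection changes over time, the tuple where it should not change, and the dictionary where you look things up by name.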

Introduction to NumPy [Connect to Cheat Sheet]
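A first taste of NumPy, as a rough sketch: arrays apply arithmetic to every element at once and can be filtered with a boolean mask. The distances below are invented values, not taken from any course data set.

```python
import numpy as np

# Hypothetical trip distances in miles (illustrative values only)
miles = np.array([5.1, 4.8, 7.1, 63.7])

km = miles * 1.609344          # element-wise multiplication, no explicit loop
long_trips = miles[miles > 5]  # boolean mask keeps only the matching elements

print(km.round(1))
print(long_trips)
print(miles.mean())
```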

Introduction to Pandas [Connect to Cheat Sheet]

Try this — Automated Pandas App [link]

Case Study (Uber Rides)

Uber Drive Data Analysis -

Data Set [link]

Data on one driver's Uber trips is available for the year 2016.

Your manager wants you to explore this data and report useful insights about the trip behaviour of an Uber driver.

Dataset -

The dataset contains Start Date, End Date, Start Location, End Location, Miles Driven and Purpose of drive (Business, Personal, Meals, Errands, Meetings, Customer Support, etc.).

1. Import the libraries.
2. Get the data and observe it.
3. Check for missing values; either remove them or fill them.
4. Get a summary of the data using Python functions.
5. Explore the data parameter by parameter.

Here we have information on location (start and stop), time (start and stop), category and purpose of the trip, and miles covered.
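The five steps above can be sketched roughly as follows. The column names (`START_DATE`, `END_DATE`, `START`, `STOP`, `MILES`, `PURPOSE`) and every row are assumptions made up for this illustration; check the real file's header and adapt the names before running this on the actual data set.

```python
import io
import pandas as pd  # step 1: import the libraries

# Hypothetical sample standing in for the real file; names and values are invented.
csv_text = """START_DATE,END_DATE,START,STOP,MILES,PURPOSE
2016-01-01 21:11,2016-01-01 21:17,City A,City A,5.1,Meal
2016-01-02 01:25,2016-01-02 01:37,City A,City B,5.0,
2016-01-02 20:25,2016-01-02 20:38,City B,City B,4.8,Errand
2016-01-05 17:31,2016-01-05 17:45,City B,City C,63.7,Meeting
2016-01-06 14:42,2016-01-06 15:49,City C,City A,4.3,Meeting
"""

# Step 2: get the data and observe it
trips = pd.read_csv(io.StringIO(csv_text), parse_dates=["START_DATE", "END_DATE"])
print(trips.head())

# Step 3: check missing values, then fill them (dropping rows is the alternative)
print(trips.isnull().sum())
trips["PURPOSE"] = trips["PURPOSE"].fillna("Unknown")

# Step 4: summary statistics for the numeric column
print(trips["MILES"].describe())

# Step 5: explore parameter by parameter
miles_by_purpose = trips.groupby("PURPOSE")["MILES"].sum()
trips["DURATION_MIN"] = (
    trips["END_DATE"] - trips["START_DATE"]
).dt.total_seconds() / 60
print(miles_by_purpose)
print(trips[["START", "STOP", "DURATION_MIN"]])
```

Filling missing purposes with a placeholder keeps every trip in the totals; dropping those rows instead with `dropna` is reasonable when the purpose itself is what you are analysing.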

Steps I would Recommend

1. Start Excel
2. Start Power BI
3. Take up a basic Stats class
4. Learn R
5. Apply whatever you have learnt in steps 1, 2 and 3 using R
6. Learn Python (including Data Science libraries)
7. Do some competitions on Kaggle
8. Read subject-related books
9. Try to learn ML concepts
10. Try to teach your machine to classify/cluster and predict
11. Update your resume as Data Scientist and attend interviews
12. If you fail at step 11, repeat from step 7

Why Python is my personal choice

It’s free
XKCD
Turtles 🐢
PyGame
Tkinter (for desktop apps)
Flask web application
ML Libraries
Black Hole
No semicolon business

Week 2

Introduction to Matplotlib [Cheat Sheet]

XKCD [Notebook in Kaggle]
Visualization tools
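As a small taste of the XKCD style in Matplotlib, here is a minimal sketch using `plt.xkcd()`. The data points and labels are invented for illustration, and the output file name `xkcd_plot.png` is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

weeks = [1, 2, 3, 4, 5, 6]
confidence = [2, 5, 3, 6, 8, 9]  # made-up values, purely illustrative

# The hand-drawn style applies only to figures created inside this block
with plt.xkcd():
    fig, ax = plt.subplots()
    ax.plot(weeks, confidence, marker="o")
    ax.set_xlabel("Week")
    ax.set_ylabel("Confidence in Python")
    ax.set_title("Learning curve (illustrative)")
    fig.savefig("xkcd_plot.png")
```

Matplotlib may warn about a missing comic-style font and fall back to a default one; the plot is still drawn.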

Introduction to Seaborn [Cheat Sheet]

Seaborn Datasets
Visualization tools

Introduction to Plotly [Cheat Sheet]

Interactive Charts

Case Study:

Honey production data set-

This dataset provides insight into honey production supply and demand in America by state from 1998 to 2012.

Dataset [Github link]

The dataset contains numcol, yieldpercol, totalprod, stocks, priceperlb and prodvalue, along with other useful information. Certain states are excluded every year (e.g. CT) to avoid disclosing data for individual operations.

For Reference: https://www.kaggle.com/arthurpaulino/honey-production/data

• Import the pandas, numpy, seaborn and matplotlib.pyplot packages

• Get the data

• Explore the data for non-null and extreme values

• How many states are included in the dataset?

• Which states are included in this dataset?

• Calculate the average production for each state across all years

• How many years of data are provided in the dataset, and what are the starting and ending years?
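One way to approach the questions above, as a sketch on a tiny made-up frame. It assumes the columns are named `state`, `totalprod` and `year`; the values are invented, so the printed answers will differ from those for the real file.

```python
import pandas as pd

# Hypothetical rows in the shape of the honey data (values are invented)
honey = pd.DataFrame({
    "state": ["AL", "AL", "CA", "CA", "FL", "FL"],
    "totalprod": [100.0, 120.0, 900.0, 700.0, 500.0, 540.0],
    "year": [1998, 1999, 1998, 1999, 1998, 1999],
})

honey.info()             # non-null counts per column
print(honey.describe())  # min and max help spot extreme values

# How many states, and which ones?
n_states = honey["state"].nunique()
states = sorted(honey["state"].unique())

# Average production for each state across all years
avg_prod = honey.groupby("state")["totalprod"].mean()

# How many years, and what are the first and last?
n_years = honey["year"].nunique()
first_year, last_year = honey["year"].min(), honey["year"].max()

print(n_states, states)
print(avg_prod)
print(n_years, first_year, last_year)
```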

Online Resources:

Trinket.io (for automation)

pythontutor.com (for basic input syntax test)

Colab Research (Google’s Python Notebook)

Azure Notebooks (Microsoft’s Python Notebook)

Jupyter.org (bit.do/pythontinker)

R Notebooks (Python 2 Notebook)

Rextester (Python IDE)

Repl.it (Python IDE with updated modules)

HackerEarth (for Practice)

CodeSkulptor (for CGI effects)

HackerRank (for Practice)

Blockly (for simpler understanding)

W3Schools.com (Python Module)

Tutorials Point (Python 3)

DataCamp (bit.do/skynetcode)

Exercises:

Calculate BMI

Check if a name is Male/Female

If you throw two dice, which total is your best bet?

Create a Password Generator
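If you get stuck, here is one possible sketch for three of these exercises. It assumes BMI means weight in kilograms divided by height in metres squared, that "throw two" refers to two fair six-sided dice, and that a password is a random mix of letters, digits and punctuation. The function names are hypothetical. Try the exercises yourself first.

```python
import secrets
import string
from collections import Counter
from itertools import product

def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

def best_two_dice_sum():
    """Return the most likely total of two six-sided dice and its probability."""
    # Enumerate all 36 equally likely outcomes and count each total
    totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    total, ways = totals.most_common(1)[0]
    return total, ways / 36

def generate_password(length=12):
    """Build a random password using the cryptographically secure secrets module."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(round(bmi(70, 1.75), 1))
print(best_two_dice_sum())
print(generate_password())
```

Counting outcomes instead of simulating throws gives an exact answer. A simulation with `random.randint` is a good follow-up to see the same pattern emerge.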

Books:

Think Python [link]

Python Data Science Handbook by Jake VanderPlas [link]

Discussion:

Learn by Trial and Error -Or- Academic Approach
True Story: Most of your job is cleaning and debugging
Do I need to be a Statistician, a PhD, an Applied Mathematician or a Pro Programmer?
Wait for the Eureka moments; do not rush yourself

Topics to be meditated:

1. For univariate distributions:

sns.distplot(auto['normalized_losses'])

2. For bivariate distributions:

sns.jointplot(x=auto['engine_size'], y=auto['horsepower'])

3. For multivariate distributions:

sns.pairplot(auto[['normalized_losses', 'engine_size', 'horsepower']])

Action Spots: