Python Week 1 + Week 2

Pradeep Ankem
3 min read · Jun 1, 2019


Assets (Wk1): Slides, Data Set, Quizzes

Assets (Wk2): Slides, Data Set, Quizzes

Try this: It’s a Zen Monk among Programming Languages

import this

ML Definition

Basics:

List [Guessing Number] [link]

Tuples

Dictionaries [Chatbot example]

Give examples on the above
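As a starting point for those examples, here is a minimal sketch of the three containers. The guessing-game helper, the tuple contents and the chatbot replies are made up for illustration and are not the course material.

```python
import random

# List: mutable and ordered; here the candidate numbers for a guessing game.
candidates = list(range(1, 11))
secret = random.choice(candidates)

def check_guess(guess, secret):
    """Return a hint comparing the guess with the secret number."""
    if guess < secret:
        return "too low"
    if guess > secret:
        return "too high"
    return "correct"

# Tuple: immutable and ordered; handy for fixed records.
language = ("Python", 3)
name, major_version = language  # tuple unpacking

# Dictionary: key -> value lookup; here a toy chatbot's canned replies.
replies = {
    "hi": "Hello!",
    "bye": "See you next week.",
}

def chatbot(message):
    """Look up a reply, falling back to a default for unknown messages."""
    return replies.get(message.lower().strip(), "Sorry, I did not get that.")

print(check_guess(5, secret))
print(name, major_version)
print(chatbot("Hi"))
```

Lists suit data that grows or changes, tuples suit fixed records, and dictionaries suit lookups by key.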

Introduction to NumPy [Connect to Cheat Sheet]

Introduction to Pandas [Connect to Cheat Sheet]
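A small taste of both libraries, as a sketch only: the trip purposes and mileages below are invented, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# NumPy: element-wise arithmetic on whole arrays, no explicit loop needed.
miles = np.array([5.0, 10.0, 15.0])
kilometres = miles * 1.609344  # one mile is exactly 1.609344 km

# Pandas: a labelled table built on top of NumPy arrays.
trips = pd.DataFrame({
    "purpose": ["Meeting", "Errand", "Meeting"],
    "miles": miles,
})

# Group rows by purpose and add up the miles in each group.
miles_by_purpose = trips.groupby("purpose")["miles"].sum()

print(kilometres.round(1))
print(miles_by_purpose)
```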

Try this — Automated Pandas App [link]

Case Study (Uber Rides)

Uber Drive Data Analysis -

Data Set [link]

Data on a driver's Uber trips is available for the year 2016.

Your manager wants you to explore this data and give him some useful insights into the trip behaviour of an Uber driver.

Dataset -

The dataset contains Start Date, End Date, Start Location, End Location, Miles Driven and Purpose of drive (Business, Personal, Meals, Errands, Meetings, Customer Support etc.)

Qs

1. Import the libraries
2. Get the data and observe it
3. Check for missing values, and either remove or fill them.
4. Get a summary of the data using a Python function.
5. Explore the data parameter-wise

Here we have information on location (start and stop), time (start and stop), category and purpose of trip, and miles covered.
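The five steps can be sketched roughly as follows. To keep the example runnable without the real file, it reads a tiny made-up CSV from a string. The column names (`start_date`, `end_date`, `category`, `start`, `stop`, `miles`, `purpose`) and all values are assumptions, so adjust them to match the actual data set.

```python
# Step 1: import the libraries.
import io
import pandas as pd

# Step 2: get the data and observe it (made-up stand-in for the real CSV).
csv_text = """start_date,end_date,category,start,stop,miles,purpose
2016-01-01 21:11,2016-01-01 21:17,Business,Home,Cafe,5.1,Meals
2016-01-02 01:25,2016-01-02 01:37,Business,Cafe,Home,5.0,
2016-01-05 17:31,2016-01-05 17:45,Business,Home,Office,63.7,Meetings
2016-01-06 14:42,2016-01-06 15:49,Personal,Office,Home,4.3,Errands
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["start_date", "end_date"])
print(df.head())

# Step 3: count missing values per column, then fill the gaps in `purpose`.
missing = df.isnull().sum()
df["purpose"] = df["purpose"].fillna("Unknown")

# Step 4: summary statistics for the numeric column.
miles_summary = df["miles"].describe()

# Step 5: explore parameter-wise, e.g. trip duration and miles per purpose.
df["duration_min"] = (df["end_date"] - df["start_date"]).dt.total_seconds() / 60
miles_by_purpose = df.groupby("purpose")["miles"].sum().sort_values(ascending=False)

print(missing)
print(miles_summary)
print(miles_by_purpose)
```

With the real file, the string reader would be swapped for `pd.read_csv` on the downloaded CSV path. Whether to drop or fill missing rows depends on how many there are.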

Steps I would Recommend

  1. Start Excel
  2. Start Power BI
  3. Take up basic Stats Class
  4. Learn R
  5. Apply whatever you have learnt in Steps 1, 2 and 3 using R
  6. Learn Python (including Data Science Libraries)
  7. Do some competitions in Kaggle
  8. Read Subject Related books
  9. Try to Learn ML concepts
  10. Try to teach your Machine to Classify/Cluster and Predict
  11. Update your Resume as Data Scientist and attend Interviews
  12. If you fail at step 11, repeat from step 7

Why Python is my personal choice

  • It’s free
  • XKCD
  • Turtles 🐢
  • PyGame
  • Tkinter ( for desktop apps)
  • Flask web application
  • ML Libraries
  • Black Hole
  • No semicolon business

Week 2

Introduction to Matplotlib [Cheat Sheet]

  • XKCD [Notebook in Kaggle]
  • Visualization tools
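A minimal Matplotlib sketch in the XKCD style, using the `plt.xkcd()` context manager. The purposes and mileages are invented, and the output file name is arbitrary. If the hand-drawn font is not installed, Matplotlib falls back to a default font.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt

# Made-up numbers, for illustration only.
purposes = ["Meetings", "Meals", "Errands"]
miles = [63.7, 5.1, 4.3]

# Everything created inside this block is drawn in the hand-sketched XKCD style.
with plt.xkcd():
    fig, ax = plt.subplots()
    bars = ax.bar(purposes, miles)
    ax.set_xlabel("Purpose")
    ax.set_ylabel("Miles")
    ax.set_title("Miles driven by purpose")
    fig.savefig("miles_by_purpose.png")
```

In a notebook, the backend line can be dropped and `plt.show()` used instead of saving to a file.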

Introduction to Seaborn [Cheat Sheet]

  • Seaborn Datasets
  • Visualization tools

Introduction to Plotly [Cheat Sheet]

  • Interactive Charts

Case Study:

Honey production data set-

This dataset provides insight into honey production supply and demand in America by state from 1998 to 2012.

Dataset [Github link]

The dataset contains numcol, yieldpercol, totalprod, stocks, priceperlb, prodvalue, and other useful information. Note that certain states are excluded every year (e.g. CT) to avoid disclosing data for individual operations.

For Reference: https://www.kaggle.com/arthurpaulino/honey-production/data

Qs

• Import the pandas, numpy, seaborn and matplotlib.pyplot packages

• Get the data

• Explore the data for non-null and extreme values

• How many states are included in the dataset?

• Which states are included in this dataset?

• Calculate the average production for each state across all years

• How many years of data are provided in the dataset, and what are the starting and ending years?
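One possible way to approach these questions with pandas is sketched below. The rows are made up and only a subset of columns is used. The `state` and `year` column names are assumptions based on the Kaggle version of the data. Only pandas is imported here because the sketch does no plotting.

```python
import io
import pandas as pd

# Made-up stand-in for the real CSV; swap in pd.read_csv on the downloaded file.
csv_text = """state,totalprod,priceperlb,year
AL,1000,0.70,1998
AL,1200,0.80,1999
CA,30000,0.60,1998
CA,28000,0.65,1999
ND,25000,0.55,1999
"""
honey = pd.read_csv(io.StringIO(csv_text))

# Non-null check and extreme values for the numeric columns.
null_counts = honey.isnull().sum()
extremes = honey.describe().loc[["min", "max"]]

# How many states, and which ones?
n_states = honey["state"].nunique()
states = sorted(honey["state"].unique())

# Average production for each state across all years.
avg_prod = honey.groupby("state")["totalprod"].mean()

# Number of distinct years, plus the first and last year.
n_years = honey["year"].nunique()
first_year, last_year = honey["year"].min(), honey["year"].max()

print(null_counts)
print(extremes)
print(n_states, states)
print(avg_prod)
print(n_years, first_year, last_year)
```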

Online Resources:

Trinket.io (for automation)

pythontutor.com (for basic input syntax test)

Colab Research (Google’s Python Notebook)

Azure Notebooks (Microsoft’s Python Notebook)

Jupyter.org (bit.do/pythontinker)

R Notebooks (Python 2 Notebook)

Rextester (Python IDE)

Repl.it (Python IDE with updated modules)

Hackerearth (for Practice)

CodeSkulptor (for CGI effects)

HackerRank (for Practice)

Blockly (for simpler understanding)

W3Schools.org (Python Module)

Tutorials Point (Python 3)

Data Camp (bit.do/skynetcode)

Exercises:

Calculate BMI

Check if a name is Male/Female

If you throw two dice, which number is your best bet?

Create a Password Generator
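Three of these exercises can be sketched with the standard library alone. The function names are illustrative, and the dice exercise is read here as "two fair six-sided dice", which is an assumption. The name exercise is left out because it needs a reference list of names.

```python
import secrets
import string
from collections import Counter
from itertools import product

def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

def best_two_dice_sum():
    """Return the most likely sum of two fair six-sided dice and its probability."""
    # Enumerate all 36 equally likely outcomes and count each sum.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    total, ways = counts.most_common(1)[0]
    return total, ways / 36

def make_password(length=12):
    """Build a random password from letters, digits and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    # secrets.choice draws from a cryptographically strong random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(round(bmi(70, 1.75), 1))
print(best_two_dice_sum())
print(make_password())
```

The `secrets` module is used instead of `random` because passwords should come from a source designed for security.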

Books:

Think Python [link]

Python Data Science Handbook by Jake VanderPlas [link]

Discussion:

Learn by Trial and Error -Or- Academic Approach

True Story: Most of your job is cleaning and debugging

Do I need to be a Statistician, a PhD, an Applied Mathematician or a Pro Programmer?

Wait for the Eureka Moments; do not rush yourself

Topics to be meditated:

1. For univariate distributions:

sns.histplot(auto['normalized_losses'], kde=True)  # replaces sns.distplot, deprecated since seaborn 0.11

2. For bivariate distributions:

sns.jointplot(x='engine_size', y='horsepower', data=auto)

3. For multivariate distributions:

sns.pairplot(auto[['normalized_losses', 'engine_size', 'horsepower']])

Action Spots:

  • Keep .ipynb files in Azure Notebooks
  • Keep the data set in Gitlab

Qs for Poll

  1. Stats/Programming/Business Domain
  2. Excel

Meme World

Confusion Matrix

Interview

TRIO

[Next Week Meme]
