Pandas Essentials [2023 Ed]

Pradeep Ankem
3 min read · Feb 9, 2023
Image credits to AI

LinkedIn Profile of the Author: link

Class Room ~ II

Source: Dashboards — Infragistics Reveal™ Help

Things to discuss ~ II

Datasets to be discussed:

  1. IPL (link)
  2. IMDB Movie dataset (link)
  3. IT Help desk (link)

Use Pandas Profiling on these datasets.

Feature Engineering

1. Use this link from Kaggle

2. Use this other link from a Medium article

Things to notice in Kaggle Notebook

  • Master Survival Rate
  • IsAlone Survival Rate
  • Family Size more than 4 Survival Rate
  • How to map Sex, Embarked and Title to Categories
  • Heatmap for correlation matrix (link)
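
The points in this list can be tried on a small frame before opening the notebook. The sketch below is only an illustration: the five rows are made up (not the real Titanic data), the integer codes in the mappings are arbitrary, and the column names `FamilySize`, `IsAlone`, `SexCode` and `EmbarkedCode` are assumptions rather than the notebook's exact code.

```python
import pandas as pd

# Tiny made-up sample with Titanic-style columns (not the real dataset).
full = pd.DataFrame({
    "Name": ["Braund, Mr. Owen", "Cumings, Mrs. John", "Heikkinen, Miss. Laina",
             "Palsson, Master. Gosta", "Allen, Mr. William"],
    "Sex": ["male", "female", "female", "male", "male"],
    "SibSp": [1, 1, 0, 3, 0],
    "Parch": [0, 0, 0, 1, 0],
    "Embarked": ["S", "C", "S", "S", "Q"],
    "Survived": [0, 1, 1, 0, 0],
})

# Title is the word between the comma and the first period in Name.
full["Title"] = full["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# Family size counts the passenger plus siblings/spouses and parents/children.
full["FamilySize"] = full["SibSp"] + full["Parch"] + 1
full["IsAlone"] = (full["FamilySize"] == 1).astype(int)

# Map text categories to integer codes (the code values are arbitrary).
full["SexCode"] = full["Sex"].map({"male": 0, "female": 1})
full["EmbarkedCode"] = full["Embarked"].map({"S": 0, "C": 1, "Q": 2})

# Survival rate per group is the mean of the 0/1 Survived column.
print(full.groupby("Title")["Survived"].mean())
print(full.groupby("IsAlone")["Survived"].mean())

# Correlation matrix of numeric columns; this is the table a heatmap would plot.
corr = full[["Survived", "SexCode", "FamilySize", "IsAlone"]].corr()
print(corr)
```

With seaborn installed, `sns.heatmap(corr, annot=True)` would draw the matrix as a heatmap.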

Things to try — II

  1. Blockly
  2. Sketch HowTo
  3. Generative Apps quiz
  4. All Categories Trendquiz
  5. ChatGPT quiz on Pandas functions
  6. Pandas Profiling
  7. 10 simple hacks (Link)
  8. Python wikipedia Library
  9. What are the use cases to be tried in ChatGPT
  10. Show the upcoming competition in Titanic
  11. Try Einblick

Generative Apps Universe

Source: https://t.co/ON5eIGvnEQ (Twitter)

Quotes:

1. If you know one, you know all ~ Programming Heuristic. So, master one.

2. Show me your work, and get the job

3. You learn more from debugging than documentation, so keep trying new things

4. Ensure every script uses logic instead of hard-coded values

5. Learn Shortcut to shortcut

6. If you are writing more than 4 lines, write it as a function

Datasets:

https://bit.ly/mtcarspy # mtcars dataset

https://bit.ly/boatsdinner #Titanic dataset

Use seaborn datasets

Train.csv

Test.csv

Syntax

Age between 5 and 7

full[(full['Age'] > 5.0) & (full['Age'] < 7.0 ) ] 
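
The same range filter can be written with `Series.between`. The sketch below uses a made-up `Age` column and assumes pandas 1.3 or later, where `inclusive` accepts the string `"neither"` to exclude both endpoints.

```python
import pandas as pd

# Made-up ages, including a missing value.
full = pd.DataFrame({"Age": [4.0, 5.0, 6.0, 6.5, 7.0, None]})

# Two comparisons combined with & (each side needs its own parentheses).
by_mask = full[(full["Age"] > 5.0) & (full["Age"] < 7.0)]

# Equivalent filter; "neither" excludes both 5.0 and 7.0.
by_between = full[full["Age"].between(5.0, 7.0, inclusive="neither")]

print(by_between)
```

Rows with a missing `Age` fail both filters, so they are dropped either way.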

String Contains

full[full['Cabin'].str.contains('B2', na=False)]  # keep rows whose Cabin contains 'B2'
full.isnull().sum()  # count the missing values in each column
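
A small sketch of both calls on made-up data (the `Cabin` and `Age` values here are invented for illustration):

```python
import pandas as pd

full = pd.DataFrame({
    "Cabin": ["B20", "C85", None, "B28", "E46"],
    "Age": [22.0, None, 26.0, 35.0, None],
})

# na=False treats a missing Cabin as "does not contain", so the mask is all True/False.
b2 = full[full["Cabin"].str.contains("B2", na=False)]

# One count of missing values per column.
missing = full.isnull().sum()

print(b2)
print(missing)
```

Without `na=False`, the mask would contain missing values for rows with no `Cabin`, and indexing with it would raise an error.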

Fill with mean

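x = df["Calories"].mean()  # mean of the non-missing values

df["Calories"] = df["Calories"].fillna(x)  # assign back; fillna(inplace=True) on df["Calories"] may not update df in newer pandas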
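
A self-contained sketch of the mean fill, using made-up calorie values:

```python
import pandas as pd

df = pd.DataFrame({"Calories": [400.0, None, 300.0, None, 200.0]})

# mean() skips missing values by default.
mean_calories = df["Calories"].mean()

# Assigning the result back to the column avoids relying on inplace=True.
df["Calories"] = df["Calories"].fillna(mean_calories)

print(df)
```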

Removing Rows

for x in df.index:
    if df.loc[x, "Duration"] > 120:  # drop rows longer than 120
        df.drop(x, inplace = True)
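
The row-by-row loop can usually be replaced by a single boolean mask, which is shorter and faster on large frames. A minimal sketch with made-up durations:

```python
import pandas as pd

df = pd.DataFrame({"Duration": [60, 450, 45, 120, 300]})

# Keep only the rows whose Duration is at most 120.
df = df[df["Duration"] <= 120]

print(df)
```

One difference to keep in mind: a missing `Duration` fails the `<= 120` test and is dropped here, whereas the loop keeps it because a missing value is never `> 120`.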

Links

xkcd: link

Titanic Data Science Solutions | Kaggle

BlocklyML (pradeepankem.repl.co) + repl link (link)

Word Cloud web app

Pandas sweetviz web app (link)

To Do

Trendquiz.com

Sketch to Trail (link)

Get the Titanic dataset from Kaggle using a library

Have accounts on the following:

Datalore

Colab

Github

ChatGPT

All social media platforms

Excel Online

repl.it

Kaggle

目 Reading Data From Different Sources

from pandas import read_csv, read_json, read_xml, read_excel
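
A minimal sketch of two of these readers. To keep it runnable without any files, the data is passed as in-memory text through `io.StringIO`; the sample rows are made up. `read_xml` and `read_excel` follow the same pattern but need extra packages (for example `lxml` and `openpyxl`), so they are only mentioned in a comment.

```python
import io
import pandas as pd

# Made-up sample data held in strings instead of files.
csv_text = "Name,Age\nAsha,31\nRavi,28\n"
json_text = '[{"Name": "Asha", "Age": 31}, {"Name": "Ravi", "Age": 28}]'

# Each reader accepts a path, a URL, or a file-like object.
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text), orient="records")

# pd.read_xml(...) and pd.read_excel(...) are called the same way,
# given a real file and the optional parser/engine packages.
print(from_csv)
print(from_json)
```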

目 Concatenate

Refer to  .ipynb file
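
As a quick reminder of what concatenation does, here is a minimal sketch with made-up frames (it is not taken from the notebook):

```python
import pandas as pd

a = pd.DataFrame({"Name": ["Asha", "Ravi"], "Score": [90, 85]})
b = pd.DataFrame({"Name": ["Meena"], "Score": [78]})
cities = pd.DataFrame({"City": ["Pune", "Delhi"]})

# Stack rows; ignore_index renumbers the result 0..n-1.
stacked = pd.concat([a, b], ignore_index=True)

# Place frames side by side, aligned on the index.
side_by_side = pd.concat([a, cities], axis=1)

print(stacked)
print(side_by_side)
```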

目 Merging and Joining a Dataframe

Refer to Jake VanderPlas book
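
A minimal sketch of `merge` on a shared key column. The employee and manager tables are made up for illustration and are not copied from the book.

```python
import pandas as pd

employees = pd.DataFrame({"emp": ["Asha", "Ravi", "Meena"],
                          "dept": ["HR", "IT", "IT"]})
managers = pd.DataFrame({"dept": ["HR", "IT", "Ops"],
                         "manager": ["Kiran", "Lata", "Omar"]})

# Default is an inner join: only departments present in both frames.
inner = employees.merge(managers, on="dept")

# An outer join keeps unmatched rows from both sides and fills gaps with NaN.
outer = employees.merge(managers, on="dept", how="outer")

print(inner)
print(outer)
```

Here "Ops" has a manager but no employees, so it appears only in the outer join, with a missing `emp`.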

目 Re-shaping the Dataframe

link
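
One common reshape is going from wide to long with `melt` and back with `pivot`. A minimal sketch on made-up marks (the column names are assumptions chosen for the example):

```python
import pandas as pd

wide = pd.DataFrame({"Name": ["Asha", "Ravi"],
                     "Math": [90, 85],
                     "Science": [80, 95]})

# Wide to long: one row per (Name, Subject) pair.
long = wide.melt(id_vars="Name", var_name="Subject", value_name="Score")

# Long back to wide: one row per Name, one column per Subject.
back = long.pivot(index="Name", columns="Subject", values="Score")

print(long)
print(back)
```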

目 Pivot Table

Jake Book link
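
A minimal `pivot_table` sketch on made-up sales rows (not an example from the book). Unlike `pivot`, it aggregates when several rows fall into the same cell.

```python
import pandas as pd

sales = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "South"],
    "Product": ["A", "B", "A", "A", "B"],
    "Amount": [100, 150, 200, 50, 300],
})

# Rows are regions, columns are products, cells hold the summed Amount.
table = sales.pivot_table(index="Region", columns="Product",
                          values="Amount", aggfunc="sum")

print(table)
```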

目 Duplicate

Below link for duplicate removal

Tutorials Point link

link
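
A short sketch of finding and removing duplicates, using made-up rows:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Asha", "Ravi", "Asha", "Asha"],
                   "City": ["Pune", "Delhi", "Pune", "Goa"]})

# True for every row that repeats an earlier row in all columns.
dup_mask = df.duplicated()

# Drop fully identical rows, keeping the first occurrence.
unique_rows = df.drop_duplicates()

# Treat rows as duplicates when only the Name column matches.
unique_names = df.drop_duplicates(subset="Name", keep="first")

print(unique_rows)
print(unique_names)
```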

目 Map and Reshape

目 Group-by in Pandas

Use functions from Jake VanderPlas & Kaggle Notebook (link)

Kaggle Notebook link
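
A minimal group-by sketch in the spirit of the Titanic notebook. The six rows are made up, and the output column names `passengers` and `mean_fare` are assumptions chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Survived": [1, 1, 0, 0, 1, 0],
    "Fare": [80.0, 60.0, 20.0, 8.0, 7.0, 9.0],
})

# Survival rate per class: the mean of a 0/1 column.
rate = df.groupby("Pclass")["Survived"].mean()

# Several aggregations at once, each with its own output column name.
summary = df.groupby("Pclass").agg(
    passengers=("Survived", "size"),
    mean_fare=("Fare", "mean"),
)

print(rate)
print(summary)
```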

目 Transpose
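
Transposing swaps rows and columns. A minimal sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"Math": [90, 85], "Science": [80, 95]},
                  index=["Asha", "Ravi"])

# .T is shorthand for df.transpose(): the index becomes the columns.
t = df.T

print(t)
```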
