Learn the basics of business decision making with Python code

With all the hype around the data science field, most of us jump directly into machine learning models and algorithms to make business decisions. Many online courses fail to teach the very basics of decision making. Hypothesis testing is one of the oldest and most basic building blocks of decision making. One of the earliest uses of hypothesis testing was in the 1700s, when John Arbuthnot tested whether male and female births are equally likely to occur.
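As a rough illustration of the births question mentioned above, the following sketch computes an exact two-sided binomial p-value using only the standard library. The function name `binomial_two_sided_p` and the birth counts are hypothetical and chosen for illustration; they are not taken from the article or from Arbuthnot's records.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under the null hypothesis.
    """
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in range(trials + 1)]
    observed = pmf[successes]
    # Small relative tolerance so that ties are not lost to floating-point error.
    tail = sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical sample: 60 male births out of 100.
# Null hypothesis: male and female births are equally likely (p = 0.5).
p_value = binomial_two_sided_p(successes=60, trials=100)
reject_at_5_percent = p_value < 0.05
print(f"p-value = {p_value:.4f}, reject H0 at 5%: {reject_at_5_percent}")
```

A small p-value would suggest that the observed split is unlikely if both outcomes were equally probable; the 5% threshold used here is a common convention, not a rule.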

In this article, we will discuss hypothesis testing at the beginner level along with Python code making…

Innovation, News, Technology

A data collaboration tool that eliminates data disorganization for small teams without any IT lift

Source: DataLogz

Accept it or not, every data science or analytics team has struggled to manage, organize, and collaborate on the datasets it works with. DataLogz is a free web tool that offers a zero-implementation-cost solution for data science and analytics teams to organize data without complicated IT procedures. The tool can be used immediately without any hassle, which helps teams understand data faster, generate valuable insights, and document data in a modern way instead of in a traditional spreadsheet or Word document, which, of course, no one likes to read.

Before I jump into the working of the app…

Data Science, Statistics

Statistical concepts with examples, formulas, and Python code

Source: flexjobs


1. Estimate of Location

  • Mean
  • Trimmed Mean
  • Weighted Mean
  • Median
  • Mode

2. Estimate of Variability

  • Deviation
  • Mean Absolute Deviation
  • Median Absolute Deviation
  • Variance
  • Standard Deviation
  • Interquartile Range

3. Correlation

Understanding the dataset

We will be using a simple product details dataset, which contains Product ID, Cost Price, and Selling Price, to demonstrate various statistical methods.
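The estimates listed above can be sketched with Python's standard `statistics` module. The price lists, the `units_sold` weights, and the helper `pearson` below are hypothetical stand-ins, since the article's actual dataset is not shown here.

```python
import statistics

# Hypothetical product data; the values are invented for illustration.
selling_price = [10.0, 20.0, 20.0, 30.0, 40.0, 120.0]
cost_price = [8.0, 15.0, 12.0, 22.0, 30.0, 90.0]
units_sold = [5, 3, 2, 4, 1, 1]  # hypothetical weights for the weighted mean

# 1. Estimates of location
mean_price = statistics.mean(selling_price)
# Trimmed mean: drop the single lowest and highest value before averaging.
trimmed_mean_price = statistics.mean(sorted(selling_price)[1:-1])
weighted_mean_price = sum(w * x for w, x in zip(units_sold, selling_price)) / sum(units_sold)
median_price = statistics.median(selling_price)
mode_price = statistics.mode(selling_price)

# 2. Estimates of variability
deviations = [x - mean_price for x in selling_price]
mean_abs_dev = statistics.mean([abs(d) for d in deviations])
median_abs_dev = statistics.median([abs(x - median_price) for x in selling_price])
variance = statistics.variance(selling_price)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(selling_price)      # sample standard deviation
q1, _, q3 = statistics.quantiles(selling_price, n=4)
iqr = q3 - q1


# 3. Correlation (Pearson), written out by hand so it runs on Python 3.9
def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


cost_vs_price = pearson(cost_price, selling_price)

print(mean_price, trimmed_mean_price, median_price, mode_price)
print(mean_abs_dev, median_abs_dev, variance, std_dev, iqr, cost_vs_price)
```

With one very large price in the sample, the trimmed mean and the median sit well below the plain mean, which is the usual reason for preferring them when outliers are present.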


The stable version of Python 3.9 is here

Photo by Markus Winkler on Unsplash

The stable version of Python 3.9.0 was released on 5 October 2020. Let’s look at the major new features.

Dictionary Merging

Consider two dictionaries that have the same key-value pairs except for one value. For example, a person's email id has changed recently: the new value is in a new dictionary (b), and you would like to update the email id in the original dictionary (a), which contains the other details.

There are two ways to update dictionary a in Python 3.9.

Output: {'id' : 10, 'username' : 'python3.9', 'email' : ''}
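The article's own code is not shown in this excerpt. A minimal sketch of the two approaches, assuming they are the merge operator `|` and the in-place update operator `|=` that Python 3.9 added for dictionaries, could look like this. The email addresses are hypothetical placeholders.

```python
# Hypothetical records; the email values are invented for illustration.
a = {"id": 10, "username": "python3.9", "email": "old@example.com"}
b = {"id": 10, "username": "python3.9", "email": "new@example.com"}

# Way 1: the merge operator returns a new dictionary and leaves a untouched.
# On duplicate keys, the value from the right-hand operand wins.
merged = a | b

# Way 2: the update operator modifies a in place.
a |= b

print(merged)
print(a)
```

Use `|` when the original dictionary must stay unchanged, and `|=` when updating it in place is the goal.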

Type Hints

Python now supports native type hinting. You can have…
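The excerpt is cut off, so which feature the author had in mind is an assumption. One Python 3.9 change in this area is that built-in collection types such as `list` and `dict` can be subscripted directly in annotations, without importing `List` or `Dict` from `typing`. The function `count_words` below is a hypothetical example of that style.

```python
def count_words(words: list[str]) -> dict[str, int]:
    """Count how often each word occurs, annotated with built-in generics."""
    counts: dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


word_counts = count_words(["python", "hints", "python"])
print(word_counts)
```

The annotations are not enforced at runtime; they are read by type checkers and editors.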

All commonly used MySQL functions in one place with examples and a short explanation.


MySQL provides a ton of functions, and many of them are used extensively. I will cover the most commonly used functions with a short description of each. The intention of the article is to provide one spot for all MySQL functions so that you can quickly go through it before an interview or an examination. I’m assuming you already have basic knowledge of SQL. Without wasting your time, let me jump directly into the functions.

Before that, I would like you to know that I have used MySQL Workbench and the employee database to execute the queries. …

Building web-based visualization in Python from scratch

Bokeh is a data visualization library in Python. It provides highly interactive graphs and plots. What makes it different from other Python plotting libraries is that Bokeh renders its output as a web page: if we run the code in a Python editor, the resulting plot opens in the browser. This makes it possible to embed a Bokeh plot on any website built with Django or Flask.

Most of us are familiar with the iris dataset, which has morphological data for three flower species: Setosa, Virginica, and Versicolor.

Data Science

Are human data scientists really required?

Source: Forbes

dabl stands for Data Analysis Baseline Library. The idea behind dabl is to automate supervised learning and reduce boilerplate for common tasks. When building any predictive model, the data has to be cleaned, analyzed, and run through many models with different parameter settings to get the best accuracy, which takes many lines of code and many hours of work. dabl handles all of these tasks in very few lines of code, saving time and money for anyone who handles large amounts of data every day.

The main idea behind developing the library is to allow data scientists to spend…

Data Science

Getting the most out of Matplotlib and Seaborn

Photo by Anthony Fomin on Unsplash

CitiBike is New York City’s famous bike rental company and the largest in the USA. CitiBike launched in May 2013 and has become an essential part of the transportation network. It makes commuting fun, efficient, and affordable, not to mention healthy and good for the environment.

I got the June 2013 CitiBike rider data from Kaggle. I will walk you through the complete exploratory data analysis, answering questions such as:

  1. Where do CitiBikers ride?
  2. When do they ride?
  3. How far do they go?
  4. Which stations are most popular?
  5. What days of the week are…

Data Science

Advanced methods and functions to crunch some data

Source: Amazon

Pandas is used mainly for reading, cleaning, and extracting insights from data. We will look at some advanced uses of Pandas that are very important to a data scientist. These operations are used to analyze data and manipulate it if required, typically in the steps performed before building any machine learning model.

  1. Summarising Data
  2. Concatenation
  3. Merge and Join
  4. Grouping
  5. Pivot Table
  6. Reshaping multi-index DataFrame
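As a rough preview of the six topics listed above, here is a sketch on a tiny hand-made table instead of the real Titanic data. The column names and all values are hypothetical, loosely modelled on Titanic-style columns.

```python
import pandas as pd

# Hypothetical passenger rows; values are invented for illustration.
passengers = pd.DataFrame({
    "passenger_id": [1, 2, 3, 4, 5, 6],
    "sex": ["male", "female", "female", "male", "male", "female"],
    "pclass": [3, 1, 3, 1, 2, 2],
    "fare": [7.25, 71.28, 7.92, 53.10, 13.00, 30.07],
    "survived": [0, 1, 1, 0, 0, 1],
})

# 1. Summarising data: count, mean, std, min, quartiles, max of one column.
fare_summary = passengers["fare"].describe()

# 2. Concatenation: stack another frame with the same columns underneath.
extra = pd.DataFrame({
    "passenger_id": [7], "sex": ["male"], "pclass": [3],
    "fare": [8.05], "survived": [0],
})
all_passengers = pd.concat([passengers, extra], ignore_index=True)

# 3. Merge and join: attach a lookup table on a shared key.
class_names = pd.DataFrame({
    "pclass": [1, 2, 3],
    "class_name": ["First", "Second", "Third"],
})
labelled = all_passengers.merge(class_names, on="pclass", how="left")

# 4. Grouping: mean fare per class.
mean_fare_by_class = labelled.groupby("class_name")["fare"].mean()

# 5. Pivot table: survival rate by sex (rows) and class (columns).
survival_rate = labelled.pivot_table(
    values="survived", index="sex", columns="class_name", aggfunc="mean"
)

# 6. Reshaping a multi-index: a two-level grouped Series turned into a wide table.
fare_by_sex_and_class = labelled.groupby(["sex", "class_name"])["fare"].mean()
wide = fare_by_sex_and_class.unstack("class_name")

print(fare_summary, mean_fare_by_class, survival_rate, wide, sep="\n\n")
```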

We will be using the very famous Titanic dataset to explore the functionalities of Pandas. Let’s quickly import NumPy and Pandas, and load the Titanic dataset from Seaborn.

import numpy as np
import pandas as pd
import seaborn…

Using the classic Titanic dataset to unleash the power of Pandas.

Courtesy-Movie Still (Kung Fu Panda 2)

If you are already familiar with NumPy, Pandas is just a package built on top of it. Pandas provides more flexibility than NumPy for working with data. While a NumPy array can only store values of a single data type (dtype), Pandas has the flexibility to store values of multiple data types. Hence, we say Pandas is heterogeneous. We will unpack several more advantages of Pandas today.

Since we will be referring to NumPy in every section, I’m assuming you have knowledge of NumPy. If not, I will be dropping links to resources at the end of the article.
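The dtype difference described above can be seen in a short sketch. The arrays and the small table are hypothetical examples, not the article's data.

```python
import numpy as np
import pandas as pd

# A NumPy array has a single dtype, so mixed inputs are coerced to a common type.
numbers = np.array([1, 2.5])   # the integer is upcast to float
mixed = np.array([1, "a"])     # the integer is converted to a string

# A DataFrame keeps a separate dtype for each column.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["a", "b"],
    "price": [1.5, 2.5],
})

print(numbers.dtype, mixed.dtype)
print(df.dtypes)
```

Each DataFrame column is still homogeneous on its own; the heterogeneity is across columns.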

I’m considering the…

Sujan Shirol

Python developer | Studying Master's in Data Science | I believe teaching is the best way to learn |
