With all the hype around the data science field, most of us jump directly into machine learning models and algorithms to make business decisions. Many online courses skip the very basics of decision making. Hypothesis testing is one of the oldest and most basic building blocks of decision making. One of its earliest uses was in the 1700s, when John Arbuthnot tested whether male and female births are equally likely to occur.
In this article, we will be discussing everything about hypothesis testing at the beginner level, along with Python code making…
Like it or not, every data science/analytics team has faced difficulty in managing, organizing, and collaborating on the datasets they work with. DataLogz is a free web tool that offers a zero-implementation-cost solution for data science and analytics teams to organize data without complicated IT procedures. The tool can be used immediately without any hassle. It helps teams understand data faster to generate valuable insights, and it lets them document data in a modern way instead of in a traditional spreadsheet or Word document, which of course no one likes to read.
2. Estimate of Variability
3. Correlation
We will be using a simple product details dataset, which contains Product ID, Cost Price, and Selling Price, to demonstrate various statistical methods.
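As a rough illustration of the variability and correlation topics listed above, here is a minimal sketch using only the standard library. The product rows, the variable names, and the pearson helper are all hypothetical; they stand in for the product details dataset, whose actual values are not shown here.

```python
import statistics

# Hypothetical product records: (product_id, cost_price, selling_price).
# These values are made up for illustration; they are not the article's data.
products = [
    ("P1", 10.0, 12.0),
    ("P2", 20.0, 24.0),
    ("P3", 30.0, 36.0),
    ("P4", 40.0, 48.0),
]

cost = [row[1] for row in products]
selling = [row[2] for row in products]

# Estimates of variability for the cost price (sample statistics).
cost_variance = statistics.variance(cost)
cost_std = statistics.stdev(cost)
cost_range = max(cost) - min(cost)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Correlation between cost price and selling price.
correlation = pearson(cost, selling)

print(cost_variance, cost_std, cost_range, correlation)
```

In this made-up data every selling price is a fixed multiple of the cost price, so the correlation comes out at (very nearly) 1; real product data would usually be noisier.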
The stable version of Python 3.9.0 was released on 5 October 2020. Let’s see the major new features.
Consider two dictionaries that have the same key-value pairs except for one value. For example, a person's email id has changed recently and the new value is in a new dictionary (b), and you would like to update the email id in the original dictionary (a), which contains the other details.
There are two ways to do it (updating dictionary a) in Python 3.9.
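The code for this step is not shown here. One plausible reading is the pair of dictionary union operators added in Python 3.9, sketched below. The old email value in a is an assumption made for illustration; only the final result is given in the text.

```python
# Hypothetical starting values: the old email address is assumed,
# the other fields match the result shown in the text.
a = {"id": 10, "username": "python3.9", "email": "python@gmail.com"}
b = {"email": "newpython@gmail.com"}

# Way 1: the union operator returns a new dictionary; values from b win.
merged = a | b

# Way 2: the in-place union operator updates a directly.
a |= b

print(a)
```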
Output: {'id': 10, 'username': 'python3.9', 'email': 'newpython@gmail.com'}
Python 3.9 supports type hints written with the built-in collection types themselves, such as list and dict. You can have…
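A small sketch of what this can look like, assuming the feature meant here is the use of built-in collection types as generic type hints in Python 3.9. The function name and data are hypothetical.

```python
# Built-in collection types used directly in annotations (Python 3.9+),
# with no imports from the typing module.
def average_scores(scores: dict[str, list[int]]) -> dict[str, float]:
    """Return the mean score for each name."""
    return {name: sum(values) / len(values) for name, values in scores.items()}


result = average_scores({"alice": [1, 2, 3], "bob": [10, 20]})
print(result)
```

The hints are not enforced at runtime; they document intent and can be checked by external tools such as static type checkers.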
MySQL provides a large number of functions, and many of them are used extensively. I will cover the most commonly used ones, each with a short description. The intention of this article is to provide one place for these MySQL functions so that you can quickly go through them before an interview or an examination. I’m assuming you already have basic knowledge of SQL. Without wasting your time, let me jump directly into the functions.
Before that, I would like you to know that I used MySQL Workbench to run the queries, along with the employee database. …
Bokeh is a data visualization library in Python. It provides highly interactive graphs and plots. What makes it different from other Python plotting libraries is that its output is rendered as a web page: if we run the code in a Python editor, the resulting plot opens in the browser. This makes it easy to embed a Bokeh plot in any website built with Django or Flask.
Most of us are familiar with the iris dataset, which has morphological data for three flower species: Setosa, Virginica, and Versicolor. …
dabl stands for Data Analysis Baseline Library. The idea behind dabl is to automate supervised learning and reduce boilerplate for common tasks. When building any predictive model, the data has to be cleaned, analyzed, and run through many models with different parameter settings to get the best accuracy, which takes many lines of code and a lot of time. dabl handles all these tasks in very few lines of code, saving time and money for anyone handling large amounts of data each day.
The main idea behind developing the library is to allow data scientists to spend…
CitiBike is New York City’s famous bike rental company and the largest in the USA. CitiBike launched in May 2013 and has become an essential part of the transportation network. It makes commuting fun, efficient, and affordable, not to mention healthy and good for the environment.
I got the June 2013 CitiBike rider data from Kaggle. I will walk you through the complete exploratory data analysis, answering questions like:
Pandas is used mainly for reading, cleaning, and extracting insights from data. We will look at some advanced uses of Pandas that are very important to a data scientist. These operations are used to analyze data and manipulate it if required, typically in the steps performed before building any machine learning model.
We will be using the very famous Titanic dataset to explore the functionality of Pandas. Let’s quickly import NumPy and Pandas, and load the Titanic dataset from Seaborn.
import numpy as np
import pandas as pd…
If you are already familiar with NumPy, Pandas is just a package built on top of it. Pandas provides more flexibility than NumPy for working with data. While a regular NumPy array stores values of a single data type (dtype), Pandas has the flexibility to store values of multiple data types. Hence, we say Pandas is heterogeneous. We will unpack several more advantages of Pandas today.
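The difference in data types can be seen in a short sketch. The column names and values below are made up for illustration and are not taken from any dataset used in the article.

```python
import numpy as np
import pandas as pd

# A regular NumPy array has one dtype, so mixed inputs are coerced
# to a common type. Here the numbers are converted to strings.
mixed = np.array([1, 2.5, "three"])
print(mixed.dtype)

# A DataFrame keeps a separate dtype per column, so text, integers,
# and floats can sit side by side. Names and values are hypothetical.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob"],
        "age": [29, 41],
        "fare": [7.25, 71.28],
    }
)
print(df.dtypes)
```

Each column of the DataFrame is still homogeneous on its own; the heterogeneity comes from different columns being allowed different dtypes.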
Since we will be referring to NumPy in every section, I’m assuming you have some knowledge of NumPy. If not, I will be dropping links to resources at the end of the article.
I’m considering the…
Python developer | Studying Master's in Data Science | I believe teaching is the best way to learn | www.linkedin.com/in/sujan-shirol/