Viswa Teja
May 20, 2021


Python and its libraries

Python and its useful Libraries

Python is a fascinating language. It caught my eye when I was looking through Quora’s Q&As. I was then captivated by the simplicity of the language, which got me reading a lot of articles about it. I would, therefore, like to share my views on some of Python's libraries. Python is an ocean of libraries. Its beauty lies in doing away with semicolons, keeping programs short, and skipping a separate compile step. You can think of C programming versus Python as making noodles from scratch versus instant noodles.

What you will learn today:

  1. numpy
  2. pandas
  3. sklearn
  4. matplotlib
  5. seaborn
  6. plotly
  7. Tensorflow

Numpy:

NumPy is a Python library that adds support for multidimensional arrays and matrices, along with functions to create them and compute on them. We can generate as many random numbers as we need and initialize arrays for later use.

Features:

  1. Integrated with python
  2. Multidimensional arrays and matrices replacing scalars
  3. Vectorized routines that cut execution time compared with plain Python loops

Let’s get started with NumPy.

#Importing libraries
import numpy as np
#array of the integers 0 to 19
arr = np.arange(0,20)
#finding square root of array
np.sqrt(arr)
#finding logarithm of an array (log(0) gives -inf with a warning)
np.log(arr)
#sum of all the elements
np.sum(arr)

It is as simple as it looks. Trying these out should make you curious to explore them further.

#Arrays from random numbers
#randn(10) draws 10 samples from the standard normal distribution
np.random.randn(10)

#That gives
array([ 0.203699 , -0.33052418, -0.40276094, -0.15157177,
-1.00109237,-0.5573472 , 0.62844561, -0.00810115,
-0.23314955, -0.80874789])
#the following can also be performed
np.arange(1,51).reshape(5,10) / 50
#we get
[Ans]
array([[0.02, 0.04, 0.06, 0.08, 0.1 , 0.12, 0.14, 0.16, 0.18, 0.2 ],
[0.22, 0.24, 0.26, 0.28, 0.3 , 0.32, 0.34, 0.36, 0.38, 0.4 ],
[0.42, 0.44, 0.46, 0.48, 0.5 , 0.52, 0.54, 0.56, 0.58, 0.6 ],
[0.62, 0.64, 0.66, 0.68, 0.7 , 0.72, 0.74, 0.76, 0.78, 0.8 ],
[0.82, 0.84, 0.86, 0.88, 0.9 , 0.92, 0.94, 0.96, 0.98, 1. ]])
np.linspace(0,1,20)
#we get
[Ans]
array([ 0. , 0.05263158, 0.10526316, 0.15789474,
0.21052632, 0.26315789, 0.31578947, 0.36842105,
0.42105263, 0.47368421,
0.52631579, 0.57894737, 0.63157895, 0.68421053,
0.73684211, 0.78947368, 0.84210526, 0.89473684,
0.94736842, 1. ])
#Matrix is as easy as arrays
mat = np.arange(1,26).reshape(5,5)
mat
#we get
[Ans]
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
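
Here is a small sketch of what we can do with a matrix like `mat`: pick out an element, slice a block, sum along an axis, and multiply it by its transpose. The variable names (`element`, `corner`, and so on) are just my own picks for illustration.

```python
import numpy as np

# Same 5x5 matrix as above
mat = np.arange(1, 26).reshape(5, 5)

# Indexing: row 1, column 2 (both counted from zero)
element = mat[1, 2]

# Slicing: the 2x2 block in the bottom-right corner
corner = mat[3:, 3:]

# Aggregating along an axis: axis=0 sums each column
column_sums = mat.sum(axis=0)

# Matrix product of mat with its transpose
product = mat @ mat.T

print(element)
print(corner)
print(column_sums)
print(product.shape)
```

Swapping `axis=0` for `axis=1` sums each row instead of each column.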

Pandas:

Pandas is all about data frames and series. It creates data frames and lets us change them: grouping rows by the values of a categorical column, indexing, computing aggregates, and so on. It can perform numerical analysis on the data and gets it ready to be visualized graphically.

Both NumPy and pandas offer ways to select data points and run numerical calculations on them. Together, these libraries allow us to gain more insight into datasets.

Consider an example:

#Reading a dataset into the dataframe
import pandas as pd
df = pd.read_csv('scores.csv')
df.head(2)
#we get

   Hours  Scores
0    2.5      21
1    5.1      47
###########################################
df.info()
#we get
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25 entries, 0 to 24
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Hours   25 non-null     float64
 1   Scores  25 non-null     int64
dtypes: float64(1), int64(1)
memory usage: 528.0 bytes
##########################################
df.describe()
#we get
           Hours     Scores
count  25.000000  25.000000
mean    5.012000  51.480000
std     2.525094  25.286887
min     1.100000  17.000000
25%     2.700000  30.000000
50%     4.800000  47.000000
75%     7.400000  75.000000
max     9.200000  95.000000

The rest can be explored at your own pace with the same techniques.
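
The grouping, indexing, and aggregation mentioned at the start of this section can be sketched roughly like this. To keep it self-contained, the data frame is typed in from the first five rows of scores.csv instead of reading the file. The `Level` column and its 5-hour cutoff are assumptions I made up purely for illustration.

```python
import pandas as pd

# First five rows of scores.csv, typed in so the sketch runs without the file
df = pd.DataFrame({
    "Hours": [2.5, 5.1, 3.2, 8.5, 3.5],
    "Scores": [21, 47, 27, 75, 30],
})

# Indexing: rows where the student studied more than 3 hours
long_study = df[df["Hours"] > 3]

# Hypothetical categorical column: "high" if Hours >= 5, else "low"
df["Level"] = df["Hours"].apply(lambda h: "high" if h >= 5 else "low")

# Grouping and aggregating: mean score and row count per level
summary = df.groupby("Level")["Scores"].agg(["mean", "count"])

print(long_study)
print(summary)
```

With a real categorical column in your dataset, the `apply` step is not needed and you can group on that column directly.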

Sklearn:

sklearn (scikit-learn) is used by beginners and experts alike to explore machine learning models. It provides modules for many models, such as linear regression and decision trees. We just need to fit and evaluate a model to understand more about it. This library reduces lengthy code to a few calls: fit, transform, and predict. So, let me take a model as an example.

First, import pandas and NumPy as discussed above.

Import Required Libraries:

#Importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Don't worry about matplotlib. It's just a visualization library to visually represent a group of data points.

Libraries were easy to import, right?

Let’s get to the next fruit.

Using these libraries to import the dataset:

Before that let's discuss the dataset

A dataset is nothing but a collection of data, with related or similar kinds of values in each column. Each column is referred to as a feature and each row as a sample.

Let's consider a simple example dataset of students' scores:

                         scores.csv
Hours,Scores
2.5, 21
5.1, 47
3.2, 27
8.5, 75
3.5, 30
1.5, 20
9.2, 88
5.5, 60
8.3, 81
2.7, 25
7.7, 85
5.9, 62
4.5, 41
3.3, 42
1.1, 17
8.9, 95
2.5, 30
1.9, 24
6.1, 67
7.4, 69
2.7, 30
4.8, 54
3.8, 35
6.9, 76
7.8, 86

As we can see, the dataset above consists of 25 rows and 2 columns.

Let's go deeper.

Importing a Dataset:

#importing a dataset is so easy
df = pd.read_csv('scores.csv')
#look at the first rows of the dataframe
df.head()
#check for null values
df.isnull().sum()
#since this dataset has few data points, nulls would be easy to spot by eye,
#but with large datasets that quickly becomes impractical

Assigning variables to make output and input columns:

X = df.iloc[:,:-1].values #all columns except the last one (Hours)
y = df.iloc[:,1].values #the last column (Scores)

Dividing data into train data and test data:

This is done mainly to test the model, because we never know how it performs until it is tested on data it has not seen.

#Import the package to split the data
from sklearn.model_selection import train_test_split
#splitting the data
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.20, random_state=42)
#Make sure the features are 2-D, shape (n_samples, 1), as sklearn expects (X already has this shape here)
X_train = X_train.reshape(-1,1)
X_test = X_test.reshape(-1,1)

Fitting the model and predicting the model:

#import sklearn and fit the model 
from sklearn.linear_model import LinearRegression
print('importing regression model')
lr = LinearRegression()
lr.fit(X_train,y_train)
#Predict on the test data
predictions = lr.predict(X_test)
predictions
#the following predictions are obtained after training
array([83.18814104, 27.03208774, 27.03208774, 69.63323162, 59.95115347])
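
To see how far the predictions are from the real scores, we can compare them with `y_test`. The sketch below repeats the steps above with the scores.csv values typed in directly, so it runs on its own. Choosing mean absolute error and R² as the metrics is my own pick; other metrics would work just as well.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hours and Scores from scores.csv
hours = [2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7, 7.7, 5.9, 4.5,
         3.3, 1.1, 8.9, 2.5, 1.9, 6.1, 7.4, 2.7, 4.8, 3.8, 6.9, 7.8]
scores = [21, 47, 27, 75, 30, 20, 88, 60, 81, 25, 85, 62, 41,
          42, 17, 95, 30, 24, 67, 69, 30, 54, 35, 76, 86]

X = np.array(hours).reshape(-1, 1)  # one feature column
y = np.array(scores)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

lr = LinearRegression()
lr.fit(X_train, y_train)
predictions = lr.predict(X_test)

# Average absolute gap between predicted and actual scores
mae = mean_absolute_error(y_test, predictions)
# Share of the variance in the test scores explained by the model
r2 = r2_score(y_test, predictions)

print("slope:", lr.coef_[0], "intercept:", lr.intercept_)
print("MAE:", mae, "R2:", r2)
```

A lower MAE and an R² closer to 1 both mean the fitted line follows the test data more closely.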

Matplotlib:

Matplotlib is a library for visualizing data graphically. Plots such as bar graphs, histograms, and scatter plots are drawn with matplotlib. We can add labels to the x and y axes, add a title, and even print the value on top of each bar of a bar graph or histogram.

We can even plot stacked bars too. If we go deeper we can find many more features that can be explored.

Plotting a scatter plot is easy, and the result looks beautiful.

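#syntax of matplotlib
import matplotlib.pyplot as plt
#pass the required arguments inside each plot call
#example data: the first rows of scores.csv
hours = [2.5, 5.1, 3.2, 8.5, 3.5]
scores = [21, 47, 27, 75, 30]
#plotting a histogram
plt.hist(scores)
plt.xlabel('Scores')
plt.ylabel('Count')
plt.show()
#plotting a bar graph
plt.bar(hours, scores)
plt.show()
#plotting a scatterplot
plt.scatter(hours, scores)
plt.show()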
Scatterplot

Plotting a bar graph:

A bar graph between actual and predicted values
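
A grouped bar graph of actual versus predicted values could be drawn roughly like this. The sketch refits the linear regression on the scores.csv values so that it runs on its own. The bar width, labels, and title are my own choices, not the settings behind the original figure.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hours and Scores from scores.csv
hours = [2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7, 7.7, 5.9, 4.5,
         3.3, 1.1, 8.9, 2.5, 1.9, 6.1, 7.4, 2.7, 4.8, 3.8, 6.9, 7.8]
scores = [21, 47, 27, 75, 30, 20, 88, 60, 81, 25, 85, 62, 41,
          42, 17, 95, 30, 24, 67, 69, 30, 54, 35, 76, 86]
X = np.array(hours).reshape(-1, 1)
y = np.array(scores)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)
predictions = LinearRegression().fit(X_train, y_train).predict(X_test)

# One pair of bars per test sample, shifted left and right of the tick
positions = np.arange(len(y_test))
width = 0.4

fig, ax = plt.subplots()
ax.bar(positions - width / 2, y_test, width, label="Actual")
ax.bar(positions + width / 2, predictions, width, label="Predicted")
ax.set_xlabel("Test sample")
ax.set_ylabel("Score")
ax.set_title("Actual vs predicted scores")
ax.set_xticks(positions)
ax.legend()
plt.show()
```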

Plotting a stacked bar graph with annotation

A Stacked bar graph
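
One way to sketch a stacked bar graph with annotations is to pass `bottom=` to the second `bar` call and then write the totals with `ax.text`. The students and marks below are made-up numbers, used only for illustration.

```python
import matplotlib.pyplot as plt

# Made-up numbers, purely for illustration
students = ["A", "B", "C"]
theory = [30, 45, 25]
practical = [20, 35, 40]

x = list(range(len(students)))

fig, ax = plt.subplots()
ax.bar(x, theory, label="Theory")
# bottom= stacks the second series on top of the first
ax.bar(x, practical, bottom=theory, label="Practical")

# Annotate each stack with its total, just above the top of the bar
totals = [t + p for t, p in zip(theory, practical)]
for position, total in zip(x, totals):
    ax.text(position, total + 1, str(total), ha="center")

ax.set_xticks(x)
ax.set_xticklabels(students)
ax.set_ylabel("Marks")
ax.legend()
plt.show()
```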

We can even have as many subplots as we want. They can be placed side by side or one above the other, as specified by the user.

Matplotlib subplots
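
A minimal sketch of side-by-side subplots, using the first five rows of scores.csv as sample data. The figure size and titles are arbitrary choices of mine.

```python
import matplotlib.pyplot as plt

# First five rows of scores.csv
hours = [2.5, 5.1, 3.2, 8.5, 3.5]
scores = [21, 47, 27, 75, 30]

# 1 row, 2 columns gives side-by-side plots;
# plt.subplots(2, 1) would stack them one above the other
fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))

left.scatter(hours, scores)
left.set_title("Scatter")

right.hist(scores, bins=5)
right.set_title("Histogram")

fig.tight_layout()
plt.show()
```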

Seaborn:

Seaborn is a high-level interface for drawing attractive and mesmerizing graphs of different types. It offers many kinds of graphs, such as heatmaps, pair plots, violin plots, box plots, line plots, count plots, bar plots, scatter plots, and density plots.

Seaborn is built on top of the matplotlib library, so the two can be used together.

Here are some example plots.

Bar plot

A factor plot (called catplot in recent seaborn versions) can also make you happy.

Factor plot

A heatmap using seaborn

Heat map
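
A heatmap like this is often drawn from a correlation matrix. Here is a minimal sketch using the first five rows of scores.csv; the colour map and the title are my own choices, not the ones used for the original picture.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# First five rows of scores.csv
df = pd.DataFrame({
    "Hours": [2.5, 5.1, 3.2, 8.5, 3.5],
    "Scores": [21, 47, 27, 75, 30],
})

# Pairwise correlation between the numeric columns
corr = df.corr()

# annot=True writes the correlation value inside each cell
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_title("Correlation between Hours and Scores")
plt.show()
```

Because seaborn returns a matplotlib Axes, the usual matplotlib calls such as `set_title` work on the result.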

Plotly:

Plotly is an interactive visualization library. It lets us explore data with geographical maps and 3D plots, shows the position of a data point along with the information attached to it, and lets us create as many graphs as we need.

Plotly made visualization easy

From the picture above, we can see that pointing at a data point shows its coordinates and values, which is very useful in machine learning projects.

Different plotly graphs

Together, these libraries allow us to gain more insight into datasets.

Tensorflow:

You can think of a tensor as a matrix on steroids, extended to n dimensions. TensorFlow is not meant for small one-off multiplications and convolutions; what it does is build and run models made of complex neural networks. It is a widely used backend engine for Keras, and at its core all it does is well-optimized tensor manipulation.

Its beauty and advantage is that it can run on both CPU and GPU.
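
As a rough sketch of what tensor manipulation looks like, the lines below create a couple of tensors, multiply them, and check whether TensorFlow can see a GPU. This assumes TensorFlow 2.x; the variable names are my own.

```python
import tensorflow as tf

# A rank-2 tensor (a matrix) and a rank-3 tensor (a stack of matrices)
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])
cube = tf.zeros([2, 3, 4])

# Tensor manipulation: matrix product and element-wise square
product = tf.matmul(matrix, matrix)
squared = tf.square(matrix)

# Lists the GPUs TensorFlow can see; an empty list means it runs on the CPU
gpus = tf.config.list_physical_devices("GPU")

print(product.numpy())
print(squared.numpy())
print(cube.shape)
print("GPUs found:", len(gpus))
```

The same code runs unchanged whether or not a GPU is found.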

A few of the vlogs I consider worthwhile are:

youtube: sentdex (python)

youtube: Krish Naik (for statistics and ML model deployments)

Articles on Analytics Vidhya and Medium

Finally, practice in Jupyter Notebook, HackerRank, and LeetCode.

I hope this article was useful

Happy learning

Thank you
