In recent years, Python has gained a lot of popularity and, thanks to its qualities, has become the first choice for many developers. Python has an extremely large number of packages available for almost every task. One of these libraries is *Pandas*, which is widely used for data analysis in Python. In this blog, we will go through the fundamentals of pandas.

*Pandas* is an open source Python library that provides high-performance, easy-to-use, flexible and expressive data structures designed to make working with structured (tabular, multi-dimensional, potentially heterogeneous) and time series data easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. *Pandas* is built on *NumPy*, which is also an open source Python library.

Pandas is well suited for different kinds of data such as:

- Tabular data, such as an SQL table or an Excel spreadsheet.
- Ordered and unordered time series data.
- Homogeneous or heterogeneous arbitrary matrix data.
- Any other form of statistical data.

For a data scientist, working with data consists of tasks like cleaning the data, modelling it, interpreting the results and organizing them in a suitable format for further visualization. Pandas is well suited to all of these tasks. Before diving into the practicals, let's first install the package on our system.

## Installation

Before using pandas in Python code, we need to install it, as it does not come preinstalled with Python unless you are using Anaconda, in which case it does. You can install pandas using pip by executing the following command in the command prompt. All the code in this blog is written for Windows. For Mac or Linux, you may need to look up the equivalent steps. I will provide links wherever possible.

```
pip install pandas
```

This command should run without any error. If an error comes up, the installation was not successful.
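One quick way to confirm the installation, as a minimal sketch, is to import the package in Python and print its version. The alias `pd` is a widely used convention rather than a requirement.

```python
import pandas as pd

# If the import succeeds, pandas is installed; the version string
# printed here depends on what pip installed on your machine.
print(pd.__version__)
```

If the import raises `ModuleNotFoundError`, pandas was most likely installed for a different Python interpreter than the one you are running.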

Pandas is so popular for data analysis because of its powerful data structures. There are two primary data structures in pandas:

- *Series* (1-dimensional)
- *DataFrame* (2-dimensional)

These two data structures handle the vast majority of typical use cases in fields like finance, statistics, social science and many areas of engineering.

## Series

A Series is a one-dimensional NumPy array with axis labels, capable of holding data of any type (integer, string, float and even Python objects). The axis labels are called the *index*. In simpler terms, a series is like a single column in an Excel spreadsheet.

A pandas series can be created using the `Series()` constructor, which has many parameters. The most important ones are described below.

- _data_ : Takes various forms of data, such as an ndarray, list, dictionary or constant.
- _index_ : The values to use as index labels. If not provided, a default index equivalent to `np.arange(n)` (0 to n-1) is assigned, where n is the number of elements in the data.
- _dtype_ : The data type. If None, the data type is inferred from the data.
- _copy_ : Whether to copy the input data. By default it is False.

**Creating series**

```python
import numpy as np
import pandas as pd

# creating a NumPy array of values
data = np.array([15000, 250000, 500000, 1000000])

# creating a list of index values
index_val = ['2011', '2012', '2013', '2014']

# creating the series
ser = pd.Series(data, index=index_val)
ser
```
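The array-based example above covers only one form of the `data` parameter. As a rough sketch, the snippet below tries the other forms from the parameter list (a plain list with the default index, a dictionary and a constant) along with an explicit `dtype`. The variable names are made up for illustration.

```python
import pandas as pd

# From a list with no index: pandas assigns default labels 0, 1, 2, ...
default_idx = pd.Series([15000, 250000, 500000])

# From a dictionary: the keys become the index labels.
from_dict = pd.Series({'2011': 15000, '2012': 250000})

# From a constant: the value is repeated for every label in the index.
constant = pd.Series(0, index=['2011', '2012', '2013'])

# Forcing a data type with dtype instead of letting pandas infer it.
as_float = pd.Series([1, 2, 3], dtype='float64')

# Values can be looked up by their index label.
print(from_dict['2012'])
```

Passing a dictionary is often the most readable option when the labels and values naturally belong together.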

## DataFrame

A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). A dataframe consists of three components: rows, columns and the data. It can also be considered a combination of multiple pandas series, as every column in a dataframe is a series.
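To make the "every column is a series" idea concrete, here is a small sketch. The two-row sample data is only for illustration, and the dataframe is built from a dictionary of lists, which is one of several accepted input forms.

```python
import pandas as pd

# Small illustrative dataframe: each dictionary key becomes a column.
df = pd.DataFrame({
    'Name': ['Thor', 'Iron Man'],
    'Weapon': ['Mjolnir', 'Armour'],
})

# Selecting a single column returns a pandas Series.
names = df['Name']
print(type(names))

# The column shares the row index of the dataframe it came from.
print(names.index.equals(df.index))

# The three components: row labels, column labels and the shape of the data.
print(list(df.index))
print(list(df.columns))
print(df.shape)
```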

A pandas dataframe can be created using the `DataFrame()` constructor, which has many parameters. The most important ones are described below.

- _data_ : Takes various forms of data, such as an ndarray, series, list, dictionary, constant or another dataframe.
- _index_ : The values to use as row labels. If not provided, a default index equivalent to `np.arange(n)` (0 to n-1) is assigned, where n is the number of rows in the data.
- _columns_ : The column labels. If not provided, they default to 0 to n-1 in the same way, where n is the number of columns.
- _dtype_ : The data type to force on the columns. If None, the data type of each column is inferred from the data.
- _copy_ : Whether to copy the input data. The original default was False; recent pandas versions default to None, where the behaviour depends on the type of input.

**Creating dataframe**

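```python
import pandas as pd

# creating a list of lists, one inner list per row
data = [['1', 'Thor', 'Mjolnir'], ['2', 'Cap America', 'Shield'],
        ['3', 'Iron Man', 'Armour'], ['4', 'Black Widow', 'Combat'],
        ['5', 'Hawk Eye', 'Arrow']]

# creating a list of column names
columns = ['Avenger number', 'Name', 'Weapon']

# creating the dataframe; columns is passed by keyword because the
# second positional argument of DataFrame() is index, not columns
df = pd.DataFrame(data, columns=columns)
df
```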
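The list-of-lists example above leaves the row labels at their defaults. As a hedged sketch of the remaining parameters, the snippet below builds a dataframe from a dictionary with a custom `index` and an explicit `dtype`, and shows what the defaults look like when neither `index` nor `columns` is given. The subject names, scores and labels are hypothetical.

```python
import pandas as pd

# Dictionary input: each key becomes a column label, each list a column.
scores = pd.DataFrame(
    {'maths': [90, 75], 'physics': [82, 68]},  # hypothetical values
    index=['student_a', 'student_b'],          # custom row labels
    dtype='float64',                           # force one type for all columns
)

# Without index or columns, both default to 0, 1, 2, ...
plain = pd.DataFrame([[1, 2], [3, 4]])

print(scores)
print(list(plain.index), list(plain.columns))
```

A single cell can then be read by its row and column label, for example `scores.loc['student_a', 'maths']`.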

Pandas has a lot to offer, and data structures like series and dataframe are too powerful to cover fully in a single blog. This is enough for this post. I will cover the practical implementation of series and dataframe, and maybe practical data analysis using pandas, in a future post. Until then, keep learning.

Thank you for reading. Enjoy Python.