Quick introduction to Pandas

This tutorial is a quick introduction to Pandas for handling CSV data.
CSV is one of the most common formats for storing and distributing structured data.
Many machine learning datasets are distributed as CSV files.
After going through this tutorial, you will be able to prepare CSV files as input for your neural networks.


Install Pandas

pip install pandas

Reading a CSV with Pandas

import pandas as pd
df = pd.read_csv('heart.csv')

print(df)

The read_csv() function reads the CSV data from a file into a DataFrame in our program.
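To try read_csv() without downloading anything, here is a minimal sketch that reads CSV text from memory. The sample rows and the column names age, sex and cp are made up for illustration and only stand in for the real heart.csv file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for heart.csv (assumed columns).
csv_text = "age,sex,cp\n63,male,3\n37,male,2\n41,female,1\n"

# read_csv() accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # column types inferred while parsing
```

Checking the shape and dtypes right after loading is a quick way to confirm that the file was parsed as expected.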

Printing the first few rows of a DataFrame

print(df.head())

The head() method returns the first 5 rows of the DataFrame by default.
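head() also accepts the number of rows to return. A small sketch, using a made-up DataFrame rather than the tutorial's dataset:

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "age": [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
    "cp": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1],
})

first_five = df.head()    # default: first 5 rows
first_three = df.head(3)  # first 3 rows

print(first_three)
```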

Selecting a single column

age_column = df['age']

If our column is named “age”, we can select it as shown above.
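Selecting with a single name gives a Series, while selecting with a list of names gives a DataFrame. The sketch below shows the difference on a made-up DataFrame; the column names are assumptions for illustration.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"age": [63, 37, 41], "sex": [1, 1, 0], "cp": [3, 2, 1]})

age_series = df["age"]         # single name -> Series
age_frame = df[["age"]]        # list with one name -> DataFrame
age_and_cp = df[["age", "cp"]] # list of names -> DataFrame with two columns

print(type(age_series).__name__)
print(type(age_frame).__name__)
```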

Selecting certain rows and certain columns

age_and_sex = df.iloc[0:10, 0:2]

If we want to select only the first 10 rows and the first 2 columns, we can do it with the iloc indexer. The slices are separated by a comma (first the row slice, then the column slice), and the end of each slice is excluded.
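As a rough comparison, the same selection can be written by label with loc, whose slices include the end label. This sketch assumes a made-up DataFrame with the default 0, 1, 2, ... row index.

```python
import pandas as pd

# Made-up data for illustration only (15 rows, default integer index).
df = pd.DataFrame({
    "age": list(range(20, 35)),
    "sex": [0, 1] * 7 + [0],
    "cp": [0, 1, 2] * 5,
})

# Position-based: rows 0..9 and columns 0..1 (end positions excluded).
by_position = df.iloc[0:10, 0:2]

# Label-based: row labels 0..9 (end label included) and named columns.
by_label = df.loc[0:9, ["age", "sex"]]

print(by_position.shape)
```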

Getting only the unique values in a column

cp_values = df['cp'].unique()

After selecting a column, if we apply the unique() method, only the unique values are returned (without repetition).
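A small sketch on made-up values: unique() returns a NumPy array in order of first appearance, and nunique() gives just the count.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"cp": [3, 2, 1, 1, 0, 2, 3]})

cp_values = df["cp"].unique()   # distinct values, in order of first appearance
cp_count = df["cp"].nunique()   # number of distinct values

print(cp_values)
print(cp_count)
```

This is handy for checking how many classes a categorical column contains before encoding it.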

Replacing string values with integer values

mapping = {'male': 0, 'female': 1}
df = df.replace(mapping)

The replace() method replaces matching values anywhere in the DataFrame according to the mapping.
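If only one column should be converted, one alternative is Series.map(), which touches just that column. This is a sketch with made-up data, not the tutorial's own method; note that map() turns any value missing from the mapping into NaN.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"age": [63, 37, 41], "sex": ["male", "male", "female"]})

mapping = {"male": 0, "female": 1}

# Convert only the "sex" column; other columns are left untouched.
df["sex"] = df["sex"].map(mapping)

print(df)
```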

Converting to numpy array

age_and_sex = age_and_sex.to_numpy()

The to_numpy() method converts the DataFrame to a NumPy array. We can use this NumPy array as input to our neural network.
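A common pattern is to split the DataFrame into a feature array and a label array before training. The sketch below assumes a hypothetical label column named target and casts the features to float32, a dtype many neural network libraries expect.

```python
import pandas as pd

# Made-up data for illustration only; "target" is an assumed label column.
df = pd.DataFrame({"age": [63, 37, 41], "sex": [0, 0, 1], "target": [1, 1, 0]})

# Feature matrix with shape (rows, features), cast to float32.
features = df[["age", "sex"]].to_numpy(dtype="float32")

# Label vector with shape (rows,).
labels = df["target"].to_numpy()

print(features.shape, features.dtype)
print(labels.shape)
```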

Shuffling the rows

df = df.sample(frac=1)

The sample() method returns a random sample of rows from the DataFrame. If frac = 0.5, a random half of the rows is returned. If frac = 1, all the rows are returned in a random order.
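Shuffled rows keep their original index labels. A minimal sketch, on made-up data, that fixes the random seed for repeatable results and renumbers the index afterwards:

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"age": [63, 37, 41, 56, 57], "target": [1, 1, 1, 0, 0]})

# random_state makes the shuffle repeatable; reset_index(drop=True)
# replaces the shuffled index labels with 0, 1, 2, ...
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)

print(shuffled)
```

Fixing random_state is optional; it mainly helps when an experiment needs to be reproduced.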