4.5 Pandas
Pandas is a fundamental library for scientific programming and data analysis, built on top of NumPy. Its two primary data structures, Series and DataFrame, are designed to handle labeled and relational data efficiently.
Series
A Series is a one-dimensional array-like object that can hold any data type (integers, strings, floats, etc.) and has an associated array of data labels, called the index. It’s essentially a column in a spreadsheet.
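The pairing of values and labels described above can be inspected directly. The following is a minimal sketch, assuming made-up temperature readings and day labels chosen purely for illustration; it shows the index, the underlying NumPy values, and the difference between label-based and position-based lookup.

```python
import pandas as pd

# Hypothetical example data: three readings labeled by day.
temps = pd.Series([21.5, 19.0, 23.2], index=["Mon", "Tue", "Wed"])

labels = list(temps.index)     # the index holds the data labels
values = temps.to_numpy()      # the values as a NumPy array
by_label = temps["Tue"]        # look up an element by its label
by_position = temps.iloc[1]    # look up the same element by position

print(labels)
print(values.dtype)
print(by_label, by_position)
```

Both lookups return the same element here because "Tue" is the second label in the index.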
import pandas as pd
# Create a Series from a list
s = pd.Series([10, 20, 30], index=['A', 'B', 'C'])
print(s['B'])  # Access value by label 'B': 20
DataFrames
A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous (can hold different data types) tabular data structure with labeled axes (rows and columns). It’s the most commonly used Pandas object and is conceptually similar to a spreadsheet or a SQL table.
# Create a DataFrame from a dictionary
data = {
    'City': ['London', 'Paris'],
    'Population': [8.9, 2.1]  # in millions
}
df = pd.DataFrame(data)
# Access a column (Series)
print(df['City'])
# Output:
# 0    London
# 1     Paris
# (followed by a footer line giving the Series name and dtype)
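The properties listed in the DataFrame description (heterogeneous columns, size mutability, labeled axes) can each be checked in a few lines. This is a minimal sketch rather than a complete tour of the API; the 'Country' column is an addition made here for illustration, and the population figures are assumed to be in millions.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["London", "Paris"],
    "Population": [8.9, 2.1],  # assumed to be in millions
})

# Heterogeneous: each column keeps its own dtype.
population_dtype = str(df["Population"].dtype)

# Size-mutable: a new column can be added to the existing object.
# The 'Country' column is an illustrative addition, not part of the original data.
df["Country"] = ["UK", "France"]

# Labeled axes: select a row by its index label, or a cell by row and column label.
first_row = df.loc[0]
paris_population = df.loc[1, "Population"]

print(population_dtype)
print(df.shape)
print(first_row["City"], paris_population)
```

Note that `df.loc` selects by label, so `df.loc[0]` works here only because the default index uses the integers 0 and 1 as row labels.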