Loading Data
import pandas as pd
import numpy as np
- tables are called data frames (DataFrame objects, conventionally named df)
- read a CSV file into a data frame:
pd.read_csv()
- view a column:
df['col']
- view unique values:
np.unique(df['col'])
- drop rows with missing values:
df.dropna()
- see column names:
df.columns
- accessing rows and columns is similar to indexing a 2-D array
- select specific columns:
df[['col1','col2']]
- see current index:
df.index
- use a column's values as the row labels (returns a new data frame, so assign the result):
df = df.set_index('col')
- select rows by index label:
df.loc['value']
- select rows by numeric position:
df.iloc[[3,4]]
- select columns by numeric position:
df.iloc[:,[0,2,3]]
- select specific rows and columns:
df.iloc[[3,4],[0,2,3]]
- index entries don't need to be unique, so df.loc['value'] can return several rows
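A minimal sketch of the loading and selection calls above. It assumes a small hypothetical CSV string (read through io.StringIO) in place of a real file; the column names and values are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents; pd.read_csv also accepts a file path.
csv_text = """name,year,weight,origin
ford,70,3500,1
honda,82,2200,3
ford,81,2900,1
vw,75,,2
"""
df = pd.read_csv(io.StringIO(csv_text))

names = np.unique(df['name'])     # sorted unique values: 'ford', 'honda', 'vw'
cols = list(df.columns)           # ['name', 'year', 'weight', 'origin']

# dropna returns a new frame; here it drops the 'vw' row (missing weight)
df = df.dropna()

pair = df[['name', 'year']]       # select columns by name

# set_index also returns a new frame, so assign the result
by_name = df.set_index('name')
fords = by_name.loc['ford']       # 'ford' appears twice, so two rows come back

first_two = df.iloc[[0, 1]]       # rows by position
some_cols = df.iloc[:, [0, 2]]    # columns by position: name, weight
corner = df.iloc[[0, 1], [0, 2]]  # rows and columns by position
```

Because the 'name' index is not unique, `by_name.loc['ford']` returns a data frame rather than a single row.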
lambda
an anonymous function made of a single expression; handy as an argument inside function calls
- e.g.
Auto_re.loc[lambda df: df['year'] > 80, ['weight', 'origin']]
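A sketch of the same kind of query on a small hypothetical stand-in for the Auto data (`auto` and its values are made up here). loc calls the lambda with the frame being indexed and uses the Boolean Series it returns to pick rows; the ordinary named function shows that the lambda is only a shorthand.

```python
import pandas as pd

# Hypothetical stand-in for the Auto data used in the note above
auto = pd.DataFrame(
    {'year': [70, 82, 81, 75],
     'weight': [3500, 2200, 2900, 1900],
     'origin': [1, 3, 1, 2]},
    index=['ford torino', 'honda civic', 'ford escort', 'vw rabbit'],
)

# Rows where year > 80; only the weight and origin columns
recent = auto.loc[lambda df: df['year'] > 80, ['weight', 'origin']]

# The same filter written as a named function
def is_recent(df):
    return df['year'] > 80

recent_too = auto.loc[is_recent, ['weight', 'origin']]
```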
&
is element-wise 'and' for Boolean conditions (wrap each condition in parentheses)
|
is element-wise 'or'
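A short sketch of combining row conditions with & and |, again on a hypothetical `auto` frame. The parentheses matter because & and | bind more tightly than comparisons such as > and <; Python's plain `and` / `or` fail on a Series because a Series has no single truth value.

```python
import pandas as pd

# Hypothetical data for illustration
auto = pd.DataFrame(
    {'year': [70, 82, 81, 75],
     'weight': [3500, 2200, 2900, 1900]},
    index=['ford torino', 'honda civic', 'ford escort', 'vw rabbit'],
)

# Element-wise 'and': recent AND light
recent_and_light = auto.loc[(auto['year'] > 80) & (auto['weight'] < 2500)]

# Element-wise 'or': old OR heavy
old_or_heavy = auto.loc[(auto['year'] < 72) | (auto['weight'] > 2800)]

# Using `and` instead raises a ValueError (ambiguous truth value of a Series)
try:
    auto.loc[(auto['year'] > 80) and (auto['weight'] < 2500)]
    plain_and_failed = False
except ValueError:
    plain_and_failed = True
```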
- For integer-based queries, use the iloc[] method. For string and Boolean selections, use the loc[] method. For functional queries that filter rows, use the loc[] method with a function (typically a lambda) in the rows argument.
- access columns as an attribute:
df.col
- bin data by using
pd.cut()
- drop rows or columns by label (a data frame method, not a pd function):
df.drop()
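A minimal sketch of attribute access, binning with pd.cut, and df.drop. The `auto` frame, the bin edges and the 'light'/'medium'/'heavy' labels are hypothetical choices for illustration. pd.cut bins are right-inclusive by default, so 2500 would fall in the 'light' bin.

```python
import pandas as pd

# Hypothetical data for illustration
auto = pd.DataFrame(
    {'year': [70, 82, 81, 75],
     'weight': [3500, 2200, 2900, 1900]},
    index=['ford torino', 'honda civic', 'ford escort', 'vw rabbit'],
)

# Attribute access works when the column name is a valid Python identifier
# and does not clash with a DataFrame method (e.g. a column named 'count')
weights = auto.weight

# Assign each weight to an interval: (0, 2500], (2500, 3000], (3000, 5000]
auto['weight_class'] = pd.cut(auto['weight'], bins=[0, 2500, 3000, 5000],
                              labels=['light', 'medium', 'heavy'])

# drop returns a new frame; the original is left unchanged
no_year = auto.drop(columns=['year'])       # drop a column by name
no_vw = auto.drop(index=['vw rabbit'])      # drop a row by index label
```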