Pandas

Loading Data

import pandas as pd
tables are called DataFrames (conventionally named df)
read csv: pd.read_csv()
- arguments
  - na_values: extra strings to treat as missing (NaN), e.g. na_values=['?']
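
A minimal sketch of read_csv with na_values. The CSV text and the "?" placeholder are made up for illustration, and io.StringIO stands in for a file path so the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical CSV text in which "?" marks a missing value.
csv_text = "name,mpg\nford,18\nbuick,?\nhonda,33\n"

# na_values lists extra strings that read_csv should turn into NaN.
df = pd.read_csv(io.StringIO(csv_text), na_values=["?"])

# Count the missing entries in the mpg column.
n_missing = df["mpg"].isna().sum()
print(n_missing)
```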
view column
- df['col_name']
view unique values: df['col_name'].unique() (or np.unique(df['col_name']) after import numpy as np)
drop rows with missing values: df.dropna()
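
A small sketch of listing unique values and dropping incomplete rows. The column names and values are invented; unique() is called on the column itself here, which needs no NumPy call for that step.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "origin": ["us", "jp", "us", None],
    "mpg": [18.0, 33.0, np.nan, 25.0],
})

# Distinct non-missing values, in order of first appearance.
unique_origins = df["origin"].dropna().unique()

# dropna() returns a new DataFrame without any row that has a missing value.
clean = df.dropna()
print(unique_origins, len(clean))
```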
see column names: df.columns
accessing rows and columns by position is similar to indexing a 2-D array
select specific columns: df[['col1','col2']]
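
A quick sketch of the difference between single and double brackets, using made-up column names: one name gives back a Series, a list of names gives back a DataFrame.

```python
import pandas as pd

# Hypothetical table for illustration.
df = pd.DataFrame({
    "name": ["ford", "buick", "honda"],
    "mpg": [18, 15, 33],
    "year": [70, 72, 82],
})

col_names = list(df.columns)      # column labels as a plain list
one_col = df["mpg"]               # single label -> Series
two_cols = df[["name", "mpg"]]    # list of labels -> DataFrame
print(type(one_col).__name__, type(two_cols).__name__)
```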
see current index: df.index
- re-label the rows using the contents of a column: df.set_index('col') (returns a new df; assign it to keep it)
  - select rows by the new index labels: df.loc['value']
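
A minimal sketch of set_index followed by loc, with invented data. Note that set_index leaves the original DataFrame unchanged and returns a new one.

```python
import pandas as pd

# Hypothetical table for illustration.
df = pd.DataFrame({"name": ["ford", "buick", "honda"], "mpg": [18, 15, 33]})

# Use the "name" column as the row labels; df itself is not modified.
df_named = df.set_index("name")

# Look up a row by its label.
honda = df_named.loc["honda"]
print(honda["mpg"])
```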
select rows by numeric position: df.iloc[[3,4]]
select columns by numeric position: df.iloc[:,[0,2,3]]
select specific rows and columns: df.iloc[[3,4],[0,2,3]]
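
The three iloc forms above can be sketched on a small made-up table (column names a to d are placeholders):

```python
import pandas as pd

# Hypothetical 5x4 table; values chosen so positions are easy to check.
df = pd.DataFrame({
    "a": range(0, 5),
    "b": range(5, 10),
    "c": range(10, 15),
    "d": range(15, 20),
})

rows = df.iloc[[3, 4]]               # rows at positions 3 and 4, all columns
cols = df.iloc[:, [0, 2, 3]]         # all rows, columns at positions 0, 2, 3
both = df.iloc[[3, 4], [0, 2, 3]]    # both selections at once
print(both)
```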
index entries don't need to be unique
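
A short sketch of what a repeated index label does, using invented data: loc with a duplicated label returns every matching row.

```python
import pandas as pd

# Hypothetical table where "us" appears twice in the index.
df = pd.DataFrame(
    {"origin": ["us", "jp", "us"], "mpg": [18, 33, 15]}
).set_index("origin")

is_unique = df.index.is_unique   # False, because "us" repeats
us_rows = df.loc["us"]           # DataFrame holding both "us" rows
print(is_unique, len(us_rows))
```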
lambda: a small anonymous function written inline inside another call; its body is a single expression
- e.g. Auto_re.loc[lambda df: df['year'] > 80, ['weight', 'origin']]
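& is element-wise 'and' (wrap each comparison in parentheses)
| is element-wise 'or' (wrap each comparison in parentheses)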
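
A sketch of combining conditions inside a lambda passed to loc. The table auto and its values are made up; the parentheses around each comparison matter because & and | bind more tightly than > and <.

```python
import pandas as pd

# Hypothetical stand-in for a cars table.
auto = pd.DataFrame({
    "year": [70, 82, 81, 79],
    "weight": [3500, 2100, 2600, 3000],
    "origin": [1, 3, 2, 2],
})

# Rows where BOTH conditions hold; keep only two columns.
recent_and_light = auto.loc[
    lambda d: (d["year"] > 80) & (d["weight"] < 2500), ["weight", "origin"]
]

# Rows where EITHER condition holds.
recent_or_origin1 = auto.loc[
    lambda d: (d["year"] > 80) | (d["origin"] == 1), ["weight", "origin"]
]
print(len(recent_and_light), len(recent_or_origin1))
```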
For integer-position queries, use the iloc[] indexer. For label (e.g. string) and Boolean selections, use the loc[] indexer. For functional queries that filter rows, use loc[] with a function (typically a lambda) in the rows argument.
access columns as an attribute: df.col (only works when the name is a valid Python identifier and does not clash with a DataFrame method)
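
A brief sketch of attribute access next to bracket access, with invented column names. Dot notation cannot be written for a name containing a space, so brackets are used for that column.

```python
import pandas as pd

# Hypothetical table; "model year" contains a space on purpose.
df = pd.DataFrame({"mpg": [18, 33], "model year": [70, 82]})

by_attr = df.mpg                 # same column as df["mpg"]
by_key = df["mpg"]
model_year = df["model year"]    # brackets needed: df.model year is not valid Python
print(by_attr.equals(by_key))
```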
bin data by using pd.cut()
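
A minimal sketch of pd.cut, assuming made-up year values, bin edges, and labels. By default each bin includes its right edge and excludes its left edge.

```python
import pandas as pd

# Hypothetical values to bin.
years = pd.Series([70, 74, 78, 82])

# Bins are (69, 75], (75, 80], (80, 85]; labels name each bin in order.
binned = pd.cut(years, bins=[69, 75, 80, 85], labels=["early", "mid", "late"])
print(list(binned))
```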
drop rows or columns by using df.drop()
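
A small sketch of dropping columns and rows, with invented data. drop is called on the DataFrame itself and returns a new DataFrame; the original is left as it was.

```python
import pandas as pd

# Hypothetical table for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

no_b = df.drop(columns=["b"])        # remove column "b"
no_first_row = df.drop(index=[0])    # remove the row labelled 0
print(list(no_b.columns), list(no_first_row.index))
```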