NumPy
import numpy as np
- array:
x = np.array([3, 4, 5])
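- adding two arrays of the same shape with + adds them entry by entry (for Python lists, + concatenates instead)
- matrix: stored as a two-dimensional array, written as a sequence of rows
x = np.array([[1, 2], [3, 4]])
np.matrix() also exists, but NumPy no longer recommends it; use 2-D arrays instead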
- Uses [0,0] indexing:
x[row,col]
x[2]
yields the third entry of x
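The array basics above (element-wise addition, a 2-D array built row by row, zero-based indexing) can be sketched as follows; the values in x, y and m are arbitrary examples, not from the original notes.

```python
import numpy as np

# 1-D arrays: + adds entry by entry
x = np.array([3, 4, 5])
y = np.array([4, 9, 7])
total = x + y                        # array([ 7, 13, 12])

# Plain Python lists concatenate instead
list_concat = [3, 4, 5] + [4, 9, 7]  # [3, 4, 5, 4, 9, 7]

# A 2-D array written as a sequence of rows
m = np.array([[1, 2], [3, 4]])

first = m[0, 0]   # zero-based: top-left entry -> 1
entry = m[1, 0]   # second row, first column -> 3
third = x[2]      # third entry of a 1-D array -> 5
row = m[1]        # a single index on a 2-D array gives a whole row -> array([3, 4])
```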
- correlation matrix:
np.corrcoef(x, y)
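A small sketch of np.corrcoef with two 1-D inputs; the data are made up so the correlations are exactly +1 and -1.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1                         # perfectly linear in x
z = np.array([4.0, 3.0, 2.0, 1.0])    # perfectly decreasing as x increases

# With two 1-D inputs the result is a 2x2 matrix:
# [[corr(x, x), corr(x, y)],
#  [corr(y, x), corr(y, y)]]
r = np.corrcoef(x, y)

# The off-diagonal entry is the correlation between the two inputs
r_xz = np.corrcoef(x, z)[0, 1]
```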
- attributes (read without parentheses):
x.attribute
- dimensions:
x.ndim
- data type:
x.dtype
- (number of rows, number of columns), as a tuple:
x.shape
- transpose:
x.T
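The attributes listed above, applied to an arbitrary 2 x 3 integer array (the exact integer dtype depends on the platform, e.g. int64 on most 64-bit Linux systems):

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

# Attributes are read without parentheses
n_dims = x.ndim   # 2
kind = x.dtype    # an integer dtype; exact width is platform-dependent
shape = x.shape   # (2, 3): 2 rows, 3 columns
xt = x.T          # transpose, shape (3, 2)
```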
- methods:
x.method()
- a function associated with an object, called with parentheses
- sum:
x.sum()
or np.sum(x)
- reshape:
x.reshape(tuple)
- returns an array with the elements of x in a different shape, e.g. x.reshape((2, 3))
- when possible, the result is a view of the same data, so modifying values in the reshaped object also changes the values in the original object (and vice versa)
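A minimal sketch of the sum and reshape notes above. The array is arbitrary; np.shares_memory and .copy() are standard NumPy calls used here only to make the view behaviour visible.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])

# Method form and function form give the same result
s1 = x.sum()
s2 = np.sum(x)

# Reshape the 6 elements into 2 rows and 3 columns (filled row by row)
x_reshape = x.reshape((2, 3))

# Here reshape returns a view, so writing through it also changes x
x_reshape[0, 0] = 100   # x is now array([100, 2, 3, 4, 5, 6])

# Take an explicit copy when the original must stay untouched
independent = x.reshape((3, 2)).copy()
independent[0, 0] = -1  # x is unaffected
```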
- square root:
np.sqrt(x)
- squaring:
x**2
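Both operations above act entry by entry; the sketch below contrasts x**2 with a true matrix product (x @ x), which is an addition for illustration, not part of the original notes.

```python
import numpy as np

x = np.array([[1, 4], [9, 16]])

roots = np.sqrt(x)       # element-wise square root (float result)
half_power = x**0.5      # same values as np.sqrt(x)
squares = x**2           # element-wise square, NOT matrix multiplication
matrix_product = x @ x   # the actual matrix product, for contrast
```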
- mean:
np.mean()
axis=0
reduces over the rows, giving one mean per column (axis=1 gives one mean per row)
- variance:
np.var()
- default: divides by n instead of n - 1; pass ddof=1 to divide by n - 1
axis=0
reduces over the rows, giving one variance per column
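- standard deviation:
np.std()
- like np.var(), divides by n by default; pass ddof=1 to divide by n - 1
axis=0
reduces over the rows, giving one standard deviation per column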
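A small worked example of the axis and ddof behaviour described above; the numbers are chosen so the results are easy to check by hand (for v, the squared deviations from the mean 4 sum to 8).

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 6.0]])

col_means = np.mean(x, axis=0)  # reduce over rows -> one mean per column: [2., 4.]
row_means = np.mean(x, axis=1)  # reduce over columns -> one mean per row: [1.5, 4.5]

v = np.array([2.0, 4.0, 6.0])
var_n = np.var(v)               # default ddof=0: 8 / 3
var_n1 = np.var(v, ddof=1)      # divides by n - 1: 8 / 2 = 4
std_n1 = np.std(v, ddof=1)      # square root of var_n1 = 2
```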
- generate random data:
np.random.normal()
- set a seed (for reproducible draws):
rng = np.random.default_rng(seed)
then rng.normal()
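A sketch of seeded generation with np.random.default_rng; the seed 1303 is an arbitrary choice, and loc, scale and size are the standard parameters of Generator.normal.

```python
import numpy as np

# Two generators built from the same seed produce identical draws
rng1 = np.random.default_rng(1303)
rng2 = np.random.default_rng(1303)

a = rng1.normal(loc=0.0, scale=1.0, size=5)
b = rng2.normal(loc=0.0, scale=1.0, size=5)

# The legacy np.random.normal() uses NumPy's global random state,
# so its output changes from run to run unless that state is seeded
c = np.random.normal(size=5)
```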
- pi:
np.pi
- NaN values:
np.isnan()
returns a boolean array that is True where an entry is NaN (missing)
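A short sketch of finding and dropping NaN entries with np.isnan; the array is an arbitrary example. It also shows why == cannot be used for this: NaN never compares equal to itself.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0, np.nan])

mask = np.isnan(x)       # array([False,  True, False,  True])
n_missing = mask.sum()   # True counts as 1 -> 2
clean = x[~mask]         # keep only the non-NaN entries -> array([1., 3.])

# NaN is never equal to itself
nan_equals_itself = np.nan == np.nan   # False
```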
Sequences and Slice Notation
- n equally spaced numbers starting at a and ending at b (both included):
np.linspace(a, b, n)
- numbers spaced out by step, starting at a and stopping before b (b is excluded; step defaults to 1):
np.arange(a, b, step)
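The two sequence functions above, side by side; the endpoints and step are arbitrary. Note that linspace includes the end value while arange stops before it.

```python
import numpy as np

# 5 equally spaced values from 0 to 1, both endpoints included
lin = np.linspace(0, 1, 5)   # [0., 0.25, 0.5, 0.75, 1.]

# Values from 0 up to but not including 10, in steps of 2
ar = np.arange(0, 10, 2)     # [0, 2, 4, 6, 8]

# With no step, arange counts by 1
ar1 = np.arange(3, 7)        # [3, 4, 5, 6]
```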
- slice:
[a:b]
selects the elements at positions a through b - 1, e.g. "hello world"[3:6]
outputs 'lo '
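The same slice notation works on strings and NumPy arrays; this sketch shows the example from the notes plus omitted bounds and the optional step, using an arbitrary array.

```python
import numpy as np

s = "hello world"
piece = s[3:6]        # positions 3, 4, 5 -> 'lo '

x = np.arange(10)     # [0, 1, ..., 9]
first_three = x[:3]   # omitted start means 0 -> [0, 1, 2]
tail = x[7:]          # omitted stop means "to the end" -> [7, 8, 9]
evens = x[0:10:2]     # optional third number is the step -> [0, 2, 4, 6, 8]
```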
Indexing Data
- Select multiple rows:
A[[1,3]]
- Select multiple columns:
A[:,[0,2]]
- Select a submatrix:
- with np.ix_(), which combines a list of rows with a list of columns:
A[np.ix_([1,3],[0,2,3])]
- with slices (start:stop:step):
A[1:4:2,0:3:2]
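The indexing forms above, applied to an arbitrary 4 x 4 array. The last part illustrates why np.ix_ is needed: two plain lists are paired up entry by entry rather than forming a submatrix, and lists of different lengths raise an IndexError.

```python
import numpy as np

A = np.arange(16).reshape((4, 4))   # rows: [0..3], [4..7], [8..11], [12..15]

rows = A[[1, 3]]                        # rows 1 and 3
cols = A[:, [0, 2]]                     # columns 0 and 2
sub_ix = A[np.ix_([1, 3], [0, 2, 3])]   # rows 1, 3 x columns 0, 2, 3
sub_slice = A[1:4:2, 0:3:2]             # rows 1, 3 x columns 0, 2

# Two equal-length lists pick individual entries: A[1, 0] and A[3, 2]
paired = A[[1, 3], [0, 2]]

# Lists of unequal length cannot be paired up
try:
    A[[1, 3], [0, 2, 3]]
    paired_failed = False
except IndexError:
    paired_failed = True
```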
- Booleans
np.zeros(n, dtype=bool)
makes an array of n False values (zeros cast to bool)
- use == to test equality; on arrays it compares entry by entry and returns a boolean array
- check all entries are true:
np.all()
- check if any entries are true:
np.any()
- boolean arrays can be used as masks to select rows or submatrices, while equivalent 0/1 integer arrays can't: integers are treated as index positions (rows 0 and 1), not as True/False
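A sketch of the Boolean notes above: a boolean mask and a 0/1 integer array compare equal entry by entry, yet select different rows. The 4 x 4 array is an arbitrary example.

```python
import numpy as np

A = np.arange(16).reshape((4, 4))

keep = np.zeros(4, dtype=bool)   # [False, False, False, False]
keep[[1, 3]] = True              # [False, True, False, True]

as_ints = np.array([0, 1, 0, 1])
same_values = np.all(keep == as_ints)  # == compares entry by entry -> True
any_true = np.any(keep)                # True

by_mask = A[keep]      # boolean mask: rows 1 and 3
by_ints = A[as_ints]   # integers are positions: rows 0, 1, 0, 1
```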
np.argmax()
returns the index of the largest element of an array; without axis the index refers to the flattened array, and with axis it is computed along that axis
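A small sketch of np.argmax with and without an axis on an arbitrary 2 x 3 array; np.unravel_index converts the flat index back into a (row, column) pair.

```python
import numpy as np

x = np.array([[1, 9, 3], [7, 2, 8]])

flat_idx = np.argmax(x)                        # 1: position in the flattened [1, 9, 3, 7, 2, 8]
row_col = np.unravel_index(flat_idx, x.shape)  # (0, 1): the same position as (row, col)
per_col = np.argmax(x, axis=0)                 # row of the max in each column -> [1, 0, 1]
per_row = np.argmax(x, axis=1)                 # column of the max in each row -> [1, 2]
```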
df.corr()
to see the correlation matrix of a pandas DataFrame's columns
- use
np.set_printoptions(threshold=sys.maxsize)
(after import sys) to stop truncating outputs, and np.set_printoptions(threshold=1000)
to restore the default truncation
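A sketch of switching truncation off and back on; np.array2string is used here (instead of print) so the printed text can be inspected, and it follows the current print options by default. The 2000-element array is an arbitrary size above the default threshold of 1000.

```python
import sys
import numpy as np

big = np.arange(2000)

np.set_printoptions(threshold=sys.maxsize)  # never summarize
full_text = np.array2string(big)            # every element is written out

np.set_printoptions(threshold=1000)         # restore the default
short_text = np.array2string(big)           # summarized with "..."
```

NumPy also offers the np.printoptions() context manager, which restores the previous settings automatically when its with block ends.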
df.describe()
to get summary statistics (count, mean, standard deviation, min, quartiles, max) for each numeric column