Statistics

Basic Definitions

If you're new to probability as a concept, start with some basics that can be explained in simple visuals. You'll want to keep in mind that there are two main types of data: quantitative and qualitative.

Data Visualization

There are quite a few options out there for seeing data with your eyes, since humans are bad at understanding numbers intuitively. We all have our favorite plot types:

Data Exploration

There are a lot of numerical ways to get a feel for data in addition to visual methods. Consider three different ways of finding an "average": the mean, the median, and the mode. Find out the span your data covers (the range), but remember to keep extremes in check, since a single outlier can stretch it! Also keep an eye on how far away data points usually are from the center (the variance and standard deviation).
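Here's a minimal sketch of those numerical summaries using Python's standard `statistics` module. The `data` list is made up for illustration, with one deliberately extreme value at the end so you can see how it drags the mean and the range around while the median and interquartile range stay put.

```python
import statistics

# Hypothetical sample; the 40 is a deliberate outlier.
data = [2, 3, 3, 4, 5, 6, 40]

# Three ways of finding an "average".
mean = statistics.mean(data)      # arithmetic average, pulled upward by the outlier
median = statistics.median(data)  # middle value once sorted, barely affected by the outlier
mode = statistics.mode(data)      # most frequent value

# The span of the data: the full range is dominated by the extreme value,
# while the interquartile range (IQR) only covers the middle half.
data_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Typical distance from the center (sample standard deviation).
spread = statistics.stdev(data)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, IQR={iqr}, stdev={spread:.2f}")
```

Notice that the mean lands well above most of the values, which is the kind of thing "keeping extremes in check" is meant to catch.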

Permutations and Combinations

These concepts are the hardest for my mind to wrap itself around. I can't figure out how counting works, apparently.
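One thing that helps me is letting the computer do the counting and then checking it against the formulas. This is only a sketch with a made-up list of four items: permutations care about order, combinations don't.

```python
import math
from itertools import combinations, permutations

items = ["A", "B", "C", "D"]  # hypothetical items to choose from
k = 2                         # how many we pick each time

# Formula-based counts.
n_perm = math.perm(len(items), k)  # n! / (n - k)!  -> order matters
n_comb = math.comb(len(items), k)  # n! / (k! (n - k)!)  -> order ignored

# Brute-force listings to check the formulas against.
ordered = list(permutations(items, k))    # ("A", "B") and ("B", "A") both appear
unordered = list(combinations(items, k))  # only ("A", "B") appears

print(n_perm, len(ordered))    # both counts agree
print(n_comb, len(unordered))  # both counts agree
```

Every combination of 2 items can be arranged in 2! = 2 orders, which is why there are exactly twice as many permutations as combinations here.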

Distributions

There are some commonalities found among distributions, but here are a few of the many distributions out there that I have notes on:

Experimentation (or Applying Theory to Life)

In the real world, we run experiments to answer questions about the world around us using data. Other times, we're asked to answer a question about existing data.

Most of the time, we need to select our data from the population or we're trying to estimate the population from the data we have.

Then we have to test our question, and our final answer is always an estimate (we can't ever know anything for certain, but we can be really, really sure).
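To make the "always an estimate" idea concrete, here's a rough sketch: draw a random sample from a population, then estimate the population mean with an approximate 95% confidence interval. The population is simulated (normal with mean 50 and standard deviation 10), and the 1.96 multiplier assumes the normal approximation is good enough for a sample of 100. Both are assumptions I picked for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population we normally would NOT get to see in full.
population = [random.gauss(50, 10) for _ in range(10_000)]

# Select our data: a simple random sample without replacement.
sample = random.sample(population, 100)

# Point estimate and its standard error.
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Approximate 95% confidence interval (normal critical value 1.96).
z = 1.96
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"estimate: {sample_mean:.2f}, 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

The interval is the "really, really sure" part: we never pin down the population mean exactly, we only bracket it with a stated level of confidence.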

Predicting the Future (under construction 🏗️)

When we want to determine the patterns in data, there are a ton of methods available. To start, get comfortable with what correlation and regression mean, and afterwards get your machete - we're heading into the weeds.
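As a warm-up before the weeds, here's a small sketch that computes a Pearson correlation and a least-squares regression line by hand. The `x` and `y` values are invented (roughly y = 2x), and `predict` is just a helper name I made up for this example.

```python
import math

# Hypothetical paired observations, roughly following y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

# Correlation: how tightly x and y move together, between -1 and 1.
r = sxy / math.sqrt(sxx * syy)

# Simple linear regression: the least-squares line y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def predict(new_x):
    """Predict y for a new x using the fitted line."""
    return intercept + slope * new_x

print(f"r={r:.3f}, slope={slope:.2f}, intercept={intercept:.2f}")
print(f"prediction at x=6: {predict(6):.2f}")
```

Correlation tells you how strong the linear relationship is; regression gives you the line you can actually predict with.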

Here's a flowchart to help select which models to use based on your data.

Parametric Models

These models make assumptions about the form of f, the unknown function that relates the predictors to the response.

Non-Parametric Models

These models don't make explicit assumptions about the form of f, the unknown function that relates the predictors to the response.
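A quick sketch of the difference, using made-up data where the true f is y = x². The parametric approach (`fit_line`) assumes f is a straight line and only estimates two numbers; the non-parametric approach (`knn_predict`, a bare-bones K-nearest neighbors regressor) assumes nothing about the shape and just averages nearby points. Both function names are my own for this example.

```python
# Hypothetical training data from a nonlinear f: y = x^2.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]

def fit_line(xs, ys):
    """Parametric: assume f(x) = intercept + slope * x and estimate both numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def knn_predict(xs, ys, x0, k=2):
    """Non-parametric: average the responses of the k training points nearest x0."""
    nearest = sorted(zip(xs, ys), key=lambda point: abs(point[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

intercept, slope = fit_line(train_x, train_y)

x0 = 3.5
truth = x0 ** 2
linear_guess = intercept + slope * x0
knn_guess = knn_predict(train_x, train_y, x0, k=2)

print(f"truth={truth}, linear={linear_guess:.2f}, knn={knn_guess:.2f}")
```

Here the straight-line assumption is wrong, so the flexible method gets closer. When the assumption about f is roughly right, the parametric model usually needs far less data to do well.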

Classification Models

Comparing Models

Linear Regression vs K-Nearest Neighbors
A Comparison of Classification Methods
LDA vs QDA

Assessing Models

Remember to assess your models with your test data!
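Here's a minimal sketch of what that looks like: hold out part of the data, fit on the rest, and report the error on the held-out part. The data is simulated (y = 3x plus noise), the 80/20 split is an arbitrary choice, and `fit_line` and `mse` are helper names I made up.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical data: y = 3x plus random noise.
xs = [i / 10 for i in range(100)]
ys = [3 * x + random.gauss(0, 1) for x in xs]

# Shuffle the row indices and hold out 20% as test data.
indices = list(range(len(xs)))
random.shuffle(indices)
split = int(0.8 * len(indices))
train_idx, test_idx = indices[:split], indices[split:]

def fit_line(idx):
    """Least-squares line fitted only on the rows listed in idx."""
    n = len(idx)
    mean_x = sum(xs[i] for i in idx) / n
    mean_y = sum(ys[i] for i in idx) / n
    sxy = sum((xs[i] - mean_x) * (ys[i] - mean_y) for i in idx)
    sxx = sum((xs[i] - mean_x) ** 2 for i in idx)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mse(idx, intercept, slope):
    """Mean squared error of the line on the rows listed in idx."""
    return sum((ys[i] - (intercept + slope * xs[i])) ** 2 for i in idx) / len(idx)

intercept, slope = fit_line(train_idx)  # the model never sees the test rows
train_mse = mse(train_idx, intercept, slope)
test_mse = mse(test_idx, intercept, slope)

print(f"train MSE={train_mse:.2f}, test MSE={test_mse:.2f}")
```

The test MSE is the number to trust, because the training MSE is computed on the same rows the model was fitted to.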

Resampling Methods

Used to repeatedly draw samples from a training set and refit a model on each one, to get more information about the fitted model.

model assessment: the process of evaluating a model's performance
model selection: the process of selecting the proper level of flexibility for a model
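As one example of a resampling method, here's a sketch of k-fold cross-validation doing both jobs at once: it estimates test error (model assessment) for several flexibility levels of a K-nearest neighbors regressor and picks the best one (model selection). The data is simulated (y = x² plus noise), and `knn_predict` and `cross_val_mse` are names I invented for this sketch, not library functions.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical data: y = x^2 plus random noise.
xs = [i / 5 for i in range(50)]
ys = [x * x + random.gauss(0, 1) for x in xs]
points = list(zip(xs, ys))

def knn_predict(train, x0, k):
    """Average the responses of the k training points nearest x0."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

def cross_val_mse(data, k_neighbors, n_folds=5):
    """Estimate test MSE by averaging held-out error over n_folds folds."""
    shuffled = data[:]
    random.Random(0).shuffle(shuffled)  # same fold assignment for every model
    fold_errors = []
    for fold in range(n_folds):
        # Every n_folds-th point is held out; the rest is used for fitting.
        held_out = shuffled[fold::n_folds]
        training = [p for i, p in enumerate(shuffled) if i % n_folds != fold]
        errors = [(y - knn_predict(training, x, k_neighbors)) ** 2 for x, y in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / n_folds

# Model assessment: cross-validated error for each flexibility level.
scores = {k: cross_val_mse(points, k) for k in (1, 3, 5, 10, 25)}

# Model selection: keep the flexibility level with the lowest estimated error.
best_k = min(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k:>2}: CV MSE={score:.2f}")
print(f"selected k={best_k}")
```

Small k is very flexible and large k is very smooth, so cross-validation is standing in for the test set while we tune how flexible the model should be.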
