
Becoming a Data Head

Whenever N is large, it's easy to make the leap and think N = ALL; every possible data point is at your disposal. But believing N=ALL does not absolve you from thinking about data quality and bias. (Remember the lessons in Chapter 4, “Argue with the Data.”) Are you truly capturing people from the population you care about?
Jordan Goldmeier • Becoming a Data Head
Experimental data is collected under experimental conditions with treatment groups and time-tested precautions to maintain integrity and avoid confounding. Experimental data is the gold standard. Because of the care the experiment provides to ensuring the results are reliable, this data presents an opportunity upon which to derive some causal ...
Jordan Goldmeier • Becoming a Data Head
When the probability of an event depends on some other event, it's called a conditional probability and uses notation called a vertical bar, |, read as “given.” A few examples will make this clearer: The probability Alex is late to work is 5%. P(A) = 5%. The probability Alex is late to work given he has a flat tire is 100%. P(A | F) = 100%. The ...
Jordan Goldmeier • Becoming a Data Head
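The conditional probability in the highlight above can be sketched from counts, using P(A | F) = P(A and F) / P(F). The day counts below are made up for illustration and chosen only so that they reproduce the 5% and 100% figures in the quote; they do not come from the book.

```python
from fractions import Fraction


def conditional_probability(joint_count, condition_count):
    """Return P(A | F) as (count of A and F) / (count of F)."""
    if condition_count == 0:
        raise ValueError("cannot condition on an event that never occurred")
    return Fraction(joint_count, condition_count)


# Hypothetical record of Alex's workdays (illustrative numbers only).
total_days = 200
late_days = 10           # days Alex was late
flat_tire_days = 2       # days Alex had a flat tire
late_and_flat_days = 2   # days with a flat tire on which Alex was also late

p_late = Fraction(late_days, total_days)                        # P(A)
p_late_given_flat = conditional_probability(late_and_flat_days,
                                            flat_tire_days)     # P(A | F)

print(f"P(A)     = {float(p_late):.0%}")
print(f"P(A | F) = {float(p_late_given_flat):.0%}")
```

Dividing by the flat-tire days rather than all days is the whole idea of "given": the population shrinks to only the days on which the condition holds.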
there's variation in all things; variation creates uncertainty; and probability and statistics are tools to help us manage uncertainty.
Jordan Goldmeier • Becoming a Data Head
Doing data science: Straight talk from the frontline.
Jordan Goldmeier • Becoming a Data Head
The mean is the sum of all the numbers you have divided by the count of all the numbers. The effect of this operation is to give you a sense of what each observation in your series contributes to the entire sum if every observation generated the same amount. The mean is also called the average.
Jordan Goldmeier • Becoming a Data Head
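A minimal sketch of the mean as described in the highlight above: the sum divided by the count. The `sales` list is a hypothetical example, not data from the book.

```python
def mean(values):
    """Sum of all the numbers divided by the count of all the numbers."""
    if not values:
        raise ValueError("mean of an empty series is undefined")
    return sum(values) / len(values)


sales = [10, 20, 30, 40]  # hypothetical observations
average = mean(sales)

# If every observation had contributed the same amount (the mean),
# the total would be unchanged.
equal_share_total = average * len(sales)

print(average)
print(equal_share_total == sum(sales))
```

The final check illustrates the "equal contribution" reading in the quote: replacing every observation with the mean leaves the sum as it was.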
For a data point to be removed, have good business justification for removing it. Arbitrarily picking and choosing which data points are outliers can introduce sampling bias. If outliers are dropped, the original data point and reason for dropping it should be documented and communicated, especially if the results changed substantially.
Jordan Goldmeier • Becoming a Data Head
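One way to follow the advice in the highlight above is to make the justification a required part of dropping a point, and to keep a log of what was removed. This is only a sketch: `drop_outlier`, the order totals, and the stated reason are all hypothetical.

```python
def drop_outlier(values, index, reason, log):
    """Return a copy of values without values[index], recording what was dropped and why."""
    if not reason or not reason.strip():
        raise ValueError("a justification is required to drop a data point")
    log.append({"index": index, "value": values[index], "reason": reason})
    return values[:index] + values[index + 1:]


order_totals = [120, 135, 128, 9999, 131]  # hypothetical data
removal_log = []

cleaned = drop_outlier(
    order_totals, 3,
    "test order entered by QA, not a real customer",  # hypothetical reason
    removal_log,
)

# Report how much the result changed so the effect can be communicated.
mean_before = sum(order_totals) / len(order_totals)
mean_after = sum(cleaned) / len(cleaned)

print(removal_log)
print(mean_before, mean_after)
```

Keeping the original list untouched and returning a copy means the dropped value stays available for anyone who wants to review the decision.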
testing whether a p-value was less than a significance level to reject a null hypothesis is a key part of statistical inference.
Jordan Goldmeier • Becoming a Data Head
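The comparison described in the highlight above can be sketched with an exact one-sided binomial test. The scenario is an assumption made for illustration: a coin flipped 20 times lands heads 16 times, the null hypothesis is that the coin is fair, and the significance level is set to 0.05.

```python
from math import comb


def binomial_p_value_upper(successes, trials, p_null=0.5):
    """P(X >= successes) when X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical experiment (illustrative numbers only).
heads, flips = 16, 20
significance_level = 0.05

p_value = binomial_p_value_upper(heads, flips)
reject_null = p_value < significance_level

print(f"p-value = {p_value:.4f}")
print("reject the null hypothesis" if reject_null else "fail to reject the null hypothesis")
```

The p-value here is the chance of seeing 16 or more heads if the coin really were fair; it is not the probability that the null hypothesis is true.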
The mode is the most common number in the dataset.
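A short sketch of finding the mode with Python's standard library. The `ratings` and `tied` lists are hypothetical; `statistics.multimode` (Python 3.8+) is used because a dataset can have more than one most common value.

```python
from statistics import multimode

ratings = [3, 5, 4, 5, 2, 5, 4]  # hypothetical survey ratings
modes = multimode(ratings)       # every value tied for most common

tied = [1, 1, 2, 2, 3]           # two values share the top count
tied_modes = multimode(tied)

print(modes)
print(tied_modes)
```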