
Becoming a Data Head

models are simplified versions of reality.
Jordan Goldmeier • Becoming a Data Head
DIMENSIONALITY REDUCTION
Dimensionality reduction is a process you're already familiar with. Photography is an example; it reduces the three-dimensional world down to a flat, two-dimensional photo you can carry in your pocket. With datasets, we're working with rows and columns: observations and features. The number of columns (features) in a datase ...
Jordan Goldmeier • Becoming a Data Head
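As a rough illustration of the idea in the highlight above, the sketch below reduces two columns to one by projecting each row onto the direction of greatest variance (the first principal component). The highlight does not name this technique, so it is an assumption chosen for illustration. The function names and the sample `points` are hypothetical too.

```python
import math

def first_principal_component(points):
    """Return (mean, unit direction) of the largest-variance axis for 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))

def reduce_to_1d(points):
    """Replace each (x, y) row with a single score along the main axis."""
    (mx, my), (dx, dy) = first_principal_component(points)
    return [(x - mx) * dx + (y - my) * dy for x, y in points]

# Hypothetical data: two features that move together almost perfectly.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
scores = reduce_to_1d(points)  # one number per row instead of two
```

Because the two made-up features are nearly redundant, the single score keeps most of the variation in the data, much as a photo keeps most of what matters about a scene.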
unsupervised learning: a collection of tools designed to discover hidden patterns and groups in datasets when no predefined groups are available. It's a powerful technique used in a variety of fields, from segmenting customers into different marketing groups, to organizing music on Spotify or Pandora, and organizing the photos on your phone.
Jordan Goldmeier • Becoming a Data Head
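To make "discovering groups when no predefined groups are available" concrete, here is a minimal sketch of k-means clustering on a single column. The highlight does not name k-means, so it is only one possible tool from the unsupervised family. The `spend` figures and the function name are invented for illustration.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters; returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels

# Hypothetical monthly spend per customer; no group labels are supplied.
spend = [12, 15, 14, 90, 95, 88, 13, 92]
centers, labels = kmeans_1d(spend, k=2)
```

The only input besides the data is the number of groups. Which customers end up together is decided by the data itself.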
unsupervised learning. You didn't come in with preconceived notions about the data, but instead let the data organize itself.
Jordan Goldmeier • Becoming a Data Head
Doing data science: Straight talk from the frontline.
Jordan Goldmeier • Becoming a Data Head
Whenever N is large, it's easy to make the leap and think N = ALL; every possible data point is at your disposal. But believing N=ALL does not absolve you from thinking about data quality and bias. (Remember the lessons in Chapter 4, “Argue with the Data.”) Are you truly capturing people from the population you care about?
Jordan Goldmeier • Becoming a Data Head
In short, statistical inference follows these steps: Ask a meaningful question. Formulate a hypothesis test, setting the status quo as the null hypothesis, and what you hope to be true as the alternative hypothesis. Establish a significance level. (5% or 0.05 is an arbitrary but often-used number.) Calculate a p-value based on a statistical test. C ...
Jordan Goldmeier • Becoming a Data Head
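The steps listed above can be sketched with an exact one-sided binomial test. The scenario (16 successes in 20 trials against a status-quo rate of 0.5) and the function name are hypothetical. The highlight is cut off before its last step, so the final comparison of the p-value to the significance level is the conventional decision rule, not a reconstruction of the missing text.

```python
from math import comb

def one_sided_p_value(successes, trials, null_rate=0.5):
    """P(X >= successes) if the null hypothesis (rate == null_rate) were true."""
    return sum(
        comb(trials, k) * null_rate**k * (1 - null_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Null hypothesis (status quo): the success rate is 0.5.
# Alternative hypothesis: the success rate is higher than 0.5.
alpha = 0.05                          # significance level, fixed before seeing data
p_value = one_sided_p_value(16, 20)   # hypothetical result: 16 successes in 20 trials
reject_null = p_value < alpha         # conventional decision rule
```

The p-value answers a narrow question: how often would a result at least this extreme occur if the null hypothesis were true? It is not the probability that the null hypothesis is true.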
What is a false positive error? It's when evidence appears to confirm the reality of the alternate hypothesis when instead it should have been rejected (e.g., a man has a positive pregnancy test).
Jordan Goldmeier • Becoming a Data Head
On the other hand, a false negative error happens when you accept a false null (e.g., a pregnant woman has a negative pregnancy test).
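The two error types can be laid out as a small lookup on two facts: whether the null hypothesis is actually true, and whether the test rejected it. The sketch below applies that to the pregnancy-test examples, taking "not pregnant" as the null hypothesis. That framing and the function name are assumptions made for illustration.

```python
def classify_outcome(null_is_true, null_rejected):
    """Name the result of a test decision given the (usually unknown) truth."""
    if null_rejected and null_is_true:
        return "false positive"   # rejected a null that was actually true
    if not null_rejected and not null_is_true:
        return "false negative"   # kept a null that was actually false
    return "correct decision"

# Null hypothesis in both examples: "this person is not pregnant".
man_with_positive_test = classify_outcome(null_is_true=True, null_rejected=True)
pregnant_woman_with_negative_test = classify_outcome(null_is_true=False, null_rejected=False)
```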