
Data Smart: Using Data Science to Transform Information into Insight

You can use it to classify company e-mails, customer support transcripts, AP wire articles, the police blotter, medical documents, movie reviews, whatever!
John W. Foreman • Data Smart: Using Data Science to Transform Information into Insight
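The quote doesn't say which classifier "it" refers to. As an assumption, here is a minimal multinomial naive Bayes text classifier, a common choice for sorting documents like these, using add-one (Laplace) smoothing and summed log-probabilities. The function names and the tiny training set are hypothetical illustrations.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; real pipelines also strip punctuation.
    return text.lower().split()

def train_naive_bayes(docs):
    """docs: list of (text, label) pairs. Returns class priors, per-class word counts, vocabulary."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    vocab = set()
    for text, label in docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {label: n / len(docs) for label, n in label_counts.items()}
    return priors, word_counts, vocab

def classify(text, priors, word_counts, vocab):
    scores = {}
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        # Sum logs instead of multiplying probabilities to avoid underflow;
        # the +1 keeps a word never seen in this class from zeroing the score.
        score = math.log(prior)
        for token in tokenize(text):
            score += math.log((word_counts[label][token] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labeled training documents.
training = [
    ("refund my order please", "support"),
    ("my package never arrived", "support"),
    ("great movie loved the acting", "review"),
    ("the plot was dull and slow", "review"),
]
model = train_naive_bayes(training)
print(classify("where is my order", *model))  # support
```

The "naive" part is the assumption that words are independent given the class, which is what lets each word contribute its own log-probability term.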
This issue (handling categorical data, that is, data grouped by a finite number of labels with no inherent numeric equivalents) is one that constantly nips at data miners' heels.
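One common remedy, usually called one-hot or dummy encoding (the quote itself doesn't name a fix), gives each label its own 0/1 column so a model never treats labels as ordered numbers. A minimal sketch, with hypothetical color labels:

```python
def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector.

    Categories are sorted so the column order is deterministic.
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one column is "hot" per row
        rows.append(row)
    return categories, rows

columns, encoded = one_hot(["red", "green", "red", "blue"])
print(columns)  # ['blue', 'green', 'red']
print(encoded)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Encoding "red" as 3 and "blue" as 1 would imply that red is "more" than blue; separate indicator columns avoid that false ordering.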
You have to be smart about pulling relevant predictors out of the dataset.
mean big business, more complex measures of prestige, influence, and centrality have evolved to account for such bad behavior.
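The fragment is truncated, so the specific measures the author has in mind aren't named here. PageRank is one well-known centrality measure that goes beyond counting links: a node's score depends on the scores of the nodes pointing to it. This sketch is a plain power iteration under assumed settings (damping 0.85, a fixed 100 iterations); the graph is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links: dict mapping each node to the list of nodes it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets a baseline share from the "random jump" term.
        new_rank = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A node passes its rank, split evenly, to the nodes it links to.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node (no outgoing links): spread its rank over all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank

# Hypothetical graph: A, B, and C all link to D; D links back to A.
ranks = pagerank({"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['D', 'A', 'B', 'C']
```

A has only one inbound link but ranks second because that link comes from D, the most important node; B and C, with no inbound links, keep only the baseline share.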
These predictors are often called model features or independent variables, while the thing we're trying to predict, “Pregnant (yes/no)?”, would be the dependent variable, in the sense that its value depends on the independent-variable data we're pushing into the model.
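To make the terms concrete, here is a minimal sketch that turns raw purchase records into a feature matrix `X` (the independent variables) and a target vector `y` (the dependent variable, "Pregnant (yes/no)?"). The product names, the chosen feature list, and the helper `to_model_inputs` are hypothetical illustrations, not data from the book.

```python
# Hypothetical raw data: each customer's purchased items plus a known label.
raw = [
    (["folic acid", "prenatal vitamins", "bread"], True),
    (["wine", "bread"], False),
    (["maternity clothes", "folic acid"], True),
    (["wine", "cigarettes"], False),
]

# Hypothetical predictors pulled out of the raw data for illustration.
FEATURES = ["folic acid", "prenatal vitamins", "maternity clothes", "wine"]

def to_model_inputs(records, features):
    """Build X (one 0/1 column per feature) and y (1 = pregnant, 0 = not)."""
    X = [[1 if f in items else 0 for f in features] for items, _ in records]
    y = [1 if label else 0 for _, label in records]
    return X, y

X, y = to_model_inputs(raw, FEATURES)
print(X)  # [[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 1, 0], [0, 0, 0, 1]]
print(y)  # [1, 0, 1, 0]
```

Items not in the feature list (bread, cigarettes) are simply dropped, which is where the judgment about relevant predictors comes in.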
business receives feedback on how its audience is engaging at the individual level through click tracking, online purchases, social sharing, and so on.
But where forecasting and supervised machine learning differ greatly is in their canonical problem spaces.
At a simplistic level, you feed a supervised AI algorithm some historical data, purchases at Target for example, and you tell the algorithm, “Hey, these purchases were from pregnant people, and these other purchases were from not-so-pregnant people.”
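The quote doesn't name a specific algorithm. As one illustrative choice, here is a minimal logistic regression trained by batch gradient descent on a hypothetical labeled history: each row is a customer's purchase flags, and each label says whether that customer was pregnant. The learning rate, epoch count, and data are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Model's probability that the label is 1 for feature vector x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi  # log-loss gradient w.r.t. the logit
            for j, value in enumerate(xi):
                grad_w[j] += err * value
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical history: [bought_folic_acid, bought_prenatal_vitamins, bought_wine]
X = [[1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 1]]
y = [1, 0, 1, 1, 0, 0]  # 1 = pregnant, 0 = not pregnant

weights, bias = train_logistic(X, y)
print(predict_proba(weights, bias, [0, 1, 0]))
print(predict_proba(weights, bias, [0, 0, 1]))
```

The labels are what make this "supervised": the training loop nudges the weights toward whatever reproduces the answers it was told.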
It is easy, especially with sparse datasets (only a few observations), to get a model that fits quite well but whose fit is statistically insignificant, meaning that the relationship between the features and the dependent variable may not actually be real.
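The quote doesn't say which significance test to use. This sketch swaps in an exact permutation test: it refits a straight line to every reordering of `y` and asks how often a shuffled (meaningless) pairing fits at least as well as the real one. The three-point dataset is hypothetical.

```python
import itertools

def r_squared(x, y):
    """R^2 of the least-squares line y = a + b*x (the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def permutation_p_value(x, y):
    """Fraction of all orderings of y whose fit is at least as good as the observed one."""
    observed = r_squared(x, y)
    perms = list(itertools.permutations(y))
    # Small tolerance so orderings with mathematically equal R^2 count as ties.
    hits = sum(1 for p in perms if r_squared(x, p) >= observed - 1e-12)
    return observed, hits / len(perms)

# Hypothetical sparse dataset: only three observations.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.1, 5.9]
r2, p = permutation_p_value(x, y)
print(round(r2, 3), round(p, 3))  # 0.998 0.333
```

The fit looks excellent (R² near 1), yet 2 of the 6 possible orderings fit equally well, so p = 1/3: far too likely to arise by chance to call the relationship real. Enumerating all n! orderings only works for tiny datasets; with more observations, a random sample of permutations is the usual substitute.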