
Roman's Data Science: How to monetize your data

In his 1925 monograph Statistical Methods for Research Workers, Ronald Fisher (the founder of hypothesis testing) outlined concepts such as the statistical significance criterion, the rules for testing statistical hypotheses, analysis of variance, and experiment planning. This work defined our current approach to experiment planning.
Roman Zykov • Roman's Data Science: How to monetize your data
In data analysis, survival bias is taking the known into account while neglecting the unknown (which nevertheless exists).
Roman Zykov • Roman's Data Science: How to monetize your data
the further the peaks (averages) of these distributions are, the higher the power and the lower the probability of a Type 2 error (that the null hypothesis will be accepted incorrectly). This is most logical, as the further the averages of the distributions are from each other, the more obvious the difference between the hypotheses becomes, thus making ...
Roman Zykov • Roman's Data Science: How to monetize your data
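The relationship in the quote above can be checked numerically. The sketch below is not from the book: it assumes a one-sided, one-sample z-test with a known standard deviation, and the function name `z_test_power` and the values of `sigma`, `n`, and `alpha` are illustrative choices. Under those assumptions, power grows as the distance between the two means grows, and the Type 2 error probability shrinks accordingly.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(mean_diff, sigma, n, alpha=0.05):
    """Power of a one-sided one-sample z-test with known sigma.

    mean_diff is the distance between the null and alternative means
    (the "peaks" of the two distributions).
    """
    std_normal = NormalDist()
    # Critical value: reject the null when the z statistic exceeds it.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is shifted by this amount.
    shift = mean_diff * sqrt(n) / sigma
    # Power = probability of rejecting the null when the alternative is true.
    return 1 - std_normal.cdf(z_crit - shift)


if __name__ == "__main__":
    # Hypothetical setup: sigma = 1, n = 50 observations.
    for diff in (0.1, 0.3, 0.5):
        power = z_test_power(diff, sigma=1.0, n=50)
        print(f"difference={diff}: power={power:.3f}, "
              f"type II error={1 - power:.3f}")
```

With a difference of zero the function returns `alpha` itself, which is a quick sanity check: when the peaks coincide, the test rejects only at the false-positive rate.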
Classical machine learning can be divided into three types: supervised learning, unsupervised learning, and reinforcement learning.
Roman Zykov • Roman's Data Science: How to monetize your data
The advantages of bootstrapping are that it is independent from the sample distribution, the only parameters you are working with are the number of samples, and you can easily calculate any metric.
Roman Zykov • Roman's Data Science: How to monetize your data
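A minimal sketch of the idea in the quote above, using a percentile bootstrap built only on the Python standard library. It is not the book's implementation: the function name `bootstrap_ci`, the default of 2000 resamples, and the order values are assumptions made for illustration. The point it shows is that the metric is passed in as a plain function, so the same resampling loop works for a mean, a median, or anything else, with no assumption about the shape of the data's distribution.

```python
import random
import statistics


def bootstrap_ci(sample, metric=statistics.mean, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary metric."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(sample)
    # Resample with replacement and recompute the metric each time.
    estimates = sorted(
        metric(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical, deliberately skewed order values.
    orders = [12, 15, 9, 40, 22, 18, 95, 14, 11, 27, 16, 13, 60, 19, 21]
    print("mean CI:  ", bootstrap_ci(orders))
    print("median CI:", bootstrap_ci(orders, metric=statistics.median))
```

The only knobs are the number of resamples and the confidence level; swapping the metric requires no new formula, which is the advantage the quote describes.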
In Fisher statistics, the p-value is a universal number that is understandable to statisticians and allows them to reject the null hypothesis. The p-value was not a thing before Fisher.
Roman Zykov • Roman's Data Science: How to monetize your data
Evolutionary hypotheses, where one parameter is slightly optimized, have a less profound effect than revolutionary hypotheses, where the approach is fundamentally different. That said, evolutionary hypotheses are more likely to bear fruit.
Roman Zykov • Roman's Data Science: How to monetize your data
All measurements contain errors. This is a fact, get over it. Errors themselves should be noted and not considered errors as such (I’ll explain how we can monitor this in a later chapter).
Roman Zykov • Roman's Data Science: How to monetize your data
Kozyrkov implores us to “always evaluate decision quality based only on what was known at the time the decision was made.”