
Roman's Data Science: How to monetize your data

Evolutionary hypotheses, where one parameter is slightly optimized, have a less profound effect than revolutionary hypotheses, where the approach is fundamentally different. That said, evolutionary hypotheses are more likely to bear fruit.
Roman Zykov • Roman's Data Science: How to monetize your data
Kozyrkov implores us to “always evaluate decision quality based only on what was known at the time the decision was made.”
Roman Zykov • Roman's Data Science: How to monetize your data
In A/B tests, we work with two groups – a test group and a control group. Both need their own bootstrap.
Roman Zykov • Roman's Data Science: How to monetize your data
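To illustrate the point about bootstrapping both groups, here is a minimal Python sketch, not the author's own code. It assumes the metric of interest is the difference in means and uses a simple percentile interval. The function name bootstrap_diff, the sample data, and the number of resamples are all hypothetical choices made for illustration.

```python
import random
import statistics

def bootstrap_diff(control, test, n_resamples=2000, seed=0):
    """Percentile 95% interval for mean(test) - mean(control) (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed only so the example is repeatable
    diffs = []
    for _ in range(n_resamples):
        # Each group gets its own bootstrap: resample with replacement,
        # independently, keeping the group's original size.
        c = rng.choices(control, k=len(control))
        t = rng.choices(test, k=len(test))
        diffs.append(statistics.fmean(t) - statistics.fmean(c))
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return lower, upper

# Hypothetical per-user metric values for the two groups.
control = [10, 12, 9, 11, 13, 10, 12, 11, 9, 10]
test = [12, 14, 11, 13, 15, 12, 13, 14, 11, 12]

low, high = bootstrap_diff(control, test)
print(f"95% interval for the difference in means: [{low:.2f}, {high:.2f}]")
```

If the resulting interval excludes zero, that is one common (though not the only) way to read the test group as differing from control.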
In his 1925 monograph Statistical Methods for Research Workers, Ronald Fisher (the founder of hypothesis testing) outlined concepts such as the statistical significance criterion, the rules for testing statistical hypotheses, analysis of variance, and experiment planning. This work defined our current approach to experiment planning.
Roman Zykov • Roman's Data Science: How to monetize your data
Statistical hypothesis testing involves two important concepts: general population and sample.
Roman Zykov • Roman's Data Science: How to monetize your data
Classical machine learning can be divided into three types: supervised learning, unsupervised learning, and reinforcement learning.
Roman Zykov • Roman's Data Science: How to monetize your data
One problem with all of these tests is that they are distribution-specific. For example, the Student’s t-test and the z-test require normally distributed data.
Roman Zykov • Roman's Data Science: How to monetize your data
nine out of ten hypotheses don’t pan out. But you have no idea that a hypothesis will not produce the desired result until you are well into the testing process. I believe that it is best to kill a hypothesis as early as possible – as soon as the first sign that the idea won’t take off presents itself.
Roman Zykov • Roman's Data Science: How to monetize your data
The advantages of bootstrapping are that it is independent from the sample distribution, the only parameters you are working with are the number of samples, and you can easily calculate any metric.
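
The "any metric" advantage can be sketched in Python as a function that takes the metric as an argument. This is an illustrative sketch under stated assumptions, not the book's implementation: the name bootstrap_ci, the alpha and seed parameters, the percentile method, and the order_values data are all hypothetical.

```python
import random
import statistics

def bootstrap_ci(sample, metric, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile interval for any metric of a sample (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed only so the example is repeatable
    # Recompute the metric on each resample drawn with replacement.
    estimates = sorted(
        metric(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical skewed data: no normality assumption is made anywhere.
order_values = [5, 7, 8, 8, 9, 10, 12, 15, 40, 120]

median_ci = bootstrap_ci(order_values, statistics.median)
mean_ci = bootstrap_ci(order_values, statistics.fmean)
print("median interval:", median_ci)
print("mean interval:", mean_ci)
```

Swapping statistics.median for another function (a percentile, a ratio, a custom business metric) requires no change to the resampling logic, which is the property the quote describes.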