HarvardX – Data Science: Probability

Why did I take the course? First off, I am an engineer by trade and I enjoy technology. I am a lifelong learner, and with the current COVID-19 pandemic I have some time on my hands, so I decided to dive into more formal online learning. I regularly use Pluralsight and really enjoy that type of learning, so I figured I would try out HarvardX. Data Science is something I have been researching and building skills in for some time, and I am very interested in it along with AI/ML. I wanted to round out the fundamentals of Data Science as I continue to build on that knowledge.

What I learned

This was a refresher on the basics of probability, a course I took my junior year in college. I really enjoyed the course, so it was good to relearn that content. As a bonus, in this HarvardX course I learned RStudio and the basics of R. If I had wanted a deeper dive into R basics, I would have taken the prerequisite, the first course in the Data Science series offered by HarvardX. There was a learning curve, but R is similar enough to the other languages I have learned.

The Central Limit Theorem (CLT) may be the “most important concept in mathematical history”. When I read that, it made me take note and try to understand why. The theorem says that the sum or average of a ‘large’ sample of independent and identically distributed variables (with finite variance) is approximately normally distributed, even when the individual variables are not. That sounded like it would be useful in epidemiology, so I started searching for CLT and Coronavirus and found that the CLT is a main tenet in determining confidence intervals for the spread of the virus. It also explains why we needed a larger sample size (more testing): the normal approximation improves and the confidence interval narrows as the sample grows, so the estimates become more accurate.
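To see the sample-size effect for myself, here is a minimal Python sketch (my own illustration, not course material, which uses R). It averages simulated die rolls and checks that the spread of those averages shrinks roughly like sigma divided by the square root of n. The function names, the die example, and the z value of 1.96 for an approximate 95% interval are all my assumptions.

```python
import math
import random
import statistics

def sample_means(n, trials=2000, seed=1):
    """Return `trials` averages, each taken over n independent fair die rolls."""
    rng = random.Random(seed)
    return [sum(rng.randint(1, 6) for _ in range(n)) / n for _ in range(trials)]

def ci_half_width(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

if __name__ == "__main__":
    die_sd = math.sqrt(35 / 12)  # standard deviation of a single fair die roll
    for n in (4, 25, 100):
        means = sample_means(n)
        # The observed spread of the averages should be close to die_sd / sqrt(n).
        print(n, round(statistics.stdev(means), 3), round(die_sd / math.sqrt(n), 3))
    # More tests means a tighter interval: 100x the sample gives a 10x smaller margin.
    print(ci_half_width(0.5, 100), ci_half_width(0.5, 10_000))
```

The second function is the part that connects to testing: with a hypothetical positive rate of 50%, going from 100 to 10,000 tests cuts the margin of error by a factor of ten.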

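Monte Carlo simulations are a quick and dirty way to estimate risk: simulate a random process many times on a computer and use the share of runs in which an outcome occurs as an estimate of its probability.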
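As a small illustration of the idea, here is a Python sketch that estimates the chance that at least two people in a group share a birthday, then compares it with the exact answer. The birthday problem is my choice of example, and the function names, trial count, and the simplification to 365 equally likely birthdays are assumptions.

```python
import random

def shared_birthday_prob(group_size, trials=10_000, seed=1):
    """Monte Carlo estimate of P(at least two people share a birthday)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(365) for _ in range(group_size)]
        # A duplicate exists when the set of birthdays is smaller than the group.
        if len(set(birthdays)) < group_size:
            hits += 1
    return hits / trials

def exact_shared_birthday_prob(group_size):
    """Exact probability, via the chance that all birthdays are distinct."""
    p_unique = 1.0
    for k in range(group_size):
        p_unique *= (365 - k) / 365
    return 1 - p_unique

if __name__ == "__main__":
    print(shared_birthday_prob(23), exact_shared_birthday_prob(23))
```

The estimate will not match the exact value perfectly, which is the 'quick and dirty' part, but it gets closer as the number of trials goes up.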

The Big Short was the hook that got me interested in this course. It was interesting to see how statistics and probability were a major factor in the financial crisis. Assuming independence (that the loans’ outcomes aren’t connected in any way) was the central flaw: they assumed that giving out more mortgages would not affect the percentage of loans that defaulted. This assumption of independence was a contributing factor in the greater meltdown.
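Here is a rough Python sketch of why that assumption matters. It compares a loan portfolio where every default is independent with one where a shared shock nudges everyone's default probability up or down together. All the numbers (1,000 loans, a 4% default rate, a shock of up to 2 percentage points) are hypothetical values I picked for illustration, not figures from the course or the actual crisis.

```python
import random
import statistics

def simulate_defaults(n_loans=1000, p_default=0.04, max_shift=0.0,
                      trials=1000, seed=1):
    """Return the number of defaults in each simulated portfolio.

    max_shift=0 means every loan defaults independently with p_default.
    max_shift>0 adds one shared random shock per trial that moves every
    loan's default probability together, so the defaults are correlated.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        p = p_default + rng.uniform(-max_shift, max_shift)
        counts.append(sum(rng.random() < p for _ in range(n_loans)))
    return counts

if __name__ == "__main__":
    independent = simulate_defaults()
    correlated = simulate_defaults(max_shift=0.02)
    for name, counts in (("independent", independent), ("correlated", correlated)):
        print(name, round(statistics.fmean(counts), 1),
              round(statistics.stdev(counts), 1), max(counts))
```

The average number of defaults is about the same in both cases, but the spread is much wider with the shared shock, so a model that assumes independence underestimates how bad the bad years can get.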

Conclusion

The course was good and well formatted. I wish the videos had expanded on the book rather than just reading it. I didn’t engage in the discussions much, and they weren’t more than a forum on each lesson. There wasn’t much of a ‘class’ feeling; it was like Pluralsight with some well-organized exercises, which is great for fitting the course around a busy schedule. You can get in an hour here and an hour there. I spent about 3 hours a week and finished in 6 weeks or so.
