Time spent: 0.5h
Total: 13.5h/10000h
Learning statistics also gives you a brief look into data analysis and data visualization. Perhaps I’ll turn this into a bigger blog post about a code-based approach to statistics with matplotlib and pandas.
Totally slacked off today though. Feeling quite optimistic regardless.
Imagine rolling a die. Assuming perfectly fair circumstances, the chance of throwing any particular face is $\frac{1}{6}$. Seems trivial enough, no?
Unfortunately, it’s not always that trivial, and this is where machine learning can help. For example, imagine that the die has a slightly irregular weight distribution, so that the probability of throwing each face is unknown. A good way of figuring the probabilities out would be to collect a sample dataset of throws and build a model based on that dataset to predict the next throws. That’s called a probabilistic model, and it is at the core of how machine learning works.
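To make the die example a bit more concrete, here is a minimal sketch that estimates the face probabilities from a sample of throws using relative frequencies. Everything in it is an assumption for illustration: the `hidden_weights` values are made up, and the function names `simulate_throws` and `estimate_face_probabilities` are hypothetical. Counting frequencies is only the simplest possible "model", not the only way to build one.

```python
import random
from collections import Counter

FACES = [1, 2, 3, 4, 5, 6]

def simulate_throws(weights, n_throws, seed=0):
    """Simulate throws of a (possibly unfair) die with the given face weights."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return rng.choices(FACES, weights=weights, k=n_throws)

def estimate_face_probabilities(throws):
    """Estimate each face's probability as its relative frequency in the sample."""
    counts = Counter(throws)
    total = len(throws)
    return {face: counts[face] / total for face in FACES}

# Made-up weights standing in for the unknown, irregular die (assumption).
hidden_weights = [0.10, 0.15, 0.15, 0.15, 0.15, 0.30]

throws = simulate_throws(hidden_weights, n_throws=10_000)
estimates = estimate_face_probabilities(throws)

for face, p in estimates.items():
    print(f"face {face}: estimated probability {p:.3f}")
```

With more throws, the estimates should get closer to the hidden weights, which is the whole point of collecting a dataset first.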
Multidimensional data is the simplest form of data - for example, a CSV file or an Excel sheet. It is simply an $n \times d$ matrix, which is essentially just $n$ feature vectors (also referred to as data points) of size $d$. $d$ is also considered the dimension of the data set. If $d = 1$, the dataset is referred to as univariate data, but if $d > 1$, it’s referred to as multivariate data.
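As a rough illustration of the matrix view, the sketch below reads a tiny CSV into a list of rows and reports $n$ and $d$. The CSV contents and column names are made up, and `load_matrix`, `shape` and `kind` are hypothetical helpers, not part of any library. It uses only the standard library; with pandas, the `shape` attribute of a `DataFrame` gives the same pair.

```python
import csv
import io

def load_matrix(csv_text):
    """Parse CSV text into a header and a list of numeric rows (feature vectors)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [[float(value) for value in row] for row in reader if row]
    return header, rows

def shape(rows):
    """Return (n, d): number of data points and size of each feature vector."""
    n = len(rows)
    d = len(rows[0]) if rows else 0
    return n, d

def kind(d):
    """Name the dataset by its dimension d."""
    if d < 1:
        raise ValueError("a dataset needs at least one feature")
    return "univariate" if d == 1 else "multivariate"

# Made-up sample data (assumption): 3 data points with 2 features each.
sample_csv = """height_cm,weight_kg
170,65
182,80
158,52
"""

header, rows = load_matrix(sample_csv)
n, d = shape(rows)
print(f"n = {n} data points, d = {d} features -> {kind(d)} data")
```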