Machine Learning

Where To Start?

Machine learning builds directly on statistics. In this tutorial, we revisit statistics and focus on how to compute key figures from data sets. We will also use Python modules to extract the information we need and to write functions that predict outcomes from what we have learned.

Python is widely used for machine learning because it is simple to read and has powerful libraries, such as NumPy for numerical operations, pandas for data manipulation, and scikit-learn for machine learning algorithms.

Data Set

In computing and machine learning, a data set is any collection of data. It can be anything from a simple array to a complete database.

Consider this array example:


And a sample database:

Carname  Color  Age  Speed  AutoPass
BMW      red    5    99     Y
Volvo    black  7    86     Y
Toyota   blue   6    86     Y

From the array, one might estimate the average value, identify the maximum and minimum values, and perform other basic statistical analyses. The database offers richer insights, such as the most common car color or the age of the oldest car. Machine learning is about analyzing data sets like these and predicting outcomes from them.
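As a rough sketch of the questions mentioned above, the snippet below answers them for the three sample rows using only the Python standard library. The list-of-dictionaries layout and the variable names are assumptions made for illustration, and the Speed column is reused as the plain array.

```python
from collections import Counter
from statistics import mean

# Rows copied from the sample database above.
# Storing them as a list of dictionaries is an illustrative choice.
cars = [
    {"Carname": "BMW", "Color": "red", "Age": 5, "Speed": 99, "AutoPass": "Y"},
    {"Carname": "Volvo", "Color": "black", "Age": 7, "Speed": 86, "AutoPass": "Y"},
    {"Carname": "Toyota", "Color": "blue", "Age": 6, "Speed": 86, "AutoPass": "Y"},
]

# Assumption: the Speed column stands in for a simple array of numbers.
speeds = [car["Speed"] for car in cars]

# Basic statistics that a plain array supports.
average_speed = mean(speeds)
max_speed = max(speeds)
min_speed = min(speeds)

# Questions that need the richer, table-like data.
oldest_car = max(cars, key=lambda car: car["Age"])
color_counts = Counter(car["Color"] for car in cars)

print("Average speed:", round(average_speed, 2))
print("Max / min speed:", max_speed, "/", min_speed)
print("Oldest car:", oldest_car["Carname"], "aged", oldest_car["Age"])
print("Color counts:", dict(color_counts))
```

With only three rows, every color appears exactly once, so there is no single most common color here. The counting approach itself stays the same for a larger table.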

In machine learning, we often deal with large datasets. This tutorial aims to simplify these concepts, using small and understandable datasets for easier comprehension.

Data Types

Understanding the type of data is crucial in analysis. We categorize data into three main types:

  1. Numerical Data: These are numbers, further classified into:
    • Discrete Data: Countable values, limited to integers. Example: The number of cars passing by.
    • Continuous Data: Measured values that can take any value within a range. Example: The price or size of an item.
  2. Categorical Data: Values that cannot be measured against each other, like colors or yes/no values.
  3. Ordinal Data: Categorical data that can be ranked. Example: School grades (A, B, C, etc.).
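The categories above can be sketched in Python to show which summaries make sense for each one. All sample values and variable names below are hypothetical, and the grade ranking (A best, C worst) is an assumption made for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample values, one list per data type.
cars_passing_per_hour = [12, 15, 9, 15]        # numerical, discrete (counts)
item_prices = [19.99, 4.5, 102.25]             # numerical, continuous (measurements)
car_colors = ["red", "black", "blue", "red"]   # categorical
school_grades = ["B", "A", "C", "B", "A"]      # ordinal

# Numerical data: arithmetic such as sums and means is meaningful.
total_cars = sum(cars_passing_per_hour)
average_price = mean(item_prices)

# Categorical data: values can be counted, but not averaged or ranked.
most_common_color = Counter(car_colors).most_common(1)[0][0]

# Ordinal data: values can be ranked once an order is defined.
# Assumed order: A is the best grade, C the worst.
grade_rank = {"A": 1, "B": 2, "C": 3}
sorted_grades = sorted(school_grades, key=grade_rank.get)
median_grade = sorted_grades[len(sorted_grades) // 2]

print("Total cars:", total_cars)
print("Average price:", round(average_price, 2))
print("Most common color:", most_common_color)
print("Grades in rank order:", sorted_grades)
print("Median grade:", median_grade)
```

Note that the median works for ordinal data because it relies only on order, whereas a mean of grades would require assigning numeric distances between them, which the data itself does not provide.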

Identifying the data type is the first step in choosing an appropriate analysis technique. The rest of this tutorial covers statistics and data analysis in more detail, using Python.