Exploring the Machine Learning Landscape: Chapter 1 - Part 1



The Machine Learning

Beyond the Basics

3d rendering biorobots concept 1 — Image by freepik

Introduction:

Welcome back, fellow data enthusiasts!

In my previous blog post, we started our journey into the world of machine learning, exploring its core concepts, its significance, and the context of the machine learning book. You can read my previous blog post Here.
Today, we will dive into the 1st chapter of Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, a beginner-friendly guide that sets the stage for our learning adventure. This chapter, aptly titled “The Machine Learning Landscape”, offers a wide view of the field, helping us understand the various aspects and applications of machine learning.

What is Machine Learning?

In one breath you can say:

Machine Learning is the science and art of programming computers so they can learn from data.

But here is a slightly more general definition:

Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.

– Arthur Samuel, 1959

And a more engineering-oriented definition:

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

– Tom Mitchell, 1997

Let me explain Tom Mitchell’s definition to you in simple terms:
Imagine you have a robot that needs to sort different colored balls into baskets. The task T is sorting the balls correctly. The performance measure P could be how many balls the robot sorts correctly in a given time. Experience E is the number of times the robot has practiced sorting the balls.

Now, if the robot’s ability to sort the balls correctly (performance on T) gets better as it practices more and more (gains experience E), we say the robot is learning. So, in simple terms, learning happens when practice makes the robot better at its task.

To decide whether a computer is learning, look at three things:

Experience E: It gets more data or practice.
Task T: It has a specific job to do.
Performance P: There is a way to measure how well it’s doing the job.

If the program gets better at the job (T) as it gets more data or practice (E), according to the measurement (P), then it’s learning.
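To see E, T and P working together, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the task (labeling numbers as "large" or "small"), the hidden boundary of 60, and the little stream of labeled examples. It is not from the book, just one way to watch performance improve with experience.

```python
# Task T: label a number as "large" (1) if it is >= 60, otherwise "small" (0).
# The learner never sees this rule, only labeled examples.
TRUE_BOUNDARY = 60  # hypothetical ground truth, used only to score the learner


def learn_threshold(examples):
    """Place the decision threshold midway between the largest 'small'
    example and the smallest 'large' example seen so far."""
    largest_small = max(x for x, label in examples if label == 0)
    smallest_large = min(x for x, label in examples if label == 1)
    return (largest_small + smallest_large) / 2


def accuracy(threshold):
    """Performance measure P: fraction of the numbers 0..99 labeled correctly."""
    test_inputs = range(100)
    correct = sum((x >= threshold) == (x >= TRUE_BOUNDARY) for x in test_inputs)
    return correct / len(test_inputs)


# Experience E: a stream of labeled examples, revealed two at a time.
stream = [(5, 0), (95, 1), (20, 0), (90, 1), (50, 0), (64, 1), (58, 0), (62, 1)]

scores = []
for n_seen in (2, 4, 6, 8):
    threshold = learn_threshold(stream[:n_seen])
    scores.append(accuracy(threshold))
    print(f"examples seen: {n_seen}  threshold: {threshold}  accuracy: {scores[-1]:.2f}")
```

With this particular stream the accuracy climbs from 0.90 to 1.00 as more examples arrive, which is exactly what Mitchell's definition calls learning. Real data is rarely this tidy, so do not expect such a smooth climb in practice.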

Why Use Machine Learning?

Machine Learning is beneficial because it can simplify problems that are hard to solve with traditional programming. Traditional methods involve creating and maintaining long lists of rules, which are difficult to update and adapt.

Machine Learning, on the other hand, automatically learns patterns from data, which often makes the resulting program shorter, easier to maintain, more accurate, and adaptable to changes.
It excels at complex tasks like speech recognition and can provide insights into data by revealing hidden patterns, helping with better understanding and decision-making.

Examples of Applications of Machine Learning

Analyzing images of products on a production line to automatically classify them.
- This is image classification, typically performed using convolutional neural networks (CNNs).
Detecting tumors in brain scans.
- This is semantic segmentation, where each pixel in the image is classified (as we want to determine the exact location and shape of tumors), typically using CNNs as well.

Automatically classifying news articles.
- This is natural language processing (NLP), and more specifically text classification, which can be tackled using recurrent neural networks (RNNs), CNNs, or Transformers.
Automatically flagging offensive comments on the discussion forums.
- This is also text classification, using the same NLP tools.
Summarizing long documents automatically.
- This is a branch of NLP called text summarization, again using the same tools.

Creating a chatbot or a personal assistant.
- This involves many NLP components, including natural language understanding (NLU) and question-answering modules.
Forecasting your company’s revenue next year, based on many performance metrics.
- This is a regression task (i.e., predicting values) that may be tackled using any regression model, such as a Linear Regression or Polynomial Regression model, a regression SVM, a regression Random Forest, or an artificial neural network. If you want to take into account sequences of past performance metrics, you may want to use RNNs, CNNs, or Transformers.
Making your app react to voice commands.
- This is speech recognition, which requires processing audio samples: since they are long and complex sequences, they are typically processed using RNNs, CNNs, or Transformers.

Detecting credit card fraud.
- This is anomaly detection.
Segmenting clients based on their purchases so that we can design a different marketing strategy for each segment.
- This is clustering.
Representing a complex, high-dimensional dataset in a clear and insightful diagram.
- This is data visualization, often involving dimensionality reduction techniques.
Recommending a product that a client may be interested in, based on past purchases.
- This is a recommender system. One approach is to feed past purchases (and other information about the client) to an artificial neural network, and get it to output the most likely next purchase. This neural net would typically be trained on past sequences of purchases across all clients.

The list could go on, but hopefully it gives you a sense of the incredible breadth and complexity of the tasks that machine learning can handle, and of the types of techniques you would use for each task.

Types of Machine Learning Systems

There are so many different types of Machine Learning systems that it is useful to classify them in broad categories, based on the following criteria:

Whether or not they are trained with human supervision:
- Supervised Learning: The model is trained using labeled data. Think of it like a student learning with a teacher’s guidance. Examples include recognizing spam emails where the system is trained with emails labeled as ‘spam’ or ‘not spam’.
- Unsupervised Learning: The model learns without labeled data. It’s like a student exploring and finding patterns on their own. An example is grouping customers by their purchasing behavior without predefined categories.
- Semi-supervised Learning: This is a mix of both; the model is trained with some labeled and mostly unlabeled data. It’s like a student who gets some help but mostly figures things out independently.
- Reinforcement Learning: The model learns by trial and error, receiving rewards or penalties. Think of it like learning to play a game where we learn from winning and losing.
Whether or not they can learn incrementally on the fly (online versus batch learning):
- Online Learning: The model learns incrementally, as new data comes in. Imagine a student who keeps learning new things every day.
- Batch Learning: The model is trained in one go with a lot of data at once, like studying all semester and then taking an exam.
Whether they make predictions by comparing new data to known data or by building a predictive model (instance-based versus model-based learning):
- Instance-based Learning: The model compares new data to stored data points. It’s like solving problems by comparing them to previous examples.
- Model-based Learning: The model detects patterns and builds a predictive model. This is like scientists forming theories based on experiments and observations.
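
The difference between instance-based and model-based learning is easier to feel with numbers. Below is a small sketch in plain Python, assuming a made-up training set that follows y = 2x. The instance-based learner is a 1-nearest-neighbor lookup and the model-based learner is a least-squares line. Both are stand-ins I picked for illustration, not the only options.

```python
# Hypothetical training data that follows y = 2x exactly.
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]


def predict_instance_based(x_new):
    """Instance-based: keep the training set and copy the answer of the
    most similar stored example (1-nearest neighbor)."""
    _, nearest_y = min(train, key=lambda point: abs(point[0] - x_new))
    return nearest_y


def fit_line(points):
    """Model-based: compress the training set into two parameters
    (slope and intercept) using least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(train)


def predict_model_based(x_new):
    """Use only the learned parameters; the training set is no longer needed."""
    return slope * x_new + intercept


print(predict_instance_based(10.0))  # copies the answer of the closest example (x = 4)
print(predict_model_based(10.0))     # extends the fitted line to x = 10
```

For x = 10 the nearest-neighbor learner answers 8.0, because the closest example it remembers is x = 4, while the fitted line answers 20.0. Neither is "right" in general. The point is only that one approach compares against stored examples and the other generalizes from a model it built.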

These categories can be combined. For example, a modern spam filter might:

Learn continuously (online learning),

Use a deep neural network to build a model (model-based learning),
And be trained with labeled examples of spam and non-spam emails (supervised learning).

By combining these criteria, you can create powerful machine learning systems tailored to specific tasks.
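
To make the online versus batch distinction concrete, here is a deliberately tiny sketch in plain Python. The "model" is just a single number (a running average), and the class and function names are my own invention for this post. A real online learner updates many parameters, but the update pattern is the same: see one observation, adjust, move on.

```python
class OnlineMean:
    """Keeps a running estimate that is updated one observation at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def learn_one(self, value):
        self.count += 1
        # Nudge the estimate toward the new value; no past data is stored.
        self.mean += (value - self.mean) / self.count
        return self.mean


def batch_mean(values):
    """Batch learning: needs the whole dataset available at once."""
    return sum(values) / len(values)


observations = [4.0, 8.0, 6.0, 2.0]  # hypothetical data arriving over time

model = OnlineMean()
history = [model.learn_one(value) for value in observations]

print(history)                   # the estimate after each new observation
print(batch_mean(observations))  # the same final answer, computed in one go
```

Both approaches end at the same value here, but the online version had a usable estimate after every single observation and never had to keep the old data around.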

Let’s explore some of the most popular and common learning methods used in machine learning.

Supervised/Unsupervised Learning

Machine Learning systems can be classified according to the amount and type of supervision they get during training. There are four major categories: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

Supervised Learning:
- In supervised learning, the algorithm is trained using a dataset that includes both the input data and the desired output, known as labels. Think of it like teaching a child by showing them a picture and telling them what it is.
- For example, consider a spam filter. You provide it with a bunch of emails that are labeled as either “spam” or “not spam” (sometimes called “ham”). The filter learns from these examples so it can classify new emails correctly. This type of task, where the goal is to categorize items, is known as classification.
- In short, supervised learning is like training a student with clear examples and correct answers, so they can make accurate decisions on their own in the future.

fig. A labeled training set for spam classification (an example of supervised learning)
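
Here is a rough sketch of the spam filter idea in plain Python. The four training emails are invented, and the scoring rule (count how often each word appeared under each label) is a crude stand-in I chose to keep the code short. It is not how a production spam filter works, and it is not one of the algorithms listed below.

```python
from collections import Counter

# Hypothetical labeled training set: (email text, label).
training_emails = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]


def train(emails):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in emails:
        counts[label].update(text.split())
    return counts


def classify(counts, text):
    """Pick the label whose training vocabulary matches the email more often.
    Ties (including emails with only unseen words) fall back to 'ham'."""
    words = text.split()
    spam_score = sum(counts["spam"][word] for word in words)
    ham_score = sum(counts["ham"][word] for word in words)
    return "spam" if spam_score > ham_score else "ham"


word_counts = train(training_emails)
print(classify(word_counts, "claim your free prize"))      # spam
print(classify(word_counts, "agenda for the team lunch"))  # ham
```

Notice that nobody wrote a rule saying "free" is suspicious. The labels in the training set did that work, which is the whole idea of supervised learning.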

Here are some of the most important supervised learning algorithms (covered in this book):

k-Nearest Neighbors

Linear Regression

Logistic Regression

Support Vector Machines (SVMs)

Decision Trees and Random Forests

Neural networks

Unsupervised Learning:
- In unsupervised learning, the system learns without any labels or guidance. It’s like letting a child explore and learn on their own without any hints.
- Imagine you have a lot of data about your blog’s visitors. You can use a clustering algorithm to find patterns among them without any labels. For example, the algorithm might discover that 40% of your visitors are males who love comic books and usually read your blog in the evening, while 20% are young sci-fi lovers who visit on weekends. If you use a hierarchical clustering algorithm, it might even break these groups down further, helping you target your content more effectively.

fig. An unlabeled training set for unsupervised learning
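
The blog visitor example can be sketched with a bare-bones k-means in plain Python. The visitor data (age, usual hour of visit) and the starting centroids are made up, and fixing the starting centroids by hand is a simplification to keep the result reproducible. Library implementations choose them more carefully.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, centroids, max_iterations=10):
    """Plain k-means: alternate between assigning points to the nearest
    centroid and moving each centroid to the mean of its points."""
    labels = []
    for _ in range(max_iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [min(range(len(centroids)),
                      key=lambda k: squared_distance(p, centroids[k]))
                  for p in points]
        # Update step: mean of the points assigned to each centroid.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if not members:  # keep an empty cluster's centroid where it is
                new_centroids.append(centroids[k])
                continue
            new_centroids.append(
                tuple(sum(dim) / len(members) for dim in zip(*members)))
        if new_centroids == centroids:  # nothing moved, so we are done
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical visitors: (age, usual hour of visit). No labels anywhere.
visitors = [(16, 20), (17, 21), (18, 19), (40, 9), (42, 10), (45, 8)]

labels, centroids = k_means(visitors, centroids=[visitors[0], visitors[-1]])
print(labels)     # which group each visitor ended up in
print(centroids)  # the "typical" visitor of each group
```

The algorithm was never told that there is a young evening crowd and an older morning crowd. It found the two groups from the numbers alone, and it is up to us to interpret what each group means.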

Here are some key types of unsupervised learning algorithms:
- Clustering: Grouping similar items together.
  - K-Means
  - DBSCAN
  - Hierarchical Cluster Analysis (HCA)
- Anomaly Detection and Novelty Detection: Finding unusual or new items.
  - One-class SVM
  - Isolation Forest
- Visualization and Dimensionality Reduction: Simplifying data to make it easier to visualize.
  - Principal Component Analysis (PCA)
  - Kernel PCA
  - Locally Linear Embedding (LLE)
  - t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Association Rule Learning: Discovering interesting relationships between items.
  - Apriori
  - Eclat
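
To give a feel for the anomaly and novelty detection category in the list above, here is a very small sketch in plain Python. It is not a One-class SVM or an Isolation Forest. It is a simple "how many standard deviations from normal" check that I am swapping in because it fits in a few lines. The amounts and the threshold of 3 are assumptions for illustration.

```python
from statistics import mean, pstdev

# Hypothetical "normal" purchase amounts seen during training (no labels needed).
normal_amounts = [20, 22, 19, 21, 23, 20, 22]

center = mean(normal_amounts)    # what a typical amount looks like
spread = pstdev(normal_amounts)  # how much typical amounts vary


def is_anomaly(amount, threshold=3.0):
    """Flag an amount that lies more than `threshold` standard deviations
    away from the amounts seen during training."""
    return abs(amount - center) / spread > threshold


for amount in (21, 24, 500):
    print(amount, "anomaly" if is_anomaly(amount) else "looks normal")
```

The detector only ever saw normal examples, yet it can flag the 500 purchase as unusual. The algorithms in the list follow the same spirit with far more flexible notions of "normal".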

Conclusion

Machine learning is a vast and fascinating field with numerous applications that can change the way we solve problems and understand data. From supervised learning, where algorithms learn from labeled examples, to unsupervised learning, where systems uncover hidden patterns without guidance, each method offers unique advantages and uses. Whether it's classifying emails, detecting anomalies, or clustering data, machine learning techniques help us tackle complex tasks efficiently.

Thanks for reading!

If you enjoyed this article and would like to receive notifications for my future posts, consider subscribing. By subscribing, you’ll stay updated on the latest insights, tutorials, and tips in the world of data science.

Additionally, I would love to hear your thoughts and suggestions. Please leave a comment with your feedback or any topics you’d like me to cover in upcoming blogs. Your engagement means a lot to me, and I look forward to sharing more valuable content with you.
