Description
This book gives an overview of the statistical and machine learning methods used in data science
projects, with an emphasis on their applicability to business problem solving. No software is shown,
and the mathematical details are kept to a minimum. The book describes the tasks associated
with all stages of the analytical life cycle, including data preparation and data exploration, feature
engineering and selection, analytical modeling with supervised and unsupervised techniques,
and model assessment and deployment. It describes each technique and illustrates it with real-world
case studies. Readers will learn the most important techniques and methods
related to data science and when to apply them for different business problems. The book provides a
comprehensive overview of the statistical and machine learning techniques associated with data
science initiatives and guides readers through the necessary steps to successfully deploy data science
projects.
This book covers the most important data science skills, the different types of data science applications,
the phases in the data science life cycle, the techniques used in the data preparation steps,
some of the most common supervised machine learning models
(linear and logistic regression, decision trees, forests, gradient boosting, neural networks, support vector
machines, and factorization machines), advanced supervised modeling methods such as ensemble models
and two-stage models, the most important unsupervised machine learning techniques
(clustering, association rules, sequence analysis, link analysis, path analysis, network analysis,
and network optimization), the methods and fit statistics used to assess model results, different approaches
to deploying analytical models in production, and the main topics related to the model operationalization
process.
This book does not cover the techniques for data engineering in depth. It also does not provide any
programming code for the supervised and unsupervised models, nor does it show in practice how to
deploy models in production.