Python Data Science Essentials: Become an efficient data science practitioner by thoroughly understanding the key concepts of Python
Alberto Boschetti (Author)
Luca Massaron (Author)
Key Features
- Quickly get familiar with data science using Python
- Save time - and effort - with all the essential tools explained
- Create effective data science projects and avoid common pitfalls with the help of examples and hints dictated by experience
Book Description
The book starts by introducing you to setting up your essential data science toolbox. Then it will guide you across all the data munging and preprocessing phases. This will be done in a manner that explains all the core data science activities related to loading data, transforming and fixing it for analysis, as well as exploring and processing it. Finally, it will complete the overview by presenting you with the main machine learning algorithms, the graph analysis technicalities, and all the visualization instruments that can make your life easier in presenting your results.
In this walkthrough, structured as a data science project, you will always be accompanied by clear code and simplified examples to help you understand the underlying mechanics and real-world datasets.
What you will learn
- Set up your data science toolbox using a Python scientific environment on Windows, Mac, and Linux
- Get data ready for your data science project
- Manipulate, fix, and explore data in order to solve data science problems
- Set up an experimental pipeline to test your data science hypothesis
- Choose the most effective and scalable learning algorithm for your data science tasks
- Optimize your machine learning models to get the best performance
- Explore and cluster graphs, taking advantage of interconnections and links in your data
Table of Contents
- First Steps
- Data Munging
- The Data Science Pipeline
- Machine Learning
- Social Network Analysis
- Visualization
Editorial Reviews
About the Author
Alberto Boschetti is a data scientist with expertise in signal processing and statistics. He holds a PhD in telecommunication engineering and currently lives and works in London. In his work projects, he faces challenges involving natural language processing (NLP), machine learning, and probabilistic graph models every day. He is very passionate about his job and always tries to stay updated on the latest developments in data science technologies by attending meetups, conferences, and other events.
Luca Massaron is a data scientist and marketing research director who specializes in multivariate statistical analysis, machine learning, and customer insight, with over a decade of experience in solving real-world problems and generating value for stakeholders by applying reasoning, statistics, data mining, and algorithms. From being a pioneer of web audience analysis in Italy to achieving the rank of a top 10 Kaggler, he has always been passionate about everything regarding data and analysis and about demonstrating the potentiality of data-driven knowledge discovery to both experts and nonexperts. Favoring simplicity over unnecessary sophistication, he believes that a lot can be achieved in data science by understanding its essentials.
Product details
- Publisher : Packt Publishing (April 30, 2015)
- Language : English
- Paperback : 258 pages
- ISBN-10 : 1785280422
- ISBN-13 : 978-1785280429
- Item Weight : 15.8 ounces
- Dimensions : 7.5 x 0.59 x 9.25 inches
- Best Sellers Rank: #3,131,141 in Books
- #764 in Mainframes & Minicomputers (Books)
- #1,022 in Business Intelligence Tools
- #1,665 in Data Modeling & Design (Books)
Customer reviews
In little more than 200 pages, it delivers the essentials you need to know if you want to do data science and use Python for it (and you should, the authors suggest!). The part of the book I particularly appreciated is the list of problems you face in practice, each with a proposed solution: loading your data quickly and easily from different sources, for instance, or building and tuning complex machine learning models for regression and classification problems. Your data can't fit into memory? It happens, as you know. Well, there's a section just for that, with clear and efficient coding examples to solve the problem. I feel safer having this book, with its many examples and solutions. I had been looking through several books for help with my projects, and this one appears to be very useful.
Carlo.
The book begins with a description of how to install Python and the various packages needed to run the code. The purpose of each package is also explained. Different Python distributions are briefly discussed together with their characteristics, so that readers can select the distribution best suited to their needs. As all code examples in the book are run in IPython Notebook, special attention is paid to a short but comprehensive introduction to IPython itself. The data sets used in the book are described too.
After advising on the installation of Python and its packages, the book guides readers towards fast and easy data loading from a file. This includes the case where the entire data set cannot be read into memory in one go, for which the solution offered is to load it in chunks using pandas.
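The chunked loading mentioned above can be sketched roughly as follows. This is a minimal illustration, assuming pandas is installed; the in-memory CSV, the column names, and the chunk size are made up for the example and are not taken from the book.

```python
import io

import pandas as pd

# Illustrative in-memory CSV standing in for a file too large to load at once
# (hypothetical columns "id" and "value").
csv_data = io.StringIO(
    "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))
)

total = 0
n_rows = 0

# With chunksize set, read_csv yields DataFrames of at most 4 rows each
# instead of returning one large DataFrame.
for chunk in pd.read_csv(csv_data, chunksize=4):
    total += chunk["value"].sum()
    n_rows += len(chunk)

# A running aggregate computed without ever holding all rows in memory.
mean_value = total / n_rows
print(n_rows, mean_value)
```

Only one chunk is held in memory at a time, so any statistic that can be accumulated batch by batch (sums, counts, minima, maxima) works with this pattern.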
Furthermore, answers to the following questions are provided: how to deal with erroneous records, how to treat categorical and text data, which data cleansing and transformation operations pandas offers, how to use optimized data structures (NumPy arrays), and which operations can be performed on them.
Once data is loaded and converted to a suitable representation, the book then spends a chapter on the general data science pipeline that can be implemented with scikit-learn. The pipeline includes dimensionality reduction via either feature extraction or feature selection, outlier detection, predictive modeling (classification and regression), optimization of the model's hyper-parameters, and evaluation of the model's performance. This material gives a holistic view of what a typical data analysis comprises.
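For readers who have not seen such a pipeline, a rough sketch of the idea follows. It assumes scikit-learn is installed and is not the book's own code: the Iris data set, PCA as the reduction step, logistic regression as the model, and the parameter grid are all choices made here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in data set, chosen only to keep the example self-contained.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Chain preprocessing, dimensionality reduction, and a classifier.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hyper-parameters are addressed as <step name>__<parameter name>.
param_grid = {
    "reduce__n_components": [2, 3],
    "clf__C": [0.1, 1.0, 10.0],
}

# Cross-validated grid search over the whole pipeline.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

# Evaluate the best configuration on data held out from the search.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```

Tuning the pipeline as a single object means the scaling and reduction steps are refitted inside each cross-validation fold, which avoids leaking information from the validation folds into the preprocessing.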
The next chapter introduces several popular machine learning algorithms in detail. Among them are linear and logistic regression, Naive Bayes, support vector machines, and bagging and boosting ensembles. Special attention is paid to scikit-learn solutions to the 3Vs of big data: volume, velocity and variety. Scalability with volume is addressed by incremental learning: at any given moment, only a portion (batch) of the data that fits in the available memory is used to update the model, so the model learns incrementally as new batches arrive. To keep up with velocity, scikit-learn offers a number of classification and regression algorithms optimized for speed. Data variety is dealt with by means of hashing and sparse matrices. The chapter ends with short examples of basic Natural Language Processing operations with the NLTK package and of data clustering.
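The combination of incremental learning and hashing described above might look roughly like this. It is a sketch under assumptions, not code from the book: it assumes scikit-learn is installed, and the toy text batches, the number of hashed features, and the choice of `SGDClassifier` are all illustrative.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Toy mini-batches standing in for a stream too large to fit in memory
# (hypothetical data: 1 = positive, 0 = negative).
batches = [
    (["good great film", "bad awful film"], [1, 0]),
    (["great wonderful plot", "awful boring plot"], [1, 0]),
    (["wonderful good acting", "boring bad acting"], [1, 0]),
]

# Hashing maps tokens to a fixed number of columns without storing a
# vocabulary, so unseen words in later batches need no refitting.
vectorizer = HashingVectorizer(n_features=2**10, alternate_sign=False)

# A linear model trained by stochastic gradient descent supports partial_fit.
model = SGDClassifier(random_state=0)
classes = [0, 1]  # all labels must be declared up front for partial_fit

for texts, labels in batches:
    X_batch = vectorizer.transform(texts)  # SciPy sparse matrix
    model.partial_fit(X_batch, labels, classes=classes)

prediction = model.predict(vectorizer.transform(["great good wonderful"]))
print(prediction)
```

Each call to `partial_fit` updates the existing weights with the new batch only, so memory use depends on the batch size and the number of hashed features rather than on the total amount of data seen.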
The final two chapters are devoted to social network analysis with the NetworkX package and to data visualization with the matplotlib and pandas packages, respectively.
Although I have both the paper and electronic versions of this book, I would advise buying the paper version first, as the numerous code listings are much easier to understand in that format because one can see an entire listing at once.




