Introducing Data Science: Big Data, Machine Learning, and more, using Python tools 1st Edition
ISBN-13: 978-1633430037
ISBN-10: 1633430030
Summary
Introducing Data Science teaches you how to accomplish the fundamental tasks that occupy data scientists. Using the Python language and common Python libraries, you'll experience firsthand the challenges of dealing with data at scale and gain a solid foundation in data science.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the Technology
Many companies need developers with data science skills to work on projects ranging from social media marketing to machine learning. Discovering what you need to learn to begin a career as a data scientist can seem bewildering. This book is designed to help you get started.
About the Book
Introducing Data Science explains vital data science concepts and teaches you how to accomplish the fundamental tasks that occupy data scientists. You’ll explore data visualization, graph databases, the use of NoSQL, and the data science process. You’ll use the Python language and common Python libraries as you experience firsthand the challenges of dealing with data at scale. Discover how Python allows you to gain insights from data sets so big that they need to be stored on multiple machines, or from data moving so quickly that no single machine can handle it. This book gives you hands-on experience with the most popular Python data science libraries, Scikit-learn and StatsModels. After reading this book, you’ll have the solid foundation you need to start a career in data science.
What’s Inside
- Handling large data
- Introduction to machine learning
- Using Python to work with data
- Writing data science algorithms
About the Reader
This book assumes you're comfortable reading code in Python or a similar language, such as C, Ruby, or JavaScript. No prior experience with data science is required.
About the Authors
Davy Cielen, Arno D. B. Meysman, and Mohamed Ali are the founders and managing partners of Optimately and Maiton, where they focus on developing data science projects and solutions in various sectors.
Table of Contents
- Data science in a big data world
- The data science process
- Machine learning
- Handling large data on a single computer
- First steps in big data
- Join the NoSQL movement
- The rise of graph databases
- Text mining and text analytics
- Data visualization to the end user
Editorial Reviews
About the Author
Davy Cielen is one of the founders and managing partners of Optimately, where he focuses on leading and developing data science projects and solutions in various sectors and closely follows new developments in data science. Before Optimately, he worked on data science and big data projects at a major retailer.
Arno Meysman is one of the founders and managing partners of Optimately, where he focuses on leading and developing data science projects and solutions in various sectors and closely follows new developments in data science. Before Optimately, he worked on data science and big data projects at a major retailer. Apart from data science, he is also into data visualisation and generally "Creating data-driven things that are smart, interactive and pretty".
Mohamed Ali is one of the founders and managing partners of Optimately and Maiton, where he focuses on developing data science projects and solutions in various sectors.
Product details
- ASIN : 1633430030
- Publisher : Manning; 1st edition (May 23, 2016)
- Language : English
- Paperback : 320 pages
- ISBN-10 : 1633430030
- ISBN-13 : 978-1633430037
- Item Weight : 1.06 pounds
- Dimensions : 7.38 x 0.8 x 9.25 inches
- Best Sellers Rank: #717,533 in Books
- #47 in Electronic Data Interchange (EDI)
- #58 in Ruby Programming
- #151 in Computer Programming Structured Design
Customer reviews
4.5 out of 5 stars
71 global ratings
Top reviews from the United States
Reviewed in the United States on June 23, 2016
I've only read through page 50, but I have already found several errors. A particularly egregious one occurs in the first code listing, in which a regression model is fit through the origin even though an intercept should clearly be present. The data for the regression model is created as follows: two predictors, let's say x1 and x2, are created from the uniform(0, 1) distribution. The response = 0.4 x1 + 0.6 x2 + error, where the error term is also distributed as uniform(0, 1). That being the case, the intercept should be close to 0.5, the mean of the uniform(0, 1) distribution. You may wonder what difference that makes. Well... if the correct model is used, the model coefficients end up being close to the actual coefficients used to create the data, namely 0.4 and 0.6, instead of 0.7658 and 1.1252 as shown in the text. In addition, R-squared for the correct model is about 0.3, a far cry from the "high" value of 0.893 reported by the authors.
24 people found this helpful
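The data-generating process described in the review above can be simulated to see what omitting the intercept does. What follows is a minimal sketch using only the Python standard library, not the book's listing: the sample size, the seed, and the helper names (`solve`, `ols`, `r_squared`) are assumptions chosen for illustration. It fits the model with and without an intercept, and reports the usual centered R-squared for the intercept model and the uncentered R-squared for the through-origin model, which is one plausible reason a fit through the origin can look deceptively good.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(rows, y, coef, centered):
    """1 - SS_residual / SS_total; SS_total is taken about the mean only if centered."""
    resid = sum((t - sum(c * v for c, v in zip(coef, r))) ** 2
                for r, t in zip(rows, y))
    base = sum(y) / len(y) if centered else 0.0
    total = sum((t - base) ** 2 for t in y)
    return 1 - resid / total


random.seed(0)   # assumed seed, for repeatability only
n = 5000         # assumed sample size, larger than a textbook toy example
x1 = [random.random() for _ in range(n)]
x2 = [random.random() for _ in range(n)]
# Response as described in the review: 0.4*x1 + 0.6*x2 + uniform(0, 1) error
y = [0.4 * a + 0.6 * b + random.random() for a, b in zip(x1, x2)]

rows_origin = [[a, b] for a, b in zip(x1, x2)]       # no intercept column
rows_const = [[1.0, a, b] for a, b in zip(x1, x2)]   # leading column of ones

no_intercept = ols(rows_origin, y)
with_intercept = ols(rows_const, y)

r2_origin_uncentered = r_squared(rows_origin, y, no_intercept, centered=False)
r2_with = r_squared(rows_const, y, with_intercept, centered=True)

print("through origin:", no_intercept, "uncentered R^2:", r2_origin_uncentered)
print("with intercept:", with_intercept, "centered R^2:", r2_with)
```

With a sample this large, the intercept model should recover values near 0.5, 0.4, and 0.6, while the through-origin fit inflates both slopes to absorb the nonzero mean of the error term.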
Reviewed in the United States on June 5, 2016
Loved this book! If I could have given 6 stars, I would have.
This book gives you a very well-rounded approach to data science, and by that I mean it truly takes you on a ride through all the aspects of the field, rather than showing you some regression algorithm in Python and calling it data science.
The book has it all: not only does it use probably the most popular language (Python) for its examples, it also goes into detail on supporting tools and ecosystems. For example, Spark: why create something when Spark is already here and we can just use it in our work?
It covers NoSQL technologies, giving readers enough information to get started, and weighs the pros and cons of each. I especially enjoyed the sections on ACID, BASE, and the CAP theorem. I am familiar with them and gave a presentation on the exact same topic a few years ago, and I enjoyed the read since it covered the key points, leaving me with a nice warm feeling that readers new to the topic will be in good hands!
During the discussion of NoSQL, Elasticsearch is introduced and an entire chapter is devoted to using its search capabilities to get valuable results... Search is what Elasticsearch does best! The section on Damerau-Levenshtein was great. It made you think about the dirty data that is present in the real world and how to deal with it (versus giving you an example with perfectly clean, ready-to-use data).
Speaking of real-world experience: this book takes a step back and, instead of trying to be a data science book that throws cool Python libraries at you, talks about the general approach to real-world data science projects by making you think about the project's research goals. Why are we doing this? This helps you think and pick the right solutions.
Another example of a real-world problem is the chapter on dealing with big, and I mean truly big, data. In a sample program you can surely play with tens or hundreds of sample records, but what do you do with gigabytes or more of data? When running production servers you are not dealing with 2-3 lines of log entries; you sometimes deal with gigabytes! So I was very happy to see a section on how to tackle problems like that.
The authors did a great job, in my opinion, by cloning the pywebhdfs package and making available a version that works with their example code (they did use a now-outdated Hortonworks Sandbox, which made a few chapters hard to follow, but it was not hard to figure out where menus and buttons had moved).
A nice final touch was the section on visualizing results. How will you communicate what you found to others? Will you point them at some hard-to-read printout, or show them a picture or graph that makes your findings easy to read?
So... many, many gems in this book that give you a great overview of the field of data science and get you started not only in a strictly academic, demo-only way, but also in a real-life production environment.
I will definitely be re-reading this book and recommending it to my colleagues!
13 people found this helpful
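For readers unfamiliar with the Damerau-Levenshtein distance mentioned in the review above, here is a minimal sketch of how such an edit distance can be computed. It implements the restricted variant (often called optimal string alignment), which counts insertions, deletions, substitutions, and swaps of adjacent characters. It is an illustration only, not the book's code and not how Elasticsearch implements fuzzy matching; the function name `osa_distance` is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = edits needed to turn the first i chars of a into the first j chars of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "ta" <-> "at"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("data", "dtaa"))       # one adjacent swap
print(osa_distance("kitten", "sitting"))  # classic Levenshtein example
```

A typo such as "dtaa" for "data" is a single edit under this measure, which is why this family of distances suits matching against dirty, hand-typed data.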
Reviewed in the United States on October 16, 2016
This book is a great introduction into data science with step by step examples. The chapters take business problems and show how to find and display solutions. The authors put together a data science process which gives a road map on how to begin and cycle through a project. Each step in the process is outlined with examples on how to implement. While most data science books focus only on machine learning, this book places machine learning in context of a business problem with important steps such as data preparation.
To be clear, machine learning is included, with algorithm explanations complete with Python code examples. This includes typical data science topics such as sparse data, text mining, and supervised and unsupervised learning. Data scientists tend to split into "R" and "Python" bins, and this book is a shout-out to "Python". A nod is given to "R" with the availability of the RPy library and the popularity of "R".
The authors address scaling Python code both through optimization and through big data tools. They give a crisp overview of the Hadoop framework and the memory advantages of using Spark. Another important part of data science is working with data, and this book provides an excellent overview of SQL and NoSQL databases, complete with ACID and BASE concepts and contrasts. Special attention is given to graph databases, arguing that they are a contender for efficiently modelling complex data.
This book is great for the aspiring data scientist who wants to become familiar with the data science process. It does require a technical background in order to understand how to set up the examples and follow the theory. It would also be useful for a manager, data architect, or data engineer who wants to understand how best to support a data scientist in finding business solutions by mining gems from a business data pool.
4 people found this helpful
Reviewed in the United States on July 7, 2020
This book is absolutely useless if your goal is to figure out what data science is and how you can apply it in a business context.
You will spend hours typing in code (which is full of errors, mind; I have some understanding of Python, but if you have not then you are stuffed), only to find that the result is a matrix that apparently tells you if a URL is trustworthy (for example). Which URL? No idea, and the book doesn't tell you. How am I supposed to apply this to anything? It doesn't tell you that either. What do you do with this information? Joke's on you!
Maybe the examples are correct in principle, but given their lack of application, this is an absolute waste of time for anyone but the armchair business man - give this book a wide berth.
Top reviews from other countries
Amazon Customer
4.0 out of 5 stars
Demystifying the complex structure surrounding data science as a field of study and projects
Reviewed in India on July 27, 2019
They did a very good job of giving a very sound and robust structure to data science as a subject and mainstream project. This basic overview and simplified content will surely help you make sense of the hyped technologies and platforms prevailing in the technology sector. It also helps you make subtle changes and decisions in your career.
In one line: This will help you break down some crucial data science processes and concepts.
Geevarghese samuel
5.0 out of 5 stars
Very much impressed!
Reviewed in India on January 16, 2017
Very much impressed by the book. It does exactly what the title says: it introduces you to data science.
This book is just an appetizer. Once you are done with it, it is up to you to choose a path: big data analysis, machine learning, etc.
So, if you are interested in knowing more about data science, go forward and pick this book. You won't be disappointed.
Cheers!
6 people found this helpful
Jav
5.0 out of 5 stars
Everything in this book is bang on!
Reviewed in India on April 18, 2019
This book is a complete refresher for interviews. However, it is NOT an introductory book: you should have read some ML books before jumping into this beauty.
One person found this helpful
Yogendra Kumar
4.0 out of 5 stars
Very good Illustration
Reviewed in India on May 24, 2021
Nice presentation
Swaroop Raj
5.0 out of 5 stars
Very good book for beginners
Reviewed in India on January 10, 2022
This book is so cool, and it makes the basics of data science easy to understand.








