Learning Data Mining with Python
There is a newer edition of this item.
Harness the power of Python to analyze data and create insightful predictive models
About This Book
- Learn data mining in practical terms, using a wide variety of libraries and techniques
- Learn how to find, manipulate, and analyze data using Python
- Step-by-step instructions on creating real-world applications of data mining techniques
Who This Book Is For
If you are a programmer who wants to get started with data mining, then this book is for you.
What You Will Learn
- Apply data mining concepts to real-world problems
- Predict the outcome of sports matches based on past results
- Determine the author of a document based on their writing style
- Use APIs to download datasets from social media and other online services
- Find and extract good features from difficult datasets
- Create models that solve real-world problems
- Design and develop data mining applications using a variety of datasets
- Set up reproducible experiments and generate robust results
- Recommend movies, online celebrities, and news articles based on personal preferences
- Compute on big data, including real-time data from the Internet
In Detail
The next step in the information age is to gain insights from the deluge of data coming our way. Data mining provides a way of finding this insight, and Python is one of the most popular languages for data mining, providing both power and flexibility in analysis.
This book teaches you to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis. Next, we move on to more complex data types including text, images, and graphs. In every chapter, we create models that solve real-world problems.
There is a rich and varied set of libraries available in Python for data mining. This book covers a large number, including the IPython Notebook, pandas, scikit-learn and NLTK.
Each chapter of this book introduces you to new algorithms and techniques. By the end of the book, you will have gained considerable insight into using Python for data mining, along with a solid understanding of the algorithms and their implementations.
Product details
- Publisher : Packt Publishing (July 29, 2015)
- Language : English
- Paperback : 344 pages
- ISBN-10 : 1784396052
- ISBN-13 : 978-1784396053
- Item Weight : 1.3 pounds
- Dimensions : 7.5 x 0.78 x 9.25 inches
- Best Sellers Rank: #4,269,168 in Books
- #1,313 in Business Intelligence Tools
- #2,146 in Data Mining (Books)
- #3,231 in Database Storage & Design
Customer reviews
Top reviews from the United States
My main issue is with the publisher. There were tons of typos in this book, including formatting errors. Some of the sample code did not work either.
I then tried to go to Packt and download the sample code. I registered for their site and nothing came to my email (I checked my junk mail as well). Next, I emailed the publisher asking for access and asking how I can prove that I own the book... they never responded. At this point it was not worth it for me to pursue further. With most other books offering the code for free on GitHub I do not understand at all why this material is behind a registration wall. Additionally, the price for this book is quite high as with all the Packt books, so there should not be errors like this.
To make matters worse, the text essentially paraphrases the IPython Notebooks that make up the bulk of this book. The book contains only code snippets, which are discussed in the context of a given lesson, so without the notebooks the code examples are incomplete. The complete code examples, provided in the IPython Notebooks, lack any description, so reading the text requires moving between the book and the browser-based notebooks to combine the disjointed material. Furthermore, to download the notebook material, without which the book is worthless, you are forced to register with Packt Publishing. I was very unhappy to learn that this book costs only five dollars on Packt when downloaded as an ebook (it may have been on sale).
All in all, I'm still going to grind through this book since it has been helpful for introducing some high-level machine learning concepts and functions. If the typos were cleaned up, it would be a solid 3.5 stars. Oddly enough, for five dollars I'd even recommend this to folks, but with a disclaimer that it will be a slog trying to untangle the mistakes and omissions.
"Machine Learning in Python" by Bowles, published in 2015 by Wiley, 360 pages, $25 for the cheapest hardcopy now available from Amazon (including shipping)
"Designing Machine Learning Systems with Python" by Julian, 2016, Packt, 232 pages, $42
"Mastering Python for Data Science" by Madhavan, 2015, Packt, 294 pages, $39
"Learning Data Mining with Python" by Layton, 2015, 369 pages, $43
"Python Data Science Cookbook" by Subramanian, 2015, 347 pages, $48
"Data Science From Scratch" by Grus, 2015, 330 pages, $24
"Learning scikit-learn" by Moncecchi and Garreta, 2013, 118 pages, $28
"Building Machine Learning Systems with Python" by Coelho and Richert, 2015, 305 pages, $49
"Python Machine Learning" by Raschka, 2015, 454 pages, $34
The whittling-down turned out to be harder than expected: Python titles are better than R counterparts, and Madhavan's book alone was easy to dismiss. Subramanian, Moncecchi-Garreta and Julian did not make the cut based on comparison with alternatives, but were not of themselves bad. Grus is the beginner's best bet - beginners can stop reading here - while Bowles is a book which I like a lot, but which may be a bit too specialist. As a reviewer, thinking about what other "intermediate" readers might find useful, I end up pointing to the trio of Raschka, Layton and Coelho-Richert as the books worth choosing from.
I distinguish Raschka, in appreciation of his more pedagogical style - or maybe I am just giving the top spot to the thickest book! - but the other two titles are definitely worth checking out. Compared to Coelho-Richert (CR), Layton's book surveys a wider range of algorithms - a good third of CR's page count is devoted to text analysis, which means less space for everything else - but strangely neglects regression, my own primary interest. (This is why I dock one star.) The writing is more "cohesive" and methodical - but while Coelho and Richert know to "liven up" the early chapters with visualizations, Layton does not use "matplotlib" until page 98. (And after that, you see charts in the chapter on graph mining - notably, a topic you don't find in the other two books.) Get both, and see which one you prefer.


