Doing Data Science: Straight Talk from the Frontline Author: Cathy O'Neil | Language: English | ISBN:
B00FRSNHDC | Format: EPUB
Doing Data Science: Straight Talk from the Frontline Description
Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know.
In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.
Topics include:
- Statistical inference, exploratory data analysis, and the data science process
- Algorithms
- Spam filters, Naive Bayes, and data wrangling
- Logistic regression
- Financial modeling
- Recommendation engines and causality
- Data visualization
- Social networks and data journalism
- Data engineering, MapReduce, Pregel, and Hadoop
Doing Data Science is collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.
- File Size: 11353 KB
- Print Length: 406 pages
- Simultaneous Device Usage: Unlimited
- Publisher: O'Reilly Media; 1 edition (October 10, 2013)
- Sold by: Amazon Digital Services, Inc.
- Language: English
- ASIN: B00FRSNHDC
- Text-to-Speech: Enabled
X-Ray:
- Lending: Not Enabled
- Amazon Best Sellers Rank: #24,859 Paid in Kindle Store (See Top 100 Paid in Kindle Store)
- #1
in Kindle Store > Kindle eBooks > Computers & Technology > Software - #2
in Books > Science & Math > Mathematics > Applied > Stochastic Modeling - #6
in Kindle Store > Kindle eBooks > Nonfiction > Professional & Technical > Professional Science > Mathematics > Applied > Statistics
- #1
in Kindle Store > Kindle eBooks > Computers & Technology > Software - #2
in Books > Science & Math > Mathematics > Applied > Stochastic Modeling - #6
in Kindle Store > Kindle eBooks > Nonfiction > Professional & Technical > Professional Science > Mathematics > Applied > Statistics
Book review - Doing Data Science by O'Neil and Schutt, O'Reilly Media.
More breadth than depth
What is data science? The book Doing Data Science not only explains what data science is but also provides a broad overview of methods and techniques that one must master in order to call one self a data scientist. The book is based on a course about data science given at Columbia University. However it is not to be considered as a text book about data science but more as a broad introduction to a number of topics in data science.
In the spring of 2013 I followed two Coursera courses. One about the statistical programming language R and one on Data Analysis. I had for some time been looking for a book that could be used as a follow-up reading on topics in data science. This was the reason I picked up "Doing Data Science".
The book begins with a chapter about what data science is all about is followed by four chapters on topics like statistical inference, explanatory data analysis, various machine learning algorithms, linear and logistic regression, and Naive Bayes. I have a background in both mathematics and statistics and I was able to understand these chapters but the material is covered in such broad terms that I find it hard to believe that a newcomer to this topics will understand or gain much knowledge from reading these chapters. Basic math is presented about the models but without some kind of detailed explanation one cannot develop any deeper intuition for the approach explained.
The best parts of the book is definitely chapter 6 to 8 and 10. In here we find interesting discussion about coverage of data science applied to financial modeling, extracting information from data, and social networks.
I found this book to be a very odd bird indeed. It is one book you can read from back cover to front cover and not be at a disadvantage. This is because the book is really just a collection of presentations made by various people to a class taught by the primary author Rachel Schutt at Columbia University in the Fall of 2012 – Introduction to Data Science. It wasn’t entirely clear what content Schutt was directly responsible for since only some of the chapters indicate who the contributors were (one of the chapters was contributed by a group of her students!). The co-author, Cathy O’Neil, I’ve encountered before as an outspoken blogger going by the name “mathbabe” but it wasn’t specifically stated how she became part of the book project, other than to say she was one of the students in Schutt’s class. Chapter 6 was partly written by O’Neil.
Both Schutt and O’Neil are Ph.D.s data science appropriate fields, but the book was not “written” by the two, rather they seemed to have performed some kind of editing function with the materials submitted by each contributor and added commentaries of their own. As a result, the book is a hodgepodge of anecdotes, factoids, R code snippets, plots, and mathematics, all from the in-class presentations. I enjoy seeing math in data science books, but the equations in this book were sort of just floating there requiring the reader to explore further at another time.
Although I have issues with the book as it is not any sort of text for the field, I did enjoy reading it with a number of “Ah, I didn’t know that!” moments. Schutt’s credentials in data science are considerable, having worked at Google for a few years around the same time that “data science” was growing up in Silicon Valley.
Doing Data Science: Straight Talk from the Frontline Preview
Link
Please Wait...