Diverse Data Science Reading Recommendations for Students, Professionals and Beginners
April 12, 2022
The field of data science covers a wide range of ground, from hyper-technical model building to philosophical and ethical questions about privacy and bias. The field’s literature covers much of that same ground.
Featuring recommendations from datascience@berkeley faculty, this collection covers topics such as how technology can reinforce discrimination, how data can exclude groups and amplify bias, and the tradeoffs between usefulness and transparency.
The investigations and strategies found in these books can be a useful supplement for anyone working with data, including beginners in data science, professionals with decades of experience, and everyone in between.
Books About Big Data and Statistics
The Data Detective: Ten Easy Rules to Make Sense of Statistics by Tim Harford
Harford untangles the complicated world of statistics with 10 strategies for interpreting data in a way that addresses biases and knowledge gaps.
“A great text on how statistics and data science can help decision making. In particular, focus on Rule Six: Ask Who Is Missing. Oftentimes, folks are missing from data in a way that systematically excludes their lived experiences.” – Michael Rivera, assistant professor of practice, datascience@berkeley
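Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz
For fans of The Signal and the Noise by Nate Silver and Freakonomics by Stephen J. Dubner and Steven Levitt, Everybody Lies offers surprising data-informed insights into the economy, sports, gender and more.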
“The empirical findings in Everybody Lies are so intriguing that the book would be a page-turner even if it were structured as a mere laundry list. But Mr. Stephens-Davidowitz also puts forward a deft argument: The web will [revolutionize] social science just as the microscope and telescope transformed the natural sciences.” – “How to Find Out What People Really Think,” The Economist
Naked Statistics: Stripping the Dread from the Data by Charles Wheelan
Wheelan describes key statistical concepts such as inference, correlation, and regression analysis using pop culture examples and non-technical language.
“While a great measure of the book’s appeal comes from Mr. Wheelan’s fluent style — a natural comedian, he is truly the Dave Barry of the coin toss set — the rest comes from his multiple real world examples illustrating exactly why even the most reluctant mathophobe is well advised to achieve a personal understanding of the statistical underpinnings of life.” – “A Crash Course in Playing the Numbers,” The New York Times
Books About Data, Racism, and Inequality
Algorithms of Oppression: How Search Engines Reinforce Racism by Safiya Umoja Noble
Noble examines data discrimination among search engines, such as Google, where biased algorithms privilege whiteness and punish women of color.
“This is a great text on how online spaces reinforce, and magnify, racism and misogyny.” – Michael Rivera
Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor by Virginia Eubanks
Eubanks investigates how data mining, policy algorithms, and predictive risk modeling disproportionately hurt the poor and working class, as resources are both taken and given based on statistical profiles.
“Automating Inequality is riveting (an accomplishment for a book on technology and policy). Its argument should be widely circulated, to poor people, social service workers, and policymakers, but also throughout the professional classes. Everyone needs to understand that technology is no substitute for justice.” – “How Big Data Is ‘Automating Inequality,’” The New York Times
Race After Technology: Abolitionist Tools for the New Jim Code by Ruha Benjamin
Benjamin explores automation as a powerful propeller for discrimination and White supremacy. Her concept, “The New Jim Code,” and accompanying guidance explore how discriminatory design is deepening social inequities and how to decode tech’s promises.
“This book is worthy of the widest readership, leaving us not only with a deeper understanding of the mutual and shifting roles of race and technology, but also, importantly, with the manageable and doable tools with which to create alternative, equitable, inclusive, and prosperous futures.” – “Domesticating the Techno-Racial Project,” Nature Machine Technology
Books About Women and Data
Invisible Women: Exposing Data Bias in a World Designed for Men by Caroline Criado Perez
Criado Perez examines how data that fails to take gender into account amplifies bias and discrimination against women in policy, healthcare, and education decisions.
“Essential reading for people of ALL genders and from all walks of life, and will likely affect how you think about the world, and about how women fit into it.” – “Invisible Women: Exposing Data Bias in a World Designed for Men,” Forbes
Data Feminism by Catherine D’Ignazio and Lauren F. Klein
An explanation of how data science can be used to eliminate pervasive biases and improve outcomes for those often hurt by discriminatory data, Data Feminism provides an intersectional guide to using feminism and data as tools toward justice.
“Anyone who works with data — and all scientists do, of course — will benefit from reading this book. But the readers who may gain the most from it are those who are trying to use data in the public interest. Data Feminism does such a good job of integrating theories and projects across several fields that it will likely become a touchstone for teaching data science that goes beyond data ethics.” – “Using Data to End Oppression,” American Scientist
Brotopia: Breaking Up the Boys’ Club of Silicon Valley by Emily Chang
Chang’s exposé of the “bro” culture among venture capital firms and tech companies is less specific to data science but reveals recurring experiences for women in male-dominated workplaces.
“…Brotopia is more than a business book. Silicon Valley holds extraordinary power over our present lives as well as whatever utopia (or nightmare) might come next. ‘If robots are going to run the world, or at the very least play a hugely critical role in our future, men shouldn’t be programming them alone,’ Chang writes. ‘The scarcity of women in an industry that is so forcefully reshaping our culture simply cannot be allowed to stand.’” – “In ‘Brotopia,’ Silicon Valley Disrupts Everything but the Boys’ Club,” The New York Times
Books About Privacy and Data Ethics
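The Ethical Algorithm: The Science of Socially Aware Algorithm Design by Michael Kearns and Aaron Roth
Drawing on the emerging science of socially aware algorithm design, this book presents a set of solutions to the increasingly common privacy concerns and violations of basic rights caused by overreaching technology.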
“This is a great introduction for folks just starting out in data science because it’s suitable to a general audience without [dumbing] down the technical aspects, which is a difficult balance to strike. The authors discuss the inevitable trade-offs between fairness, privacy, transparency, and usefulness when it comes to algorithmic decision making and present technical solutions on how to optimize these seemingly opposing forces at the same time.” – Kyle Hamilton, lecturer, datascience@berkeley
The Algorithmic Foundations of Differential Privacy by Cynthia Dwork and Aaron Roth
Dwork and Roth examine privacy-preserving data analysis and include an introduction to the problems and techniques of differential privacy.*
“If you are getting into the field of statistical privacy engineering, this would be a great start. The book builds its way from the basic terms and definitions into designing differentially private mechanisms and algorithms in an approachable manner.” – Daniel Aranki, assistant professor of practice, datascience@berkeley, and executive director, Berkeley Telemonitoring Project
*The Algorithmic Foundations of Differential Privacy is available for free (PDF, 1.3 MB) at the International Association of Privacy Professionals web site.
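The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power by Shoshana Zuboff
Zuboff explains the concept of surveillance capitalism as “a new economic order that claims human experience as free raw material for hidden commercial practices of extraction, prediction, and sales.”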
“Zuboff’s expansive, erudite, deeply researched exploration of digital futures elucidates the norms and hidden terminal goals of information-intensive industries. Zuboff’s book is the information industry’s Silent Spring.” – Chris Hoofnagle, professor of practice, cybersecurity@berkeley
Books About Data and Business
Product-Led Growth: How to Build a Product That Sells Itself by Wes Bush
Real-life examples, email scripts, and answers to some of the most persistent business questions in product marketing are all available in the first part of Bush’s Product-Led series.
“There is a lot of talk about product-led growth in the industry right now… As you [may] suspect, [it] requires analyzing and understanding data.” – Joyce Shen, lecturer, datascience@berkeley
Storytelling with Data: Let’s Practice! by Cole Nussbaumer Knaflic
A thorough guide to the fundamentals of data visualization and communicating information with data, Nussbaumer Knaflic’s book and accompanying exercises explain how to direct the audience’s attention, eliminate clutter, and more.
“Intended for anyone committed to improving their ability to communicate data and complemented by a web site that enables users to further hone their skills, this book is written in a fun, friendly and accessible manner and will be highly appreciated by visual learners and creative data-minded individuals.” – “Book Review: Storytelling with Data: Let’s Practice! by Cole Nussbaumer Knaflic,” LSE Review of Books
Data Smart: Using Data Science to Transform Information into Insight by John W. Foreman
Using Excel spreadsheets, this book walks through nine tutorials on data science techniques such as linear programming, Naïve Bayes classification, and outlier detection.
“This book is set apart from many of the data mining books because of its hands-on exercises and the way the author uses those exercises to describe certain techniques and practices used in data science. The first chapter provides a primer on using Microsoft Excel because the exercises in the book use the spreadsheet.” – “Book Review: ‘Data Smart’ by John W. Foreman,” Seattle PI/BlogCritics
Citation for this content: datascience@berkeley, the online Master of Information and Data Science from UC Berkeley