Qualifying orders ship free.
$26.79
List price: $39.99
New Trade Paper
Ships in 1 to 3 days
Qty  Store            Section
1    Local Warehouse  Mathematics - General

Bad Data Handbook: Cleaning Up the Data So You Can Get Back to Work

by Q. Ethan McCallum

Synopses & Reviews

Publisher Comments:

What is bad data? Some people consider it a technical phenomenon, like missing values or malformed records, but bad data includes a lot more. In this handbook, data expert Q. Ethan McCallum has gathered 19 colleagues from every corner of the data arena to reveal how they've recovered from nasty data problems.

From cranky storage to poor representation to misguided policy, there are many paths to bad data. Bottom line? Bad data is data that gets in the way. This book explains effective ways to get around it.

Among the many topics covered, you'll discover how to:

  • Test-drive your data to see if it's ready for analysis
  • Work spreadsheet data into a usable form
  • Handle encoding problems that lurk in text data
  • Develop a successful web-scraping effort
  • Use NLP tools to reveal the real sentiment of online reviews
  • Address cloud computing issues that can impact your analysis effort
  • Avoid policies that create data analysis roadblocks
  • Take a systematic approach to data quality analysis

Synopsis:

Even if you're relatively new to the data science field, you've likely encountered your share of bad data: missing values and arcane file formats are rather pedestrian matters. But those are just the beginning. Bad data is an ecosystem unto itself, one that also includes mismatched character sets, data that changes behind your back, and data you don't know how to handle on your own.

In short, bad data is data that gets in the way.

In the Bad Data Handbook, Q. Ethan McCallum gathers a cast of authors to explore the wide variety of data headaches, including:

  • Different forms of bad data, and how to spot them
  • Techniques for wrangling bad data
  • Infrastructure and policy matters that will impact your data analysis efforts
  • Procedures to keep bad data from getting worse (and, perhaps, to help it get better)

Synopsis:

Welcome to data science's dirty secret: real-world data is messy. Data scientists must spend a good deal of time playing software developer, writing code to clean up data before they can actually do anything constructive with it.

It's a necessary evil, but you can still make the most of it. This practical book walks you through several real-world examples to demonstrate the theory and practice behind working with and cleaning up dirty data.

No one tool solves all of the problems well. Wise data scientists learn many tools and learn where each one shines. To that end, this book takes a polyglot approach: most examples will involve R and Python, but expect the occasional smattering of Groovy and sed/awk fun.

About the Author

Q. Ethan McCallum is a consultant, writer, and technology enthusiast, though perhaps not in that order. His work has appeared online on the O'Reilly Network and Java.net, and also in print publications such as C/C++ Users Journal, Dr. Dobb's Journal, and Linux Magazine. In his professional roles, he helps companies make smart decisions about data and technology.

Table of Contents

  • About the Authors
  • Preface
  • Chapter 1: Setting the Pace: What Is Bad Data?
  • Chapter 2: Is It Just Me, or Does This Data Smell Funny?
  • Chapter 3: Data Intended for Human Consumption, Not Machine Consumption
  • Chapter 4: Bad Data Lurking in Plain Text
  • Chapter 5: (Re)Organizing the Web's Data
  • Chapter 6: Detecting Liars and the Confused in Contradictory Online Reviews
  • Chapter 7: Will the Bad Data Please Stand Up?
  • Chapter 8: Blood, Sweat, and Urine
  • Chapter 9: When Data and Reality Don't Match
  • Chapter 10: Subtle Sources of Bias and Error
  • Chapter 11: Don't Let the Perfect Be the Enemy of the Good: Is Bad Data Really Bad?
  • Chapter 12: When Databases Attack: A Guide for When to Stick to Files
  • Chapter 13: Crouching Table, Hidden Network
  • Chapter 14: Myths of Cloud Computing
  • Chapter 15: The Dark Side of Data Science
  • Chapter 16: How to Feed and Care for Your Machine-Learning Experts
  • Chapter 17: Data Traceability
  • Chapter 18: Social Media: Erasable Ink?
  • Chapter 19: Data Quality Analysis Demystified: Knowing When Your Data Is Good Enough
  • Colophon

Product Details

ISBN:
9781449321888
Author:
McCallum, Q. Ethan
Publisher:
O'Reilly Media
Subject:
Database Management - General
Subject:
Database design
Subject:
R; analysis; data; data mining; databases; programming; Python
Edition Description:
Print PDF
Publication Date:
November 2012
Binding:
TRADE PAPER
Language:
English
Pages:
264
Dimensions:
9.19 x 7 in

Other books you might like

  1. Ethics of Big Data: Balancing Risk... New Open eBook $12.99
  2. Rowing to Latitude: Journeys Along... Used Trade Paper $6.95
  3. Yellow fever, black goddess :the... Used Hardcover $6.95
  4. Seven Databases in Seven Weeks: A... Used Trade Paper $24.00
  5. Feet of Clay: Saints, Sinners, and... Used Trade Paper $5.95
  6. When One Has Lived a Long Time Alone Used Trade Paper $6.95

Related Subjects

Computers and Internet » Computers Reference » Beginning and Reference
Computers and Internet » Computers Reference » General
Computers and Internet » Database » Design
Computers and Internet » Database » General
Computers and Internet » Software Engineering » Software Management
Science and Mathematics » Mathematics » General

Powell's City of Books is an independent bookstore in Portland, Oregon, that fills a whole city block with more than a million new, used, and out of print books. Shop those shelves — plus literally millions more books, DVDs, and gifts — here at Powells.com.