Poetry Madness
 
 

Recently Viewed clear list


Original Essays | April 11, 2014

Paul Laudiero: IMG Shit Rough Draft



I was sitting in a British and Irish romantic drama class my last semester in college when the idea for Shit Rough Drafts hit me. I was working... Continue »
  1. $9.07 Sale Trade Paper add to wish list

spacer
Qualifying orders ship free.
$21.25
New Trade Paper
Ships in 1 to 3 days
Add to Wishlist
available for shipping or prepaid pickup only
Available for In-store Pickup
in 7 to 12 days
Qty Store Section
25 Remote Warehouse Database- Design

Big Data Glossary

by

Big Data Glossary Cover

 

Synopses & Reviews

Publisher Comments:

There's been a massive amount of innovation in data tools over the last few years, thanks to a few key trends:

Learning from the web. Techniques originally developed by website developers coping with scaling issues are increasingly being applied to other domains.

CS+?=$$$. Google have proven that research techniques from computer science can be effective at solving problems and creating value in many real-world situations. That's led to increased interest in cross-pollination and investment in academic research from commercial organizations.

Cheap hardware*. Now that machines with a decent amount of processing power can be hired for just a few cents an hour, many more people can afford to do large-scale data processing. They can't afford the traditional high prices of professional data software though, so they've turned to open-source alternatives.

These trends have led to a Cambrian Explosion of new tools, which means when you're planning a new data project you have a lot to choose from. This guide aims to help you make those choices by describing each tool from the perspective of a developer looking to use them in an application. Wherever possible, this will be from my first-hand experiences, or from colleagues who have used the systems in production environments. I've made a deliberate choice to include my own opinions and impressions, so you should see this guide as a starting point for exploring the tools, not the final word. I'll do my best to explain what I like about each service but your tastes and requirements may well be quite different.

Since the goal is to help experienced engineers navigate the new data landscape, the guide only covers tools that have been created or risen to prominence in the last few years. For example, PostGres is not covered because it's been widely used for over a decade, but its Greenplum derivative is newer and less well-known, so it is included.

Synopsis:

To help you navigate the large number of new data tools available, this guide describes 60 of the most recent innovations, from NoSQL databases and MapReduce approaches to machine learning and visualization tools. Descriptions are based on first-hand experience with these tools in a production environment.

This handy glossary also includes a chapter of key terms that help define many of these tool categories:

  • NoSQL Databases—Document-oriented databases using a key/value interface rather than SQL
  • MapReduce—Tools that support distributed computing on large datasets
  • Storage—Technologies for storing data in a distributed way
  • Servers—Ways to rent computing power on remote machines
  • Processing—Tools for extracting valuable information from large datasets
  • Natural Language Processing—Methods for extracting information from human-created text
  • Machine Learning—Tools that automatically perform data analyses, based on results of a one-off analysis
  • Visualization—Applications that present meaningful data graphically
  • Acquisition—Techniques for cleaning up messy public data sources
  • Serialization—Methods to convert data structure or object state into a storable format

About the Author

A former Apple engineer, Pete Warden is the founder of OpenHeatMap, and writes on large-scale data processing and visualization.

Table of Contents

Preface; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Chapter 1: Terms; 1.1 Document-Oriented; 1.2 Key/Value Stores; 1.3 Horizontal or Vertical Scaling; 1.4 MapReduce; 1.5 Sharding; Chapter 2: NoSQL Databases; 2.1 MongoDB; 2.2 CouchDB; 2.3 Cassandra; 2.4 Redis; 2.5 BigTable; 2.6 HBase; 2.7 Hypertable; 2.8 Voldemort; 2.9 Riak; 2.10 ZooKeeper; Chapter 3: MapReduce; 3.1 Hadoop; 3.2 Hive; 3.3 Pig; 3.4 Cascading; 3.5 Cascalog; 3.6 mrjob; 3.7 Caffeine; 3.8 S4; 3.9 MapR; 3.10 Acunu; 3.11 Flume; 3.12 Kafka; 3.13 Azkaban; 3.14 Oozie; 3.15 Greenplum; Chapter 4: Storage; 4.1 S3; 4.2 Hadoop Distributed File System; Chapter 5: Servers; 5.1 EC2; 5.2 Google App Engine; 5.3 Elastic Beanstalk; 5.4 Heroku; Chapter 6: Processing; 6.1 R; 6.2 Yahoo! Pipes; 6.3 Mechanical Turk; 6.4 Solr/Lucene; 6.5 ElasticSearch; 6.6 Datameer; 6.7 BigSheets; 6.8 Tinkerpop; Chapter 7: NLP; 7.1 Natural Language Toolkit; 7.2 OpenNLP; 7.3 Boilerpipe; 7.4 OpenCalais; Chapter 8: Machine Learning; 8.1 WEKA; 8.2 Mahout; 8.3 scikits.learn; Chapter 9: Visualization; 9.1 Gephi; 9.2 GraphViz; 9.3 Processing; 9.4 Protovis; 9.5 Fusion Tables; 9.6 Tableau; Chapter 10: Acquisition; 10.1 Google Refine; 10.2 Needlebase; 10.3 ScraperWiki; Chapter 11: Serialization; 11.1 JSON; 11.2 BSON; 11.3 Thrift; 11.4 Avro; 11.5 Protocol Buffers;

Product Details

ISBN:
9781449314590
Author:
Warden, Pete
Publisher:
O'Reilly Media
Subject:
Data Modeling & Design
Subject:
Database design
Edition Description:
Trade Paper
Publication Date:
20110922
Binding:
Paperback
Language:
English
Pages:
62
Dimensions:
9.19 x 7 in

Other books you might like

  1. How Will You Measure Your Life Used Trade Paper $9.50
  2. The Innovator's Dilemma: When New... Used Hardcover $9.95
  3. Hadoop: The Definitive Guide Used Trade Paper $14.95
  4. Cloud Architecture Patterns: Using... New Trade Paper $26.25
  5. Network Warrior Used Trade Paper $22.00

Related Subjects

Children's » General
Computers and Internet » Database » Design
Science and Mathematics » Electricity » General Electronics

Big Data Glossary New Trade Paper
0 stars - 0 reviews
$21.25 In Stock
Product details 62 pages O'Reilly Media - English 9781449314590 Reviews:
"Synopsis" by ,

To help you navigate the large number of new data tools available, this guide describes 60 of the most recent innovations, from NoSQL databases and MapReduce approaches to machine learning and visualization tools. Descriptions are based on first-hand experience with these tools in a production environment.

This handy glossary also includes a chapter of key terms that help define many of these tool categories:

  • NoSQL Databases—Document-oriented databases using a key/value interface rather than SQL
  • MapReduce—Tools that support distributed computing on large datasets
  • Storage—Technologies for storing data in a distributed way
  • Servers—Ways to rent computing power on remote machines
  • Processing—Tools for extracting valuable information from large datasets
  • Natural Language Processing—Methods for extracting information from human-created text
  • Machine Learning—Tools that automatically perform data analyses, based on results of a one-off analysis
  • Visualization—Applications that present meaningful data graphically
  • Acquisition—Techniques for cleaning up messy public data sources
  • Serialization—Methods to convert data structure or object state into a storable format

spacer
spacer
  • back to top
Follow us on...




Powell's City of Books is an independent bookstore in Portland, Oregon, that fills a whole city block with more than a million new, used, and out of print books. Shop those shelves — plus literally millions more books, DVDs, and gifts — here at Powells.com.