$34.99
List price: $39.99
New Trade Paper

Enterprise Data Workflows with Cascading

by Paco Nathan


Synopses & Reviews

Publisher Comments:

Despite Hadoop's growing use in the enterprise, building applications for it is notoriously difficult. But there is a solution. This hands-on book introduces you to Cascading, the framework that enables you to build powerful data processing applications on Hadoop without having to spend months learning the intricacies of MapReduce.

Whether you're a developer, data scientist, or system/IT administrator, you'll quickly learn Cascading's streamlined approach to data processing, data filtering, and workflow optimization, using sample apps based on Java, Scala, and Clojure. Companies such as Etsy, Razorfish, TeleNav, and Twitter already use Cascading for mission-critical applications. This book shows you how this framework can help your organization extract meaningful information from large amounts of distributed data.

  • Examine best practices for using data science in enterprise-scale apps
  • Learn how to use workflows that reach beyond MapReduce to integrate other popular Big Data frameworks
  • Quickly build and test applications with familiar constructs and reusable components, and instantly deploy them onto large clusters
  • Easily discover, model, and analyze both unstructured and semi-structured data in any format and from any source
  • Seamlessly move and scale application deployments from development to production, regardless of cluster location or data size

Synopsis:

There is an easier way to build Hadoop applications. With this hands-on book, you'll learn how to use Cascading, the open source abstraction framework for Hadoop that lets you easily create and manage powerful enterprise-grade data processing applications, without having to learn the intricacies of MapReduce.

Working with sample apps based on Java and other JVM languages, you'll quickly learn Cascading's streamlined approach to data processing, data filtering, and workflow optimization. This book demonstrates how this framework can help your business extract meaningful information from large amounts of distributed data.

  • Start working on Cascading example projects right away
  • Model and analyze unstructured data in any format, from any source
  • Build and test applications with familiar constructs and reusable components
  • Work with the Scalding and Cascalog Domain-Specific Languages
  • Easily deploy applications to Hadoop, regardless of cluster location or data size
  • Build workflows that integrate several big data frameworks and processes
  • Explore common use cases for Cascading, including features and tools that support them
  • Examine a case study that uses a dataset from the Open Data Initiative

About the Author

Paco Nathan is a Data Scientist at Concurrent, Inc., and heads up the developer outreach program there. He has a dual background from Stanford in math/stats and distributed computing, with 25+ years of experience in the tech industry. As an expert in Hadoop, R, predictive analytics, machine learning, and natural language processing, Paco has built and led several expert Data Science teams, with data infrastructure based on large-scale cloud deployments. He has presented twice on the AWS Start-Up Tour, and often gives talks about Hadoop, Data Science, and Cloud Computing.

Table of Contents

  • Preface
  • Chapter 1: Getting Started
  • Chapter 2: Extending Pipe Assemblies
  • Chapter 3: Test-Driven Development
  • Chapter 4: Scalding—A Scala DSL for Cascading
  • Chapter 5: Cascalog—A Clojure DSL for Cascading
  • Chapter 6: Beyond MapReduce
  • Chapter 7: The Workflow Abstraction
  • Chapter 8: Case Study: City of Palo Alto Open Data
  • Troubleshooting Workflows
  • Index
  • Colophon

Product Details

ISBN:
9781449358723
Author:
Nathan, Paco
Publisher:
O'Reilly Media
Subject:
Data processing
Subject:
Computers-Reference - General
Subject:
JVM;Java;big data;cascading;cascalog;clojure;data analysis;data science;devops;enterprise;hadoop;mapreduce;scala;scalding
Copyright:
Edition Description:
Print PDF
Publication Date:
August 3, 2013
Binding:
Paperback
Language:
English
Pages:
170
Dimensions:
9.19 x 7 in

Related Subjects

Computers and Internet » Computers Reference » General
Computers and Internet » Internet » Apache
Computers and Internet » Internet » Servers
Engineering » Mechanical Engineering » General




