Summer Reading B2G1 Free
 
 

Special Offers see all

Enter to WIN a $100 Credit

Subscribe to PowellsBooks.news
for a chance to win.
Privacy Policy

Visit our stores


    Recently Viewed clear list


    Original Essays | July 15, 2015

    Frank Wilczek: IMG You Are... Who?



    Writing a book is an unnatural act of communication. Speaking to a person, or even to an audience, is an interaction. Very different styles are... Continue »
    1. $20.97 Sale Hardcover add to wish list

    spacer
Qualifying orders ship free.
$39.99
New Trade Paper
Ships in 1 to 3 days
Add to Wishlist
Qty Store Section
1 Beaverton COMP- INET WEB PUB
1 Burnside Database- MIS

More copies of this ISBN

Learning Spark: Lightning-Fast Big Data Analysis

by and and and

Learning Spark: Lightning-Fast Big Data Analysis Cover

 

Synopses & Reviews

Publisher Comments:

The Web is getting faster, and the data it delivers is getting bigger. How can you handle everything efficiently? This book introduces Spark, an open source cluster computing system that makes data analytics fast to run and fast to write. Youll learn how to run programs faster, using primitives for in-memory cluster computing. With Spark, your job can load data into memory and query it repeatedly much quicker than with disk-based systems like Hadoop MapReduce.

Written by the developers of Spark, this book will have you up and running in no time. Youll learn how to express MapReduce jobs with just a few simple lines of Spark code, instead of spending extra time and effort working with Hadoops raw Java API.

  • Quickly dive into Spark capabilities such as collect, count, reduce, and save
  • Use one programming paradigm instead of mixing and matching tools such as Hive, Hadoop, Mahout, and S4/Storm
  • Learn how to run interactive, iterative, and incremental analyses
  • Integrate with Scala to manipulate distributed datasets like local collections
  • Tackle partitioning issues, data locality, default hash partitioning, user-defined partitioners, and custom serialization
  • Use other languages by means of pipe() to achieve the equivalent of Hadoop streaming

Synopsis:

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.

Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. Youll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning.

  • Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell
  • Leverage Sparks powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib
  • Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm
  • Learn how to deploy interactive, batch, and streaming applications
  • Connect to data sources including HDFS, Hive, JSON, and S3
  • Master advanced topics like data partitioning and shared variables

About the Author

Holden Karau is a software development engineer at Databricks and is active in open source. She is the author of an earlier Spark book. Prior to Databricks she worked on a variety of search and classification problems at Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelors of Mathematics in Computer Science. Outside of software she enjoys playing with fire, welding, and hula hooping.

Most recently, Andy Konwinski co-founded Databricks. Before that he was a PhD student and then postdoc in the AMPLab at UC Berkeley, focused on large scale distributed computing and cluster scheduling. He co-created and is a committer on the Apache Mesos project. He also worked with systems engineers and researchers at Google on the design of Omega, their next generation cluster scheduling system. More recently, he developed and led the AMP Camp Big Data Bootcamps and first Spark Summit, and has been contributing to the Spark project.

Patrick Wendell is an engineer at Databricks as well as a Spark Committer and PMC member. In the Spark project, Patrick has acted as release manager for several Spark releases, including Spark 1.0. Patrick also maintains several subsystems of Spark's core engine. Before helping start Databricks, Patrick obtained an M.S. in Computer Science at UC Berkeley. His research focused on low latency scheduling for large scale analytics workloads. He holds a B.S.E in Computer Science from Princeton University

Matei Zaharia is the creator of Apache Spark and CTO at Databricks. He holds a PhD from UC Berkeley, where he started Spark as a research project. He now serves as its Vice President at Apache. Apart from Spark, he has made research and open source contributions to other projects in the cluster computing area, including Apache Hadoop (where he is a committer) and Apache Mesos (which he also helped start at Berkeley).

Product Details

ISBN:
9781449358624
Author:
Holden Karau and Andy Konwinsk and Patrick Wendell and Matei Zaharia
Publisher:
O'Reilly Media
Author:
Karau, Holden
Author:
Konwinski, Andy
Author:
Zaharia, Matei
Author:
Hamstra, Mark
Author:
Wendell, Patrick
Subject:
Data processing
Subject:
Computers-Reference - General
Edition Description:
TRADE PAPER
Publication Date:
20150231
Binding:
TRADE PAPER
Language:
English
Pages:
276
Dimensions:
9.19 x 7 in

Related Subjects

Business » Computers
Computers and Internet » Computers Reference » General
Computers and Internet » Database » MIS
Computers and Internet » Internet » Web » Web Programming
Computers and Internet » Internet » Web Design
Computers and Internet » Internet » Web Publishing
Computers and Internet » Software Engineering » Programming and Languages
Health and Self-Help » Self-Help » General

Learning Spark: Lightning-Fast Big Data Analysis New Trade Paper
0 stars - 0 reviews
$39.99 In Stock
Product details 276 pages O'Reilly Media - English 9781449358624 Reviews:
"Synopsis" by ,

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.

Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. Youll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning.

  • Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell
  • Leverage Sparks powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib
  • Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm
  • Learn how to deploy interactive, batch, and streaming applications
  • Connect to data sources including HDFS, Hive, JSON, and S3
  • Master advanced topics like data partitioning and shared variables

spacer
spacer
  • back to top

FOLLOW US ON...

     
Powell's City of Books is an independent bookstore in Portland, Oregon, that fills a whole city block with more than a million new, used, and out of print books. Shop those shelves — plus literally millions more books, DVDs, and gifts — here at Powells.com.