Used, New, and Out of Print Books - We Buy and Sell - Powell's Books

Virtualizing Hadoop: How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture

by George Trujillo, Charles Kim

ISBN13: 9780133811025
ISBN10: 0133811026




Synopses & Reviews

Publisher Comments

This is the only complete foundational guide to virtualizing Hadoop and deploying it in the cloud. The authors demystify all aspects of virtualizing Hadoop at scale, empowering DBAs, BI specialists, integrators, architects, and managers to deploy quickly and achieve outstanding performance.


Hadoop as a Service combines exceptional clarity for Hadoop newcomers with realistic examples for building deep technical skill. Drawing on their immense experience, the authors identify specific obstacles and challenges in virtualizing Hadoop, helping you avoid pitfalls, mitigate risks, and achieve superior results.


The authors focus on the baseline Apache Software Foundation Hadoop 2 distribution, while also addressing subtle differences in Cloudera and VMware/EMC's Pivotal HD. Coverage includes:

  • Core Big Data and NoSQL concepts you should know before you start
  • Understanding how data works and moves throughout Hadoop clusters
  • Integrating Hadoop into your overall enterprise data architecture
  • Mastering Linux-based best practices for virtualizing Hadoop
  • Virtualizing master and data servers
  • Simplifying and accelerating deployment, and more

Synopsis

Plan and Implement Hadoop Virtualization for Maximum Performance, Scalability, and Business Agility

Enterprises running Hadoop must absorb rapid changes in big data ecosystems, frameworks, products, and workloads. Virtualized approaches can offer important advantages in speed, flexibility, and elasticity. Now, a world-class team of enterprise virtualization and big data experts guides you through the choices, considerations, and tradeoffs surrounding Hadoop virtualization. The authors help you decide whether to virtualize Hadoop, deploy Hadoop in the cloud, or integrate conventional and virtualized approaches in a blended solution.

First, Virtualizing Hadoop reviews big data and Hadoop from the standpoint of the virtualization specialist. The authors demystify MapReduce, YARN, and HDFS and guide you through each stage of Hadoop data management. Next, they turn the tables, introducing big data experts to modern virtualization concepts and best practices.

Finally, they bring Hadoop and virtualization together, guiding you through the decisions you'll face in planning, deploying, provisioning, and managing virtualized Hadoop. From security to multitenancy to day-to-day management, you'll find reliable answers for choosing your best Hadoop strategy and executing it.

Coverage includes the following:

  • Reviewing the frameworks, products, distributions, use cases, and roles associated with Hadoop
  • Understanding YARN resource management, HDFS storage, and I/O
  • Designing data ingestion, movement, and organization for modern enterprise data platforms
  • Defining SQL engine strategies to meet strict SLAs
  • Considering security, data isolation, and scheduling for multitenant environments
  • Deploying Hadoop as a service in the cloud
  • Reviewing the essential concepts, capabilities, and terminology of virtualization
  • Applying current best practices, guidelines, and key metrics for Hadoop virtualization
  • Managing multiple Hadoop frameworks and products as one unified system
  • Virtualizing master and worker nodes to maximize availability and performance
  • Installing and configuring Linux for a Hadoop environment



About the Author

George J. Trujillo, Jr. is an experienced corporate executive with exceptional communication skills. He is an expert in change management, with strong skills in leadership, critical thinking, and data-driven decision making. George is an internationally recognized data architect, leader, and speaker in big data and cloud solutions. His background includes big data architecture, Hadoop (Hortonworks, Cloudera), data governance, schema design, metadata management, security, NoSQL, and BI. He has received many industry recognitions, including Oracle Recognized Double ACE, Sun Ambassador for Sun Microsystems’ Application Middleware Platform, VMware Recognized vExpert, VMware Certified Instructor, MySQL’s Socrates Award, and MySQL Certified DBA. His leadership in the user community includes serving on the Independent Oracle Users Group (IOUG) board of directors, as president of the IOUG Cloud SIG, as chair of the RMOUG Big Data SIG, and as president of the RMOUG Cloud SIG; membership in the Oracle Fusion Council and the Oracle Beta Leadership Council; election to IOUG’s “Oracles of Oracle” circle; and serving as a master presenter for IOUG’s Master Series. His positions have included vice president of big data architecture in the financial services industry, master principal big data specialist at Hortonworks, tier one data specialist for the VMware Center of Excellence, and CEO of a professional services and training organization.

 

Charles Kim is the president of Viscosity North America, a niche consulting organization specializing in big data, Oracle Exadata/RAC, and virtualization. Charles is an architect in Hadoop/big data, Linux infrastructure, cloud, virtualization, engineered systems, and Oracle clustering technologies. He has written books for Oracle Press, Pearson, and Apress on the Oracle, Hadoop, and Linux technology stacks. He holds certifications in Oracle, VMware, Red Hat Linux, and Microsoft and has more than 23 years of IT experience on mission- and business-critical systems.

Charles presents regularly at VMworld, Oracle OpenWorld, IOUG, and various local/regional user group conferences. He is an Oracle ACE director, VMware vExpert, Oracle Certified DBA, Certified Exadata Specialist, and a Certified RAC Expert. Charles’s books include the following:

  • Oracle Database 11g New Features for DBA and Developers
  • Linux Recipes for Oracle DBAs
  • Oracle Data Guard 11g Handbook
  • Virtualizing Business Critical Oracle Databases: Database as a Service
  • Oracle ASM 12c Pocket Reference Guide
  • Expert Exadata Handbook

Charles is the president of the Cloud Computing (and Virtualization) SIG for the Independent Oracle Users Group. He blogs regularly at the DBAExpert.com/ blog.

His LinkedIn profile is http://www.linkedin.com/in/chkim.

His Twitter handle is @racdba.

 

Steven Jones is a 16-year veteran of technical training with experience in UNIX, networking, database technology, virtualization, and big data. Steven works at VMware as a VMware Certified Instructor; he holds the VCA and VCP 4, 5, and 6 certifications and was named a vExpert in 2014 and 2015. He is a coauthor of Virtualize Oracle Business Critical Databases: Database Infrastructure as a Service (Charles Kim, George Trujillo, Steven Jones, and Sudhir Balasubramanian; iBooks, 2014). He spoke at VMworld 2013 in San Francisco and Barcelona on Virtualizing Mission Critical Oracle RAC with vC Ops, and he has co-presented the VMware Education SDDC Intensive Workshop worldwide. Steven seeks to bring innovation, analogy, and narrative to understanding and mastering information technology as a service.

 

Rommel Garcia is a senior solutions engineer at Hortonworks, a leading open source company driving the adoption of Hadoop. Rommel has spent the past few years focusing on the design, installation, and deployment of large-scale Hadoop ecosystems. He has helped organizations implement security best practices and guidelines for Hadoop platforms, and he has performance-tuned Hadoop clusters at organizations ranging from fast-growing startups to Fortune 100 companies. Rommel is a nationally recognized speaker at Hadoop and big data conferences. He is also well known for his expertise in performance tuning Java applications and middle-tier platforms. He has a BS in electronics engineering and an MS in computer science. Rommel resides in Atlanta with his wife, Elizabeth, and his children, Mila and Braden.

 

Justin Murray is a senior technical marketing architect at VMware. He holds a BA and a postgraduate diploma in computer science from University College Cork in Ireland. Justin has worked in software engineering, technical training, and consulting at various companies in the UK and the United States. Since 2007, he has been working with VMware’s partner companies to validate and optimize big data and other next-generation application workloads on VMware vSphere.


Table of Contents

Foreword xix

Preface xxi

Part I: Introduction to Hadoop

Chapter 1 Understanding the Big Data World 1

The Data Revolution 2

Traditional Data Systems 4

    Semi-Structured and Unstructured Data 5

    Causation and Correlation 7

    Data Challenges 8

The Modern Data Architecture 17

Organizational Transformations 20

Industry Transformation 21

Summary 22

Chapter 2 Hadoop Fundamental Concepts 23

Types of Data in Hadoop 23

Use Cases 25

What Is Hadoop? 26

Hadoop Distributions 32

Hadoop Frameworks 32

NoSQL Databases 37

    What Is NoSQL? 38

A Hadoop Cluster 42

Hadoop Software Processes 45

    Hadoop Hardware Profiles 48

Roles in the Hadoop Environment 56

Summary 59

Chapter 3 YARN and HDFS 61

A Hadoop Cluster Is Distributed 61

Hadoop Directory Layouts 65

    Hadoop Operating System Users 67

The Hadoop Distributed File System 67

    YARN Logging 70

    The NameNode 70

    The DataNode 71

    Block Placement 75

    NameNode Configurations and Managing Metadata 77

Rack Awareness 82

    Block Management 83

    The Balancer 84

    Maintaining Data Integrity in the Cluster 84

Quotas and Trash 92

YARN and the YARN Processing Model 93

    Running Applications on YARN 101

    Resource Schedulers 107

    Benchmarking 112

    TeraSort Benchmarking Suite 115

Summary 117

Chapter 4 The Modern Data Platform 119

Designing a Hadoop Cluster 119

    Enterprise Data Movement 124

Summary 140

Chapter 5 Data Ingestion 141

Extraction, Loading, and Transformation (ELT) 141

    Sqoop: Data Movement with SQL Sources 143

    Flume: Streaming Data 148

    Oozie: Scheduling and Workflow 167

    Falcon: Data Lifecycle Management 172

    Kafka: Real-time Data Streaming 176

Summary 186

Chapter 6 Hadoop SQL Engines 187

Where SQL Was Born 187

SQL in Hadoop 188

Hadoop SQL Engines 190

    Selecting the SQL Tool for Hadoop 190

Now Getting Groovy with Hive and Pig 198

    Hive 199

    HCatalog 213

    Pig 215

Summary 221

Chapter 7 Multitenancy in Hadoop 223

Securing the Access 224

    Authentication 225

    Auditing 230

    Authorization 230

    Data Protection 232

    Isolating the Data 241

    Isolating the Process 251

Summary 255

Part II: Introduction to Virtualization

Chapter 8 Virtualization Fundamentals 257

Why Virtualize Hadoop? 258

    Introduction to Virtualization 261

Summary 276

References 276

Chapter 9 Best Practices for Virtualizing Hadoop 277

Running Virtualized Hadoop with Purpose and Discipline 277

    The Discipline of Purpose Starts with a Clear Target 279

    Virtualizing Different Tiers of Hadoop 280

    Industry Best Practices 282

Summary 298

Part III: Virtualizing Hadoop

Chapter 10 Virtualizing Hadoop 299

How Are Hadoop Ecosystems Going to Be Managed? 300

    Building an Enterprise Hadoop Platform That Is Agile and Flexible 301

    Clarification of Terms 302

    The Journey from Bare-Metal to Virtualization 303

Why Consider Virtualizing Hadoop? 304

    Benefits of Virtualizing Hadoop 305

    Virtualized Hadoop Can Run as Fast or Faster Than Native 306

    Coordination and Cross-Purpose Specialization Is the Future 309

    Barriers Can Be Organizational 310

    Virtualization Is Not an All or Nothing Option 310

    Rapid Provisioning and Improving Quality of Development and Test Environments 311

    Improve High Availability with Virtualization 313

    Use Virtualization to Leverage Hadoop Workloads 313

    Hadoop in the Cloud 314

    Big Data Extensions 314

    The Path to Virtualization 315

    The Software-Defined Data Center 316

    Virtualizing the Network 318

    vRealize Suite 320

Summary 321

References 322

Chapter 11 Virtualizing Hadoop Master Servers 323

Virtualizing Servers in a Hadoop Cluster 324

    Virtualizing the Environment Around Hadoop 325

    Virtualizing the Master Hadoop Servers 325

    Virtualizing Without the SAN 330

Summary 331

Chapter 12 Virtualizing the Hadoop Worker Nodes 333

A Brief Introduction to the Worker Nodes in Hadoop 333

Deployment Models for Hadoop Clusters 335

    The Combined Model 336

    The Separated Model 339

    Network Effects of the Data-Compute Separation 341

    The Shared-Storage Approach to the Data-Compute Separated Model 343

    Local Disks for the Application’s Temporary Data 345

    The Shared Storage Architecture Model Using Network-Attached Storage (NAS) 345

    Deployment Model Summary 348

Best Practices for Virtualizing Hadoop Workers 349

    Disk I/O 349

The Hadoop Virtualization Extensions (HVE) 354

Summary 357

References 358

Resources 358

Chapter 13 Deploying Hadoop as a Service in the Private Cloud 361

The Cloud Context 361

    Stakeholders for Hadoop 362

    Overview of the Solution Architecture 368

Summary 370

References 371

Chapter 14 Understanding the Installation of Hadoop 373

Map the Right Solutions to the Right Use Case 373

    Thoughts About Installing Hadoop 374

Configuring Repositories 376

    Installing HDP 2.2 378

    Environment Preparation 378

Setting Up the Hadoop Configuration 389

Starting HDFS and YARN 393

    Start YARN 396

    Verifying MapReduce Functionality 398

Installing and Configuring Hive 400

Installing and Configuring MySQL Database 401

Installing and Configuring Hive and HCatalog 401

Summary 404

Chapter 15 Configuring Linux for Hadoop 405

Supported Linux Platforms 406

Different Deployment Models 406

Linux Golden Templates 407

    Building a Linux Enterprise Hadoop Platform 408

    Selecting the Linux Distribution 411

Optimal Linux Kernel Parameters and System Settings 411

    epoll 411

    Disable Swap Space 412

    Disable Security During Install 412

    IO Scheduler Tuning 414

    Check Transparent Huge Pages Configuration 414

    Limits.conf 414

    Partition Alignment for RDMs 415

    File System Considerations 416

    Lazy Count Parameter for XFS 418

    Mount Options 418

    I/O Scheduler 419

    Disk Read and Write Options 421

    Storage Benchmarking 421

    Java Version 422

    Set Up NTP 423

    Enable Jumbo Frames 424

    Additional Network Considerations 425

Summary 427

Appendix A Hadoop Cluster Creation: A Prerequisite Checklist 429

Appendix B Big Data/Hadoop on VMware vSphere Reference Materials 433

Deployment Guides 433

Reference Architectures 434

Customer Case Studies 434

Performance 434

vSphere Big Data Extensions (BDE) 435

Other vSphere Features and Big Data 436

 

 

9780133811025   TOC   7/7/2015

 


Product Details

ISBN: 9780133811025
Binding: Trade Paperback
Publication date: 07/30/2015
Publisher: VMware Press
Series info: VMware Press Technology
Pages: 480
Height: 1.00 in
Width: 6.90 in
Thickness: 1.00
Illustration: Yes
Author: Charles Kim
Author: George Trujillo

© 2023 POWELLS.COM
