Programming Hive
by Edward Capriolo
Synopses & Reviews
Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop's data warehouse infrastructure. You'll quickly learn how to use Hive's SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop's distributed filesystem.
This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You'll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.
Hive makes life much easier for developers who work with stored and managed data in Hadoop clusters, such as data warehouses. With this example-driven guide, you'll learn how to use the Hive infrastructure to provide data summarization, query, and analysis, particularly with HiveQL, Hive's dialect of SQL.
You'll learn how to set up Hive in your environment and optimize its use, and how it interoperates with other tools, such as HBase. You'll also learn how to extend Hive with custom code written in Java or scripting languages. Ideal for developers with prior SQL experience, this book shows you how Hive simplifies many tasks that would be much harder to implement in the lower-level MapReduce API provided by Hadoop.
About the Author
Edward Capriolo is currently System Administrator at Media6degrees, where he helps design and maintain distributed data storage systems for the internet advertising industry.
Edward is a member of the Apache Software Foundation and a committer for the Hadoop-Hive project. He has experience as a developer as well as a Linux and network administrator, and he enjoys the rich world of open source software.
Jason Rutherglen is a software architect at Think Big Analytics and specializes in Big Data, Hadoop, search, and security.
Table of Contents
Preface
Chapter 1: Introduction
Chapter 2: Getting Started
Chapter 3: Data Types and File Formats
Chapter 4: HiveQL: Data Definition
Chapter 5: HiveQL: Data Manipulation
Chapter 6: HiveQL: Queries
Chapter 7: HiveQL: Views
Chapter 8: HiveQL: Indexes
Chapter 9: Schema Design
Chapter 10: Tuning
Chapter 11: Other File Formats and Compression
Chapter 12: Developing
Chapter 13: Functions
Chapter 14: Streaming
Chapter 15: Customizing Hive File and Record Formats
Chapter 16: Hive Thrift Service
Chapter 17: Storage Handlers and NoSQL
Chapter 18: Security
Chapter 19: Locking
Chapter 20: Hive Integration with Oozie
Chapter 21: Hive and Amazon Web Services (AWS)
Chapter 22: HCatalog
Chapter 23: Case Studies
Glossary
References
Colophon