Hive was developed at Facebook and later open-sourced through the Apache community. It provides an SQL-like interface for running queries on big data frameworks. Its SQL-like syntax, called HiveQL, supports many SQL capabilities, including the analytical functions that today's big data workloads rely on.

This book provides easy installation steps for the different types of metastores supported by Hive, along with simple, easy-to-learn recipes for configuring Hive clients and services. You will also learn different Hive optimizations, including partitioning and bucketing. The book also explains the source code of the latest Hive version.

Hive Query Language is also used by other frameworks, including Spark. Towards the end, the book covers the integration of Hive with these frameworks.

Hanish Bansal

Hanish Bansal is a software engineer with over 4 years of experience in developing big data applications. He loves to study emerging solutions and applications, mainly related to big data processing, NoSQL, natural language processing, and neural networks. He has worked on various technologies such as Spring Framework, Hibernate, Hadoop, Hive, Flume, Kafka, Storm, NoSQL databases including HBase, Cassandra, and MongoDB, and search engines such as Elasticsearch. In 2012, he graduated in Information Technology from Jaipur Engineering College and Research Center, Jaipur, India. He was also the technical reviewer of the book Apache ZooKeeper Essentials. In his spare time, he loves to travel and listen to music. You can read his blog at http://hanishblogger.blogspot.in/ and follow him on Twitter at https://twitter.com/hanishbansal786.

Saurabh Chauhan

Saurabh Chauhan is a module lead with close to 8 years of experience in data warehousing and big data applications. He has worked on multiple Extract, Transform, and Load tools, such as Oracle Data Integrator and Informatica, as well as on big data technologies such as Hadoop, Hive, Pig, Sqoop, and Flume. He completed his Bachelor of Technology in 2007 at Vishveshwarya Institute of Engineering and Technology. In his spare time, he loves to travel and discover new places. He also has a keen interest in sports.

Shrey Mehrotra

Shrey Mehrotra has 6 years of IT experience and has spent the past 4 years designing and architecting cloud and big data solutions for the governmental and financial domains. Having worked with big data R&D Labs and Global Data and Analytical Capabilities, he has gained insights into Hadoop, focusing on HDFS, MapReduce, and YARN. His technical strengths also include Hive, Pig, Spark, Elasticsearch, Sqoop, Flume, Kafka, and Java. He likes spending time performing R&D on different big data technologies. He is the coauthor of the book Learning YARN, a certified Hadoop developer, and has also written various technical papers. In his free time, he listens to music, watches movies, and spends time with friends.

  • Preface
  • Chapter 1 : Developing Hive
    • Introduction
    • Deploying Hive on a Hadoop cluster
    • Deploying Hive Metastore
    • Installing Hive
    • Configuring HCatalog
    • Understanding different components of Hive
    • Compiling Hive from source
    • Hive packages
    • Debugging Hive
    • Running Hive
    • Changing configurations at runtime
  • Chapter 2 : Services in Hive
    • Introducing HiveServer2
    • Understanding HiveServer2 properties
    • Configuring HiveServer2 high availability
    • Using HiveServer2 Clients
    • Introducing the Hive metastore service
    • Configuring high availability of metastore service
    • Introducing Hue
  • Chapter 3 : Understanding the Hive Data Model
    • Introduction
    • Using numeric data types
    • Using string data types
    • Using Date/Time data types
    • Using miscellaneous data types
    • Using complex data types
    • Using operators
    • Partitioning
    • Partitioning a managed table
    • Partitioning an external table
    • Bucketing
  • Chapter 4 : Hive Data Definition Language
    • Introduction
    • Creating a database schema
    • Dropping a database schema
    • Altering a database schema
    • Using a database schema
    • Showing database schemas
    • Describing a database schema
    • Creating tables
    • Dropping tables
    • Truncating tables
    • Renaming tables
    • Altering table properties
    • Creating views
    • Dropping views
    • Altering the view properties
    • Altering the view as select
    • Showing tables
    • Showing partitions
    • Show the table properties
    • Showing create table
    • HCatalog
    • WebHCat
  • Chapter 5 : Hive Data Manipulation Language
    • Introduction
    • Loading files into tables
    • Inserting data into Hive tables from queries
    • Inserting data into dynamic partitions
    • Writing data into files from queries
    • Enabling transactions in Hive
    • Inserting values into tables from SQL
    • Updating data
    • Deleting data
  • Chapter 6 : Hive Extensibility Features
    • Introduction
    • Serialization and deserialization formats and data types
    • Exploring views
    • Exploring indexes
    • Hive partitioning
    • Creating buckets in Hive
    • Analytics functions in Hive
    • Windowing in Hive
    • File formats
  • Chapter 7 : Joins and Join Optimization
    • Understanding the joins concept
    • Using a left/right/full outer join
    • Using a left semi join
    • Using a cross join
    • Using a map-side join
    • Using a bucket map join
    • Using a bucket sort merge map join
    • Using a skew join
  • Chapter 8 : Statistics in Hive
    • Bringing statistics in to Hive
    • Table and partition statistics in Hive
    • Column statistics in Hive
    • Top K statistics in Hive
  • Chapter 9 : Functions in Hive
    • Using built-in functions
    • Using the built-in User-defined Aggregation Function (UDAF)
    • Using the built-in User Defined Table Function (UDTF)
    • Creating custom User-Defined Functions (UDF)
  • Chapter 10 : Hive Tuning
    • Enabling predicate pushdown optimizations in Hive
    • Optimizations to reduce the number of map
    • Sampling
  • Chapter 11 : Hive Security
    • Securing Hadoop
    • Authorizing Hive
    • Configuring the SQL standards-based authorization
    • Authenticating Hive
  • Chapter 12 : Hive Integration with Other Frameworks
    • Working with Apache Spark
    • Working with Accumulo
    • Working with HBase
    • Working with Google Drill
  • Index