This book is DRM-protected and must be opened with the HyRead reader software.
  • Real-Time Big Data Analytics: design, process, and analyze large sets of complex data in real time
  • Authors: Sumit Gupta, Shilpi Saxena
  • Publisher: Packt Publishing Ltd.
  • Publication year: 2016
  • ISBN: 9781784391409; 9781784397401
  • Format: EPUB (reflowable), PDF, JPG

Processing historical data from the past 10-20 years, performing analytics on it, and finally producing business insights is the most popular use case for modern enterprises.

Enterprises have been focusing on developing data warehouses (https://en.wikipedia.org/wiki/Data_warehouse), where they store the data fetched from every possible data source and use various BI tools to provide analytics over it. But developing a data warehouse is a complex, time-consuming, and costly process that requires a considerable investment of both money and time.

There is no doubt that the emergence of Hadoop and its ecosystem has provided a new paradigm, or architecture, for solving large data problems: a low-cost, scalable solution that processes terabytes of data in a few hours, work that earlier could have taken days. But this is only one side of the coin. Hadoop was meant for batch processing, while a number of other business use cases require analytics and business insights in real or near real time (subsecond SLAs). This is called real-time analytics (RTA) or near real-time analytics (NRTA), and it is sometimes also termed 'fast data', implying the ability to make near real-time decisions and enable 'orders-of-magnitude' improvements in the elapsed time to decisions for businesses.

A number of powerful, easy-to-use open source platforms have emerged to address these enterprise real-time analytics use cases. Two of the most notable are Apache Storm and Apache Spark, which offer real-time data processing and analytics capabilities to a much wider range of potential users. Both projects are part of the Apache Software Foundation, and while the two tools provide overlapping capabilities, they still have distinctive features and different roles to play.

Interesting, isn't it?

Sumit Gupta is a seasoned professional, innovator, and technology evangelist with over 100 man months of experience in architecting, managing, and delivering enterprise solutions revolving around a variety of business domains, such as hospitality, healthcare, risk management, insurance, and so on. He is passionate about technology and overall he has 15 years of hands-on experience in the software industry and has been using Big Data and cloud technologies over the past 4 to 5 years to solve complex business problems.

Sumit has also authored Neo4j Essentials (https://www.packtpub.com/big-data-and-business-intelligence/neo4j-essentials), Building Web Applications with Python and Neo4j (https://www.packtpub.com/application-development/building-web-applications-python-and-neo4j), and Learning Real-time Processing with Spark Streaming (https://www.packtpub.com/big-data-and-business-intelligence/learning-real-time-processing-spark-streaming), all with Packt Publishing.

Shilpi Saxena is an IT professional and a technology evangelist. She is an engineer who has had exposure to various domains (the machine-to-machine space, healthcare, telecom, hiring, and manufacturing). She has experience in all aspects of the conception and execution of enterprise solutions. She has been architecting, managing, and delivering solutions in the Big Data space for the last 3 years; she also handles a high-performance, geographically distributed team of elite engineers.

Shilpi has more than 12 years of experience (3 years in the Big Data space) in the development and execution of various facets of enterprise solutions, in both the product and services dimensions of the software industry. An engineer by degree and profession, she has worn varied hats, such as developer, technical leader, product owner, tech manager, and so on, and she has seen all the flavors that the industry has to offer. She has architected and worked through some of the pioneering production implementations in Big Data on Storm and Impala with autoscaling in AWS.

Shilpi has also authored Real-time Analytics with Storm and Cassandra (https://www.packtpub.com/big-data-and-business-intelligence/learning-real-time-analytics-storm-and-cassandra) with Packt Publishing.

  • Preface
  • Chapter 1 : Introducing the Big Data Technology Landscape and Analytics Platform
    • Big Data – a phenomenon
    • The Big Data dimensional paradigm
    • The Big Data ecosystem
    • The Big Data infrastructure
    • Components of the Big Data ecosystem
    • Distributed batch processing
    • Distributed databases (NoSQL)
    • Real-time processing
    • Summary
  • Chapter 2 : Getting Acquainted with Storm
    • An overview of Storm
    • Storm architecture and its components
    • How and when to use Storm
    • Storm internals
    • Summary
  • Chapter 3 : Processing Data with Storm
    • Storm input sources
    • Other sources for input to Storm
    • Reliability of data processing
    • Storm simple patterns
    • Storm persistence
    • Summary
  • Chapter 4 : Introduction to Trident and Optimizing Storm Performance
    • Working with Trident
    • Understanding LMAX
    • Storm internode communication
    • Understanding the Storm UI
    • Optimizing Storm performance
    • Summary
  • Chapter 5 : Getting Acquainted with Kinesis
    • Architectural overview of Kinesis
    • Creating a Kinesis streaming service
    • Summary
  • Chapter 6 : Getting Acquainted with Spark
    • An overview of Spark
    • The architecture of Spark
    • Resilient distributed datasets (RDD)
    • Writing and executing our first Spark program
    • Summary
  • Chapter 7 : Programming with RDDs
    • Understanding Spark transformations and actions
    • Programming Spark transformations and actions
    • Handling persistence in Spark
    • Summary
  • Chapter 8 : SQL Query Engine for Spark – Spark SQL
    • The architecture of Spark SQL
    • Coding our first Spark SQL job
    • Converting RDDs to DataFrames
    • Working with Parquet
    • Working with Hive tables
    • Performance tuning and best practices
    • Summary
  • Chapter 9 : Analysis of Streaming Data Using Spark Streaming
    • High level architecture
    • Coding our first Spark Streaming job
    • Querying streaming data in real time
    • Deployment and monitoring
    • Summary
  • Chapter 10 : Introducing Lambda Architecture
    • What is Lambda Architecture
    • The technology matrix for Lambda Architecture
    • Realization of Lambda Architecture
    • Summary
  • Index
Print book: NT$ 1440
Single e-book: NT$ 1152
