Learning Spark: Lightning-Fast Data Analysis

Karau, Holden

dc.contributor.author	Karau, Holden	en_us
dc.date.accessioned	2025-04-21T01:28:05Z
dc.date.available	2025-04-21T01:28:05Z
dc.date.issued	2015	en_us
dc.identifier.isbn	9781449358624	en_us
dc.identifier.other	HPU2166450	en_us
dc.identifier.uri	https://lib.hpu.edu.vn/handle/123456789/35660
dc.description.abstract	Data in all domains is getting bigger. How can you work with it efficiently? This book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. You’ll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning. Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell. Leverage Spark’s powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib. Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm. Learn how to deploy interactive, batch, and streaming applications. Connect to data sources including HDFS, Hive, JSON, and S3. Master advanced topics like data partitioning and shared variables.	en_us
dc.format.extent	274 p.	en_us
dc.format.mimetype	application/pdf
dc.language.iso	en	en_us
dc.publisher	O’Reilly Media	en_us
dc.subject	Data Structures	en_us
dc.subject	Data Analytics	en_us
dc.subject	Data Processing	en_us
dc.title	Learning Spark: Lightning-Fast Data Analysis	en_us
dc.type	Book	en_us
dc.size	7.82 MB	en_us
dc.department	Technology	en_us

Files in this item

Name: Learning-Spark-Lightning-Fast- ...
Size: 7.823 MB
Format: PDF

This item appears in the following Collection(s)

Technology [3206]