Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Weekend Learning - Good Session - Taming Big Data with Berkeley Data Analytics Stack

January 05, 2014

Good Session - Taming Big Data with Berkeley Data Analytics Stack

Notes captured from the session

Big Data Use Cases (making personalized decisions for each customer, analysing data trends)

Data Processing Goals

Earlier Trend - Analyse historical data
Current Trend - Real time data processing
Goal - Sophisticated data processing (Trend analysis, Anomaly detection)

Open Analytics Stack

Apps - Data Analysis, Mining, Decision Driven Apps
Data Processing - HBase, Hive, Hadoop
Storage - HDFS
Infrastructure - Cluster

Goals of Open Analytics Stack

Support batch, interactive and stream processing
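
One way to picture a single engine covering both batch and stream workloads is to write the computation once and apply it either to the whole dataset or to small batches as they arrive. The sketch below is plain Python, not Spark code. The names `count_words`, `run_batch` and `run_stream` are hypothetical and chosen only for illustration, and the micro-batch approach is an assumption about how such unification can work, not something stated in the session.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_words(lines: Iterable[str]) -> Counter:
    """The shared computation: count word occurrences in some lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_batch(lines: List[str]) -> Counter:
    """Batch mode: process the whole dataset in one pass."""
    return count_words(lines)


def run_stream(lines: Iterable[str], batch_size: int) -> Iterator[Counter]:
    """Stream mode: apply the same function to small batches and
    yield the running total after each batch."""
    total = Counter()
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            total += count_words(batch)
            batch = []
            yield total.copy()
    if batch:  # flush the final, possibly smaller, batch
        total += count_words(batch)
        yield total.copy()


if __name__ == "__main__":
    data = ["spark shark", "spark mesos", "tachyon spark"]
    print(run_batch(data))
    for running in run_stream(data, batch_size=2):
        print(running)
```

The last running total from `run_stream` equals the result of `run_batch` on the same input, which is the point of sharing one computation between the two modes.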

Implementation Notes

Store data in memory (SSDs, 512 GB of RAM)
Facebook / Yahoo / Bing - a few very large jobs, but the vast majority are fairly small
Aggregated inputs for the other jobs fit in the memory of the cluster
Job parallelism, failure recovery and job scheduling are handled by the framework
Trade-off between accuracy and response time
Single execution framework for batch, streaming and interactive computations
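
The trade-off between accuracy and response time noted above can be illustrated with sampling: answering a query from a random sample reads less data, at the cost of a wider error bound. This is a minimal sketch in plain Python under that assumption. The function `approximate_mean` and its parameters are hypothetical, and the session notes do not say which technique the stack itself uses.

```python
import random
import statistics
from typing import List, Tuple


def approximate_mean(values: List[float], sample_fraction: float,
                     seed: int = 0) -> Tuple[float, float]:
    """Estimate the mean of `values` from a random sample.

    Returns (estimate, standard_error). A smaller sample_fraction reads
    less data (faster) but gives a larger standard error (less accurate).
    The standard error ignores the finite-population correction, so it
    stays above zero even when the whole dataset is sampled.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    rng = random.Random(seed)
    # Keep at least two points so a standard deviation can be computed.
    sample_size = min(len(values), max(2, int(len(values) * sample_fraction)))
    sample = rng.sample(values, sample_size)
    estimate = statistics.mean(sample)
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return estimate, std_error


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(100.0, 15.0) for _ in range(100_000)]
    for fraction in (0.001, 0.01, 0.1, 1.0):
        est, err = approximate_mean(data, fraction)
        print(f"fraction={fraction}: mean ~ {est:.2f} +/- {err:.2f}")
```

Running the script shows the reported error shrinking as the sample fraction grows, so a caller can pick the smallest fraction whose error is acceptable.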

Layers newly added to the stack are shown in parentheses

Application
Data Processing (In Memory Processing)
Storage (Data Management Layer), (Resource Management)
Infrastructure
One cluster for both MPI and Hadoop
Spark (Batch & Interactive Apps Support)
Spark and Shark are available in Amazon Elastic MapReduce
Tachyon - Storage abstraction

Architecture and Component - Screenshots

Download the components from link
AMP Lab Blog link

Good Session, Happy Learning!!!
