Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Big Data Testing


October 10, 2014

HBase Overview Notes

Limitations of Hadoop 1.0

No random access --> Hadoop is geared towards batch access (OLAP)
Not suitable for real-time access
No updates - the access pattern is WORM (Write Once, Read Many), which is what Hadoop is best suited for

Why HBase

Flexible schema design --> a new column can be added whenever a row is written
Multiple versions of a single cell (data)
Column-family-oriented (columnar) storage
Cache columns at client side
Compression of columns
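To make the first two points concrete, here is a minimal sketch of the logical data model as nested sorted maps. It is a toy in-memory model, not the HBase API; the class and method names (CellModelSketch, put, getLatest, versions) are made up for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model of an HBase-style table (illustration only, not the HBase API).
class CellModelSketch {
    // row key -> "family:qualifier" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<String, NavigableMap<String, NavigableMap<Long, String>>>();

    // Writing a cell never needs a schema change: any row can carry any column.
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Collections.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // All stored versions of one cell, newest first (empty if the cell does not exist).
    NavigableMap<Long, String> versions(String row, String column) {
        Map<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) {
            return new TreeMap<Long, String>();
        }
        return columns.get(column);
    }

    // Latest version of a cell, or null if the row or column is absent.
    String getLatest(String row, String column) {
        NavigableMap<Long, String> v = versions(row, column);
        return v.isEmpty() ? null : v.firstEntry().getValue();
    }

    public static void main(String[] args) {
        CellModelSketch table = new CellModelSketch();
        table.put("row1", "Details:Location", 1L, "Chennai");
        table.put("row1", "Details:Location", 2L, "Delhi");       // newer version of the same cell
        table.put("row2", "Details:Email", 1L, "a@example.com");  // a column only row2 has
        System.out.println(table.getLatest("row1", "Details:Location"));
        System.out.println(table.versions("row1", "Details:Location"));
    }
}
```

The real system adds column families, version limits and persistence on top of this idea; the sketch only shows why adding a column is cheap and how several versions of a cell can coexist.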

Read v/s Write

Availability (compromise on writes) vs Consistency (compromise on reads)

HBase

NoSQL - a class of non-relational storage systems
An RDBMS stores data row by row; HBase stores it by column family
HBase relies on HDFS to store and replicate its data
ZooKeeper - the client's first point of contact. The client looks up the cluster state in ZooKeeper to find the servers it needs: Client -> ZooKeeper -> HMaster for administrative operations, while reads and writes go to the Region Servers
Region Server - serves regions. The Region Server process runs on the slave nodes (DataNodes)

Happy Learning!!!

August 16, 2014

HBase Primer - Loading Data in HBase Using Sqoop / Java Code

This post has examples of using Sqoop and custom Java code to import data from an MSSQL DB into HBase, Hive and HDFS

Tried the steps on Cloudera VM - cloudera-quickstart-vm-4.4.0-1-vmware

From Linux terminal > hbase shell
create 'Employee_Details', {NAME => 'Details'}

Example #1 (Import to Hbase from MSSQL DB Table)

sqoop import --connect "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" -m 1 --hbase-table "Employee_Details" --column-family "Details" --table "dbo.employee" --split-by "id"
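As I understand the Sqoop documentation, each source row becomes one HBase Put: the row key comes from one input column (the --split-by column here), every other column is written as a qualifier under the single --column-family, and values are stored as their string form. The sketch below models only that mapping in plain Java. It is not Sqoop code; the class name, the method names and the column names other than id are assumptions, and skipping NULL columns is my reading of the behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model of the row -> cells mapping (illustration only, not Sqoop code).
class SqoopHBaseMappingSketch {

    // The HBase row key is the string form of the chosen key column.
    static String rowKey(Map<String, Object> sqlRow, String rowKeyColumn) {
        return String.valueOf(sqlRow.get(rowKeyColumn));
    }

    // Every other non-null column becomes "family:column" -> string value.
    static Map<String, String> toCells(Map<String, Object> sqlRow, String rowKeyColumn, String family) {
        Map<String, String> cells = new LinkedHashMap<String, String>();
        for (Map.Entry<String, Object> e : sqlRow.entrySet()) {
            if (e.getKey().equals(rowKeyColumn) || e.getValue() == null) {
                continue; // key column is used as the row key; NULLs are assumed to be skipped
            }
            cells.put(family + ":" + e.getKey(), String.valueOf(e.getValue()));
        }
        return cells;
    }

    public static void main(String[] args) {
        // Hypothetical dbo.employee row; only the "id" column is known from the command above.
        Map<String, Object> row = new LinkedHashMap<String, Object>();
        row.put("id", 7);
        row.put("name", "Raka");
        row.put("location", "Chennai");
        System.out.println(rowKey(row, "id") + " -> " + toCells(row, "id", "Details"));
    }
}
```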

Example #2 (List Tables in MSSQL DB)

sqoop list-tables --connect "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver"

Example #3 (Import to HDFS from MSSQL DB Table)

sqoop import --connect "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" --table "dbo.employee" --split-by "id"

hadoop fs -ls (List files in hadoop file system)

Example #4 (Import into Hive Table from MSSQL DB Table)

sqoop import --hive-import --create-hive-table --hive-table Employee_Hive --connect "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" -m 1 --table "dbo.employee"

Example #5 - Custom Java Code

All the required JARs need to be added to the classpath to compile the project. This was one of the challenges in getting this code working

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseTest {

    private static final String TABLE = "Employee_Details";

    public static void main(String[] args) throws IOException {
        HBaseTest ht = new HBaseTest();
        ht.dropTable();
        ht.createTable();
        ht.updateRecords();
    }

    // Connection settings for the local HBase instance on the VM
    private Configuration config() {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "localhost");
        config.set("hbase.zookeeper.property.clientPort", "2181");
        return config;
    }

    // Drop the table if it exists (a table must be disabled before it can be deleted)
    public void dropTable() throws IOException {
        HBaseAdmin admin = new HBaseAdmin(config());
        try {
            if (admin.tableExists(TABLE)) {
                admin.disableTable(TABLE);
                admin.deleteTable(TABLE);
            }
        } finally {
            admin.close();
        }
    }

    // Create the table with two column families: Id and Details
    public void createTable() throws IOException {
        HTableDescriptor ht = new HTableDescriptor(TABLE);
        ht.addFamily(new HColumnDescriptor("Id"));
        ht.addFamily(new HColumnDescriptor("Details"));
        HBaseAdmin admin = new HBaseAdmin(config());
        try {
            admin.createTable(ht);
        } finally {
            admin.close();
        }
    }

    // Write one row with cells in both column families
    public void updateRecords() throws IOException {
        HTable table = new HTable(config(), TABLE);
        try {
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("Details"), Bytes.toBytes("Name"), Bytes.toBytes("Raka"));
            put.add(Bytes.toBytes("Details"), Bytes.toBytes("Location"), Bytes.toBytes("Chennai"));
            put.add(Bytes.toBytes("Id"), Bytes.toBytes("Eid"), Bytes.toBytes("1")); // employee id
            table.put(put);
        } finally {
            table.close();
        }
    }
}

This post was a useful exercise in trying out the Sqoop examples

Happy Learning!!!

August 09, 2014

HBase Primer - Querying for Seeks / Scans - Part I

This post is an HBase primer on seeks / scans. I recommend spending time on HBase schema design first to get a basic understanding of the HBase table structure. The motivation for this post is a slide from a Cloudera session.

I wanted to try out the suggested pattern of tables. The commands below can be compared to the T-SQL equivalent of DDL and DML scripts. The key learning from this exercise is querying: select with filters. HBase itself offers no SQL-style aggregate functions; Hive and Phoenix, which sit on top of HBase, serve that purpose.

HBASE (One Liner)- Low Latency, Consistent, best suited for random read/write big data access

A few bookmarks that provide a great short intro - link1

Hbase Queries (Tried it on cloudera-quickstart-vm-4.4.0-1-vmware)

hbase shell

Disable and Drop Table

disable 'Employee'

drop 'Employee'

Create Table

Details is a column family. Inside this column family there are members Name, Location, DOB and Salary

create 'Employee', {NAME => 'Details'}

Insert Records

put 'Employee', 'row1', 'Details:Name', 'Raj'

put 'Employee', 'row1', 'Details:Location', 'Chennai'

put 'Employee', 'row1', 'Details:DOB', '01011990'

put 'Employee', 'row1', 'Details:Salary', '1990'

put 'Employee', 'row2', 'Details:Name', 'Raja'

put 'Employee', 'row2', 'Details:Location', 'Delhi'

put 'Employee', 'row2', 'Details:DOB', '01011991'

put 'Employee', 'row2', 'Details:Salary', '5000'

put 'Employee', 'row3', 'Details:Name', 'Kumar'

put 'Employee', 'row3', 'Details:Location', 'Mumbai'

put 'Employee', 'row3', 'Details:DOB', '01011992'

put 'Employee', 'row3', 'Details:Salary', '50000'

Select based on Column Qualifiers

scan 'Employee', {COLUMNS => ['Details:Name', 'Details:Location']}

scan 'Employee', {COLUMNS => ['Details:Location']}

Single Filter - Search by Location Column Filter

scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai')" }

ValueFilter - Search by Location Value

scan 'Employee' , { COLUMNS => 'Details:Location', FILTER => "ValueFilter (=, 'binary:Chennai')"}
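The two scans above look similar but return different things. As far as I understand it, SingleColumnValueFilter decides row by row and returns the whole row (and by default it also lets through rows that do not have the column at all), while ValueFilter decides cell by cell and returns only the matching cells. A minimal sketch of that difference in plain Java follows. It is a toy model, not the HBase filter API, and the class and method names are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of row-level vs cell-level filtering (illustration only, not the HBase API).
class FilterSemanticsSketch {

    // Row-level: keep the whole row if the column equals the value.
    // Rows that lack the column are kept too, mirroring the default behaviour.
    static Map<String, Map<String, String>> singleColumnValueFilter(
            Map<String, Map<String, String>> table, String column, String value) {
        Map<String, Map<String, String>> out = new LinkedHashMap<String, Map<String, String>>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            String cell = row.getValue().get(column);
            if (cell == null || cell.equals(value)) {
                out.put(row.getKey(), row.getValue());
            }
        }
        return out;
    }

    // Cell-level: keep only the cells whose value matches; drop rows left with no cells.
    static Map<String, Map<String, String>> valueFilter(
            Map<String, Map<String, String>> table, String value) {
        Map<String, Map<String, String>> out = new LinkedHashMap<String, Map<String, String>>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            Map<String, String> matching = new LinkedHashMap<String, String>();
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                if (cell.getValue().equals(value)) {
                    matching.put(cell.getKey(), cell.getValue());
                }
            }
            if (!matching.isEmpty()) {
                out.put(row.getKey(), matching);
            }
        }
        return out;
    }

    static Map<String, String> row(String name, String location) {
        Map<String, String> cells = new LinkedHashMap<String, String>();
        cells.put("Details:Name", name);
        cells.put("Details:Location", location);
        return cells;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> employee = new LinkedHashMap<String, Map<String, String>>();
        employee.put("row1", row("Raj", "Chennai"));
        employee.put("row2", row("Raja", "Delhi"));
        System.out.println(singleColumnValueFilter(employee, "Details:Location", "Chennai"));
        System.out.println(valueFilter(employee, "Chennai"));
    }
}
```

With the sample rows, the first call keeps row1 with both its Name and Location cells, while the second keeps row1 with only the Location cell.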

Multiple Filter - Search by Name and Location

scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai') AND SingleColumnValueFilter('Details','Name',=, 'binary:Raj')"}

Timestamp and Filter - Search with Timestamp and Filter

scan 'Employee' ,{TIMERANGE=>[1407324224184,1407324224391], COLUMNS => 'Details:Location', FILTER => "ValueFilter (=, 'binary:Chennai')"}

Multiple Filters on the Same Column (OR) - Search by Location matching any of several values

scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai') OR SingleColumnValueFilter('Details','Location',=, 'binary:Delhi')"}

Filter Using a Regular Expression Match

scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'regexstring:^Che.*',true,true)" }

Search using Prefix

scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai') AND PrefixFilter ('row1')"}

Return only one record (STARTROW is a row key, not a column value)

scan 'Employee', {COLUMNS => ['Details:Location'], LIMIT => 1, STARTROW => 'row1'}

Return two records

scan 'Employee', {COLUMNS => ['Details:Location'], LIMIT => 2, STARTROW => 'row1'}

Return records in a range of row keys (STOPROW is exclusive)

scan 'Employee', {COLUMNS => ['Details:Location'], STARTROW => 'row1',STOPROW=>'row2'}
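Two details of range scans are easy to miss: the start row is inclusive while the stop row is exclusive, and row keys are compared as bytes, so 'row10' sorts before 'row2'. Below is a minimal sketch of both using a sorted map in plain Java. It is not the HBase API; the class and method names are made up, and a row10 key is added only to show the ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a row-key range scan (illustration only, not the HBase API).
class RowKeyRangeSketch {

    // Returns the row keys in [startRow, stopRow): start inclusive, stop exclusive.
    // For plain ASCII keys, Java's String ordering matches byte-wise ordering.
    static List<String> scan(NavigableMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<String>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<String, String>();
        table.put("row1", "Chennai");
        table.put("row2", "Delhi");
        table.put("row3", "Mumbai");
        table.put("row10", "Pune"); // hypothetical extra row to show the sort order
        return table;
    }

    public static void main(String[] args) {
        // row2 is excluded because the stop row is exclusive; row10 sorts between row1 and row2
        System.out.println(scan(sampleTable(), "row1", "row2"));
    }
}
```

This is one reason fixed-width or zero-padded row keys (row01, row02, ... row10) are often preferred when range scans matter.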

Get Specific Row with Filter

get 'Employee', 'row1', {FILTER => "ValueFilter (=, 'binary:Chennai')"}

Count records

count 'Employee'

Timestamp Range - Return within the timestamp range

scan 'Employee' ,{TIMERANGE=>[1407324224391,1407324234707]}

HBase Table with 3 Column Families

create 'Employee1', {NAME => 'Name'}, {NAME => 'Location'}, {NAME => 'DOB'}

Query with column family

scan 'Employee1', {COLUMNS => ['Name']}

Delete a Column Family

alter 'Employee1', 'delete' => 'Location'

More Reads

Best Practices for Managing HBase in a High Write Environment

Hbase Shell Commands

Hbase basics

Hbase-Hive

Quick Start - Standalone HBase
Hbase with Apache Phoenix
External Hive Table on HBase
HBase Aggregation Example
Secondary Indexes in HBase

Happy Learning!!!

July 17, 2014

Big data Testing Tools - Functional and Performance

This post is based on my learning notes on functional test tools for the Big Data ecosystem. Earlier posts covered the basics of the big data ecosystem components. This is the first version of my test tools analysis for functional testing

Product / Area	Testing Tools	Test Approach	Programming Language	Reference
HiveMQ	MQTT Testing Utility, Tsung			link, link1
Storm	storm test		Clojure	link
Hive, Pig	Beetest, Pigmix, Apache DataFu	Query HIVE (Similar to TSQL)		link
Map Reduce Jobs	MRUnit, MRBench			link
Analytics		Lift charts, Target shuffling, Bootstrap sampling to test the consistency of the model		link
HBASE	Junit, Mockito, Apache MRUnit			link
	Jmeter Plugins for Hadoop, HBASE, Cassandra			link, link1

More Tools

Performance Testing Tools Analysis

Area	Tool	Comments
HBASE	Inbuilt tool usage: $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1 ; $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10240 randomWrite 1. More reads - link; automation scripts for comparing different HBase BlockCache implementations - link; HBase write throughput - link	PerformanceEvaluation (inbuilt HBase tool) - validates read / write / scan performance in the environment
YCSB	Performance testing HBase using YCSB, link1, link2
All Big Data Areas (HBASE, Hadoop, MapReduce)	Sandstorm - commercial cloud / on-premise tool for Big Data QA
Kafka	Benchmarking Apache Kafka with SimpleConsumer in Java; Kafka performance testing; Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines)
Spark	Testing Spark Best Practices

More Reads - Impetus Perf Engineering Blog
Rest-assured - Java DSL for easy testing of REST services
Retrofit - A type-safe REST client for Android and Java
G7 - Tools for Big Data

Cloud Testing Tools

Test Environment Setup using Cloud Infrastructure

Load generation in cloud for on premises application
Load generator on premises, application on cloud
Both load generator and application on cloud

Amazon Cloud Pieces

EC2 - Elastic Compute Cloud -> CPUs
EBS - Elastic Block Store -> Database
S3 - Simple Storage Service -> Storage
Ec2 Dream Tool for connecting to multiple cloud providers - link

Blazemeter

Distributed geographical performance test tool
For First level of testing only
Upload and run your custom Jmeter Scripts through blazemeter

Load Test Tools

Flood.io
loadfocus

Security Testing Tools

NTOSpider
Burp Proxy

Blazemeter walkthrough Example

Step 1 - Create Load Test

Step 2 - Configure URL

Step 3 - Start Run

Step 4 - Reports

You can also upload JMeter scripts and execute them through BlazeMeter

Happy Learning!!!

July 06, 2014

Weekend Reading - Webinar - Performance Testing Approach for Big Data Applications.

Very good session: Webinar - Performance Testing Approach for Big Data Applications. A few interesting notes / slides from the session

Rate of Data Ingestion - How fast does the system consume data?
Data Processing - The speed at which data is processed. Test data processing in isolation with the data sets already populated. Run specific perf tests (MR Jobs, Pig, Hive Scripts)
Data Persistence – I/O bound process. (Data Writes / Updates on DB, Garbage Collection, Monitoring Metrics)
Complete end to end time for processing (Network Connectivity, Processing, Results)
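The first item, rate of data ingestion, boils down to records per second. Here is a minimal sketch of how that could be measured with a custom test client in plain Java. The sink is a stand-in for whatever actually writes to the system under test; the class and method names are made up, and a real test would also need warm-up runs and realistic record sizes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Minimal ingestion-rate harness (a sketch, not a full benchmark).
class IngestionRateSketch {

    // Throughput in records per second for a given record count and elapsed time.
    static double recordsPerSecond(long records, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        return records * (double) TimeUnit.SECONDS.toNanos(1) / elapsedNanos;
    }

    // Pushes the given number of synthetic records into the sink and returns the elapsed nanoseconds.
    static long timeIngest(int records, Consumer<String> sink) {
        long start = System.nanoTime();
        for (int i = 0; i < records; i++) {
            sink.accept("record-" + i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final List<String> store = new ArrayList<String>(); // stand-in for the system under test
        long elapsed = Math.max(1, timeIngest(100000, store::add));
        System.out.println("records/sec: " + recordsPerSecond(store.size(), elapsed));
    }
}
```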

Big Data Test Challenges

Diverse Technologies
Unavailability of Test Tools for Big Data Technologies / Scenarios
Limited Monitoring / Diagnostic Solutions
Test Scripting / Environment

Perf Test Tools

Use cloud to simulate large infrastructure
Cloud orchestration scripts Puppet, Chef

Approach

Depending on usage in production identify patterns for production workload
Fault Tolerance Scenarios
Hadoop monitoring tools to check Map reduce jobs
Selecting Test Clients - Custom code
Performance / Failover tests to ensure scalability (Node failures during processing)

Test Parameters and Summary

A very nice webinar. There are a lot of posts and webinars on this topic; this one stands out as useful and practical.

More Reads
Evaluating SolrMeter for Performance Testing
Benchmarking with HTTPerf.js and NodeUnit

Happy Learning!!!
