"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;
Showing posts with label Big Data Testing. Show all posts

October 10, 2014

HBase Overview Notes

Limitations of Hadoop 1.0
  • No random access --> Hadoop is geared toward batch access (OLAP)
  • Not suitable for real-time access
  • No updates - the access pattern is WORM (Write Once, Read Many), which is what Hadoop is best suited for
Why HBase
  • Flexible schema design --> new columns can be added whenever a row is written
  • Multiple versions of a single cell (data)
  • Column-family-oriented storage
  • Cache columns at client side
  • Compression of columns
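
The first two points (columns added on the fly, several versions per cell) can be pictured as a sorted map of maps. The following is a minimal sketch in plain Java, not the HBase client API; the class and method names are hypothetical and exist only for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one table: rowKey -> "family:qualifier" -> timestamp -> value.
// Illustration only; this is not how the HBase client is used.
class VersionedCellSketch {
    // Rows are kept sorted by row key; versions of a cell are sorted newest first.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // No schema change is needed to write a column that no other row has.
    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(column, k -> new TreeMap<>(Collections.reverseOrder()))
            .put(timestamp, value);
    }

    // All stored versions of a cell, newest first (empty if the cell does not exist).
    NavigableMap<Long, String> versions(String rowKey, String column) {
        Map<String, NavigableMap<Long, String>> row = rows.get(rowKey);
        if (row == null || !row.containsKey(column)) {
            return new TreeMap<>();
        }
        return row.get(column);
    }

    // Latest version of a cell, or null if the cell does not exist.
    String get(String rowKey, String column) {
        NavigableMap<Long, String> versions = versions(rowKey, column);
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        VersionedCellSketch table = new VersionedCellSketch();
        table.put("row1", "Details:Location", 1L, "Chennai");
        table.put("row1", "Details:Location", 2L, "Delhi"); // newer version of the same cell
        table.put("row1", "Details:Name", 1L, "Raj");       // another column, added on the fly
        System.out.println(table.get("row1", "Details:Location"));      // Delhi
        System.out.println(table.versions("row1", "Details:Location")); // {2=Delhi, 1=Chennai}
    }
}
```

A read returns the newest version by default, while older versions stay available until they are cleaned up.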
Read vs. Write

  • Availability (compromise on writes) vs. consistency (compromise on reads)
HBase
  • Belongs to the NoSQL class of non-relational storage systems
  • An RDBMS stores data row by row; HBase groups data by column family
  • HBase relies on HDFS for replication
  • ZooKeeper - The client's entry point. The client asks ZooKeeper where the catalog table lives, then reads and writes through the Region Servers directly; the HMaster handles administration (table creation, region assignment) and is not on the read/write path
  • Region Server - Serves regions. The Region Server process runs on the slave nodes (Data Nodes)
Happy Learning!!!

August 16, 2014

HBase Primer - Loading Data into HBase Using Sqoop / Java Code

This post has examples of using Sqoop / custom Java code to import data into HBase, Hive, and HDFS from an MSSQL DB.

I tried the steps on the Cloudera VM - cloudera-quickstart-vm-4.4.0-1-vmware

From Linux terminal > hbase shell
create 'Employee_Details', {NAME => 'Details'}

Example #1 (Import to Hbase from MSSQL DB Table)
sqoop import --connect  "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" -m 1 --hbase-table "Employee_Details"  --column-family "Details" --table "dbo.employee"  --split-by "id"

Example #2 (List Tables in MSSQL DB)
sqoop list-tables --connect  "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver"

Example #3 (Import to HDFS from MSSQL DB Table)
sqoop import --connect "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" --table "dbo.employee"  --split-by "id"

hadoop fs -ls (List files in hadoop file system)

Example #4 (Import into Hive Table from MSSQL DB Table)
sqoop import --hive-import --create-hive-table --hive-table Employee_Hive --connect  "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" -m 1 --table "dbo.employee" 

Example #5 - Custom Java Code
  • All required JARs need to be added to the build path to compile the project. This was one of the challenges in getting this code working
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;

public class HBaseTest
{
   private static final String TABLE_NAME = "Employee_Details";

   public static void main(String[] args) throws IOException
   {
          HBaseTest ht = new HBaseTest();
          ht.dropTable();
          ht.createTable();
          ht.updateRecords();
   }

   // Client configuration pointing at the ZooKeeper instance on the local VM
   private Configuration createConfig()
   {
          Configuration config = HBaseConfiguration.create();
          config.set("hbase.zookeeper.quorum", "localhost");
          config.set("hbase.zookeeper.property.clientPort", "2181");
          return config;
   }

   // Disable and delete the table if it already exists
   public void dropTable() throws IOException
   {
          HBaseAdmin admin = new HBaseAdmin(createConfig());
          try
          {
                if (admin.tableExists(TABLE_NAME))
                {
                      admin.disableTable(TABLE_NAME);
                      admin.deleteTable(TABLE_NAME);
                }
          }
          finally
          {
                admin.close();
          }
   }

   // Create the table with two column families: Id and Details
   public void createTable() throws IOException
   {
          HTableDescriptor descriptor = new HTableDescriptor(TABLE_NAME);
          descriptor.addFamily(new HColumnDescriptor("Id"));
          descriptor.addFamily(new HColumnDescriptor("Details"));
          HBaseAdmin admin = new HBaseAdmin(createConfig());
          try
          {
                admin.createTable(descriptor);
          }
          finally
          {
                admin.close();
          }
   }

   // Write one row with cells in both column families
   public void updateRecords() throws IOException
   {
          HTable table = new HTable(createConfig(), TABLE_NAME);
          try
          {
                Put put = new Put(Bytes.toBytes("row1"));
                put.add(Bytes.toBytes("Details"),Bytes.toBytes("Name"),Bytes.toBytes("Raka"));
                put.add(Bytes.toBytes("Details"),Bytes.toBytes("Location"),Bytes.toBytes("Chennai"));
                // Id:Eid carries the placeholder value "Chennai" here; substitute a real employee id
                put.add(Bytes.toBytes("Id"),Bytes.toBytes("Eid"),Bytes.toBytes("Chennai"));
                table.put(put);
          }
          finally
          {
                table.close();
          }
   }
}

This post was useful for trying out the Sqoop examples

Happy Learning!!!

August 09, 2014

HBase Primer - Querying for Seeks / Scans - Part I

This post is an HBase primer on seeks / scans. I recommend spending time on HBase schema design first, to get a basic understanding of the HBase table structure. The motivating slide for this post comes from a Cloudera session slides post.

I wanted to try out the suggested table pattern. The commands below are comparable to DDL / DML scripts in TSQL. The key learning from this exercise is querying: select with filters. HBase has no built-in aggregate functions; Hive and Phoenix, which sit on top of HBase, serve that purpose.

HBASE (One Liner)- Low Latency, Consistent, best suited for random read/write big data access

A few bookmarks provide a great short intro - link1

HBase Queries (tried on cloudera-quickstart-vm-4.4.0-1-vmware)
hbase shell

Disable and Drop Table
disable 'Employee'
drop 'Employee'

Create Table
Details is a column family. Inside this column family there are members Name, Location, DOB and Salary

create 'Employee', {NAME => 'Details'}

Insert Records
put 'Employee', 'row1', 'Details:Name', 'Raj'
put 'Employee', 'row1', 'Details:Location', 'Chennai'
put 'Employee', 'row1', 'Details:DOB', '01011990'
put 'Employee', 'row1', 'Details:Salary', '1990'

put 'Employee', 'row2', 'Details:Name', 'Raja'
put 'Employee', 'row2', 'Details:Location', 'Delhi'
put 'Employee', 'row2', 'Details:DOB', '01011991'
put 'Employee', 'row2', 'Details:Salary', '5000'

put 'Employee', 'row3', 'Details:Name', 'Kumar'
put 'Employee', 'row3', 'Details:Location', 'Mumbai'
put 'Employee', 'row3', 'Details:DOB', '01011992'
put 'Employee', 'row3', 'Details:Salary', '50000'

Select based on Column Qualifiers
scan 'Employee', {COLUMNS => ['Details:Name', 'Details:Location']}
scan 'Employee', {COLUMNS => ['Details:Location']}

Single Filter - Search by Location Column Filter
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai')" }

ValueFilter - Search by Location Value
scan 'Employee' , { COLUMNS => 'Details:Location', FILTER => "ValueFilter (=, 'binary:Chennai')"}

Multiple Filter - Search by Name and Location
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai') AND SingleColumnValueFilter('Details','Name',=, 'binary:Raj')"}

Timestamp and Filter - Search with Timestamp and Filter
scan 'Employee' ,{TIMERANGE=>[1407324224184,1407324224391], COLUMNS => 'Details:Location', FILTER => "ValueFilter (=, 'binary:Chennai')"}

Multiple Filter (OR) - Search with multiple filters on the same column (matches any of the values)
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai') OR SingleColumnValueFilter('Details','Location',=, 'binary:Delhi')"}

Filter using regExMatch
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'regexstring:Che*',true,true)" }
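
A note on the pattern itself: in regular-expression syntax 'Che*' means "Ch" followed by zero or more "e", not "anything starting with Che". The sketch below uses plain java.util.regex to show the difference. It assumes the regexstring comparator matches unanchored, anywhere in the value (my understanding, worth verifying on your HBase version); the class name and the value "Chandigarh" are hypothetical and not part of the sample table.

```java
import java.util.regex.Pattern;

// Plain java.util.regex demo; it does not call HBase.
class RegexFilterSketch {
    // Unanchored match: true if the pattern occurs anywhere in the value.
    static boolean matches(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("Che*", "Chennai"));    // true
        System.out.println(matches("Che*", "Chandigarh")); // true: "e*" also allows zero 'e'
        System.out.println(matches("^Che", "Chennai"));    // true
        System.out.println(matches("^Che", "Chandigarh")); // false
        System.out.println(matches("^Che", "Delhi"));      // false
    }
}
```

Under that assumption, a pattern such as '^Che' would be the stricter way to ask for values that start with "Che".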

Search using Prefix
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai') AND PrefixFilter ('row1')"}

Return only one record (STARTROW is a row key, not a cell value)
scan 'Employee', {COLUMNS => ['Details:Location'], LIMIT => 1, STARTROW => 'row1'}

Return two records
scan 'Employee', {COLUMNS => ['Details:Location'], LIMIT => 2, STARTROW => 'row1'}

Return records in a range of row keys (STARTROW is inclusive, STOPROW is exclusive, so this returns only row1)
scan 'Employee', {COLUMNS => ['Details:Location'], STARTROW => 'row1',STOPROW=>'row2'}
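
The range semantics can be mimicked with a sorted map. This is a rough sketch in plain Java rather than the HBase API: the class name is hypothetical, and Java string ordering stands in for HBase's byte-wise row key ordering (the two agree for the ASCII keys used here).

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Sorted-map stand-in for a table keyed by row key; illustration only.
class RowRangeSketch {
    // Rows from startRow (inclusive) up to stopRow (exclusive), in row key order.
    static NavigableMap<String, String> scan(NavigableMap<String, String> table,
                                             String startRow, String stopRow) {
        return table.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> locations = new TreeMap<>();
        locations.put("row1", "Chennai");
        locations.put("row2", "Delhi");
        locations.put("row3", "Mumbai");

        System.out.println(scan(locations, "row1", "row2")); // {row1=Chennai}
        System.out.println(scan(locations, "row1", "row3")); // {row1=Chennai, row2=Delhi}

        // A start key that is not an existing row key simply positions the scan
        // at the first row key that sorts at or after it.
        System.out.println(locations.tailMap("Chennai", true).firstKey()); // row1
    }
}
```

The last line also shows why passing a cell value such as 'Chennai' as the start row still returns rows: it sorts before every "row..." key, so the scan begins at the first row.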

Get Specific Row with Filter
get 'Employee', 'row1', {FILTER => "ValueFilter (=, 'binary:Chennai')"}

Count records
count 'Employee'

Timestamp Range - Return within the timestamp range
scan 'Employee' ,{TIMERANGE=>[1407324224391,1407324234707]}

HBase Table with 3 Column Families
create 'Employee1', {NAME => 'Name'}, {NAME => 'Location'}, {NAME => 'DOB'}

Query with column family
scan 'Employee1', {COLUMNS => ['Name']}

Delete a Column Family
alter 'Employee1', 'delete' => 'Location'

July 17, 2014

Big data Testing Tools - Functional and Performance

This post is based on my learning notes on functional test tools for the Big Data ecosystem. Earlier posts covered the basics of the big data ecosystem components. This is the first version of my test tools analysis for functional testing.

Product / Area | Testing Tools | Test Approach | Programming Language | Reference
HiveMQ | MQTT Testing Utility, Tsung | - | - | -
Storm | storm test | - | Clojure | -
Hive, Pig | Beetest, Pigmix, Apache DataFu | Query HIVE (Similar to TSQL) | - | -
Map Reduce Jobs | MRUnit, MRBench | - | - | -
Analytics | - | Lift charts, Target shuffling, Bootstrap sampling to test the consistency of the model | - | -
HBASE | Junit, Mockito, Apache MRUnit | - | - | -
- | Jmeter Plugins for Hadoop, HBASE, Cassandra | - | - | -

More Tools
Performance Testing Tools Analysis

Area
Tool
Comments
HBASE
Inbuilt tool
usage –
$ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
$ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10240 randomWrite 1
More Reads - link

Automation scripts for comparing different HBase BlockCache implementations - link

Hbase write throughput link

PerformanceEvaluation (Inbuilt HBASE tool) - (Validate Read / Writes / Scans performance in the environment etc..)
YCSB
Performance testing HBase using YCSB , link1, link2

All Big Data Areas (HBASE, Hadoop, MapReduce)
Sandstorm - commercial cloud / on-premises tool for Big Data QA

Kafka




Spark



More Reads
Impetus Perf Engineering Blog
Rest-assured - Java DSL for easy testing of REST services
Retrofit - A type-safe REST client for Android and Java
G7 - Tools for Big Data

Cloud Testing Tools

Test Environment Setup using Cloud Infrastructure
  • Load generation in cloud for on premises application
  • Load generator on premises, application on cloud
  • Both load generator and application on cloud
Amazon Cloud Pieces
  • EC2 - Elastic Compute Cloud -> CPUs
  • EBS - Elastic Block Store -> block storage volumes for EC2 instances (e.g. for database files)
  • S3 - Simple Storage Service -> object storage
  • Ec2 Dream Tool for connecting to multiple cloud providers - link
Blazemeter
  • Distributed geographical performance test tool
  • For First level of testing only
  • Upload and run your custom Jmeter Scripts through blazemeter
Load Test Tools
  • Flood.io
  • loadfocus
Security Testing Tools
  • NTOSpider
  • Burp Proxy
Blazemeter walkthrough Example

Step 1 - Create Load Test


Step 2 - Configure URL


Step 3 - Start Run



Step 4 - Reports



You can also upload JMeter scripts and execute them through Blazemeter

Happy Learning!!!

July 06, 2014

Weekend Reading - Webinar - Performance Testing Approach for Big Data Applications.

A very good session: the webinar "Performance Testing Approach for Big Data Applications". A few interesting notes / slides from the session:


  • Rate of Data Ingestion - How fast does the system consume data?
  • Data Processing - How fast is the data processed? Test data processing in isolation with pre-populated data sets, running specific perf tests (MR jobs, Pig, Hive scripts)
  • Data Persistence - An I/O-bound process (data writes / updates on the DB, garbage collection, monitoring metrics)
  • Complete end-to-end processing time (network connectivity, processing, results)
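
As a rough illustration of the first metric, ingestion rate is just records divided by elapsed time. The sketch below is not from the webinar; the class, the method names, and the StringBuilder "sink" standing in for the system under test are all hypothetical.

```java
import java.util.function.IntConsumer;

// Minimal timing harness for an ingest step; illustration only.
class IngestionRateSketch {
    // Records per second for `records` records processed in `elapsedNanos` nanoseconds.
    static double recordsPerSecond(long records, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        return records * 1_000_000_000.0 / elapsedNanos;
    }

    // Calls the ingest step once per record and returns the measured rate.
    static double measure(int records, IntConsumer ingest) {
        long start = System.nanoTime();
        for (int i = 0; i < records; i++) {
            ingest.accept(i);
        }
        long elapsed = Math.max(1, System.nanoTime() - start); // guard against a zero reading
        return recordsPerSecond(records, elapsed);
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder(); // stand-in for the system under test
        double rate = measure(100_000, i -> sink.append(i));
        System.out.printf("ingested %d chars at %.0f records/sec%n", sink.length(), rate);
    }
}
```

In a real test the lambda would wrap the actual write call, and the run would be long enough, and repeated often enough, to smooth out warm-up effects.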

Big Data Test Challenges
  • Diverse Technologies
  • Unavailability of Test Tools for Big Data Technologies / Scenarios
  • Limited Monitoring / Diagnostic Solutions
  • Test Scripting / Environment
Perf Test Tools
  • Use cloud to simulate large infrastructure
  • Cloud orchestration scripts (Puppet, Chef)

Approach
  • Depending on usage in production identify patterns for production workload
  • Fault Tolerance Scenarios
  • Hadoop monitoring tools to check Map reduce jobs
  • Selecting Test Clients - Custom code
  • Performance / Failover tests to ensure scalability (Node failures during processing)
Test Parameters and Summary



A very nice, practical, and useful webinar. Among the many posts and webinars on this topic, this one stands out.

More Reads
Evaluating SolrMeter for Performance Testing
Benchmarking with HTTPerf.js and NodeUnit

Happy Learning!!!