"No one is harder on a talented person than the person themselves" - Linda Wilkinson

December 09, 2014

RDBMS Vs CEP

Came across this slide in a presentation (link). The slide on RDBMS vs CEP processing was very clear in terms of the attributes compared and the representation of facts. Many thanks to the author.



Happy Reading!!!

October 20, 2014

Open Source Test Tools Vs Commercial Test Tools


I have never had a taste of working with record-playback tools. I have mostly developed custom tools and scripts for QA and deployment tasks. I work across different streams - database DEV, QA, tools development, performance, and Big Data too. Working on different areas provides a different perspective than doing repetitive things. My perspective of QA has evolved around reusable scripts for data generation and scripts that simplify or eliminate repetitive tasks during deployment, configuration, testing, and validation.

I have worked with Selenium, Coded UI, and custom-developed automation test frameworks. I have observed fresh engineering efforts on the automation test framework in every company I have worked for. Either the code base becomes too big to manage and modify, or newly hired folks move towards developing from scratch rather than maintenance. The ROI calculation is questionable. When DEV quality is poor, every small bug in QA might show up as hundreds of bugs. How many of these bugs could have been caught by a basic QA check done by DEV is another point to consider when measuring QA bugs.

QA efforts are often viewed as commodity efforts, where the focus is mainly on delivery and repetitive cycles of testing are acceptable. Instead of such a model, a joint DEV-QA effort would always help identify most bugs before a release reaches QA.

Both open source and commercial tools help address test automation challenges. One instance: working with WinCE apps, it is very difficult to automate hardware-software integration workflows. The TestComplete tool eliminated most of this effort by emulating actions on MyMobiler, which in turn mimics a real user on WinCE-installed hardware.

The effort involved in automating a WinCE app deployed on a device, versus using a tool like TestComplete that completely eliminates the pain of writing Win32 calls (SendMessage, SendKeys), is worth evaluating before worrying about license cost.

Overall, only a mix of open source tools, commercial tools, in-house scripts, quality coding practices, and unit testing can ensure a quality product. Responsibility does not rest with one function alone; every function needs to be accountable for delivering a quality product.

Automation tools are primarily viewed as record/playback or automation framework implementations. Beyond that, they can also be leveraged for:
  • Throwaway scripts that help functional testers eliminate repetitive tasks
  • Supporting system activities during functional testing - monitoring, screen capture, simulating user events during tests, supporting long-running tests
  • Deployment / installation in support and UAT environments where installation involves several client, server, and web components
  • Aiding automated deployments and uninstallations
Happy Learning!!!

Testrail - TCM Tool

This post is an analysis of TestRail and migrating existing test cases to TestRail.

TestRail has a great web interface to organize and create test cases. The factors that make TestRail a competitive candidate are:
  • Ease of creating / managing test cases
  • Migration Support for existing test cases
  • API support for automated migration / test case creation / execution / update results
  • Test Case execution out of box reports
  • Integration with bug tracking tools
  • Hosted / In-Premise Model
  • Existing Github projects for .NET / Java / Other languages Automation / Migration Support
  • Great Tech Support
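The API-support bullet can be sketched in a few lines of Python. This is a minimal sketch assuming TestRail's REST v2 `add_case/{section_id}` endpoint; the server URL, credentials, and the `custom_steps` / `custom_expected` field names are illustrative (custom fields depend on your TestRail configuration), not production code:

```python
import base64

# Hypothetical TestRail server and credentials - replace with real values.
TESTRAIL_URL = "https://example.testrail.io/index.php?/api/v2"
USER, API_KEY = "qa@example.com", "secret-key"

def build_add_case_payload(title, steps, expected):
    """Build the JSON body for TestRail's add_case/{section_id} endpoint."""
    return {
        "title": title,
        "custom_steps": steps,      # assumed custom field names
        "custom_expected": expected,
    }

def auth_header(user, key):
    """TestRail's API uses HTTP basic auth with user:api_key."""
    token = base64.b64encode(f"{user}:{key}".encode()).decode()
    return {"Authorization": "Basic " + token,
            "Content-Type": "application/json"}

payload = build_add_case_payload(
    "Login with valid credentials",
    "1. Open login page\n2. Enter valid user / password\n3. Submit",
    "User lands on the dashboard")

# Actual upload needs a live TestRail instance, e.g.:
# POST {TESTRAIL_URL}/add_case/1 with json.dumps(payload) and auth_header(...)
print(payload["title"])
```

The same payload-building approach extends to `add_result_for_case` for pushing execution results, which is what makes Excel-to-TestRail migration tooling straightforward.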
Test Case Migration Efforts

Different aspects involved in the test case migration effort for any TCM tool:
  1. Test case template - identify the required inbuilt fields and custom fields for the test case template. Develop, modify, and evaluate drafts to arrive at the final template
  2. Organizing test cases (test suites) - feature-wise and release-related test cases. Analyse and identify a structure (functional, regression, feature areas, plus release-specific cases) and evaluate it
  3. Migration efforts - based on the template and test case structure, prepare custom XML for all test cases to be migrated
    1. Validate, and arrive at an approach to validate all migrated cases
    2. For attributes in the template that were not used earlier, decide what values to fill in for existing test cases
  4. Default values for unused fields (drop-down lists / custom values)
  5. Automation integration, defect / bug tracking tool integration
  6. Automation test cases - identify automation test cases, templates, details
  7. QA reports - analyse the reports available in TestRail and any custom report needs. Email-based reporting on metrics, daily test case execution, etc.
  8. Custom tools - write test cases in Excel and upload them directly to TestRail. Such a tool can be used for writing test cases and updating test results directly from Excel
  9. QA process document - develop a process document (guidelines / best practices) on adding test cases; updating functional, regression, and release-related test cases; using TestRail (permissions); and test case reviews in TestRail
  10. Test results archival / maintenance - approach for archiving and maintaining test results, test runs, and test cases
  11. Hosting - pros and cons of local hosting vs cloud hosting
  12. Security / administration / configuring users - admin-related aspects, identifying roles and permissions for users
  13. Identify pilot projects for the TestRail evaluation period after finalizing the above areas - pilots for usability, tracking, and upgrading before complete migration

Happy Learning!!!

October 10, 2014

HBase Overview Notes


Limitations of Hadoop 1.0
  • No random access --> Hadoop is meant for batch access (OLAP)
  • Not suitable for real-time access
  • No updates - the access pattern Hadoop suits best is WORM (Write Once Read Multiple times)
Why HBase
  • Flexible Schema Design --> Add a new column when a row is added
  • Multiple versions of single cell (Data)
  • Columnar storage
  • Cache columns at client side
  • Compression of columns
Read vs Write trade-off: Availability (compromise on writes) vs Consistency (compromise on reads)

Hbase
  • NOSQL class of non-relational storage systems
  • In an RDBMS, allocation is row-key based; HBase uses columnar storage
  • HBase needs HDFS for replication
  • ZooKeeper takes all requests from the client: Client -> ZooKeeper -> HMaster
  • Region Server - serves the regions. The region server process runs on the slaves (data nodes)
Happy Learning!!!

October 09, 2014

Pig Overview Notes

Pig
  • Primarily for semi structured data
  • It is called 'Pig' because it processes all kinds of data
  • Pig is a data flow language, not a procedural language
  • Map Reduce - Java Programmers, Hive - for TSQL folks, Pig (Rapid Prototyping & increased productivity)
  • Pig is on client side, need not be on cluster
  • Execution Sequence - Query Parser -> Semantic Checking -> Logical Optimizer (Variable level) -> Logical to physical translator -> Physical to M/R translator -> MapReduce Launcher
  • Pig concepts - Map - a set of key/value pairs, Tuple - ordered list of data, Bag - unordered collection of tuples
  • Pig - client side access, suits semi-structured data; Hive works only within the cluster
  • Hive - best suited for SQL-style analytics on structured data
  • MR - for audio / video analytics, the Map Reduce approach is the only option

Happy Learning!!!

October 08, 2014

Hive Overview Notes

  • Data Warehousing package built on top of hadoop
  • Managing and querying structured data
  • Apache Derby is the embedded DB used by Hive (default metastore)
  • The metastore_db folder persists the metadata
  • Suitable for WORM - Write Once Read Many Times Access Pattern
  • Core Components are Shell, Metastore, Execution Engine, Compiler (Parse, Plan, Optimize), Driver
  • Tables can be created as Internal Tables, External Table (Pointing to external file)
  • When internal tables are dropped, both schema and data are dropped. For external (referencing) tables only the schema is dropped, not the data. Both internal and external tables reside in HDFS
  • Data files for created tables would be available in location /user/hive/warehouse
  • Bucketing in Hive - hash(column value) % number of buckets decides which bucket a particular row goes into (partitioning, by contrast, splits data into directories by partition column value)
  • Partition table should always be an Internal Hive Table
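The hash-modulo rule above can be illustrated with a short sketch. Plain Python stands in for Hive's internal hashing here, and the employee ids are made up:

```python
NUM_BUCKETS = 4

def bucket_for(value, num_buckets=NUM_BUCKETS):
    """Hive-style bucketing: hash(column value) % number of buckets."""
    return hash(value) % num_buckets

# Distribute some made-up employee ids across the buckets
rows = [101, 102, 103, 104, 105, 106, 107, 108]
buckets = {b: [] for b in range(NUM_BUCKETS)}
for emp_id in rows:
    buckets[bucket_for(emp_id)].append(emp_id)

for b, members in sorted(buckets.items()):
    print(b, members)
```

Because the bucket count is fixed at table creation, the same key always lands in the same bucket, which is what makes bucketed joins and sampling efficient.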
Happy Learning!!!

October 07, 2014

Map Reduce Internals

The client submits the job. The Job Tracker does the splitting and scheduling of the job.

Mapper
  • Mapper runs the business logic (ex- word counting)
  • Mapper (Maps your need from the record)
  • Record reader provides input to mapper in key value format
  • Mapper Side Join (Distributed Caching)
  • Output of mapper (list of keys and values). Output of mapper function stored in Sequence file
  • Framework does splitting based on input format, Default is new line (text format)
  • Every row / Record will go through map function
  • When a record (row) is split across two 64MB blocks, the pieces are merged into a complete record before processing
  • Default block size in Hadoop 2.0 is 128MB
Reducer
  • The reducer polls for mapper output; the job tracker tells it which nodes to poll
  • Default number of reducers is 1. This is configurable
  • Multiple reduce phases within a single job are not possible; multi-level (chained) MR jobs are
  • Reduce Side join (Join @ Reducer Level)
Combiner
  • Combiner - a mini reducer that runs before mapper output is written to disk (e.g., finding a local max from the data)
  • Combiner is used when map job itself can do some preprocessing to minimize reducer workload
Partitioner
  • Hash Partitioner is default partitioner
  • Mapper -> Combiner -> Partitioner -> Reducer (for multi-dimensional output, e.g., 2012 - max sales by product, 2013 - max sales by location)
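The mapper -> combiner -> partitioner -> reducer pipeline above can be simulated in a few lines of plain Python. This is a word-count sketch of the flow, not Hadoop code:

```python
from collections import Counter, defaultdict

NUM_REDUCERS = 2  # Hadoop's default is 1; configurable

def mapper(line):
    # The record reader hands each line to the mapper; emit (word, 1) pairs
    return [(w, 1) for w in line.split()]

def combiner(pairs):
    # Mini reducer: pre-aggregates on the map side to cut shuffle volume
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return list(counts.items())

def partition(key):
    # Hash partitioner (the default): decides which reducer gets a key
    return hash(key) % NUM_REDUCERS

def reducer(pairs):
    # Sums the values for every key this reducer receives
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

lines = ["big data big ideas", "data beats opinion"]
shuffled = defaultdict(list)
for line in lines:
    for key, value in combiner(mapper(line)):
        shuffled[partition(key)].append((key, value))

result = {}
for reducer_id, pairs in shuffled.items():
    result.update(reducer(pairs))
print(result)  # word -> count, e.g. 'big' -> 2
```

Note the hash partitioner guarantees that every occurrence of a word reaches the same reducer, which is why the per-reducer sums can simply be merged at the end.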
Happy Learning!!!

October 06, 2014

Hadoop Ecosystem Internals

Hadoop Internals - this post is a quick summary from a learning session.

Data Copy Basics (Writing data to HDFS)
  • Network proximity during data storage (first 2 IPs closest to the client)
  • Data is stored in 64MB blocks
  • Data replication factor is 3 by default
  • The client gets an error message when the primary node's data write operation fails
  • Blocks are split horizontally across different machines
  • Slaves use SSH to connect to the master (communication between nodes is also over SSH)
  • Client communication is through RPC
  • Writing happens in parallel; replication happens in a pipeline
Analysis / Reads (Reading Data from HDFS)
  • Client -> Master -> Nearest Ips returned for Nodes
  • The master knows the performance utilization of the nodes; it allocates the least-used machine (where a data copy exists) for processing
Concepts
  • Namenode - Metadata
  • DataNode - Actual Data
  • chmod 755 - owner read/write/execute; group and others read and execute
  • Rack - Physical Set of Machines
  • Node - Individual machine
  • Cluster - Set of Racks
Learning Resources
Tools
Happy Learning!!!

October 03, 2014

Hbase Primer Part III


This post is an overview of read / write operations in HBase. The steps were clear from the DB paper (Exploring NOSQL, Hadoop and Hbase by Ricardo Pettine and Karim Wadie); I'm unable to locate the link to download the paper.

I'm reposting a few steps from the paper, which lists the read / write operations on HBase. ZooKeeper is used for coordination in Storm and HBase.

Data Path
Table - HBase table
  Region - regions for the table
    Store - a store per column family in each region
      MemStore - a memstore for each store
        Store File - store files for each store
          Block - blocks within a store file

Write Path
  • Client request is sent to ZooKeeper
  • ZooKeeper finds the metadata and returns it to the client
  • Client scans the region servers to find where the new key should be stored
  • Client sends the request to the region server
  • Region server processes the request; the write follows WAL (Write Ahead Logging), a concept available in other databases too
  • When the MemStore is full, data is flushed to disk

Read Path
  • Client issues a Get command
  • ZooKeeper identifies the metadata and returns it to the client
  • Client scans the region server to locate the data
  • Both the memstore and the store files are scanned

Happy Learning!!!

September 28, 2014

Pycon 2014


Every time I attend a conference I plan to post my notes the same day. The delay in posting is inversely proportional to the probability of it ever getting posted. After missing a few conferences, today this post covers my learnings from Python Conference 2014. There is a lot of motivation and inspiration to deliver and learn after every conference.

Interesting Quotes 

"Functionality is an asset. Code is a liability" by @sanand0
"Premature Optimization is baby evil" by @sanand0
"If it's not tested, it doesn't work. If it's tested, it may work" - @voidspace
'Libraries are good, your brain is better' - @sanand0. 

Short Notes from Sessions

Panel Discussion on Python Frameworks - Django, Flask , Web.Py 
  • The discussion was pretty interesting. For beginners: Web.py, followed by Flask
  • Django, the elephant in the room, scored over the rest on usage, features, and documentation
Interesting Talks and Notes are shared in Auth Evolution & Spark Overview Posts.

Happy Learning!!!

Auth Evolution

This session, Auth as a Service by Kiran, provided a good overview of the evolution of authentication over the past decade.

The complete text of the presentation is available at link. The text is pretty exhaustive; I am only noting key points for my reference:
  • HTTP basic Auth
  • Cookies
  • Cryptography Signed Tokens
  • HTTPS
  • Database backed sessions 
HTTP Basic Auth - the username and password are sent with every HTTP request. To log out you need to send a wrong password; the credentials get preserved and the server rejects requests after that.

Cookies - a regular HTML form with username and password; the session is encoded and put in an HTTP cookie, which is sent with every request.

Cryptographically signed tokens - a random secret key plus the username; the cookie is then checked against the key to verify it is the same user. SSL on top of it made sure most issues were fixed.
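The signed-token idea can be sketched with Python's hmac module. The secret key and username below are illustrative:

```python
import hashlib
import hmac

SECRET_KEY = b"server-side-random-secret"  # illustrative value

def sign(username):
    """Produce the cookie value: username plus an HMAC signature over it."""
    sig = hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()
    return f"{username}|{sig}"

def verify(cookie):
    """Recompute the signature and compare in constant time."""
    username, _, sig = cookie.partition("|")
    expected = hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

cookie = sign("alice")
print(verify(cookie))             # True
print(verify(cookie + "tamper"))  # False - signature no longer matches
```

Only the server knows the key, so the client can read the cookie but cannot forge or alter it without the verification failing.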

Database backed sessions - this is a very nice one. These days I get notifications in Quora / Google: you have these many open sessions / previously logged-in locations. This is all through database backed sessions. It seems to address all the limitations of the previous approaches.

Good Refresher!!!

Happy Learning!!!

Spark Overview


I remember the Spark keyword appearing during Big Data architecture discussions in my team, but I never looked deeper into Spark. The session by Jyotiska NK on Python + Spark: Lightning Fast Cluster Computing was a useful starter on Spark (slides of talk).

Spark 
  • In memory cluster computing framework for large scale data processing
  • Developed in Scala, with Java + Python APIs
  • It is not meant to replace Hadoop; it can sit on top of Hadoop
  • References on Spark Summit for Slides / Videos to learn from past events - link 
Python Offerings
  • PySpark, Data pipeline using spark
  • Spark for real time / batch processing
Spark Vs Map Reduce Differences
This section was the session's highlight. The way data is handled in the Map Reduce execution model versus the Spark approach is key.

Map Reduce approach - load data from disk into RAM; Mapper, Shuffler, Reducer are the different phases. Processing is distributed. Fault tolerance is achieved by replicating data.

Spark - load data into RAM and keep it until you are done; data is cached in RAM from disk for iterative processing. If the data is too large, the rest spills to disk. Interactive processing of datasets without having to reload data into memory. RDD (Resilient Distributed Datasets).

RDD - a read-only collection of objects partitioned across machines. Lost partitions can be recomputed.

RDD Operations
  • Transformations - Map, Filter, Sort, flatmap
  • Actions - reduce, count, collect, save data to local disk. Actions usually involve disk operations
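The transformation / action split can be mimicked in plain Python: generators are lazy like transformations, and nothing runs until an action-style call forces evaluation. This is an analogy, not PySpark code:

```python
data = range(1, 11)

# "Transformations": build a lazy pipeline; nothing is computed yet
mapped = (x * x for x in data)                 # like rdd.map(lambda x: x * x)
filtered = (x for x in mapped if x % 2 == 0)   # like rdd.filter(...)

# "Action": forces the whole pipeline to execute, like rdd.collect()
result = list(filtered)
print(result)  # [4, 16, 36, 64, 100]

# Another action, like rdd.reduce(add) over the same pipeline
total = sum(x * x for x in range(1, 11) if (x * x) % 2 == 0)
print(total)   # 220
```

In Spark the laziness is what lets the scheduler fuse transformations and keep intermediate data in RAM instead of writing it to disk between steps.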

More Reads
Testing Spark Best Practices
Gatling - Open Source Perf Test Framework
Spark Paper

Happy Learning!!!

August 16, 2014

Hbase Primer - Loading Data in HBASE Using SQOOP / Java Code

This post has examples of using SQOOP / custom Java code to import data into HBASE, HIVE, and HDFS from an MSSQL DB.

Tried the steps on Cloudera VM - cloudera-quickstart-vm-4.4.0-1-vmware

From Linux terminal > hbase shell
create 'Employee_Details', {NAME => 'Details'}

Example #1 (Import to Hbase from MSSQL DB Table)
sqoop import --connect  "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" -m 1 --hbase-table "Employee_Details"  --column-family "Details" --table "dbo.employee"  --split-by "id"

Example #2 (List Tables in MSSQL DB Table)
sqoop list-tables --connect  "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver"

Example #3 (Import to HDFS from MSSQL DB Table)
sqoop import --connect "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" --table "dbo.employee"  --split-by "id"

hadoop fs -ls (List files in hadoop file system)

Example #4 (Import into Hive Table from MSSQL DB Table)
sqoop import --hive-import --create-hive-table --hive-table Employee_Hive --connect  "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" -m 1 --table "dbo.employee" 

Example #5 - Custom Java Code
  • All required jars need to be added to compile the project. This was one of the challenges in getting this code working
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;

public class HBaseTest
{
    public static void main(String[] args) throws IOException
    {
        HBaseTest ht = new HBaseTest();
        ht.dropTable();
        ht.createTable();
        ht.updateRecords();
    }

    // Configuration pointing at the local ZooKeeper quorum
    private Configuration getConfig()
    {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "localhost");
        config.set("hbase.zookeeper.property.clientPort", "2181");
        return config;
    }

    // Disable and delete the table if it already exists
    public void dropTable()
    {
        try
        {
            HBaseAdmin admin = new HBaseAdmin(getConfig());
            admin.disableTable("Employee_Details");
            admin.deleteTable("Employee_Details");
        }
        catch (Exception ex)
        {
            ex.printStackTrace(); // table may not exist yet
        }
    }

    // Create the table with two column families: Id and Details
    public void createTable()
    {
        try
        {
            HTableDescriptor ht = new HTableDescriptor("Employee_Details");
            ht.addFamily(new HColumnDescriptor("Id"));
            ht.addFamily(new HColumnDescriptor("Details"));
            HBaseAdmin hba = new HBaseAdmin(getConfig());
            hba.createTable(ht);
        }
        catch (Exception ex)
        {
            ex.printStackTrace();
        }
    }

    // Insert one row with columns under both families
    public void updateRecords()
    {
        try
        {
            HTable table = new HTable(getConfig(), "Employee_Details");
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("Details"), Bytes.toBytes("Name"), Bytes.toBytes("Raka"));
            put.add(Bytes.toBytes("Details"), Bytes.toBytes("Location"), Bytes.toBytes("Chennai"));
            put.add(Bytes.toBytes("Id"), Bytes.toBytes("Eid"), Bytes.toBytes("E1001")); // employee id value
            table.put(put);
            table.close();
        }
        catch (Exception ex)
        {
            ex.printStackTrace();
        }
    }
}

This post was useful for trying out the SQOOP examples.

Happy Learning!!!

August 09, 2014

HBase Primer - Querying for Seeks / Scans - Part I

This post is an HBase primer on seeks / scans. I recommend spending time on HBase schema design to get a basic understanding of HBase table structure. The motivation for this post is a slide from the Cloudera session slides post.




I wanted to try out the suggested table patterns. This can be compared to the TSQL equivalent of DDL and DML scripts. Select-with-filter queries are the key learning from this exercise. There are no aggregate functions available in HBase; Hive and Phoenix, which sit on top of HBase, serve the purpose of aggregation.

HBASE (One Liner)- Low Latency, Consistent, best suited for random read/write big data access

Few bookmarks provide great short intro - link1

Hbase Queries (Tried it on cloudera-quickstart-vm-4.4.0-1-vmware)
hbase shell

Disable and Drop Table
disable 'Employee'
drop 'Employee'

Create Table
Details is a column family. Inside this column family are the members Name, Location, DOB, and Salary.

create 'Employee', {NAME => 'Details'}

Insert Records
put 'Employee', 'row1', 'Details:Name', 'Raj'
put 'Employee', 'row1', 'Details:Location', 'Chennai'
put 'Employee', 'row1', 'Details:DOB', '01011990'
put 'Employee', 'row1', 'Details:Salary', '1990'

put 'Employee', 'row2', 'Details:Name', 'Raja'
put 'Employee', 'row2', 'Details:Location', 'Delhi'
put 'Employee', 'row2', 'Details:DOB', '01011991'
put 'Employee', 'row2', 'Details:Salary', '5000'

put 'Employee', 'row3', 'Details:Name', 'Kumar'
put 'Employee', 'row3', 'Details:Location', 'Mumbai'
put 'Employee', 'row3', 'Details:DOB', '01011992'
put 'Employee', 'row3', 'Details:Salary', '50000'

Select based on Column Qualifiers
scan 'Employee', {COLUMNS => ['Details:Name', 'Details:Location']}
scan 'Employee', {COLUMNS => ['Details:Location']}

Single Filter - Search by Location Column Filter
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai')" }

ValueFilter - Search by Location Value
scan 'Employee' , { COLUMNS => 'Details:Location', FILTER => "ValueFilter (=, 'binary:Chennai')"}

Multiple Filter - Search by Name and Location
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai') AND SingleColumnValueFilter('Details','Name',=, 'binary:Raj')"}

Timestamp and Filter - Search with Timestamp and Filter
scan 'Employee' ,{TIMERANGE=>[1407324224184,1407324224391], COLUMNS => 'Details:Location', FILTER => "ValueFilter (=, 'binary:Chennai')"}

Timestamp and Filter - Search with multiple Filter on same column (contains)
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai') OR SingleColumnValueFilter('Details','Location',=, 'binary:Delhi')"}

Filter using regExMatch
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'regexstring:Che*',true,true)" }

Search using Prefix
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai') AND PrefixFilter ('row1')"}

Return only one record
scan 'Employee', {COLUMNS => ['Details:Location'], LIMIT => 1, STARTROW => 'Chennai'}

Return two records
scan 'Employee', {COLUMNS => ['Details:Location'], LIMIT => 2, STARTROW => 'Chennai'}

Return records between range of rowkeys
scan 'Employee', {COLUMNS => ['Details:Location'], STARTROW => 'row1',STOPROW=>'row2'}

Get Specific Row with Filter
get 'Employee', 'row1', {FILTER => "ValueFilter (=, 'binary:Chennai')"}

Count records
count 'Employee'

Timestamp Range - Return within the timestamp range
scan 'Employee' ,{TIMERANGE=>[1407324224391,1407324234707]}

Hbase Table 3 Column Family
create 'Employee1', {NAME => 'Name'}, {NAME => 'Location'}, {NAME => 'DOB'}

Query with column family
scan 'Employee1', {COLUMNS => ['Name']}

Delete a Column Family
alter 'Employee', 'delete' => 'Location'

July 21, 2014

Machine Learning Notes


This post covers my learnings from a Machine Learning session conducted by my colleague Gopi. It was a really good introduction and provided a lot of motivation towards learning the topic.

Concepts Discussed
  • Homogeneity - Is my data homogeneous
  • Pick the odd one out (Anomaly detection)
  • Entropy Computation
A wide variety of examples to find odd sets and variations. For example, from the set below, identify the anomalous row:
1,1,1,2
1,2,2,1
1,2,1,1
1,0,1,2

The last row, involving zero, is the odd one. Identifying it using entropy computation was very useful.

Entropy Formula

H(X) = - Σ p(x) * log2 p(x), summed over the distinct values x in the row

Detailed notes on the formula from link

For row (1,1,1,2)
 = -[((3/4)*log2(3/4)) + ((1/4)*log2(1/4))]
 = -[-0.311 -0.5]
 = .811
 For row (1,2,2,1)
 = -[((2/4)*log2(2/4)) + ((2/4)*log2(2/4))]
 = -[-.5-.5]
 = 1
 For row (1,2,1,1)
 = -[((3/4)*log2(3/4)) + ((1/4)*log2(1/4))]
 = -[-0.311 -0.5]
 = .811 
  
 For row (1,0,1,2)
 = -[((2/4)*log2(2/4)) + ((1/4)*log2(1/4)) + ((1/4)*log2(1/4))]
 = -[-0.5 -0.5 -0.5]
 = 1.5

By excluding the row with the highest value we get a homogeneous data set; the last row, with high entropy, is the anomaly.
If the data set becomes homogeneous after removing a particular record, then that record is the anomalous one.
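The hand calculations above can be checked with a few lines of Python, using log base 2 from the math module:

```python
import math
from collections import Counter

def entropy(row):
    """H = -sum(p * log2(p)) over the value frequencies in the row."""
    counts = Counter(row)
    n = len(row)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

rows = [(1, 1, 1, 2), (1, 2, 2, 1), (1, 2, 1, 1), (1, 0, 1, 2)]
for row in rows:
    print(row, round(entropy(row), 3))
# (1, 1, 1, 2) -> 0.811
# (1, 2, 2, 1) -> 1.0
# (1, 2, 1, 1) -> 0.811
# (1, 0, 1, 2) -> 1.5   <- highest entropy: the anomalous row
```

The row with three distinct values has the highest entropy, matching the intuition that the zero makes it the odd one out.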

More Concepts Introduced
  • Conditional Probability
  • ID3 Algorithm
  • Measure Entropy
  • Decision Tree
  • Random Forest
  • Bagging Technique
Happy Learning!!! 

July 17, 2014

Big data Testing Tools - Functional and Performance

This post is based on my learning notes on functional test tools for the Big Data ecosystem. In earlier posts we covered the basics of the big data ecosystem components. Sharing the first version of the test tools analysis for functional testing.

Product / Area - Testing Tools - Test Approach / Notes
  • HiveMQ - MQTT Testing Utility, Tsung
  • Storm - storm test (Clojure)
  • Hive, Pig - Beetest, PigMix, Apache DataFu - query Hive (similar to TSQL)
  • Map Reduce Jobs - MRUnit, MRBench
  • Analytics - lift charts, target shuffling, bootstrap sampling to test the consistency of the model
  • HBASE - JUnit, Mockito, Apache MRUnit
  • Hadoop, HBASE, Cassandra - JMeter plugins

More Tools
Performance Testing Tools Analysis (Area - Tool - Comments)
  • HBASE - Inbuilt tool (PerformanceEvaluation) - validates read / write / scan performance in the environment. Usage:
    $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
    $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10240 randomWrite 1
    More reads - link. Automation scripts for comparing different HBase BlockCache implementations - link. HBase write throughput - link
  • HBASE - YCSB - performance testing HBase using YCSB, link1, link2
  • All Big Data areas (HBASE, Hadoop, MapReduce) - Sandstorm - commercial cloud / on-premise tool for Big Data QA
  • Kafka -
  • Spark -

Cloud Testing Tools

Test Environment Setup using Cloud Infrastructure
  • Load generation in the cloud for an on-premises application
  • Load generator on premises, application in the cloud
  • Both load generator and application in the cloud
Amazon Cloud Pieces
  • EC2 - Elastic Compute Cloud -> CPUs
  • EBS - Elastic Block Storage -> Database
  • S3 - Simple Storage Services -> Storage
  • Ec2 Dream Tool for connecting to multiple cloud providers - link
Blazemeter
  • Distributed geographical performance test tool
  • For First level of testing only
  • Upload and run your custom Jmeter Scripts through blazemeter
Load Test Tools
  • Flood.io
  • loadfocus
Security Testing Tools
  • NTOSpider
  • Burp Proxy
Blazemeter walkthrough Example

Step 1 - Create Load Test


Step 2 - Configure URL


Step 3 - Start Run



Step 4 - Reports



You can also upload Jmeter scripts and execute it through blazemeter

Happy Learning!!!

July 06, 2014

Weekend Reading - Webinar - Performance Testing Approach for Big Data Applications.

Very good session - Webinar: Performance Testing Approach for Big Data Applications. A few interesting notes / slides from the session:


  • Rate of data ingestion - how fast does the system consume data?
  • Data processing - the speed at which data is processed. Test data processing in isolation with populated data sets. Run specific perf tests (MR jobs, Pig, Hive scripts)
  • Data persistence - an I/O bound process (data writes / updates on the DB, garbage collection, monitoring metrics)
  • Complete end-to-end processing time (network connectivity, processing, results)

Big Data Test Challenges
  • Diverse Technologies
  • Unavailability of Test Tools for Big Data Technologies / Scenarios
  • Limited Monitoring / Diagnostic Solutions
  • Test Scripting / Environment
Perf Test Tools
  • Use cloud to simulate large infrastructure
  • Cloud orchestration scripts Puppet, Chef

Approach
  • Depending on usage in production identify patterns for production workload
  • Fault Tolerance Scenarios
  • Hadoop monitoring tools to check Map reduce jobs
  • Selecting Test Clients - Custom code
  • Performance / Failover tests to ensure scalability (Node failures during processing)
Test Parameters and Summary



A very nice, practical, and useful webinar. There are a lot of posts / webinars on this topic; this one stands out as genuinely useful and practical.

More Reads
Evaluating SolrMeter for Performance Testing
Benchmarking with HTTPerf.js and NodeUnit

Happy Learning!!!