"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 21, 2014

Productivity Tools list for Developer / DevOps / QA

A compiled list of productivity tools for Developers / DevOps / QA:
  • MTPutty
  • Rapid Environment Editor
  • Agent Ransack
  • HeidiSQL
  • SSMS Tools Pack
  • Programmers Notepad
  • TextPad
  • Notepad++
  • SOAP UI
  • Atlantis SQL Server
  • Snip-it Utility
  • Mtail


Happy Learning!!!

December 09, 2014

RDBMS Vs CEP

Came across this slide in a presentation (link). The slide on RDBMS vs CEP processing was very clear in terms of the attributes compared and the representation of facts. Many thanks to the author.



Happy Reading!!!

October 20, 2014

Open Source Test Tools Vs Commercial Test Tools


I have never had a taste of working with record-and-playback tools. I have mostly developed custom tools / scripts for QA and deployment tasks. I work across different streams - Database DEV, QA, tools development, performance and Big Data. Working on different areas provides a different perspective than doing repetitive things. My view of QA revolves around reusable scripts for data generation and scripts that simplify or eliminate repetitive tasks during deployment, configuration, testing and validation.

I have worked with Selenium, Coded UI and custom-developed automation test frameworks. I have observed fresh engineering efforts on the automation test framework in every company I have worked for: either the code base becomes too big to manage / modify, or newly hired folks prefer developing from scratch over maintenance efforts. The ROI calculation is questionable. When the quality of DEV output is poor, every small bug in QA might show up as hundreds of bugs. How many of these bugs could have been caught by basic QA checks done by DEV is another point to consider when measuring QA bugs.

QA efforts are often viewed as commodity efforts where the focus is mainly on delivery and repetitive cycles of testing are accepted. Instead of such a model, a joint DEV-QA effort would always help identify most bugs before a build is released to QA.

Both open source and commercial tools help address test automation challenges. One instance: working with WinCE apps, it is very difficult to automate hardware-software integration workflows. The TestComplete tool eliminated most of this effort by emulating actions on MyMobiler, which in turn mimics a real user on WinCE-installed hardware.

The effort involved in automating a WinCE app deployed on a device by hand-writing Win32 calls (SendMessage, SendKeys) versus using a tool like TestComplete to completely eliminate that pain point is worth evaluating before worrying about license cost.

Overall, only a mix of open source tools, commercial tools, in-house scripts, quality coding practices and unit testing can ensure a quality product. The responsibility does not lie with just one function; every function needs to be accountable and responsible for delivering a quality product.

Automation tools are primarily viewed in terms of record / playback and automation framework implementation. Beyond that, they can also be leveraged for
  • Throwaway scripts that aid functional testers by eliminating repetitive tasks (a small example is sketched below)
  • Supporting system activities during functional testing - monitoring, screen capture, simulating user events during tests, supporting long-running tests
  • Extending them to Support / UAT environments for deployment / installation where the installation involves several client / server / web components
  • Aiding automated deployments / uninstallations
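As an illustration of the first two bullets, here is a minimal sketch of a throwaway monitoring helper that captures a screenshot at a fixed interval during a long-running test. It assumes Pillow is installed and the script runs on Windows or macOS (where ImageGrab is supported); the interval, capture count and output folder are made-up values.

# Throwaway helper: periodic screen capture during a long-running test.
# Assumes Pillow (PIL) is installed; ImageGrab works on Windows / macOS.
import os
import time
from datetime import datetime
from PIL import ImageGrab

OUTPUT_DIR = "test_run_screens"   # hypothetical output folder
INTERVAL_SECONDS = 60             # capture once a minute
CAPTURE_COUNT = 10                # stop after 10 captures

os.makedirs(OUTPUT_DIR, exist_ok=True)
for _ in range(CAPTURE_COUNT):
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    ImageGrab.grab().save(os.path.join(OUTPUT_DIR, "screen_%s.png" % stamp))
    time.sleep(INTERVAL_SECONDS)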
Happy Learning!!!

Testrail - TCM Tool

This post is on an analysis of TestRail and migrating existing test cases to TestRail.

TestRail has a great web interface to organize and create test cases. The factors that make TestRail a competitive candidate are
  • Ease of creating / managing test cases
  • Migration Support for existing test cases
  • API support for automated migration / test case creation / execution / update results
  • Test Case execution out of box reports
  • Integration with bug tracking tools
  • Hosted / on-premise model
  • Existing Github projects for .NET / Java / Other languages Automation / Migration Support
  • Great Tech Support
Test Case Migration Efforts

Different aspects involved in test case migration efforts for any TCM tool
  1. Test case template - identify the required inbuilt fields and custom fields for the test case template. Develop, modify and evaluate templates to arrive at the final test case template
  2. Organizing test cases (test suites) - feature-wise and release-related test cases. Analyse, identify and evaluate a structure (functional, regression, feature areas) plus release-specific cases
  3. Migration efforts - based on the template and test case structure, prepare custom XML for all test cases to be migrated
    1. Validate and arrive at an approach to verify all migrated cases
    2. For attributes identified in the test case template, decide what values to fill in for existing test cases where those attributes were not used
  4. Default values for unused fields (drop-down lists / custom values)
  5. Automation integration, defect / bug tracking tool integration
  6. Automation test cases - identify automation test cases, templates and details
  7. QA reports - analyse the available reports and custom report needs in TestRail; email-based reporting on metrics, daily test case execution etc.
  8. Custom tools - write test cases in Excel and upload them directly from Excel to TestRail. The same tool can be used to write test cases and update test results directly from Excel via the TestRail API (a sketch of such an API call follows this list)
  9. QA process document - develop a process document (guidelines / best practices) on adding test cases, updating functional, regression and release-related test cases, using TestRail (permissions) and test case reviews in TestRail
  10. Test results archival / maintenance - approach for archiving / maintaining test results, test runs and test cases
  11. Hosting - local hosting / cloud-based; pros / cons of local hosting vs cloud hosting
  12. Security / administration / configuring users - admin-related aspects, identifying roles / permissions for users
  13. Identifying pilot projects for the TestRail evaluation period after finalizing the above areas - pilot projects for usability, tracking and upgrades before complete migration
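Referencing item 8 above, here is a minimal sketch of pushing a test result to TestRail over its REST API. It assumes the API v2 endpoint add_result_for_case and the usual status_id convention (1 = passed, 5 = failed); the server URL, credentials, run ID and case ID are placeholders.

# Minimal sketch: update a test result in TestRail via its REST API (assumed v2 endpoint).
import json
import requests

TESTRAIL_URL = "https://example.testrail.net/index.php?/api/v2/"
AUTH = ("user@example.com", "api-key-or-password")   # placeholder credentials

def add_result_for_case(run_id, case_id, passed, comment=""):
    payload = {"status_id": 1 if passed else 5, "comment": comment}
    response = requests.post(
        TESTRAIL_URL + "add_result_for_case/%d/%d" % (run_id, case_id),
        auth=AUTH,
        headers={"Content-Type": "application/json"},
        data=json.dumps(payload),
    )
    response.raise_for_status()
    return response.json()

# Example: mark case 1234 in run 10 as passed
# add_result_for_case(10, 1234, passed=True, comment="Automated run")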

Happy Learning!!!

October 10, 2014

HBase Overview Notes

Limitations of Hadoop 1.0
  • No random access --> Hadoop is geared towards batch access (OLAP)
  • Not suitable for real-time access
  • No updates - the access pattern Hadoop is best suited for is WORM (Write Once, Read Multiple times)
Why HBase
  • Flexible schema design --> add a new column when a row is added (see the sketch after this list)
  • Multiple versions of a single cell (Data)
  • Columnar storage
  • Cache columns at client side
  • Compression of columns
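A minimal sketch of that flexible-schema behaviour from a Python client, assuming the HBase Thrift gateway is running and the happybase library is installed (table name and column qualifiers are made up): only the column family is fixed up front, and new column qualifiers can be introduced per row at write time.

# Sketch: adding new column qualifiers on the fly (assumes HBase Thrift server + happybase).
import happybase

connection = happybase.Connection('localhost')       # Thrift gateway host
table = connection.table('Employee_Details')         # column family 'Details' created earlier

# row1 gets Name and Location qualifiers
table.put(b'row1', {b'Details:Name': b'Raj', b'Details:Location': b'Chennai'})

# row2 introduces a brand-new qualifier 'Grade' - no schema change needed
table.put(b'row2', {b'Details:Name': b'Raja', b'Details:Grade': b'G2'})

print(table.row(b'row2'))
connection.close()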
Read  v/s Write

  • For Availability (Compromise on Write) vs Consistency (Compromise on Read)
Hbase
  • NoSQL class of non-relational storage systems
  • RDBMS uses row-oriented storage; HBase uses columnar storage
  • HBase needs HDFS for replication
  • ZooKeeper takes all requests from the client; the client communicates with ZooKeeper first: Client -> ZooKeeper -> HMaster
  • Region Server - serves the regions; the Region Server process runs on the slaves (Data Nodes)
Happy Learning!!!

October 09, 2014

Pig Overview Notes

Pig
  • Primarily for semi structured data
  • So called 'Pig' as it processes all kinds of data
  • Pig is a data flow language, not a procedural language
  • Map Reduce - for Java programmers; Hive - for TSQL folks; Pig - rapid prototyping and increased productivity
  • Pig runs on the client side; it need not be on the cluster
  • Execution sequence - Query Parser -> Semantic Checking -> Logical Optimizer (variable level) -> Logical-to-Physical Translator -> Physical-to-MR Translator -> MapReduce Launcher
  • Pig concepts - Map: a set of key/value pairs, Tuple: an ordered list of data, Bag: an unordered collection of tuples
  • Pig - for client-side access and semi-structured data; Hive works only within the cluster
  • Hive - best suited for SQL-style analytics on structured data
  • MR - for audio / video analytics, the Map Reduce approach is the only option

Happy Learning!!!

October 08, 2014

Hive Overview Notes

  • Data Warehousing package built on top of hadoop
  • Managing and querying structured data
  • Apache Derby is the embedded DB used by Hive for the metastore
  • The metastore_db folder is used for persistence of the metastore data
  • Suitable for WORM - Write Once Read Many Times Access Pattern
  • Core Components are Shell, Metastore, Execution Engine, Compiler (Parse, Plan, Optimize), Driver
  • Tables can be created as internal (managed) tables or external tables (pointing to an external file); see the sketch after this list
  • When an internal table is dropped, both schema and data are dropped; for external tables only the schema is dropped, not the data. Both internal and external tables reside in HDFS
  • Data files for created tables are available under /user/hive/warehouse
  • Bucketing in Hive - hash(column value) % number of buckets decides which bucket a row goes into (partitioning, in contrast, splits data into separate directories by column value)
  • Partition table should always be an Internal Hive Table
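To make the internal vs external distinction concrete, here is a minimal sketch that issues both kinds of CREATE statements from Python. It assumes HiveServer2 is running and the PyHive library is installed; the table names, columns and HDFS location are made-up values. Dropping employee_managed removes its data, while dropping employee_ext removes only the metadata.

# Sketch: managed (internal) vs external Hive tables (assumes HiveServer2 + PyHive).
from pyhive import hive

conn = hive.Connection(host='localhost', port=10000)
cursor = conn.cursor()

statements = [
    # Managed table: data lives under /user/hive/warehouse; DROP removes schema + data
    """CREATE TABLE IF NOT EXISTS employee_managed (id INT, name STRING)
       ROW FORMAT DELIMITED FIELDS TERMINATED BY ','""",
    # External table: only points at existing files; DROP removes the schema only
    """CREATE EXTERNAL TABLE IF NOT EXISTS employee_ext (id INT, name STRING)
       ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
       LOCATION '/user/data/employee'""",
]
for statement in statements:
    cursor.execute(statement)

cursor.execute("SHOW TABLES")
print(cursor.fetchall())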
Happy Learning!!!

October 07, 2014

Map Reduce Internals

The client submits the job; the JobTracker does the splitting and scheduling of the job.

Mapper
  • The mapper runs the business logic (e.g. word counting)
  • The mapper maps what you need from the record
  • The record reader provides input to the mapper in key-value format
  • Map-side join (distributed caching)
  • The output of the mapper is a list of keys and values; the output of the map function is stored in a sequence file
  • The framework does the splitting based on the input format; the default is new line (text format)
  • Every row / record goes through the map function
  • When a record (row) is split across two 64MB blocks, the pieces are merged back into a complete record before processing
  • Default block size in Hadoop 2.0 is 128MB
Reducer
  • The reducer polls for map output; the JobTracker tells it which nodes to poll
  • The default number of reducers is 1; this is configurable
  • Multiple chained reduce phases within a single job are not possible - multi-level MR jobs are used instead
  • Reduce-side join (join at the reducer level)
Combiner
  • Combiner - a mini reducer that runs on the map side before output is written to disk (e.g. finding a local max value)
  • A combiner is used when the map task itself can do some pre-aggregation to minimize the reducer workload (see the word-count sketch after this section)
Partitioner
  • The hash partitioner is the default partitioner
  • Mapper -> Combiner -> Partitioner -> Reducer (for multi-dimensional output, e.g. 2012 - max sales by product, 2013 - max sales by location)
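A minimal word-count sketch of the mapper / combiner / reducer flow, written as Hadoop Streaming scripts in Python. The file names and the streaming invocation are illustrative; because word-count sums are associative, the reducer script can also be reused as the combiner.

# ---- mapper.py : emits (word, 1) for every word in the input split ----
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word, 1))

# ---- reducer.py : sums counts per word (streaming delivers sorted keys,
# ---- so all counts for a word arrive together); also usable as the combiner ----
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

# Illustrative invocation (the streaming jar path varies by distribution):
# hadoop jar hadoop-streaming.jar -mapper mapper.py -combiner reducer.py \
#   -reducer reducer.py -input /input -output /wordcount_out -file mapper.py -file reducer.py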
Happy Learning!!!

October 06, 2014

Hadoop Ecosystem Internals

Hadoop Internals - this post is a quick summary from a learning session.

Data Copy Basics (Writing data to HDFS)
  • Network proximity is considered during data storage (the first two IPs returned are closest to the client)
  • Data is stored in 64MB blocks
  • The data replication factor is 3 by default
  • The client gets an error message when the write operation on the primary node fails
  • Blocks are split horizontally across different machines
  • Slaves use SSH to connect to the master (communication between nodes is also over SSH)
  • Client communication happens through RPC
  • Writes happen in parallel; replication happens in a pipeline
Analysis / Reads (Reading Data from HDFS)
  • Client -> Master -> the nearest IPs of the nodes holding the data are returned
  • The master knows the utilization of the nodes; it allocates the least-used machine (among those holding a copy of the data) for processing
Concepts
  • Namenode - Metadata
  • DataNode - Actual Data
  • chmod 755 - owner has read / write / execute; group and others have read and execute
  • Rack - Physical Set of Machines
  • Node - Individual machine
  • Cluster - Set of Racks
Happy Learning!!!

October 03, 2014

Hbase Primer Part III


This post is an overview of read / write operations in HBase. The steps were clear from the DB paper 'Exploring NoSQL, Hadoop and HBase' by Ricardo Pettine and Karim Wadie; I'm unable to locate the link to download the paper.

I'm reposting a few steps from the paper which list the read / write operations in HBase. ZooKeeper is used for coordination in Storm and HBase.

Data Path
Table - HBase table
Region - regions for the table
Store - one store per column family for each region
MemStore - one MemStore for each store
StoreFile - store files for each store
Block - blocks within a store file

Write Path
  • Client request is sent to ZooKeeper
  • ZooKeeper finds the meta data and returns it to the client
  • Client scans the meta information to locate the region server where the new key should be stored
  • Client sends the request to that region server
  • The region server processes the request; the write first goes to the WAL (Write Ahead Log) - the same concept exists in other databases too
  • The write then lands in the MemStore; when the MemStore is full, data is flushed to disk (store files)

Read Path
  • Client issues a Get command
  • ZooKeeper identifies the meta data and returns it to the client
  • Client scans the region server to locate the data
  • Both the MemStore and the store files are scanned

Happy Learning!!!

September 28, 2014

Pycon 2014


Every time I attend a conference I plan to post my notes the same day; the longer the delay, the lower the probability of actually posting. After missing a few conferences, today this post is on my learnings from the Python conference 2014. There is a lot of motivation / inspiration to deliver and learn after every conference.

Interesting Quotes 

"Functionality is an asset. Code is a liability" by @sanand0
"Premature Optimization is baby evil" by @sanand0
"If it's not tested, it doesn't work. If it's tested, it may work" - @voidspace
'Libraries are good, your brain is better' - @sanand0. 

Short Notes from Sessions

Panel Discussion on Python Frameworks - Django, Flask , Web.Py 
  • The discussion was pretty interesting. For beginners: Web.Py, followed by Flask
  • Django, the elephant in the room, scored over the rest based on usage, features and documentation
Notes from the other interesting talks are shared in the Auth Evolution and Spark Overview posts.

Happy Learning!!!

Auth Evolution

This session, Auth as a Service by Kiran, provided a good overview of the evolution of authentication over the past decade.

The complete text of the presentation is available at the link. The text is pretty exhaustive; I am only writing down key points for my reference
  • HTTP basic Auth
  • Cookies
  • Cryptography Signed Tokens
  • HTTPS
  • Database backed sessions 
HTTP Basic Auth - the username and password are sent with the HTTP request. There is no real logout; to log out you need to send a wrong password, since the credentials are preserved and the server rejects the requests after that

Cookies - a regular HTML form with the username and password; on login an encoded value is put in an HTTP cookie, which is then sent with every request

Cryptographically signed tokens - a random key + the user name; the cookie is checked against the key to verify it is the same user. SSL on top of it made sure most of the remaining issues were fixed
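A minimal sketch of the signed-token idea using an HMAC (the secret key and username below are placeholder values): the server keeps only the secret, and any tampering with the username in the cookie invalidates the signature.

# Sketch of a cryptographically signed token (HMAC); key and username are placeholders.
import hashlib
import hmac

SECRET_KEY = b"server-side-random-secret"   # never sent to the client

def sign(username):
    signature = hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()
    return "%s:%s" % (username, signature)   # value stored in the cookie

def verify(token):
    username, _, signature = token.partition(":")
    expected = hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected)

cookie_value = sign("alice")
print(verify(cookie_value))                          # True
print(verify(cookie_value.replace("alice", "bob")))  # False - signature no longer matches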

Database-backed sessions - this is a very nice one. These days I get notifications in Quora / Google along the lines of 'you have these many open sessions' or 'previously logged-in locations'; this is all done through database-backed sessions. It seems to address all the limitations of the previous approaches.

Good Refresher!!!

Happy Learning!!!

Spark Overview


I remember the Spark keyword coming up during Big Data architecture discussions in my team, but I never looked deeper into Spark. The session by Jyotiska NK on Python + Spark: Lightning Fast Cluster Computing was a useful starter on Spark (slides of the talk).

Spark 
  • In-memory cluster computing framework for large-scale data processing
  • Developed in Scala, with Java + Python APIs
  • Not meant to replace Hadoop; it can sit on top of Hadoop
  • References: Spark Summit slides / videos from past events - link
Python Offerings
  • PySpark, Data pipeline using spark
  • Spark for real time / batch processing
Spark Vs Map Reduce Differences
This section was the highlight of the session. The way data is handled in the Map Reduce execution model versus the Spark approach is the key difference.

Map Reduce approach - load data from disk into RAM; mapper, shuffle and reducer are the distinct phases. Processing is distributed, and fault tolerance is achieved by replicating data.

Spark - load data into RAM and keep it there until you are done; data is cached in RAM for iterative processing, and if the data is too large the rest is spilled to disk. This allows interactive processing of datasets without reloading them from disk each time. The core abstraction is the RDD (Resilient Distributed Dataset).

RDD - a read-only collection of objects partitioned across machines. If part of it is lost, it can still be recomputed.

RDD Operations
  • Transformations - map, filter, sort, flatMap
  • Actions - reduce, count, collect, save data to local disk; actions usually involve disk operations (see the short PySpark sketch below)
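A minimal PySpark sketch of the transformation / action split, using the Spark 1.x-era RDD API from the talk (the input and output paths are placeholders); the transformations only build up the lineage, and nothing executes until the actions at the end run:

# Sketch: RDD transformations are lazy; actions trigger execution (Spark 1.x RDD API).
from pyspark import SparkContext

sc = SparkContext("local", "rdd-demo")
lines = sc.textFile("input.txt")                     # placeholder input path

# Transformations - build up the lineage, nothing runs yet
words = lines.flatMap(lambda line: line.split())
pairs = words.map(lambda word: (word, 1))
counts = pairs.reduceByKey(lambda a, b: a + b).filter(lambda kv: kv[1] > 1)

# Actions - these trigger the actual computation
print(counts.count())
print(counts.collect()[:10])
counts.saveAsTextFile("wordcount_output")            # placeholder output path

sc.stop()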

More Reads
Testing Spark Best Practices
Gatling - Open Source Perf Test Framework
Spark Paper

Happy Learning!!!

August 16, 2014

Hbase Primer - Loading Data into HBASE Using SQOOP / Java Code

This post is on examples using SQOOP / custom java code to import data into HBASE, HIVE, HDFS from MSSQL DB

Tried the steps on Cloudera VM - cloudera-quickstart-vm-4.4.0-1-vmware

From Linux terminal > hbase shell
create 'Employee_Details', {NAME => 'Details'}

Example #1 (Import to Hbase from MSSQL DB Table)
sqoop import --connect  "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" -m 1 --hbase-table "Employee_Details"  --column-family "Details" --table "dbo.employee"  --split-by "id"

Example #2 (List Tables in MSSQL DB Table)
sqoop list-tables --connect  "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver"

Example #3 (Import to HDFS from MSSQL DB Table)
sqoop import --connect "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" --table "dbo.employee"  --split-by "id"

hadoop fs -ls (List files in hadoop file system)

Example #4 (Import into Hive Table from MSSQL DB Table)
sqoop import --hive-import --create-hive-table --hive-table Employee_Hive --connect  "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" -m 1 --table "dbo.employee" 

Example #5 - Custom Java Code
  • All required JARs need to be added to compile the project. This was one of the challenges in getting this code working.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;

public class HBaseTest
{
   public static void main(String[] args) throws IOException
   {
          HBaseTest HT = new HBaseTest();
          HT.DropTable();
          HT.CreateTable();
          HT.UpdateRecords();
   }  

   public void DropTable()
   {
          try
          {
                Configuration config = HBaseConfiguration.create();
                config.set("hbase.zookeeper.quorum", "localhost");
                config.set("hbase.zookeeper.property.clientPort", "2181");
                HBaseAdmin admin = new HBaseAdmin(config);
                admin.disableTable("Employee_Details");
                admin.deleteTable("Employee_Details");
          }
          catch(Exception Ex)
          {
                 // ignore - the table may not exist yet
          }
   } 

   public void CreateTable()
   {
          try
          {
                Configuration config = HBaseConfiguration.create();
                config.set("hbase.zookeeper.quorum", "localhost");
                config.set("hbase.zookeeper.property.clientPort", "2181");
                HTableDescriptor ht = new HTableDescriptor("Employee_Details");
                ht.addFamily( new HColumnDescriptor("Id"));
                ht.addFamily( new HColumnDescriptor("Details"));
                HBaseAdmin hba = new HBaseAdmin(config);
                hba.createTable( ht );
          }
          catch(Exception Ex)
          {
                 Ex.printStackTrace();   // surface table-creation failures instead of swallowing them
          }
   }

   public void UpdateRecords()
   {
          try
          {
                 Configuration config = HBaseConfiguration.create();
                 config.set("hbase.zookeeper.quorum", "localhost");
                 config.set("hbase.zookeeper.property.clientPort", "2181");
                 HTable table = new HTable(config, "Employee_Details");
                 Put put = new Put(Bytes.toBytes("row1"));
                 put.add(Bytes.toBytes("Details"),Bytes.toBytes("Name"),Bytes.toBytes("Raka"));
                 put.add(Bytes.toBytes("Details"),Bytes.toBytes("Location"),Bytes.toBytes("Chennai"));
                 put.add(Bytes.toBytes("Id"),Bytes.toBytes("Eid"),Bytes.toBytes("Chennai"));
                 table.put(put);
                 table.close();
          }
          catch(Exception Ex)
          {
                 Ex.printStackTrace();   // surface write failures instead of swallowing them
          }
   }
}

This post was useful for trying out the SQOOP examples.

Happy Learning!!!

August 09, 2014

HBase Primer - Querying for Seeks / Scans - Part I

This post is an HBase primer on seeks / scans. I recommend spending time on HBase schema design to get a basic understanding of the HBase table structure. The motivating slide for this post is from the Cloudera session slides post.




I wanted to try out the suggested table patterns. This can be compared to the TSQL equivalent of DDL and DML scripts. Querying - select with filters - is the key learning from this exercise. There are no aggregate functions available in HBase; Hive and Phoenix, which sit on top of HBase, serve that purpose.

HBASE (One Liner)- Low Latency, Consistent, best suited for random read/write big data access

Few bookmarks provide great short intro - link1

Hbase Queries (Tried it on cloudera-quickstart-vm-4.4.0-1-vmware)
hbase shell

Disable and Drop Table
disable 'Employee'
drop 'Employee'

Create Table
Details is a column family. Inside this column family there are members Name, Location, DOB and Salary

create 'Employee', {NAME => 'Details'}

Insert Records
put 'Employee', 'row1', 'Details:Name', 'Raj'
put 'Employee', 'row1', 'Details:Location', 'Chennai'
put 'Employee', 'row1', 'Details:DOB', '01011990'
put 'Employee', 'row1', 'Details:Salary', '1990'

put 'Employee', 'row2', 'Details:Name', 'Raja'
put 'Employee', 'row2', 'Details:Location', 'Delhi'
put 'Employee', 'row2', 'Details:DOB', '01011991'
put 'Employee', 'row2', 'Details:Salary', '5000'

put 'Employee', 'row3', 'Details:Name', 'Kumar'
put 'Employee', 'row3', 'Details:Location', 'Mumbai'
put 'Employee', 'row3', 'Details:DOB', '01011992'
put 'Employee', 'row3', 'Details:Salary', '50000'

Select based on Column Qualifiers
scan 'Employee', {COLUMNS => ['Details:Name', 'Details:Location']}
scan 'Employee', {COLUMNS => ['Details:Location']}

Single Filter - Search by Location Column Filter
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai')" }

ValueFilter - Search by Location Value
scan 'Employee' , { COLUMNS => 'Details:Location', FILTER => "ValueFilter (=, 'binary:Chennai')"}

Multiple Filter - Search by Name and Location
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai') AND SingleColumnValueFilter('Details','Name',=, 'binary:Raj')"}

Timestamp and Filter - Search with Timestamp and Filter
scan 'Employee' ,{TIMERANGE=>[1407324224184,1407324224391], COLUMNS => 'Details:Location', FILTER => "ValueFilter (=, 'binary:Chennai')"}

Multiple Filters - Search with OR filters on the same column
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai') OR SingleColumnValueFilter('Details','Location',=, 'binary:Delhi')"}

Filter using regExMatch
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'regexstring:Che*',true,true)" }

Search using Prefix
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai') AND PrefixFilter ('row1')"}

Return only one record
scan 'Employee', {COLUMNS => ['Details:Location'], LIMIT => 1, STARTROW => 'Chennai'}

Return two records
scan 'Employee', {COLUMNS => ['Details:Location'], LIMIT => 2, STARTROW => 'Chennai'}

Return records between range of rowkeys
scan 'Employee', {COLUMNS => ['Details:Location'], STARTROW => 'row1',STOPROW=>'row2'}

Get Specific Row with Filter
get 'Employee', 'row1', {FILTER => "ValueFilter (=, 'binary:Chennai')"}

Count records
count 'Employee'

Timestamp Range - Return within the timestamp range
scan 'Employee' ,{TIMERANGE=>[1407324224391,1407324234707]}

Hbase Table 3 Column Family
create 'Employee1', {NAME => 'Name'}, {NAME => 'Location'}, {NAME => 'DOB'}

Query with column family
scan 'Employee1', {COLUMNS => ['Name']}

Delete a Column Family
alter 'Employee', 'delete' => 'Location'

July 21, 2014

Machine Learning Notes - Anomaly Detection - Entropy Computation


This post is on my learnings from a machine learning session conducted by my colleague Gopi. It was a really good introduction and provided a lot of motivation towards learning the topic.

Concepts Discussed
  • Homogeneity - Is my data homogeneous
  • Pick the odd one out (Anomaly detection)
  • Entropy Computation
A wide variety of examples were used to find odd sets and variations. For example, identify the anomalous row in the set below:
1,1,1,2
1,2,2,1
1,2,1,1
1,0,1,2

The last row, the one involving a zero, is the odd one. Identifying it using entropy computation was very useful.

Entropy Formula

Entropy = - SUM over the distinct values v in the row of ( p(v) * log2(p(v)) ), where p(v) is the fraction of entries equal to v.

Detailed notes on the formula are available at the link

For row (1,1,1,2)
 = -[((3/4)*log2(3/4)) + ((1/4)*log2(1/4))]
 = -[-0.311 - 0.5]
 = 0.811

For row (1,2,2,1)
 = -[((2/4)*log2(2/4)) + ((2/4)*log2(2/4))]
 = -[-0.5 - 0.5]
 = 1

For row (1,2,1,1)
 = -[((3/4)*log2(3/4)) + ((1/4)*log2(1/4))]
 = -[-0.311 - 0.5]
 = 0.811

For row (1,0,1,2)
 = -[((2/4)*log2(2/4)) + ((1/4)*log2(1/4)) + ((1/4)*log2(1/4))]
 = -[-0.5 - 0.5 - 0.5]
 = 1.5

By excluding the row with the highest entropy we are left with a homogeneous data set; that last high-entropy row is the anomaly.
In general, if the data set becomes homogeneous after removing a particular set of records, then that set of records is the anomaly.
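A small sketch to reproduce the numbers above (the rows are the same made-up examples from the session):

# Entropy of a row: -sum(p * log2(p)) over the distinct values in the row.
from collections import Counter
from math import log2

def entropy(row):
    n = len(row)
    return -sum((count / n) * log2(count / n) for count in Counter(row).values())

rows = [(1, 1, 1, 2), (1, 2, 2, 1), (1, 2, 1, 1), (1, 0, 1, 2)]
for row in rows:
    print(row, round(entropy(row), 3))
# (1, 1, 1, 2) 0.811
# (1, 2, 2, 1) 1.0
# (1, 2, 1, 1) 0.811
# (1, 0, 1, 2) 1.5  <- highest entropy, the anomalous row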

More Concepts Introduced
  • Conditional Probability
  • ID3 Algorithm
  • Measure Entropy
  • Decision Tree
  • Random Forest
  • Bagging Technique
Happy Learning!!! 

July 18, 2014

Multithreading - Automation Basics - Usage of lock to ensure threadsafe


Example Code
  • Usage of lock to ensure threadsafe
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;

namespace MultithreadedApp
{
    class Program
    {
        //static long _value1;
        // The lock object must be shared across all Program instances;
        // an instance-level lock would not serialize the 10 threads created in Main.
        private static readonly object threadLock = new object();
        void runcode()
        {
            lock (threadLock)
            {
                for (int i = 0; i < 100; i++)
                {
                    Console.WriteLine("Value of _value1 " + i);
                }
            }
        }
        static void Main(string[] args)
        {
            Thread[] agents = new Thread[10];
            Program[] P = new Program[10];
            for (int i = 0; i < 10; i++)
            {
                P[i] = new Program();
                agents[i] = new Thread(P[i].runcode);
                agents[i].Start();
            }
            Console.ReadLine();
        }
    }
}
Happy Learning!!!

July 17, 2014

Big data Testing Tools - Functional and Performance

This post is based on my learning notes on functional test tools for the Big Data ecosystem. In earlier posts we covered the basics of the Big Data ecosystem components. Sharing the first version of the test tools analysis for functional testing.

Functional Test Tools (Product / Area - Testing Tools - Test Approach / Language)
  • HiveMQ - MQTT Testing Utility, Tsung
  • Storm - storm test (Clojure)
  • Hive, Pig - Beetest, PigMix, Apache DataFu - query HIVE (similar to TSQL)
  • Map Reduce Jobs - MRUnit, MRBench
  • Analytics - lift charts, target shuffling, bootstrap sampling to test the consistency of the model
  • HBASE - JUnit, Mockito, Apache MRUnit
  • JMeter plugins for Hadoop, HBASE, Cassandra
More Tools
Performance Testing Tools Analysis (Area - Tool - Comments)
  • HBASE - PerformanceEvaluation (inbuilt HBASE tool) - validates read / write / scan performance in the environment. Usage:
    $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
    $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10240 randomWrite 1
    More reads - link; automation scripts for comparing different HBase BlockCache implementations - link; HBase write throughput - link
  • HBASE - YCSB - performance testing HBase using YCSB - link1, link2
  • All Big Data areas (HBASE, Hadoop, MapReduce) - Sandstorm - commercial cloud / on-premise tool for Big Data QA
  • Kafka, Spark - tools yet to be identified
More Reads
Impetus Perf Engineering Blog
Rest-assured - Java DSL for easy testing of REST services
Retrofit - A type-safe REST client for Android and Java
G7 - Tools for Big Data

Cloud Testing Tools

Test Environment Setup using Cloud Infrastructure
  • Load generation in cloud for on premises application
  • Load generator on premises, application on cloud
  • Both load generator and application both on cloud
Amazon Cloud Pieces
  • EC2 - Elastic Compute Cloud -> CPUs
  • EBS - Elastic Block Storage -> Database
  • S3 - Simple Storage Services -> Storage
  • Ec2 Dream Tool for connecting to multiple cloud providers - link
Blazemeter
  • Distributed geographical performance test tool
  • For First level of testing only
  • Upload and run your custom Jmeter Scripts through blazemeter
Load Test Tools
  • Flood.io
  • loadfocus
Security Testing Tools
  • NTOSpider
  • Burp Proxy
Blazemeter walkthrough Example

Step 1 - Create Load Test


Step 2 - Configure URL


Step 3 - Start Run



Step 4 - Reports



You can also upload JMeter scripts and execute them through Blazemeter.

Happy Learning!!!