"No one is harder on a talented person than the person themselves" - Linda Wilkinson

July 21, 2014

Machine Learning Notes


This post covers my learnings from a Machine Learning session conducted by my colleague Gopi. It was a really good introduction and strong motivation to learn the topic.

Concepts Discussed
  • Homogeneity - Is my data homogeneous
  • Pick the odd one out (Anomaly detection)
  • Entropy Computation
The session walked through a wide variety of examples for finding odd sets and variations. Example: identify the anomalous row in the set below.
1,1,1,2
1,2,2,1
1,2,1,1
1,0,1,2

The last row, which contains a zero, is the odd one. Identifying it using entropy computation was very useful.

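Entropy Formula

H = -Σ p(i) * log2(p(i)), summed over the distinct values i in a row, where p(i) is the fraction of entries in the row equal to i

Formula detailed notes from link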

For row (1,1,1,2)
 = -[((3/4)*log2(3/4)) + ((1/4)*log2(1/4))]
 = -[-0.311 -0.5]
 = .811
 For row (1,2,2,1)
 = -[((2/4)*log2(2/4)) + ((2/4)*log2(2/4))]
 = -[-.5-.5]
 = 1
 For row (1,2,1,1)
 = -[((3/4)*log2(3/4)) + ((1/4)*log2(1/4))]
 = -[-0.311 -0.5]
 = .811 
  
 For row (1,0,1,2)
 = -[((2/4)*log2(2/4)) + ((1/4)*log2(1/4)) + ((1/4)*log2(1/4))]
 = -[-0.5 -0.5 -0.5]
 = 1.5

Excluding the row with the highest entropy leaves a more homogeneous data set, so the last row, which has the highest entropy, is the anomaly.
If the data set becomes homogeneous after removing a particular record, then that record is the anomaly.
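
The row-by-row computation above can be sketched in a few lines of Python. This is a minimal sketch, not the code used in the session: the function name row_entropy and the rule "highest entropy is the anomaly candidate" are my own assumptions for illustration.

```python
import math
from collections import Counter


def row_entropy(row):
    """Shannon entropy (base 2) of the values in one row."""
    total = len(row)
    counts = Counter(row)
    # p = share of the row taken by each distinct value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


rows = [
    (1, 1, 1, 2),
    (1, 2, 2, 1),
    (1, 2, 1, 1),
    (1, 0, 1, 2),
]

entropies = [row_entropy(r) for r in rows]
for r, h in zip(rows, entropies):
    print(r, round(h, 3))

# Assumption: treat the row with the highest entropy as the anomaly candidate
anomaly = rows[entropies.index(max(entropies))]
print("Anomaly candidate:", anomaly)
```

Note that row (1,2,2,1) also scores fairly high, so on real data a threshold or a comparison against the rest of the set would likely be needed rather than just taking the maximum.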

More Concepts Introduced
  • Conditional Probability
  • ID3 Algorithm
  • Measure Entropy
  • Decision Tree
  • Random Forest
  • Bagging Technique
Happy Learning!!! 

July 17, 2014

Big data Functional Testing Tools

I'm in the process of identifying functional test tools for the Big Data ecosystem. Earlier posts covered the basics of the big data ecosystem components. Here is the first version of my test tools analysis for functional testing.

Product / Area | Testing Tools | Test Approach | Programming Language | Reference
HiveMQ | MQTT Testing Utility | - | - | -
Storm | storm test | - | Clojure | -
Hive | Beetest | Query HIVE (Similar to TSQL) | - | -
Map Reduce Jobs | MRUnit | - | - | -
Analytics | - | Lift charts, Target shuffling, Bootstrap sampling to test the consistency of the model | - | -
HBASE | Junit, Mockito, Apache MRUnit | - | - | -
- | Jmeter Plugins for Hadoop, HBASE, Cassandra | - | - | -

Happy Learning!!!

July 06, 2014

Weekend Reading - Webinar - Performance Testing Approach for Big Data Applications.

A very good session: the webinar Performance Testing Approach for Big Data Applications. A few interesting notes / slides from the session


  • Rate of Data Ingestion - How fast does the system consume data?
  • Data Processing - How fast the data is processed. Test data processing in isolation with data sets pre-populated. Run specific perf tests (MR Jobs, Pig, Hive Scripts)
  • Data Persistence - I/O bound process. (Data Writes / Updates on DB, Garbage Collection, Monitoring Metrics)
  • Complete end to end time for processing (Network Connectivity, Processing, Results)

Big Data Test Challenges
  • Diverse Technologies
  • Unavailability of Test Tools for Big Data Technologies / Scenarios
  • Limited Monitoring / Diagnostic Solutions
  • Test Scripting / Environment
Perf Test Tools
  • Use cloud to simulate large infrastructure
  • Cloud orchestration scripts Puppet, Chef

Approach
  • Depending on usage in production identify patterns for production workload
  • Fault Tolerance Scenarios
  • Hadoop monitoring tools to check Map reduce jobs
  • Selecting Test Clients - Custom code
  • Performance / Failover tests to ensure scalability (Node failures during processing)
Test Parameters and Summary



A very nice, practical and useful webinar. Among the many posts / webinars on this topic, this one stands out.

Happy Learning!!!

Weekend Reads - API Testing (Web Service Testing, Inspect Http request, Rest API Testing) - Free tools

A very good presentation with a compiled list of free tools for API testing. More details - Free API debugging and testing tools you should know about

Tools List from the presentation and SO reads
While reading through the list, I went back to check the SOAP and REST basics. For API testing / web service testing we will look only at REST and SOAP based web services. The summary below is consolidated from StackOverflow answers and posts; please check the reference link for more details.

REST
  • REST stands for Representational State Transfer. The REST approach uses the standard GET, PUT, POST, and DELETE verbs
  • REST is over HTTP. REST has no WSDL interface definition
  • REST is good for getting a blob of data that you don't have to work with
  • Typically uses normal HTTP methods instead of a big XML format describing everything
  • REST plays well with AJAX'y web pages. If you keep your requests simple, you can make service calls directly from your JavaScript, and that comes in very handy.
SOAP
  • SOAP stands for Simple Object Access Protocol. SOAP builds an XML protocol on top of HTTP / TCP/IP
  • SOAP can be over any transport protocol such as HTTP, FTP, SMTP, JMS etc.
  • SOAP describes functions, and types of data. If you want to get an object, SOAP is way quicker and easier to implement
  • Has several protocols and technologies relating to it: WSDL, XSDs, SOAP, WS-Addressing
  • SOAP is useful from a tooling perspective because the WSDL is so easily consumed by tools. So, you can get Web Service clients generated for you in your favourite language.
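
To make the difference in request shape concrete, here is a minimal Python sketch that only builds (and does not send) one REST style request and one SOAP style request. The URLs, the GetOrder operation and its namespace are hypothetical, made up for illustration; only the SOAP 1.1 envelope namespace is the standard one.

```python
import urllib.request

# Hypothetical endpoints, used only to show the request shapes
REST_URL = "http://example.com/api/orders/42"
SOAP_URL = "http://example.com/soap/OrderService"

# REST: the resource is addressed by the URL and the action by the HTTP verb
rest_request = urllib.request.Request(REST_URL, method="GET")

# SOAP: the operation and its parameters travel inside an XML envelope
soap_body = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetOrder xmlns="http://example.com/orders">
      <OrderId>42</OrderId>
    </GetOrder>
  </soap:Body>
</soap:Envelope>"""

soap_request = urllib.request.Request(
    SOAP_URL,
    data=soap_body.encode("utf-8"),
    headers={
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": "http://example.com/orders/GetOrder",  # hypothetical action
    },
    method="POST",
)

print(rest_request.get_method(), rest_request.full_url)
print(soap_request.get_method(), soap_request.full_url)
```

Nothing is sent over the network here; passing either object to urllib.request.urlopen would perform the call against a real service.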

More Reads
From AWS Blog - 80% REST / 20% SOAP usage pattern

Happy Learning!!!

July 05, 2014

APIs Good Read

I'm learning Big Data basics and checking real-life architectures to understand the technology and its implementation. I came across an interesting slide on APIs and am sharing it here.


Happy Reading!!!

June 26, 2014

Tablediff - Quickly compare column data and schema between two Database Tables on Different Machines

Today's Learnings
  • Tablediff gets installed with replication features
  • Prepare Tablediff for multiple tables
  • Run it from a batch file and log results in text file
  • One Click Results Displayed :). Tested on SQL 2012
Example Batch File 1

TableDiff.bat
cd "C:\Program Files\Microsoft SQL Server\110\COM"
tablediff.exe -sourceserver SourceDBServerIP -sourcedatabase TestDB -sourcetable Table1 -destinationserver DestDBServerIP -destinationdatabase TestDB -destinationtable Table1 -c -et DiffsTable1 -dt
tablediff.exe -sourceserver SourceDBServerIP -sourcedatabase TestDB -sourcetable Table2 -destinationserver DestDBServerIP -destinationdatabase TestDB -destinationtable Table2 -c -et DiffsTable2 -dt

Explanation
  • Option -c - Perform column level comparisons
  • Option -et - Name of the result table that receives the differing records
  • Option -dt - Drop the result table first if it already exists
  • By checking the log and the diff tables created we can find the error code and description of the differences
  • Adding the -strict option makes it expect the same schema (column order). I have not used it in the above example
  • Add the -sourceuser, -sourcepassword, -destinationuser, -destinationpassword parameters for SQL credentials. If not specified, Windows credentials will be used
Batch File 2 - Capture Log Results
RunTD.bat

C:\TableDiff.bat > C:\TableDiffResults.txt 2>&1

One click on RunTD.bat, then check the log for results. SO link
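
For more than a couple of tables, the batch lines can be generated instead of typed by hand. Below is a minimal Python sketch under my own assumptions: the helper name tablediff_command is hypothetical, the server and database names are the placeholders from the example above, and the result table is named with tablediff's -et option and dropped first with -dt.

```python
def tablediff_command(table, source_server, dest_server, database):
    """Build one tablediff.exe command line for a single table."""
    return (
        "tablediff.exe"
        f" -sourceserver {source_server} -sourcedatabase {database} -sourcetable {table}"
        f" -destinationserver {dest_server} -destinationdatabase {database}"
        f" -destinationtable {table}"
        f" -c -et Diffs{table} -dt"
    )


# Placeholder names taken from the example batch file
tables = ["Table1", "Table2"]
lines = [
    tablediff_command(t, "SourceDBServerIP", "DestDBServerIP", "TestDB")
    for t in tables
]
batch_text = "\n".join(lines)
print(batch_text)
```

The printed text can be pasted into TableDiff.bat after the cd line.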

Happy Learning!!!

June 19, 2014

Two interesting bugs - Web service changes, DB column position changes

Two interesting bugs and the learnings from them

Bug #1 - Different install / upgrade paths resulted in different ordinal positions for the columns of the same table. Because of select * usage, this resulted in wrong updates to columns.

Automation Solution - With a SQL command (link) we can find ordinal position differences between two copies of the same table. Query both environments for each table, dump the results as XML, and compare the XML files; the comparison highlights tables whose columns have different ordinal positions.
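
The comparison step can also be sketched directly in Python instead of going through XML. This is a minimal sketch, not the script I used: the function name ordinal_differences and the sample tables are hypothetical, and the input is assumed to be column names per table in ordinal order (for example read from INFORMATION_SCHEMA.COLUMNS ordered by ORDINAL_POSITION).

```python
def ordinal_differences(env_a, env_b):
    """Return {table: [(column, position_in_a, position_in_b), ...]} for mismatches.

    env_a / env_b map a table name to its column names in ordinal order.
    Only tables and columns present in both environments are compared.
    """
    differences = {}
    for table in sorted(set(env_a) & set(env_b)):
        pos_a = {col: i for i, col in enumerate(env_a[table], start=1)}
        pos_b = {col: i for i, col in enumerate(env_b[table], start=1)}
        mismatched = [
            (col, pos_a[col], pos_b[col])
            for col in env_a[table]
            if col in pos_b and pos_a[col] != pos_b[col]
        ]
        if mismatched:
            differences[table] = mismatched
    return differences


# Hypothetical sample data for a fresh install and an upgraded install
fresh_install = {"Orders": ["Id", "Amount", "Status"], "Users": ["Id", "Name"]}
upgraded = {"Orders": ["Id", "Status", "Amount"], "Users": ["Id", "Name"]}

result = ordinal_differences(fresh_install, upgraded)
print(result)
```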

Bug #2 - When web services change, newly introduced optional parameters and changed signatures often break backward compatibility

Automation Solution - SOA Model is an open source tool for WSDL comparison. It can be used to validate WSDL differences between versions. Example code from SO

More Reads

Happy Bugging & Learning!!!

Good QA Reads

More Reads
Happy Reading!!!

June 18, 2014

TSQL Learnings @ Work - Date Conversion yyyymmdd to mm/dd/yyyy, xp_cmdshell Tips

Tip#1 - Converting YYYYMMDD into MM/DD/YYYY Format

SELECT CONVERT(VARCHAR(10),CONVERT(DATETIME, '20120101', 112),101)



Tip #2 - Converting HHMMSS into HH:MM:SS

SELECT STUFF(STUFF('211532', 3, 0, ':'), 6, 0, ':')
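
As a quick cross-check of what the two conversions above should return, here is a minimal Python sketch of the same transformations. The function names are my own, for illustration only; this is not T-SQL and is only meant to show the expected input / output shapes.

```python
from datetime import datetime


def yyyymmdd_to_us(date_text):
    """'20120101' -> '01/01/2012' (same result as CONVERT style 101)."""
    return datetime.strptime(date_text, "%Y%m%d").strftime("%m/%d/%Y")


def hhmmss_to_colons(time_text):
    """'211532' -> '21:15:32' (same result as the nested STUFF calls)."""
    return f"{time_text[0:2]}:{time_text[2:4]}:{time_text[4:6]}"


print(yyyymmdd_to_us("20120101"))
print(hhmmss_to_colons("211532"))
```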



Tip #3 - xp_cmdshell does not work with temp tables / table variables. Use Global Temp Tables ## (Stackoverflow :) link)

Tip #4 - Enable xp_cmdshell in SQL Server - SO Link

Tip #5 - Testing Dynamic SQL Code with EXEC command - SO Link

Happy TSQL Revising!!!

June 15, 2014

Interesting Learning Notes

Note #1 - What is the difference between a performance problem and a scalability problem?
  • Performance problem - Fixing a performance issue for a website. Example - Page load time is very high for the homepage of the website
  • Scalability problem - Scaling the website to support 10X the current user base
This post was very useful for the above answer - Performance v Scalability – For Employers

Note #2 - Read / Write advantages / disadvantages of Normalized / Denormalized databases?
  • Normalized - Insert in multiple tables, Highly consistent
  • Denormalized - Easy Insert, Consistency issue with updates (Multiple versions of Records may exist)
I loved reading this post again and again - Data storages and read vs write controversy. The post is very simple, intuitive and clear.

  • Tip #1 - Changing window size during execution set_window_size
  • Tip #2 - Screenshots comparison using needle (Python based)

June 10, 2014

Interesting TSQL Question

Check the TSQL example below involving NULL. What are the results of the three queries below?

Tested on SQL 2012 Setup

Create TABLE Test1
(Id           int,
 Score int not null)

Create TABLE Test2
(Id           int,
 Score2       int not null)

 insert into Test1(Id, Score) VALUES (NULL,10),(10,100),(20,200)

 insert into Test2(Id, Score2) VALUES (NULL,11),(10,110),(20,220)

Case #1 
 Select * from  Test1 JOIN Test2
 ON Test1.Id = Test2.Id

Case #2 
 Select * from  Test1 LEFT OUTER JOIN Test2
 ON Test1.Id = Test2.Id

Case #3
 Select * from  Test1 LEFT OUTER JOIN Test2
 ON Test1.Id = Test2.Id
 ORDER BY Test1.Id DESC

Results are
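
In short: NULL = NULL does not evaluate to true, so the NULL Id rows never match each other. Case #1 returns only the two rows with Id 10 and 20. Case #2 returns three rows, with the NULL Id row from Test1 padded with NULLs for the Test2 columns. Case #3 returns the same three rows ordered 20, 10, NULL. The minimal sketch below reproduces this in Python with an in-memory SQLite database; SQLite is my stand-in here, not the SQL 2012 setup the queries were tested on, but it treats NULL in joins and in ORDER BY ... DESC the same way for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Same tables and rows as the T-SQL example above
cur.execute("CREATE TABLE Test1 (Id INT, Score INT NOT NULL)")
cur.execute("CREATE TABLE Test2 (Id INT, Score2 INT NOT NULL)")
cur.execute("INSERT INTO Test1 (Id, Score) VALUES (NULL, 10), (10, 100), (20, 200)")
cur.execute("INSERT INTO Test2 (Id, Score2) VALUES (NULL, 11), (10, 110), (20, 220)")

join_on_id = "ON Test1.Id = Test2.Id"

# Case #1: inner join, NULL never equals NULL
case1 = cur.execute(f"SELECT * FROM Test1 JOIN Test2 {join_on_id}").fetchall()

# Case #2: left outer join keeps the unmatched NULL row from Test1
case2 = cur.execute(
    f"SELECT * FROM Test1 LEFT OUTER JOIN Test2 {join_on_id}"
).fetchall()

# Case #3: same join, NULL sorts last when ordering descending
case3 = cur.execute(
    f"SELECT * FROM Test1 LEFT OUTER JOIN Test2 {join_on_id} ORDER BY Test1.Id DESC"
).fetchall()

for name, rows in (("Case #1", case1), ("Case #2", case2), ("Case #3", case3)):
    print(name, rows)

conn.close()
```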

Happy Learning!!!

June 03, 2014

SQL Server Database Restore Notes

While installing SQL 2012 on Windows Server 2008 R2
  • Enable .NET Framework 3.5 from Windows Add / Remove Features
  • While trying to restore a SQL 2012 backup on SQL 2008, the error 'media family on device is incorrectly formed' is misleading; the real cause is the version mismatch
Query to check the SQL Server version

SELECT
SERVERPROPERTY('ProductVersion') AS ProductVersion,
SERVERPROPERTY('ProductLevel') AS ProductLevel,
SERVERPROPERTY('Edition') AS Edition,
SERVERPROPERTY('EngineEdition') AS EngineEdition;
GO

Useful Notes
Quick Script encompassing the steps from above notes

Step 1 - Stop SQL Server
NET stop MSSQLSERVER

Step 2 - Remove DB Folders (MDF, LDF Files)
rmdir /s /q "C:\Databases\VOL\Data\"
rmdir /s /q "C:\Databases\VOL\Log\"

Step 3 - Recreate Same file path for Restore
mkdir  "C:\Databases\VOL\Data\"
mkdir  "C:\Databases\VOL\Log\"

Step 4 - Start SQL Server
NET start MSSQLSERVER

Step 5 - Run DB Restore script
Db Restore Command

Tweak it as you need :)

Happy Learning!!!

May 29, 2014

Automation, Tools, QA

This is a continuation of the previous post on Automation. I wanted to write this post after reading the article Testing Trends…or not?

When I had to test an ecommerce portal across locales, automation was useful to validate the happy path, search, order, returns etc. For specific fixes (perf improvements, tab size adjustments, zoom in adjustments, alignments) I had to rely on manual validation. Automation can cover positive functional flows. The pain point was that ids changed frequently with every release, so the automation always lagged behind the current production code. If you refer back to the article Stop Writing Automation, the author clearly lists the failure points that will not be covered by automation.

Automation can be broadly classified into
  • BVT Tests
  • Regression Tests
  • Functional Tests
  • In-house Tools for perf tests, set-up
There has been a lot of focus on exploratory testing, context driven testing etc. My perspective is that the core of it lies in product centric knowledge. The more you explore / learn about the system, the higher the chances of identifying critical bugs. Three things that are essential for a QA role are
  • Product Centric Knowledge (Willingness to explore and master the domain / product)
  • Technical Acumen (Know how the system functions, learn whenever possible)
  • Look at the big picture, Failure points (Here exploratory testing, context driven testing, mind maps will help)
I have worked in DB Dev / Test and UI Test roles. DB has been more interesting and fascinating than UI :).

I prefer the SDET profile, a mix of code level and functional tests, over relying purely on black box tests. The rotational software engineer program at Microsoft is a very good example. Fresh graduates spend 6 months each in Dev / Test / Support / Product management. Depending on their interest they can finally pick a role of their choice. This model provides a complete picture of the release cycle. For every role you need to work with respect to the context. Also, the flexibility to adjust to different profiles gives you a broad range of skills and a better perspective on products / functions.


May 27, 2014

Reading Notes - Test Data Generator, Interesting Reads

Google Reader used to be my favourite feed reader. After it was shut down I tried feedspot and feedly. Neither has provided the same experience, and I am still looking for alternatives.

Today there were a few interesting reads
Note #1 - mockaroo 
  • Live web based test data generator that looks promising
  • Null percentage distribution, data type combinations, multiple output formats (CSV, SQL)
  • Real time datatype distribution for near match of test data
  • Rest API exposure for Automated data generation
Building a tool like this, with intuitive, real world test data cases, would have taken a lot of development time. Nice Work!!!

Ref - This post led me to the mockaroo tool

More Testdata Generation Tools

Note #3 - Good Infographic on Devlandscape 2014 (Checkout big data, NOSQL and DevOps)
Happy Reading!!!

May 17, 2014

RootConf Day #2 Notes

June 6th Update - All RootConf Session Videos are available in link

Today was Day #2 of RootConf. Some sessions were engaging, with good content, presentation and connection with the audience. Some good learnings for a powerful presentation
  • Creative Quotes (Similar to Quora answers with Pics)
  • From Tweets (Correlating the context)
  • Movie Stills with modified subject + humour related conversations 
Some presentations / context will remain in our memory due to its impact / situation
Tools
  • ejson (Secret management)
  • mesos for resource management
  • coreos - linux for massive system deployments
  • Ansible - Deployment + Configuration Management + Continuous Delivery
  • citoengine - Alert management and automation tool
  • pacemaker - Server side exploitation software - Python based
  • RobotFramework for Device Automation
  • Linux Profiling Tools - Perf Top, Perf Sched
Both days were full of open source stacks. There are open source alternatives to VMWare VSphere and AWS. BrowserStack manages all of its hundreds of servers with Ansible. Aditya Patawari demonstrated a wordpress setup in a few clicks.

The first session, on security, by Anant Shrivastava covered the Heartbleed bug and included a demonstration of it.

Session - DDOS mitigation @flipkart by Sameer Garg
Volumetric Attack
  • DNS, SNMP, NTP Amplification
  • SYN Flood
  • Fragmented Packets
App Layer
  • Wordpress Ping back
  • Exploiting HTTP
  • Incomplete requests
Volumetric Attack Mitigation
  • Use Scrubbing farms (3rd Party Mitigation Service)
  • Work with Upstream providers
  • Using BGP
App Layer Mitigations
  • Home grown solutions
  • Scrubbing farms
  • Real time log analysis
  • Identify Standard Patterns
  • Use data to block traffic
Happy Learning!!!

May 16, 2014

RootConf Day #1 Notes

Every conference is a good opportunity to identify best practices, technology trends and things to learn. The sessions were predominantly on open source tools for DevOps, Continuous Integration and Automated Deployments.

Consolidated Set of Open Source Tools Discussed are
For Performance Testing Open Stack
Free Tools to be checked for Windows
  • Nagios for Windows Monitoring
  • Vagrant for Windows
Notes from Sessions
First session was on Building Elastic Infrastructures by Pankaj Kaushal
  • Automated creation of systems
  • Centralized monitoring
  • VM created with HostDB Entry
  • Tools - Puppet, HostDB
HostDB
  • Highly Available / Reliable
  • Namespaces & access controls
  • Rollback / GIT as backend
  • Rest APIs for APPs to interact
Puppet
  • Manage Configuration
  • Define Machine / nodes
Session - Quick Prototyping with LXC and Puppet by Benjamin Kero
  • Tools CVS, SVN, BZR, Darcs, RCS, GIT, Mercurial
  • Provided a good comparison of the tools Docker, VMWare VSphere, EC2, Linux + Puppet, cgroups (control groups)
  • Mercurial, LinuxContainers.org
Session - Avoiding single point of failure in a multi-services architecture
  • Tools used - Sensu monitoring, salt stack, jenkins
Interesting Sites to check
Happy Learning!!!

May 11, 2014

Weekend Reading Notes

Session #1 Netflix's Distributed Computing Strategies: Optimistic Design for the Eventual Consistency Model



Good Netflix Case study on Cassandra for High Performance DB's
  • In a master / slave configuration there is an interval for data sync
  • In the early 2000s, reads were done on replicated databases
  • Repair option is possible in Cassandra
  • MySQL users - Facebook, Zappos, Symantec etc.
  • FB replays logs across slave systems
  • Remove Foreign keys to improve performance
  • Netflix Cassandra cluster (1 Million writes / Reads worked successfully) - More reads link
  • Benchmarking Cassandra Scalability on AWS - Over a million writes per second
  • Pessimistic Design - High Consistency = High Latency, Performance issues 
  • Optimistic Design - Trust Data Store, For 1% or edge cases have contingency plans
  • Example, Amazon (Low Consistency, sometimes sell items not in inventory), Send a polite email, 10% credit for next purchase
Session #2 How Python Scripts Power Drones

April 29, 2014

Selenium Reads & QA Tools Post


Good Reads
PhantomJS
Tools / Useful Scripts
Fitnesse
Sikuli
Mobile Automation
Jmeter
Windows Testing
More Tools
Happy Learning!!!