"No one is harder on a talented person than the person themselves" - Linda Wilkinson

May 24, 2015

Interesting Sites

Two new interesting sites

Security tools and their market share @ sectoolmarket 

Collection of useful tech talks available @ techtalkshub 

Happy Learning!!!


April 30, 2015

VodQA Presentation

My VodQA presentation slides


April 19, 2015

Testcomplete - VBScript Notes

Tip #1
Change the browser locale setting through a registry edit (here, the IE browser locale):

'Browser language locale setting function: switches IE's Accept-Language to French
Sub SetLanguage_FR()
    Dim wshShell
    Set wshShell = CreateObject("WScript.Shell")
    'Ignore the error RegDelete raises if the value does not exist yet
    On Error Resume Next
    wshShell.RegDelete "HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\AcceptLanguage"
    On Error GoTo 0
    'Write the new Accept-Language value
    wshShell.RegWrite "HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\AcceptLanguage", "fr-FR", "REG_SZ"
    Set wshShell = Nothing
End Sub

Tip #2
Dictionary object example in VBScript

Sub CreateDictionaryExample()
    'Add expected label captions to a dictionary
    Set ExpectedValues = CreateObject("Scripting.Dictionary")
    ExpectedValues.Add "label17", "User Name:"
    ExpectedValues.Add "label23", "Password:"
    ExpectedValues.Add "label2", "Please enter Details"
    ExpectedValues.Add "Btnlogin", "Submit"
    'List all keys and values
    numElements = ExpectedValues.Count
    keys = ExpectedValues.Keys()
    For i = 0 To numElements - 1
        k = keys(i)
        v = ExpectedValues.Item(k)
        'Log.Message is TestComplete's logging call
        Call Log.Message(k & ": " & v)
    Next
End Sub

Tip #3 - Disable the Windows 'Your computer is at Risk' pop-up.
This link was useful for making the group policy changes

Tip #4 - VBScript to kill all browser instances
This Stack Overflow code snippet link was useful 
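
In case the linked snippet goes stale, a rough stand-in is to force-kill the browser processes by image name. Below is a minimal Python sketch (not the VBScript from the link) that shells out to the Windows taskkill command; the process names listed are assumptions depending on which browsers are installed:

# Kill all browser instances by image name (Windows).
# Python stand-in for the linked VBScript snippet; the
# process names below are assumptions.
import subprocess

def kill_browsers():
    for image in ("iexplore.exe", "chrome.exe", "firefox.exe"):
        # /F forces termination, /IM matches by image name;
        # a non-zero return (no such process) is simply ignored.
        subprocess.call(["taskkill", "/F", "/IM", image])

if __name__ == "__main__":
    kill_browsers()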

Happy Learning!!!

January 19, 2015

Good Read - IOT Startup Ideas

IOT Startup Ideas - 50 Sensor Applications for a Smarter World. It covers all the categories below:
  • Smart City
  • Smart Environment
  • Smart Water
  • Smart Metering
  • Security
  • Retail
  • Logistics
  • Industrial Control
  • Agriculture
  • Farming
  • Home Automation
  • eHealth
Lots of Startup Ideas!!

Happy Learning!!!


January 07, 2015

QA Tools (Eliminate Repetitive Efforts)


Very interesting post - Reusable Software? Just Don't Write Generic Code

I echo similar thoughts on developing tools. I personally prefer smaller utilities / tools over a large consolidated suite. This again depends on knowledge depth / design exposure. Some of the advantages of smaller components / utilities are
  • A working throwaway solution solves the immediate need while the next set of applicable changes is prioritized
  • Smaller components with dedicated code ownership help with better maintenance and customization
  • A tool developed by one person usually ends up modified by someone else in the team; learning a complex solution that covers many use cases demands greater functional knowledge and time
When we develop / consolidate different frameworks we end up implementing
  • Abstraction layers
  • Inheritance
The level / depth of these depends on skillset and exposure. The intent of the post was to minimise the effort needed to maintain the tools developed. Considering team skills, functional knowledge, and ownership aspects, maintainable tools are a better option than developing a large suite over an extended period of time

Interesting line reposted from the post

"Writing small components will give your software a high chance of survival: all individual components are easy to use and understand, and are usable on their own in various use cases"

Happy Learning!!!

Databases - IOT - CES

CES notes on IOT had an interesting tag line, posted in a MEMSQL blogpost

Tag line copied from the post


Also, for the vast landscape of DB products across multiple categories (RDBMS, NoSQL, in-memory, Hadoop, stream processing), check out the 451 Research paper. Depending on application needs you can identify the top products to evaluate / get started with


NoSQL LinkedIn Skills Index – December 2014

Current State

Happy Learning!!!

December 21, 2014

Productivity Tools list for Developer / DevOps / QA

Compiled list of Productivity Tools for Developer / DevOps / QA
  • MTPutty
  • Rapid Environment Editor
  • Agent Ransack
  • Heidisql
  • SSMS Tools Pack
  • Programmers Notepad
  • TextPad
  • Notepad++
  • SOAP UI
  • Atlantis SQL Server
  • Snip-it Utility
  • Mtail


Happy Learning!!!

December 09, 2014

RDBMS Vs CEP

Came across this slide in the presentation in the link. The slide below on RDBMS vs CEP processing was very clear in its compared attributes and representation of facts. Many thanks to the author.



Happy Reading!!!

October 20, 2014

Open Source Test Tools Vs Commercial Test Tools


I have never had a taste of working with record-playback tools. I have mostly developed custom tools / scripts for QA and deployment tasks. I work across different streams: Database DEV, QA, Tools Development, Performance & Big Data. Working on different areas provides a different perspective than doing repetitive things. My perspective of QA evolves around reusable scripts for data generation and scripts that simplify / eliminate repetitive tasks during deployment, configuration, testing, and validation.

I have worked with Selenium, Coded UI, and custom-developed automation test frameworks. I have observed fresh engineering efforts on the automation test framework in every company I worked for: either the code base becomes too big to manage / modify, or newly hired folks move towards developing from scratch rather than maintaining. The ROI calculation is questionable. When DEV quality is poor, every small bug in QA might show up as hundreds of bugs. How many of these bugs could have been caught by a basic DEV-side QA check is another point to consider when measuring QA bugs.

QA efforts are often viewed as commodity efforts where the focus is mainly on delivery, and repetitive cycles of testing are acceptable. Instead of such a model, a joint DEV-QA effort would always help identify most bugs before a release reaches QA.

Both open source and commercial tools help address test automation challenges. One instance: working with WinCE apps, it is very difficult to automate hardware-software integration workflows. The TestComplete tool eliminated most of this effort by emulating actions on MyMobiler, which in turn mimics a real user on WinCE-installed hardware.

The effort involved in automating a WinCE app deployed on a device by hand-writing Win32 calls (SendMessage, SendKeys) versus using a tool like TestComplete to eliminate that pain point entirely is worth evaluating before balking at the license cost.

Overall, only a mix of open source tools, commercial tools, in-house scripts, and quality practices in coding and unit testing can ensure a quality product. Responsibility does not rest with just one function; every function needs to be accountable for delivering a quality product.

Automation tools are primarily viewed as record / playback and automation framework implementations. Beyond that, they can also be leveraged for
  • Throwaway scripts that aid functional testers by eliminating repetitive tasks
  • Supporting system activities during functional testing: monitoring, screen capture, simulating user events during tests, supporting long-running tests
  • Extending to support / UAT environments for deployment / installation where several client / server / web components are installed
  • Aiding automated deployments / uninstallations
Happy Learning!!!

Testrail - TCM Tool

This post is on analysis of TestRail and on migrating existing test cases to TestRail

TestRail has a great web interface to organize and create test cases. The factors that make TestRail a competitive candidate are
  • Ease of creating / managing test cases
  • Migration Support for existing test cases
  • API support for automated migration / test case creation / execution / updating results
  • Test Case execution out of box reports
  • Integration with bug tracking tools
  • Hosted / In-Premise Model
  • Existing GitHub projects in .NET / Java / other languages for automation / migration support
  • Great Tech Support
Test Case Migration Efforts

Different aspects involved in test case migration efforts for any TCM tool:
  1. Test case template - Identify the required inbuilt and custom fields for the test case template. Develop, modify, and evaluate templates to arrive at the final test case template
  2. Organizing test cases (test suites) - Feature-wise and release-related test cases. Analyse, identify a structure, and evaluate it (functional, regression, feature areas) plus release-specific cases
  3. Migration efforts - Based on the template and test case structure, prepare custom XML cases for all test cases to migrate
    1. Validate, and arrive at an approach for validating, all migrated cases
    2. For attributes identified in the test case template, decide what values to fill in for existing test cases where they were not used
  4. Default values for unused fields (drop-down lists / custom values)
  5. Automation integration, defect / bug tracking tool integration
  6. Automation test cases - Identify automation test cases, templates, details
  7. QA reports - Analyse the reports available in TestRail and any custom report needs: email-based reporting on metrics, daily test case execution, etc.
  8. Custom tools - Write test cases in Excel and upload them directly from Excel to TestRail. Such a tool can be used for writing test cases and updating test results directly from Excel (see the sketch after this list)
  9. QA process document - Develop a process document (guidelines / best practices) on adding test cases; updating functional, regression, and release-related test cases; TestRail permissions; and test case reviews using TestRail
  10. Test results archival / maintenance - Approach for archiving / maintaining test results, test runs, and test cases
  11. Hosting - Pros / cons of local hosting vs cloud hosting
  12. Security / administration / configuring users - Admin-related aspects, identifying roles / permissions for users
  13. Identifying pilot projects for the TestRail evaluation period after finalizing the above areas: pilot projects for usability, tracking, and upgrading before complete migration
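
As a sketch of the Excel upload tool mentioned in item 8, the snippet below reads rows from a spreadsheet and creates cases through TestRail's REST API (API v2 uses HTTP basic auth and an add_case/:section_id endpoint). The host, credentials, section ID, custom field names, and column layout are all illustrative assumptions, not a definitive implementation:

# Hypothetical Excel-to-TestRail uploader; host, credentials,
# section_id, custom field names, and column layout are assumptions.
import requests
from openpyxl import load_workbook

TESTRAIL_URL = "https://example.testrail.io/index.php?/api/v2"
AUTH = ("user@example.com", "api_key")  # TestRail uses HTTP basic auth

def add_case(section_id, title, steps, expected):
    # POST add_case/:section_id creates a case in that section
    resp = requests.post(
        "%s/add_case/%d" % (TESTRAIL_URL, section_id),
        json={"title": title,
              "custom_steps": steps,        # field names depend on
              "custom_expected": expected}, # your template config
        auth=AUTH)
    resp.raise_for_status()
    return resp.json()

def upload_from_excel(path, section_id):
    # Assumed layout: column A = title, B = steps, C = expected result
    sheet = load_workbook(path).active
    for title, steps, expected in sheet.iter_rows(min_row=2,
                                                  values_only=True):
        add_case(section_id, title, steps, expected)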

Happy Learning!!!

October 10, 2014

HBase Overview Notes


Limitations of Hadoop 1.0
  • No random access --> Hadoop is more for batch access (OLAP)
  • Not suitable for real-time access
  • No updates - the access pattern is WORM (Write Once Read Multiple times), which Hadoop is best suited for
Why HBase
  • Flexible Schema Design --> Add a new column when a row is added
  • Multiple versions of single cell (Data)
  • Columnar storage
  • Cache columns at client side
  • Compression of columns
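
To make the flexible schema point above concrete: a new column qualifier can be written for one row with no schema change at all. A minimal sketch using the happybase Python client, assuming a local Thrift server and a pre-created 'users' table with an 'info' column family:

# Flexible schema sketch (happybase client; the host, table name,
# and column family are illustrative assumptions).
import happybase

connection = happybase.Connection("localhost")
table = connection.table("users")

# Row 1 stores two columns in the 'info' family.
table.put(b"row1", {b"info:name": b"alice", b"info:city": b"chennai"})

# Row 2 introduces a brand-new column qualifier on the fly,
# with no ALTER or schema migration needed.
table.put(b"row2", {b"info:name": b"bob", b"info:phone": b"12345"})
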
Read vs Write trade-off: for availability, compromise on writes; for consistency, compromise on reads

HBase
  • NoSQL class of non-relational storage systems
  • RDBMS uses row key based allocation; HBase uses columnar storage
  • HBase needs HDFS for replication
  • ZooKeeper - takes all requests from clients; the client communicates via ZooKeeper: Client -> ZooKeeper -> HMaster
  • Region Server - serves the regions; the region server process runs on the slaves (data nodes)
Happy Learning!!!

October 09, 2014

Pig Overview Notes

Pig
  • Primarily for semi-structured data
  • So called 'Pig' as it processes all kinds of data
  • Pig is a data flow language, not a procedural language
  • Map Reduce - for Java programmers; Hive - for TSQL folks; Pig - rapid prototyping & increased productivity
  • Pig runs on the client side; it need not be on the cluster
  • Execution sequence - Query Parser -> Semantic Checking -> Logical Optimizer (variable level) -> Logical to physical translator -> Physical to M/R translator -> MapReduce Launcher
  • Pig concepts - Map: key/value pairs, Tuple: ordered list of data, Bag: unordered collection of tuples (see the analogy after this list)
  • Pig - for client-side access and semi-structured data; Hive works only within the cluster
  • Hive - best suited for SQL-style analytics on structured data
  • MR - for audio / video analytics the MapReduce approach is the only option
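
For the Pig data model above, a loose Python analogy may help (an illustration only, not Pig syntax): a Pig map behaves like a dict, a tuple like a Python tuple, and a bag like an unordered collection of tuples:

# Loose Python analogy for Pig's data model (illustration only).
pig_map = {"name": "alice", "city": "chennai"}   # map: key/value pairs
pig_tuple = ("alice", 25, "chennai")             # tuple: ordered fields
pig_bag = {("alice", 25), ("bob", 30)}           # bag: unordered tuples
print(pig_map["city"], pig_tuple[0], len(pig_bag))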

Happy Learning!!!

October 08, 2014

Hive Overview Notes

  • Data Warehousing package built on top of hadoop
  • Managing and querying structured data
  • Apache Derby is the embedded DB used by Hive for the metastore
  • The metastore_db folder persists this metadata
  • Suitable for WORM - Write Once Read Many Times Access Pattern
  • Core Components are Shell, Metastore, Execution Engine, Compiler (Parse, Plan, Optimize), Driver
  • Tables can be created as Internal Tables, External Table (Pointing to external file)
  • When an internal table is dropped, both schema and data are dropped; for external (referencing) tables only the schema is dropped, not the data. Both internal and external tables reside in HDFS
  • Data files for created tables are available under /user/hive/warehouse
  • Bucketing in Hive - hash(column value) % number of buckets decides which bucket a particular row goes into (see the sketch after this list)
  • Bucketed tables should always be internal Hive tables
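
The bucketing rule above is easy to mimic in a few lines of Python: hash the bucketing column and take the remainder to pick a bucket. This is a plain illustration of the idea, not Hive's actual hash function:

# Illustration of Hive-style bucketing: hash(value) % num_buckets
# decides the bucket for a row (not Hive's exact hash function).
NUM_BUCKETS = 4

def bucket_for(user_id):
    return hash(user_id) % NUM_BUCKETS

for user_id in ("u1", "u2", "u3", "u4"):
    print(user_id, "-> bucket", bucket_for(user_id))
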
Happy Learning!!!

October 07, 2014

Map Reduce Internals

The client submits the job; the Job Tracker does the splitting and scheduling of the job.

Mapper
  • Mapper runs the business logic (ex- word counting)
  • Mapper (Maps your need from the record)
  • Record reader provides input to mapper in key value format
  • Mapper Side Join (Distributed Caching)
  • The output of the mapper is a list of keys and values; the output of the map function is stored in a sequence file
  • The framework does the splitting based on the input format; the default is new line (text format)
  • Every row / record goes through the map function
  • When a record (row) is split across two 64MB blocks, the pieces are merged into the complete record before processing
  • The default block size in Hadoop 2.0 is 128MB
Reducer
  • The reducer polls for map outputs; the job tracker informs it which nodes to poll
  • The default number of reducers is 1; this is configurable
  • Multiple reduce stages within one job are not possible; multi-level (chained) MR jobs are
  • Reduce-side join (join at the reducer level)
Combiner
  • Combiner - a mini reducer that runs before map output is written to disk (e.g., finding a local max from the data)
  • A combiner is used when the map side itself can do some preprocessing to minimize the reducer's workload
Partitioner
  • Hash Partitioner is default partitioner
  • Mapper -> Combiner -> Partitioner -> Reducer (for multi-dimensional outputs, e.g., 2012 - max sales by product; 2013 - max sales by location)
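
To make the mapper / combiner / reducer flow concrete, here is a minimal Hadoop Streaming style word count in Python. The mapper emits tab-separated key/value lines, the framework sorts by key, and the reducer aggregates contiguous keys; the same reducer logic can serve as the combiner, since counting is associative. A sketch, assuming the stdin/stdout contract Hadoop Streaming expects:

# Minimal Hadoop Streaming style word count (mapper and reducer in
# one file for brevity; Streaming would run them as separate steps).
import sys

def mapper():
    # Emit "word<TAB>1" for every word of every input line.
    for line in sys.stdin:
        for word in line.split():
            print("%s\t1" % word)

def reducer():
    # Input arrives sorted by key, so counts per word are contiguous.
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t")
        if word == current:
            count += int(value)
        else:
            if current is not None:
                print("%s\t%d" % (current, count))
            current, count = word, int(value)
    if current is not None:
        print("%s\t%d" % (current, count))

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
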
Happy Learning!!!

October 06, 2014

Hadoop Ecosystem Internals

Hadoop Internals - this post is a quick summary from a learning session.

Data Copy Basics (Writing data to HDFS)
  • Network proximity during data storage (the first two IPs are closest to the client)
  • Data is stored in 64MB blocks
  • The data replication factor is 3 by default
  • The client gets an error message when the primary node's data write operation fails
  • Blocks are split horizontally across different machines
  • Slaves use SSH to connect to the master (communication between nodes is also over SSH)
  • Client communication is through RPC
  • Writing happens in parallel; replication happens in a pipeline
Analysis / Reads (Reading Data from HDFS)
  • Client -> Master -> nearest IPs of nodes are returned
  • The master knows the performance utilization of the nodes; it allocates the least-used machine (where a data copy exists) for processing
Concepts
  • Namenode - Metadata
  • DataNode - Actual Data
  • chmod 755 - owner read / write / execute; group and others read and execute
  • Rack - Physical Set of Machines
  • Node - Individual machine
  • Cluster - Set of Racks
Happy Learning!!!

October 03, 2014

Hbase Primer Part III


This post is an overview of read / write operations on HBase. The steps were clear from the DB paper (Exploring NOSQL, Hadoop and Hbase by Ricardo Pettine and Karim Wadie). I'm unable to locate the link to download the paper.

I'm reposting a few steps from the paper which lay out the read / write operations on HBase. ZooKeeper is used for coordination in Storm and HBase.

Data Path
Table - HBase table
  Region - regions for the table
    Store - a store per column family for each region
      MemStore - a memstore for each store
        Store File - store files for each store
          Block - blocks within a store file

Write Path
  • Client request is sent to ZooKeeper
  • ZooKeeper finds the metadata and returns it to the client
  • The client scans the region servers to find where the new key / data needs to be stored
  • The client sends the request to the region server
  • The region server processes the request; the write operation follows WAL (Write Ahead Logging), the same concept available in other databases too
  • When the memstore is full, the data is flushed to disk

Read Path
  • The client issues a Get command
  • ZooKeeper identifies the metadata and returns it to the client
  • The client scans the region servers to locate the data
  • Both the memstore and the store files are scanned
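
A client-side view of the same two paths, sketched with the happybase Python library (the host and table name are assumptions); the WAL, memstore, and store file mechanics described above happen on the server behind these two calls:

# Client-side sketch of the write and read paths (happybase client;
# host and table name are illustrative assumptions).
import happybase

connection = happybase.Connection("localhost")
table = connection.table("orders")

# Write path: the region server appends to the WAL, then updates
# the memstore; the memstore is flushed to a store file when full.
table.put(b"order-001", {b"cf:amount": b"250", b"cf:status": b"NEW"})

# Read path: both the memstore and the store files are consulted
# to assemble the latest values for the row.
row = table.row(b"order-001")
print(row[b"cf:status"])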

Happy Learning!!!

September 28, 2014

Pycon 2014


Every time I attend a conference I plan to post my notes the same day. The longer I delay, the lower the probability of posting. After missing a few conferences, today this post is on my learnings from Python Conference 2014. There is a lot of motivation / inspiration to deliver / learn after every conference.

Interesting Quotes 

"Functionality is an asset. Code is a liability" by @sanand0
"Premature Optimization is baby evil" by @sanand0
"If it's not tested, it doesn't work. If it's tested, it may work" - @voidspace
'Libraries are good, your brain is better' - @sanand0. 

Short Notes from Sessions

Panel Discussion on Python Frameworks - Django, Flask , Web.Py 
  • The discussion was pretty interesting. For beginners: Web.Py, followed by Flask
  • Django, the elephant in the room, scored over the rest based on usage, features, documentation
Interesting talks and notes are shared in the Auth Evolution & Spark Overview posts.

Happy Learning!!!

Auth Evolution

This session, Auth as a service by Kiran, provided a good overview of the evolution of authentication over the past decade

The complete text of the presentation is available in the link. The text is pretty exhaustive; I am only writing down the key points for my reference
  • HTTP basic Auth
  • Cookies
  • Cryptography Signed Tokens
  • HTTPS
  • Database backed sessions 
HTTP basic auth - The username and password are sent in the HTTP request. The browser preserves the credentials, so to log out you need to send a wrong password; the server rejects requests after that

Cookies - A regular HTML form submits the username and password; an encoded value is put in an HTTP cookie, which is sent with every request

Cryptographically signed tokens - a random key + the user name; the cookie is checked against the key to verify it is the same user. SSL on top of this made sure most of the issues were fixed
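
A minimal sketch of the signed-token idea using only Python's standard library: sign the username with a server-side secret and verify the signature on each request. Illustrative only; real implementations add expiry and rely on vetted libraries:

# Minimal sketch of a cryptographically signed token (illustrative;
# real systems add expiry and use vetted libraries).
import hashlib
import hmac

SECRET_KEY = b"server-side-secret"  # assumption: kept private on the server

def make_token(username):
    sig = hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()
    return "%s:%s" % (username, sig)

def verify_token(token):
    username, sig = token.rsplit(":", 1)
    expected = hmac.new(SECRET_KEY, username.encode(),
                        hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing attacks.
    return hmac.compare_digest(sig, expected)

print(verify_token(make_token("alice")))   # True
print(verify_token("bob:" + "0" * 64))     # False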

Database backed sessions - This is a very nice one. These days I get notifications in Quora / Google: you have this many open sessions / previously logged-in locations. This is all through database backed sessions. It seems to address all the limitations of the previous approaches.

Good Refresher!!!

Happy Learning!!!

Spark Overview


I remember the Spark keyword coming up during Big Data architecture discussions in my team, but I had never looked further into Spark. The session by Jyotiska NK on Python + Spark: Lightning Fast Cluster Computing was a useful starter on Spark (slides of the talk).

Spark 
  • In memory cluster computing framework for large scale data processing
  • Developed in Scala, with Java + Python APIs
  • It is not meant to replace Hadoop; it can sit on top of Hadoop
  • References on Spark Summit for slides / videos from past events - link 
Python Offerings
  • PySpark, Data pipeline using spark
  • Spark for real time / batch processing
Spark Vs Map Reduce Differences
This section was the highlight of the session. The way data is handled between the MapReduce execution model and the Spark approach is key.

MapReduce approach - Load data from disk into RAM; mapper, shuffler, and reducer are the distinct stages. Processing is distributed. Fault tolerance is achieved by replicating data.

Spark - Load data into RAM and keep it there until you are done; data is cached in RAM from disk for iterative processing. If the data is too large, the rest is spilled to disk. This enables interactive processing of datasets without reloading data from disk each time, via RDDs (Resilient Distributed Datasets).

RDD - a read-only collection of objects partitioned across machines. Lost data can still be recomputed.

RDD Operations
  • Transformations - Map, Filter, Sort, flatMap
  • Actions - Reduce, Count, Collect, saving data to local disk. Actions usually involve disk operations
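
The transformation / action split is easy to see in the classic PySpark word count: everything up to reduceByKey only builds a lazy lineage, and the job runs only when the action at the end fires. The input path and app name below are placeholders:

# Classic PySpark word count; transformations are lazy, the final
# action triggers execution. Input path and app name are placeholders.
from pyspark import SparkContext

sc = SparkContext("local", "wordcount-sketch")

counts = (sc.textFile("input.txt")               # RDD from a text file
            .flatMap(lambda line: line.split())  # transformation
            .map(lambda word: (word, 1))         # transformation
            .reduceByKey(lambda a, b: a + b))    # transformation

print(counts.take(10))                           # action: runs the job
sc.stop()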

More Reads
Testing Spark Best Practices
Gatling - Open Source Perf Test Framework
Spark Paper

Happy Learning!!!