"No one is harder on a talented person than the person themselves" - Linda Wilkinson

April 13, 2014

Free Tool - Managing System Environment Variables using Rapid Environment Editor

When managing servers and different environments, we keep updating environment variables and custom paths. The Rapid Environment Editor tool is very useful for this job: it highlights invalid paths so you can fix them quickly.

While setting up Eclipse, environment issues popped up, and this editor was useful for fixing them.
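The invalid-path check itself is simple to reproduce. Below is a minimal C++17 sketch, assuming a `;`-separated PATH value as on Windows; it is not how Rapid Environment Editor works internally, and the function names `splitPath` and `findInvalidEntries` are hypothetical.

```cpp
#include <filesystem>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

// Split a PATH-style value on the separator (';' on Windows, ':' on Linux/macOS).
// Empty entries such as the gap in "C:\a;;C:\b" are skipped.
std::vector<std::string> splitPath(const std::string& pathValue, char separator = ';')
{
    std::vector<std::string> entries;
    std::stringstream stream(pathValue);
    std::string entry;
    while (std::getline(stream, entry, separator))
    {
        if (!entry.empty())
        {
            entries.push_back(entry);
        }
    }
    return entries;
}

// Return the entries that do not point to an existing directory,
// i.e. the kind of "invalid path" an environment editor highlights.
std::vector<std::string> findInvalidEntries(const std::string& pathValue, char separator = ';')
{
    std::vector<std::string> invalid;
    for (const std::string& entry : splitPath(pathValue, separator))
    {
        std::error_code ec;  // non-throwing overload: unreadable entries count as invalid
        if (!std::filesystem::is_directory(entry, ec))
        {
            invalid.push_back(entry);
        }
    }
    return invalid;
}
```

To check the current process, pass the result of `std::getenv("PATH")` after checking it for `nullptr`. The sketch does not expand `%VAR%` references or strip quotes, so raw registry values containing them would be reported as invalid.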

Happy Learning!!!

April 10, 2014

Interview Question - Maximum Palindrome Substring in a Given String

Some candidates come up with working logic and impressive ideas. For the question 'Maximum Palindrome Substring in a given String', I got a good answer during one of my interview discussions.

I tried it out on codepad (link). This is C++ code. The solution code is:

#include <cstdio>
#include <cstring>

int main()
{
    char inputname[50] = "aaabbaa";
    char temp[50];
    char revtemp[50];
    int length = 0;
    int currentpalindromelength = 0;
    int maxpalindromelength = 0;
    int palstart = 0, palend = 0;

    // Count the characters in the input (same as strlen)
    for (int i = 0; inputname[i] != '\0'; i++)
    {
        length++;
    }
    printf("length is %d\n", length);

    // Check every substring inputname[ipos .. j-1]
    for (int ipos = 0; ipos < length; ipos++)
    {
        // j <= length so that substrings ending at the last character are included
        for (int j = ipos + 1; j <= length; j++)
        {
            // Copy the substring into temp
            int startpos = 0;
            for (int k = ipos; k < j; k++)
            {
                temp[startpos++] = inputname[k];
            }
            temp[startpos] = '\0';
            printf("string to compare %s\n", temp);

            // Copy the same substring in reverse; stop at ipos (not 0) so both strings have the same length
            startpos = 0;
            for (int z = j - 1; z >= ipos; z--)
            {
                revtemp[startpos++] = inputname[z];
            }
            revtemp[startpos] = '\0';
            printf("reverse string to compare %s\n", revtemp);

            // The substring is a palindrome if it equals its reverse
            if (strcmp(temp, revtemp) == 0)
            {
                printf("Both strings are equal\n");
                currentpalindromelength = j - ipos;
                if (currentpalindromelength > maxpalindromelength)
                {
                    maxpalindromelength = currentpalindromelength;
                    palstart = ipos;
                    palend = j;  // end position is exclusive
                }
            }
            else
            {
                printf("Both strings are not equal\n");
            }
        }
    }
    printf("max palindrome length is %d\n", maxpalindromelength);
    printf("start pos is %d\n", palstart);
    printf("end pos is %d\n", palend);
    for (int k = palstart; k < palend; k++)
    {
        printf("%c", inputname[k]);
    }
    printf("\n");
    return 0;
}


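This brute-force approach compares every substring with its reverse, so it needs O(n³) time, which is fine for short inputs like this one. It worked for multiple cases. codepad is very good!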
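For comparison, a common alternative is the expand-around-center technique: every palindrome mirrors around its middle, so growing outward from each of the 2n-1 possible centers finds the longest one in O(n²) time without building reversed copies. This is a minimal sketch of that technique, not the candidate's solution; `longestPalindrome` and `expandAroundCenter` are hypothetical names.

```cpp
#include <algorithm>
#include <string>

// Grow outward from the center (left, right) while the characters match.
// Returns the length of the palindrome found around that center.
static int expandAroundCenter(const std::string& s, int left, int right)
{
    while (left >= 0 && right < static_cast<int>(s.size()) && s[left] == s[right])
    {
        --left;
        ++right;
    }
    return right - left - 1;  // characters strictly between the final left and right
}

// Return the longest palindromic substring of s (the first one found on ties).
std::string longestPalindrome(const std::string& s)
{
    int bestStart = 0;
    int bestLength = s.empty() ? 0 : 1;
    for (int center = 0; center < static_cast<int>(s.size()); ++center)
    {
        int oddLength = expandAroundCenter(s, center, center);       // e.g. "aba"
        int evenLength = expandAroundCenter(s, center, center + 1);  // e.g. "abba"
        int length = std::max(oddLength, evenLength);
        if (length > bestLength)
        {
            bestLength = length;
            bestStart = center - (length - 1) / 2;  // works for odd and even lengths
        }
    }
    return s.substr(bestStart, bestLength);
}
```

With the same input as above, `longestPalindrome("aaabbaa")` returns "aabbaa": the even-length center between the two 'b's expands all the way to the last character.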

Happy Coding!!!

April 08, 2014

Frequently Used Scripts & Notes


TSQL Reusable scripts - bookmarking them here

Tip #1. What is the command to truncate a SQL Server log file?

Tip #2. Settings to Capture Deadlock Trace in SQL Server Logs
DBCC TRACEON (1204, -1)
DBCC TRACEON (1222, -1)

Tip #3. Checking SQL Server Version

select SERVERPROPERTY('productversion'),SERVERPROPERTY('productlevel'),SERVERPROPERTY('edition')

Tip #4. Creating a custom Firefox profile template

Step 1. From Windows -> Run, enter the command below
firefox.exe -ProfileManager -no-remote

Step 2. Create a custom profile

Step 3. Command to run Selenium using the template
Example: java -jar selenium-server.jar -firefoxProfileTemplate "<Selenium Profile Directory>"

Usage: java -jar selenium-server.jar -firefoxProfileTemplate C:\Users\Administrator\AppData\Roaming\Mozilla\Firefox\Profiles\sd88nd1n.FRProfile

Step 4. Modified command with the server logs captured to a file

Usage: java -jar C:\SeleniumServer\selenium-server-standalone-2.25.0.jar -port 5555 -firefoxProfileTemplate "C:\Users\Administrator\AppData\Roaming\Mozilla\Firefox\Profiles\ku6gbc8j.FRProfile" > C:\ReportLogs\SeleniumServerStatus.txt 2>&1

References Link1


Happy Learning!!!

February 17, 2014

mTail - Windows Log Monitoring Tool

This post is about mTail, a log monitoring tool for Windows. I have used it and it is very good. The benefits are:

  • Monitor multiple servers concurrently by running multiple instances of the tool
  • Simple and easy to use
  • Start / Stop option to pause monitoring once you have spotted an error
  • Along with Notepad++ and TextPad, it is a good addition to the debugging toolkit
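A tail-style monitor essentially polls the log file and shows only what was appended since the last check. Here is a minimal C++17 sketch of that polling step; `readNewLines` is a hypothetical name, the caller is assumed to keep the offset between polls, and log rotation is handled only in the simplest way (start over when the file shrinks). This is not mTail's actual implementation.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Read the complete lines appended to a log file since byte `offset`.
// `offset` is advanced past the last complete line, so a partially written
// last line is left for the next poll. If the file shrank (e.g. it was
// rotated or truncated), reading restarts from the beginning.
std::vector<std::string> readNewLines(const std::string& path, std::streamoff& offset)
{
    std::vector<std::string> lines;
    std::ifstream file(path, std::ios::binary);
    if (!file)
    {
        return lines;  // file missing or locked: nothing new this time
    }
    file.seekg(0, std::ios::end);
    std::streamoff size = file.tellg();
    if (size < offset)
    {
        offset = 0;
    }
    file.seekg(offset);
    std::string line;
    while (std::getline(file, line))
    {
        if (file.eof())
        {
            break;  // no trailing newline yet: the writer has not finished this line
        }
        if (!line.empty() && line.back() == '\r')
        {
            line.pop_back();  // strip Windows CRLF endings
        }
        lines.push_back(line);
        offset = file.tellg();
    }
    return lines;
}
```

A monitor would call this in a loop with a short `std::this_thread::sleep_for` between polls and print whatever comes back; "Stop" simply means not polling.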
Happy Learning!!

February 09, 2014

Interesting Bug

Sharing one unique and interesting bug identified during testing

Case #1 - A mismatch in Property_Id values was the primary issue. Two tables were defined in two databases. In Database A, Property_Id is based on the master table defined there (MasterA); in Database B it should be based on MasterB. The issue was that the MasterA mapping was used in Database B instead of the MasterB mapping. It was difficult to identify because the Property_Id values always overlapped for a smaller set of input data.

Database A
MasterA
Property_Id   Value
1             A
2             B
3             C
.......
10            D

TableA
Property_Id   Local_Id (Identity Column)   Value
1             1
2             2
3             3
1             4
2             5
2             6


Database B
MasterB
Property_Id   Value
10            A
20            B
30            C

TableB (Expected)
Property_Id   Local_Id (Identity Column)   Value
10            1
20            2
30            3
10            4
20            5
20            6

TableB (Actual)
Property_Id   Local_Id (Identity Column)   Value
1             1
2             2
3             3
10            4
2             5
2             6

The bug was hard to spot because the Property_Id values in MasterA and MasterB overlapped, so for a smaller dataset the values matched in most cases.
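The fix amounts to translating ids through the Value column instead of reusing the MasterA ids. Below is a minimal C++17 sketch, assuming (as the example tables suggest) that Value is the common key between MasterA and MasterB; `MasterTable` and `translatePropertyIds` are hypothetical names. Failing loudly on an unknown id is a deliberate choice: silently copying a source id across is exactly what let the overlapping ids hide the bug.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Master table: Property_Id -> Value (e.g. MasterA: 1 -> "A", MasterB: 10 -> "A").
using MasterTable = std::map<int, std::string>;

// Translate Property_Id values from the source master's ids to the target
// master's ids by going through the shared Value column.
std::vector<int> translatePropertyIds(const std::vector<int>& sourceIds,
                                      const MasterTable& sourceMaster,
                                      const MasterTable& targetMaster)
{
    // Reverse lookup for the target master: Value -> Property_Id.
    std::map<std::string, int> targetIdByValue;
    for (const auto& [id, value] : targetMaster)
    {
        targetIdByValue[value] = id;
    }

    std::vector<int> translated;
    for (int sourceId : sourceIds)
    {
        // at() throws std::out_of_range for an id missing from the source master
        const std::string& value = sourceMaster.at(sourceId);
        auto match = targetIdByValue.find(value);
        if (match == targetIdByValue.end())
        {
            throw std::runtime_error("no target Property_Id for value " + value);
        }
        translated.push_back(match->second);
    }
    return translated;
}
```

Feeding it the TableA ids 1, 2, 3, 1, 2, 2 with MasterA and MasterB from above gives the expected TableB ids 10, 20, 30, 10, 20, 20.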


Happy Learning!!!

January 05, 2014

Weekend Learning - Good Session - Taming Big Data with Berkeley Data Analytics Stack

Good Session - Taming Big Data with Berkeley Data Analytics Stack 



Notes captured from the session

Big Data Use Cases (Making personalized decisions for each customer, Analyse data trends)

Data Processing Goals
  • Earlier Trend - Analyse historical data
  • Current Trend - Real time data processing
  • Goal - Sophisticated data processing (Trend analysis, Anomaly detection)
Open Analytics Stack
  • Apps - Data Analysis, Mining, Decision Driven Apps
  • Data Processing - HBase, Hive, Hadoop
  • Storage - HDFS
  • Infrastructure - Cluster
Goals of Open Analytics Stack
  • Support batch, interactive and stream processing
Implementation Notes
  • Store data in memory (SSDs, 512GB of RAM)
  • Facebook / Yahoo / Bing - some very large jobs, but the vast majority are pretty small
  • The aggregated inputs of the other (smaller) jobs fit in the memory of the cluster
  • Job parallelism, failure recovery and job scheduling are handled
  • Trade-off between accuracy and response time
  • Single execution framework for batch, streaming and interactive computations
New layers added are shown in parentheses
  • Application
  • Data Processing (In Memory Processing)
  • Storage (Data Management Layer), (Resource Management)
  • Infrastructure
  • One cluster for both MPI and Hadoop
  • Spark (Batch & Interactive Apps Support)
  • Spark and Shark are available in Amazon Elastic MapReduce
  • Tachyon - Storage abstraction






Download the components from link
AMP Lab Blog link

Good Session, Happy Learning!!!

January 04, 2014

December 15, 2013

Weekend Reading Notes

Note #1 - SQL Server Database Engine Performance Tuning Basics - Must read for every TSQL Developer
Key Learnings
  • Perf counter values analysis
  • TempDB configuration
  • Enable Lock Pages in Memory
  • Interpreting Avg. Disk sec/Read values

Key Query - SELECT SERVERPROPERTY('productversion'), SERVERPROPERTY('productlevel'), SERVERPROPERTY('edition')

Note #3 - One more NoSQL DB that supports ACID transactions - FoundationDB

Key Learnings
  • Memory-optimized tables (MOT) reside in memory rather than on disk
  • Steps to estimate memory for MOT
  • Garbage collection of older version of records (similar to snapshot / read committed isolation levels)

Happy Learning!!!

December 12, 2013

DataScience Basics - Part I

Welcome to a series of posts on learning Data Science basics. This video is good material to get started with.



Link

Key Learnings
  • The term Data Science as defined by Peter Naur
  • How the role of a Statistician differs from that of a Data Scientist
  • BI Tools vs Data Science perspectives
More Reads
Happy Reading!!!

December 11, 2013

Getting Started with Python Visualization

This post is about getting started with visualization in Python. It took less than an hour to get to a first visual representation of data.

Below are the steps for our first example

Step 1 - Install the Enthought Python Distribution (EPD) from link. The download was around 230MB and took a couple of minutes

Step 2 - This post was pretty useful for installation on Windows

Step 3 - After installation, the setting below was done

Step 4 - The first example is from the Python book, page 37

Step 5 - Create a new file, type the code, Save and Run it

Step 6 - Below is the output 

December 05, 2013

Create Dummy Files to consume disk space - fsutil command

This post is about creating dummy files that consume disk space, to mimic scenarios with low disk space. On Windows Server 2008, the fsutil command was useful for creating such files.

Example Usage (the size argument is in bytes, so 1240171806 is roughly 1.24 GB)

C:\Users\Administrator>fsutil file createnew c:\dummydata5.bat 1240171806

File c:\dummydata5.bat is created

Happy Learning!!!


November 09, 2013

Weekend Reading Notes

Note #1 - Session - Dirty Truth about Data Literacy

Interesting notes captured during the session

Data Literacy - In simple terms
  • Reading the Data (Understand Values)
  • Reading between Data (Comparisons)
  • Reading beyond Data (Predictions / Inferences)
Improving User Experience by 
  • More Confident Chart Readers
  • Interactivity through rollovers, tooltips, highlighting selected numbers and mirroring what users are doing
Note #2 - Deploying Hadoop ETL in the Hortonworks Sandbox

This session is about implementing ETL with the Syncsort DMX-h tool. Jobs are defined through its GUI, similar to how you would develop with SSIS. Microsoft may also come up with similar SSIS capabilities to run on Hadoop and load data.

Syncsort ETL offerings link

Exercise #3 - Tutorial 14: How To Analyze Machine and Sensor Data

The steps were pretty easy; I was able to execute them up to step 5. Since I do not have Office 2013, the GUI part is pending. Pretty simple and fast.

Interesting Reads
Hardware Considerations for In-Memory OLTP in SQL Server 2014
In-Memory OLTP Q & A: Myths and Realities
In-Memory OLTP: High Availability for Databases with Memory-Optimized Tables

Happy Learning!!!

September 29, 2013

Exploring Hortonworks Sandbox - Part I (on Windows 7)

Setup Steps
  1. Downloaded VirtualBox from Link
  2. Hortonworks Windows tutorial Link
  3. Downloaded the 1.8GB Hortonworks Sandbox from Link
  4. After configuring it, ran through the first tutorial - Link
  5. Started the server and opened http://192.168.56.101:8000/ in a browser on the Win7 machine
  6. The sandbox was set up and configured to use a 192.168.x.x IP address. I was able to use the browser on Win7 to upload files and run queries
  7. Credentials to log on to the server: Login: root, Password: hadoop
  8. The command to shut down is poweroff

First Example Notes
  • Downloaded example data from Link
  • Upload worked in Google Chrome but not in IE
  • poweroff is the command to power off the sandbox machine
  • Uploading data, running basic Select queries worked fine
More Info Tutorials
My Feedback
  • Impressively easy to set up and use
  • Got Started in < 2 hrs
  • Good Learning Start
More Reads

WTF does a Data Scientist do all day?

How do I become a data scientist?

What are some software and skills that every Data Scientist should know?

Read Quote of Joe Blitzstein's answer to Data Science: What is it like to design a data science class? on Quora

Read Quote of Nishant Neeraj's answer to Big Data: What should be ideal size, skill set and composition of team for a successful Big Data implementation in an organization? on Quora

Read Quote of Sean Owen's answer to Job Interviews: How can a computer science graduate student prepare himself for data scientist/machine learning intern interviews? on Quora

Read Quote of Pronojit Saha's answer to Data Science: What are some software and skills that every Data Scientist should know? on Quora

Read Quote of Ye Zhao's answer to How do I become a data scientist? on Quora

How does one begin to learn data science?

Harvard Data Science Course  

Software engineer's guide to getting started with data science


Happy Learning!!!

September 20, 2013

Advanced Cloud Computing 2013 Notes


Advanced Cloud Computing 2013 Notes. Yesterday I attended ACC2013, held at NIMHANS. Every conference provides a lot of inspiration and motivation to try out new things.

Session #1 - Inauguration Talk

The inauguration was done by Padma Bhushan Rajaram. He explained cloud computing in very simple terms: cloud computing is, simply put, utility computing. The term was coined by a management professor named Chellappa.
He had earlier written a paper in 2005 on challenges in utility computing. He recalled his experiences with new jargon and technical abbreviations, and gave several expansions of the acronym BYOD: Bring Your Own Drinks, Bring Your Own Dope, and, more recently, Bring Your Own Device. Storage and processing costs have come down, which created a business opportunity for Amazon and Google to offer their excess storage and processing infrastructure as a service.

He stressed several areas where the cloud needs standardization to realize its full potential, for example: SLAs that guarantee the required performance while hosting / sharing the infrastructure, developing a universally usable cloud, and interoperability between cloud providers.

Session #2 - Talk by Karnataka's IT Secretary VidyaShankar, IAS

He talked about developing trends: virtualization, cloud and 3D technologies. He mentioned a couple of products, CloudMagic and Cubby.

Session #3 - Connected Systems, Cloud beginner tech talk by Vikas Agarwal (Tally)

This talk focused on the evolution of cloud computing, tracing it from the very beginning (mainframes and the PC era) to cloud computing.
  •  Stage 1 - Mainframe Systems
  •  Stage 2 - PC Era (Moving data to personal systems)
  •  Stage 3 - LAN (Locally connected systems) - Intranets
  •  Stage 4 - Connected Era, WANs
  •  Stage 5 - Evolution of Internet (Globally Connected)
  •  Stage 6 - Cloud (Shared computing, storage) - Access anytime / anywhere
Challenges / Features in Cloud
  •  Pooling Optimization
  •  Elasticity
  •  Efficiency
Session #4 - Big Data in Safety & Security Domain, Tech talk by Bob Brewin (Tyco)

This talk focused on the basics of cloud computing and its challenges and applications in the Fire and Security domain. Key notes covered:
  • Fallacies in Distributed computing
  • Current Challenges in Fire & Security Domain are
  • Identifying False Positives
  • Predictive Analytics to identify and isolate false positives
  • Real time monitoring
Session #5 - Cloud Services in Yahoo by Jothi Padmanabhan

Yahoo has its own private cloud; the speaker provided details on Yahoo's infrastructure and software stack.
 Challenges
  • Scaling systems as per growing data
  • Data Partitioning
  • Data Consistency
  • Hardware provisioning
Benefits of Private Cloud
  • Developers can focus on Application logic instead of designing for crash / recovery scenarios
  • Focus on appealing content for users (UX Exp)
Requirements for Cloud
Multitenancy
  • Several applications will share the same hardware and software
  • Resources can be shared but there should not be performance conflict between resources
  • Multiple Apps will be running in parallel
  • Spike in resource consumption of one app should not affect other application's performance
  • SLAs defined for performance need to be met for all hosted apps
Elasticity
  • Applications will have projected capacity vs actual capacity
  • Capacity is planned from a ballpark figure, but the actual load is measured once the product is implemented
  • Scale as you need
Scalable
  • Process several requests, Store Huge data, Analytics on top of data are offerings
Other key aspects include Availability, Security, Metering, Global APIs, Load Balancing, Simple APIs

A more detailed architecture is explained in the paper link
  • Overview of Open Stack
  • Apache Traffic Server used as caching proxy server
  • Proxy (Route Traffic through intermediate steps)
  • Reverse proxy vs Forward Proxy (Several Variations)
  • Yahoo has 25K Clusters and 40K Servers
  • Mobstor (Storage for large unstructured files)
  • Sherpa (NOSQL solution from Yahoo)

Happy Learning!!!

September 01, 2013

pyCon India 2013 - Day 2 Session Notes

Please find second day session notes
 
Session #1 – Raspberry Pi basics by Sudar Muthu

A good basic session. The speaker presented the content and demo very well. Notes from the session:
  • The simplest hello-world program on a Raspberry Pi is an LED blink program
  • The speaker also spoke about controlling devices
  • Devices can be controlled using PWM (Pulse Width Modulation)
  • PWM.py – pull up (Higher Voltage), pull down (Lower Voltage)
  • Protocols – I2C, SPI, Serial. These protocols can be used to talk to devices
  • Interacting with web cam using PyGame
More Reads -  Distributed Computing Tutorials, Author website – HardwareforFun
 
Session #2 – Robotics Demo
  •  ROS (Robotics Operating System)
  • The author used a Raspberry Pi and an Arduino node
Tools
  • Speech Synthesis – Festival and pyFestival
  • Speech Recognition – Gstreamer, Pocket Sphinx
  • Artificial Intelligence – AIML, pyAIML (Artificial Intelligence Markup Language)
  • GUI – QT and pyQT
Author site – Technolabz, Lentin Joseph
 
Session #3 – Arduino and Internet of Things
Arduino – Open Source Electronics Prototyping platform

Advantages of Arduino
  • Easy to use
  • Cheap
  • Open Hardware
  • Open Documents
Components
  • Hw: device : electronic prototyping board
  • Sw:bootloader
  • Sw:libraries
  • Sw: IDE
  • Interfacing – Connected to computer using Bluetooth / USB
Tools
Use Cases
  • Talk over Serial, RF, Ethernet
  • Attach Sensor and relay other readings
  • Attach Actuators and make things move
  • Connecting devices through web
  • Security Sensor, email on touch
  • Author - Avik Dhupar
  • http://www.arduino.cc/
Session #4 - Testing Tools (Open Source)
  • Fabric – a deployment tool that can also be used for distributed testing
  • STAF – IBM's Software Testing Automation Framework
  • Nitrate – Test case management tool
  • TestLink – Test case management tool
  • Beaker Project – Managing automated tests
 
Session #5 – Web Scraping
  • The author used Scrapy (http://scrapy.org/)
  • Pablo Hoffman is the Scrapy developer
  • The author, Anuvrat Parashar, provided examples of crawling the web, collecting data and extracting information from the collected data

Happy Learning!!!

August 31, 2013

Day1 - pyCon2013 Notes


Hello World!!
 
After a long time, I am updating my notes and learnings in this post.

Some good one-liners from PyCon 2013
  • WLT – We Love Typing
  • DRY – Don't Repeat Yourself
Today I attended PyCon 2013. This was my first Python-focused learning event, and the sessions were pretty good. Summarizing my notes from today's sessions:
 
Session #1 - The first session was on log management at InMobi
  • They have built a framework to store and manage all kinds of logs (web server logs, DB logs, application logs, crontab logs)
  • Three steps were followed in collecting logs
  • Collection of Logs – Apply Patterns – Fetch Logs
  • Tools used include logstash, grok (ships with parse patterns) - http://logstash.net/ 
The author also listed alternative tools for implementation
  •  Transport – flume, rsyslog, conduit, scribe
  •  Search and Analytics – hadoop, graylog2, elsa
  •  Storage – HDFS, Cassandra, Elastic Search
  •  Apache Falcon
Jordan Sissel is the brain behind logstash. His video session is also available in link
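The "apply patterns" step is what turns raw log text into structured fields. Grok does this with named patterns such as `%{IP:client}`; the sketch below shows the same idea with a plain `std::regex` in C++17. The log layout and the name `parseLogLine` are assumptions for illustration, not logstash's API.

```cpp
#include <map>
#include <optional>
#include <regex>
#include <string>

// Parse one access-log style line into named fields. Assumed layout:
//   <client-ip> [<timestamp>] "<method> <path>" <status>
std::optional<std::map<std::string, std::string>> parseLogLine(const std::string& line)
{
    static const std::regex pattern(
        R"re(^(\S+) \[([^\]]+)\] "(\w+) (\S+)" (\d{3})$)re");
    std::smatch match;
    if (!std::regex_match(line, match, pattern))
    {
        return std::nullopt;  // line does not fit the pattern (grok would tag a parse failure)
    }
    return std::map<std::string, std::string>{
        {"client", match[1].str()},
        {"timestamp", match[2].str()},
        {"method", match[3].str()},
        {"path", match[4].str()},
        {"status", match[5].str()},
    };
}
```

Once lines are structured like this, the search and analytics tools listed above can filter on fields (for example, every line whose status is 500) instead of grepping raw text.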
 
Session #2 – Django Framework beyond Basics
 
It was a well-presented session. Although this was my first session on Django, I could get a feel for the framework. Presenter: Arun (http://Arunrocks.com)

Django is a web framework built on the principles of Rapid Application Development. How a typical web request is handled in the Django framework
 
 
  • Models – Matter
  • Views – Thinkers
  • Templates – Should be dumb
Good Things about Django Framework
  • Good Admin Interface
  • Security
  • Great Documentation
  • Stable
  • Batteries Included
Basic definitions of QuerySet (object that interfaces with the DB), media separation, etc. Django implements class-based views; earlier it used function-based generic views.
 
Tools Discussed
  • IDE – PyCharm, PyDev, Emacs, Vi, Sublime Text
  • Deployment Tools – Chef, Puppet, fabric
  • Security Checks – ponycheckup.com for Django based sites
  • ORM – SQLAlchemy
Session #3 – Third Session was Rapid development & integration of Real Time Communication in Websites
 
The session was pretty interactive, demonstrating real-time video chat using Google's WebRTC. https://github.com/cjgiridhar
 
Demonstrated the setup
  •  End User Requests sent to Tornado Web Server
  •  Using Web Socket / Ajax communication happens (video chat / live chat)
  •  Chat messages saved in Redis database
Tools
  • Webserver – flask, bottle, tornado
  • Javascript and webRTC
Chetan’s website - link
More Links - pythontutor

Happy Learning!!!

July 16, 2013

HSQLDB - Getting Started

This post is on learning HSQLDB. HSQLDB is an RDBMS written in Java.
  1. Download HSQLDB from link
  2. A useful getting started guide is available at link
  3. Java is already installed on my laptop. Steps for setting JAVA_HOME, PATH and CLASSPATH are provided in link
Let's get started and try out some basic examples
Step 1 - To Start and create a DB
  
From command line
 
 Command text -  C:\Program Files\Java\jre7\bin>java.exe -cp "E:\HSQLDB\hsqldb-2.3.0\hsqldb-2.3.0\hsqldb\lib\hsqldb.jar" org.hsqldb.server.Server --database.0 file:mydb --dbname.0 xdb
Step 2 - DB files would be created as below
 
 
Step 3 - Opening the DB Manager
 
 
Command text - C:\Program Files\Java\jre7\bin>java.exe -cp "E:\HSQLDB\hsqldb-2.3.0\hsqldb-2.3.0\hsqldb\lib\hsqldb.jar" org.hsqldb.util.DatabaseManagerSwing
 
Command text - Connecting to DB Instance
jdbc:hsqldb:hsql://localhost/xdb

Step 4 - Basics on Table Creation

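The three types of persistent tables are MEMORY tables, CACHED tables and TEXT tables
  • Memory Tables - All data is held in memory; changes are persisted to the .script and .log files (MEMORY is the default for CREATE TABLE)
  • Cached Tables - Only part of the data and indexes is cached in memory; the rest stays in the .data file on disk, which allows large tables
  • Text Tables - Data source is a CSV (or other delimited) text file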
Step 5 - Table Creation
 
CREATE TABLE PUBLIC.TEST_TABLE
 (COL1 INTEGER NOT NULL,
 COL2 VARCHAR(25) NOT NULL,
 PRIMARY KEY (COL1))
 
Step 6 - Load Data and Select Query
 
INSERT INTO "PUBLIC"."TEST_TABLE"( "COL1", "COL2" ) VALUES ( 10, 'Test')
INSERT INTO "PUBLIC"."TEST_TABLE"( "COL1", "COL2" ) VALUES ( 20, 'Ram')
INSERT INTO "PUBLIC"."TEST_TABLE"( "COL1", "COL2" ) VALUES ( 30, 'Raj')
INSERT INTO "PUBLIC"."TEST_TABLE"( "COL1", "COL2" ) VALUES ( 40, 'Ravi')
INSERT INTO "PUBLIC"."TEST_TABLE"( "COL1", "COL2" ) VALUES ( 50, 'Raja')
 
SELECT * FROM "PUBLIC"."TEST_TABLE"
 
Step 7 - MVCC Basics
 
Readers see only committed data, driven by the isolation level settings. More details in link

Supported Isolation levels
  • SET TRANSACTION READ ONLY
  • SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
  • SET TRANSACTION READ WRITE, ISOLATION LEVEL READ COMMITTED 

Step 8 - MVCC Example

Start Two Instances of Data Manager

Window1 - Run below query

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET DATABASE TRANSACTION CONTROL MVCC;
SET AUTOCOMMIT FALSE;
-- Rows 10 to 50 must not exist yet (start from an empty TEST_TABLE), otherwise the PRIMARY KEY rejects them
INSERT INTO "PUBLIC"."TEST_TABLE"( "COL1", "COL2" ) VALUES ( 10, 'Test');
INSERT INTO "PUBLIC"."TEST_TABLE"( "COL1", "COL2" ) VALUES ( 20, 'Ram');
INSERT INTO "PUBLIC"."TEST_TABLE"( "COL1", "COL2" ) VALUES ( 30, 'Raj');
INSERT INTO "PUBLIC"."TEST_TABLE"( "COL1", "COL2" ) VALUES ( 40, 'Ravi');
INSERT INTO "PUBLIC"."TEST_TABLE"( "COL1", "COL2" ) VALUES ( 50, 'Raja');
COMMIT;
-- These two rows are left uncommitted
INSERT INTO "PUBLIC"."TEST_TABLE"( "COL1", "COL2" ) VALUES ( 70, '7Ravi');
INSERT INTO "PUBLIC"."TEST_TABLE"( "COL1", "COL2" ) VALUES ( 80, '8Raja');
 
Window2 - Run below query

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT * FROM PUBLIC.TEST_TABLE;

The result in Window2 will not include rows 70 and 80, which are still uncommitted.
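The visibility rule behind this result can be modelled with a toy table in C++. This is a hypothetical illustration of the rule only, not HSQLDB's implementation: `ToyMvccTable` and `RowVersion` are made-up names, and the model ignores updates, deletes, rollback and the fact that a session does see its own uncommitted rows.

```cpp
#include <set>
#include <string>
#include <vector>

// Every inserted row remembers which transaction wrote it.
struct RowVersion
{
    int key;
    std::string value;
    int writerTxn;  // id of the transaction that inserted this row
};

// Toy model of READ COMMITTED visibility under MVCC: a reader in another
// session only sees rows whose writing transaction has committed.
class ToyMvccTable
{
public:
    void insert(int txn, int key, const std::string& value)
    {
        rows_.push_back({key, value, txn});
    }

    void commit(int txn)
    {
        committedTxns_.insert(txn);
    }

    // The reader is never blocked by uncommitted inserts; it simply skips them.
    std::vector<int> visibleKeys() const
    {
        std::vector<int> keys;
        for (const RowVersion& row : rows_)
        {
            if (committedTxns_.count(row.writerTxn) > 0)
            {
                keys.push_back(row.key);
            }
        }
        return keys;
    }

private:
    std::vector<RowVersion> rows_;
    std::set<int> committedTxns_;
};
```

Replaying Window1 as transaction 1 (rows 10 to 50, committed) and transaction 2 (rows 70 and 80, not committed), `visibleKeys()` returns only 10, 20, 30, 40 and 50, matching what Window2 shows; after `commit(2)` all seven rows become visible.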


Step 9 - Explore the system-created files, mydb.script and mydb.log, using Notepad. You will see details on DB settings and properties

References

Happy Learning!!!