"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 15, 2013

Weekend Reading Notes

Note #1 - SQL Server Database Engine Performance Tuning Basics - Must read for every TSQL Developer
Key Learnings
  • Perf counter values analysis
  • TempDB configuration
  • Enabling Lock Pages in Memory
  • Interpreting Avg. Disk Sec/Read values

Key Query  - SELECT SERVERPROPERTY('productversion'), SERVERPROPERTY ('productlevel'), SERVERPROPERTY ('edition')
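
To make the "Interpreting Avg. Disk Sec/Read values" item a little more concrete, here is a minimal Python sketch that maps a counter sample to a rough label. The counter is reported in seconds per read. The thresholds (10 ms, 20 ms, 50 ms) are commonly quoted rules of thumb and are an assumption here, not values taken from the article. The function name classify_disk_read_latency is made up for illustration.

```python
def classify_disk_read_latency(avg_disk_sec_per_read: float) -> str:
    """Map an Avg. Disk Sec/Read sample (in seconds) to a rough health label.

    The thresholds below are assumed rules of thumb, not values from the
    referenced article; adjust them for your own storage.
    """
    if avg_disk_sec_per_read < 0:
        raise ValueError("latency cannot be negative")

    # The perf counter reports seconds per read; convert to milliseconds.
    latency_ms = avg_disk_sec_per_read * 1000.0

    if latency_ms < 10:
        return "good"
    if latency_ms < 20:
        return "acceptable"
    if latency_ms < 50:
        return "slow"
    return "serious bottleneck"


if __name__ == "__main__":
    # Hypothetical samples, in seconds, as the counter would report them.
    for sample in (0.004, 0.015, 0.035, 0.120):
        print(f"{sample:.3f} s/read -> {classify_disk_read_latency(sample)}")
```

A single sample says little; averages over a representative workload window are usually more meaningful.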

Note #3 - One more NoSQL DB that supports ACID transactions - FoundationDB

Key Learnings
  • Memory-optimized tables (MOT) reside in memory, not on disk
  • Steps to estimate memory for MOT
  • Garbage collection of older versions of records (similar to snapshot / read committed isolation levels)

Happy Learning!!!

December 12, 2013

DataScience Basics - Part I

Welcome to a series of posts on Data Science basics. This video is good starting material.



Link

Key Learnings
  • Data Science term defined by Peter Naur
  • How the role of a Statistician differs from that of a Data Scientist
  • BI Tools vs Data Science perspectives
More Reads
Happy Reading!!!

December 11, 2013

Getting Started with Python Visualization

This post is about getting started with visualization using Python. It took less than an hour to get a first visual representation of data.

Below are the steps involved for our first example

Step 1 - Install Enthought Python Distribution (EPD) from link. The download was around 230MB and took a couple of minutes

Step 2 - This post was pretty useful on installation on Windows

Step 3 - After installation, the setting below was applied

Step 4 - The first example is from python book page 37

Step 5 - Create a new file, type the code, Save and Run it

Step 6 - Below is the output 

December 05, 2013

Create Dummy Files to Consume Disk Space - fsutil command

This post is about creating dummy files to consume disk space and mimic low-disk-space scenarios. On Windows Server 2008, the fsutil command was useful for creating files of a given size.

Example Usage

C:\Users\Administrator>fsutil file createnew c:\dummydata5.bat 1240171806

File c:\dummydata5.bat is created
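
For machines where fsutil is not available, a similar effect can be sketched in Python. This is a minimal sketch, not a replacement for fsutil: the function name create_dummy_file and the file name dummydata.bin are made up for illustration. It writes real zero bytes in chunks rather than just extending the file, on the assumption that some filesystems would otherwise create a sparse file that does not actually use up disk space.

```python
import os
import tempfile


def create_dummy_file(path: str, size_bytes: int, chunk_size: int = 1024 * 1024) -> None:
    """Create a new zero-filled file of exactly size_bytes bytes."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    zeros = bytes(chunk_size)  # reusable buffer of zero bytes
    remaining = size_bytes

    # Mode "xb" refuses to overwrite a file that already exists.
    with open(path, "xb") as handle:
        while remaining > 0:
            count = min(chunk_size, remaining)
            handle.write(zeros[:count])
            remaining -= count


if __name__ == "__main__":
    # Small demo in a temporary folder; use a larger size to really fill a disk.
    folder = tempfile.mkdtemp()
    target = os.path.join(folder, "dummydata.bin")
    create_dummy_file(target, 5 * 1024 * 1024)
    print(target, os.path.getsize(target), "bytes")
    os.remove(target)
    os.rmdir(folder)
```

Writing in chunks keeps memory use flat even for multi-gigabyte files.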

Happy Learning!!!


November 09, 2013

Weekend Reading Notes

Note #1 - Session - Dirty Truth about Data Literacy

Interesting notes captured during session

Data Literacy - In simple terms
  • Reading the Data (Understand Values)
  • Reading between Data (Comparisons)
  • Reading beyond Data (Predictions / Inferences)
Improving User Experience by
  • Building more confident chart readers
  • Adding interactivity through rollovers, tooltips, highlighting of selected numbers, and mirroring what users are doing
Note #2 - Deploying Hadoop ETL in the Hortonworks Sandbox

This session is about working with the Syncsort DMX-h tool to implement ETL. You define the jobs through a GUI. The concept is similar to how you would develop using SSIS. Microsoft may well come up with similar SSIS capabilities to run on Hadoop and load data.

Syncsort ETL offerings link

Exercise #3 - Tutorial 14: How To Analyze Machine and Sensor Data

The steps were simple and quick, and I was able to execute up to step 5. Since I do not have Office 2013, the GUI part is pending.

Interesting Reads
Hardware Considerations for In-Memory OLTP in SQL Server 2014
In-Memory OLTP Q & A: Myths and Realities
In-Memory OLTP: High Availability for Databases with Memory-Optimized Tables

Happy Learning!!!

September 29, 2013

Exploring Hortonworks Sandbox - Part I (on Windows 7)

Setup Steps
  1. Downloaded VirtualBox from Link
  2. Hortonworks Windows tutorial Link
  3. Downloaded the 1.8GB Hortonworks Sandbox from Link
  4. After configuring it, ran through the first tutorial - Link
  5. Started the server and opened http://192.168.56.101:8000/ in a browser on the Win7 machine
  6. The Sandbox was set up and configured to use a 192-series IP address. I was able to use the Win7 browser interface to perform file uploads and query operations
  7. Credentials to log on to the server - Login: root, Password: hadoop
  8. The command to shut down is poweroff

First Example Notes
  • Downloaded example data from Link 
  • Upload worked in Google Chrome but not in IE
  • poweroff is the command to shut down the sandbox machine
  • Uploading data and running basic SELECT queries worked fine
More Info Tutorials
My Feedback
  • Impressively easy to set up and use
  • Got Started in < 2 hrs
  • Good Learning Start
More Reads

WTF does a Data Scientist do all day?

How do I become a data scientist?

What are some software and skills that every Data Scientist should know?

Read Quote of Joe Blitzstein's answer to Data Science: What is it like to design a data science class? on Quora

Read Quote of Nishant Neeraj's answer to Big Data: What should be ideal size, skill set and composition of team for a successful Big Data implementation in an organization? on Quora

Read Quote of Sean Owen's answer to Job Interviews: How can a computer science graduate student prepare himself for data scientist/machine learning intern interviews? on Quora

Read Quote of Pronojit Saha's answer to Data Science: What are some software and skills that every Data Scientist should know? on Quora

Read Quote of Ye Zhao's answer to How do I become a data scientist? on Quora

How does one begin to learn data science?

Harvard Data Science Course  

Software engineer's guide to getting started with data science


Happy Learning!!!

September 20, 2013

Advanced Cloud Computing 2013 Notes


Yesterday I attended ACC2013, held at NIMHANS. Every conference provides a lot of inspiration and motivation to try out new things.

Session #1 - Inauguration Talk

The inauguration was done by Padma Bhushan Rajaram. He explained cloud computing in very simple terms: cloud computing is utility computing. The term was coined by a management professor named Chellappa.
He had earlier written a paper in 2005 on challenges in utility computing. He recollected his experiences with new jargon and technical abbreviations, giving several expansions of BYOD as examples: Bring Your Own Drinks, Bring Your Own Dope, and Bring Your Own Device (the recent usage). Storage and processing costs have come down, which has created a business opportunity for Amazon and Google to rent out their excess storage and processing infrastructure.

He stressed several areas where the cloud needs standardization in order to realize its full potential. Examples: SLAs that guarantee the required performance when hosting or sharing infrastructure, developing a universally usable cloud, and interoperability between cloud providers.

Session #2 - Talk by Karnataka's IT Secretary, VidyaShankar, IAS

He spoke about developing trends: virtualization, cloud, and 3D technologies. He also mentioned a couple of products: cloudmagic and cubby.

Session #3 - Connected Systems, Cloud beginner tech talk by Vikas Agarwal (Tally)

This talk focused on the evolution of cloud computing. He traced it from the very beginning, through the PC era, to cloud computing.
  • Stage 1 - Mainframe Systems
  • Stage 2 - PC Era (Moving data to personal systems)
  • Stage 3 - LAN (Locally connected systems) - Intranets
  • Stage 4 - Connected Era, WANs
  • Stage 5 - Evolution of Internet (Globally Connected)
  • Stage 6 - Cloud (Shared computing, storage) - Access anytime / anywhere
Challenges / Features in Cloud
  •  Pooling Optimization
  •  Elasticity
  •  Efficiency
Session #4 - Big Data in Safety & Security Domain, Tech talk by Bob Brewin (Tyco)

This talk focused on the basics of cloud computing and on challenges and applications in the Fire and Security domain. Key points covered were
  • Fallacies in distributed computing
  • Current challenges in the Fire & Security domain:
      • Identifying false positives
      • Predictive analytics to identify and isolate false positives
      • Real-time monitoring
Session #5 - Cloud Services in Yahoo by Jothi Padmanabhan

Yahoo has its own private cloud. The speaker provided details on Yahoo's infrastructure and software stack.
Challenges
  • Scaling systems as per growing data
  • Data Partitioning
  • Data Consistency
  • Hardware provisioning
Benefits of Private Cloud
  • Developers can focus on Application logic instead of designing for crash / recovery scenarios
  • Focus on appealing content for users (UX)
Requirements for Cloud
Multitenancy
  • Several applications will share the same hardware and software
  • Resources can be shared, but applications should not conflict with each other on performance
  • Multiple apps will be running in parallel
  • A spike in one app's resource consumption should not affect other applications' performance
  • SLAs defined for performance need to be met for all hosted apps
Elasticity
  • Applications will have projected capacity vs actual capacity
  • Projections are based on a ballpark figure; actual load is measured once the product is implemented
  • Scale as you need
Scalable
  • Processing many requests, storing huge volumes of data, and analytics on top of that data are the offerings
Other key aspects include Availability, Security, Metering, Global APIs, Load Balancing, Simple APIs
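
The Elasticity point above (projected vs actual capacity, scale as you need) can be illustrated with a small Python sketch. Everything in it is hypothetical: the function name servers_needed, the per-server capacity of 100 requests per second, and the load figures are made up for illustration and do not come from the talk.

```python
def servers_needed(requests_per_second: int, capacity_per_server: int, min_servers: int = 1) -> int:
    """Return how many servers are needed to carry the given load.

    Uses integer ceiling division so a partially loaded server still counts.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    if requests_per_second < 0:
        raise ValueError("requests_per_second must be non-negative")

    needed = (requests_per_second + capacity_per_server - 1) // capacity_per_server
    return max(min_servers, needed)


if __name__ == "__main__":
    capacity = 100          # hypothetical requests/second one server can handle
    projected_load = 500    # ballpark figure used for initial provisioning
    actual_load = 1250      # load measured once the product is live

    print("provisioned from projection:", servers_needed(projected_load, capacity))
    print("needed for actual load:", servers_needed(actual_load, capacity))
```

An elastic platform closes the gap between the two numbers by adding or removing servers as the measured load changes, instead of fixing capacity at the projected figure.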

A more detailed architecture is explained in the paper link
  • Overview of OpenStack
  • Apache Traffic Server used as caching proxy server
  • Proxy (Route Traffic through intermediate steps)
  • Reverse proxy vs Forward Proxy (Several Variations)
  • Yahoo has 25K Clusters and 40K Servers
  • Mobstor (Storage for large unstructured files)
  • Sherpa (NoSQL solution from Yahoo)

Happy Learning!!!