"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

July 28, 2016

Fifth Elephant Day #1 Notes - Part I

Sessions # - Link

Talk #1 - Data for Genomic Analysis

Great talk by Ramesh. I had attended his session / technical discussion earlier. This session provided insights on genome / discrepancies in genome sequence leading to rare diseases.

Genome - 3 Billion X 2 Characters
Character variables varies from person to person
Stats (1/10th of probability of cancer)
Baseline risk for breast cancer (1/8),(1/70) ovarian cancer
BRCA1 mutation (5-6 fold increase in breast cancer, 27 fold increase for ovarian cancer)

In India
  • 35% inherited risk mutation
  • 1/25 Thalassemia 
  • 1 in 400-900 Retinitis Pigmentosa
  • 1 in 500, Hypertrophic Cardiomyopathy
Data Processing
  • 1 Billion reads - 100GB data per person
  • Very similar sequence yet one character might differ
  • But reference is 3 Billion long
Efficiency
  • Need fast indexing
  • Suffix Trees and variations
  • Hash table based approaches
Reference Genome Sequence
  • Volume of data
  • Funnel down of variety of dimensions
  • Triplet Code (Molecule)
  • Variants of Triplets nailed down to difference of gnome
  • GPU processing / reduce computation time
Concepts Discussed / Used
  • Hypothesis Testing
  • Stats Models
  • GPU Processing to reduce computation time
They also provide assessment for hereditary diseases at corporate level.

Talk #2 - Alternative to Wall Street Data

This session gave me some new strategies to collect / analyze data

How to Identify occupancy rate at hotel ?
  •  Count of cars from parking lots
  •  Number of rooms lights on
  •  Take pics of rooms from corner of street and predict based on images collected
  •  Unconventional ways to think of data collection (Beating the wall street model)
What are usual ways
  •  Checking websites
From Investor perspective lodging key metrics is a very important aspect
Data Sources
  • Direct data gathering
  • Web harvesting
  • Primary research
Primary Research
  • Look at notice patterns in front of you
  • Difference in invoice numbers
  • Serial number changes, difference values
Free Data Sets in link
Lot of opportunity
  • Analyze international markets (India / China)
  • COGS
  • SG
  • ETC
How to value data sets ?
  • Scarcity - How widely used
  • Granularity - Time / aggregation level
  • Structured
  • Coverage



What is the generative value
  • Revenue Surprise Estimates
  • Dataset insight / Analysis
  • Operating GAAP measures
A Great case study on impact of smart watch vs luxury watch was presented ? This session provides great insight into unconventional data collection ways
  • Generate money in automated system
  • Stock sensitivity to revenue surprises
  • Identify underlying ground truth
"Some Refreshing changes to world of investment"

Happy Learning!!!
Post a Comment