"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

October 05, 2018

Deep Dive PySpark Examples - Big Data Setup - Part II

After experimenting a bit with PySpark, I feel it's much better to handle these tasks with R / Python. Most of the things we can achieve are repeated across R / Python / Spark / SQL.

  • Data pipeline tasks at the DB level
  • One-hot encoding can also be done with basic T-SQL code
  • When working on NLP, it makes sense to use TF-IDF vectorizers
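
As a rough sketch of the one-hot encoding point, here is how it might look with plain SQL CASE expressions. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module instead of SQL Server so it can run anywhere, and the `orders` table, the `color` column, and the three category values are made up for the example. CASE WHEN is standard SQL, so the same SELECT should carry over to T-SQL with little or no change.

```python
import sqlite3

# Hypothetical toy table; the table name, column names, and values are
# assumptions made for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, color TEXT)")
conn.executemany(
    "INSERT INTO orders (id, color) VALUES (?, ?)",
    [(1, "red"), (2, "green"), (3, "blue"), (4, "red")],
)

# One CASE expression per category gives one 0/1 indicator column each.
# SQLite stands in for SQL Server here; CASE WHEN is standard SQL.
query = """
SELECT id,
       CASE WHEN color = 'red'   THEN 1 ELSE 0 END AS color_red,
       CASE WHEN color = 'green' THEN 1 ELSE 0 END AS color_green,
       CASE WHEN color = 'blue'  THEN 1 ELSE 0 END AS color_blue
FROM orders
ORDER BY id
"""
rows = conn.execute(query).fetchall()
conn.close()

# Each row is (id, color_red, color_green, color_blue).
for row in rows:
    print(row)
```

The categories are hard-coded in the query, which is fine when the set of values is small and known up front. For a column with many distinct values the CASE list would have to be generated, and at that point doing the encoding in R / Python is probably the simpler route.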

Happy Learning!!!

