After experimenting a bit with PySpark, I feel it is much better to handle this kind of work with R / Python. Most of what we can achieve is repeated across R / Python / Spark / SQL.
- Data pipeline tasks can be handled at the DB level
- One-hot encoding can also be done with basic T-SQL code
- When working in NLP, it makes sense to use TF-IDF vectorizers
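To make the last two points concrete, here is a minimal sketch in plain Python (standard library only). The function names `one_hot` and `tf_idf`, the sample data, and the exact TF-IDF formula (term frequency times `log(N / df)`) are my own assumptions for illustration; library vectorizers typically add smoothing and normalisation, so their numbers will differ.

```python
import math
from collections import Counter


def one_hot(values):
    """Return (categories, rows) with one 0/1 column per distinct value.

    This mirrors what a series of CASE WHEN columns would do in SQL.
    """
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def tf_idf(docs):
    """Unsmoothed TF-IDF (an assumed textbook variant).

    tf  = term count / number of tokens in the document
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t])
             for t, c in counts.items()}
        )
    return weights


# Hypothetical sample data, not taken from a real dataset.
colours = ["red", "green", "red", "blue"]
categories, encoded = one_hot(colours)

docs = ["spark sql pipeline", "python sql pipeline", "python nlp"]
scores = tf_idf(docs)
```

Here `categories` holds the column order for the encoded rows, and each entry of `scores` maps a term to its weight in that document. Terms that appear in fewer documents (such as `spark`) get a higher weight than shared ones (such as `sql`).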
Happy Learning!!!