OptimizationExamples
Example 1 - Posts (Messages from FB)
Reused Computation
Example II - Query with Max columns results
Three approaches, Three queries
Example III - Sum interactions for profiles
Happy Learning!!!
Example 1 - Posts (Messages from FB)
- 4 Columns
- PostId (UniqueId), Id, DateDimensions, Interactions (Likes)
- Query: > 100 or < 20
- Split: run as two queries and union the results
- Union scans the same data multiple times (reused in this case)
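A minimal sketch of the split-and-union idea, assuming the filter is on Interactions (the notes do not say which column). To keep it runnable without a cluster, Python's built-in sqlite3 stands in for Spark here, and the sample rows are made up for illustration. It only shows that the two forms return the same rows; it says nothing about Spark's physical plan.

```python
import sqlite3

# In-memory table with the four columns listed above (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (PostId INTEGER PRIMARY KEY, Id INTEGER, "
    "DateDimensions TEXT, Interactions INTEGER)"
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2020-01-01", 150),
        (2, 10, "2020-01-02", 5),
        (3, 20, "2020-01-01", 50),
        (4, 20, "2020-01-03", 300),
        (5, 30, "2020-01-02", 19),
    ],
)

# Split form: two filtered queries whose results are unioned.
split_sql = """
    SELECT PostId FROM posts WHERE Interactions > 100
    UNION ALL
    SELECT PostId FROM posts WHERE Interactions < 20
"""

# Single-query form: one filter with OR.
single_sql = """
    SELECT PostId FROM posts
    WHERE Interactions > 100 OR Interactions < 20
"""

split_ids = sorted(row[0] for row in conn.execute(split_sql))
single_ids = sorted(row[0] for row in conn.execute(single_sql))
print(split_ids, single_ids)
```

Because the two ranges do not overlap, UNION ALL returns no duplicates here. With overlapping filters the two forms would differ unless UNION (with deduplication) were used.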
Reused Computation
- Cache & Persistence
- Three approaches, Three queries
- Window
- GroupBy + Join
- Subquery
Three approaches, Three queries
- Window (1 Exchange + 1 Sort) - Efficient
- GroupBy + Join (HashAggregate + 2 Exchange + 1 Sort)
- Subquery (Broadcast HashJoin, 1 Exchange)
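The three query shapes can be sketched as follows. The exact query behind these notes is not given, so this assumes one plausible reading: for each Id, keep the post with the most Interactions. As a stand-in for Spark, the sketch uses Python's built-in sqlite3 (window functions need SQLite 3.25 or newer), and the sample rows are invented. It demonstrates only that the three forms agree on the result; the Exchange and Sort counts above come from Spark's plans and are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (PostId INTEGER PRIMARY KEY, Id INTEGER, "
    "DateDimensions TEXT, Interactions INTEGER)"
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2020-01-01", 150),
        (2, 10, "2020-01-02", 5),
        (3, 20, "2020-01-01", 50),
        (4, 20, "2020-01-03", 300),
        (5, 30, "2020-01-02", 19),
    ],
)

queries = {
    # 1. Window: compute the per-Id maximum alongside each row, then filter.
    "window": """
        SELECT PostId, Id, Interactions FROM (
            SELECT PostId, Id, Interactions,
                   MAX(Interactions) OVER (PARTITION BY Id) AS max_i
            FROM posts
        ) AS t
        WHERE Interactions = max_i
    """,
    # 2. GroupBy + Join: aggregate first, then join back to the detail rows.
    "groupby_join": """
        SELECT p.PostId, p.Id, p.Interactions
        FROM posts AS p
        JOIN (SELECT Id, MAX(Interactions) AS max_i
              FROM posts GROUP BY Id) AS m
          ON p.Id = m.Id AND p.Interactions = m.max_i
    """,
    # 3. Subquery: correlated scalar subquery in the WHERE clause.
    "subquery": """
        SELECT p.PostId, p.Id, p.Interactions
        FROM posts AS p
        WHERE p.Interactions = (SELECT MAX(q.Interactions)
                                FROM posts AS q WHERE q.Id = p.Id)
    """,
}

# Run each form and sort the rows so the results can be compared directly.
results = {name: sorted(conn.execute(sql)) for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

To compare the approaches in Spark itself, the same SQL text could be run there and the physical plans inspected with `EXPLAIN`.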
Example III - Sum interactions for profiles
- 3 Exchange (Shuffle)
- Sort
- SortMergeJoin
- Optimize Exchange
- Repartition
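To illustrate why repartitioning can cut down on shuffles, here is a small pure-Python simulation (not Spark code). It assumes a hypothetical profiles table joined to posts on Id; the table contents, the helper `partition_by_key`, and the partition count are all made up for this sketch. Once both sides are hash-partitioned on the join key, matching rows sit in the same partition, so the join and the sum can run partition by partition with no further data movement.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count

# Invented sample data: (Id, name) and (PostId, Id, Interactions).
profiles = [(10, "alice"), (20, "bob"), (30, "carol")]
posts = [(1, 10, 150), (2, 10, 5), (3, 20, 50), (4, 20, 300), (5, 30, 19)]


def partition_by_key(rows, key_index, num_partitions):
    """Hash-partition rows so that equal keys land in the same partition."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key_index]) % num_partitions].append(row)
    return parts


# One "shuffle" per side, both on the join key Id.
profile_parts = partition_by_key(profiles, 0, NUM_PARTITIONS)
post_parts = partition_by_key(posts, 1, NUM_PARTITIONS)

# Join and aggregate locally inside each partition; no cross-partition lookups.
totals = {}
for pid in range(NUM_PARTITIONS):
    names = dict(profile_parts.get(pid, []))  # Id -> name for this partition
    for _post_id, profile_id, interactions in post_parts.get(pid, []):
        if profile_id in names:
            name = names[profile_id]
            totals[name] = totals.get(name, 0) + interactions

print(totals)
```

In Spark the corresponding call is `DataFrame.repartition`, given the join key column. Whether it actually removes an Exchange depends on the plan, so it is worth checking with `explain()` before and after.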
Happy Learning!!!