
Showing posts from July 12, 2024

[Day 193] Chapter 5, 6, and 7 from Effective Data Science Infrastructure

Hello :) Today is Day 193!

A quick summary of today: I covered chapters 5, 6, and 7 from Effective Data Science Infrastructure.

Chapter 5: Practicing scalability and performance

Effective infrastructure must accommodate a wide range of projects. Rather than adopting a one-size-fits-all approach, it should offer a versatile toolbox of robust methods for achieving adequate scalability and performance.

To improve organizational scalability and keep projects understandable to the widest possible audience, our primary strategy is simplicity. Because people's capacity to understand complex systems is limited, overengineering and overoptimizing carry extra costs.

Vertical scalability refers to handling more compute and larger datasets simply by using larger instances.

To start, we begin with a skeleton flow and keep adding steps until we reach the final solution. The model uses Yelp review data, and the goal is to group similar reviews together to find out what kinds of reviews are generally posted.
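The skeleton-first approach and the review-grouping goal can be sketched in plain Python. The book builds its flows with Metaflow; to keep this self-contained, the sketch below is a hypothetical stand-in that does not use Metaflow's API. Each step is a function that passes a shared `state` dict along: the flow starts as just `start` and `end`, and `featurize_step` and `cluster_step` are the steps added along the way. The six reviews are made-up toy data, not Yelp data, and grouping them with k-means over bag-of-words counts is an assumption about the clustering method. All names (`run`, `FLOW`, `kmeans`, and so on) are hypothetical.

```python
from collections import Counter

# Hypothetical toy reviews standing in for Yelp data (NOT real Yelp reviews).
REVIEWS = [
    "great pizza friendly staff",
    "great pizza tasty crust",
    "tasty pizza friendly staff",
    "slow service long wait",
    "rude service long wait",
    "slow rude service",
]

def featurize(reviews):
    """Turn each review into a bag-of-words count vector over a shared vocabulary."""
    vocab = sorted({w for r in reviews for w in r.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for r in reviews:
        vec = [0.0] * len(vocab)
        for word, count in Counter(r.lower().split()).items():
            vec[index[word]] = float(count)
        vectors.append(vec)
    return vocab, vectors

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centroids(vectors, k):
    """Deterministic farthest-first initialization (chosen here instead of random init)."""
    chosen = [0]
    while len(chosen) < k:
        far = max(range(len(vectors)),
                  key=lambda i: min(sq_dist(vectors[i], vectors[c]) for c in chosen))
        chosen.append(far)
    return [list(vectors[c]) for c in chosen]

def kmeans(vectors, k, iters=20):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = init_centroids(vectors, k)
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical steps: the skeleton is just [start, end]; the middle steps are added later.
def start(state):
    state["reviews"] = REVIEWS
    return state

def featurize_step(state):
    state["vocab"], state["vectors"] = featurize(state["reviews"])
    return state

def cluster_step(state):
    state["labels"] = kmeans(state["vectors"], k=2)
    return state

def end(state):
    return state

FLOW = [start, featurize_step, cluster_step, end]

def run(flow):
    """Run each step in order, handing the state from one step to the next."""
    state = {}
    for step in flow:
        state = step(state)
    return state

if __name__ == "__main__":
    print(run(FLOW)["labels"])  # prints [0, 0, 0, 1, 1, 1]
```

Growing the list one step at a time keeps the flow runnable at every stage, which matches the simplicity-first strategy: `run([start, end])` already works before any modeling code exists.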