Posts

Showing posts from July 11, 2024

[Day 192] Chapter 4: Scaling with the compute layer (from the book: Effective Data Science Infrastructure)

Hello :) Today is Day 192!

A quick summary of today: set up AWS resources and configured Metaflow with AWS (finally)

What is Scalability?

Scalability refers to a system's ability to handle increasing amounts of work by adding resources. This concept is distinct from performance, which measures how well a system functions under a fixed workload. Scalability involves:

1. Growth: It’s relevant only when discussing systems that need to handle more work.
2. Efficient Resource Use: Adding resources like more computers or memory should proportionally increase the system's capacity.
3. Different Measures: The dimensions of scalability (e.g., volume, velocity, validity, and variety) must be defined based on specific needs.

When building infrastructure, it’s crucial to ensure scalability across all layers, supporting a large number of applications and users, and enabling quick development and deployment of data science projects.

Culture of Experimentation

Modern data science organizatio