Chapter 02: What Matters in Model Management
Chapter 02
- Logistics and model management need to be flexible to handle many different use cases and scenarios.
- But you don’t want it to be so complicated that it is also a hindrance to people getting work done.
Ingredients of the Rendezvous Approach
- “The rendezvous architecture takes advantage of data streams and geo-distributed stream replication to maintain a responsive and flexible way to collect and save data, including raw data, and to make data and multiple models available when and where needed.”
- “The design strongly supports ongoing model evaluation and multi-model comparison. It’s a new approach to managing models that reduces the burden of logistics while providing exceptional levels of monitoring so that you know what’s happening.”
- Some ingredients:
- streams
- containers
- DataOps style of design
- decoy models
- canary models
DataOps Provides Flexibility and Focus
- Don’t be rigid and siloed in your teams and responsibilities.
- DataOps style emphasizes better collaboration and communication.
- Cross-functional teams.
- Like DevOps, but with emphasis also on data engineering and data science.
- Architecture and Product Management too.
Stream Based Microservices
- Microservices allow more independence and agility.
- You can do synchronous REST calls, etc., but more and more it makes sense to use a message stream.
- Can your stream transport do the following?
- support multiple data producers and consumers
- provide message persistence with high performance
- decouple producers and consumers
- Why persistence?
- Get past the “use it or lose it” and “data exhaust” mentalities
- persistence is required to decouple producers and consumers
- Apache Kafka is a great messaging layer for this kind of stuff
- MARKETING MATERIAL: MapR also has something for this!!!
Streams Offer More
- Need an event-by-event replayable history? Streams!
- Also, send raw data to more than one model.
- Streams are excellent for model logistics. (See Chapter 03.)
Building a Global Data Fabric
- Data Fabric goes beyond and is better than a data lake:
- efficient way to access all data and types
- no silos
- fine-grained control over access control and locality
- across multiple data centers
- geo-distributed data
- MARKETING MATERIAL: Use our MapR Converged Data Platform!!!
Making Life Predictable: Containers
- Containers are regular, consistent, and flexible.
- They are particularly important for model management.
- But containers need data. They should read from and persist data to a data platform.
- Might use Cassandra or HDFS … or … MARKETING MATERIAL: MapR Converged Data Platform!!!
Canaries and Decoys
- Need to accurately record the input data for a model.
- Enter Decoy Models:
- Appears to be an ML model, but it’s not.
- It’s only job is to persist the input data that it receives.
- “The decoy sits in exactly the same position in a data flow as the actual model or models being developed, but the decoy doesn’t do anything except look at its input data and record it, preferably in a data stream.”
- See Chapter 04 for more info on decoy models