
In-Memory, Component-Based

Recommender Architecture
April 16th, 2019
Who are We?
● E-commerce founded in late 2014
○ Internal Engineering founded in early 2015
○ Launched our in-house website in mid-2015,
app in late-2015
● Concentrated on women's fashion
○ Around Rp 100K range as opposed to > Rp 200K
range
○ Number 1 fashion e-commerce in Indonesia
● Recently rebranded to Sorabel in January 2019
Google Play App Rankings (5th)
Talk Overview
● Existing Recommender Development Workflow
○ Exploration, Computation, Serving
● New Recommender Development Workflow
○ Exploration, Computation, Serving
○ The Impact
● In-Memory Recommender Architecture
○ In-Depth Architecture
○ Pros & Cons
What is a Recommender?
● Software that tries to optimize the future actions of its users
○ Personalized: based on a specific user’s past behavior
○ Non-personalized: based on aggregate, non-user-specific contextual models
Recommenders in Sorabel
● We have recommenders for many things:
○ For our buyers: Restock, Budget, Trend Recommenders
○ For our warehouse: Order-Item Picking Route Optimizer, Item Placement
Optimizer, Logistics Partner Order Allocator
○ For our customers: Product Recommender
● In this talk, we focus on the Product Recommender
Uses of Product Recommender
● Home Feed
● Catalogue Feed
● Similar Products
● Search, and others!
Development Workflow
● Exploration
● Computation
● Serving
Development Workflow
● Exploration
○ Data scientists sift through the data to derive deeper insights by looking beyond the
basic metrics
● Computation
○ Which classes of algorithms are computationally feasible, and how each model
should be built, fitted, validated, and then recomputed regularly
● Serving
○ How models are ultimately stored post-computation and served during production
to our millions of users
Previous Development Workflow
Previous Recommender Architecture
● BigQuery: data warehouse, fed from multiple data sources -- MySQL,
Cassandra, Kafka
● Apache Spark: engine for large-scale data processing
● Dataproc: runs Apache Spark jobs inside GCP
● ElasticSearch: an open-source, distributed search engine
Exploration
● Data scientists explore and analyze the data, trying to build the right models /
heuristics
○ Most of their exploration is done in Python
● This is usually done locally → limited hardware resources
○ Inherently lower limit to the size of data during experimentation → data
scientists are then limited to “toy” datasets during this stage
○ Harder for data scientists to collaborate
Computation
● Data engineers translate data scientists’ work into appropriate Spark jobs
○ Computation was mostly done inside Google Dataproc
● Data engineers then make the necessary changes so the model is ready for production
○ For example, dummy datasets vs production-scale datasets
○ Long back-and-forth feedback loop between data scientists & engineers
● Recommendations were largely precomputed at less-than-optimal scope: at the feed
level, done daily
○ Computation and disk-writes take a long time (+ storage costs!)
○ Low usage rate → not everyone will visit their precomputed feed on a daily basis
Serving
● Production read-path that serves the actual models / recommendations to our
users
● A dual-layer architecture:
○ Highly-stateful data storage layer -- ElasticSearch
○ Stateless (horizontally-scaled) REST API servers that mostly read from the
stateful layer with minimal post-processing
● Implemented and maintained by backend engineers
Recap: Existing Workflow Problems
● Exploration is usually done locally
○ Local resources are limited
○ They usually play with tiny subsets of data to get the work done locally
○ Harder for data scientists to collaborate
● Going back and forth between data scientists and data engineers took longer
than it should
● Long indexing time (daily job took ~4-8 hrs)
● Non-trivial cost and complexity in the serving infra (Dataproc + ElasticSearch)
New Development Workflow
Exploration
● Data Scientists can utilize Sciencebox
○ Jupyterlab running in dedicated containers on top of Kubernetes for each
data scientist
○ Instant access to powerful resources
■ Large core count & RAM
■ GPU if needed
● No longer need to play around with tiny subsets of data
● Easier for data scientists to collaborate, share, and evaluate each other’s work
Computation
● Introducing DataQuery
○ A platform where anyone can build their own derived tables easily
○ A derived table is a composite-data table -- a table whose data is composed
from multiple tables
■ From raw tables / other derived tables
■ Mostly defined by a SQL query
■ Editable data refresh frequency
○ Built on top of Google BigQuery
Computation
● Frequently, a simpler model can be realized using just DataQuery
○ No need for any Spark jobs for most of the cases
○ Data Scientists can do this independently
Serving
● Serving infrastructure is now a single-layer, Go-based in-memory service (IM for short)
○ We load the “models” from DataQuery (or any other data) into the service’s resident
memory as “Components”, conditionally at startup or as-needed
○ Components are built on top of each other to build a more complete and capable
“component tree” that then serves the actual recommendations as a group
Serving
● Serving infrastructure is now a single-layer, Go-based in-memory service (IM for short)
○ Additional computations (including but not limited to inter-model component
stitching, data re-sorting, realtime-data-sensitive logic, etc.) can be done within the
Components on-request, on-the-fly
○ Centralized “component registry” handles caching / re-computations of different
parts of the component-tree for best performance with little manual work, not
dissimilar to React’s Virtual DOM concept used in user interfaces
○ A much larger chunk of the user-specific recommendation computation can now be
done on-the-fly, only when the user comes to the site
Serving
● Backend engineers implement the components
○ However, due to its simplicity, data scientists often implement
components by themselves
○ A data scientist’s workflow for a new feature is now very simple:
i. Play around with algorithms & data at Sciencebox
ii. Write the production version of the algorithm as a DataQuery table
iii. “Wrap” the DataQuery model in an IM `Component`
○ We’ll talk about how this works more in-depth later on
Workflow Comparison
● Previous workflow: Exploration (data scientists) → Computation + Serving (data
engineers / backend engineers)
● New workflow: Exploration → Computation → Serving (data scientists can own the
whole pipeline end-to-end)
Architecture Comparison
● Previous architecture (diagram): BigQuery → Spark jobs on Dataproc →
ElasticSearch → REST API servers
● New architecture (diagram): BigQuery / DataQuery → single Go-based in-memory
service (IM)
In-Memory Recommender In-Depth
The IM Architecture
● Data Component
● Registry
● Cache
● Endpoint
● All within a single service
The IM Architecture
Data Component
● Responsible for processing data on-the-fly
● May depend on external data sources (BigQuery, etc.) or other Data Components
(or both)
● The resulting data can be used directly by other components
Data Component
● CachePolicy defines the caching configuration -- more on this later
● GetKey serializes the args into a string -- for cache key
● GetData processes data with given args
○ Fetcher is a utility interface to interact with other components and data
sources
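The slides show this contract in code; below is a minimal Go sketch reconstructed from the descriptions above -- the names follow the slides, but the exact signatures are assumptions:

```go
package im

import (
	"context"
	"time"
)

// CachePolicy configures how the registry caches a component's results.
type CachePolicy struct {
	Expiration   time.Duration // data TTL
	MaxEntrySize int           // maximum number of cached data entries
	NoCache      bool          // if true, GetData is called on every request
	MustPresent  bool          // if true, the persistent cache type is used
}

// Fetcher is the utility interface a component uses to reach other
// components and external data sources; every call goes through the registry.
type Fetcher interface {
	Fetch(ctx context.Context, component string, args interface{}) (interface{}, error)
}

// DataComponent is the basic building block of the IM service.
type DataComponent interface {
	CachePolicy() CachePolicy
	GetKey(args interface{}) string // serializes args into a cache key
	GetData(ctx context.Context, f Fetcher, args interface{}) (interface{}, error)
}
```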
Data Component: Example
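As a stand-in for the slide’s screenshot, here is a hypothetical component in the same style: a ProductScoreComponent that depends on an assumed product_stats component (the ProductStats shape and the scoring heuristic are illustrative only):

```go
// ProductStats is an assumed shape for the upstream statistics data.
type ProductStats struct {
	Views int64
	Sold  int64
}

// ProductScoreComponent derives a popularity score for every product.
type ProductScoreComponent struct{}

func (c *ProductScoreComponent) CachePolicy() CachePolicy {
	// A single global entry, refreshed every 30 minutes.
	return CachePolicy{Expiration: 30 * time.Minute, MaxEntrySize: 1}
}

func (c *ProductScoreComponent) GetKey(args interface{}) string {
	return "all" // no per-request args, so one cached entry
}

func (c *ProductScoreComponent) GetData(ctx context.Context, f Fetcher, args interface{}) (interface{}, error) {
	raw, err := f.Fetch(ctx, "product_stats", nil) // dependency on another component
	if err != nil {
		return nil, err
	}
	stats := raw.(map[int64]ProductStats)
	scores := make(map[int64]float64, len(stats))
	for id, s := range stats {
		scores[id] = float64(s.Sold) / float64(s.Views+1) // simple popularity heuristic
	}
	return scores, nil
}
```

Because the policy caches a single entry with a 30-minute TTL, every consumer reads the same precomputed score map straight from memory.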
Registry
● The Registry is the central manager responsible for making sure Components are
computed, cached, and pre-computed at the right times
● It handles data caching, depending on the component’s CachePolicy
○ Uses LRU eviction policy
● All fetcher.Fetch (component sub-dependency) calls go through the registry
○ It checks the cache with the key returned by Component’s GetKey method
○ If exists, it returns the cached data
○ Otherwise, it calls the component’s GetData method and updates the cache
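A sketch of that fetch path, continuing the types above (the LRUCache interface stands in for whatever LRU implementation is actually used):

```go
// LRUCache is the minimal cache behavior the registry needs; the real
// implementation uses LRU eviction.
type LRUCache interface {
	Get(key string) (interface{}, bool)
	Add(key string, value interface{})
}

// Registry owns every component and its cache, and itself implements Fetcher.
type Registry struct {
	components map[string]DataComponent
	cache      LRUCache
}

func (r *Registry) Fetch(ctx context.Context, name string, args interface{}) (interface{}, error) {
	comp := r.components[name] // lookup assumed to succeed in this sketch
	key := name + ":" + comp.GetKey(args)
	if !comp.CachePolicy().NoCache {
		if data, ok := r.cache.Get(key); ok {
			return data, nil // cache hit
		}
	}
	data, err := comp.GetData(ctx, r, args) // the registry passes itself as the Fetcher
	if err != nil {
		return nil, err
	}
	if !comp.CachePolicy().NoCache {
		r.cache.Add(key, data)
	}
	return data, nil
}
```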
Cache Policy
● Expiration: data TTL
● MaxEntrySize: maximum number of
cached data entries
● NoCache: GetData will be called at all times
● MustPresent: determines the cache type
Cache Types
● Persistent Cache: data must be present at all times
○ Critical data components are of this type, i.e. the service cannot run
without those data. Example: ProductMapComponent
○ They are initially fetched during startup
○ If the cached data is expired, the registry returns the expired data, while
fetching the new one in the background
● Volatile Cache: it is okay if the data is expired
○ Most of these are components that depend on other components
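A sketch of the persistent-cache read path described above (serve the stale data, refresh in the background); the persistent map and the refresh helper are assumed additions to the Registry sketch:

```go
// persistentEntry is an assumed entry shape for the persistent cache.
type persistentEntry struct {
	Data      interface{}
	ExpiresAt time.Time
}

func (r *Registry) fetchPersistent(ctx context.Context, name string, args interface{}) (interface{}, error) {
	comp := r.components[name]
	key := name + ":" + comp.GetKey(args)
	entry, ok := r.persistent[key] // persistent entries are loaded at startup
	if !ok {
		return nil, fmt.Errorf("persistent component %q was not loaded at startup", name)
	}
	if time.Now().After(entry.ExpiresAt) {
		go r.refresh(name, args) // assumed helper: recompute and store in the background
	}
	return entry.Data, nil // expired data is served rather than blocking the request
}
```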
Cache Stampede Problem
● A cache stampede is a type of cascading failure that can occur when massively
parallel computing systems with caching mechanisms come under very high
load.
● Example: A sudden jump in traffic (e.g. from a TV commercial) to a page will
produce mostly identical data component request trees -- wasting CPU cycles
and making useless downstream requests to service dependencies
Cache Stampede Problem
● Solution: Make sure only one of the identical GetData requests is executed at
a time
○ Try to acquire a lock whenever there’s a cache miss
○ If acquiring the lock fails, wait until the lock is released. By
then, the data should already be cached
Cache Stampede Problem
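The slides describe this with a lock; in Go, one common way to get the same "execute identical misses once" behavior is golang.org/x/sync/singleflight. A sketch, with group as an assumed new field (a singleflight.Group) on the Registry:

```go
import "golang.org/x/sync/singleflight"

// fetchOnce collapses concurrent identical cache misses into a single
// GetData call; all other callers wait for and share its result.
func (r *Registry) fetchOnce(ctx context.Context, comp DataComponent, key string, args interface{}) (interface{}, error) {
	v, err, _ := r.group.Do(key, func() (interface{}, error) {
		// Re-check the cache: another goroutine may have filled it
		// while we were waiting for the lock.
		if data, ok := r.cache.Get(key); ok {
			return data, nil
		}
		data, err := comp.GetData(ctx, r, args)
		if err == nil {
			r.cache.Add(key, data)
		}
		return data, err
	})
	return v, err
}
```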
Member Affinity Routing
● We load member-specific data only when that member arrives
● Meanwhile, we scale our service horizontally to multiple servers
○ Since the data is cached locally, the state of each server could be different
● Multiple requests by the same member can end up being served by different
servers
○ The same computation will be done redundantly across multiple servers
○ Reducing the cache hit rate
Member Affinity Routing
● We employ custom header-based routing through Envoy, a microservice proxy
○ Consistent routing through the X-SS-Member-ID header
● The same member will always be served via the same server
● Example:
○ A member opens the home page. One of the servers then computes the home
feed specific to that member.
○ As the member scrolls down, the next requests will come to the same server. As
the result is already cached, it returns almost instantly.
Putting It All Together
● When we try to implement some feature, we think of the building blocks
○ What data components do we need? What is the data source? How should
the data be cached?
○ What are the dependencies between the components?
○ In most cases, new features can depend on our existing components
Example: Search Endpoint
● Suppose that we want to create a search endpoint:
○ Users can search by keyword
○ Users can also sort the result, by price, popularity, etc.
Example: Search Endpoint
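The slide’s code is not preserved, but a plausible sketch of this component tree, reusing the types sketched earlier (the Product shape and the component names are assumptions):

```go
import (
	"sort"
	"strings"
)

// Product is an assumed shape for entries in the Product component's data.
type Product struct {
	ID   int64
	Name string
}

// MatchedProductsForKeywords filters the product map by keyword and
// sorts the matches by score.
type MatchedProductsForKeywords struct{}

func (c *MatchedProductsForKeywords) CachePolicy() CachePolicy {
	return CachePolicy{Expiration: 10 * time.Minute, MaxEntrySize: 10000}
}

func (c *MatchedProductsForKeywords) GetKey(args interface{}) string {
	return args.(string) // the search keyword is the cache key
}

func (c *MatchedProductsForKeywords) GetData(ctx context.Context, f Fetcher, args interface{}) (interface{}, error) {
	keyword := strings.ToLower(args.(string))
	rawProducts, err := f.Fetch(ctx, "product", nil)
	if err != nil {
		return nil, err
	}
	rawScores, err := f.Fetch(ctx, "product_score", nil)
	if err != nil {
		return nil, err
	}
	products := rawProducts.(map[int64]Product)
	scores := rawScores.(map[int64]float64)

	var matched []Product
	for _, p := range products {
		if strings.Contains(strings.ToLower(p.Name), keyword) {
			matched = append(matched, p)
		}
	}
	// This sort runs on every cache miss -- the CPU-heavy step
	// discussed on the next slide.
	sort.Slice(matched, func(i, j int) bool {
		return scores[matched[i].ID] > scores[matched[j].ID]
	})
	return matched, nil
}
```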
● The MatchedProductsForKeywords component filters the products from the
Product component and sorts them based on the scores obtained from the
ProductScore component
● But sort is a CPU-heavy operation. How can we improve?
Example: Search Endpoint
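One plausible answer (not necessarily the slide’s exact solution): hoist the expensive sort into its own cached component, so the ordering is computed once per cache refresh and keyword requests only filter:

```go
// SortedProductIDsByScore caches all product IDs pre-sorted by score.
type SortedProductIDsByScore struct{}

func (c *SortedProductIDsByScore) CachePolicy() CachePolicy {
	return CachePolicy{Expiration: 30 * time.Minute, MaxEntrySize: 1}
}

func (c *SortedProductIDsByScore) GetKey(args interface{}) string { return "all" }

func (c *SortedProductIDsByScore) GetData(ctx context.Context, f Fetcher, args interface{}) (interface{}, error) {
	raw, err := f.Fetch(ctx, "product_score", nil)
	if err != nil {
		return nil, err
	}
	scores := raw.(map[int64]float64)
	ids := make([]int64, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// The expensive sort now happens once per refresh, not per request.
	sort.Slice(ids, func(i, j int) bool { return scores[ids[i]] > scores[ids[j]] })
	return ids, nil
}
```

MatchedProductsForKeywords can then walk the pre-sorted IDs and keep the ones matching the keyword, preserving score order without re-sorting on every request.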
Putting It All Together
Example: Search Endpoint
● What if we want to make the search results personalized?
Example: Search Endpoint
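A sketch of one way this could be done in this architecture: introduce a member-keyed score component, and have the search components fetch it instead of the global score (the member_product_boost dependency and the blending rule are assumptions):

```go
// PersonalizedProductScore blends the global score with a member-specific
// signal from an assumed "member_product_boost" component.
type PersonalizedProductScore struct{}

func (c *PersonalizedProductScore) CachePolicy() CachePolicy {
	// Member-keyed entries are computed only when that member arrives,
	// which pairs with the member affinity routing described earlier.
	return CachePolicy{Expiration: 15 * time.Minute, MaxEntrySize: 100000}
}

func (c *PersonalizedProductScore) GetKey(args interface{}) string {
	return args.(string) // member ID
}

func (c *PersonalizedProductScore) GetData(ctx context.Context, f Fetcher, args interface{}) (interface{}, error) {
	memberID := args.(string)
	rawGlobal, err := f.Fetch(ctx, "product_score", nil)
	if err != nil {
		return nil, err
	}
	rawBoost, err := f.Fetch(ctx, "member_product_boost", memberID)
	if err != nil {
		return nil, err
	}
	global := rawGlobal.(map[int64]float64)
	boost := rawBoost.(map[int64]float64)

	personalized := make(map[int64]float64, len(global))
	for id, s := range global {
		personalized[id] = s + boost[id] // absent boosts read as 0
	}
	return personalized, nil
}
```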
Pros and Cons
Pros (#1)
● It is a simple yet powerful abstraction. Problems become much simpler when thinking of
the building blocks / components that are required to build the recommendation context.
● Programming overhead is further reduced when dependencies can be assumed to be
in-memory and precomputed most of the time
○ No need to worry about network hops for every component dependency fetch (i.e.
vs an alternative approach where say, components are stored in Redis Cluster)
○ More approachable style of programming for data scientists → more end-to-end
ownership by data scientists
Pros (#2)
● Granular components:
○ Flexible and composable
○ Easier to understand and debug
○ Can divide work across engineers & scientists
○ Smaller unit-of-work → better parallelization
● In short, it boosts engineers’ and data scientists’ productivity
Pros (#3)
● Reduced cost (~50%): reduced the number of Dataproc workers, eliminated
ElasticSearch instances
○ Even though the cost of compute instances increased
● User-specific computations are done only when the user actually visits -- inherently
fewer wasteful precomputations
● Better runtime performance (in-memory → no network latency)
○ Sub-100ms p95 latency in search and home feed
○ Automatic caching infrastructure makes us more resilient to traffic bursts
Pros -- the most important
● Improved business metrics in record time
○ Much faster iteration in experimentation and deployment of new recommendation
algorithms
● It is fun!
Cons
● Slow startup time, since we fetch all the required data first
○ It now takes ~4 minutes for each startup (to load prerequisite data
components)
○ We implemented a local file cache to speed up local development
● More sensitive to memory-leak problems
● It requires a lot of RAM (~4G upon startup, increasing as cached data builds up)
○ May pose a problem when developing locally with limited resources
Cons
● The slow startup time and resource limitations can be mitigated by tuning
the prerequisites configuration
○ We can choose a minimal subset of data to be fetched during startup
○ Other data can be fetched gradually as the service runs -- or disabled
completely during development
Final Notes
Final Notes
● This approach gives us a powerful programming and architecture model --
coupled with relevant tools like Sciencebox and DataQuery:
○ greatly improved productivity of our engineers
○ enabled the self-sufficiency of data scientists
○ improved the rate of iteration on our algorithms -- thus, metrics and
business performance
Thanks!
We’re Hiring!
● Various positions are open at https://careers.sorabel.io
○ Software Engineers
○ Data Scientists
○ Product Designers
● Projects:
○ Massively diverse kinds of projects in a vertically-integrated company
○ Many different companies in one
○ Flexibility to choose projects
We’re hiring!
● Our engineering team is 5-15x smaller than that of the next biggest
e-commerce in the rankings (we only have ~30 engineers!)
○ Each of you joining will have a massive impact on the direction of the
company
● Immediately productive with:
○ Best-in-class, SV-level infrastructure
○ An infrastructure & tooling team that cares
Thanks! Questions?
