- TensorFlow Servables
- TensorFlow Servable Versions
- TensorFlow Servable Streams
- TensorFlow Models
- TensorFlow Loaders
- Sources in TensorFlow Architecture
- TensorFlow Managers
- TensorFlow Core
- TensorFlow Batcher
TensorFlow Servables
Servables are the fundamental units in TensorFlow Serving: they are the underlying objects that clients use to carry out computation (for example, a lookup or an inference).
The size and granularity of a servable are flexible. Servables can be of any type and interface, enabling flexibility and future enhancements such as:
- Asynchronous modes of operation
- Streaming results
- Experimental APIs
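To make the "opaque computation object" idea concrete, here is a minimal hypothetical sketch; the class and method names are invented for illustration and are not part of the real TensorFlow Serving API:

```python
# Hypothetical sketch of the servable concept: an opaque object that
# clients call to carry out a computation. Names are illustrative and
# not part of the real TensorFlow Serving API.
class ToyServable:
    def __init__(self, table):
        self._table = table  # e.g. a small lookup/embedding table

    def run(self, key):
        # Clients see only the computation, not how the data is
        # stored, versioned, or loaded.
        return self._table[key]

servable = ToyServable({"cat": [0.1, 0.9], "dog": [0.8, 0.2]})
print(servable.run("cat"))  # prints [0.1, 0.9]
```

The point of the abstraction is on the client side: a servable could wrap a full model, a lookup table, or a shard of one, and the caller's code would not change.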
TensorFlow Servable Versions
TensorFlow Serving can manage one or more versions of a servable over the lifetime of a single server instance. This opens the door for fresh algorithm configurations, weights, and other data to be loaded over time. Versions also allow more than one version of a servable to be loaded concurrently, supporting gradual roll-out and experimentation. At serving time, clients may request either the latest version or a specific version id for a particular model.
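For example, the TensorFlow Serving model server can be pointed at a model config file that pins which versions to serve; the model name and base path below are placeholders:

```
model_config_list {
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
  }
}
```

With this policy both versions stay loaded, so some clients can be routed to version 2 while the rest remain on version 1.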
TensorFlow Servable Streams
A servable stream is the sequence of versions of a servable, sorted by increasing version number.
TensorFlow Models
TensorFlow Serving represents a model as one or more servables. A machine-learned model may contain one or more algorithms (including learned weights) and lookup or embedding tables. A servable can also serve as a fraction of a model; for instance, a huge lookup table can be sharded across many instances.
TensorFlow Loaders
Loaders manage the life cycle of a servable. The Loader API enables common infrastructure that is independent of the specific learning algorithms, data, or product use-cases involved. In particular, Loaders standardize the APIs for loading and unloading a servable.
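As a sketch of what the Loader contract standardizes — the real Loader API is a C++ interface inside TensorFlow Serving, so the Python names below are purely illustrative:

```python
# Illustrative sketch of the Loader idea: a uniform load/unload API that
# hides the algorithm- and data-specific details of a servable. This is
# not the real (C++) TensorFlow Serving Loader interface.
class TableLoader:
    def __init__(self, path):
        self._path = path   # where the servable's data would live
        self._table = None

    def load(self):
        # A real loader would read weights or tables from self._path;
        # here we fake the loaded state.
        self._table = {"weights": [1.0, 2.0, 3.0]}

    def unload(self):
        # Release the servable's resources.
        self._table = None

    def servable(self):
        return self._table

loader = TableLoader("/models/my_table/1")  # placeholder path
loader.load()
print(loader.servable()["weights"])  # prints [1.0, 2.0, 3.0]
loader.unload()
```

Because every loader exposes the same load/unload surface, the serving infrastructure can manage any servable type without knowing what is behind it.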
Sources in TensorFlow Architecture
Sources are the modules that find and provide servables. Each Source in TensorFlow provides zero or more servable streams. For each servable stream, a Source supplies one Loader instance for each version it makes available to be loaded.
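A rough sketch of that contract, under invented names (the real Source API is C++ and callback-based):

```python
# Illustrative sketch: a Source discovers versions and supplies one
# loader per version of each servable stream it advertises.
class ToySource:
    def __init__(self, discovered):
        # e.g. {"my_model": [2, 1]} found under some base path
        self._discovered = discovered

    def aspired_versions(self, make_loader):
        """Yield (servable_name, version, loader), one loader per version."""
        for name, versions in self._discovered.items():
            for version in sorted(versions):  # a stream: increasing versions
                yield name, version, make_loader(name, version)

source = ToySource({"my_model": [2, 1]})
emitted = [(name, v) for name, v, _ in
           source.aspired_versions(lambda n, v: f"loader({n},{v})")]
print(emitted)  # prints [('my_model', 1), ('my_model', 2)]
```

Note that the Source only advertises versions and hands over loaders; deciding whether and when to actually load them is the Manager's job, described next.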
TensorFlow Managers
Managers handle the full lifecycle of servables, including:
- Loading Servables
- Unloading Servables
- Serving Servables
Managers listen to Sources and track all versions. A Manager tries to fulfill a Source’s requests but may decline to load an aspired version, and it may also postpone an “unload”. For example, a Manager may wait to unload an old version until a newer version has finished loading, following a policy that guarantees at least one version is loaded at all times. Managers also expose a simple, narrow interface, GetServableHandle(), that clients use to access loaded servable instances.
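A hedged sketch of this behavior, with invented names (the real API’s GetServableHandle() appears here as get_servable_handle):

```python
# Sketch of Manager behavior: load aspired versions, keep at least one
# version available, and hand out handles to loaded servables. Not the
# real TensorFlow Serving API; names are illustrative.
class FakeLoader:
    def __init__(self, payload):
        self._payload = payload

    def load(self):
        pass  # a real loader would materialize the servable here

    def servable(self):
        return self._payload

class Manager:
    def __init__(self):
        self._loaded = {}  # (name, version) -> servable

    def on_aspired_version(self, name, version, loader):
        loader.load()
        self._loaded[(name, version)] = loader.servable()
        # Unload older versions only after the newer one has loaded,
        # so at least one version is always serving.
        for n, v in list(self._loaded):
            if n == name and v < version:
                del self._loaded[(n, v)]

    def get_servable_handle(self, name, version=None):
        # Clients may request a specific version or the latest one.
        if version is None:
            version = max(v for n, v in self._loaded if n == name)
        return self._loaded[(name, version)]

manager = Manager()
manager.on_aspired_version("my_model", 1, FakeLoader("model-v1"))
manager.on_aspired_version("my_model", 2, FakeLoader("model-v2"))
print(manager.get_servable_handle("my_model"))  # prints model-v2
```

The ordering in on_aspired_version mirrors the policy described above: the new version is fully loaded before the old one is dropped, so a request arriving mid-transition always finds a servable.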
TensorFlow Core
Using the standard TensorFlow Serving APIs, TensorFlow Serving Core manages the following aspects of servables:
- Lifecycle
- Metrics
TensorFlow Serving Core treats servables and loaders as opaque objects.
TensorFlow Batcher
Batching multiple requests into a single request can significantly reduce the cost of performing inference, especially in the presence of hardware accelerators such as GPUs. TensorFlow Serving includes a request batching widget that lets clients easily batch their type-specific inferences across requests into batched requests that algorithm systems can process more efficiently.
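The core idea can be sketched in a few lines; this is an assumption-laden toy, not TensorFlow Serving’s actual batching scheduler (which also batches on a timeout and across threads):

```python
# Illustrative sketch of request batching: individual requests are
# accumulated and the model is invoked once per batch, amortizing
# per-call overhead on accelerators such as GPUs.
class Batcher:
    def __init__(self, max_batch_size, infer_batch):
        self._max = max_batch_size
        self._infer = infer_batch   # runs inference on a list of inputs
        self._pending = []

    def submit(self, request):
        """Queue a request; run the batch once it is full."""
        self._pending.append(request)
        if len(self._pending) >= self._max:
            return self.flush()
        return []  # caller waits; a real batcher would also use a timeout

    def flush(self):
        batch, self._pending = self._pending, []
        return self._infer(batch) if batch else []

# A toy "model" that doubles each input: three requests, one model call.
batcher = Batcher(3, lambda xs: [2 * x for x in xs])
results = batcher.submit(1) + batcher.submit(2) + batcher.submit(3)
print(results)  # prints [2, 4, 6]
```

Three client requests resulted in a single invocation of the model function, which is exactly the amortization that makes batching pay off on hardware with high per-call overhead.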