Limitations
Blocking Operations
- Pipelines currently does not support asynchronous operations. All embedding operations are blocking and take place in the foreground.
The impact of this depends on what type of embedding is being performed.
- When bulk embedding, the system will be blocked until the embedding is complete for all the items currently in the database or storage.
- When using auto-embedding, the system will be blocked until the embedding is complete for each item being inserted or updated.
Observability
Observability is currently limited to the logs and metrics provided by the underlying components. There is no single pane of glass for monitoring the entire system.
Monitoring progress of initial embedding is also not currently available. We recommend using the system logs to track progress.
Large documents
- The current implementation of Pipelines does not handle the chunking of large documents.
Data Filtering
- While Pipelines can use SQL filters and views based on the content of rows to limit embedded documents, the system does not currently support the ability to similarly filter on data in S3 storage. It is limited to using sub-paths and prefix filtering.
Load Balancing for Models
- There is currently no load balancing mechanism for model access.
Data Formats
- Pipelines currently only supports Text and Image formats. Other formats, including structured data, video, and audio, are not currently supported.
Could this page be better? Report a problem or suggest an addition!