xvc

For whom Xvc tolls? Machine Learning and Artificial Intelligence projects need large amounts of data. This data is usually versioned and feeds the pipelines that produce models. Data and models must be tracked together to show which data produced which model. Xvc aims to help ML Engineers, Data Engineers, Data Scientists, Software Engineers, and anyone else who works with data.

Features

Works on top of Git, but is fully usable without it.
Built to track terabytes of data in millions of files.
Create and manage data pipelines composed of steps that depend on files, directories, globs, file lines, regular expressions, and hyperparameters.
GPLv3-licensed free software that does not track its users.
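
To make the idea of a pipeline step that depends on a glob more concrete, here is a minimal Python sketch. It is not Xvc's implementation or API. The function names `digest_of_glob` and `step_is_stale` are hypothetical, and the approach (one SHA-256 digest over all matching files) is just one simple way to detect that a step's inputs have changed.

```python
import hashlib
from pathlib import Path
from typing import Optional


def digest_of_glob(root: Path, pattern: str) -> str:
    """Return one digest summarizing every file under `root` that matches `pattern`.

    Hypothetical helper for illustration; not part of Xvc.
    """
    h = hashlib.sha256()
    # Sort so the digest does not depend on directory listing order.
    for path in sorted(p for p in root.glob(pattern) if p.is_file()):
        data = path.read_bytes()
        # Mix in the relative path and content length so renames and
        # moved bytes between files also change the digest.
        h.update(path.relative_to(root).as_posix().encode("utf-8"))
        h.update(b"\0")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def step_is_stale(root: Path, pattern: str, recorded: Optional[str]) -> bool:
    """A step needs to rerun when the current digest differs from the recorded one."""
    return digest_of_glob(root, pattern) != recorded
```

A runner built on this idea would store the digest after each successful run and skip the step while `step_is_stale` returns `False`. Reading every file in full is only practical for small inputs; a tool meant for terabytes of data would need caching or cheaper change detection.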

Planned Features

More pipeline dependencies, such as SQLite and PostgreSQL queries, S3 buckets, and arbitrary URLs.
Share your data from supported cloud providers with time-limited access.
Run and compare experiments, then bring them into a Git workflow or share them without using Git.
Attach labels and arbitrary annotations to data files. Query and manage your data with these.
Version and deploy models.
Use all commands from Python, Julia or R scripts.
Manage data, create pipelines, and run experiments on remote servers or in containers.
Manage data, create pipelines, and run experiments through a mobile, web, or desktop UI. (Planned as a separate product.)