DataOps Platform Landscape for 2022
DataOps has become a new focus for data-driven teams with the rise of the fourth industrial revolution. As companies increasingly turn to data not only to gain a competitive advantage but also to create entirely new business models, the workflows built around that data are drawing more and more attention.
The broader engineering community is full of workflow tools, but DataOps platforms are a relatively new concept. The list of data-driven industries that stand to benefit is long - machine learning, web3, scientific insights, consumer behavior, you name it. Any team that deals with data infrastructure, modeling, or deployment could likely benefit from streamlined data workflows.
In the past, it was normal to hack together internal solutions for specific blockers and use cases. Enter 2022, and companies are popping up left and right to provide deeper data tooling and features that speed up product and feature growth. The benefit: your team can focus on extracting value from your data rather than spending its time on data infrastructure tasks.
In this list, we’ve put together quick descriptions, pros and cons, and comments from review websites for popular DataOps and MLOps platforms - from the most complex data science tools to no-code options for business-facing teams. We’ll tell you which platform might suit your company depending on its features. Let’s get started!
**Please note that the statements in this blog are true to the best of our knowledge, based on company websites and third-party review statements as of April 29, 2022. If you work with one of the companies we've listed and we got something wrong - let us know! We'll fix it.**
Databricks
Databricks is considered the largest DataOps provider, having secured a massive $3.5 billion in 10 funding rounds. The company is known for its Lakehouse platform, combining features of a data warehouse and data lake to eliminate siloing, now used by hundreds of companies. It helps its customers unify their analytics across the business, data science, and data engineering, and provides tools for data engineering and business teams to build data products.
❌ Free trial
✅ Free/freemium version
Pros:
- Extensive functionality and features, giving users a lot of usage options
- Large professional services team for custom work and integrations
- Extensive list of integrations and supported programming frameworks
- Collaborative development environment
- High-performance computing and cloud resources for complex queries
Cons:
- Supports only Spark as a distributed computing framework
- Development environment is limited to notebooks
- Built for technical users, little tooling for no-code developers
- Project configuration can be complex due to the number of features
- Some users say job failure and error reports contain little usable information
Domino Data Lab
Domino is a leading MLOps platform that combines data frameworks, tools, and software for custom industry use cases. The company caters mainly to enterprises, boasting an impressive list of Fortune 100 clients. The Domino MLOps platform is built to increase data science productivity and model velocity by accelerating modern analytical workflows.
✅ Free trial
❌ Free/freemium version
Pros:
- Extensive integrations and frameworks including Jupyter, JupyterLab, RStudio, and VS Code
- Custom environments and professional services for enterprise clients
- Effective and intuitive UX/UI
- Supportive customer success team
- Seamless connectivity to data warehouses with fast project loading
Cons:
- Enterprise-focused business model - not suitable for smaller teams
- Some users have commented on out-of-date documentation
- Extensive platform functionality can make data manipulation complex
DataRobot
DataRobot is an AI cloud solution that focuses on collaboration within data science teams. The platform is designed for building, deploying, and managing machine learning models, and it offers an impressive list of data science features for machine learning and business ROI. Operating across several industries, DataRobot seeks to “democratize AI”.
✅ Free trial
✅ Free/freemium version
Pros:
- Unique data science features including time series, forecasting, and velocity estimation
- Simple user interface and explainability
- Users have rated customer support well
- Documentation is well written and easy to understand
- Rapid workflows for model development, deployment, and maintenance
Cons:
- Few features to modify unsupervised learning pipelines
- Low visibility into algorithms
- Data cannot be edited within the platform
- API is complex - no plug and play integrations for Salesforce and other business intelligence tools
Shakudo
Shakudo is a new data platform built for cross-industry use including machine learning, web3, scientific insights, and geospatial data. The platform is designed for small to medium-sized businesses, with fast, intuitive project setup and extensive integration with data frameworks and tools. Its distributed cloud offerings promise a minimum 25% reduction on your cloud bill compared to major service providers. Shakudo is a fit for teams looking to get started with DataOps and MLOps without DevOps support.
✅ Free trial
Pros:
- Preconfigured and custom environments for instant project setup
- Functions for the entire data product pipeline, including data prep, dev, and deployment
- Extensive debugging features with VS Code integration
- Supports web3 development and other industry-specific nodes
- Integrates commonly used open source frameworks; built on Kubernetes
Cons:
- Not suitable for low-code or no-code team members
- Community is not yet robust
- Plug and play integrations with business intelligence tools are not yet available
Dataiku
Dataiku is a company that combines features used for MLOps, DataOps, and business analytics. The company is heavily focused on enterprise clients, and has a suite of tools for users at each point of the data process. The platform includes intuitive no-code graphing and visualization features, and it supports a wide range of data sources.
✅ Free trial
❌ Free/freemium version
Pros:
- Collaboration features for task monitoring
- Visualization features and tooling for low-code and no-code developers
- Intuitive design and functionality
- Supports a wide range of data sources, including SQL, Teradata, and Hadoop Hive
- Storage and compute agnostic
Cons:
- Limited community support
- Limited ability to integrate with automation tools like Blue Prism
- Users have commented on limited capabilities for data visualizations
Astronomer
Astronomer is a control plane for Apache Airflow. Built for companies with various stakeholders who need to build, run, and observe data pipelines-as-code, the Astro platform provides unified data flows with features built for workflow dependencies and monitoring. Astronomer is the commercial developer of Airflow, a commonly used workflow management system.
❌ Free trial
✅ Free/freemium version
Pros:
- Wide variety of plugins
- Workflow and dependency management features
- Extensive alerting and monitoring features
- Supports use cases for ingestion, data preparation, and load
- Suitable for enterprise-level data lakes
Cons:
- Built specifically for Apache Airflow - inflexible beyond that
- Low on features for data processing
- No Okta integration for cloud
- Dependent on AWS for deployment
- Inactive support community
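To make the "pipelines-as-code" idea from the Astronomer description more concrete, here is a minimal sketch of declaring task dependencies in code and running the tasks in a valid order. It uses only the Python standard library and is not the Airflow or Astro API; the task names (`extract`, `transform`, `load`) and the `run_pipeline` helper are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def extract():
    # Hypothetical source data for the sketch.
    return [3, 1, 2]


def transform(rows):
    return sorted(rows)


def load(rows, sink):
    sink.extend(rows)


def run_pipeline():
    """Run the tasks in dependency order; return (execution order, loaded rows)."""
    # Each task name maps to the set of tasks it depends on.
    dependencies = {
        "extract": set(),
        "transform": {"extract"},
        "load": {"transform"},
    }
    order = list(TopologicalSorter(dependencies).static_order())

    results, sink = {}, []
    for task in order:
        if task == "extract":
            results[task] = extract()
        elif task == "transform":
            results[task] = transform(results["extract"])
        elif task == "load":
            load(results["transform"], sink)
    return order, sink
```

Calling `run_pipeline()` runs the three tasks with upstream tasks first. A workflow manager adds scheduling, retries, alerting, and monitoring on top of this basic dependency-ordering idea.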
dbt
dbt is an SQL development environment built to let data engineers take ownership of the entire model workflow. It offers a suite of collaboration tools with lightweight and fast run times. Using dbt, data teams are able to work directly within a data warehouse to produce datasets for reporting, ML modeling, and operational workflows.
✅ Free trial
✅ Free/freemium version
Pros:
- Collaborative environment
- Extensive version control, testing, and automated documentation features
- Lightweight, quick run times
Cons:
- Low on no-code and analyst features
- SQL only - not sufficient for more complex or industry-specific frameworks
- Users have commented on high cloud pricing
- Learning resources are not yet robust
- Does not yet support all data warehouses
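The dbt description above talks about working directly within a data warehouse to produce datasets. As a rough illustration of that pattern, the sketch below treats a model as a plain SELECT statement that gets materialized as a table, then runs a simple null check on the result. It uses Python's built-in `sqlite3` as a stand-in for a warehouse and is not dbt itself; the table names, the `build_model` and `count_nulls` helpers, and the sample rows are all assumptions made for the example.

```python
import sqlite3

# Hypothetical model: a SELECT statement materialized as a table.
DAILY_REVENUE_MODEL = """
    SELECT order_date, SUM(amount) AS revenue
    FROM raw_orders
    GROUP BY order_date
"""


def build_model(conn, name, select_sql):
    """Materialize a SELECT statement as a table inside the database."""
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} AS {select_sql}")


def count_nulls(conn, table, column):
    """Simple data test: count rows where the column is NULL."""
    return conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    ).fetchone()[0]


def demo():
    # In-memory SQLite database standing in for a data warehouse.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (order_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?)",
        [("2022-04-01", 10.0), ("2022-04-01", 5.0), ("2022-04-02", 7.5)],
    )
    build_model(conn, "daily_revenue", DAILY_REVENUE_MODEL)
    rows = conn.execute(
        "SELECT order_date, revenue FROM daily_revenue ORDER BY order_date"
    ).fetchall()
    return rows, count_nulls(conn, "daily_revenue", "revenue")
```

The design point is that the transformation lives in SQL and runs where the data already sits, while the surrounding code only handles materialization and testing.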
Datameer
Datameer is an SQL and no-code platform for exploring, transforming, and building data models. It’s designed for hybrid teams with an intuitive spreadsheet/graphing interface that users compare to Excel. Users can deploy their production models from within the platform, or through integration with dbt.
✅ Free trial
❌ Free/freemium version
Pros:
- Extensive data source connectivity, easily connects to source data using connectors
- No-code interface for non-technical users
- Provides Excel-like interface for data analysts
- Helpful customer support
- Overcomes Hadoop usability issues with a simple UX
Cons:
- Limited to SQL and no-code - not suitable for more complex or industry-specific frameworks
- Built for Snowflake - usability outside of this is limited
- No application deployment