
DataOps Platform Landscape for 2022

April 29, 2022 in Tutorials

DataOps has become a new focus for data-driven teams with the rise of the fourth industrial revolution. As companies increasingly turn to data not only to gain a competitive advantage but to create entirely new business models, the workflows that surround that data are receiving more and more attention.

The broader engineering community is full of workflow tools, but DataOps platforms are a relatively new concept. The list of data-driven industries is long: machine learning, web3, scientific research, consumer behavior, you name it. Any team that deals with data infrastructure, modeling, or deployment is likely to benefit from streamlined data workflows.

In the past, it was normal to hack together internal solutions for specific blockers and use cases. In 2022, companies are popping up left and right to provide deeper data tooling and features that speed up product and feature development. The benefit: your team can focus on extracting value from your data rather than spending its time on data infrastructure tasks.

In this list, we’ve put together quick descriptions, pros and cons, and comments from review websites for popular DataOps and MLOps platforms, ranging from the most complex data science tools to no-code options for business-facing teams. Based on each platform's features, we also note which kinds of teams it is likely to suit. Let’s get started!

**Please note that the statements in this blog are accurate to the best of our knowledge, based on company websites and third-party reviews as of April 29, 2022. If you work with one of the companies we've listed and we got something wrong, let us know! We'll fix it.**

Databricks

Databricks is considered the largest DataOps provider, having raised a massive $3.5 billion over 10 funding rounds. The company is known for its Lakehouse platform, which combines features of a data warehouse and a data lake to eliminate siloing and is now used by hundreds of companies. It helps customers unify analytics across business, data science, and data engineering teams, and provides tools for data engineering and business teams to build data products.

❌ Free trial

✅ Free/freemium version

Pros:

  • Extensive functionality and features, giving users a lot of usage options
  • Large professional services team for custom work and integrations
  • Extensive list of integrations and supported programming frameworks
  • Collaborative development environment
  • High performance computing + cloud for complex queries

Cons:

  • Spark is the only supported distributed computing framework
  • Development environment is limited to notebooks
  • Built for technical users, with little tooling for no-code developers
  • Project configuration can be complex due to the number of features
  • According to some users, job failure and error reports contain little usable information

Domino Data Lab

Domino is a leading MLOps platform that brings data frameworks, tools, and software together for custom industry use cases. The company caters mainly to enterprises and boasts an impressive list of Fortune 100 clients. The Domino MLOps platform is built to increase data science productivity and model velocity by accelerating modern analytical workflows.

✅ Free trial

❌ Free/freemium version

Pros:

  • Extensive integrations and frameworks, including Jupyter, JupyterLab, RStudio, and VS Code
  • Custom environments and professional services for enterprise clients
  • Effective and intuitive UX/UI
  • Supportive customer success team
  • Seamless connectivity to data warehouses with fast project loading

Cons:

  • Enterprise-focused business model - not suitable for smaller teams
  • Some users have commented on out-of-date documentation
  • The extensive platform functionality can make data manipulation complex

DataRobot

DataRobot is an AI cloud solution that focuses on collaboration within data science teams. Its platform is designed for building, deploying, and managing machine learning models, and it boasts an impressive list of data science features aimed at both machine learning and business ROI. Operating across several industries, DataRobot seeks to “democratize AI”.

✅ Free trial

✅ Free/freemium version

Pros:

  • Unique data science features including time series, forecasting, and velocity estimation
  • Simple user interface and model explainability
  • Users rate customer support highly
  • Documentation is well written and easy to understand
  • Rapid workflows for model development, deployment, and maintenance

Cons:

  • Few features to modify unsupervised learning pipelines
  • Low visibility into algorithms
  • Data cannot be edited within the platform
  • API is complex, with no plug-and-play integrations for Salesforce or other business intelligence tools

Shakudo

Shakudo is a new data platform built for cross-industry use, including machine learning, web3, scientific research, and geospatial data. The Shakudo platform is designed for small and medium-sized businesses, with fast, intuitive project setup, extensive integration with data frameworks and tools, and distributed cloud offerings that promise at least a 25% reduction in your cloud bill compared to major service providers. Shakudo is a fit for teams looking to get started with DataOps and MLOps without DevOps support.

✅ Free trial

Free/freemium version

Pros:

  • Preconfigured and custom environments for instant project setup
  • Functions for the entire data product pipeline, including data prep, dev, and deployment
  • Extensive debugging features with VS Code integration
  • Supports web3 development and other industry-specific nodes
  • Integrates commonly used open source frameworks; built on Kubernetes

Cons:

  • Not suitable for low-code or no-code team members
  • Community is not yet robust
  • Plug-and-play integrations with business intelligence tools are not yet available

Dataiku

Dataiku is a company that combines features used for MLOps, DataOps, and business analytics. The company is heavily focused on enterprise clients, and has a suite of tools for users at each point of the data process. The platform includes intuitive no-code graphing and visualization features, and it supports a wide range of data sources.

✅ Free trial

❌ Free/freemium version

Pros:

  • Collaboration features for task monitoring
  • Visualization features and tooling for low-code and no-code developers
  • Intuitive design and functionality
  • Supports a wide range of data sources, including SQL databases, Teradata, and Hadoop Hive
  • Storage and compute agnostic

Cons:

  • Limited community support
  • Limited ability to integrate with automation tools like Blue Prism
  • Users have commented on limited capabilities for data visualizations

Astronomer

Astronomer is a control plane for Apache Airflow. Built for companies whose stakeholders need to build, run, and observe data pipelines as code, the Astro platform unifies data flows and adds features for managing workflow dependencies and monitoring. Astronomer is a leading commercial contributor to Airflow, a widely used open-source workflow management platform.

❌ Free trial

✅ Free/freemium version

Pros:

  • Wide variety of plugins
  • Workflow and dependency management features
  • Extensive alerting and monitoring features
  • Supports use cases for ingestion, data preparation, and load
  • Suitable for enterprise-level data lakes

Cons:

  • Built specifically for Apache Airflow, so it is inflexible beyond that
  • Low on features for data processing
  • No Okta integration for cloud
  • Dependent on AWS for deployment
  • Inactive support community
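The "pipelines-as-code" idea that Airflow and Astronomer are built around can be illustrated with a minimal sketch. The Python below is not Airflow's API: it uses only the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to declare a hypothetical pipeline as a dictionary of task dependencies and compute an order in which every task runs after the tasks it depends on. The task names are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline declared as code: each task maps to the set of
# tasks that must finish before it can start. This is NOT Airflow's API.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join_tables": {"clean_orders", "extract_customers"},
    "load_warehouse": {"join_tables"},
}


def run_order(graph):
    """Return the tasks in an order where each task follows all of its dependencies.

    Raises ValueError if the declared dependencies contain a cycle.
    """
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # CycleError stores the nodes that form the cycle in args[1].
        raise ValueError(f"pipeline has a dependency cycle: {err.args[1]}") from err


if __name__ == "__main__":
    for step, task in enumerate(run_order(pipeline), start=1):
        print(step, task)
```

A real orchestrator layers scheduling, retries, alerting, and monitoring on top of this ordering; tasks with no dependency path between them (here, the two extract steps) are candidates to run in parallel.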

dbt

dbt is an SQL development environment built to let data engineers take ownership of the entire model workflow. It offers a suite of collaboration tools along with lightweight, fast run times. Using dbt, data teams can work directly within a data warehouse to produce datasets for reporting, ML modeling, and operational workflows.

✅ Free trial

✅ Free/freemium version

Pros:

  • Collaborative environment
  • Extensive version control, testing, and automated documentation features
  • Lightweight, quick run times

Cons:

  • Low on no-code and analyst features
  • SQL-only, which is not sufficient for more complex or industry-specific frameworks
  • Users have commented on high pricing for dbt Cloud
  • Learning resources are not yet robust
  • Does not yet support all data warehouses

Datameer

Datameer is an SQL and no-code platform for exploring, transforming, and building data models. It’s designed for hybrid teams with an intuitive spreadsheet/graphing interface that users compare to Excel. Users can deploy their production models from within the platform, or through integration with dbt.

✅ Free trial

❌ Free/freemium version

Pros:

  • Extensive data source connectivity, easily connects to source data using connectors
  • No-code interface for non-technical users
  • Provides Excel-like interface for data analysts
  • Helpful customer support
  • Simple UX that overcomes Hadoop's usability issues

Cons:

  • Limited to SQL and no-code - not suitable for more complex or industry-specific frameworks
  • Built for Snowflake - usability outside of this is limited
  • No application deployment

Yevgeniy is an experienced leader and entrepreneur in the artificial intelligence industry. Prior to establishing Shakudo, Yevgeniy built applied AI and machine learning product groups at Bank of Montreal, Borealis AI (RBC), Georgian Partners and worked closely with startups to introduce applied research into their products.