Top 16 Companies in the Hadoop-as-a-Service (HDaaS) Market


The Global Hadoop-as-a-Service (HDaaS) Market is dominated by a mix of large and medium-sized vendors. Big companies, enterprise software vendors, and core cloud computing vendors are adopting M&A strategies to improve their global presence and extend their reach to customers, acquiring smaller companies to increase their market share and customer base.

In addition, the Global HDaaS Market is witnessing the entry of many big data analytics vendors that compete with the traditional and on-premise vendors in the market.

TechNavio analysts have pinpointed the top 16 companies offering Hadoop-as-a-Service that are expected to help fuel market growth at a whopping CAGR of 84.81 percent from 2014 to 2019.

Amazon Web Services

Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to quickly and cost-effectively process vast amounts of data.

Amazon EMR uses Hadoop, an open source framework, to distribute your data and processing across a resizable cluster of Amazon EC2 instances. It can also run other distributed frameworks such as Spark and Presto. Amazon EMR is used in a variety of applications, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics. Customers launch millions of Amazon EMR clusters every year.
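The "resizable cluster" workflow EMR exposes can be sketched with a minimal launch request. The dict below mirrors the shape of the `run_job_flow` call in AWS's boto3 SDK; the cluster name, release label, and instance types are illustrative assumptions, and the actual submission (shown commented out) would require AWS credentials.

```python
# Minimal sketch of an EMR cluster-launch payload, shaped like the
# parameters boto3's emr.run_job_flow() accepts. The name, release
# label, and instance types here are illustrative assumptions.

def build_emr_request(name, worker_count=4):
    """Return a launch payload for a resizable EMR cluster."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-4.0.0",  # illustrative release label
        "Instances": {
            "MasterInstanceType": "m3.xlarge",
            "SlaveInstanceType": "m3.xlarge",
            "InstanceCount": 1 + worker_count,  # one master + workers
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
    }

request = build_emr_request("log-analysis", worker_count=8)
print(request["Instances"]["InstanceCount"])  # 9

# With AWS credentials configured, the request would be submitted as:
# import boto3
# boto3.client("emr").run_job_flow(**request)
```

Resizing a cluster is then a matter of rebuilding the payload with a different `worker_count`, which is the elasticity the service is marketed on.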


EMC²

EMC’s Data Computing Division is driving the future of data warehousing and analytics with breakthrough products including Greenplum Data Computing Appliance, Greenplum Database, Greenplum Community Edition, Greenplum Apache Hadoop distribution, and Greenplum Chorus™, the industry’s first Enterprise Data Cloud platform. The division’s products embody the power of open systems, cloud computing, virtualization, and social collaboration, enabling global organizations to gain greater insight and value from their data than ever before.


IBM

IBM® InfoSphere® BigInsights™ Standard Edition is an analytics platform, based on open source Apache Hadoop, for analyzing massive volumes of unconventional data in its native format. The software enables advanced analysis and modeling of diverse data, and supports structured, semi-structured and unstructured content to provide maximum flexibility.

InfoSphere BigInsights Standard Edition:

  • Fully integrated, completely compatible – a tested, pre-configured install of Apache Hadoop and associated open source components from the Apache Hadoop ecosystem.
  • Includes Jaql, a declarative query language, to facilitate analysis of both structured and unstructured data.
  • Provides a web-based management console for easier administration and real-time views.
  • Includes BigSheets, a web-based analysis and visualization tool with a familiar, spreadsheet-like interface that enables easy analysis of large amounts of data and long-running data collection jobs.
  • Includes Big SQL, a native SQL query engine that enables SQL access to data stored in BigInsights, leveraging MapReduce for complex data sets and direct access for smaller queries.

Microsoft

HDInsight is a Hadoop distribution powered by the cloud. This means HDInsight was architected to handle any amount of data, scaling from terabytes to petabytes on demand. You can spin up any number of nodes at any time. We charge only for the compute and storage you actually use.

Since it’s 100% Apache Hadoop, HDInsight can process unstructured or semi-structured data from web clickstreams, social media, server logs, devices and sensors, and more. This allows you to analyze new sets of data, uncovering new business possibilities to drive your organization forward.
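The map/shuffle/reduce pattern that Hadoop services like HDInsight apply to such semi-structured data can be illustrated in miniature. This is a plain-Python sketch of a MapReduce-style count of HTTP status codes in server-log lines, not HDInsight's API; on a real cluster, the map and reduce phases run distributed across many nodes, with the framework shuffling the intermediate pairs between them.

```python
from collections import defaultdict

# Toy server-log lines (semi-structured text, like real cluster input).
LOG_LINES = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /api/orders 200",
    "GET /index.html 200",
]

def map_phase(line):
    """Emit (status_code, 1) pairs, one per log line."""
    status = line.rsplit(" ", 1)[-1]
    yield (status, 1)

def reduce_phase(pairs):
    """Sum counts per key, as a reducer would after the shuffle."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

pairs = [pair for line in LOG_LINES for pair in map_phase(line)]
counts = reduce_phase(pairs)
print(counts)  # {'200': 3, '404': 1}
```

The same two-function shape scales because each log line is mapped independently, which is what lets Hadoop spread the work across any number of nodes.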


Altiscale

At Altiscale, we’ve taken our experiences at Yahoo, Google, and LinkedIn to rethink how Apache Hadoop should be offered. We’ve developed a purpose-built, petabyte-scale infrastructure that delivers Apache Hadoop as a cloud service. We then back it with operational support for Hadoop itself and the jobs you run.

Altiscale’s optimized solution is faster, more reliable, easier to use, and more flexible than alternatives. Whether you’re new to Hadoop or just don’t want to invest more time and resources managing Hadoop yourself, get started with Altiscale today.


Cask Data

Our team has built massive-scale platforms and Big Data applications at some of the largest internet companies in the world. We have put our experience and three years of development into technologies that enable our customers to overcome their Big Data challenges.

We’re passionate about software development and developer productivity. We build things we’d want to use and share the tools we use. We know value comes from insights and applications, not infrastructure and glue. Our goal is to enable every developer in the world to deliver that value faster, having more fun with fewer headaches.

We believe the value of Big Data is more than hype, and Hadoop and related open source projects are the best path for organizations to realize that value. Open source is in our DNA, and we lead, contribute to, or utilize open source projects for everything we do.


Cloudera

CDH is the world’s most complete, tested, and popular distribution of Apache Hadoop and related projects. CDH is 100% Apache-licensed open source and is the only Hadoop solution to offer unified batch processing, interactive SQL, interactive search, and role-based access controls. More enterprises have downloaded CDH than all other such distributions combined.

CDH delivers the core elements of Hadoop – scalable storage and distributed computing – along with additional components such as a user interface, plus necessary enterprise capabilities such as security, and integration with a broad range of hardware and software solutions.

All the integration work is done for you, and the entire solution is thoroughly tested and fully documented. By taking the guesswork out of building out your Hadoop deployment, CDH gives you a streamlined path to success in solving real business problems.


FICO

Despite the need to harness the power of Big Data, enterprise data platforms like Hadoop do not include BI or analytics software that makes data readily accessible for business users. FICO® Big Data Analyzer is a purpose-built analytics environment for business users, analysts, and data scientists to gain valuable insights from the exploration and analysis of any type and size of data on Hadoop. It makes Big Data accessible by masking Hadoop complexity, allowing all users to drive more business value from any data.


Google

Take advantage of the performance and cost efficiency of Google Cloud Platform to run Apache Hadoop. Directly access data in Google Cloud Storage and BigQuery from Hadoop. Qubole has partnered with Google Compute Engine (GCE) to provide the first fully-elastic Hadoop service on the platform.

Users looking for big data solutions can take advantage of Compute Engine’s high-performance, reliable and scalable infrastructure and Qubole’s auto-scaling, self-managing, integrated, Hadoop-as-a-Service offering and reduce the time and effort required to gain insights into their business.


Hortonworks

Hortonworks Data Platform enables Enterprise Hadoop: the full suite of essential Hadoop capabilities that are required by the enterprise and that serve as the functional definition of any data platform technology. This comprehensive set of capabilities is aligned to the following functional areas: Data Management, Data Access, Data Governance and Integration, Security, and Operations.

Architected, developed, and built completely in the open, Hortonworks Data Platform (HDP) provides an enterprise ready data platform that enables organizations to adopt a Modern Data Architecture.

With YARN as its architectural center, it provides a data platform for multi-workload data processing across an array of processing methods – from batch through interactive to real-time – supported by the key capabilities required of an enterprise data platform, spanning Governance, Security, and Operations.


HP

Enterprises are drowning in information – too much data and no way to efficiently process it. HP Cloud provides an elastic cloud computing and cloud storage platform to analyze and index data volumes of hundreds of petabytes. Distributed queries run across multiple data sets and return results in near real time.

HP Helion Public Cloud provides the underlying infrastructure required to process big data. We partner with third party solution providers who enable enterprises to better configure, manage, manipulate, and analyze data affordably.


Infochimps

Your team recognizes the power that massively parallel data analysis can provide, and Hadoop is the standard to handle massively scalable data. Cloud::Hadoop, a cloud service delivered by Infochimps™ Cloud, is the ideal Hadoop solution. Turn clusters on at a moment’s notice with advanced elastic spin-up/spin-down capabilities, scale and customize on the fly and leverage tools such as Pig, Hive and Wukong that make Hadoop easier to use and much more useful for enterprises.


MapR Technologies

MapR delivers a complete distribution for Apache Hadoop that combines over a dozen open source packages from the Hadoop ecosystem with enterprise-grade features that provide unique capabilities for management, data protection, and business continuity.

These include: Apache Hive, Apache Pig, Cascading, Apache HCatalog, Apache HBase™, Apache Oozie, Apache Flume, Apache Sqoop, Apache Mahout, and Apache Whirr.

In addition, MapR has released the binaries, source code, and documentation in a public Maven repository, making it easier for developers to build and deploy their Hadoop-based applications.
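A public Maven repository means Hadoop applications can declare the distribution's artifacts as ordinary build dependencies. The `pom.xml` fragment below is only a shape sketch: the repository URL and artifact coordinates are hypothetical placeholders, not MapR's actual published values.

```xml
<!-- pom.xml fragment; the URL and coordinates below are
     hypothetical placeholders, not MapR's real values. -->
<repositories>
  <repository>
    <id>mapr-releases</id>
    <url>https://example.com/mapr/maven</url> <!-- placeholder URL -->
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.x-mapr</version> <!-- hypothetical vendor-patched version -->
  </dependency>
</dependencies>
```

With the repository declared, `mvn package` resolves the Hadoop client libraries like any other dependency, which is the build-and-deploy convenience the paragraph describes.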


Datadog (Mortar Data)

Datadog is a monitoring service that brings together data from servers, databases, applications, tools and services to present a unified view of the applications that run at scale in the cloud. These capabilities are provided on a SaaS-based data analytics platform that enables Dev and Ops teams to work collaboratively on the infrastructure to avoid downtime, resolve performance problems and ensure that development and deployment cycles finish on time.


Pentaho

The Pentaho Business Analytics platform provides Hadoop users with visual development tools and big data analytics to easily prepare, model, visualize, and explore data sets. From data preparation and configuration to predictive analytics, Pentaho covers the data lifecycle end to end with a complete solution to your business intelligence needs.

Pentaho’s Java-based data integration engine works with the Hadoop cache for automatic deployment as a MapReduce task across every data node in a Hadoop cluster, making use of the massive parallel processing power of Hadoop.


Teradata

The Teradata Portfolio for Hadoop is a flexible suite of products and services for our customers to integrate Hadoop into a Teradata environment and across a broader enterprise architecture, while taking advantage of world-class Teradata service and support. It includes products and services to suit every budget and maturity level of Hadoop skills, from an enterprise Hadoop distribution and fully-integrated appliances to consulting and support services on existing customer hardware.