US-based Basho Technologies has launched an integrated platform that combines the firm's Riak distributed NoSQL database with a suite of supporting services to deliver a complete system for big data processing.
These supporting services include a data store, the Apache Spark processing framework, and management tools.
Set to be available from this month, the Basho Data Platform builds on the Riak database, now known as Riak KV (for key-value). It is intended to greatly simplify the deployment and operation of a big data platform supporting mission-critical active workloads.
“People select Riak when they need a database that stays on no matter what happens. They have mission-critical systems that cannot go down or cannot suffer a mis-write,” Basho’s EMEA managing director Emmanuel Marchal said.
Basho is now extending the operational simplicity and reliability of Riak to turn it into a complete big data solution. The firm has integrated a number of extra tools and technologies with an eye on the scalability and reliability of the complete platform.
Along with the Riak KV database, the new platform integrates Basho’s Riak S2 (formerly Riak CS) object storage backend and a number of service instances, including Apache Spark, the Redis in-memory data store for caching, and the Apache Solr search engine.
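The article does not detail how Redis is used for caching, but a common arrangement is the cache-aside pattern, where an in-memory cache fronts the durable key-value store. The sketch below illustrates that pattern with plain dicts standing in for Redis and a Riak KV bucket; all names are illustrative, not part of Basho's actual API.

```python
from typing import Optional

riak_kv = {"user:42": b'{"name": "Ada"}'}   # durable key-value store (stand-in)
redis_cache = {}                            # in-memory cache (stand-in)

def get(key: str) -> Optional[bytes]:
    """Serve from cache when possible; fall back to the store and populate the cache."""
    if key in redis_cache:
        return redis_cache[key]             # cache hit
    value = riak_kv.get(key)                # cache miss: read the store of record
    if value is not None:
        redis_cache[key] = value            # populate cache for subsequent reads
    return value

def put(key: str, value: bytes) -> None:
    """Write through to the store of record and invalidate the cached copy."""
    riak_kv[key] = value
    redis_cache.pop(key, None)              # drop the now-stale cache entry

print(get("user:42"))   # first read misses the cache and hits the store
print(get("user:42"))   # second read is served from the cache
```

The value of bundling this in a platform, per Marchal's comments, is that the operator does not have to wire up and monitor the cache layer by hand.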
“Basho intends to add further data storage back-ends to the Basho Data Platform in the future, as well as expand on the number of service instances bundled with the software,” Marchal added.
Basho also provides additional tools, including message routing, networking, and coordination, as well as data replication and cluster management and monitoring services across all the nodes of a Basho Data Platform deployment.
The aim is to offer customers a single solution for big data processing that is relatively easy to set up and operate. Among the additions it brings are tools to ensure the high availability and scalability of components such as Redis and Spark, according to Marchal.
“To deploy Spark as a highly available fault-tolerant cluster, things get a bit complex. Spark is a master-slave architecture and one node needs to be the leader. As soon as you do that, you have a single point of failure; so you need to put something in place to overcome that,” he said.
“Organisations often deploy tools such as Apache Zookeeper, but this is not trivial to use,” Marchal said. Basho instead uses the core technology in Riak and the data platform to provide high-availability support for Spark. Meanwhile, the Spark connector included with the Basho Data Platform lets the Spark cluster retrieve operational data from Riak, perform the analytical calculations, and then write the result back to Riak.
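The read, compute and write-back loop the connector enables can be sketched in miniature as follows. A dict stands in for a Riak KV bucket of operational records, and a plain aggregation stands in for the Spark job; the bucket contents, key scheme and function names are all hypothetical.

```python
from collections import defaultdict

orders_bucket = {  # stand-in for a Riak KV bucket of operational records
    "order:1": {"customer": "acme", "total": 120.0},
    "order:2": {"customer": "acme", "total": 80.0},
    "order:3": {"customer": "globex", "total": 200.0},
}
results_bucket = {}  # stand-in for the bucket the results are written back to

def run_analysis() -> None:
    """Aggregate spend per customer (the 'Spark job') and write results back."""
    totals = defaultdict(float)
    for record in orders_bucket.values():      # read phase: pull operational data
        totals[record["customer"]] += record["total"]
    for customer, total in totals.items():     # write-back phase: persist results
        results_bucket[f"spend:{customer}"] = total

run_analysis()
print(results_bucket)   # {'spend:acme': 200.0, 'spend:globex': 200.0}
```

In the real platform, the read and write phases go through the Spark connector to Riak, and the aggregation runs distributed across the Spark cluster rather than in a single loop.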
Basho did not disclose pricing for the Basho Data Platform but, as with most software based on open source, it is offered to enterprise customers as part of a service and support agreement.