Cask Software, creator of an application development platform for Big Data, has updated the platform and expanded beyond Hadoop through a new partnership with Cassandra company DataStax.
In a blog post, CEO Jonathan Gray discussed Cask Hydrator, a new capability built into version 3.2 of the Cask Data Application Platform (CDAP) that enables data ingestion and ETL from a wide variety of sources. He also noted a new integration with Cassandra that takes the platform beyond its roots in Hadoop, where it has been delivered through partnerships with Cloudera, Hortonworks and MapR.
“As the first example, Cask Hydrator is implemented as an application template for batch and real-time ETL,” wrote Gray. “It defines plug-in APIs for source, transform and sink. You can create instances of an ETL pipeline through JSON configuration. New sources, transforms and sinks can be easily developed as plug-ins in Java.”
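The source, transform and sink roles Gray describes can be illustrated with a minimal Java sketch. The interfaces and class names below (`Source`, `Transform`, `Sink`, `EtlSketch`) are hypothetical and written only for this illustration; they are not CDAP's or Hydrator's actual plug-in API, and the JSON configuration step is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EtlSketch {
    // Hypothetical plug-in roles, invented for illustration (not the CDAP API).
    interface Source<T> { List<T> read(); }
    interface Transform<I, O> { O apply(I record); }
    interface Sink<T> { void write(T record); }

    // A pipeline wires one source, one transform and one sink together.
    static <I, O> void run(Source<I> source, Transform<I, O> transform, Sink<O> sink) {
        for (I record : source.read()) {
            sink.write(transform.apply(record));
        }
    }

    // Builds a tiny batch pipeline and returns what the sink collected.
    static List<String> demo() {
        List<String> collected = new ArrayList<>();
        Source<String> source = () -> Arrays.asList("alice", "bob");
        Transform<String, String> upperCase = String::toUpperCase;
        Sink<String> sink = collected::add;
        run(source, upperCase, sink);
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch, swapping in a different source or sink means supplying another implementation of the same interface, which is the general idea behind a plug-in approach.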
The application templates, Gray wrote, “extend the dataset concept of individual data patterns to complete application patterns. Application Templates are based on the concepts of Applications and Plugins. An application can contain any number of programs like Spark, MapReduce, etc., and those programs can define and reference the API of a plug-in.”
In a news release announcing the DataStax partnership, Cask wrote: “Moving forward, the CDAP road map will support rapid development of real-time data applications on DataStax Enterprise… The first phase includes CDAP’s direct support for Cassandra Datasets, providing the usability of CDAP Dataset libraries for Cassandra users and the flexibility for CDAP applications to run against both Apache HBase and Apache Cassandra. The second phase includes integration of Cassandra with CDAP’s open-source transaction engine, Tephra. This will provide scale-out, fault-tolerant, high-throughput transactions on Cassandra and will allow any application developed on CDAP for HBase to be run on Cassandra without changing any code.”
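The idea of running the same application against either storage system without code changes can be sketched as programming against a dataset interface rather than a specific store. The example below is an assumption-laden illustration: `KeyValueDataset`, `MapBackedDataset` and `DatasetSketch` are hypothetical names, and the two in-memory maps merely stand in for different storage backends; none of this is CDAP's actual Dataset API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class DatasetSketch {
    // Hypothetical dataset abstraction, invented for illustration (not the CDAP API).
    interface KeyValueDataset {
        void put(String key, String value);
        String get(String key);
    }

    // Stand-in backend: wraps any Map so two "stores" can be swapped freely.
    static class MapBackedDataset implements KeyValueDataset {
        private final Map<String, String> store;

        MapBackedDataset(Map<String, String> store) {
            this.store = store;
        }

        @Override
        public void put(String key, String value) {
            store.put(key, value);
        }

        @Override
        public String get(String key) {
            return store.get(key);
        }
    }

    // Application logic depends only on the interface, not on the backend.
    static String recordAndReadBack(KeyValueDataset dataset) {
        dataset.put("user:1", "alice");
        return dataset.get("user:1");
    }

    public static void main(String[] args) {
        // The same application code runs against two different stand-in backends.
        KeyValueDataset first = new MapBackedDataset(new HashMap<>());
        KeyValueDataset second = new MapBackedDataset(new TreeMap<>());
        System.out.println(recordAndReadBack(first));
        System.out.println(recordAndReadBack(second));
    }
}
```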
“By extending our platform to integrate with Cassandra, we will enable a broader set of use cases and allow our customers to have more choices,” Gray said in the statement. “Our solution will also bridge the critical gap of governing and operationalizing data between Cassandra and Hadoop.”
Sep 28, 2015 3:23:00 PM