Big Data TechCon | April 26-28, 2015 | Boston, MA

LATEST NEWS

Rancher Labs builds Linux system for Docker

As Docker continues to gain popularity, more and more minimalist operating systems are emerging to run the platform in production and at scale. Rancher Labs recently announced a new open-source operating system designed explicitly for Docker.

Feb 27, 2015 3:45:00 PM

Topics: Container Tech

Apache HBase Hits 1.0

After eight years of development, Apache HBase officially reached version 1.0 yesterday. This stable release includes more than 1,500 bug fixes and changes, and has fully revised documentation. HBase is the database that runs on top of Hadoop, and can be used to store data migrated from traditional relational stores on the Hadoop Distributed File System (HDFS).

Michael Stack, vice president of Apache HBase, said that version 1.0 “marks a major milestone in the project’s development. It is a monumental moment that the army of contributors who have made this possible should all be proud of. The result is a thing of collaborative beauty that also happens to power key, large-scale Internet platforms.”

Performance improvements topped the list of features for version 1.0. Old APIs are being replaced on the client side, with HTableInterface, HTable and HBaseAdmin all deprecated and slated for removal in the 2.x releases. (No date has yet been set for those releases.)

Facebook’s contributions to HBase had been left behind at a branch of version 0.89, but those changes were finally merged into the 1.0 release. They include allowing a subset of the server configuration to be reloaded, without requiring a restart of the region servers.

Mike Hoskins, CTO of Actian, said that his company uses HBase internally. “I love it. It’s one level up from HDFS, it’s columnar, has a flexible schema, and it’s time-based,” he said.

“We use it as an event historian. It’s an infinitely large historian, where we can look through all the timestamps coming through our pipeline. These pipelines send time-stamped metrics that are generated from events, and we have to eat them, and process them, and derive big insights and big models from these streams. Time is a first-class dimension [in HBase], which I am a big believer in.”
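
The “event historian” idea Hoskins describes — every write keeps its timestamp, and you can read the store as of any point in time — can be sketched in a few lines. This is a toy illustration of versioned, time-stamped cells, not HBase itself or its client API; all names here are hypothetical.

```python
from bisect import insort
from collections import defaultdict

class VersionedStore:
    """Toy key-value store where every write keeps its timestamp,
    loosely mimicking HBase-style versioned cells (illustrative only)."""

    def __init__(self):
        # row key -> sorted list of (timestamp, value) pairs
        self._cells = defaultdict(list)

    def put(self, key, value, ts):
        insort(self._cells[key], (ts, value))  # keep versions in time order

    def get(self, key, as_of=None):
        """Return the newest value at or before `as_of` (latest if None)."""
        versions = self._cells[key]
        if as_of is None:
            return versions[-1][1] if versions else None
        candidates = [v for t, v in versions if t <= as_of]
        return candidates[-1] if candidates else None

store = VersionedStore()
store.put("sensor-1", 20.5, ts=100)
store.put("sensor-1", 21.0, ts=200)
print(store.get("sensor-1"))             # → 21.0 (the newest value)
print(store.get("sensor-1", as_of=150))  # → 20.5 (the value as of t=150)
```

Keeping time as a first-class dimension means queries like “what did this metric look like at timestamp t?” fall out naturally, which is the property Hoskins is praising.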

Feb 25, 2015 1:46:00 PM

Topics: Apache HBase

Hadoop Wars: Hortonworks v. Cloudera in 2015

Tomorrow after the market closes, Hortonworks will report its quarterly results in its first earnings call since its IPO in December. Analysts expect encouraging results, but regardless of the outcome, all of the major Hadoop vendors are set to square off in 2015.

Feb 23, 2015 3:00:00 PM

Topics: hadoop

Strata highlights mature Hadoop ecosystem

Feb 19, 2015 10:48:00 AM

Topics: hadoop

A Big Data roundup from Strata + Hadoop World

Feb 19, 2015 9:45:42 AM

Topics: Big Data

Hitachi to acquire Big Data analytics company Pentaho

Japanese enterprise technology company Hitachi has announced its plans to acquire Big Data analytics company Pentaho.

Feb 18, 2015 2:54:48 PM

Topics: Big Data

Survey: 70% of enterprises adopting Docker containers

As Docker and application container software become more widespread in deployment automation, enterprises are jumping on the bandwagon in droves.

Feb 18, 2015 1:51:48 PM

Topics: databases

An attorney's perspective on data security and privacy

Jeffrey Kosseff started his career in privacy and security as a journalist for the Oregonian, right before the dot-com bubble burst. A fascination with the privacy issues of the early Web led him back to law school, and he now works at the Washington, D.C. office of Covington & Burling, LLP. In his current position, he advises media and technology companies on compliance with privacy laws and managing user expectations, and shared some insight on the current state of data privacy laws and regulations.

Feb 4, 2015 10:26:00 AM

Topics: Big Data, data privacy

Apache Storm turns toward security in 2015

Since it was created in 2011, Storm has garnered a lot of attention from the Big Data and stream-processing worlds. In September 2014, the project finally reached top-level status at the Apache Foundation, making 2015 the first full year in which Storm will be considered “enterprise ready.” But that doesn’t mean there’s not still plenty of work to do on the project in 2015.

Taylor Goetz, Apache Storm project-management committee chair and member of the technical staff at Hortonworks, said that the road map for Storm in 2015 shows a path through security territory. “Initially, Storm was conceived and developed to be deployed in a non-hostile environment. Storm is a distributed system. There are multiple nodes,” he said.

Thus, Storm was not originally designed with security in mind. That’s changing this year, said Goetz, as the team works to add authentication and process controls to the system. Some of that work will be featured in future branches of the project.

“We’ve allowed every process to authenticate against all the other components in that cluster,” he said. “All interactions are authenticated with Kerberos, [and we] also have the concept of individual users. A unit of computation, in Storm, is called a ‘topology.’ With the security features we’ve added, the topology itself runs as the user that submitted it, which allows us to implement security.”

Storm is part of an increasingly crowded world of data tools that have clustered around Hadoop. Hortonworks offers frontline support for the project now as well, cementing the usefulness of Storm in Hadoop environments. But that’s not to say there isn’t still some confusion in the marketplace as to what Storm is used for.

Specifically, Goetz said that the use cases for Apache Spark and Apache Storm are different, despite some overlaps in their capabilities.

“Storm is a stream-processing framework that allows one-at-a-time processing or micro-batching,” he said. “Spark is a batch-processing framework that also allows micro-batching.”
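
The distinction Goetz draws can be made concrete with a toy sketch: one-at-a-time processing handles each event the moment it arrives, while micro-batching groups events and handles them together. This is illustrative pseudologic only, not Storm or Spark code.

```python
# Toy contrast between one-at-a-time stream processing and
# micro-batching (illustrative only; not the Storm or Spark APIs).

def process_one_at_a_time(events, handle):
    """Stream style: each event is handled as soon as it arrives."""
    for event in events:
        handle(event)

def process_micro_batches(events, handle_batch, batch_size=3):
    """Micro-batch style: events are grouped and handled together."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            handle_batch(batch)
            batch = []
    if batch:                      # flush the final partial batch
        handle_batch(batch)

events = [1, 2, 3, 4, 5]
process_one_at_a_time(events, lambda e: print("event:", e))
process_micro_batches(events, lambda b: print("batch:", b))
```

The trade-off is latency versus per-event overhead: one-at-a-time processing reacts immediately, while micro-batching amortizes fixed costs across the batch, which is why both frameworks support it.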

The advantage of using something like Storm with a Hadoop Big Data system, said Goetz, is that it allows developers and analysts to bridge the gap between batch jobs that finish in a few hours and the information that’s coming into the system right this moment.

He added that Hadoop can be used to store all the raw data coming into Storm, while Storm does all the data processing and transforming as it arrives. If errors occur during this process, the administrator can just hit rewind and feed in the raw data that Hadoop has stored again.
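
The store-raw-then-replay pattern Goetz describes can be sketched in miniature: raw events land in durable storage (standing in for Hadoop) before a transform step (standing in for Storm) processes them, so a failed or buggy run can simply be re-fed from the stored raw data. All names here are hypothetical.

```python
# Sketch of the store-raw-then-replay pattern (illustrative only):
# raw events are kept durably before being transformed, so processing
# can be rerun from the raw copies at any time.

raw_store = []          # durable raw-event log (Hadoop's role here)

def transform(event):
    """The processing step (Storm's role here): derive a metric."""
    return {"value": event["value"] * 2, "ts": event["ts"]}

def ingest(event):
    raw_store.append(event)   # always persist the raw copy first
    return transform(event)

def replay(since_ts=0):
    """Re-run the transform over stored raw events, e.g. after a bug fix."""
    return [transform(e) for e in raw_store if e["ts"] >= since_ts]

ingest({"value": 3, "ts": 1})
ingest({"value": 5, "ts": 2})
print(replay())   # reprocesses everything from the raw log
```

Because the raw log is the source of truth, fixing the transform and calling `replay` is the “hit rewind” step the article describes.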

Goetz said that he personally will be spending his time on Storm helping to bring order to the various integration points it now has. This list of connectors should grow as 2015 moves on, he said.

Feb 3, 2015 2:12:22 PM

Topics: Apache Storm

Data Governance Initiative expands the Hadoop ecosystem

Hadoop has, for the most part, moved beyond the proof-of-concept phase and the initial chasm of adoption. More and more organizations are putting the open-source framework to work on mountains of complex Big Data. The next step in Hadoop’s evolution is getting a handle on governance.

To that end, Hortonworks—the enterprise data platform provider and open-source Apache Hadoop contributor—announced a new Data Governance Initiative (DGI) to design and implement a comprehensive, centralized approach to data governance. The initiative’s goals range from defining governance standards and protocols within Hadoop, to creating real-time auditable and traceable data landscapes.

Feb 3, 2015 2:04:00 PM

Topics: hadoop