Big Data TechCon

Tutorials for Big Data San Francisco will be posted soon! In the meantime, check out tutorials from Big Data TechCon Boston 2014.

Monday, March 31
Full-Day Tutorial
8:30 AM - 5:00 PM

Big Data Engineering Practices

This full-day tutorial introduces the practice of Big Data Engineering (BDE), defined as the pragmatic application of a systematic, disciplined, quantifiable approach to the end-to-end life cycle of Big Data solutioning. BDE is a holistic body of knowledge comprising several modules. The core is composed of Big Data Discipline Areas (BDDA) and Big Data Lifecycle (BDLC). BDDA focuses on eight crucial areas: Methodology, Program, Governance, Resources, Quality, Risk Mitigation, KPI & Financials, and Competency. BDLC systematically addresses individual stages of Big Data solutions: Inception, Requirement, Analysis, Modeling, Platform, Design, Development, Integration, Testing, Runtime, Deployment, and Operation. Each of these eight areas and 12 stages comprises specific elements as sub-disciplines.

We will drill down to selected components and aspects. For example, the NoSQL platform options include key-value, column-based, document-oriented, graph, NewSQL, and in-memory stores. Case studies and working examples will be discussed in great detail to illustrate the practical use of BDE in real-world implementations. Best practices and lessons learned will also be shared during the session. Topics you will learn about include engineering discipline, life cycle, methodology, governance, and best practices.

Level: Intermediate
Hadoop: A One-Day, Hands-On Crash Course


This full-day tutorial is a fast-paced, vendor-agnostic technical overview of the Hadoop landscape, and is targeted at both technical and non-technical people who want to understand the emerging world of Big Data, with a specific focus on Hadoop. You will be introduced to the core concepts of Hadoop, and dive deep into the critical paths of HDFS, Map/Reduce and HBase. You will also learn the basics of how to effectively write Pig and Hive scripts, and how to choose the correct use cases for Hadoop. During the tutorial, you will have access to an individual one-node Hadoop cluster in Rackspace to run through some hands-on labs for the five software components: HDFS, Map/Reduce, Pig, Hive and HBase.

In each sub-topic, you will be provided links and resource recommendations for further exploration. You will also be given a 100-page PDF slide deck, which can be used as reference material after the course. PDFs will also be given out for the five short, hands-on labs. No prior knowledge of databases or programming is assumed.

Note: You are required to bring a laptop. The instructor is not guaranteed to be available to help you troubleshoot issues that arise during the hands-on portions.

Level: Overview
Introduction to Neo4j

This full-day tutorial helps build a good knowledge of graph databases. It also teaches the core functionality of the Neo4j graph database. With a mixture of theory and hands-on practice sessions, you will quickly learn how easy it is to work with a powerful graph database, using Cypher as the query language.

Note: You do not need any previous experience with Neo4j, NoSQL databases or specific development languages to attend this tutorial. However, you will need your own laptop. Please arrive early to quickly install the product and labs used in the class.

Level: Overview
Half-Day Tutorial
8:30 AM - 12:15 PM

NoSQL for SQL Professionals

With all of the buzz around Big Data and NoSQL (non-relational) database technology, what actually matters for today's SQL professional? Learn more in this tutorial about big data and NoSQL in the context of the SQL world, and get to what's truly important for data professionals today. In this session, you will learn:
  • The main characteristics of NoSQL databases
  • High-level architectural overviews of the most popular NoSQL databases
  • Differences between distributed NoSQL and relational databases
  • Use cases for NoSQL technologies, with real-world examples from organizations in production today
Finally, we will drill down into a NoSQL document database and its underlying distributed architecture with a hands-on tour of how it works in a production environment, including online rebalancing while adding nodes to a cluster, indexing and querying, and cross-data-center replication.

Level: Intermediate
Probabilistic Graphical Models with Factorie

Probabilistic graphical models describe systems whose behavior is inherently probabilistic rather than deterministic. The two most common model types are Bayesian networks and Markov chains. Factorie is a Scala-language toolkit for building factor graphs used in probabilistic modeling. Target applications include natural language processing. This hands-on tutorial will introduce the underlying theory at a conceptual level and show how it applies to several example problems, which we will implement together.

This tutorial will best suit people interested in machine learning problems such as natural language processing, including advanced software developers, data analysts, and scientists with SAS, R, Python, or related experience. No Scala experience is assumed. Some prior exposure to Probabilistic Graphical Models is helpful but not required.

Level: Advanced
1:15 PM - 5:00 PM

Analyzing Big Data with Hive

While Apache Hive is designed to allow users to leverage their SQL skills for Big Data analysis, it’s still a relatively new data warehouse infrastructure based on Hadoop and MapReduce operations. In this tutorial, you will see how Hive can be optimized to outperform expectations.

We will begin with a brief overview of Big Data and Apache Hive, its pros and cons, and focus on the key differences between Hive and traditional data warehouses built on top of relational databases. This introduction builds the foundational perspective for you to understand the key strategies of the operational segment.

During the hands-on portion of the tutorial, we will cover a variety of techniques to increase performance and simplify Hive. Operational topics may include Data Modeling in Hive, Hive Query Language constructs, features and syntax, the Hive Execution Model using MapReduce, and Advanced Optimization. Best Practices of core operations will also be discussed and demonstrated, followed by an opportunity for Q&A. The tutorial will conclude with recommendations and insight on the future of Hive, including developing tools such as Apache Tez.

Level: Intermediate
Data Transfer Tools for Hadoop

Hadoop has become the preeminent platform for storage and large-scale processing of data sets on clusters of commodity hardware. Organizations are increasingly seeking to take advantage of the batch processing capabilities of the Hadoop ecosystem for efficiency and direct cost savings. However, these same organizations are wrestling with moving data from their current data stores to Hadoop and back. In this hands-on half-day tutorial, you will be led through the following common use cases of data transfer from data stores to Hadoop:
  • Moving event data and structured data to Hadoop Clusters using Flume: This part will explain the capabilities of Flume and provide examples of using Flume to move event data from Web servers, access logs and other such structured files.
  • Moving relational data to and fro using Sqoop: This part will explain the capabilities of Apache Sqoop, a tool designed for efficient bulk transfer of data between Hadoop and structured data such as relational databases. The session will also highlight features of the upcoming Sqoop 2.0 platform.
  • Moving data from an enterprise NoSQL Database: This part will provide a detailed overview of the capabilities of the MarkLogic Hadoop Connector, such as: a) Parallel loading from HDFS to MarkLogic, b) Leveraging MarkLogic’s indexes for MapReduce processing, and c) Parallel reads and writes between a MapReduce job and a MarkLogic database.
  • Moving data from MongoDB: This part will provide a detailed overview of the MongoDB Connector for Hadoop, a plug-in for Hadoop that provides the ability to use MongoDB as an input source and/or an output destination.
Each use case will feature coding activities and will include best practices informed by real-world experience in using these tools in various projects.

Level: Advanced
Tuesday, April 1
8:30 AM - 12:00 PM

Analyzing Social Media Streams In Java with Redis

Social media sources are among the biggest data pipelines one can consume. Ingesting and analyzing these firehoses can be challenging and does not match the well-understood request/response behavior of traditional data gathering applications. In this hands-on tutorial, we will build a functional Java application that will ingest a stream of Twitter data and use Redis as a key/value storage mechanism to gather metadata about the ingested data. During this project, we will discuss many unique aspects of consuming firehoses in Java, including:
  • Authentication to stream APIs
  • Ingesting streams without falling behind
  • Storing relevant data efficiently
  • Options for archiving redundancy
The goal for this tutorial is for software developers to have a basic working framework that can be expanded on for their own purposes.

Note: Before attending this tutorial, you must have Git, Java 7, Maven, and Redis Server installed.

Level: Intermediate
Data Structures and Algorithms for Big Databases

This tutorial will explore data structures and algorithms for big databases. The topics include:
  • Data structures including B-trees, Log Structured Merge Trees, Streaming B-trees, and Fractal Trees.
  • Bloom filters and Bloom filter alternatives designed for SSDs.
  • Index design, including covering indexes.
  • Getting good performance in memory.
  • Cache efficiency including both cache-aware and cache-oblivious data structures and algorithms.
These algorithms and data structures are used both in NoSQL implementations such as HBase, LevelDB, MongoDB, and TokuMX, and in SQL-oriented implementations such as MySQL and TokuDB. We will explain the underlying data structures and algorithms that drive big databases. There will also be an emphasis on write-optimized data structures, such as Log Structured Merge Trees or Fractal Trees.
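
As a rough illustration of one structure from the topic list, the following Python sketch shows the basic idea of a Bloom filter: a bit array plus several hash positions per key, answering either "definitely absent" or "possibly present." It is not taken from the tutorial materials or from any of the databases named above. The class name, the sizes, and the SHA-256 salting scheme are assumptions chosen for readability, not performance.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array plus k hash positions per key."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity only

    def _positions(self, key):
        # Derive k positions by salting a single hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(key))


# Hypothetical row keys, used only to exercise the sketch.
bloom = BloomFilter()
for row_key in ("user:1", "user:2", "user:3"):
    bloom.add(row_key)
```

A lookup that returns False lets a database skip a disk read entirely. A True result still requires checking the underlying data, because false positives are possible.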

This tutorial includes explaining and analyzing data structures, so it might not be aimed at someone who hates seeing O(N log N); however, the content will be accessible so that anyone who can tolerate some math will benefit from attending.

Level: Advanced
Getting Started with Cassandra

Unless you have experience with Google BigTable, HBase or Cassandra, column-oriented databases are probably an enigma. Cassandra's data model is both simple and powerful.

It takes some time to get used to the differences between the relational model and Cassandra's column-based model.

Cassandra is not schema-less, but we do not model relationships in Cassandra either. Data Modeling in Cassandra usually consists of finding the best way to denormalize the data when you put the data in the database so that you can retrieve it quickly and efficiently. This workshop will prepare you for success when modeling your data. This tutorial will dive into Cassandra from a developer perspective and give you the tools you need to get started with Cassandra today.
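
The denormalization idea can be sketched in a few lines of plain Python. This is only a conceptual model under stated assumptions: the dictionaries stand in for query-specific tables keyed by partition key, the names `videos_by_user`, `videos_by_tag`, and `add_video` are hypothetical, and no Cassandra driver or CQL is involved.

```python
from collections import defaultdict

# Each dict stands in for one query-specific table, keyed by its partition key.
videos_by_user = defaultdict(list)
videos_by_tag = defaultdict(list)


def add_video(user, title, tags):
    """Write the same video once per query we need to serve (denormalization)."""
    videos_by_user[user].append(title)
    for tag in tags:
        videos_by_tag[tag].append(title)


# Hypothetical sample data.
add_video("alice", "Intro to CQL", ["cassandra", "cql"])
add_video("bob", "Wide rows explained", ["cassandra"])

# Each read is a single lookup by key; no join is needed at query time.
print(videos_by_user["alice"])
print(videos_by_tag["cassandra"])
```

The trade-off is that writes do more work and data is duplicated, in exchange for reads that touch exactly one key.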

This tutorial will cover:
    •    An introduction to Cassandra in the context of relational databases and non-relational alternatives.
    •    Best practices for modeling your data in Cassandra
    •    Cassandra Query Language (CQL version 3)
    •    Wide and Composite Columns
    •    Practical Examples
    •    Anti-Patterns (things to avoid)

For a more advanced look at Cassandra, attend the "Apache Cassandra -- A Deep Dive" class.
Level: Overview
Hadoop Programming with Scalding

Scalding is a Scala-language API for writing advanced data workflows for Hadoop. Unlike low-level APIs, it provides intuitive pipes and filters idioms, while hiding the complexities of MapReduce programming. Scalding wraps the Java-based Cascading framework in Functional Programming concepts that are ideal for data problems, especially the mathematical algorithms for machine learning. Scalding code is very concise compared to comparable Java code in Cascading or the low-level Hadoop API, providing far greater productivity. Even non-developer data analysts could learn Scalding.

In this hands-on tutorial aimed at advanced Java developers and data scientists, we will work through exercises that demonstrate these points. You will see that Scalding is an ideal tool when a full-featured and flexible toolset for Big Data applications is needed beyond what Hive or Pig can provide.

Level: Advanced
Thinking in HBase: Developing Big Data Solutions Using HBase

HBase lets you build big data applications that scale, but doing so requires implementing applications differently than with traditional relational databases. For example, in HBase you cannot do transactions spanning multiple tables. In this tutorial, you will learn how, in some cases, to work around this limitation to implement transactions using HBase. We will explore real-world big data problems and talk about architectures, APIs, and key parts of HBase for developing efficient big data applications to solve these problems. We will go through some best practices and how to apply them, using demos and code.

Note: There are special pre-setup instructions with the slides called "MapR Sandbox Install Guide." Please ensure you follow them before attending this tutorial.

Level: Intermediate


A BZ Media Production