Big Data TechCon



Chris Anderson

Chris Anderson is a co-founder of Couchbase and Chief Architect of the company’s mobile technology. Chris has a personal obsession with bending the physics of the Web and giving control back to users. Chris is co-author of “CouchDB: The Definitive Guide” and has spoken at a number of conferences including: SXSW, OSCON, MySQL, GDC, ApacheCon and the Erlang Factory.
Twitter : @jchris

(Click here for classes)

Sync is the Future of Mobile (Big) Data

Naveen is an Associate Professor in Research Neurology (Informatics) at the University of Southern California (USC), Los Angeles. He is also a founder and Principal of Cognie Inc., a text analytics R&D and consulting organization. His research expertise is in the broad area of artificial intelligence (AI) applied to (big) data analysis and management, with a particular focus on technologies for automated information extraction from text, machine learning, information integration and information semantics. A significant portion of his work has been in the domain of health and biomedical informatics. He has published close to 80 technical papers in top journals and forums in the field and has also authored two books in the areas of Geospatial Semantics and Information Mediation.

He has given over 30 invited talks and tutorials at various organizations in the above areas. He has taught graduate and undergraduate courses in the area of data management and information retrieval at the universities of Georgia, and California at Irvine.  He received his PhD in Computer Science from USC in 2000. Prior to joining USC, he was on the faculty of informatics at the University of California at Irvine, and prior to that worked as a scientist at NASA Ames Research Center for three years.

(Click here for classes)

Deep Machine Reading: Taming Unstructured, Natural Language Data

Michael A. Bender

Michael is an associate professor of computer science at Stony Brook University and the chief scientist and co-founder of Tokutek, Inc. His research interests span the areas of data structures and algorithms, I/O-efficient computing, scheduling, and parallel computing. He has coauthored over 100 articles on these and other topics. He has also won several awards, including an R&D 100 Award, four awards for graduate and undergraduate teaching, and a White House Big Data grant.

Michael received his A.B. in Applied Mathematics from Harvard University in 1992 and obtained a D.E.A. in Computer Science from the Ecole Normale Superieure de Lyon, France in 1993. He completed a Ph.D. on Scheduling Algorithms from Harvard University in 1998. He has held Visiting Scientist positions at both MIT and King's College London.

(Click here for tutorials)

Data Structures and Algorithms for Big Databases

Ron Bodkin

Ron, Founder & CEO of Think Big Analytics, founded the company to help companies realize measurable value from Big Data. Think Big is the leading provider of independent consulting and integration services specifically focused on Big Data solutions. Its expertise spans all facets of data science and data engineering and helps its customers drive maximum value from their Big Data initiatives. Previously, Ron was VP Engineering at Quantcast, where he led the data science and engineering teams that pioneered the use of Hadoop and NoSQL for batch and real-time decision making. Prior to that, Ron was Founder of New Aspects, which provided enterprise consulting for Aspect-oriented programming. Ron was also co-founder and CTO of B2B applications provider C-Bridge, which he led to a team of 900 people and a successful IPO. Ron graduated with honors from McGill University with a B.S. in Math and Computer Science. He also earned his Master’s Degree in Computer Science from MIT, leaving the PhD program after presenting the idea for C-Bridge and placing in the finals of the 50k Entrepreneurship Award.

(Click here for tutorials)

The Building Blocks of Storm Trident Deployment

Dipti Borkar

Dipti is Director of Product Management at Couchbase where she is responsible for the company’s flagship product, Couchbase Server, and works with customers and users to understand emerging requirements for low-latency, scalable data stores. Dipti has deep technical experience in the database industry having worked at IBM as a software engineer and Development Manager for the DB2 server team, and then at MarkLogic as a Senior Product Manager.
Twitter : @dborkar

(Click here for tutorials)

NoSQL for SQL Professionals

Stephen Brobst

Stephen is the CTO at Teradata. Previously, he founded three start-up companies (one acquired by IBM, one acquired by NCR, and one IPO). Stephen performed his Masters and PhD research at the Massachusetts Institute of Technology in the area of high performance parallel processing. Stephen was appointed to Barack Obama's Presidential Council of Advisors on Science and Technology (PCAST) where he focused on Big Data strategy across all federal agencies. Previously, Stephen taught in the computer science departments at MIT and Boston University. Stephen is a Fellow at The Data Warehousing Institute where he is a top-rated instructor on topics such as Big Data exploitation, data visualization, high performance data warehouse design, and agile data warehousing.

(Click here for classes)

Advanced Implementation of Big Data Analytics with Graph Processing, Part I
Advanced Implementation of Big Data Analytics with Graph Processing, Part II
Optimizing Your Big Data Ecosystem, Part I
Optimizing Your Big Data Ecosystem, Part II


Brian Bulkowski

Brian, founder and CTO of Aerospike (formerly Citrusleaf), has more than 20 years of experience designing, developing and tuning networking systems and high-performance Web-scale infrastructures. He founded Aerospike after learning firsthand the scaling limitations of sharded MySQL systems at Aggregate Knowledge. As director of performance at this media intelligence SaaS company, Brian led the team in building and operating a clustered recommendation engine. Prior to Aggregate Knowledge, Brian was a founding member of the digital TV team at Navio Communications and chief architect of Cable Solutions at Liberate Technologies, where he built the high-performance embedded networking stack and the Internet-scale broadcast server infrastructure. Before Liberate, Brian was a lead engineer at Novell, where he was responsible for the AppleTalk stack for Netware 3 and 4.
Twitter : @bbulkow

(Click here for classes)

Implementing Real-Time Analytics with Apache Storm

Stephen Buxton

Stephen Buxton is Director of Product Management for Search and Semantics at MarkLogic, where he has been a member of the Products team for 8 years. Stephen focuses on bringing a rich semantic search experience to users of the MarkLogic NoSQL database, document store, and triple store. Before joining MarkLogic, Stephen was Director of Product Management for Text and XML at Oracle Corporation.

(Click here for classes)

Semantic Technology in the Real World

Masoud Charkhabi

Masoud is the director of Advanced Analytics at the Canadian Imperial Bank of Commerce (CIBC). Masoud founded and manages the Advanced Analytics division within CIBC Business Support & Strategic Initiatives. The scope of the group spans CIBC's vast structured and unstructured data sources. Previously, Masoud held consulting and management roles in the decision sciences, technology and operational divisions of CIBC. He has published papers, presented at conferences and held tutorials on topics related to machine learning, data mining and analytics. He holds an academic degree in management and engineering.

(Click here for classes)

Big Data Use Cases in Banking
Discovering Novel Segments in Big Data with Statistical Learning

Todd Cioffi

Todd is the Director of RapidMiner University at RapidMiner, a leader in Predictive Analytics providing an easy-to-use desktop-to-cloud solution designed for data scientists and business leaders. As a strong advocate for training and certification, he combines his experience in technology and education to impart real-world use cases to students and users of analytics solutions across multiple industries.

For more than 20 years, Todd has been highly respected as both a technologist and a trainer.  As a tech, he has seen that world from many perspectives, including “data guy” and developer, architect, analyst, and consultant. As a trainer, he has designed and covered subject matter from operating systems to end-user applications, with an emphasis on data and programming. He is a regular contributor to the community of analytics and technology user groups in the Boston area, writes and teaches on many topics, and looks forward to the next time he can strap on a dive mask and get wet. 

(Click here for classes)

Predictive Analytics: Turning Big Data into Smart Data
Predictive Analytics at Scale: The Impact of Big Data

Ben Coverston

Ben currently helps coordinate the training and support activities at DataStax. He has more than 15 years of development experience and has written code running on some of the largest travel websites in the world. He became interested in Big Data through his experiences in troubleshooting data-related problems in which the velocity and volume of data exceeded the capabilities of a single machine.
Twitter : @bcoverston

(Click here for classes) (Click here for tutorials)

Apache Cassandra, An Introduction
Application Development in an Eventually Consistent World

Max De Marzi

Max is a Software Field Engineer at Neo Technology, where he built the Neography Ruby Gem, a REST API wrapper to the Neo4j Graph Database. He is addicted to learning new things, taking on a challenge and finding (and sharing) pragmatic solutions.
Twitter : @maxdemarzi

(Click here for classes) (Click here for tutorials)

Building a Recommendation Engine with Neo4j
Introduction to Neo4j

Sameer Farooqui

Sameer is a freelance Big Data consultant and trainer, specializing in Hadoop and Cassandra. For the past five years, he has deployed various clustering software packages internationally to clients, including Fortune 500 companies, governments, hospitals and banks. Most recently, he was a Systems Architect at Hortonworks, where he specialized in designing Hadoop prototypes and Proof-of-Concept use cases. Previously, Sameer worked at Accenture's Silicon Valley R&D lab, where he was responsible for studying NoSQL databases, Cloud Computing and Map/Reduce for their commercial applicability to emerging Big Data problems. At Accenture Tech Labs, Sameer was the lead engineer for creating a 32-node prototype using Cassandra and Amazon Cloud Computing to host 10TB of Smart Grid data. He also worked on a more than 30-person team in the design phase of a multi-environment Hadoop cluster pilot project at NetApp. Before Hortonworks and Accenture, Sameer spent five years at Symantec, where he deployed VERITAS Clustering and Storage Foundation solutions (VCS, VVR, SF-HA) to Fortune 500 and government clients throughout North America.
Twitter : @blueplastic

(Click here for tutorials)

Hadoop: A One-Day, Hands-On Crash Course

Tom Fastner

Tom is a Senior Member of Technical Staff at eBay. He works on the architecture of the analytical platforms and related tools for eBay’s platform which includes multiple large-scale Hadoop clusters (50+PB), a relational data warehouse environment over 50PB in size, and a wide range of visualization and advanced data mining tools. He spends most of his time driving innovation to process Big Data. Tom has worked on XLDB analytic ecosystems for over twenty years. He holds a Masters in Computer Science from the Technical University of Munich, Germany.

(Click here for classes)

Advanced Implementation of Big Data Analytics with Graph Processing, Part I
Advanced Implementation of Big Data Analytics with Graph Processing, Part II
Optimizing Your Big Data Ecosystem, Part I
Optimizing Your Big Data Ecosystem, Part II


Lutz Finger

Lutz, a director at LinkedIn, is an authority on social media and text analytics. He’s also co-founder and former CEO of Fisheye Analytics, a media data-mining company that was acquired by the WPP group and whose products support governments and various NGOs, such as the Organisation for Economic Co-operation and Development (OECD) and the International Olympic Committee.

Lutz is a highly regarded technology executive who built a sales center for Dell Europe as well as an incubator for mobile applications at Ericsson. He is a popular public speaker on business analytics and serves as an advisor and board member at several data-centric corporations in Europe and the US. He has an MBA from INSEAD as well as an MS in quantum physics from TU Berlin (Germany).
Twitter : @lutzfinger

(Click here for classes)

How to build (and use) a Text Mining Platform

Bianca Gandolfo

Bianca is a JavaScript Engineer and Evangelist for Hack Reactor, an intensive JS school that takes talented amateurs and turns them into software engineers. She is also a chapter leader for Girl Develop It SF and the SF Evangelist for Women Who Code. She is passionate about teaching, writing beautiful code, adventuring and learning new things.

Mark Grover

Mark is a Software Engineer at Cloudera and a contributor to the Apache Hive open-source project. He is also a section author of O'Reilly's book on Apache Hive called “Programming Hive.” Mark is an active respondent on the Hive mailing list and IRC channel.
Twitter : @mark_grover

(Click here for classes)

Application Architectures with Hadoop: Putting the Pieces Together Through Example

Richard M. Heiberger

Richard is a statistical consultant and Professor Emeritus of Statistics at Temple University. His current focus is development of software for graphical display of data and for teaching. He was Chair of the Section on Statistical Computing of the American Statistical Association for 2011. He is an elected Fellow of the American Statistical Association. He is a member of the Special Task Force on Student Posters of the Statistical Graphics Section of the American Statistical Association, and was appointed by the Provost in June 2010 to be one of the "Faculty Mentors for the Future of Instructional Technology."

His books include "Statistical Analysis and Data Display" and "R Through Excel," which has been translated into Japanese. His publications include the design of new forms of graphic display for complex data situations.

(Click here for tutorials)

Clear Data Graphics with Illustrations in R

Shane K. Johnson

Shane is a developer and evangelist with a background in Java and distributed systems who now occupies a marketing role. He has consulted with organizations in the financial, retail, telecommunications, and media industries to draft and implement architectures that relied on distributed systems for data and analysis.
Twitter : @shane_dev

(Click here for classes)

The Architecture Behind Real-Time Big Data

Diwakar Kasibhotla

Diwakar is a Principal Architect with GE's Leadership Program and has more than 15 years of experience in the implementation of large business intelligence and data warehousing projects. Diwakar has implemented some of the most complex big data projects at large banks and financial institutions. At GE, Diwakar is working with the Aviation business to capture, analyze and consume engine data for making real-time decisions. A frequent speaker at Oracle special interest groups, Diwakar blogs about big data and data warehouse techniques.

(Click here for classes)

Industrial Internet Using Big Data

Amandeep Khurana

Amandeep is a Principal Solutions Architect at Cloudera. He works with numerous customers to strategize, architect, develop and deploy solutions using the Hadoop ecosystem. Prior to this, Amandeep worked at Amazon Web Services as a Software Engineer on the Elastic MapReduce product.
Twitter : @amansk

(Click here for classes)

Lessons Learned and Best Practices for Running Hadoop in AWS

Ken Krupa

As Chief Field Architect, Ken supports field and technical pre-sales activities globally within MarkLogic. With 25 years of professional IT experience, Ken has a unique breadth and depth of expertise within nearly all aspects of IT architecture. Prior to joining MarkLogic, Ken consulted at some of the largest North American financial institutions during difficult economic times, advising senior and C-level executives. Prior to that, he consulted with Sun Microsystems as a direct partner and also served as Chief Architect of GFI Group, a Wall St. inter-dealer brokerage.

Today, Ken continues to pursue professional, individual and community-based engineering activities. Current intellectual pursuits involve community science as well as the study of applying purely declarative, rules-based logic frameworks to complex business and IT problems.

(Click here for classes)

Big Data and the New Data Warehouse Paradigm

Bradley C. Kuszmaul

Bradley is a founder and chief architect at Tokutek, and is a research scientist in the Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory (MIT CSAIL). His research focuses on developing computer systems and hardware that behave well both in practice and in theory. His entry won five out of six categories in Jim Gray’s 2007 "sorting benchmark contest," sorting a terabyte in 197 seconds. He still holds the world record for sorting a terabyte, as that category was retired in 2009.

Bradley formerly architected Akamai’s distributed data collection system, was a Yale Professor of Computer Science and was a principal network architect for the Thinking Machines Connection Machine CM-5. He was one of the developers of the Cilk multithreaded programming system (and now Cilk appears both in gcc and in icc, the Intel compiler). He wrote, in Cilk, winning entries for four of the 12 problems in the 2009 Intel Threading Challenge.

Bradley's most recent research investigates cache-oblivious data structures for maintaining indexes on SSD and disk. He received four degrees from MIT (two S.B. degrees, an S.M. degree, and a Ph.D.).

(Click here for tutorials)

Data Structures and Algorithms for Big Databases

Samir Lad

Samir is a Principal Architect with GE's Leadership Program and has more than 18 years of experience in the big data, data warehousing and unstructured data space. Samir is working on driving big data platform operationalization and adoption across GE. Samir has strong platform operationalization experience from Wells Fargo, where he worked for 13 years and ran the emerging technologies group.

(Click here for classes)

Industrial Internet Using Big Data

Gloria Lau

Gloria, Manager of Data Scientist at LinkedIn, leads the core data products team at LinkedIn. Her team focuses on understanding and engaging members to construct the best professional identity on the Web, including education and occupation, and builds interesting data products on top of that data. Previously, she was a research scientist at FindLaw, a Thomson Reuters business, where she led the effort on search. She has an MS and PhD from Stanford, and a BS from UCLA.

Building Data Products: The Right Order of Things (keynote)

YY Lee

YY Lee is COO of FirstRain, leading engineering, SAAS, and data science in the United States and India. YY and the FirstRain team develop patented technology and analytic algorithms that continually analyze unstructured global and social Web data – and then dynamically derive the implicit and explicit business developments, as well as structural changes, within and between companies.

YY has been working at the intersection of math and software through her entire career, applying software and optimization techniques to solid-state physics, chip design, and enterprise software. She is focused on real-time semantic and data science engines that distill insights, meaning and impact from the massively heterogeneous, unstructured “digital exhaust” cloud of global open-Web and social content. She regularly speaks at various industry functions, such as Dreamforce, SemTech and DataWeek. She earned her A.B. in mathematics from Harvard University, and has been earning her keep by programming since her teens (before the internet!).
Twitter : @thisisyy

(Click here for classes)

Techniques for Driving Truly Personal Information Experiences

Don Marti

Don is a technical marketing manager for Cloudius Systems, the OSv company.  He has written for Linux Weekly News, Linux Journal, and other publications. He co-founded the Linux consulting firm Electric Lichen, which was acquired by VA Linux Systems. Don has served as president and vice president of the Silicon Valley Linux Users Group and on the program committees for Uselinux, Codecon, and LinuxWorld Conference and Expo.
Twitter : @dmarti

(Click here for classes)

Moving Cassandra from Bare Metal to the Cloud with OSv

Aaron T. Meyers

Aaron is a Software Engineer at Cloudera and an Apache Hadoop Committer. Aaron’s work is primarily focused on HDFS. Prior to joining Cloudera, Aaron was a Software Engineer and VP of Engineering at Amie Street, where he worked on all components of the software stack, including operations, infrastructure, and customer-facing feature development. Aaron holds both an Sc.B. and Sc.M. in Computer Science from Brown University.

(Click here for classes)

Hadoop Puzzlers

Paco Nathan

Paco, Chief Scientist for Mesosphere in San Francisco, is a "player and coach" who has led innovative data teams building large-scale apps for more than 10 years. He is also an expert in distributed systems, machine learning, predictive modeling, cloud computing, enterprise data workflows, and Open Data. Paco is the author of "Enterprise Data Workflows with Cascading" and a developer evangelist for Apache Mesos. Paco received his Bachelors in Math and Science and a Masters in Computer Science from Stanford University, and has more than 25 years of technology industry experience, ranging from Bell Labs to early-stage start-ups.
Twitter : @pacoid

(Click here for tutorials)

Hands-on Intro to Apache Spark

Supreet Oberoi

Supreet is a hands-on, entrepreneurial, technology leader with over two decades of experience in successfully developing transformative information technologies, and working in leadership roles at Concurrent Inc., American Express, Oracle, Microsoft and many privately-held Silicon Valley companies. 

Supreet currently serves as the Vice President of Field Engineering at Concurrent. 

Prior to this, he served as Big Data Technical Evangelist at American Express. Combining business acumen with technical insights and strong execution skills, Supreet developed reference architectures and new enterprise-level capabilities with the Hadoop stack using MapReduce, HBase, Hive, Solr, Mahout, Sqoop and many proprietary “big data” technologies. Before that, as Vice President of Engineering for RTI, he developed the engineering organization from infancy, creating and delivering software products for analyzing and sharing information in real time.

(Click here for classes)

Future-Proof Your Big Data Investments With Cascading
Increase Hadoop Utilization With Your Data Warehouse

Arvind Prabhakar

Arvind is the CTO of StreamSets, a Big Data startup based in the San Francisco Bay Area that is redefining the way we look at data in motion. Previously at Cloudera, Arvind was an innovator and technical director for several platform projects, handling some of the world's largest data integration and access challenges. Prior to Cloudera, Arvind was Software Architect at Informatica working on the Core Platform Technology team and was responsible for designing and implementing the systems that power the next generation Informatica products. Arvind is a member of The Apache Software Foundation, the PMC Chair on Apache Flume and Sqoop projects, and PMC member on Apache Storm, Sentry, and MetaModel projects.

(Click here for classes)

Data Aggregation at Scale Using Apache Flume

Krishnan Raman

Krishnan is a data scientist at Twitter. He was formerly a risk quant at Bank of America, an associate at Goldman Sachs, and an engineer at Sun Microsystems. His experience in building the real-time proprietary trading system WebET at Goldman Sachs, and concurrent Scala systems to compute the conditional value at risk of large credit portfolios at BAC, has stood him in good stead on the Revenue Quality team at Twitter. His primary tools are Scala, Scalding and a dash of statistics and math. He has graduate degrees in math, computer science and mathematical finance from the University of Chicago.
Twitter : @dxbydt_jasq

(Click here for tutorials)

Introduction to Machine Learning in Scalding

Naomi B. Robbins

Naomi is a consultant and seminar leader who specializes in the graphical display of data. She trains employees of corporations and organizations on the effective presentation of data. She also reviews documents and presentations for clients, suggesting improvements or alternative presentations as appropriate. She is also the author of "Creating More Effective Graphs." Naomi is Chair-elect of the Statistical Graphics Section of the American Statistical Association and a member of the Special Task Force on Student Posters of the Section. She has presented short courses to numerous corporations, government agencies, non-profits and at many conferences. Naomi received her Ph.D. in mathematical statistics from Columbia University, M.A. from Cornell University, and A.B. from Bryn Mawr College. She had a long career at Bell Laboratories before forming NBR, her consulting practice.
Twitter : @nbrgraphs

(Click here for tutorials)

Clear Data Graphics with Illustrations in R

Mayur Rustagi

Mayur is CTO and co-founder of Sigmoid Analytics. His areas of expertise include real-time Big Data analytics using open source technologies like Apache Spark, Shark and Apache Hadoop. Sigmoid Analytics has worked with over 25 customers in the Big Data space, including several Bay Area companies like Pubnub, FusionOps and NBC. They work with companies to get them real-time insights on terabytes of data using an in-memory Apache Shark warehouse and streaming data input using Apache Spark.
Twitter : @mayur_rustagi

(Click here for classes)

Reducing Cost in Big Data Using Statistics and In-Memory Technology

Krishna Sankar

Krishna is a Chief Data Scientist. Earlier stints include Data Scientist with Tata America Intl, Director of Data Science and Bioinformatics at a startup, and Distinguished Engineer at Cisco. He is a member of the program committee and a paper reviewer for KDD2013 and KDD2014. Krishna’s speaking engagements include PyCon, PyData, OSCON, Hitchhiker’s Guide to Kaggle and NoSQL as well as guest lecturing at the Naval Postgraduate School. His other passion is Lego Robotics, and he was a Robots Design Judge at the St. Louis FLL World Competition.

(Click here for classes) (Click here for tutorials)

R, Data Wrangling and Data Science Competitions

The Hitchhiker's Guide to Machine Learning with Python and Apache Spark, Part I

The Hitchhiker's Guide to Machine Learning with Python and Apache Spark, Part II

Eric Schmidt

Eric is a Solutions Architect on the Google Cloud engineering team focused on Big Data scenarios. His main focus is enabling customers and partners to build large-scale real-time data-processing architectures with Google Cloud services, including BigQuery, Compute Engine, Cloud Storage and Cloud Dataflow, as well as open-source technologies like RabbitMQ, ZeroMQ, Apache Storm, Apache Spark, and the related Apache Hadoop family.

Prior to joining Google, Eric was a Sr. Director of Technical Evangelism and Development at Microsoft, where he led a team focusing on the adoption of Microsoft’s devices and cloud services for consumer lifestyle application scenarios. His team delivered Emmy-award-winning applications for NBC Sports, the Democratic National Convention, NCAA March Madness on Demand, and Major League Soccer. Eric’s core focus was to drive monetization scenarios based on world-class user experiences built around scalable cloud services. He has a deep passion for user interaction modeling, data modeling and analytical processing of user behaviors, and development experience with Java, .NET, C, JavaScript and Python.

Prior to joining Microsoft, Eric was a managing consultant with PricewaterhouseCoopers, focusing on systems integration and data warehousing. Eric holds a bachelor's degree in organizational management from Pennsylvania State University and certificates in Data Structures and Algorithms and Methods for Data Analysis from the University of Washington. In his spare time, Eric is an avid road biker and is a guest host DJ on 90.3 KEXP Seattle, channeling his passion for modern global music. Most importantly, he is a proud and supportive father of a beautiful baby boy whom he cherishes with his mother and now second son, their dog Apollo.

(Click here for classes)

Inside Google Cloud Dataflow

Jonathan Seidman

Jonathan is a Solutions Architect on the Partner Engineering team at Cloudera. Before joining Cloudera, he was a Lead Engineer on the Big Data team at Orbitz Worldwide, helping to build out the Hadoop clusters supporting the data storage and analysis needs of one of the most heavily trafficked sites on the Internet. Jonathan is also a cofounder and organizer of the Chicago Hadoop User Group and the Chicago Big Data Meetup, and a frequent speaker on Hadoop and Big Data at industry conferences such as Hadoop World, Strata and OSCON.
Twitter : @jseidman

(Click here for classes)

Application Architectures with Hadoop: Putting the Pieces Together Through Example

Tony Shan

Tony is a renowned thought leader and technology visionary with decades of experience and guru-level knowledge on emerging technologies for pragmatic enterprise computing. He has directed and led the life-cycle design of complex distributed systems on diverse platforms in Fortune 50 companies and big public-sector organizations. He drove innovations with insightful consulting and advising on large-scale high-profile projects that won many awards. He authored dozens of top-notch publications and more than 10 books on next-generation technologies. He wrote multiple entries on architecture and methodology for IT encyclopedias.

He is also a regular keynote speaker and chair, moderator, advisor, and organizing committee member at preeminent conferences; an editor and editorial advisory board member of IT research journals and books; and a founder of several user groups and forums. In particular, he is a world-leading authority in the Big Data and cloud space, delivering scores of presentations, panels and workshops at various industry events, and serving as general chair at international conferences. He has extensive speaking experience at conferences and industry events.
Twitter : @tonyshan

(Click here for tutorials)

Big Data Architecture Approach

Dinesh Subhraveti

Dinesh is responsible for the multi-tenancy and virtualization infrastructure at Altiscale. He developed the notion of Operating System level virtualization as a part of his Ph.D., which later came to be known in the industry as Containers. Published in OSDI 2002, his work showed for the first time that enterprise applications can be virtualized and live-migrated. Dinesh applied that research to drive the industry's first Container virtualization product for enterprise Linux applications at Meiosys, the company behind Linux Containers that IBM acquired in 2005. He has authored over 35 patents and papers in the areas of virtualization, storage and operating systems, and holds a B.E. degree in computer science from BITS-Pilani, India and M.S., M.Phil., and Ph.D. degrees in computer science from Columbia University, New York. He also has extensive teaching experience, both from his time at Columbia and from multiple meetups and industry conferences around the world.

(Click here for classes)

Privilege Isolation in Docker Containers

Daniel Templeton

Daniel works on the Cloudera training team, building Cloudera’s developer and data science Cloudera Certified Professional certifications. Daniel also has a long history as a software engineer in the high performance computing space and has been kicking around big data since about 2009. Prior to Cloudera, Daniel spent more than a decade at Sun doing various engineering and product management roles and speaking at conferences. Daniel has a BE in EE/CS from Vanderbilt and an MSCS from Stanford.

(Click here for classes)

Hadoop Puzzlers

Dean Wampler

Dean is a consultant for Typesafe. He specializes in scalable, distributed, data-centric application development, “Big Data” or otherwise, applying Functional Programming principles with the Typesafe stack, Hadoop, and other tools. Dean is a contributor to several open-source projects and the founder of the Chicago-Area Scala Enthusiasts. He is the co-author of "Programming Scala," the author of "Functional Programming for Java Developers," and the co-author of "Programming Hive."
Twitter : @deanwampler

(Click here for classes) (Click here for tutorials)

Apache Spark for Event Stream Processing
Big Data Programming with Spark
Copious Data, the “Killer App” for Functional Programming

Patrick White

Patrick is the CEO and co-founder of Synata, a San Francisco enterprise search startup that supplies software to cloud-enabled enterprises. Prior to Synata, he spent the last decade building software at companies such as Intuit, Microsoft, VistaConnect, CyberArts, and Fortify Software (now HP Security).  As Fortify’s Group Product Manager, Pat managed the development of award-winning products. In 2009 he founded the consulting firm, Ally Software. Pat and his team at Ally spent over three years successfully implementing large scale enterprise software for corporate and government clients, specializing in complex deployments of search products such as FAST, Autonomy, and Google Search Appliance.
Twitter : @patwhite

(Click here for classes)

Graph Analysis of Enterprise Data with Spark

Yongzheng Zhang

Yongzheng, a Business Analytics manager at LinkedIn, is an active researcher and practitioner on text mining and machine learning. He has developed many practical and scalable solutions for utilizing unstructured data to help e-commerce and social networking applications, including search, merchandising, social commerce, and customer service excellence.

Yongzheng is a highly regarded expert in text mining. He has published many papers on text mining and machine learning in top journals and conferences. He actively gives tutorials and organizes workshops on sentiment analysis at prestigious conferences. He holds a Ph.D. in computer science from Dalhousie University in Canada.

(Click here for classes)

How to build (and use) a Text Mining Platform