Strata Conference Summary


Recently I spent three days at O’Reilly’s Strata Data Conference learning just how broad and deep the Big Data space has become. The sessions were interesting, the tutorials were informative, and the keynotes were inspiring. The keynotes are available online.

Janelle Shane’s presentation “Sprouted clams and stanky bean: When machine learning makes mistakes” was hilarious as well as informative. Ajey Gore’s description of how GO-JEK is using big data to transform Indonesia was inspiring.

Big Data, while still fast-paced and ever changing, feels like a moment of thoughtful reflection after the ridiculous, frenetic days of Internet expansion. Some of the data takeaways are directly applicable to any software engineering situation:

  • “Love the problem, not the solution.”
  • “Let the data scientist (or engineer) help with the framing of the problem, not just coming up with the solution after management has framed it.”

For those of you interested in data, anyone with a computer can also play around with cutting-edge tools and big data sets.

The PyViz tutorial walks you through the PyViz libraries using Jupyter notebooks and publicly available datasets, showing the power and range of options for presenting your data and analysis. Notebooks aren’t new, and Jupyter isn’t the only game in town, but they are a truly interactive way to share code, data, and commentary with reproducible results. For folks who haven’t touched Python before, this is a great way to get in and play with code.

Neo4J offers a graph database and datasets that let you explore the relationships behind some of the top news stories. Using their tools you can explore the Russian Twitter Troll database and find the hashtags most tweeted by the accounts recently suspended by Twitter. You’ll also get an introduction to their Cypher query language, which borrows enough from SQL to seem familiar.
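To give a rough feel for the "most tweeted hashtags" question, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the sample pairs are made up, the function name `top_hashtags` is hypothetical, and the Cypher text uses invented labels and relationship types rather than the real schema of the Twitter Troll dataset. It only shows the shape of the aggregation, not how Neo4J runs it.

```python
from collections import Counter

# Hypothetical sample data: (account, hashtag) pairs standing in for
# account -> tweet -> hashtag paths in a graph database.
SAMPLE_TAGGED_TWEETS = [
    ("account_a", "news"),
    ("account_a", "politics"),
    ("account_b", "news"),
    ("account_c", "sports"),
    ("account_b", "news"),
    ("account_c", "politics"),
]

# Illustrative Cypher only. The labels (Account, Tweet, Hashtag) and the
# relationship types (POSTED, HAS_TAG) are assumptions, not the real schema.
# This string is never executed; it is here to show the SQL-like flavor.
ILLUSTRATIVE_CYPHER = """
MATCH (a:Account)-[:POSTED]->(:Tweet)-[:HAS_TAG]->(h:Hashtag)
RETURN h.tag AS hashtag, count(*) AS uses
ORDER BY uses DESC
LIMIT 10
"""


def top_hashtags(tagged_tweets, limit=10):
    """Count how often each hashtag appears and return the most used ones."""
    counts = Counter(tag for _account, tag in tagged_tweets)
    return counts.most_common(limit)


if __name__ == "__main__":
    for tag, uses in top_hashtags(SAMPLE_TAGGED_TWEETS, limit=2):
        print(f"#{tag}: {uses}")
```

The RETURN, ORDER BY, and LIMIT clauses are where the SQL resemblance shows. The MATCH pattern, which describes a path through nodes and relationships, is the graph-specific part.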

Way back in 2016, the International Consortium of Investigative Journalists released the Panama Papers, an event that should already be familiar to those who maintain security releases on Drupal sites. The hack resulted in the release of 11.5 million documents that exposed crime and corruption enabled through offshore companies. You can explore that data set with Neo4J too.

Big Data is a big industry and Strata is a conference big enough to show you the highlights and the details. From the inspirational to the technical minutiae, Strata had it all.