The summer has barely started, but MongoDB World and Snowflake Summit are already behind us, even as the paint is still drying on the announcements made at each event. With its Data + AI Summit kicking off as a hybrid virtual/in-person event in San Francisco today, Databricks is wasting no time responding, with a huge manifest of announcements of its own.
Databricks’ cofounder and chief technologist (and Apache Spark creator) Matei Zaharia briefed VentureBeat on all the announcements. They fall into two buckets: enhancements to open-source technologies underlying the Databricks platform — like Apache Spark — on the one hand, and enhancements, previews and general availability (GA) releases pertaining to the proprietary Databricks platform on the other.
In this post, I’ll cover the full range of announcements. There’s a lot here, so feel free to use the subheads as a kind of random access interface to read the bits you might care about most, then come back and read the rest if you have time.
Spark Streaming goes Lightspeed
Because Spark and its companion open-source projects have become de facto industry standards at this point, I’d like to start with the announcements in that sphere. First, for Spark itself, Databricks is making two roadmap announcements, covering streaming data processing as well as connectivity for Spark client applications. Spark Streaming has been a subproject of Spark for many years, and its last major enhancement, a technology called Spark Structured St …