One of the more significant challenges organizations face today, when managing large data landscapes with multiple disparate sources and repositories, is managing that data and its ingestion processes, as well as the modeling and reporting, often across equally disparate systems and technologies.
LIVE WEBINAR: April 29th, 2020
Presented by Matthew Bowers
Historically, if a technical or business user needed to define and manage a source system for structured, normalized data, they would need a database environment or operational data store. If they then needed to model and transform that data for additional insights, they would need a tool like SQL Server Integration Services (SSIS) to load the data into a data warehouse, and these tools were not always available from a single user interface.
Reporting needs would have to be met from yet another application and interface.
As the environment becomes more sophisticated and there is a need to bring in unstructured or semi-structured data, a different storage model is required, such as a data lake or blob storage.
Modeling and training on that data require yet another tool, such as Azure Databricks, accessed via yet another interface.
As the complexity of the data, data movement, and reporting needs grows, the organization finds itself needing an ever-increasing array of tools, applications, and interfaces.
Imagine ingesting, modeling, transforming, and reporting on your data, all from a unified platform: one that combines the best of traditional data analytics, data science, and reporting, and integrates your data warehouse, Azure Databricks, data lakes, Power BI, and more!
This is now available from Microsoft in the form of Azure Synapse Analytics, which was introduced in November 2019 at Microsoft’s annual Ignite conference.
Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources—at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs. (Microsoft)
While it is a makeover, or rebranding, of Azure SQL Data Warehouse, it is also much more!
Microsoft has revved its Azure SQL Data Warehouse, re-branding it Synapse Analytics, and integrating Apache Spark, Azure Data Lake Storage and Azure Data Factory, with a unified Web user interface. (Ignite 2019)
For some historical perspective: how many remember the original Azure SQL Data Warehouse product, also known as Gen 1, being released to General Availability? Microsoft announced that first generation in 2015.
Azure SQL Data Warehouse Gen 2 was released to General Availability in 2018; by May of that year, it was generally available across 20 Azure regions. It was at this time that Microsoft adopted the Gen 1 and Gen 2 nomenclature. Gen 2 offered significant performance enhancements over Gen 1:
- Improves individual query execution times by as much as 10 times over the Gen 1 tier
- Provisions SQL Data Warehouse with five times the computing power and unlimited storage capacity, making it suitable for the most intensive analytics workloads
- Uses the latest generation of Azure hardware to improve compute and storage scalability
Synapse Analytics is essentially the third generation of the Azure SQL Data Warehouse product.
This transition from Gen 1 and Gen 2 marks another leap in performance, as well as in capabilities for big data, advanced analytics workloads, and reporting.
Microsoft has “revved” up the core database engine in the SQL DW product and added new feature sets that allow it to compete with other cloud-based data warehouse platforms, such as Snowflake. These features include the ability to run data workloads on either explicitly provisioned resources or on-demand, “serverless” infrastructure.
In addition to enhancing the database engine, Microsoft has added other key components and enhancements:
- Integration with Apache Spark (open source, not Databricks) and Azure Data Lake Storage to facilitate big data integration.
- A unified web user interface, called Azure Synapse Studio, providing control over the data warehouse and data lakes, as well as Azure Data Factory. In essence, it gives you control of your entire data landscape from a single, unified interface.
Azure Synapse has four components:
- Synapse SQL: Complete T-SQL based analytics (Generally Available)
  - SQL pool (pay per DWU provisioned)
  - SQL on-demand (pay per TB processed) (Preview)
- Spark: Deeply integrated Apache Spark (Preview)
- Data Integration: Hybrid data integration (Preview)
- Studio: Unified user experience (Preview)
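The two Synapse SQL billing models trade off differently: a provisioned SQL pool costs money for as long as it runs, while SQL on-demand costs money only for the data each query processes. The Python sketch below estimates the monthly volume at which the two cost the same. It is a simplification under stated assumptions: the class and function names are hypothetical, the provisioned pool is modeled as a flat hourly rate for its chosen DWU level, on-demand cost is modeled as linear in terabytes processed, and the rates used are placeholders, not actual Azure prices.

```python
from dataclasses import dataclass


@dataclass
class ProvisionedPool:
    """Hypothetical model of a provisioned SQL pool.

    Assumes a flat hourly rate for the chosen DWU level, charged for every
    hour the pool runs, regardless of how much data queries scan.
    """
    hourly_rate: float  # placeholder rate, NOT actual Azure pricing

    def monthly_cost(self, hours_running: float) -> float:
        return self.hourly_rate * hours_running


@dataclass
class OnDemandSql:
    """Hypothetical model of SQL on-demand (serverless).

    Assumes cost scales linearly with terabytes processed; idle time is free.
    """
    rate_per_tb: float  # placeholder rate, NOT actual Azure pricing

    def monthly_cost(self, tb_processed: float) -> float:
        return self.rate_per_tb * tb_processed


def break_even_tb(pool: ProvisionedPool, hours_running: float,
                  on_demand: OnDemandSql) -> float:
    """TB processed per month at which both billing models cost the same."""
    return pool.monthly_cost(hours_running) / on_demand.rate_per_tb


if __name__ == "__main__":
    pool = ProvisionedPool(hourly_rate=2.0)
    serverless = OnDemandSql(rate_per_tb=4.0)
    # 2.0 * 300 hours = 600.0; 600.0 / 4.0 = 150.0 TB
    print(break_even_tb(pool, hours_running=300, on_demand=serverless))
```

With these placeholder rates, a workload scanning less than about 150 TB per month would be cheaper on demand, and a heavier, steadier workload would favor a provisioned pool. Real pricing has more dimensions (pausing, reserved capacity, storage), so treat this only as a way to reason about the trade-off.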
This level of integration into a unified platform makes for a powerful cloud data solution. It facilitates the ingestion of unstructured data from a variety of sources into big data stores. Once your data is in a big data store, tools such as Hadoop, Spark, and machine learning libraries can prepare it and train models on it.
This architecture delivers a truly modern data warehouse experience and integrates the standard stages of big data processing into a single, unified platform:
Ingestion – The ingestion phase identifies the technology and processes that are used to acquire the source data. This data can come from files, logs, and other types of unstructured data that must be put into the Data Lake Store. The technology used will vary depending on how frequently the data is transferred.
Store – The store phase identifies where the ingested data should be placed.
Prep and train – The prep and train phase identifies the technologies that are used to perform data preparation and model training and scoring for data science solutions.
Model and serve – Finally, the model and serve phase involves the technologies that will present the data to users.
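To make the four phases concrete, here is a toy, in-memory Python sketch that walks a handful of log records through them. Everything in it is hypothetical: the function names, the dictionary standing in for the Data Lake Store, and the "mean plus two standard deviations" model are illustrative assumptions, not Synapse APIs. In Synapse itself, each phase would map onto a service such as a Data Factory pipeline, Azure Data Lake Storage, Spark, or Power BI.

```python
import json
import statistics
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for a data lake: path -> stored records.
Lake = Dict[str, List[dict]]


def ingest(raw_lines: List[str]) -> List[dict]:
    """Ingestion: parse semi-structured source data (JSON log lines)."""
    records = []
    for line in raw_lines:
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of failing the whole batch
        if isinstance(obj, dict):
            records.append(obj)
    return records


def store(lake: Lake, path: str, records: List[dict]) -> None:
    """Store: land the ingested records under a path in the lake."""
    lake.setdefault(path, []).extend(records)


def prep_and_train(records: List[dict]) -> Callable[[float], bool]:
    """Prep and train: clean the data, then fit a trivial threshold 'model'."""
    latencies = [float(r["latency_ms"]) for r in records if "latency_ms" in r]
    mean = statistics.mean(latencies)
    stdev = statistics.pstdev(latencies)
    threshold = mean + 2 * stdev
    # The "model" flags any latency more than two standard deviations above the mean.
    return lambda latency: latency > threshold


def model_and_serve(records: List[dict], is_slow: Callable[[float], bool]) -> dict:
    """Model and serve: shape the results into a summary a report could show."""
    slow = [r for r in records
            if "latency_ms" in r and is_slow(float(r["latency_ms"]))]
    return {"total": len(records), "slow_requests": len(slow)}


if __name__ == "__main__":
    raw = ['{"latency_ms": 10}'] * 9 + ['{"latency_ms": 100}', "not json"]
    lake: Lake = {}
    store(lake, "raw/requests", ingest(raw))
    model = prep_and_train(lake["raw/requests"])
    print(model_and_serve(lake["raw/requests"], model))
```

Keeping each phase a separate function mirrors the point of the architecture: each stage can be swapped for a different technology without rewriting the others, which is what a unified platform like Synapse aims to coordinate.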
Integration with Microsoft Power BI adds deep data analysis, advanced analytics, and reporting, delivering a key component of a cloud-based, end-to-end big data solution.
Visit the Microsoft Analytics docs library to explore the Synapse Architecture.
In summary, Synapse Analytics delivers, at scale, on the promise of a modern data warehouse solution, as we discussed in our February blog post: https://oakwoodsys.com/data-estate-modernization/
If you’d like to learn more about Azure Synapse Analytics, please send a message below to Oakwood’s Data & Analytics Team.