Blog: How we started out on a big data journey with Microsoft Azure

“Data is growing faster than ever before, and by the year 2020 about 1.7 megabytes of new information will be created every second for every human being on the planet.” — IDC

RedBlack Software and Microsoft Azure

RedBlack Software CTO Steve Dickinson explains how using Microsoft’s Azure cloud platform led to a new world of discovery for our developers.

Over the course of a couple of posts, I’m going to look at how the RedBlack team implemented solutions for data movement in a serverless cloud environment and embraced developments in machine learning and artificial intelligence. This first post reviews the approach we took to move away from the traditional data warehouse model of SQL Server, SSIS and SSAS.

Some time ago, RedBlack decided to focus on building feature-rich mobile and desktop applications geared towards a cloud environment. On the question of application architecture, microservices were the obvious approach to take. When it came to our cloud platform, we chose Microsoft Azure.

Using a cloud environment would allow us to scale and deploy individual components, which fitted perfectly with our agile approach to development. Each microservice would have its own data source, whether that was an Azure SQL database, Azure Blob storage or an in-memory cache.

The fact that each microservice possessed its own storage is what really started our data journey within Microsoft Azure. There were several areas we needed to find solutions for:

* how to keep data synchronised across different data stores
* how to prepare and transform the data
* how to store the transformed data
* how to publish data to the end user.

We looked at what was available within Microsoft Azure to help. There were some impressive services for data storage and data movement, such as Azure HDInsight, Azure SQL Data Warehouse and Azure Data Lake.

These are all big data solutions but, at this point in our journey, we just didn’t require these enterprise features and the cost was a little more than we had budgeted for. So, what did we use? We chose Azure Data Factory.

Transforming data

Azure Data Factory allowed us to create the workflows needed to move, prepare and transform data that was spread across multiple locations, and it paved the way for publishing that data either in raw form or through Microsoft’s Power BI.

The architecture and data flow we used with Azure Data Factory

The Azure Data Factory instance has multiple pipelines, each tasked with ingesting data from one of the microservices’ individual data stores. Each pipeline contains a collection of activities, which either come as standard out of the box or are custom built through Visual Studio.

Each of these activities is scheduled to perform the required data transformations and move the data to an Azure SQL database that acts as the store for the data warehouse.
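To make that concrete, here is a minimal, standalone sketch of the kind of work one of these scheduled activities does — aggregate raw rows per company per day and load the result into the warehouse database. It is written as a plain Python function rather than an actual Data Factory activity, and the table, column and connection details are hypothetical.

```python
# Illustration only: the shape of a "prepare and transform, then load to
# Azure SQL" step. Table, column and connection details are hypothetical.
import pandas as pd
import pyodbc

SQL_CONN = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:example-warehouse.database.windows.net,1433;"
    "Database=warehouse;Uid=loader;Pwd=<secret>;Encrypt=yes;"
)

def transform_and_load(raw_csv_path: str) -> None:
    # Prepare: read the raw extract and aggregate sales per company per day.
    raw = pd.read_csv(raw_csv_path, parse_dates=["order_date"])
    daily = (
        raw.groupby(["company_id", raw["order_date"].dt.date])["order_value"]
        .sum()
        .reset_index()
        .rename(columns={"order_date": "sales_date", "order_value": "total_sales"})
    )

    # Load: write the transformed rows into the warehouse table.
    with pyodbc.connect(SQL_CONN) as conn:
        cursor = conn.cursor()
        cursor.executemany(
            "INSERT INTO dbo.DailySales (company_id, sales_date, total_sales) "
            "VALUES (?, ?, ?)",
            list(daily.itertuples(index=False, name=None)),
        )
        conn.commit()
```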

From the data warehouse, Azure Functions produce the required datasets on a per-company basis; these are partitioned and stored within Azure Blob storage. The data is then ready for consumption by Power BI, Excel Power Query or any other third party authorised to access it.
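The publishing step looks roughly like the sketch below: query the warehouse, partition the data per company and upload each slice to its own blob path. In our setup this logic runs inside an Azure Function; here it is shown as a plain Python function for brevity, and the container name, blob layout and query are assumptions.

```python
# Rough sketch of per-company publishing to Azure Blob storage.
# Container name, blob layout and query are hypothetical.
import pandas as pd
import pyodbc
from azure.storage.blob import BlobServiceClient

SQL_CONN = "<warehouse connection string, as in the previous sketch>"
BLOB_CONN = "<storage account connection string>"

def publish_company_datasets() -> None:
    with pyodbc.connect(SQL_CONN) as conn:
        sales = pd.read_sql(
            "SELECT company_id, sales_date, total_sales FROM dbo.DailySales", conn
        )

    # One blob per company, ready for Power BI, Power Query or any authorised
    # third party to pick up.
    service = BlobServiceClient.from_connection_string(BLOB_CONN)
    for company_id, company_slice in sales.groupby("company_id"):
        blob = service.get_blob_client(
            container="published-datasets",
            blob=f"company={company_id}/daily_sales.csv",
        )
        blob.upload_blob(company_slice.to_csv(index=False), overwrite=True)
```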

Next steps

Our main goal is to get to a stage where all data movement between the microservices and their data stores is event-driven. The idea is to use either Azure Service Bus or Azure Event Hubs, coupled with Azure Functions. There is also the option of Azure Stream Analytics, but until we reach the point where we need real-time data, that one stays on the back burner.
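To show the shape of what we are aiming for, here is a minimal, hypothetical sketch of one event-driven hand-off, written against the Python programming model for Azure Functions; the queue name, connection setting and message payload are all assumptions rather than anything we have built yet.

```python
# Hypothetical sketch: a Service Bus queue-triggered Azure Function reacting
# to a change event published by a microservice.
import json
import logging

import azure.functions as func

app = func.FunctionApp()

@app.service_bus_queue_trigger(
    arg_name="msg",
    queue_name="order-events",
    connection="ServiceBusConnection",
)
def on_order_event(msg: func.ServiceBusMessage) -> None:
    # A microservice publishes a small JSON event whenever it persists a change;
    # instead of waiting for a scheduled pipeline run, we react to it here.
    event = json.loads(msg.get_body().decode("utf-8"))
    logging.info("Updating warehouse for company %s", event["company_id"])
    # ...apply the change to the warehouse, or re-publish the affected dataset.
```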

Useful Links

Azure Data Factory

Azure Data Factory Local Environment for debugging in Visual Studio

Azure Data Factory Custom Activities

Azure Blob Storage

Azure Functions

Steve Dickinson, CTO, RedBlack Software

Web technology expert Steven Dickinson first joined RedBlack in 2002 as a programmer. He went on to take senior development positions at the Food Standards Agency and Capita. Steve rejoined RedBlack in 2014 and, as chief technology officer, heads up the development of our software solutions.