Data quality management and network security upgrade for a large retail company using a mixed technology stack
Summary
The client is a major British multinational retailer headquartered in London, England, that specializes in clothing, home products, and food products. It operates a family of businesses, selling high-quality own-brand products in the UK and internationally through 1,509 stores and more than 100 websites worldwide. Through its stores, support centres, warehouses, and supply chain, it serves nearly 30 million customers each year.
Challenges
The client faced two challenges.
1. Multiple data sources were generating large volumes of data, so maintaining data quality and removing irrelevant data was a high priority for the client.
2. With a vast network of interconnected resources, it was becoming harder to monitor the network and identify devices and resources that did not meet the standards set by the client's security policy.
Solutions
Each challenge was met with its own solution:
1. A metric was created to check the quality of all incoming data, and irrelevant data was filtered out with high accuracy. The team used Alation and Ataccama for data cataloging and refining.
2. To address the security vulnerabilities, the team created a pipeline that identifies which resources are behind the firewall and which are outside it.
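The case study does not describe how the security pipeline decides whether a resource sits behind the firewall. As a minimal sketch only, assuming a resource counts as "behind the firewall" when its IP address falls inside a list of protected network ranges, the classification step could look like the following. The `Resource` class, the `classify_resources` function, and the example addresses are all hypothetical and are not taken from the client's implementation.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical inventory entry: a resource name and its IP address."""
    name: str
    ip: str


def classify_resources(resources, protected_cidrs):
    """Split resources into those inside and outside the protected ranges.

    Assumption: 'behind the firewall' means the resource's IP address
    belongs to at least one of the given CIDR ranges.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in protected_cidrs]
    result = {"behind_firewall": [], "outside_firewall": []}
    for resource in resources:
        address = ipaddress.ip_address(resource.ip)
        if any(address in network for network in networks):
            result["behind_firewall"].append(resource.name)
        else:
            result["outside_firewall"].append(resource.name)
    return result


if __name__ == "__main__":
    # Illustrative inventory; names and addresses are made up.
    inventory = [
        Resource("storage-account-a", "10.0.1.5"),
        Resource("public-vm-b", "203.0.113.7"),
    ]
    print(classify_resources(inventory, ["10.0.0.0/16"]))
```

In a real deployment the inventory and the protected ranges would come from the cloud provider's resource and network configuration rather than from hard-coded values.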
Technical aspects of the solutions
· Created and implemented a metric for measuring data quality.
· Deployed the third-party tool Ataccama for data refining and management.
· Streamlined data from multiple data sources and schemas through Alation.
· Created a security pipeline to check for vulnerable resources in the network.
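The data quality metric mentioned in the list above is not defined in this case study. Purely as an illustration, one simple form such a metric could take is the share of incoming records that are complete and not duplicated. In the sketch below, the function names, the choice of "completeness plus uniqueness" as the rule, and the field names used by callers are all assumptions, and the sketch does not use the Alation or Ataccama APIs.

```python
from typing import Any


def is_filled(value: Any) -> bool:
    """Treat None and blank strings as missing; anything else as filled."""
    if value is None:
        return False
    if isinstance(value, str) and not value.strip():
        return False
    return True


def filter_records(records, required_fields, key_field):
    """Split records into (kept, rejected).

    Assumed rule: a record is rejected if any required field is missing
    or if its key has already been seen earlier in the batch.
    key_field is expected to be one of required_fields.
    """
    seen_keys = set()
    kept, rejected = [], []
    for record in records:
        complete = all(is_filled(record.get(f)) for f in required_fields)
        key = record.get(key_field)
        if complete and key not in seen_keys:
            seen_keys.add(key)
            kept.append(record)
        else:
            rejected.append(record)
    return kept, rejected


def quality_metric(records, required_fields, key_field):
    """Fraction of records that pass the filter (0.0 for an empty batch)."""
    if not records:
        return 0.0
    kept, _ = filter_records(records, required_fields, key_field)
    return len(kept) / len(records)
```

A score like this can be tracked per source over time, while the rejected records are routed aside for review instead of being silently dropped.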
Tools used
· Alation
· Ataccama
· Postman
· Python
· Databricks
· Azure
· Azure storage accounts
Business value
After Cloudaeon deployed the solutions, the client saw an immediate resolution of both problems. On the data front, data quality improved substantially: irrelevant and redundant data is now filtered out regularly, and data generation, storage, and management are more streamlined. On the security front, the network became more robust and more resilient to security vulnerabilities: all network resources can now be monitored, which makes quick detection of security threats possible.