Making a Cloud-to-Cloud Migration Happen
Migrating to the cloud can be challenging, but it may be necessary after a business merger or acquisition.
A DevOps team was brought in to handle the cloud-to-cloud migration of Drivit’s platform from Azure to AWS for Zego, a provider of insurance solutions for business vehicles. The migration followed Zego’s acquisition of Drivit, a vehicle insurance telematics specialist: running all systems and applications on AWS would maximize consistency and cohesion. The migration process, however, was technically complex.
Transferring tens of millions of data files from Azure to AWS
A significant challenge during the migration was moving Drivit’s large volume of data files, used to process driver-habit data, from Azure Blob Storage to AWS S3. The dataset consisted of tens of millions of small files, averaging under 1MB each and totaling around 45TB. The sheer number of files made it impractical to use common tools such as AWS Snowball or Rclone to transfer them. The platform was also rewritten to run its data flow through AWS S3, and the migration enabled more cost-effective lifecycle management of the stored data.
A strategy based on partitioning
A concern was that the sheer number of data files could cause traditional data transfer tools to consume too much memory and fail: such tools typically scan the source location and build an in-memory inventory before starting the migration.
DevOps needed a more efficient, less complex way to discover and catalog the data that did not rely on holding the full inventory in memory.
To handle the large volume of data files, DevOps implemented a queue system able to scale to millions of files. The solution iterated over the Azure object store, published a message for each file to the queue, and moved each file as an asynchronous process. Because the object store scan had a long runtime, they used AWS Batch to run the “export_files.py” process across the Fargate container platform to speed up the migration.
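To make the streaming design concrete, here is a minimal sketch of the queue-publishing pattern described above. The helper and function names are illustrative assumptions, not Drivit’s actual code; in the real job, the name iterator would come from the Azure Blob Storage listing API and `send_batch` would wrap an SQS `send_message_batch` call (which accepts at most 10 messages per call).

```python
from typing import Callable, Iterable, Iterator, List

SQS_MAX_BATCH = 10  # SQS send_message_batch accepts at most 10 messages per call


def chunked(names: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group blob names into fixed-size batches without ever
    materialising the full listing in memory."""
    batch: List[str] = []
    for name in names:
        batch.append(name)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def publish_blob_names(names: Iterable[str],
                       send_batch: Callable[[List[str]], None]) -> int:
    """Stream blob names to the queue in batches; memory usage stays
    proportional to the batch size, not the dataset size."""
    count = 0
    for batch in chunked(names, SQS_MAX_BATCH):
        send_batch(batch)  # e.g. sqs.send_message_batch(...) in the real job
        count += len(batch)
    return count
```

The key property is that the source listing is consumed lazily, so the process never holds tens of millions of entries at once.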
Although this approach was successful in testing, it was not fast enough for practical use. Even with the assistance of AWS Batch, the “export_files.py” process took several days to complete.
Moving the data files was slow for the same reason that finding a name in an unalphabetized phone book is slow: you can start at the front and check entries one by one, but it takes a long time. The export was a single-threaded process, so it could only use one CPU core at a time, and adding more resources would not help. Improving its speed required a multi-threaded approach.
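One common way to apply such a multi-threaded approach is to split the keyspace into independent partitions and fan the work out over a thread pool. The sketch below is illustrative only; the partition scheme and worker signature are assumptions, not the team’s actual code:

```python
import string
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# One single-character partition per alphanumeric prefix: a-z, A-Z, 0-9 = 62.
PARTITIONS = list(string.ascii_lowercase + string.ascii_uppercase + string.digits)


def export_all(export_partition: Callable[[str], int],
               max_workers: int = 8) -> int:
    """Run one export job per partition on a thread pool, so listing and
    queueing are no longer limited to a single CPU core.
    `export_partition` would list and enqueue every file whose key starts
    with the given prefix, returning the number of files queued."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(export_partition, PARTITIONS))
```

Because each partition covers a disjoint set of keys, the workers need no coordination, which keeps the parallel version nearly as simple as the single-threaded one.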
The final solution
To solve the problem of slow data file migration, DevOps implemented a solution using data partitioning and multi-processing. Drivit’s data files were uniquely identified by user IDs, so DevOps partitioned the entire dataset using an alphanumeric scheme with 62 partitions (a–z, A–Z, 0–9). They then modified the “export_files” process to use Python threads to increase its speed and added a filter step to remove already-migrated files from the processing queue. The new architecture works as follows:
The “export_files.py” AWS Batch job pushes messages to the “filter queue,” which is read by the “export_worker.py” Lambda function. This function checks if the file has already been migrated and either discards the message or adds it to the primary queue. The “import_files.py” Lambda function then moves the file into S3.
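A minimal sketch of that filter step, with the existence check and queue hand-off injected as callables. The real “export_worker.py” would likely use something like an S3 `HeadObject` lookup and an SQS `send_message` call; those details are assumptions, not confirmed by the article:

```python
from typing import Callable


def filter_message(key: str,
                   already_migrated: Callable[[str], bool],
                   forward: Callable[[str], None]) -> bool:
    """Filter-queue worker logic: discard keys that have already been
    migrated, forward everything else to the primary queue for import.
    Returns True if the message was forwarded."""
    if already_migrated(key):  # e.g. an S3 HeadObject existence check
        return False           # discard: file was moved in an earlier run
    forward(key)               # e.g. publish to the primary SQS queue
    return True
```

Filtering before the primary queue means a restarted or re-run export never re-copies files, so the whole pipeline stays safely idempotent.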
Navigating cloud-to-cloud complexity
Cloud-to-cloud migrations often present a range of challenges that can make them difficult to complete. However, with experience in handling complex migrations, it is often possible to find a solution. In this case, the team’s persistence and practical approach enabled the successful migration of Drivit’s data to AWS.
Here at CourseMonster, we know how hard it may be to find the right time and funds for training. We provide effective training programs that enable you to select the training option that best meets the demands of your company.
For more information, please get in touch with one of our course advisers today or contact us at training@coursemonster.com