This project from Autodesk involved data containing phone numbers which contains letters, special characters, missing country code and some other data problem. Clues sometimes are available such as country/city location, country code and iso code. The objective is to provide a column containing cleansed/corrected phone number in the '+' + Country Code + Phone Number (Leading zeros removed). The idea is to clean the data first then validate all phone numbers using the Google Phone Library API.
In this project, I have created a VBA function utilizing RegEx (Regular Expressions) to clean data unwanted special characters. Data is also formatted to conform with the clients requirement.
Some of the projects that I have worked with requires writing special scripts in order to deliver the requirements in a much efficient manner.
This script has been used many times on different projects which requires data cleansing.
With the latest upgrades/updates from Microsoft, Excel has become more powerful with Power Query and Power Pivot. With Power Query, Data Extraction, Transformation and Loading has become easier. No need to write scripts or perform multiple data/table transformations in a worksheets/s. Instead with Power Query's advanced capabilities, writing the M Code has become automatic which significantly increased user's speed in performing ETL.
Notice the image contains data that needs transformation. Some names contains double last name. Some Cell phones have dashes, open and closed paranthesis, spaces, letters, etc. SMS Numbers needs to be reformated to client-required format.
After performing some data cleansing techniques, we can now see that the 'Name Clean' column was created containing corrected Names. It has also removed unwanted spaces and capitalized each word first letter. 'Cell Phone' and 'SMS Number' now contained correct client-specific format.
Table is now ready to be loaded to an Excel worksheet.
From different data sources having different formats, after some transformation, all 4 data sources in this project has been merged to be able to perform data matching. For names, we have used fuzzy matching so even if some names are misspelled or missing middle names, using nick names, incomplete first name, we are able to match it with the correct records easily.
As for emails and phone numbers, we have performed an exact match rule. We could also do a fuzzy matching with emails as long as we remove the character '@' and anything that folllows to ensure a higher correct yield for matches.
This particular chart is not available in Excel which would show mapping of employee names based on their performance scores. This also identifies their performance based on the lowest and highest performers. not based on a passing mark.
For intended users, creating an interface wherein they are guided by buttons and other embelishments makes it easier for them to avoid creating mistakes. This project also has an auto-drawing system for easier rebar layout visualization