
How Used Dropbase to Identify Personal Data Leaks

Find out just how much personal data you've exposed about yourself online using the tool developed by four software engineering students from the University of Waterloo!

This article is part of an ongoing series showcasing some of the best hacks we encountered through the Hack the North competition. To view more of these projects, please check out the full list of hacks using the Dropbase API.

Do you know what sort of personal information you've shared that is still out there? Maybe it was a tweet where you replied to your friend with a phone number, or a comment on Reddit that mentioned the town you live in. Alone, these sorts of details may seem harmless, but malicious users, or "doxxers", can use information like this to stalk you, harass you, or even hack your accounts.

For four first-year software engineering students at the University of Waterloo, this concern formed the basis of their project at Hack the North 2020. Despite having their first semester of university completely online, Sunny Zuo, Aurik Datta, Wolf Van Dierdonck and Matt Zhang found ways to connect with one another and decided to enter the Hack the North hackathon as a team. Working from around the world, from Calgary to the Maldives, the team managed to put together an impressive project in less than 36 hours.

What information can we find out about you?

The project was inspired by the recent news of the right-wing social media site Parler not scrubbing the metadata from photos posted to the site, which led to personal information and locations being leaked for all their members. Oversights like these can be incredibly damaging and dangerous.

The team decided to find a way to stop doxxers from gaining your personal data. Their solution was to beat doxxers at their own game: they created a tool that lets you dox yourself. You feed the application the usernames of your social media profiles, and the tool scrapes the data from those social media sources and tries to find personal information about you. This allows you to make informed decisions about what content you might want to delete, and what personal information you're comfortable with the world having access to.

How it works, and how it's built

In the React frontend, a user enters the usernames for their Reddit, Twitter and Facebook profiles, and then goes through an authorization step to ensure that the user being queried is actually the person doing the search. The app then calls the APIs of Twitter, Reddit and Facebook to gather parsed data from the user's past posts. This data is then sent through Azure's Natural Language Processing to identify entities and word patterns that would reveal personal information about a user. The application was made to identify your name, email, address, location, phone number, and any potential data breaches, alongside some cool data visualizations.
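To make the idea of scanning posts for personal information concrete, here is a minimal sketch in Python. It is not the team's implementation and it does not call Azure: it swaps in two deliberately simple regular expressions as a stand-in for the entity recognition step. The function name `find_personal_data` and both patterns are assumptions made for illustration only.

```python
import re

# Illustrative patterns only (assumptions, not the team's rules).
# Real detection of personal data needs far more than two regexes.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")


def find_personal_data(posts):
    """Return (post_index, kind, matched_text) for each suspected leak."""
    findings = []
    for index, text in enumerate(posts):
        # Check every post against every pattern, emails first.
        for kind, pattern in (("email", EMAIL_RE), ("phone", PHONE_RE)):
            for match in pattern.finditer(text):
                findings.append((index, kind, match.group()))
    return findings


if __name__ == "__main__":
    sample_posts = [
        "Text me at 519-555-0123",
        "No secrets here",
        "Email: jane.doe@example.com",
    ]
    for finding in find_personal_data(sample_posts):
        print(finding)
```

A hosted language service can also recognize names, addresses and locations, which fixed patterns like these cannot do reliably.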

Sentiment analysis and word frequencies

These data visualizations included sentiment analysis on your posts (whether you tend to make positive or negative posts), word clouds with your most commonly used words, and post frequency data to inform you of when you post the most and least throughout each day and every week.
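The word counts and posting-time statistics behind visualizations like these can be computed with very little code. The sketch below is an assumption about how such numbers could be derived, not the team's code: the stop-word list, the function names, and the ISO 8601 timestamp format are all hypothetical. Sentiment analysis is left out because it needs a trained model or a hosted service.

```python
import re
from collections import Counter
from datetime import datetime

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "it", "i"}


def word_frequencies(texts):
    """Count how often each non-stop word appears across all posts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts


def posts_per_hour(timestamps):
    """Count posts by hour of day, given ISO 8601 timestamp strings."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)


if __name__ == "__main__":
    texts = ["I love the coffee in Waterloo", "Coffee and code"]
    stamps = ["2021-01-16T09:15:00", "2021-01-16T09:45:00", "2021-01-17T22:05:00"]
    print(word_frequencies(texts).most_common(3))
    print(sorted(posts_per_hour(stamps).items()))
```

The resulting counters can be passed to any charting or word cloud library to produce the visuals.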

Using Dropbase to standardize data

Because the team had to use multiple APIs to gather their data, each API returned information in its own format. To streamline their data processing and aggregation, the team set up multiple data pipelines through Dropbase to ingest the different data formats and clean the data so that it could be fed into the natural language processing engine. This also meant that once the cleaning steps were defined for the first user's data, the process could be repeated without any human intervention, allowing the data to be cleaned programmatically.
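The standardization problem described above can be illustrated in plain Python. This sketch does not use the Dropbase API and is not the team's pipeline: the input shapes, the field names, and the `normalize` helper are hypothetical, chosen only to show two differently shaped records being mapped onto one common schema.

```python
from datetime import datetime, timezone


def from_reddit_like(item):
    # Hypothetical input shape: {"body": str, "created_utc": epoch seconds}
    posted = datetime.fromtimestamp(item["created_utc"], tz=timezone.utc)
    return {"platform": "reddit", "text": item["body"], "posted_at": posted.isoformat()}


def from_twitter_like(item):
    # Hypothetical input shape: {"text": str, "created_at": ISO 8601 string}
    posted = datetime.fromisoformat(item["created_at"])
    return {"platform": "twitter", "text": item["text"], "posted_at": posted.isoformat()}


# One converter per source keeps each platform's quirks in a single place.
CONVERTERS = {"reddit": from_reddit_like, "twitter": from_twitter_like}


def normalize(platform, item):
    """Map a platform-specific record onto the shared schema."""
    return CONVERTERS[platform](item)


if __name__ == "__main__":
    print(normalize("reddit", {"body": "I live in Waterloo", "created_utc": 1610788500}))
    print(normalize("twitter", {"text": "Hello", "created_at": "2021-01-16T09:15:00+00:00"}))
```

Once every record shares the same keys, downstream steps such as language analysis no longer need to know which platform a post came from.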

How Dropbase was helpful to Sunny Zuo in automatically processing and uploading data to a database

Next steps for

The team said that their next big goal is to expand the number of social platforms they aggregate data from, to give a broader view of your online self. Platforms on their immediate roadmap include Instagram and further integration with Facebook. Beyond that, they said that they'd love to find other cool ways of visualizing the data that is collected, and perhaps find other types of identifying information that they could help make users aware of.

Set up automated data pipelines in minutes with Dropbase! Get started today for free, or contact our product team for a demo.
