
Barriers to Working With Data for Business Users

Learn the biggest bottlenecks you'll encounter when working with data, and how you can prepare your organization to optimize its data use!

Have you ever undertaken a small project, only to have it balloon in time and cost because of unforeseen circumstances?

Businesses often treat data decisions the same way: they resolve to become more data-driven in their decision making, but underestimate the extent of organizational change required. To help you avoid these common pitfalls before you encounter them, we've put together a guide to help you progress through the data lifecycle. Along each step we've identified some of the key tasks you may encounter, and the risks of bottlenecks and delays associated with each.

Searching/finding data you need for analysis

Locating the data you need is often one of the first struggles you'll face when beginning your analysis. Information may be buried in legacy systems, stored locally on other employees' machines, or only available from third-party sources. In some cases, your organization may not be recording the detailed information you need at all.

Questions to ask:

Is my organization currently collecting this data? If not, how could we begin recording this data?

If the data you require is web-based events (such as clicks or other user actions), try using a tool like Amplitude or Mixpanel to record these events in a way that is easy to navigate and analyze. These analytics products can help you improve retention and gather insights into customer behaviour.
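To make this concrete, here is a minimal sketch of the kind of event record these analytics tools capture under the hood. The field names below are illustrative, not Amplitude's or Mixpanel's actual schema:

```python
import json
import time

def build_event(user_id, event_name, properties=None):
    """Assemble a simple analytics event record (illustrative schema)."""
    return {
        "user_id": user_id,
        "event": event_name,
        "timestamp": int(time.time()),  # when the event occurred
        "properties": properties or {},  # arbitrary context about the event
    }

# Record a button click the way a product analytics tool might
event = build_event("user-123", "signup_button_clicked", {"plan": "pro"})
print(json.dumps(event, indent=2))
```

The value of these tools is less in capturing events (which is simple, as above) and more in the navigation and analysis layered on top.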

Is the information I need available from public datasets?

Sources like the World Bank Open Data, Google Public Data Explorer, or Registry of Open Data on AWS can be a great place to start when looking at datasets. This can reduce the time and cost of doing your own primary research.
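Many of these sources are queryable programmatically. As one sketch, the World Bank exposes a public HTTP API whose v2 endpoint pattern looks like the below; verify the exact parameters against their current documentation before relying on it:

```python
# Build a request URL for the World Bank's public v2 API
# (endpoint pattern based on their docs; verify before relying on it).
BASE = "https://api.worldbank.org/v2"

def indicator_url(country, indicator, fmt="json", per_page=100):
    """Construct a URL for one indicator series for one country."""
    return f"{BASE}/country/{country}/indicator/{indicator}?format={fmt}&per_page={per_page}"

# Example: total population of Canada, returned as JSON
url = indicator_url("CA", "SP.POP.TOTL")
print(url)
```

Fetching the URL with any HTTP client then returns the series as JSON, ready to load into a spreadsheet or analysis tool.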

Is the data stored in offline formats?

If you realize that the data is being recorded by your company, but is being saved on local machines as CSVs or Excel sheets, try using a tool like Dropbase. Dropbase centralizes your offline documents by instantly converting your offline files into databases.
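The underlying idea (turning a flat file into a queryable database) can be approximated with Python's standard library alone. A minimal sketch, with made-up sample data and column names:

```python
import csv
import io
import sqlite3

# An "offline" CSV as it might sit on an employee's machine (sample data)
csv_text = """region,sales
North,1200
South,950
West,1430
"""

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute("CREATE TABLE sales (region TEXT, sales INTEGER)")

# Load each CSV row into the table
rows = csv.DictReader(io.StringIO(csv_text))
conn.executemany("INSERT INTO sales (region, sales) VALUES (:region, :sales)", rows)

# Now the data can be queried like any other database table
total = conn.execute("SELECT SUM(sales) FROM sales").fetchone()[0]
print(total)  # 3580
```

A dedicated tool adds the parts this sketch skips: handling messy headers, type inference, Excel formats, and keeping the database in sync as files change.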

Getting data from clients

Getting files from clients can sometimes feel like pulling teeth. They may be disorganized, or simply have more pressing matters to attend to, and your emails requesting data are forgotten. The easier you can make the process for your clients to send you data, the less friction you'll generate in this step.

Questions to ask:

Is this a file I need the client to send only once, or is it a recurring event?

For one-time file transfers it's generally fine to receive these through email or SFTP folders. However, for more frequent data requests (think monthly sales reports, or weekly web traffic reports), this solution doesn't scale as well. Consider the software your client is using to generate these reports, and build a solution that lets you ingest files directly from that software for analysis. This eliminates the need for you to manually upload the files, and reduces the risk of the client forgetting to send the data to you.
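What "ingesting files directly" looks like depends on the client's software, but the simplest version is a script that sweeps a shared drop folder on a schedule, so nobody has to remember to email anything. A bare-bones sketch using only the standard library (the folder names and CSV focus are placeholders):

```python
import shutil
from pathlib import Path

def ingest_new_files(inbox: Path, processed: Path):
    """Pick up any CSVs dropped in `inbox`, 'ingest' them, then archive them."""
    processed.mkdir(parents=True, exist_ok=True)
    ingested = []
    for path in sorted(inbox.glob("*.csv")):
        data = path.read_text()                # real code would parse/load this
        ingested.append((path.name, len(data.splitlines())))
        shutil.move(str(path), processed / path.name)  # archive so it's not re-read
    return ingested

# Run this from cron (or a cloud scheduler) instead of chasing email attachments.
```

Scheduling this to run every few minutes turns a manual back-and-forth into a pipeline the client only has to feed once.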

Does the client understand how to send me the data?

If you're not careful, data requests to clients can become a game of telephone. Be clear in the format you want the data in, how they should send the data, the timeframe the data covers, and any other information that might get lost in translation. It's always better to be overly explicit with your requests, rather than having to go back and forth with a client trying to get the data you actually want.

What is the magnitude of data that I need to access?

The strategies you will employ for gathering data will vary immensely depending on whether you're dealing with a couple of 10 MB spreadsheets, or hundreds of terabytes of data. If you're dealing with small transfers of data, you're likely fine using software like Google Drive or Box to handle your files. If the amount of data you're dealing with is massive, then it's a good idea to look into cloud data platforms like Snowflake to store and access the information.

Moving data from offline sources to the cloud

Many of us are used to recording data in spreadsheets and folders on our local computers, which seems like a fine solution until you need to share the information with others. When you inevitably have to share this information, it's a time-consuming process of uploading and organizing the data. To avoid this problem in the future, you should consider using databases to store information rather than spreadsheets.

Questions to ask:

Does the original file format need to be preserved, or does the data not require editing?

For some types of assets (such as images, charts, or videos), you want to preserve their format. If this is the case, using data lakes or shared folders is the ideal solution. These tools help you share documents across your company in a simple and effective manner.

Would a database be a more efficient way of organizing and sharing my data?

For data that needs to be more readily transformed or edited, your ideal solution is a database. The benefits of hosting your information in a database are many, but in particular it allows you to share information across the organization with a single source of truth. No longer do you need to worry about whether your spreadsheet is out of sync with other teams.

Transforming incomplete/unformatted data

The next challenge you'll deal with when using data is converting it to a format that allows you to perform the analysis you require. Data can be partially complete, filled with null values, or inconsistently formatted (phone numbers are a classic example), along with any number of other errors that mean you can't immediately start working with it. The next task is figuring out how to turn this mess of data into a clean, usable dataset.

Questions to ask:

What flaws does my current dataset have? Can I fix these easily?

By identifying what flaws exist within the dataset and determining how serious they are, you can accurately determine how much of a bottleneck this step poses to your analysis. If your data is already in the format you require, this might not be a consideration for you.

How could I alter this data to be more valuable for my analysis?

For this, I'd recommend checking out our article on the ten most valuable data analysis functions. These functions will help you think about some of the ways that you could alter or enrich your data. Beyond just cleaning your data, you should be looking at removing outliers, filtering your data, and applying transformations.
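As a small taste of what this cleanup looks like in practice, here is a sketch that normalizes inconsistently formatted phone numbers and drops records with missing values. The records and field names are made up for illustration:

```python
import re

# Messy input as it might arrive from a source system (sample records)
raw_records = [
    {"name": "Ada",  "phone": "(555) 123-4567"},
    {"name": "Ben",  "phone": "555.123.9999"},
    {"name": "Cleo", "phone": None},             # missing value
]

def normalize_phone(phone):
    """Strip everything but digits so '(555) 123-4567' and '555.123.4567' match."""
    return re.sub(r"\D", "", phone)

clean = [
    {**rec, "phone": normalize_phone(rec["phone"])}
    for rec in raw_records
    if rec["phone"] is not None  # drop nulls rather than guess at them
]
print(clean)
```

Whether to drop incomplete records, fill them with defaults, or chase down the missing values is a judgment call that depends on how the dataset will be used.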

Centralizing data from many sources

A Market Pulse survey of North American tech companies found that the average firm had over 400 data sources feeding their business intelligence and analytics. If you're using data for your business, chances are that you're having to bring together data from a wide variety of sources. This is a time consuming process, but there are tools that can help you eliminate this bottleneck.

Questions to ask:

Do I need to keep track of customer data from multiple sources?

For B2C businesses especially, tracking customer behaviour is essential. To centralize this data efficiently, using a Customer Data Platform like Segment is ideal. Segment allows you to combine information from internal applications, email, websites, and mobile apps to gather a holistic view of your customers.

Do I want to clean the data while I centralize it, or only after?

The answer to this question will help you determine whether it's an ETL (Extract, Transform, Load) or an ELT (Extract, Load, Transform) tool that you should be investigating. Both kinds of tools centralize your data from many sources into one data warehouse or data lake. But ETL tools apply transformations to the data before it enters storage, whereas ELT waits until the data is already in your storage before allowing you to perform any transformations. The benefits of ELT tools are the speed at which data can enter the system and lower overall maintenance. The benefit of ETL tools is easier GDPR or HIPAA compliance, since these regulations can require you to strip certain data away before storing it in your system.
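The distinction is easier to see in code. Here is a toy sketch of the ETL flavour, where a sensitive field is stripped before anything reaches storage; the records and field names are invented for illustration:

```python
import sqlite3

# Raw records as extracted from a source system (illustrative)
extracted = [
    {"user": "ada", "email": "ada@example.com", "purchases": 3},
    {"user": "ben", "email": "ben@example.com", "purchases": 1},
]

def transform(record):
    """ETL-style transform: drop PII *before* it ever reaches storage."""
    return {"user": record["user"], "purchases": record["purchases"]}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user TEXT, purchases INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (:user, :purchases)",
    (transform(r) for r in extracted),  # transform happens in flight: the "T" before the "L"
)

# In ELT, the raw records (emails included) would be loaded first,
# and the stripping would run as a query inside the warehouse afterwards.
stored = conn.execute("SELECT user, purchases FROM users ORDER BY user").fetchall()
print(stored)  # no email column ever hit the database
```

Under ELT the same transform runs inside the warehouse after loading, which is faster to ingest but means the sensitive field does briefly land in storage.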

Analyzing data with web tools

Once your data is housed in a single location, the next hurdle is to take the raw information and turn it into actionable insights through analysis. There are a number of tools that can help you do this easily, so this is less of a bottleneck than it is a preference for which platform will best suit your needs.

Questions to ask:

What format do I want to present my analysis in?

Generally, there are two ways that you can present your findings: in static forms or dynamic ones. Static charts allow you to display results over a given time period, or a specific sample of data. Dynamic charts update as new information rolls in, and are more about analyzing the ongoing health of an organization. Both are crucial to running a successful data-driven business, but it's important to choose tools that are specialized for your more dominant purpose. Looking to create sales dashboards for your business development representatives? Perhaps try a tool like Tableau or Looker, which allow you to create custom dashboards from your data in real time.

What level of technical skill do I possess?

Your level of technical skill used to play a much larger role in limiting the type of analysis that you can perform, but with modern web applications, the bar to entry is much lower. Still, if you come from a strong development background, you may want to consider tools that allow you to write custom functions and queries in order to maximize your effectiveness, rather than going with a no-code tool.

What budget do I have for subscribing to analytics platforms?

The types of software you decide to use will vary dramatically depending on whether this analysis is meant to support thousands of employees, or just to help you track user sign-ups for a side project you're working on. The good news is that most leading BI and visualization tools offer free trials, so you can try a product before needing to commit to one.

Analyzing data using machine learning

If you're a more technical user, or are willing to pay for software that will help you draw insights from the data, using artificial intelligence and machine learning to drive analysis can give you a leg up on your competition. This isn't a solution for every business's data needs, but over the long run AI and ML often lead to a more hands-off approach to analyzing data.

Questions to ask:

Do I need to prepare my data further for machine learning?

If you decide to apply your dataset to train machine learning models, you will often need to revisit the "Transformation" step of the data lifecycle to prepare the data for ML. If you want to learn more about some of the transformations you should be considering, read our article How to Clean and Prepare Your Data for Machine Learning.
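To make the "revisit transformation" point concrete, here is a sketch of two prep steps almost every model benefits from: scaling features to a common range, and holding out a test split. This is pure Python for illustration; in practice an ML library would provide equivalents:

```python
import random

def min_max_scale(values):
    """Rescale a feature column to [0, 1] so no single feature dominates training."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle and hold out a test set so the model is measured on unseen data."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

ages = [18, 35, 52, 69]
print(min_max_scale(ages))              # values mapped onto [0, 1]
train, test = train_test_split(range(100))
print(len(train), len(test))            # 80 20
```

These two steps alone rarely suffice, but they illustrate why ML work usually sends you back to the transformation stage before training begins.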

Is my dataset large enough to make the predictions reliable?

Another bottleneck, and a possible reason not to pursue machine learning for your analysis, is simply inadequate dataset size. To get reliable results from these models, we recommend a dataset of at least 10,000 rows, though for some problems it's better to train on a dataset holding millions of rows.

Sharing data with coworkers

At this point in your data analysis, we're looking to share your findings with other parts of the organization. This could mean presenting findings to executives of the company, identifying weaknesses and strengths of your department, or sharing research with sales teams to improve outcomes. Regardless of the intention, the ability to share the data easily and widely is essential.

Questions to ask:

Do the people I'm sharing with need to edit and manipulate the data?

Do I need to control permissions and roles with respect to the data?

Sharing data with clients

Depending on the type of organization, you may need to share your findings with external stakeholders as well. Similar questions to the ones above still apply, but it's important to also consider how you're presenting the data to them.

Questions to ask:

How do they want to view the data?

How will you communicate your findings with them?

Scaling your data cleaning

The final steps you need to consider in your data processing are the most important. Ensuring that your processes are scalable is what will allow you to continue to see success. This is the portion of the data lifecycle that will consistently form a bottleneck if you don't address it head-on. In many ways, this shouldn't be seen as a separate step from the rest of the data lifecycle, but rather something you should consider in all your decision making.

Questions to ask:

Is this process scalable?

How can I make my processes easily repeatable/lower effort to set up?

Automating your data processes

Going hand-in-hand with scaling your data comes automating the process. If you're able to create data solutions that are both scalable and automated, you'll be able to increase the output of your work significantly without having to increase your effort. If you work with clients this means you're able to take more clients on and generate more revenue, and if you work with internal teams this means you're able to free up your time for more important tasks, rather than wrangling data for others.

Questions to ask:

Can I automate this process with the software I'm using?

Are there any areas of the process that I won't be able to automate?
