Hello new startup or old company! You are going to lose at least a part of your data. I’m sorry. I’m not the one who is going to do it to you, don’t worry.
But do worry about the impact! Statistically, it is only a matter of time before
you get caught with your pants down.
You will lose data because someone will make a mistake. That is fine, mistakes
happen, and hopefully you’ll learn from them. But mistakes will be made!
You store your data in a bucket on a cloud provider, or you have copies of your data in personal accounts. Someone doesn’t believe that their superstrong password, monkey123, will be guessed, and a credential stuffing attack gets access. Or maybe you misconfigure some system, making your entire database readable by anyone on the internet. This hasn’t happened yet, but it could happen.
So right now, while you still live in wonderland, thinking nothing is wrong, please think about what could happen when things DO go wrong. What data are you collecting about people? How could the data you have be abused? What could happen if a dictator got hold of that information? Can someone impersonate someone else? These are not funny questions. Your data people need to answer them, and the CEO is ultimately responsible. Are you worried? Yes? Good!
Now, for every piece of information you have about people, ask yourself:
what are you going to use this information for?
Direct goals are clear
If you have a direct goal, for example:
- I have to split my marketing campaign on sex so I ask for biological sex (I don’t understand that, but you are the expert here, remember)
- I have special discounts for people above a certain age, so I collect age data (you have to decide whether you really need the full date of birth)
- It is a legal requirement to record delivery address and bank account information, so I record it and store it separately
Write down the goal for each piece of information and discuss it with your coworkers. You might be able to use other information instead. Set clear data deletion and archiving rules.
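Writing goals down can be as simple as a small inventory that lives next to your code. The sketch below is only an illustration: the field names, the purposes, and the retention periods are made-up placeholders, not legal advice, and `FieldPolicy`, `fields_without_purpose` and `is_expired` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FieldPolicy:
    name: str
    purpose: str         # the written-down goal for this piece of data
    retention_days: int  # placeholder value: delete after this many days


# Hypothetical inventory; every name and number here is an assumption.
INVENTORY = [
    FieldPolicy("delivery_address", "needed to deliver the order", 730),
    FieldPolicy("is_above_discount_age", "age-based discount", 365),
]


def fields_without_purpose(collected, inventory):
    """Return collected field names that have no documented purpose."""
    documented = {p.name for p in inventory if p.purpose.strip()}
    return sorted(set(collected) - documented)


def is_expired(policy, collected_on, today):
    """True if a value collected on `collected_on` is past its retention period."""
    return today - collected_on > timedelta(days=policy.retention_days)
```

Anything returned by `fields_without_purpose` is a candidate for the "why do we have this?" talk with your coworkers, and anything for which `is_expired` is true should be deleted or archived according to your own rules. Note that the inventory stores a yes/no flag instead of a date of birth, which is one way of using "other information".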
Direct goals are not clear
If you only have a vague, possible future goal for certain information, for example:
- If we just collect more data about people we will magically find fraud
- If we just collect more data about people we will have better predictions for sales
Set a strict timeline and goals. For instance: collect gender information for 2 months and check whether you can achieve sales predictions that are 5% better. If you don’t achieve the goal, re-evaluate. If you collect information only with the idea that you might use it in the future, throw away the data and stop collecting it. You will never use it and it is a liability.
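A time-boxed experiment like that can be written down so nobody forgets the deadline. This is a minimal sketch under assumptions: `CollectionExperiment` and `decide` are hypothetical names, and the dates and the 5% target are just the example numbers from above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CollectionExperiment:
    field: str
    start: date
    deadline: date             # end of the agreed collection window
    target_improvement: float  # 0.05 means "predictions 5% better"


def decide(experiment, measured_improvement, today):
    """Say what to do with an experimental field, given today's date."""
    if today < experiment.deadline:
        return "keep collecting"
    if measured_improvement >= experiment.target_improvement:
        return "goal met: document the purpose"
    return "goal missed: re-evaluate"


# Hypothetical experiment: two months of gender data for sales predictions.
gender_trial = CollectionExperiment(
    field="gender",
    start=date(2024, 1, 1),
    deadline=date(2024, 3, 1),
    target_improvement=0.05,
)
```

The point of the design is that the deadline and the target are fixed before collection starts, so "we might need it later" cannot quietly become the reason to keep the data.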
Take data that arrives automatically, for example: if you collect massive amounts of data from Google Analytics, re-evaluate whether you need all that information! Throw away everything you don’t need. Re-evaluate whether you want to be associated with a big company like Google. Try out whether other analytics products might be a better fit for you.
If you have a small blog: what do you do with the information you receive from the website analytics? I decided not to add tracking. I’m never going to use the data, so why bother?
Big data is like oil: it is hard to refine and can contain a lot of power, but spills are hard to clean up and the disasters that follow will haunt your company for decades.
Go for data minimisation:
If you don’t collect stuff you don’t need, you cannot lose it!
Picture by Saikiran Kesari on Unsplash