Creating a Focused Data Strategy
As data practitioners, we are almost by nature curious, drawn to novelty for its own sake. Why else would we jump headfirst into this most ‘21st Century’ of careers? The chance to take a meaty problem, break it apart and discover something new is what makes this field so compelling. Finally understanding a new framework or tool, and turning old inefficiencies into new trivialities, is an exciting feeling. Everyone I’ve met in Data Engineering and Data Analysis is similar: incredibly clever, curious people who voraciously ingest new knowledge.
There’s an incredible amount of innovation in data technologies: new libraries to include, exciting products to implement and interesting technologies to learn. Right now, just off the top of my head, you could decide to learn or implement Spark, Kafka, Tableau, Looker, dbt, Airflow, Kinesis, Snowplow, Snowflake, Redshift, Google BigQuery, Hadoop, Great Expectations, Amundsen, R, Jupyter Notebooks, Lambda, Arrow, Parquet, Azure, Matplotlib, Stitch, Fivetran… and that is just a tiny part of the data engineering and analytics ecosystem.
The field is moving fast. Even for smaller companies, a high-functioning analytics stack is a leg up on the competition, so developers are always reaching for the next hot tool that promises to help them grow or work more efficiently.
But any race to novelty has its downsides.
Here Be Dragons
Often in the quest to explore, charting blank territories and taming unknown dragons, we focus on the thrill of the journey and lose sight of where we should be going. While it is intellectually satisfying to learn a new tool, to grow our knowledge and celebrate our place on the leading edge of innovation, we need to remind ourselves that we’re not doing our jobs solely for the fun of it (although sometimes it feels like it!).
We use data to do our jobs better. It’s not to lord it over a competitor because they are using a technology six minutes older than ours. It’s not to finish a computer science degree (but if you can, more power to you).
A tool is a tool is a tool. Some are better than others. Some are better than others for specific problems.
Define a Data Strategy
Knowing which tool to use often comes down to simply identifying which problems you are trying to solve. It’s not a complicated process, and it doesn’t need a long, drawn-out offsite, but do it before you commit to a tool that no one in the organisation can get anything useful out of. It can be as simple as writing down your business goals on a sheet of paper and working through questions like these. (If you need help with your data strategy, get in touch.)
- What are your tactics to meet your wider goals? How can the right data, in the hands of the right people, help you carry out those tactics? What is the most important thing you need to do?
- Identify your data stakeholders: not only the end users, but also the suppliers and owners of the data. They’re not always the same people.
- What problems are your end users trying to solve? What is their skill level? They could be highly skilled data analysts with deep SQL knowledge, or they could be front-line manufacturing staff who need real-time visual feedback on their production line. How you present the data to them will directly affect their ability to do their jobs.
- How do you get data into your pipelines? Can you establish an agreement that the requests you make (API call, event stream, web scrape, database query, etc.) will always return data in a consistent form? What are the chances this will change?
- Who (or what) do you have to develop a relationship with to make sure that data flows uninterrupted and unchanged? Who has the power to make changes, and will they let you know if they do change something? What is your plan if something does change?
- Will you transform or enrich your data? The value an analytics engineering team adds to a process often lies in enriching data from multiple sources, or aggregating data into efficient reporting. What are the metrics that you need to supply? Will you be performing optimisations and experiments, or simple reporting?
- Think about the timeliness of your data; are you looking to analyse real-time events, or are your true business insights gained through comparing data over the last few months?
- What infrastructure will you need to support this tool in production? Will the savings you make now be lost in the time taken to support the technology?
- How will your business change? Will the tools you implement now be useless in a year?
- How will you know you have been successful? What can you measure to make that decision?
You Can Always Change Your Mind
This isn’t a comprehensive list, and it might not even suit your business, needs or culture. The important thing is that when you are about to invest resources (humans, treasure, time) in a technology, you need to stop, think and consider the impacts and requirements.
Take an hour and write your thoughts down. Get a group of smart people into a room (ok, Zoom call, sorry) and start thinking together on the whiteboard. What matters is spending some time in consideration.
Writing something down makes you think it through. Creating some user stories, sticking them in a backlog and plotting them on a roadmap can help you organise those thoughts into a real plan. Make your mistakes before you spend money or implement new infrastructure.