Why should SaaS companies invest in data analytics?
Without analytics or data, you can’t have good reporting – you’re flying blind, with no real sense of how your company is doing. Early on, you can probably get away with out-of-the-box reporting from the various tools you’re using. As the business gets more complex, though, you’ll run into limitations: either you can’t implement the business logic you need, or pulling reports from different places and combining them requires a lot of manual effort.
Data helps the business make better decisions – the real value of analytics comes when data is used to inform decision-making for your company. Reporting is great, but if nothing changes as a result of an analysis, what’s the point? You’ve got to be able to identify and capitalize on opportunities for growth. Running experiments (i.e., A/B testing), once you reach a large enough scale, is a particularly valuable capability.
Some companies can productize data – the data itself might be critical to the core value proposition of the product. For example, think about companies where recommendations are a core part of their product offering (e.g., what other shows to watch, items to buy, places to visit, etc.). If you’re going to do data science and machine learning, that starts by having a strong analytics foundation.
Customers are increasingly demanding access to data – customers are smart. They know that you’ve got data, and they expect you to be able to share it back to them in some form.
What are the key activities of a data analytics team? How should teams split their time?
Transforming raw data into a usable state – the data team manages the data warehouse, taking the raw data that comes into the warehouse and preparing it, cleaning it, and curating it so that it’s usable by analysts themselves as well as others. Data transformation includes applying calculations and business logic to the data to produce reliable and consistent metrics.
This might take up ~30% of time for a mature function. Early on, as you’re laying the foundation, you may spend more time here. This role is commonly split off into a specialized “Analytics Engineering” team at some point (when there’s enough work).
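To make the idea of “applying calculations and business logic” concrete, here’s a minimal sketch using SQLite as a stand-in for a real warehouse. The table, column names, and pricing rules are all hypothetical; the point is that the logic for a metric like MRR lives in one curated place rather than being re-derived in every report.

```python
import sqlite3

# Toy illustration of the transformation layer: raw subscription rows
# (table and column names are hypothetical) are cleaned and rolled up
# into one consistent MRR metric that every report can share.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE raw_subscriptions (
        account_id TEXT, plan_amount_cents INTEGER,
        billing_period TEXT, status TEXT
    )
""")
conn.executemany(
    "INSERT INTO raw_subscriptions VALUES (?, ?, ?, ?)",
    [
        ("a1", 120000, "annual", "active"),   # $1,200/yr -> $100/mo
        ("a2", 5000, "monthly", "active"),    # $50/mo
        ("a3", 9900, "monthly", "canceled"),  # excluded by business logic
    ],
)

# Business logic lives in one place: normalize annual plans to monthly
# and exclude canceled subscriptions.
conn.execute("""
    CREATE VIEW mrr_by_account AS
    SELECT account_id,
           CASE billing_period
               WHEN 'annual' THEN plan_amount_cents / 12.0
               ELSE plan_amount_cents
           END / 100.0 AS mrr_dollars
    FROM raw_subscriptions
    WHERE status = 'active'
""")

total_mrr = conn.execute(
    "SELECT SUM(mrr_dollars) FROM mrr_by_account"
).fetchone()[0]
print(total_mrr)  # 150.0
```

Anyone querying `mrr_by_account` gets the same number, which is the whole point of curating the transformation layer centrally.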
Answering ad-hoc questions – whether it’s diagnosing an issue or responding to someone who needs you to pull some data, this is one category of things that a team should be prepared to do.
~10% of time for a mature function. Reduce the time spent by teaching people to self-serve. If you can equip people to answer their own questions, that’ll take some of the burden off your team.
Building dashboards and automating processes – trying to reduce the amount of manual work other people need to do to access data they need regularly, or making people’s jobs more efficient.
This might take up ~30% of time for a mature function. More time will go here early on, when you’re making the big initial investment, and it should taper as your data function matures. One way this can go south is if your business changes its “north star” every quarter: if new metrics are constantly being defined, your dashboards will have a short shelf-life, and you’ll constantly be building new reporting to support the “metrics of the quarter”.
Answering important strategic questions – the team should work with stakeholders to understand their biggest challenges. Take time to break down the issues, determine the right questions to ask, and go chase down the answers. Build the story, provide recommendations, and then work together with the business to implement solutions and drive change.
Aim to spend 30% or more of your time here. This work will be the most valuable to the company and the most engaging for the team. However, getting here requires that all of the other activities (data modeling, ad-hoc questions, reporting) are in place and working well. If you get to this point, congratulations, you’ve made it as a team!
What’s in the tech stack?
You definitely need
Data warehouse – the big players in that market right now are Snowflake, BigQuery, and Redshift. This will become the central repository for all of the data from various applications so that you can work with it all in one place.
ETL (Extract Transform Load) – how you get data into the warehouse. It’s all about moving data from some source system into a data warehouse.
For moving data from a vendor’s application (like Salesforce data), buy something from a third party like Stitch or Fivetran and don’t try to build the ETL yourself. Many other companies are using these applications as well, so don’t waste time reinventing the wheel when it comes to moving that data into your warehouse.
You may need someone internally to set up the ETL that moves your own production data into your warehouse. Production data is one of your most important data sources because it’s the soul of the product; the most interesting information comes from the production database. It’s also one of the highest-quality data sources, because it has to be: it’s what’s running your product.
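An in-house extract-and-load step for production data is often just an incremental copy keyed off an update timestamp. Below is a hedged sketch of that pattern using two SQLite connections as stand-ins for the production database and the warehouse; the table name, columns, and watermark scheme are hypothetical.

```python
import sqlite3

# Stand-in for the production database (schema is hypothetical).
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE users (id INTEGER, email TEXT, updated_at TEXT)")
prod.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, "a@example.com", "2024-01-01"),
    (2, "b@example.com", "2024-02-01"),
])

# Stand-in for the warehouse's raw landing table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE raw_users (id INTEGER, email TEXT, updated_at TEXT)")

def sync_users(watermark: str) -> int:
    """Incrementally load rows changed since the watermark; return the count."""
    rows = prod.execute(
        "SELECT id, email, updated_at FROM users WHERE updated_at > ?",
        (watermark,),
    ).fetchall()
    warehouse.executemany("INSERT INTO raw_users VALUES (?, ?, ?)", rows)
    return len(rows)

loaded = sync_users("2024-01-15")  # only rows updated after the watermark move
print(loaded)  # 1
```

A real pipeline would add upserts, schema handling, and failure recovery, which is exactly why third-party tools are the right call for vendor data.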
Reporting tool – connects to your data warehouse and allows you to explore the data and build visualizations. Looker and Tableau are the two most common choices in this category, but there are a lot of other companies that do this.
You may need
Data transformation layer – dbt (“data build tool”) has become the leader in this space, but other similar tools have emerged as well (e.g., Dataform). These tools enable analytics to manage the data transformation layer rather than relying on engineering to transform the data before loading it into the data warehouse. You’ll hear these two approaches referred to as ETL (extract-transform-load) vs. ELT (extract-load-transform), with the difference being the order of the steps. The rise of modern cloud data warehouses (Snowflake, BigQuery, Redshift) and tools like dbt have caused a shift towards ELT and a SQL-based transformation layer managed by analysts as opposed to engineers. This setup provides a lot of power, flexibility, and agility, but it also requires analysts to be more technical than they were several years ago as software development best practices begin to be adopted by analytics teams.
Orchestration and scheduling – you’re going to have regular jobs related to moving data around or processing data that need to be run at some particular time, based on some trigger, or in relation to one another. Most tools will allow you to schedule tasks to run at specific times, and you can get by early on by coordinating those schedules yourself. However, if you need a separate system to schedule and run jobs, or if the dependencies between tasks become more complex, you may need an orchestration or scheduling tool like Airflow.
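What an orchestrator provides, at its core, is running tasks in an order that respects their dependencies. Here’s a miniature sketch of that idea in pure Python using the standard library’s topological sort; the task names are hypothetical, and a real tool like Airflow would add schedules, retries, and alerting on top.

```python
from graphlib import TopologicalSorter

ran = []  # records execution order

# Hypothetical pipeline tasks; in a real orchestrator these would be
# operators or jobs rather than lambdas.
tasks = {
    "load_raw_data": lambda: ran.append("load_raw_data"),
    "transform_models": lambda: ran.append("transform_models"),
    "refresh_dashboards": lambda: ran.append("refresh_dashboards"),
}

# Dependencies: dashboards need transformed models, which need raw data.
deps = {
    "transform_models": {"load_raw_data"},
    "refresh_dashboards": {"transform_models"},
}

# static_order() yields tasks so that every dependency runs first.
for task_name in TopologicalSorter(deps).static_order():
    tasks[task_name]()

print(ran)  # ['load_raw_data', 'transform_models', 'refresh_dashboards']
```

When your “DAG” is this small, cron-style schedules spaced apart in time work fine; it’s when the dependency graph grows that a dedicated orchestrator earns its keep.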
Data science tool – e.g., Databricks, allows you to run code in a notebook environment in the cloud instead of on your local machine.
Event tracking and analytics – this could be owned by product, but the analytics team should be involved since they’re likely to be heavy consumers of this data. Tools like Amplitude, Segment, Mixpanel, or Snowplow allow you to define and track users and events (i.e., user behavior on your site or in your product). Some companies might be able to use a tool like Heap that tracks user behavior on your site automatically (e.g., no need to define events up front), but for other companies, such a tool could be a recipe for chaos (if the product is complex and tracking based on CSS selectors isn’t meaningful). Be thoughtful about your event planning; one of the most important steps in event tracking is deciding what things are actually important to track.
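One lightweight way to “be thoughtful about your event planning” is to write the tracking plan down as data and validate events against it before they’re sent to a tool like Segment or Amplitude. This is a hedged sketch of that idea; the event names and required properties are hypothetical.

```python
# A declared tracking plan: each event you care about, with the
# properties it must carry (names are hypothetical).
TRACKING_PLAN = {
    "signup_completed": {"user_id", "plan"},
    "report_exported": {"user_id", "report_name", "format"},
}

def validate_event(name: str, properties: dict) -> bool:
    """An event is valid only if it's in the plan with all required properties."""
    required = TRACKING_PLAN.get(name)
    return required is not None and required <= set(properties)

print(validate_event("signup_completed", {"user_id": "u1", "plan": "pro"}))  # True
print(validate_event("clicked_thing", {"user_id": "u1"}))                    # False
```

Rejecting undeclared events keeps the chaos of ad-hoc tracking from leaking into your analytics data.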
Adjacent systems you need to consider
The analytics team needs to understand third-party tools – this category includes things like Marketo and Salesforce. The analytics team probably won’t be managing or using these systems directly, but they’ll be using the data from these tools to answer questions. The analytics team needs to understand these systems so that they can interpret the data correctly and combine it appropriately with data from other systems. These tools are the “secondary orbit” of the data stack that you need to take into consideration.
Emerging data tech trends
Reverse ETL / Operational Analytics – reverse ETL tools, as the name suggests, do the opposite of what ETL tools do. Instead of bringing data from third party applications into your data warehouse, they push data from the data warehouse into third party applications. This enables you to use the data warehouse as a source of truth and as a hub for moving data between systems. Remember all of the data cleaning and transformation that’s happening in the data warehouse? Rather than simply using that data for reporting, you can enable other teams to take advantage of that data by getting it back into their operational systems (hence the term “Operational Analytics”). For example, you could push product usage or billing data into your CRM. You could create targeted audiences in your ad platforms based on user behavior. You can ensure that a particular metric matches across all of your systems because it comes from the same place.
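The reverse ETL flow can be sketched in a few lines: read a cleaned metric out of the warehouse and write it onto the matching records in an operational system. The CRM here is a stand-in dictionary, and all names are hypothetical; a real implementation would call the vendor’s API through a tool like a reverse ETL platform.

```python
# Rows as they might come out of the warehouse's curated MRR model
# (names hypothetical).
warehouse_rows = [
    {"account_id": "a1", "mrr_dollars": 100.0},
    {"account_id": "a2", "mrr_dollars": 50.0},
]

# Stand-in for CRM account records, keyed by account id.
crm_records = {"a1": {}, "a2": {}}

def sync_mrr_to_crm(rows, crm) -> int:
    """Push the warehouse-computed MRR onto each matching CRM record."""
    updated = 0
    for row in rows:
        record = crm.get(row["account_id"])
        if record is not None:
            record["mrr_dollars"] = row["mrr_dollars"]
            updated += 1
    return updated

print(sync_mrr_to_crm(warehouse_rows, crm_records))  # 2
```

Because the CRM now shows the same MRR number the dashboards show, sales and analytics stop arguing about whose figure is right.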
Data governance, knowledge management, and data lineage – we’re starting to see more specialized tools for capturing and communicating data knowledge, whether that’s maintaining a “data dictionary” or being able to answer frequently asked questions about “what tables should I use if I need to get this information?” or “what past projects have been done in this area?”
How can early-stage companies handle data before they have a dedicated person or team?
SQL → Excel – early on, you can use the built-in reporting capabilities of your various tools, especially if those tools allow you to export data. You’ll probably need someone who can write SQL queries to pull data from your internal databases. Leaders need to be analytical enough to crunch that data in Excel. You can get a lot of mileage out of taking data exports from various systems and combining/manipulating them in Excel in order to answer specific questions.
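The whole “SQL → Excel” workflow fits in a handful of lines: query an internal database, then write a CSV that a leader can open in Excel. This sketch uses SQLite and an in-memory buffer as stand-ins; table and column names are hypothetical, and a real export would write a `.csv` file to disk.

```python
import csv
import io
import sqlite3

# Stand-in for an internal database (schema is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("acme", 100.0), ("acme", 50.0), ("globex", 75.0)])

# The SQL does the aggregation; Excel gets clean, pre-summarized rows.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

buffer = io.StringIO()  # swap in open("export.csv", "w", newline="") for a real file
writer = csv.writer(buffer)
writer.writerow(["customer", "total"])
writer.writerows(rows)
print(buffer.getvalue())
```

Doing the heavy lifting in SQL and handing Excel a tidy summary scales further than raw-data exports with formulas layered on top.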
When should you hire a dedicated person?
Consider bringing on a dedicated person by the time you reach ~50 people – don’t wait too long to hire someone. You can get away with Excel wizards or people with consulting backgrounds for a while, but you run the risk that “that one person” leaves and you don’t have a team prepared.
If functional teams start standing up de-centralized analytics – this could be a red flag that you’re not thinking enough about an overall data strategy.
When you hire a first data analytics person, what should you look for?
Find someone who will provide value on day 1, but can grow a team – they frequently have MBAs and/or backgrounds in finance or consulting. While this person should be able to roll up their sleeves and get their hands dirty, focus on hiring someone that can think about data and about your business strategically. They’re going to be the most important piece of your analytics puzzle for years to come. (credit to Tristan Handy for this one)
What should a new hire’s first steps be?
Stand up core tech – decide what tools you need for a data warehouse, ETL, overall reporting.
Transition reporting – if your reporting is in a homegrown setup like a Google Sheet, transition key reporting over to a tool like Tableau.
Learn while responding to ad-hoc questions – learn what’s important to the company by fielding questions. Understand the data and build relationships with stakeholders. The people who have questions are the people who tend to be the most analytically minded. They’re actually thinking about the data and are hungry for the data to help address some problem.
Where should data analytics sit within the organization? How does it interact with other teams?
| Model | Pros | Cons |
| --- | --- | --- |
| Centralized | Consistency of metrics and logic; standardization of tools and processes; cross-functional strategic viewpoint; a close team with strong culture, growth opportunities, and peer mentorship | Tends to be reactive (ticket-taking); less responsive (can take longer to respond to particular needs, frustrating business units); rivalry/politics emerge when leaders rely on resources they don’t control |
| Decentralized | Benefits from direct functional team context and specialization; functional teams have a sense of ownership of analytics; greater potential for impact, work is more likely to be put into action | Lack of objectivity, everyone “grades their own work”; inefficient (redundancy across functions); inconsistent (different processes and standards across functions); single point of failure risk if the specialized resource quits |
| Hybrid | Ideally the best of both worlds: analysts get close to the problems they work on, while having the resources of a central team | Hybrid is hard to do right; “serving two masters” can get confusing if you’re not careful |
How do you measure the success of a data analytics team?
One of the great ironies is that it’s hard to objectively measure an analytics team’s performance. The nature of the work is so variable, and the impact of your work is usually a few steps removed from the team’s output. For those reasons, it’s unlikely that you’ll be able to build a dashboard to show how well the analytics team is performing. That said, here are some things to look at:
- Company sentiment – e.g. survey or need-finding interviews to gauge how people feel about analytics. What has been their experience working with your team or with data in general?
- Demand for the team’s work – if demand for the team’s work is going up, that’s a good sign (especially if functions sponsor the budget for an analytics headcount, which they should)
- Check that the work you’re doing is moving a metric, changing a product, or improving a process – you should be able to tie everything you’re working on to one of those three things. If you can’t, that’s a serious red flag. (credit to Ken Rudin for these three categories)
What are the most important pieces to get right?
Build trust in the data – if people don’t trust the work your team is doing because they see discrepancies, then they aren’t going to believe your analysis or recommendations and will rely on their own gut instinct instead. Sometimes these apparent data discrepancies are simply a result of misunderstanding the data. In that case, you can focus on communication, documentation, and training to build trust. If your data really does have issues, fix it! Bad data is often worse than no data.
Make sure the data team has access to technical resources – as much as the modern data stack has empowered analysts, you will still need engineering help in some cases. Make sure you have engineers you can work with so you don’t have to sit around waiting.
Build in slices, not in layers – pick a particular problem and solve the whole stack from end to end for that narrow problem. Get your data sources in order that you need for that problem, make sure you have reporting in place for that problem, and make sure you know what the biggest strategic questions and opportunities are. Go tackle that and build things out one slice at a time, rather than trying to do it layer by layer. If you try to get everything “perfect” in a single layer (e.g., having perfectly modeled data built out in your data warehouse), you’ll never be able to move past the first layer.
What are the common pitfalls?
Don’t hire a junior analyst – one of the mistakes that I see companies making is thinking, “I just need to hire a junior level analyst person to do this Excel work to get it off of these other people’s plates” vs. saying “okay, it’s time to bring in some real horsepower, and someone who can actually build this team out and position this team for success over the long term.”
Don’t let yourself become a ticket-taking team – partner closely with the stakeholders you work with.
Don’t expect end users to do too much too quickly – you have to gradually teach end users how to use the data you have. You want to empower your users, and you can make meaningful strides towards data democratization, but you don’t want to set your expectations too high on how quickly they learn your systems. It’s unlikely that you’ll be able to teach everyone SQL and never be asked to pull data again.