Found’s Data Toolshed

6th August 2019 – 17 minute read  Data & Analytics
James Wolman – Data Scientist

Our Data Science & Analytics team strive to innovate new tools and techniques to enhance the performance of Found’s teams. If our delivery teams are athletes, then the data team’s output is 100% pure muscle-building whey isolate. With just a touch of sweetener for taste. (We like stracciatella).

Found’s in-house tools are always built with integration front of mind. No tool is exclusive to one team or built for a single purpose. Our tools bring our teams together.

I’m conscious that might sound a bit airy-fairy, so let’s cut the intro short and dive straight into the details of some of the tools we’ve developed in-house.

Don’t have time to read the whole post? I’ve summarized it into a table for you lazybones:

1. Keyword Clustering Tool – Orders massive keyword lists into hierarchically structured keyword groups of similar themes and intentions. Used by: SEO, Paid Media
2. Outreach Prediction Engine (IN:Reach) – Uses machine learning to estimate the likelihood of prospects converting into a backlink. Used by: SEO, Content
3. Anomaly Detective – Alerts via Slack when our clients’ websites begin to show unusual performance. Used by: SEO, Paid Media, Data, Management
4. Automated Forecasting – Rapidly generates performance forecasts for clients. Used by: SEO, Paid Media, Data, Business Development
5. Influencer Discovery – Uncovers tens of thousands of potential influencers across Facebook, Twitter, YouTube and Instagram. Used by: Influencer
6. Traffic Estimator – Downloads keyword search volume data at scale, beyond the limits of Google’s Keyword Planner, and with greater accuracy. Used by: SEO, Paid Media
7. Ad Copy Performance Analyzer – Easily assesses the performance of CTAs and common phrases across Google Ads ad copy. Used by: Paid Media, SEO

1. Keyword Clustering Tool

We built our hierarchical clustering engine in response to spotting a gap in the market. There are plenty of tools that group keywords into categories or clusters, but none reduce the complexity of creating a reliable website taxonomy and mapping search volume demand. To solve this problem, we created a tool that can map huge volumes of keywords to a website’s structure at scale while defining the taxonomy itself – by means of hierarchical clusters.

What it does: This tool uncovers the hidden structure from an unorganised list of keywords. Put more explicitly, it groups extremely large lists of keywords into clusters of similar theme and intent.

Why that is clever: Creating a reliable website taxonomy from huge lists of keywords is a long and arduous task. Why? Most websites do not have a flat structure. Amazon, for example, has a homepage, departments within the homepage, categories within each department, subcategories in each category, and so on. In other words, it’s hierarchical. The tool is clever because it doesn’t just organise keywords into a list of n groups, it creates a 3-level structure, arranging keyword groups into child, parent and grandparent groups.
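Found’s engine isn’t open source, so as a purely illustrative sketch of the idea, here’s a stdlib-only version: greedily cluster keywords on token overlap to get tight child groups, then cluster the child representatives more loosely to get the parent tier. The thresholds and the similarity measure are assumptions; the real tool likely uses a proper hierarchical linkage method.

```python
def jaccard(a: set, b: set) -> float:
    """Similarity between two keyword token sets."""
    return len(a & b) / len(a | b)

def group(items, key_tokens, threshold):
    """Greedy single-pass clustering: join an item to the first cluster
    whose representative is similar enough, else start a new cluster."""
    clusters = []  # list of (representative_tokens, [members])
    for item in items:
        toks = key_tokens(item)
        for rep, members in clusters:
            if jaccard(toks, rep) >= threshold:
                members.append(item)
                break
        else:
            clusters.append((toks, [item]))
    return clusters

keywords = [
    "red summer dress", "summer dress sale", "blue summer dress",
    "leather jacket", "mens leather jacket",
]
tokens = lambda kw: set(kw.split())

# Child level: tight clusters of near-identical intent.
children = group(keywords, tokens, threshold=0.4)
# Parent level: cluster child representatives more loosely; a third,
# looser pass over parent representatives would give grandparents.
parents = group(children, lambda child: child[0], threshold=0.15)
```

Running the two passes with progressively looser thresholds is what produces the tree shape described above: each parent node collects the child clusters that share a broader theme.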

Who uses it?: Our SEO team use it to accelerate keyword targeting strategies, and our Paid Media team use it to build out ad campaign structures.

Uses & Benefits: The tool takes a very time-consuming task and smashes through it in seconds. It also gives the user additional information to cross-reference, assisting decision making around site architecture, keyword grouping and keyword targeting.

User interface to the keyword clustering tool.

We have also integrated IBM Watson into the tool. For each cluster of keywords we’re able to dive into the sentiment and emotional flavour of each cluster, as well as the important keywords and concepts that define each grouping. This helps our teams more effectively map keyword groups to search intent.

Visual diagram of the hierarchical grouping algorithm. Shows how keywords are arranged into a tree structure, mirroring how websites are typically structured. 

2. Outreach Prediction Engine

Known internally as “IN:Reach”.

What it does: The tool takes a prospecting list and tells you how likely each prospect is to return a response, and ultimately turn into a backlink. We’ve adopted the 3Rs when building this one: we reused stockpiles of previous outreach data collected by our teams, reduced the time taken to acquire backlinks and added the ability to recycle previously unsuccessful prospects for more relevant clients.

Why that is clever: We trained a machine learning model on previous outreach efforts from the team, to be able to predict whether future links will convert for a client. It is finely tuned to our own team and internal data.
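The model, its features and its training data are all internal to Found, so the following is only a toy sketch of the general approach: a logistic regression trained on labelled past outreach, scoring new prospects by predicted conversion probability. The feature names (domain authority, topical relevance, prior replies) are hypothetical examples, not the real feature set.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, epochs=2000, lr=0.1):
    """Plain-Python logistic regression fitted by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Toy labelled history: [domain_authority/100, topical_relevance, replied_before]
X = [[0.9, 0.8, 1], [0.8, 0.9, 1], [0.7, 0.6, 0],
     [0.2, 0.3, 0], [0.3, 0.1, 0], [0.1, 0.2, 0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = prospect converted into a backlink

w, b = train(X, y)
score = lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
# Rank a fresh prospecting list by score(...) to prioritise outreach.
```

In practice you’d swap the hand-rolled loop for a library implementation and validate on held-out outreach campaigns before trusting the rankings.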

Who uses it?: Outreach and SEO to prioritise backlink acquisition strategies.

Uses & Benefits: Helps to prioritise outreach prospecting target lists, recycling previously unsuccessful links and, best of all, saving loads of time for the team.

User interface example result from the IN:Reach UI.
Page two of IN:Reach, where users are able to recycle links for other clients by quickly identifying the prospects most similar to each of our clients.
Found’s anomaly detective pictured under an electron microscope.

3. Anomaly Detective

Sometimes spending many years and even more PhDs on building the perfect, most robust AI anomaly detection system pays off. Sometimes cleverly applying an open-source technology in one month is even better.

We process a lot of data for our clients, so it’s important that when something goes wrong (or incredibly right) that we know about it immediately.

So we developed a bot I like to call The Anomaly Detective (snapped above) that sweeps over our important data points (Google Analytics KPIs, Google Lighthouse reports etc.) and flags when something outside of “normal” happens.

We deployed it as a workplace Slack app to be able to seamlessly communicate with all our teams.

What it does: Alerts our teams to unusual (anomalous) activity across all of our important data points by sending immediate Slack messages detailing the potential problem.

Why that is clever: Defining “normal” is not straightforward. In this case, we leaned on Netflix’s own open source outlier detection function, called Robust Anomaly Detection to help us out. It helps us to very reliably tell when one of our KPIs doesn’t look quite right, without us having to manually define rules of normalcy. In other words, it’s plug and play and no time is wasted defining normal.
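Netflix’s RAD library itself is an R package built on robust PCA, so reproducing it here isn’t practical. As a much simpler stand-in that illustrates the same principle – let the data define “normal” instead of hand-written rules – here is a rolling median / median-absolute-deviation detector. The window size and threshold are illustrative defaults, not values from Found’s system.

```python
import statistics

def mad_anomalies(series, window=7, z_thresh=3.5):
    """Flag points deviating from the rolling median by more than
    z_thresh robust z-scores (median absolute deviation based)."""
    flagged = []
    for i in range(window, len(series)):
        hist = series[i - window:i]
        med = statistics.median(hist)
        mad = statistics.median(abs(x - med) for x in hist) or 1e-9
        robust_z = 0.6745 * (series[i] - med) / mad  # 0.6745 scales MAD to ~sigma
        if abs(robust_z) > z_thresh:
            flagged.append(i)
    return flagged

# Daily sessions for one client; the 240 is the kind of spike worth a Slack ping.
sessions = [100, 98, 103, 101, 99, 102, 100, 240, 101, 97]
spikes = mad_anomalies(sessions)  # flags index 7, the 240 spike
```

Medians and MAD are used rather than mean and standard deviation so that the spike itself doesn’t inflate the baseline it is judged against – the same robustness motivation behind RAD.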

Who uses it?: Everyone, including management.

Uses & Benefits: Mainly used as an early warning system to highlight unexpected performance lifts or dips, it can also alert us to more unusual findings: unexpected sources of traffic, significant slowing in landing page load times, or anything else out of the ordinary.

An example screenshot of our Slack alert system.

We’re well aware that being bombarded by alerts every day for all your clients can desensitize you to them, to the point where they start to become invisible. That’s why we’re also working on a polling system, so we can use machine learning to further classify the “usefulness” of an anomaly alert and only issue alerts when they’re important in context.

4. Automated Forecasting

We leveraged Facebook’s open-source forecasting library, Prophet, for this one. User input is minimal: all it requires is a dataset. Delivery teams can then tweak the forecast in line with planned campaigns and important dates.
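Prophet does the heavy lifting in the real tool; to show the core idea it automates – decomposing a series into trend plus seasonality, then extrapolating – here is a stripped-down stdlib sketch (least-squares linear trend plus additive weekly offsets; the data is synthetic).

```python
def fit_forecaster(y, period=7):
    """Fit a least-squares linear trend plus additive seasonal offsets,
    a bare-bones version of the decomposition Prophet automates."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    slope = (sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
             / sum((x - x_mean) ** 2 for x in range(n)))
    intercept = y_mean - slope * x_mean
    detrended = [v - (intercept + slope * x) for x, v in enumerate(y)]
    # Average residual for each weekday slot gives the seasonal shape.
    seasonal = [sum(detrended[k::period]) / len(detrended[k::period])
                for k in range(period)]
    return lambda t: intercept + slope * t + seasonal[t % period]

# Synthetic daily sessions: linear growth plus a weekly cycle.
history = [10 + 0.5 * t + [0, 1, 2, 3, 2, 1, 0][t % 7] for t in range(28)]
forecast = fit_forecaster(history)
next_week = [forecast(t) for t in range(28, 35)]
```

Prophet adds the parts this sketch ignores – changepoints in the trend, holiday effects, uncertainty intervals – which is exactly why we reached for it rather than rolling our own.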

What it does: Quickly outputs forecasts of any input time series data set, allowing the rapid development of robust performance predictions across a large number of clients.

Why that is clever: Forecasting web performance is not a quick task when you do it by hand; with the advent of AI, “by hand” might as well mean “outdated”.

Who uses it?: Paid Media, SEO, data and business development teams.

Uses & Benefits: What took days for delivery teams to collate and iron out now takes seconds. Paid Media and SEO teams can focus more time on crafting digital strategies to hit our clients’ targets.

Our (unpolished) UI for the forecasting tool.

5. Influencer Discovery

This is our data-driven approach to influencer discovery and acquisition. We know good, authentic influencers are out there. I would also bet money that the social media platforms know in great detail who they are. Our influencer discovery tool is our attempt to peer through the veil and find them for ourselves, at a scale not possible by a purely human approach. This is augmented influencer marketing.

What it does: Performs a deep scan of social media platforms, including Facebook, Twitter, YouTube and Instagram, to surface profile information on potentially useful influencers based on input criteria.

Why that is clever: The tool empowers our Influencer Marketing gurus to analyze tens of thousands of social media profiles in terms of how well they engage their audience, the size of the audience and frequency of posting.

Who uses it?: Our Influencer Marketing team.

We often visualise the results of the analysis in a Data Studio dashboard that allows our Influencer Marketing team to refine the results to whatever suits their campaign.

6. Traffic Estimator

Here’s the problem statement: downloading search volume data from Google’s Keyword Planner Tool is a very manual, labour-intensive process. Quite simply, it is just not scalable when analysing tens of thousands of keywords or more, and our SEO and Paid Media teams needed a way to retrieve accurate search volume data for very large volumes of keywords without taking time away from analysis.

The Keyword Planner problem is actually compounded by another factor. The Keyword Planner Tool groups search volume for similar keywords together, meaning search volume per keyword isn’t accurate – it’s duplicated. For example, for “dress” and “dresses”, Google might say they each have a monthly search volume of 2,000 searches, leading us to believe that the total search volume for that portfolio is 4,000. FAKE NEWS! SAD! The problem is, “dress” might have a search volume of 100, and “dresses” might have 1,900 searches.

To address this issue, we built an interface to Google’s Traffic Estimator Service API, which allows users to process a virtually unlimited number of keywords while they get on with other work. The added bonus is that we designed an algorithm that decouples Google’s search volumes from similar keywords to give a much more accurate depiction of search volume per keyword.
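Found’s decoupling algorithm itself is proprietary, but the arithmetic of the “dress”/“dresses” example can be illustrated with a hypothetical approach: take the single grouped volume and redistribute it across the variants in proportion to an independent signal you trust (here, each variant’s observed impressions – an assumed stand-in, not the real tool’s input).

```python
def decouple(group_volume, impressions):
    """Split one grouped search volume across keyword variants in
    proportion to an independent per-variant signal.
    `impressions` is a hypothetical weighting, e.g. from campaign data."""
    total = sum(impressions.values())
    return {kw: group_volume * imp / total for kw, imp in impressions.items()}

# Keyword Planner reports 2,000 for both "dress" and "dresses" -
# the same grouped figure twice, not 4,000 of real demand.
volumes = decouple(2000, {"dress": 50, "dresses": 950})
# -> {"dress": 100.0, "dresses": 1900.0}, matching the split in the example
```

Whatever the weighting signal, the key property is that the decoupled volumes sum back to the grouped figure, so portfolio totals stop double-counting.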

This tool was used most recently for a massive project where SEO had to conduct an in-depth analysis of the online fashion landscape across a portfolio of more than 400,000 keywords. Since all keywords could be loaded into the tool at once, a lot more time could be devoted to data wrangling and analysis.

What it does: Retrieves Google search volume data for massive lists of keywords.

Why that is clever: The interface to Google’s Traffic Estimator Service API lets users process a virtually unlimited number of keywords while they get on with other work, and the decoupling algorithm gives a far more accurate per-keyword picture of search volume than the grouped figures Google reports.

Who uses it?: Our Paid Media and SEO teams

Uses & Benefits: Our tool is more productive and more accurate than Google’s Keyword Planner UI. If you’ve ever had to download search volumes for thousands (or more) keywords at a time using Google’s Keyword Planner tool then you already understand how much time automation could save. What used to be heavy manual labour for our SEO strategists is now a button click with a cuppa.

Traffic Estimator UI

7. Ad Copy Performance Analyzer

The Ad Copy Performance Analyzer facilitates “Total Search” – the integration between paid search and SEO – by allowing for easy knowledge sharing.

It analyses the best combinations of words from PPC ads (titles & descriptions) to inform SEO title tag creation, with support from data-driven results.

It helps to optimise ad copy by informing analysts which phrases tend to drive the most clicks, conversions and profit.

It’s an extra layer of analytics that uses Natural Language Processing to uncover what is really driving performance in ad copy.

What it does: Calculates the performance of 1- to 5-word phrases present in ad copy.

Why that is clever: Borrowing some basic natural language processing techniques empowers us to understand the language that resonates most strongly with shoppers and researchers. The only required input is the ID of the Google Ads account. Users can filter ads by a number of different criteria to hone the usefulness of the output. 
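The core NLP step here is straightforward enough to sketch: extract every 1- to 5-word phrase (n-gram) from each ad, then aggregate clicks and impressions over the ads each phrase appears in. The ad data below is invented for illustration; the real tool pulls it from the Google Ads account by ID.

```python
from collections import defaultdict

def phrase_performance(ads, max_n=5):
    """Aggregate clicks and impressions for every 1- to max_n-word
    phrase in the ad copy, then compute each phrase's CTR."""
    stats = defaultdict(lambda: [0, 0])  # phrase -> [clicks, impressions]
    for text, clicks, impressions in ads:
        words = text.lower().split()
        phrases = set()  # count each ad once per phrase, however often it repeats
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                phrases.add(" ".join(words[i:i + n]))
        for phrase in phrases:
            stats[phrase][0] += clicks
            stats[phrase][1] += impressions
    return {p: c / i for p, (c, i) in stats.items()}

ads = [  # (ad text, clicks, impressions) - toy numbers
    ("Free Delivery On All Orders", 120, 1000),
    ("Shop The Sale - Free Delivery", 90, 1000),
    ("Shop New Arrivals Today", 30, 1000),
]
ctr = phrase_performance(ads)
# "free delivery" appears in the two better-performing ads,
# so it surfaces near the top of the CTR ranking.
```

Ranking the output by CTR (or by conversions and profit, aggregated the same way) is what surfaces the calls to action worth reusing in both new ads and SEO title tags.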

Who uses it: Paid Media, SEO and content

Uses & benefits: 

  • Ad optimization
  • Ad copy creation
  • Page targeting for SEO
  • Copywriting
  • A/B testing calls to action

Caption: Ad copy performance analyzer tool UI. The sidebar allows our teams to simply choose an account and apply any ad filters necessary for the analysis.

And that just about covers everything! If you’re looking for assistance or would like to discuss any of this further, get in touch – we’d love to hear from you!