I just got back from #TechSEOBoost, where I spent a lot of time in amazing conversations about data analysis, math and, very importantly, sharing data. So, after thinking about it, I decided I’m going to open source the data we used for the 2017 Local SEO Ranking Factors.
This data is pretty interesting, and honestly would be pretty expensive to put together yourself. It’s ~150k rows of data (each row representing a different business listing on Google My Business). That data was scraped by Places Scout and joined with a bunch of their own data, as well as link API data from Ahrefs and Majestic. All in all there are ~150 data points per listing/business, and you can find out more about them here in this data dictionary.
For those curious what we did with the data, we employed two statistical methods. First, Kendall’s Tau-b was used to analyze ordinal variables (continuous or integer independent variables), while the Kruskal-Wallis test was used for categorical independent variables. These tests were run before we started doing more complicated linear regressions and other modeling, so I’m kinda excited to see what people will do with the data.
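If you want a feel for what those two tests compute, here is a rough sketch in plain Python. It is not the code we ran for the study. The function names and the toy numbers (rank positions, review counts, category labels) are made up for illustration and do not come from the dataset. It only computes the two statistics: Tau-b from concordant and discordant pairs with a tie correction, and the Kruskal-Wallis H from pooled ranks.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's Tau-b: rank correlation with a correction for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: counted nowhere
        elif dx == 0:
            ties_x += 1  # tied on x only
        elif dy == 0:
            ties_y += 1  # tied on y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_x)
                 * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (tie-corrected) for a list of groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    return h / correction if correction else float("nan")


# Toy data, invented for illustration only.
rank_position = [1, 2, 3, 4, 5, 6]       # 1 = top of the local results
review_count = [120, 95, 95, 40, 38, 5]  # hypothetical numeric factor
print("tau-b:", kendall_tau_b(rank_position, review_count))

# Rank positions grouped by a hypothetical categorical factor.
positions_by_category = [[1, 2, 4], [3, 5, 6]]
print("H:", kruskal_wallis_h(positions_by_category))
```

On the real ~150k rows you would want something faster than this pairwise loop, and you would want p-values too. SciPy's `scipy.stats.kendalltau` and `scipy.stats.kruskal` cover the same two tests and give you both.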
So, why am I doing this? Well, first, as someone who is a constant critic of the way other people conduct research, I felt it was time to put up instead of shutting up (as people who know me know, I’m not very good at that). Also, I just spent a lot of time getting help from amazing members of the community. I have also had the benefit of having people like Andrew, who constantly gave me free help for basically no reason, long before we started working together. There are some amazing parts of this community to counteract the ones that aren’t so great, and I’m gonna start contributing more there.
The data is at the hyperlink below, in a Google Cloud bucket:
Also, I have the forthcoming analysis for the 2019 Local SEO Ranking Factors, which will be going live later this week! So much data!