One of the biggest challenges for SEOs is Google itself. The black box of Google, as Dan described it to me early in my time at LSG, limits the metrics it displays in Google Search Console to representative calculations. Google causes even more SEO indigestion when you try to claw a site’s historical performance out of these tools and run into the now-defunct 90-day limit on GSC query data. It gets even worse when you’re trying to compare dozens of sites you manage that target the same niche. If you know where this is going (a NodeJS script that connects to the GSC API and downloads all the analytics data for each site managed by a single Google account), feel free to skip down to the bottom. If you’re unsure why this might be useful or what kind of day-to-day work it might replace, read on!
The old-fashioned way to get data from all your sites is to go through each property you can view, click through to the search analytics tab, and then export the data as chopped and screwed as you’d like. This is a valuable exercise that all SEOs probably need to do occasionally (particularly at the beginning of their careers). Any SEO worth their salt could burn through those click-throughs in a matter of seconds, and is probably hindered more by ISP and Google latency and the time it takes the browser to render the page than by anything else. But you might as well light your hands on fire as a sacrifice to the carpal tunnel gods at that point.
At the end of the day, there are no really good options for doing this that don’t require an unlimited budget or a little technical know-how. Even if you’re the best damn SEO in the land, combing through a thousand sites individually creates an absurd opportunity cost against literally everything else you’re doing as an SEO. Even if you’ve got the best damn keyword tracker in the land, if it’s not set up flawlessly, you’re probably missing some important keyword cannibalization issues that you didn’t even think about. Now, I’m not suggesting this script is a silver bullet for inter-site keyword cannibalization that will magick away all your micro-keyword optimizations, but it is a good start at significantly reducing the opportunity cost of discovering keyword competition and cannibalization between your sites that you aren’t aware of.
An example of this unknown-unknown keyword competition might be two local car dealerships selling Kias in two different markets from two different websites. Let’s say one site is for a dealership in the DFW area in Texas, which has high sales volume, and the other site you manage is for a dealership in a place like Lamesa, Texas, a small town with comparatively low sales volume. If the small-market, low-sales-volume site is kicking ass, and in particular kicking the ass of your large-market, high-sales-volume site, then knowing that can help you move your high-volume site back to the top of organic search.
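To show what spotting that overlap might look like once the data is downloaded, here is a minimal sketch. The row shape (`query`, `position`), the function name, and the sample numbers are all made up for illustration; they are not output from the script or from real sites.

```javascript
// Hypothetical helper: list queries where both sites rank and the
// small-market site has the better (lower) average position.
function findOutrankedQueries(bigSiteRows, smallSiteRows) {
  // Index the big site's positions by query for quick lookup.
  const bigByQuery = {};
  bigSiteRows.forEach(function (row) {
    bigByQuery[row.query] = row.position;
  });

  return smallSiteRows
    .filter(function (row) {
      const shared = Object.prototype.hasOwnProperty.call(bigByQuery, row.query);
      return shared && row.position < bigByQuery[row.query];
    })
    .map(function (row) {
      return {
        query: row.query,
        smallSite: row.position,
        bigSite: bigByQuery[row.query]
      };
    });
}

// Illustrative sample rows only, not real data.
const dfwRows = [
  { query: 'kia dealership texas', position: 8.2 },
  { query: 'kia dealer dallas', position: 1.4 }
];
const lamesaRows = [
  { query: 'kia dealership texas', position: 3.1 },
  { query: 'kia dealer lamesa', position: 1.1 }
];

const outranked = findOutrankedQueries(dfwRows, lamesaRows);
console.log(outranked);
```

With these sample rows, only the query both sites share comes back, since that is the one place the small-market site holds the better position.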
Now, for the nitty-gritty details about the API. The GSC API limits you to 200 queries per minute, and each month you are given a 100,000,000-query limit. At 200 queries per minute you top out at roughly 8.6 million queries in a 30-day month (200 × 60 × 24 × 30 = 8,640,000), which means you could fire off the max queries per minute all month and still have over 90,000,000 queries remaining on your quota. While this script is in no way optimized to fire off that many queries per month, it’s a good start on burning through your limits.
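To make the quota arithmetic concrete, here is a minimal sketch that works out how many queries you could send at the per-minute cap. It takes the 200-per-minute and 100,000,000-per-month figures quoted above at face value and assumes a 30-day month; the function name is my own and not part of any API.

```javascript
// Quota figures as quoted above; treat them as assumptions, since limits can change.
const QUERIES_PER_MINUTE = 200;
const MONTHLY_QUOTA = 100000000;

// Hypothetical helper: the most queries you can send in `days` days
// if you hit the per-minute cap every single minute.
function maxQueriesAtFullRate(days) {
  const minutes = days * 24 * 60;
  return minutes * QUERIES_PER_MINUTE;
}

const used = maxQueriesAtFullRate(30);
const remaining = MONTHLY_QUOTA - used;

console.log('Max queries in 30 days:', used);   // 8640000
console.log('Quota left over:', remaining);     // 91360000
```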
This script will access all sites authorized in your GSC account and export all queries by page over the past 90 days, and that’s just the default behavior. It also supports downloading queries for custom date ranges, breaking queries down by most of the dimensions available within GSC (e.g. mobile, desktop, country, etc.), and choosing the search type (web is the default, but image search and video search are also available).
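As a rough illustration of those options, the sketch below builds the kind of request body the Search Analytics query endpoint accepts. It is not the code from the script: the helper names and defaults are mine, the `rowLimit` value is arbitrary, and the field names (`startDate`, `endDate`, `dimensions`, `searchType`, `rowLimit`) reflect my understanding of the API, so check them against the current documentation.

```javascript
// Format a Date as YYYY-MM-DD in UTC.
function toIsoDate(date) {
  return date.toISOString().slice(0, 10);
}

// Hypothetical helper: build a Search Analytics request body.
// Defaults mirror the behavior described above:
// queries by page, web search, a window starting 90 days before the end date.
function buildQueryRequest(options) {
  const opts = options || {};
  const end = opts.endDate || new Date();
  const days = opts.days || 90;
  const start = new Date(end.getTime() - days * 24 * 60 * 60 * 1000);
  return {
    startDate: toIsoDate(start),
    endDate: toIsoDate(end),
    dimensions: opts.dimensions || ['query', 'page'],
    searchType: opts.searchType || 'web',
    rowLimit: opts.rowLimit || 5000
  };
}

// Default request, ending on a fixed date so the output is reproducible.
const defaults = buildQueryRequest({ endDate: new Date(Date.UTC(2017, 8, 30)) });
console.log(defaults.startDate, defaults.endDate); // 2017-07-02 2017-09-30

// Image-search queries broken down by device instead of by page.
const images = buildQueryRequest({
  endDate: new Date(Date.UTC(2017, 8, 30)),
  dimensions: ['query', 'device'],
  searchType: 'image'
});
console.log(JSON.stringify(images));
```

Keeping the request a plain object means the same builder can feed whichever client you use to call the API, and a custom window is just a different `endDate` and `days`.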
Just download the zip file and extract it. Inside you’ll find a detailed README written in Markdown that should be easy to follow. The script was written using Node 8.4.0, so there are no guarantees it will work with earlier versions of NodeJS. That being said, it does not leverage any recent additions such as async/await, so it will likely work on most other Node versions down to 0.10.x.
If anyone has suggestions or comments for improving or updating the script, please let me know, as this will eventually be moved to GitHub and published as an npm package. My email address is firstname.lastname@example.org.