Google Sites: advice for getting a good capture?
I'm running test crawls on a website built with Google Sites (e.g., https://sites.google.com/view/...).
Has anyone managed to get a good capture of a site like this, and if so, how did you do it? My captures show oversized headers and icons, and all the column formatting seems to disappear.
Thanks in advance, Dana
Official comment
Hi Dana,
Thanks for sharing your question. It sounds like some of the stylesheets and/or page templates are out of scope. To see every document the crawler found but judged to be out of scope, check the Out of Scope column for the host sites.google.com in the Hosts tab of your crawl report.
In the past, we've found it necessary to add an Expand Scope rule that includes URLs containing "sites.google.com" so that the site's look-and-feel elements are captured. It's important to add this rule at the seed level: at the collection level, the same rule would apply to every seed in every crawl, so all of them would start picking up Google content.
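As a rough illustration of why the seed-level placement matters, here is a simplified model of "URL contains" scope rules. This is a sketch under stated assumptions, not Archive-It's actual scoping logic: it assumes a URL is in scope by default only if it starts with the seed URL, and that an Expand Scope rule is a plain substring match. The seed URLs, the asset URL, and the `Seed`/`in_scope` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Seed:
    url: str
    # Hypothetical seed-level Expand Scope rules: substrings a URL may contain.
    expand_contains: list[str] = field(default_factory=list)


def in_scope(url: str, seed: Seed, collection_expand_contains: list[str]) -> bool:
    """Simplified model: a URL is in scope if it falls under the seed URL,
    or if it matches a seed-level or collection-level 'contains' rule."""
    if url.startswith(seed.url):
        return True
    rules = seed.expand_contains + collection_expand_contains
    return any(rule in url for rule in rules)


# Hypothetical seeds: one Google Sites seed with a seed-level rule, one unrelated seed.
site_seed = Seed("https://sites.google.com/view/example-site/", ["sites.google.com"])
other_seed = Seed("https://example.org/")

# Hypothetical look-and-feel asset hosted outside the seed's path.
asset = "https://sites.google.com/hypothetical/theme.css"

# Seed-level rule: only the Google Sites seed picks up the asset.
print(in_scope(asset, site_seed, []))
print(in_scope(asset, other_seed, []))

# Collection-level rule: the unrelated seed now picks up Google content too.
print(in_scope(asset, other_seed, ["sites.google.com"]))
```

In this model, the seed-level rule widens scope only for the Google Sites seed, while the same rule at the collection level is added to every seed's checks.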
If you don’t notice any improvement or are having trouble reading the hosts report, please feel welcome to submit a support ticket and we’ll take a closer look for you.
Thanks again,
Mary
This is an old post, but I'm having a similar issue in 2025: I'm trying to crawl a site made with Google Sites, and I can't capture the top menu/nav bar. I've added sites.google.com as an Expand Scope rule and run the crawl with both Standard and Brozzler crawling. Patching the crawl through QA doesn't show any missing documents that would explain why the menu isn't appearing.
Anyone have any suggestions?
The site I'm trying to capture is https://www.spiritofasilomar.org