Partners can crawl, archive, and often replay content from the social media platform Pinterest. Pinterest seeds frequently do not require any crawl scope modification. However, some will benefit from putting the rules below in place.
General scoping rules for Pinterest
- The most effective way to archive this platform is to use the Standard page seed type.
- Ignore robots.txt at the seed level or on the host www.pinterest.com.
- As some Pinterest sites can be data heavy, we recommend starting with a test crawl of at least 3 days.
- Crawl using Brozzler.
Optional scoping rules for Pinterest
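- To limit the scope of your crawls to only archive content in English, add a rule to block URLs on pinterest.com that match the following regular expression: ^https?://[a-z]{2}.pinterest.com/.*$ This rule blocks Pinterest's two-letter localized subdomains (for example, fr.pinterest.com) while leaving www.pinterest.com in scope.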
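Before adding the rule, it can help to check which URLs the expression would catch. The sketch below is only an illustration: it uses Python's re module, which may differ in small ways from the crawler's own regular expression matching, and the account names in the sample URLs are made up. The pattern is copied as written in the rule above.

```python
import re

# Pattern copied verbatim from the optional scoping rule.
# The dots are unescaped, so each one matches any single character.
LOCALIZED_PINTEREST = re.compile(r"^https?://[a-z]{2}.pinterest.com/.*$")


def is_blocked(url: str) -> bool:
    """Return True if the block rule would match this URL."""
    return LOCALIZED_PINTEREST.match(url) is not None


# Hypothetical sample URLs; the account names are placeholders.
samples = [
    "https://www.pinterest.com/exampleaccount/",
    "https://fr.pinterest.com/exampleaccount/",
    "http://de.pinterest.com/pin/123/",
    "https://pinterest.com/exampleaccount/",
]

for url in samples:
    status = "blocked" if is_blocked(url) else "in scope"
    print(f"{status:9} {url}")
```

In this sketch only the fr. and de. URLs are reported as blocked. The www host has three letters before the domain name, and the bare pinterest.com host has no subdomain, so neither matches the two-letter pattern.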
What to expect from your archived Pinterest seeds
We regularly see playback issues for Pinterest content, especially for web pages of individual pins. Please review the results of Pinterest crawls for completeness and accuracy. Improving replay will require a long-term development fix. If you experience these issues, please submit a support ticket so we can gauge partner demand for a solution.