🔎 400,000+ GitHub Wikis, now indexable by your favorite search engine.
GitHub Wiki Search Engine Enablement (GHWSEE) allows non-indexed GitHub Wikis to be indexed by search engines. The robots.txt and HTTP headers on this site do not prevent indexing of wiki content, so GitHub Wiki content proxied from GitHub and served from this service can be crawled and indexed by major search engines.
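As a rough illustration of how a robots.txt rule decides whether a crawler may fetch a page, here is a minimal sketch using Python's standard `urllib.robotparser`. Both rule sets and the `example.org` URLs are made up for the example; they are not the actual robots.txt of GitHub or of this site.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical rule set that hides everything under /wiki/ from all crawlers.
BLOCKING = """\
User-agent: *
Disallow: /wiki/
"""

# Hypothetical rule set that hides nothing (an empty Disallow allows all).
PERMISSIVE = """\
User-agent: *
Disallow:
"""

blocked = build_parser(BLOCKING)
open_site = build_parser(PERMISSIVE)

# The same wiki-style path is refused under one file and allowed under the other.
print(blocked.can_fetch("*", "https://example.org/wiki/Home"))
print(open_site.can_fetch("*", "https://example.org/wiki/Home"))
```

Note that `urllib.robotparser` does simple prefix matching, so wildcard patterns used by some real-world robots.txt files are not interpreted the way major search engines interpret them.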
The situation so far:
GHWSEE proxies the rendered contents of the pages and has a link with a large button at the top to visit the page on GitHub for everyday usage and a link at the bottom explaining what GHWSEE is. The content of this service is not meant to be read directly by users and is intended for crawlers. GHWSEE is designed to behave like those StackOverflow content mirroring sites that rank high with "stolen" content from StackOverflow but with far less ad-farming. Unlike its inspiration, GHWSEE has no ads and tries to make it obvious, quick, and easy for users to get to the content they want directly on GitHub.com short of an automatic redirect that may make search engines not crawl or index the content.
Another purpose of GHWSEE is to bring awareness to the issue of GitHub Wikis purposefully not being indexed since 2012. Many maintainers are caught off-guard: they have produced massive libraries of information without knowing that these are generally invisible to the internet. Likewise, many people searching for information are not aware that it is invisible to them if it lives in a GitHub Wiki. Up until April 2021 (9 years since 2012), GitHub did not document this limitation; the only way to know was to have looked at robots.txt. GitHub's UI itself still shows no warning about this limitation. By showing up in real-world search results, GHWSEE hopes to raise awareness of this issue among affected communities and projects.
If you are a GitHub Wiki maintainer, you may want to consider GitHub Pages backed by a public repository for your community's content as a more crawlable, stable, blessed, and probably higher-ranking alternative. This alternative is also mentioned in GitHub Docs about GitHub Wikis. Consider adding an "Edit on GitHub" link to the content on GitHub Pages to get an experience closer to wikis, make editing easy, and encourage contributions. This is the official solution, but it can be burdensome, and you are welcome to continue using GHWSEE.
If you are interested in providing feedback to GitHub or seeing what GitHub staff has said about their blocking of Wiki content from appearing in search engines, participate in the discussion here: https://github.com/github/feedback/discussions/4992. Let us hope that GitHub can find a solution to unblock GitHub Wiki content in harmony with their SEO concerns, and GHWSEE can be decommissioned. Already, Wikis that are known to be directly indexable are skipped by GHWSEE as part of the partial decommissioning.
For anti-abuse reasons, all links rendered in the service that go outside GitHub are tagged with rel="nofollow ugc" so as not to affect search engine rankings or encourage mass vandalism of GitHub Wikis. This is in addition to the rel="nofollow" already present in the original content.
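The link tagging described above could be sketched roughly as follows. This is not GHWSEE's actual code: the function names `is_outbound` and `merged_rel` and the `GITHUB_HOSTS` set are hypothetical, and treating only `github.com` and `www.github.com` as "inside GitHub" is an assumption made for the example.

```python
from urllib.parse import urlparse

# Assumption: only these hosts count as "inside GitHub" for this sketch.
GITHUB_HOSTS = {"github.com", "www.github.com"}


def is_outbound(href: str) -> bool:
    """Return True if the link points at a host other than GitHub."""
    host = urlparse(href).hostname
    # Relative links have no hostname, so they stay within the wiki.
    return host is not None and host.lower() not in GITHUB_HOSTS


def merged_rel(href: str, existing_rel: str = "") -> str:
    """Add "nofollow" and "ugc" to an outbound link's rel tokens, keeping any already set."""
    tokens = existing_rel.split()
    if is_outbound(href):
        for token in ("nofollow", "ugc"):
            if token not in tokens:
                tokens.append(token)
    return " ".join(tokens)


# An outbound link that already carries rel="nofollow" gains only "ugc".
print(merged_rel("https://example.com/page", "nofollow"))
```

Keeping the existing tokens and appending only the missing ones avoids duplicating the `nofollow` that the original content already carries.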
These are just examples and testing wikis.
So far, about $300 has been spent on experiments, queries, and so on with this service during its initial ramp-up. The good news is that the current cost of hosting the service is very low. In lieu of any compensation, please consider making a recurring or one-time donation to the Internet Archive and/or giving your time to Archive Team projects.
This service will be completely decommissioned, and reduced to redirecting old links, once the block is lifted or GitHub produces some other solution to index all useful GitHub Wikis in harmony with their SEO concerns.
People shouldn't have linked to this site but maybe the warnings weren't big and red enough. Flashing yellow was very unpopular. Regardless, let us not contribute to link rot.
As of January 2022, GitHub does appear to be letting some Wikis be indexed according to some non-public criteria. This small sample of Wikis allows native indexing by not sending an x-robots-tag: none HTTP header. A Cloudflare Worker sits in front of the proxied URLs to check GitHub and automatically redirect if the backing page is detected to allow indexing, in case a wiki is not added to the decommission list in time. By automatically redirecting, it is hoped that GHWSEE's entries for that GitHub wiki fall off the search engine index and that GHWSEE's ranking may be conferred directly onto GitHub's pages.
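The check-and-redirect decision described above can be sketched as below. This is an illustration in Python, not the actual Cloudflare Worker. The function names, the `owner/repo/wiki/Page` path shape, and the choice to treat `noindex` as blocking alongside `none` are all assumptions made for the example.

```python
from typing import Mapping, Optional

# Assumption: either directive in x-robots-tag means "do not index".
BLOCKING_DIRECTIVES = {"none", "noindex"}


def allows_indexing(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers carry no blocking x-robots-tag."""
    for name, value in headers.items():
        if name.lower() != "x-robots-tag":
            continue
        # Simplification: ignores per-crawler prefixes such as "googlebot: noindex".
        directives = {part.strip().lower() for part in value.split(",")}
        if directives & BLOCKING_DIRECTIVES:
            return False
    return True


def redirect_target(wiki_path: str, github_headers: Mapping[str, str]) -> Optional[str]:
    """Return the GitHub URL to redirect to, or None to keep serving the proxy.

    wiki_path is a hypothetical "owner/repo/wiki/Page" style path.
    """
    if allows_indexing(github_headers):
        return "https://github.com/" + wiki_path.lstrip("/")
    return None


# Headers as GitHub might return them for a wiki page that is still blocked.
print(redirect_target("owner/repo/wiki/Home", {"x-robots-tag": "none"}))
```

In a real deployment the headers would come from a request to the backing GitHub page; the sketch takes them as an argument so the decision logic can be exercised without network access.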
Some projects have chosen to mirror their own wiki's content onto their own site as well. These cannot be automatically detected, but may be manually added to the decommission list.
The small sample of links below helps test the partial decommissioning functionality.
This project is not affiliated with GitHub in any way, if it wasn't obvious from the substandard CSS and code. This project is only made and run by nelsonjchen, GitHub User #5653.