James Bachini

robots.txt Disallow All | Block Bots

In this article we look at how to block bot traffic using a robots.txt disallow-all rule, then at some of the more advanced uses of the robots.txt file.

  1. How To Disallow All in robots.txt
  2. Custom robots.txt for Specific Bots and Directories
  3. Complete List of Bots – robots.txt

How To Disallow All in robots.txt

If you want to block search engine and crawler bots from visiting your pages, you can do so by uploading a robots.txt file to your site's root directory.

Include the following code in the file:

User-agent: *
Disallow: /

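Note that this will prevent search engine spiders from crawling your site, which will affect page rankings and search listings in Google and other search engines. robots.txt is a request rather than an enforcement mechanism, so only crawlers that choose to honour it will stay away.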
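Before uploading the file it can be worth checking that the rules behave as expected. The following is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the example.com URL are assumptions made for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The two-line "disallow all" file from this section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    is_allowed is a hypothetical helper name, not part of any library.
    """
    parser = RobotFileParser()
    # parse() accepts the file contents as an iterable of lines,
    # so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain.
    for agent in ("Googlebot", "bingbot", "SomeOtherBot"):
        allowed = is_allowed(ROBOTS_TXT, agent, "https://example.com/any/page.html")
        print(agent, "allowed" if allowed else "blocked")
```

With the wildcard rule in place every user agent in the loop should be reported as blocked.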

Custom robots.txt for Specific Bots and Directories

An alternative is to use user agent filtering to block specific bots. An example is below.

User-agent: Googlebot
Disallow: /secret/

The above code in robots.txt would prevent Google from crawling any files in the /secret directory.
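To see how a per-bot rule is scoped, the sketch below feeds the Googlebot example to Python's standard urllib.robotparser and checks a few combinations of user agent and path. The helper name can_crawl and the sample paths are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The Googlebot-only rule from this section.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /secret/
"""


def can_crawl(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Sample paths are made up for this example.
    checks = [
        ("Googlebot", "/secret/plan.html"),
        ("Googlebot", "/index.html"),
        ("bingbot", "/secret/plan.html"),
    ]
    for agent, path in checks:
        verdict = "allowed" if can_crawl(ROBOTS_TXT, agent, path) else "blocked"
        print(f"{agent:10} {path:20} {verdict}")
```

Only the first combination should be blocked: the rule names Googlebot specifically, and it applies to paths that start with /secret/.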

Go through the list at the bottom of this post and remove any bots that you are OK with accessing your site.
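Removing the bots you are happy with can also be scripted. The sketch below assumes you have the bot names as a plain Python list; the function name build_robots_txt, the short SAMPLE_BOTS list and the allowed set are all hypothetical and stand in for the full list and your own choices.

```python
def build_robots_txt(bot_names, allowed=()):
    """Return robots.txt text that blocks every bot not listed in allowed.

    Names are compared case-insensitively. This is an illustrative helper,
    not part of any library.
    """
    allowed_lower = {name.lower() for name in allowed}
    blocked = [name for name in bot_names if name.lower() not in allowed_lower]
    if not blocked:
        # Nothing to block: an empty robots.txt places no restrictions.
        return ""
    # Several User-agent lines can share one group of rules.
    lines = [f"User-agent: {name}" for name in blocked]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


# A short sample standing in for the full list at the bottom of this post.
SAMPLE_BOTS = ["Googlebot", "bingbot", "MJ12bot", "Ahrefs", "SEMrushBot"]

if __name__ == "__main__":
    print(build_robots_txt(SAMPLE_BOTS, allowed=["Googlebot", "bingbot"]))
```

In this example Googlebot and bingbot are left out of the file, so they remain free to crawl, while the remaining three names share a single Disallow: / rule.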

If you are getting a lot of bot traffic from a particular traffic source, optimise your sources instead. These types of bots will inevitably cloak the user agent anyway, but they can be detected by a lack of micro-conversions, low time on page and an absence of mouse actions tracked via JavaScript.
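As a rough illustration of that kind of detection, the sketch below flags sessions that show no micro-conversions, no mouse events and almost no time on page, then reports the flagged share per traffic source. The Session fields, the two-second threshold and the function names are all assumptions for this example; real tracking data and sensible thresholds will differ from site to site.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One visit, as it might be recorded by front-end tracking (hypothetical fields)."""
    source: str
    seconds_on_page: float
    mouse_events: int
    micro_conversions: int


def looks_like_bot(session: Session, min_seconds: float = 2.0) -> bool:
    """Flag a session only if all three signals are missing.

    The 2 second default is an arbitrary assumption, not a recommended value.
    """
    return (
        session.micro_conversions == 0
        and session.mouse_events == 0
        and session.seconds_on_page < min_seconds
    )


def suspicious_share_by_source(sessions):
    """Return {source: fraction of sessions flagged as bot-like}."""
    totals = {}
    flagged = {}
    for s in sessions:
        totals[s.source] = totals.get(s.source, 0) + 1
        if looks_like_bot(s):
            flagged[s.source] = flagged.get(s.source, 0) + 1
    return {src: flagged.get(src, 0) / count for src, count in totals.items()}


if __name__ == "__main__":
    # Made-up sample data for two hypothetical traffic sources.
    sample = [
        Session("network-a", 0.4, 0, 0),
        Session("network-a", 0.9, 0, 0),
        Session("network-b", 35.0, 120, 2),
        Session("network-b", 0.5, 0, 0),
    ]
    print(suspicious_share_by_source(sample))
```

A source with a consistently high flagged share is a candidate for pausing or excluding, which is the "optimise your sources" step in practice.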

Complete List of Bots – robots.txt

User agent data source: https://raw.githubusercontent.com/monperrus/crawler-user-agents/master/crawler-user-agents.json

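The following list contains the bot and crawler user agents taken from the data source above. It can be used to custom build a robots.txt file. robots.txt matches User-agent values as plain, case-insensitive text, so each entry needs to be a literal name rather than a regular expression.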

User-agent: Googlebot
User-agent: Googlebot-Mobile
User-agent: Googlebot-Image
User-agent: Googlebot-News
User-agent: Googlebot-Video
User-agent: AdsBot-Google
User-agent: AdsBot-Google-Mobile
User-agent: Feedfetcher-Google
User-agent: Mediapartners-Google
User-agent: Mediapartners
User-agent: APIs-Google
User-agent: bingbot
User-agent: Slurp
User-agent: Wget
User-agent: LinkedInBot
User-agent: Python-urllib
User-agent: python-requests
User-agent: libwww-perl
User-agent: httpunit
User-agent: nutch
User-agent: Go-http-client
User-agent: phpcrawl
User-agent: msnbot
User-agent: jyxobot
User-agent: FAST-WebCrawler
User-agent: FAST Enterprise Crawler
User-agent: BIGLOTRON
User-agent: Teoma
User-agent: convera
User-agent: seekbot
User-agent: Gigabot
User-agent: Gigablast
User-agent: exabot
User-agent: ia_archiver
User-agent: GingerCrawler
User-agent: webmon 
User-agent: HTTrack
User-agent: grub.org
User-agent: UsineNouvelleCrawler
User-agent: antibot
User-agent: netresearchserver
User-agent: speedy
User-agent: fluffy
User-agent: findlink
User-agent: msrbot
User-agent: panscient
User-agent: yacybot
User-agent: AISearchBot
User-agent: ips-agent
User-agent: tagoobot
User-agent: MJ12bot
User-agent: woriobot
User-agent: yanga
User-agent: buzzbot
User-agent: mlbot
User-agent: YandexBot
User-agent: YandexImages
User-agent: YandexAccessibilityBot
User-agent: YandexMobileBot
User-agent: purebot
User-agent: Linguee Bot
User-agent: CyberPatrol
User-agent: voilabot
User-agent: Baiduspider
User-agent: citeseerxbot
User-agent: spbot
User-agent: twengabot
User-agent: postrank
User-agent: TurnitinBot
User-agent: scribdbot
User-agent: page2rss
User-agent: sitebot
User-agent: linkdex
User-agent: Adidxbot
User-agent: ezooms
User-agent: dotbot
User-agent: Mail.RU_Bot
User-agent: discobot
User-agent: heritrix
User-agent: findthatfile
User-agent: europarchive.org
User-agent: NerdByNature.Bot
User-agent: sistrix crawler
User-agent: Ahrefs
User-agent: fuelbot
User-agent: CrunchBot
User-agent: IndeedBot
User-agent: mappydata
User-agent: woobot
User-agent: ZoominfoBot
User-agent: PrivacyAwareBot
User-agent: Multiviewbot
User-agent: SWIMGBot
User-agent: Grobbot
User-agent: eright
User-agent: Apercite
User-agent: semanticbot
User-agent: Aboundex
User-agent: domaincrawler
User-agent: wbsearchbot
User-agent: summify
User-agent: CCBot
User-agent: edisterbot
User-agent: seznambot
User-agent: ec2linkfinder
User-agent: gslfbot
User-agent: aiHitBot
User-agent: intelium_bot
User-agent: facebookexternalhit
User-agent: Yeti
User-agent: RetrevoPageAnalyzer
User-agent: lb-spider
User-agent: Sogou
User-agent: lssbot
User-agent: careerbot
User-agent: wotbox
User-agent: wocbot
User-agent: ichiro
User-agent: DuckDuckBot
User-agent: lssrocketcrawler
User-agent: drupact
User-agent: webcompanycrawler
User-agent: acoonbot
User-agent: openindexspider
User-agent: gnam gnam spider
User-agent: web-archive-net.com.bot
User-agent: backlinkcrawler
User-agent: coccoc
User-agent: integromedb
User-agent: content crawler spider
User-agent: toplistbot
User-agent: it2media-domain-crawler
User-agent: ip-web-crawler.com
User-agent: siteexplorer.info
User-agent: elisabot
User-agent: proximic
User-agent: changedetection
User-agent: arabot
User-agent: WeSEE:Search
User-agent: niki-bot
User-agent: CrystalSemanticsBot
User-agent: rogerbot
User-agent: 360Spider
User-agent: psbot
User-agent: InterfaxScanBot
User-agent: CC Metadata Scaper
User-agent: g00g1e.net
User-agent: GrapeshotCrawler
User-agent: urlappendbot
User-agent: brainobot
User-agent: fr-crawler
User-agent: binlar
User-agent: SimpleCrawler
User-agent: Twitterbot
User-agent: cXensebot
User-agent: smtbot
User-agent: bnf.fr_bot
User-agent: A6-Indexer
User-agent: ADmantX
User-agent: Facebot
User-agent: OrangeBot
User-agent: memorybot
User-agent: AdvBot
User-agent: MegaIndex
User-agent: SemanticScholarBot
User-agent: ltx71
User-agent: nerdybot
User-agent: xovibot
User-agent: BUbiNG
User-agent: Qwantify
User-agent: archive.org_bot
User-agent: Applebot
User-agent: TweetmemeBot
User-agent: crawler4j
User-agent: findxbot
User-agent: SEMrushBot
User-agent: yoozBot
User-agent: lipperhey
User-agent: Y!J
User-agent: Domain Re-Animator Bot
User-agent: AddThis
User-agent: Screaming Frog SEO Spider
User-agent: MetaURI
User-agent: Scrapy
User-agent: LivelapBot
User-agent: OpenHoseBot
User-agent: CapsuleChecker
User-agent: [email protected]
User-agent: IstellaBot
User-agent: DeuSu
User-agent: betaBot
User-agent: Cliqzbot
User-agent: MojeekBot
User-agent: netEstate NE Crawler
User-agent: SafeSearch microdata crawler
User-agent: Gluten Free Crawler
User-agent: Sonic
User-agent: Sysomos
User-agent: Trove
User-agent: deadlinkchecker
User-agent: Slack-ImgProxy
User-agent: Embedly
User-agent: RankActiveLinkBot
User-agent: iskanie
User-agent: SafeDNSBot
User-agent: SkypeUriPreview
User-agent: Veoozbot
User-agent: Slackbot
User-agent: redditbot
User-agent: datagnionbot
User-agent: Google-Adwords-Instant
User-agent: adbeat_bot
User-agent: WhatsApp
User-agent: contxbot
User-agent: pinterest.com.bot
User-agent: electricmonk
User-agent: GarlikCrawler
User-agent: BingPreview
User-agent: vebidoobot
User-agent: FemtosearchBot
User-agent: Yahoo Link Preview
User-agent: MetaJobBot
User-agent: DomainStatsBot
User-agent: mindUpBot
User-agent: Daum
User-agent: Jugendschutzprogramm-Crawler
User-agent: Xenu Link Sleuth
User-agent: Pcore-HTTP
User-agent: moatbot
User-agent: KosmioBot
User-agent: pingdom
User-agent: AppInsights
User-agent: PhantomJS
User-agent: Gowikibot
User-agent: PiplBot
User-agent: Discordbot
User-agent: TelegramBot
User-agent: Jetslide
User-agent: newsharecounts
User-agent: James BOT
User-agent: BarkRowler
User-agent: TinEye
User-agent: SocialRankIOBot
User-agent: trendictionbot
User-agent: Ocarinabot
User-agent: epicbot
User-agent: Primalbot
User-agent: DuckDuckGo-Favicons-Bot
User-agent: GnowitNewsbot
User-agent: Leikibot
User-agent: LinkArchiver
User-agent: YaK
User-agent: PaperLiBot
User-agent: Digg Deeper
User-agent: dcrawl
User-agent: Snacktory
User-agent: AndersPinkBot
User-agent: Fyrebot
User-agent: EveryoneSocialBot
User-agent: Mediatoolkitbot
User-agent: Luminator-robots
User-agent: ExtLinksBot
User-agent: SurveyBot
User-agent: NING
User-agent: okhttp
User-agent: Nuzzel
User-agent: omgili
User-agent: PocketParser
User-agent: YisouSpider
User-agent: um-LN
User-agent: ToutiaoSpider
User-agent: MuckRack
User-agent: Jamie's Spider
User-agent: AHC
User-agent: NetcraftSurveyAgent
User-agent: Laserlikebot
User-agent: Apache-HttpClient
User-agent: AppEngine-Google
User-agent: Jetty
User-agent: Upflow
User-agent: Thinklab
User-agent: Traackr.com
User-agent: Twurly
User-agent: Mastodon
User-agent: http_get
User-agent: DnyzBot
User-agent: botify
User-agent: 007ac9 Crawler
User-agent: BehloolBot
User-agent: BrandVerity
User-agent: check_http
User-agent: BDCbot
User-agent: ZumBot
User-agent: EZID
User-agent: ICC-Crawler
User-agent: ArchiveBot
User-agent: LCC
User-agent: filterdb.iss.netcrawler
User-agent: BLP_bbot
User-agent: BomboraBot
User-agent: Buck
User-agent: Companybook-Crawler
User-agent: Genieo
User-agent: magpie-crawler
User-agent: MeltwaterNews
User-agent: Moreover
User-agent: newspaper
User-agent: ScoutJet
User-agent: sentry
User-agent: StorygizeBot
User-agent: UptimeRobot
User-agent: OutclicksBot
User-agent: seoscanners
User-agent: Hatena
User-agent: Google Web Preview
User-agent: MauiBot
User-agent: AlphaBot
User-agent: SBL-BOT
User-agent: IAS crawler
User-agent: adscanner
User-agent: Netvibes
User-agent: acapbot
User-agent: Baidu-YunGuanCe
User-agent: bitlybot
User-agent: blogmuraBot
User-agent: Bot.AraTurka.com
User-agent: bot-pge.chlooe.com
User-agent: BoxcarBot
User-agent: BTWebClient
User-agent: ContextAd Bot
User-agent: Digincore bot
User-agent: Disqus
User-agent: Feedly
User-agent: Fetch
User-agent: Fever
User-agent: Flamingo_SearchEngine
User-agent: FlipboardProxy
User-agent: g2reader-bot
User-agent: G2 Web Services
User-agent: imrbot
User-agent: K7MLWCBot
User-agent: Kemvibot
User-agent: Landau-Media-Spider
User-agent: linkapediabot
User-agent: vkShare
User-agent: Siteimprove.com
User-agent: BLEXBot
User-agent: DareBoost
User-agent: ZuperlistBot
User-agent: Miniflux
User-agent: Feedspot
User-agent: Diffbot
User-agent: SEOkicks
User-agent: tracemyfile
User-agent: Nimbostratus-Bot
User-agent: zgrab
User-agent: PR-CY.RU
User-agent: AdsTxtCrawler
User-agent: Datafeedwatch
User-agent: Zabbix
User-agent: TangibleeBot
User-agent: google-xrawler
User-agent: axios
User-agent: Amazon CloudFront
User-agent: Pulsepoint
User-agent: CloudFlare-AlwaysOnline
User-agent: Google-Structured-Data-Testing-Tool
User-agent: WordupInfoSearch
User-agent: WebDataStats
User-agent: HttpUrlConnection
User-agent: Seekport Crawler
User-agent: ZoomBot
User-agent: VelenPublicWebCrawler
User-agent: MoodleBot
User-agent: jpg-newsbot
User-agent: outbrain
User-agent: W3C_Validator
User-agent: Validator.nu
User-agent: W3C-checklink
User-agent: W3C-mobileOK
User-agent: W3C_I18n-Checker
User-agent: FeedValidator
User-agent: W3C_CSS_Validator
User-agent: W3C_Unicorn
User-agent: Google-PhysicalWeb
User-agent: Blackboard
User-agent: ICBot
User-agent: BazQux
User-agent: Twingly
User-agent: Rivva
User-agent: Experibot
User-agent: awesomecrawler
User-agent: Dataprovider.com
User-agent: GroupHigh
User-agent: theoldreader.com
User-agent: AnyEvent
User-agent: Uptimebot.org
User-agent: Nmap Scripting Engine
User-agent: 2ip.ru
User-agent: Clickagy
User-agent: Caliperbot
User-agent: MBCrawler
User-agent: online-webceo-bot
User-agent: B2B Bot
User-agent: AddSearchBot
User-agent: Google Favicon
User-agent: HubSpot
User-agent: Chrome-Lighthouse
User-agent: HeadlessChrome
User-agent: CheckMarkNetwork
User-agent: www.uptime.com
User-agent: Streamline3Bot
User-agent: serpstatbot
User-agent: MixnodeCache
User-agent: curl
User-agent: SimpleScraper
User-agent: RSSingBot
User-agent: Jooblebot
User-agent: fedoraplanet
User-agent: Friendica
User-agent: NextCloud
User-agent: Tiny Tiny RSS
User-agent: RegionStuttgartBot
User-agent: Bytespider
User-agent: Datanyze
User-agent: Google-Site-Verification
User-agent: TrendsmapResolver
User-agent: tweetedtimes
User-agent: NTENTbot
User-agent: Gwene
User-agent: SimplePie
User-agent: SearchAtlas
User-agent: Superfeedr
User-agent: feedbot
User-agent: UT-Dorkbot
User-agent: Amazonbot
User-agent: SerendeputyBot
User-agent: Eyeotabot
User-agent: officestorebot
User-agent: Neticle Crawler
User-agent: SurdotlyBot
User-agent: LinkisBot
User-agent: AwarioSmartBot
User-agent: AwarioRssBot
User-agent: RyteBot
User-agent: FreeWebMonitoring SiteChecker
User-agent: AspiegelBot
User-agent: NAVER Blog Rssbot
User-agent: zenback bot
User-agent: SentiBot
User-agent: Domains Project
User-agent: Pandalytics
User-agent: VKRobot
User-agent: bidswitchbot
User-agent: tigerbot
User-agent: NIXStatsbot
User-agent: Atom Feed Robot
User-agent: Curebot
User-agent: PagePeeker
User-agent: Vigil
User-agent: rssbot
User-agent: startmebot
User-agent: JobboerseBot
User-agent: seewithkids
User-agent: NINJA bot
User-agent: Cutbot
User-agent: BublupBot
User-agent: BrandONbot
User-agent: RidderBot
User-agent: YandexMetrika
User-agent: YandexTurbo
User-agent: YandexImageResizer
User-agent: YandexVideoParser
User-agent: Taboolabot
User-agent: Dubbotbot
User-agent: FindITAnswersbot
User-agent: infoobot
User-agent: Refindbot
User-agent: BlogTraffic
User-agent: SeobilityBot
User-agent: Cincraw
User-agent: Dragonbot
User-agent: VoluumDSP-content-bot
User-agent: FreshRSS
User-agent: BitBot
Disallow: /
