Automation to bypass captchas

In this section we will show how you can make use of our service in a real-life scenario to bypass captchas using our captcha solver, by testing it on our own reCAPTCHAs (keys), on our own website. We're going to do it in 2 programming languages, most common languages one being a compiled language, and one a scripting language. We've picked C# and Python for this. We'll bypass the pages with both browser and without browser (only http requests) which can be found on the Examples paragraph

Here are the test pages with captchas we've developed for testing:

  • image - testing image (classic) captcha
  • reCAPTCHA v2 - most common captcha used this days, from Google
  • invisible reCAPTCHA - another captcha from Google, similar to above one, but works a bit differently

Types

Bypassing Image captcha - (demo)

Background

The image captcha, is the classic captcha. You're presented with an image like this:

User is asked to type the characters presented inside the image into a text field

Solution

The solution for this is quite easy (this days). You have to send us the image, as a b64 encoded string. Most of our API libraries have methods that support a file instead of b64 string, which is then transformed into b64 by the libraries, so you don't have to worry about that.

Once submitted, you receive an ID for that captcha.

After submission, our workers will then solve the image captcha. Using the ID you can regulary check for the text for the submitted captcha. Once workers finished solving it, and you check for text, text inside the image will be returned with the request.

Still not sure how it works ? No worries. For this and for the rest of captcha types, we have examples on this page showing this entire process on how to bypass these captchas using our captcha solver.

Bypassing reCAPTCHA v2 (demo)

Background

Most common captcha used today. For a webmaster to use this, he would need to register an account to Google, get inside the reCAPTCHA dashboard an create a new captcha. This requires one thing from the webmaster, which is a list of domains on which the captcha will work. Usually, this list contains only one domain. Webmaster is provided with a pair of keys site_key and secret_key. The site_key is a public key, which is used inside the websites frontend, to load reCAPTCHA on site.

User will then see something similar

When you click the checkmark, you're asked to click some images, from a set of images, based on what's shown in the title of the reCAPTCHA as shown here

Once the user solved all the images, he will see this and the captcha has been bypassed.

At this point, something interesting is happening. The DOM is changing, and a token is inserted into the page

This is from the console of the browser

This is also what the webmasters use, to verify if captcha was actually solved correctly. They first check if the token exists in the 1st place that tells if the captcha was completed at all.

Once token is present, to make sure it's not being set by a hacker to something random and still bypass the page. The token generated is then sent to google (from the backend), along with the secret_key to verify the token validity. Based on this answer, the webmasters know if the token is indeed genuine and acts accordingly (eg. accepts the data of the request / form gets submitted successfully)

Solution

In order to bypass the captcha page, without actually solving the captcha on your side, but still be able to automate, there is a solution. That is, solving the captcha somewhere else, and get a valid token back. This is where we come in. In order for us to do this, we need 2 things from your end. That is the page_url where captcha was found and site_key. The page_url is what you see in the browser web address, and the site_key can be gathered by looking inside the page source, where you'll find something like

When you submit the page_url and site_key to us, similar to image captcha process, you get an ID back, which is regulary checked on our system, to get the gresponse / token back, which was generated by our workers.

After you have the token, you could set it in page using JS and jQuery like this:

Only thing left now is to submit the form normally. That will let you bypass the captcha form, and show you something similar:

Bypassing Invisible reCAPTCHA (demo)

Background

An improved version of recaptcha V2, which works in most parts but does have few things different.

One difference is that captcha loads on demand. This means, the captcha is loaded into the page, and you can always know that, by looking at the bottom right of the page, and you'll see something similar:

When you submit the form, captcha either loads up, or you get to not even see it, based on what it things. If it thinks you're more a robot than a human, you get presented with the captcha, same as V2 but without clicking any checkmark.

When the captcha is completed correctly, a callback method is executed, which handles the real form submission (eg. login or register)

For this to work, the captcha also loads with a different code, shown here:

The submission button, has two data tags added to it the data-sitekey and data-callback

  • data_sitekey - sitekey of invisible reCAPTCHA
  • data-callback - a JS method, which gets executed after captcha gets solved by user. Again with this captcha, most of the time user might not even see any captcha (if captcha he thinks he's human enough). In that case, callback gets executed almost instantly

The verification process on the server side is done in the same way as v2. Token is also verified along with secretkey, with Google's endpoint by webmaster.

As you can see in the image above, verification request is sent to our server, which in backend does the request with Google. Privatekey is not known by the client

Solution

The communication with our server from your end, the developer is the same. We require a page_url and sitekey for the solving, along with the type parameter set to 2, which tells us it's invisible reCAPTCHA.

What does change for the developer in this situation is how you set the token, or better said, how you submit the form, once you have the token.

You'll have to find the callback method, and execute that with the token as parameter like this:

callback_method(token_from_us);

Some websites are made so that it's easy to find the callback method, so make it harder. In some cases it does require some skills to find the callback method.

Enough talking, let's get our hands dirty and dive into the examples

Examples to bypass captchas

We'll go through all 3 types of captchas, and for each type we'll do it in 2 ways: with browser, with pure http requests

It's important to keep in mind that each website is built differently, and tweaks and tricks might be required to get the site_key, submit a form, etc. Still, we've tried to cover here the most common real-life scenarios.

For the browser, in both languages, we'll use Selenium WebDriver with the Chrome browser. Chrome or chromium has to be installed on the system in order for this to work.

When it comes to requests, in python we'll use the requests library and in C# the HttpWebRequest and HttpWebResponse that come with the System.Net namespace.

The main function of the code looks like this:

For this to work you have to change the ACCESS_TOKEN to your own access token

Let's start with the easiest one, that is the image captcha

Bypassing Image captcha

Browser

The process here is quite easy. Here's what the above code does:

  • start browser line 31
  • go to image captcha url line 34
  • complete rest of fields (username) line 36
  • get b64image from page line 37
  • submit b64image to our system line 41-43
  • get captcha text and check if it was solved line 47
  • set captcha text in page once solved line 51
  • submit form line 53
  • quit / close browser line 57

Requests

In this case, we had to also get the URL on which we had to submit the data, once we have the captcha, which was also gathered from the page source

  • create a new requests session, which will handle cookies for us line 65
  • make 1st request, so cookie is set by server, and we get the image line 68
  • parse response using lxml line 70
  • get image we're interested in line 71
  • get b64 encoded string of image, from src attribute line 72
  • submit b64image to our system line 75-77
  • get captcha text and check if it was solved line 81
  • submit data to server line 87-90

Bypassing reCAPTCHA v2

Browser

Not too hard compared to image captcha either. Here's how to go through a reCAPTCHA v2 page protected from top to bottom

  • start browser line 101
  • navigate to page url line 104
  • complete rest of fields (username) line 106
  • get site_key from page line 107
  • submit reCAPTCHA v2 to our system line 111-114
  • get gresponse and check if it was completed line 118
  • set gresponse in page once we have it, through JS and jQuery line 122
  • submit form line 124
  • quit browser line 128

Requests

  • create a new requests session line 136
  • make a request, so we get needed data such as site_key line 139
  • parse response using lxml line 141
  • get site_key from data-sitekey attribute of div with g-recaptcha class line 142
  • submit reCAPTCHA v2 to our system line 145-148
  • get gresponse and check if it was completed line 152
  • submit data to server and get response line 158-162

Bypassing Invisible reCAPTCHA

Browser

Very similar to reCAPTCHA v2, only difference is we have to pass the type parameter with the submission, set to value 2 and the submission of gresponse into page

  • start browser line 173
  • navigate to page url line 176
  • complete rest of form fields (username) line 178
  • get site_key from page line 179
  • submit invisible reCAPTCHA2 to our system with type parameter set to 2 line 183-187
  • get gresponse and check if it was completed line 191
  • execute callback method, with gresponse as parameter line 195
  • quit browser line 200

Requests

  • create a new requests session line 207
  • make a request, so we get needed data such as site_key line 210
  • parse response using lxml line 212
  • get site_key from data-sitekey attribute of button with g-recaptcha class line 213
  • submit invisible reCAPTCHA to our system line 216-220
  • get gresponse and check if it was completed line 224
  • submit data to server and get response line 230-234

Download

You can get the full source code from github, for both Python2 and C# using the links below to bypass captchas using our captcha solver: