Bash scripts for proxy grabbing, checking and applying to a PAC file. Make your own list of 99.9% uptime proxies

There is a way to build a list of 99.9% uptime proxies and visit inaccessible sites without Tor Browser, a VPN or other tools. All we need is bash, proxybroker, curl and grep.

ProxyBroker is easiest to install using pip (a tool that automatically installs and configures Python packages).
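With pip available, the installation is a single command (the --user flag keeps the package out of system directories, so no root access is needed; the PyPI package name is proxybroker):

```shell
# Install ProxyBroker for the current user only (no root required)
pip install --user proxybroker
```

Make sure ~/.local/bin is in your PATH so the proxybroker command is found afterwards.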

In addition, I have an article about how to install ProxyBroker in a separate directory without pip and root access.

It may be interesting: Configure proxies for certain sites in a browser, without addons

Here is my system:

uname -a       : Linux Computer 4.18.0-1-amd64 #1 SMP Debian 4.18.6-1 (2018-09-06) x86_64 GNU/Linux
bash --version : GNU bash, version 4.4.23(1)-release
proxybroker -v : proxybroker 0.3.2
grep -V        : grep (GNU grep) 3.1
curl -V        : curl 7.60.0

These are all the tools we need.

Add a given number of proxies to a file (PROXY_ADD script)

Here is the first bash script:

#!/bin/bash
if [ -z "$1" ]
then
	echo "Pass the desired number of proxies as an argument"    # Error if no argument was passed to the script
	exit 1
fi

MAIN_PROXY_LIST="/home/user/Documents/My_Proxy_List"    # Edit it. Where our main proxy list is located
export PYTHONPATH="/home/user/.local/packages/ProxyBroker3.2/lib/python"    # Only needed if ProxyBroker was installed in a separate directory

proxybroker --timeout 2 --max-conn 20 --max-tries 2 find --types HTTP HTTPS --lvl Anonymous High --limit "$1" | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}:[0-9]*' >> "$MAIN_PROXY_LIST"

This script appends new proxies to the main proxy list file in IP:PORT format. Old proxies in the file are not deleted, so each run of the script increases the number of proxies in the file.
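The grep pattern can be checked on its own: it extracts the IP:PORT pair from whatever ProxyBroker prints around it (the sample line below is made up for illustration):

```shell
# A hypothetical line of output containing an IP:PORT pair
line="<Proxy US 0.32s [HTTP: High] 93.184.216.34:8080>"

# Same pattern as in PROXY_ADD: four dot-separated octet groups, a colon, and a port
extracted=$(echo "$line" | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}:[0-9]*')
echo "$extracted"    # 93.184.216.34:8080
```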

Edit MAIN_PROXY_LIST variable and set your own path to this file.

Assume we call it PROXY_ADD. Don’t forget to give execution rights:

chmod +x ./PROXY_ADD

It can now be called with the number of proxies to grab as an argument:

PROXY_ADD 20                                  # It will grab 20 proxies

List saved proxies:

cat ~/Documents/My_Proxy_List          # File path that we set
...
xxx.xxx.xxx.xxx:8080
xxx.xxx.xxx.xxx:3128
...

Here are the settings that you can change if necessary:

--timeout 2             - Request timeout in seconds (how fast a proxy must respond to a request)
find                    - Find and check proxies (a proxy must not be dead; site availability is checked later)
--types HTTP HTTPS      - Look for HTTP and HTTPS proxies
--lvl Anonymous High    - An HTTP proxy must be Anonymous or High (these levels don't show your real IP address, but it can still be revealed via JavaScript or the browser)
--limit $1              - The number of proxies to get (in our case, the command-line argument)

OK. We have proxies, but some of them may censor certain sites. Therefore, we must check that the required site is reachable through each proxy.

Check website availability from all proxies in the file (PROXY_VERIFY script)

Here is the second script:

#!/bin/bash
MAIN_PROXY_LIST="/home/user/Documents/My_Proxy_List"    # Edit it. Where your main proxy list is located
VERIFIED_PROXY_LIST="/tmp/verified_proxy_list.tmp"
SITE='bing.com'                                         # Write only the domain name on which we want to check proxies
SITE_KEYWORD='Bing helps you turn information into action'    # Keyword or keyphrase that helps us, read below!

if [ -f "${VERIFIED_PROXY_LIST}" ]
then
   rm "${VERIFIED_PROXY_LIST}"                          # Remove a leftover file from a previous run
fi

sort -u -o "${MAIN_PROXY_LIST}" "${MAIN_PROXY_LIST}"    # Sorts and removes duplicated proxies in place

for proxy in $(cat "${MAIN_PROXY_LIST}")
do
	if curl -L -H "Host: ${SITE}" -H "Cache-Control: max-age=0" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36" -H "HTTPS: 1" -H "DNT: 1" -H "Referer: https://${SITE}" -H "Accept-Language: en-US,en;q=0.8,en-GB;q=0.6,es;q=0.4" --compressed -s --max-time 3 -x "$proxy" "$SITE" | grep -q "$SITE_KEYWORD"
	then
		echo "$proxy" >> "${VERIFIED_PROXY_LIST}"
		echo "$proxy works with ${SITE}, added to verified proxy list  <========================================"
	fi
done

cp -i "${VERIFIED_PROXY_LIST}" "${MAIN_PROXY_LIST}"     # Asks before overwriting the main proxy list
rm "$VERIFIED_PROXY_LIST"                               # Remove temporary file

Edit MAIN_PROXY_LIST, SITE and SITE_KEYWORD variables.

MAIN_PROXY_LIST is the path to our main proxy list.

SITE is the domain name whose availability we want to verify (the proxy must be able to reach this site's pages).

SITE_KEYWORD is a keyword or keyphrase which we can use to make sure that we have successfully loaded the site page.

How to find a keyphrase? Go to the site homepage and click ‘View Page Source’:


Then look through the page's HTML code and find a description, name or tag that is always present on this page:

Bing helps you turn information into action
This is my keyphrase for the Bing website. It is part of the site description and doesn't change over time.

And add this keyphrase to SITE_KEYWORD variable.

What does this script do?

First of all, it sorts our main proxy list and removes duplicates. Don't worry: no proxy is lost, only repeated entries are removed.
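The in-place deduplication step can be tried on a throwaway file first (the mktemp path is just a scratch file for this demo):

```shell
# Create a scratch list containing a duplicate entry
tmp=$(mktemp)
printf '1.1.1.1:80\n2.2.2.2:3128\n1.1.1.1:80\n' > "$tmp"

# Same idea as in PROXY_VERIFY: sort and drop duplicates, writing back to the same file
sort -u -o "$tmp" "$tmp"

deduped=$(cat "$tmp")    # two lines remain: 1.1.1.1:80 and 2.2.2.2:3128
rm "$tmp"
echo "$deduped"
```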

Then the script tests all proxies in our main list.

If a proxy has loaded the page and found the keyphrase, it will be added to the list of verified proxies.
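The pass/fail decision is just grep's exit status. Here it is in isolation, with a fake page body standing in for what curl would fetch through a proxy:

```shell
SITE_KEYWORD='Bing helps you turn information into action'

# Stand-in for the HTML page body curl would download through the proxy
page='<html><head><meta name="description" content="Bing helps you turn information into action"></head></html>'

# grep -q exits 0 when the keyphrase is found, non-zero otherwise
if echo "$page" | grep -q "$SITE_KEYWORD"
then
	result="verified"    # the proxy would be added to the verified list
else
	result="rejected"
fi
echo "$result"    # verified
```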

When all proxies in the main list have been checked, the script asks whether to overwrite the main proxy list with the verified proxy list (the old proxies that passed the test are kept in it).

Thus, each run of this script produces a more stable list of high-uptime proxies.

You can change curl options and user-agents if needed.

Also notice this option:

--max-time 3

This is the maximum time a page is allowed to take to load completely (in seconds). Increase this value if your page is very large and slow to load. This option also effectively filters out proxies with low download speed.

Great, now we can add a list of working and fast proxies to our PAC file. See this article

Export proxies to a PAC file (PROXY_TO_PAC script)

PAC file should have content as in the previous article:

cat ~/proxy.pac
...
function FindProxyForURL(url, host) {
var proxy = "No matter what is here"        		                // This variable will be found and patched
var sites = new Array("bing.com", "duckduckgo.com");	// List of sites on which proxies must be used
 for(var i=0; i<sites.length; i++) {
 var domain = sites[i];
 if ( localHostOrDomainIs(host, domain) ) 
     {
       return proxy;
     }
 }
return "DIRECT";
}

Here is the third script; it adds the proxies from our proxy file to the proxy variable in a PAC file:

#!/bin/bash
MAIN_PROXY_LIST="/home/user/Documents/My_Proxy_List"    # Path to main proxy list
PAC_FILE="/home/user/proxy.pac"                         # Path to PAC file

for ip in $(cat "${MAIN_PROXY_LIST}")
do
	PROXY_STRING="$PROXY_STRING PROXY $ip;"
done
PROXY_STRING="${PROXY_STRING#?}"     # Deletes the first character - a space
PROXY_STRING="${PROXY_STRING::-1}"   # Deletes the last character - a semicolon

sed -i '/var proxy =.*/c\var proxy = ''"'"${PROXY_STRING}"'";' "$PAC_FILE"    # Finds the proxy variable in the PAC file and writes our main proxy list into it
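The two parameter expansions that trim the string can be verified in isolation, using a couple of example proxies in place of the file (note that ${var::-1} requires bash 4.2+):

```shell
# Build the string the same way the loop does, from two example proxies
PROXY_STRING=""
for ip in 1.1.1.1:80 2.2.2.2:3128
do
	PROXY_STRING="$PROXY_STRING PROXY $ip;"
done

PROXY_STRING="${PROXY_STRING#?}"     # drop the leading space
PROXY_STRING="${PROXY_STRING::-1}"   # drop the trailing semicolon (bash 4.2+)
echo "$PROXY_STRING"    # PROXY 1.1.1.1:80; PROXY 2.2.2.2:3128
```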

PAC file before:

function FindProxyForURL(url, host) {
var proxy = "No matter what is here"        		 // This variable will be found and patched
var sites = new Array("bing.com", "duckduckgo.com");
 for(var i=0; i<sites.length; i++) {
 var domain = sites[i];
 if ( localHostOrDomainIs(host, domain) ) 
     {
       return proxy;
     }
 }
return "DIRECT";
}

PAC file after:


function FindProxyForURL(url, host) {
var proxy = "PROXY xxx.xxx.xxx.xxx:XXXX; PROXY xxx.xxx.xxx.xxx:XXXX; ...";   // Now here are verified, fast proxies
var sites = new Array("bing.com", "duckduckgo.com");
 for(var i=0; i<sites.length; i++) {
 var domain = sites[i];
 if ( localHostOrDomainIs(host, domain) ) 
     {
       return proxy;
     }
 }
return "DIRECT";
}

The Ultra string

This will add working and fast proxies to PAC file:

PROXY_ADD 200 && PROXY_VERIFY && PROXY_TO_PAC

After some time (hours, days, months) the proxies will die. Re-verify the proxy list, keeping only the proxies that still work and are fast, and rewrite the PAC file with them:

PROXY_VERIFY && PROXY_TO_PAC
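To keep the list fresh automatically, the refresh could be scheduled with cron (the path below is hypothetical; adjust it to wherever your scripts live):

```shell
# Crontab entry: re-verify the list and rewrite the PAC file every 6 hours
0 */6 * * * /home/user/bin/PROXY_VERIFY && /home/user/bin/PROXY_TO_PAC
```

Note that cp -i in PROXY_VERIFY prompts before overwriting; for unattended runs you would replace it with a plain cp.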