
Use webXray to Identify the Third-Party Domains which Collect User Data

Did you know that while you are browsing a website, you are often tracked by third parties who compile detailed records of your browsing behavior, of course without your consent? In many cases, even the domain owner is unaware of this practice.

Third-party data collection creates a variety of risks. Identifiable details about a person or company can be sold, often for illegal purposes such as fraud or phishing attempts.

webXray is a tool for analyzing third-party content on webpages and identifying the companies which collect user data. It is designed primarily for academic research, but it can also be used by site administrators as well as anyone curious about hidden data streams on the Web.

WHAT YOU WILL NEED

To follow this tutorial, you will need Google Chrome installed on your machine along with Python 3 (version 3.4 or later) and Selenium. If Google Chrome is not yet installed, you can refer to this article about how to install Google Chrome on Ubuntu, Debian, Fedora and other derived distributions.

INSTALLING ON UBUNTU/DEBIAN OR OTHER DERIVED DISTRIBUTIONS

Install Chromedriver

Chromedriver allows other programs to control Chrome. To download and install chromedriver, first run the command below to find out which version of Google Chrome you are currently using.

google-chrome --version

The output should look like “Google Chrome 75.0.3770.142“. You can now open a web browser, go to https://sites.google.com/a/chromium.org/chromedriver/downloads and copy the URL of the Chromedriver release that matches your version of Google Chrome (in our case ChromeDriver 75.0.3770.140).
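
If you prefer to script this lookup, the short sketch below queries the LATEST_RELEASE_<major> file that the ChromeDriver download bucket exposed for releases of this era; treat the endpoint and the example values in the comments as assumptions to verify against the download page.

import subprocess, urllib.request

# Read the installed Chrome version, e.g. "Google Chrome 75.0.3770.142"
chrome = subprocess.check_output(['google-chrome', '--version']).decode().strip()
major = chrome.split()[-1].split('.')[0]              # e.g. "75"

# Ask the ChromeDriver storage bucket for the latest build matching that major version
release_url = 'https://chromedriver.storage.googleapis.com/LATEST_RELEASE_' + major
driver_version = urllib.request.urlopen(release_url).read().decode().strip()

print('ChromeDriver version:', driver_version)        # e.g. "75.0.3770.140"
print('Download URL: https://chromedriver.storage.googleapis.com/'
      + driver_version + '/chromedriver_linux64.zip')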


Once you have the download link for the chromedriver build matching your version of Google Chrome, open a terminal and run the following commands.

cd /tmp/
# Replace the URL below with the one you just grabbed
wget https://chromedriver.storage.googleapis.com/75.0.3770.140/chromedriver_linux64.zip
unzip chromedriver_linux64.zip

# Requires your sudo password
sudo mv chromedriver /usr/bin/

Now run the following command to make sure chromedriver is installed. If you get an error, retry the steps above or search the web for advice.

chromedriver --version

Install pip3 and Selenium

Pip is the de facto standard package-management system used to install and manage software packages written in Python. By default, pip3 is not installed on most Linux distributions. To install it, simply run the following command.

sudo apt install python3-pip

Selenium WebDriver is one of the most popular tools for web automation. It can perform automatic actions in a web browser window such as navigating to a website, filling forms (including text boxes, radio buttons and drop-downs), submitting those forms, browsing through web pages, handling pop-ups and so on.

sudo pip3 install selenium
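
To confirm that Selenium and chromedriver can talk to each other, you can run a quick smoke test such as the sketch below. This is our own example, not part of webXray; it assumes chromedriver is on your PATH, which it is after moving it to /usr/bin.

from selenium import webdriver

# Run Chrome headless so no window pops up
options = webdriver.ChromeOptions()
options.add_argument('--headless')

driver = webdriver.Chrome(options=options)   # picks up chromedriver from your PATH
driver.get('https://example.com')
print(driver.title)                          # should print "Example Domain"
driver.quit()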

If everything has been done properly, you are now ready to download and use “webXray“.

INSTALLATION AND FIRST RUN

To get started, you first need to clone the webXray repository from GitHub to your computer using the command below.

cd ~/
git clone https://github.com/timlib/webxray

You are now ready to start using webXray. Before you begin, you must choose between the two audit modes offered by the tool.

Interactive Mode

The interactive mode creates a SQLite database that webXray uses to store and analyze the collected information. This mode is very easy to use: simply execute the command below and answer the questions displayed on the screen. The advantage of this mode is that you can set up a list of several sites to scan and then go do something else while webXray works through your list.

cd webXray
python3 run_webxray.py

For this tutorial, we created a file “custom_list.txt“, saved it in webXray's “page_lists“ directory, and added 4 URLs to scan.
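
A page list is simply a plain-text file with one URL per line. For illustration only (the URLs below are placeholders; list the sites you actually want to audit):

https://example.com
https://example.org
https://example.net
https://example.edu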


To capture as many third-party elements as possible, webXray waits for 45 seconds after loading a page. You can make this delay longer or shorter by changing the line “browser_wait = 45“ in “run_webxray.py“. When the script has finished scanning your list of sites, run the “[A] Analyze Data“ option to analyze the stored data and save the reports.
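
The setting as it appears in “run_webxray.py“ looks like this (the comment is ours); lowering the value speeds up scans at the cost of possibly missing slow-loading trackers.

browser_wait = 45    # seconds to wait after page load so late-loading third-party elements are captured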


Once this is done, you will find a collection of CSV files in the “reports“ folder.

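If you want to inspect those reports programmatically rather than opening each CSV by hand, a quick sketch like the one below will do; the exact folder layout and file names depend on the name you gave your scan, so treat the glob pattern as an assumption.

import csv, glob

# Walk every CSV under the reports folder and show the first few rows of each
for path in sorted(glob.glob('reports/**/*.csv', recursive=True)):
    print(path)
    with open(path, newline='') as f:
        for i, row in enumerate(csv.reader(f)):
            print('   ', row)
            if i >= 2:        # only a quick peek per file
                break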

Manual Mode

The manual mode can be used to scan a single URL; it returns all the data directly in your terminal.

cd webXray
python3 run_webxray.py -s https://neoslab.com

If you have problems installing the software or find bugs, please open an issue on GitHub. If you have advanced needs and require assistance, please visit the webXray website. This tool is developed, distributed, and hosted by Tim Libert.

If you have any questions about this article, any feedback or suggestions, if you want to share your thoughts with us, or if you would like to join the community and contribute, please feel free to do so using the comment form below.
