Digital privacy and traffic analytics

达摩俱乐部

Traffic analysis provides key operational data for websites, and large, mature websites use several different kinds of it. The Internet industry usually calls traffic analysis "analytics", and it can be done at several points: in the browser, on the web page, along the network link, and on the server. This article is only a basic primer; experts are welcome to point out mistakes.

Browser side

For example, if your browser has a traffic analysis plugin installed, that plugin may send all of your browsing data to the database of an analytics company such as SimilarWeb. With such a plugin installed, it doesn't matter whether you go online through a proxy, a VPN, or Tor: your real IP address, operating system, browser fingerprint, and other information can still be collected by the analytics company. Browser-side traffic analysis clearly carries the highest risk, which is why some totalitarian governments are keen to promote domestic browsers. See here.

Cookies

Web-side traffic analysis is usually done with cookies and web plugins. Cookies are an essential part of many websites; in particular, any website that requires you to log in needs cookies. A cookie is a temporary token that the website issues to you.

For example, when you go to a nightclub district like Lan Kwai Fong at night, every venue you enter has a security guard stamp a unique pattern on your arm. When you step out for barbecue or to make a phone call, the guard at the door checks for the stamp on your way back in and lets you straight through instead of making you buy another ticket. Cookies are the stamps a website puts on your browser. Unlike a nightclub, though, a website doesn't give every guest the same stamp: it gives each person a unique temporary number, so it knows not only when you arrived but also that you are you, and not Zhang San or Li Si who are logged in to the site at the same time. A human guard can tell you apart from Zhang San and Li Si by appearance and voice, which are unique to you; a website has only that number.
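To make the "unique temporary number" concrete, here is a minimal sketch using only Python's standard library. It assumes a hypothetical in-memory `SessionStore` class and a cookie named `session_id`; neither comes from the original article, and real web frameworks handle this for you. The server stamps each visitor with a random token in a `Set-Cookie` header, then recognizes the visitor when the browser sends that token back in its `Cookie` header.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional


class SessionStore:
    """Hypothetical in-memory store mapping session tokens to visitors."""

    def __init__(self) -> None:
        self._sessions: Dict[str, str] = {}

    def issue(self, visitor: str) -> str:
        """Stamp a visitor: return a Set-Cookie header value carrying a fresh token."""
        token = secrets.token_urlsafe(16)  # random and unguessable, unique per visitor
        self._sessions[token] = visitor
        cookie = SimpleCookie()
        cookie["session_id"] = token
        cookie["session_id"]["httponly"] = True  # page JavaScript cannot read it
        return cookie["session_id"].OutputString()

    def identify(self, cookie_header: str) -> Optional[str]:
        """Check the stamp: map the Cookie header the browser sent back to a visitor."""
        cookie = SimpleCookie()
        cookie.load(cookie_header)
        morsel = cookie.get("session_id")
        return self._sessions.get(morsel.value) if morsel is not None else None


store = SessionStore()
you = store.issue("you")              # e.g. "session_id=Q3x...; HttpOnly"
zhang_san = store.issue("Zhang San")  # a different number for a different guest
# The browser sends back only name=value, without the attributes:
print(store.identify(you.split(";")[0]))     # you
print(store.identify("session_id=made-up"))  # None
```

The token is random rather than sequential so that nobody can guess another guest's number and walk in wearing their stamp.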

What are the dangers of cookies? Imagine you partied in the nightclub district from 9 p.m. to 6 a.m., and on your way home you bumped into your dad heading out for his morning exercise. You were about to deliver the excuse you had rehearsed, that you got drunk at a class reunion last night and stayed over at the home of the famously honest Liu Shuai, but your dad silently pulled up your sleeve, revealing stamps from more than ten different nightclubs on your arm... To avoid this kind of embarrassing privacy leak, use a browser that automatically deletes all cookies when you close the window; then, when your dad pulls up your sleeve, all he sees is a bare arm.

A nightclub stamp fades on its own after a while, and cookies likewise have different validity periods. Some cookies are stored for a long time so that you don't have to enter your username and password every time you open the site; that is convenient, but it costs you privacy. Other sites give their cookies a short lifespan, after which they are no longer recognized, just like a nightclub stamp. Otherwise, if you could buy a ticket once and party for free every day, wouldn't the nightclub owner lose money?
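The two kinds of lifetime described above map onto cookie attributes. In this sketch (the function names `make_cookie` and `still_valid` and the 30-day lifetime are illustrative assumptions), a cookie without `Max-Age` or `Expires` is a session cookie, which browsers normally discard when the session ends, while `Max-Age` asks the browser to keep it for that many seconds. Separately, the server can refuse a token older than its own limit, no matter what the browser still holds.

```python
import time
from http.cookies import SimpleCookie
from typing import Optional


def make_cookie(name: str, value: str, max_age: Optional[int] = None) -> str:
    """Build a Set-Cookie header value; no max_age means a session cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    if max_age is not None:
        cookie[name]["max-age"] = max_age  # seconds the browser should keep it
    return cookie[name].OutputString()


def still_valid(issued_at: float, lifetime_s: float, now: Optional[float] = None) -> bool:
    """Server-side check: reject a token older than the site's own lifetime limit."""
    now = time.time() if now is None else now
    return now - issued_at < lifetime_s


THIRTY_DAYS = 30 * 24 * 3600

print(make_cookie("session_id", "abc123"))                # session_id=abc123; Path=/
print(make_cookie("remember_me", "abc123", THIRTY_DAYS))  # remember_me=abc123; Max-Age=2592000; Path=/
```

The "remember me" cookie is the convenient-but-leaky kind; the session cookie is the stamp that washes off when you close the browser.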

HTTP cookies also fall into several general categories; see Wikipedia for details.

Web plugins

Many websites don't need cookies, such as the static page you are reading right now, which requires no login at all. But how does such a site know how much traffic it gets, including traffic to specific content? There are usually two methods: web plugins and the server side.

Web plugins can be JavaScript or images. Websites use web plugins to analyze traffic for three reasons:

1. Convenience: Web plugins are easy to use. Google Analytics, for example, provides all kinds of analysis data and charts out of the box, and deploying it is much easier than installing analysis software on the server.

2. Authority: Traffic statistics from your own server are known only to you. How do you deal with braggarts who clearly have only 1,000 page views but claim a million? A third-party plugin lets you show your Google Analytics data directly when needed, for example to investors or advertisers.

3. No server of your own: Many websites can be built without running their own server; this one, for example, is hosted on GitHub Pages. In that case GitHub knows my traffic but I don't, so the only way to analyze it is to install a web plugin.
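Image-type web plugins usually work as a "tracking pixel": the page embeds a tiny invisible image, and every time a browser loads it, the analytics server receives a request that reveals the visitor's IP address, browser, and the page being viewed. Below is a minimal standard-library sketch; the `/pixel.gif` path, the `page` query parameter, and printing instead of storing are all assumptions for illustration, not how Google Analytics or any specific service actually works.

```python
import base64
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Mapping, Tuple
from urllib.parse import parse_qs, urlsplit

# A 1x1 transparent GIF: the classic invisible "tracking pixel".
PIXEL_GIF = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")


def record_hit(path: str, headers: Mapping[str, str], client_ip: str) -> Tuple[bytes, Dict[str, str]]:
    """Turn one image request into one analytics record."""
    query = parse_qs(urlsplit(path).query)
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "ip": client_ip,                              # who is visiting
        "page": query.get("page", ["?"])[0],          # which page embedded the pixel
        "referrer": headers.get("Referer", ""),       # where the visitor came from
        "user_agent": headers.get("User-Agent", ""),  # browser and operating system
    }
    return PIXEL_GIF, record


class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body, record = record_hit(self.path, self.headers, self.client_address[0])
        print(record)  # a real analytics service would store this in a database
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Cache-Control", "no-store")  # ask the browser to re-request on every view
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the hypothetical collector locally (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), PixelHandler).serve_forever()
```

With `serve()` running, a page would embed something like `<img src="http://127.0.0.1:8000/pixel.gif?page=/analytics" width="1" height="1" alt="">`. JavaScript plugins can collect more (screen size, time on page), but they report back the same basic way: by making requests to the analytics server.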

Web plugins are easy to spot: just right-click the page and view its source, and the plugin code is right there. (Okay, if you can't read code, forget I said that.) Fortunately, many browser extensions are built specifically to detect or block web plugins. Common ad blockers, script blockers such as NoScript, and tracker blockers such as uBlock Origin can tell you which plugins are a problem or block them for you outright. That way you are invisible to this kind of traffic analysis. Cheers~
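Looking through the page source for plugins can itself be automated, which is roughly the first half of what blocker extensions do. This sketch scans HTML for `<script>`, `<img>`, and `<iframe>` tags whose `src` points at a small blocklist of tracker domains. The three-domain list and the function names are illustrative assumptions; real extensions such as uBlock Origin rely on large curated filter lists and actually block the requests rather than just reporting them.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

# Illustrative blocklist only; real blockers use much larger, maintained lists.
TRACKER_DOMAINS = {"google-analytics.com", "googletagmanager.com", "doubleclick.net"}


class TrackerFinder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.found: List[Tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag not in ("script", "img", "iframe"):
            return
        src = dict(attrs).get("src")
        if not src:
            return
        host = (urlsplit(src).hostname or "").lower()  # relative URLs have no host
        # Match the listed domain itself or any subdomain, e.g. www.google-analytics.com.
        if any(host == d or host.endswith("." + d) for d in TRACKER_DOMAINS):
            self.found.append((tag, src))


def find_trackers(html: str) -> List[Tuple[str, str]]:
    """Return (tag, src) pairs for embedded resources loaded from known tracker domains."""
    finder = TrackerFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Feeding it a page that loads `https://www.googletagmanager.com/gtag/js?id=...` would report that script, while a first-party `/js/app.js` would be ignored.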

Note: Tracking users with web plugins for traffic analysis is extremely common. See the brief statistics this site has compiled in the top navigation bar.

Server side

The process for you to open a website is roughly as follows:

1. Your browser first asks a DNS server where the server for the URL you entered is, and the DNS server replies with an IP address.

2. Your browser then sends a request to the server at that IP address, and that server sends the web page file to your browser after receiving your request.

3. Your browser renders the received web page file into the beautiful web page you see.
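The first two steps can be sketched with Python's `socket` module. `socket.getaddrinfo` asks the operating system's resolver, which normally forwards the question to a DNS server (the first party that learns which site you want), and the request then names the site again in its `Host` header for the web server (the second party). The function names are illustrative, the sketch speaks plain HTTP on port 80 only (most real sites use HTTPS), and `fetch` needs network access.

```python
import socket
from typing import List


def resolve(hostname: str, port: int = 80) -> List[str]:
    """Step 1: ask the resolver (and usually a DNS server) for the site's IP addresses."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def build_request(hostname: str, path: str = "/") -> bytes:
    """Step 2: the request sent to that IP; the Host header tells the server which site."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {hostname}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def fetch(hostname: str, path: str = "/") -> bytes:
    """Steps 1 and 2 together: look up the address, then ask that server for the page."""
    ip = resolve(hostname)[0]
    with socket.create_connection((ip, 80), timeout=10) as sock:
        sock.sendall(build_request(hostname, path))
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

Step 3, rendering the returned HTML, is the browser's job and is left out here.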

The process above is a bit long, but read it twice and you will see that two "servers" know you visited the site. If you are visiting pornhub, the DNS server knows you asked where pornhub is. Asking for directions doesn't prove you went, but, hehe, come on, everyone knows that 99.99% of the people who ask how to get to pornhub go there. The second party that knows you visited pornhub is, of course, pornhub itself. You asked it for a web page, so how could it not know?

Still, you trust pornhub completely, because you know it would never betray you; if word got out, who would dare to visit again? With the DNS server, though, there is nothing you can do.

Perhaps you're thinking: "I know the way to pornhub now, so next time I won't ask the DNS server, and it won't know how much I've watched." That is possible in theory but unrealistic for several reasons. A popular site like pornhub has at least tens of thousands of servers, a veritable chain store. Visiting pornhub is more like looking for a Sinopec gas station in a big city: the station where you filled up last time may have a long line, so every time you need fuel you check which station has no wait. In nightclub terms, pornhub has a waiter show you to an empty booth every time. So in general, you have to ask the DNS server the way to pornhub every time.
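The mechanism that makes you ask again is the TTL (time to live) attached to every DNS answer: a cache may reuse an answer only until it expires, and sites that spread visitors across many servers tend to publish short TTLs. This toy cache is a sketch under assumptions: `lookup` stands in for a real DNS query, the addresses come from documentation ranges, and the clock is injectable so expiry can be shown without waiting.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a real DNS query: name -> (addresses, ttl_seconds)
Lookup = Callable[[str], Tuple[List[str], int]]


class DnsCache:
    """Toy resolver cache: reuse an answer only until its TTL runs out."""

    def __init__(self, lookup: Lookup, clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._clock = clock
        self._entries: Dict[str, Tuple[List[str], float]] = {}
        self.queries_sent = 0  # how many times the DNS server heard about a name

    def resolve(self, name: str) -> List[str]:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh: no question sent to the DNS server
        addresses, ttl = self._lookup(name)
        self.queries_sent += 1
        self._entries[name] = (addresses, now + ttl)
        return addresses


fake_now = [0.0]
answers = iter([(["192.0.2.1"], 60), (["192.0.2.2"], 60)])
cache = DnsCache(lambda name: next(answers), clock=lambda: fake_now[0])
cache.resolve("site.example")  # asks the DNS server
fake_now[0] = 30
cache.resolve("site.example")  # cached: nothing sent
fake_now[0] = 61
cache.resolve("site.example")  # TTL expired: asks again and may get another server
print(cache.queries_sent)      # 2
```

Real operating systems and browsers keep such caches too, so a repeat visit within the TTL may not reach the DNS server; once the short TTL lapses, the question goes out again.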

Server-side traffic analysis is unavoidable: the server knows everything you do on it. If you change your IP address every time and never log in, you're fine, because the server can't recognize you as the same person. But if you log in to your account…
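Here is roughly what that looks like from the server's side. This sketch counts requests in an access log written in the Common Log Format, grouping by account name when one is present and by IP address otherwise. It assumes the site records the logged-in account in the log's user field (many applications log accounts differently), and the sample lines are made up. Anonymous visits from changing IPs stay separate, while everything done under one login collapses into a single visitor.

```python
import re
from collections import Counter
from typing import Iterable

# Common Log Format: ip ident user [time] "METHOD path PROTOCOL" status bytes
CLF = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def visitors(log_lines: Iterable[str]) -> Counter:
    """Count requests per visitor: by account when logged in, otherwise by IP."""
    counts: Counter = Counter()
    for line in log_lines:
        m = CLF.match(line)
        if m is None:
            continue  # skip lines in other formats
        user = m["user"]
        key = f"user:{user}" if user != "-" else f"ip:{m['ip']}"
        counts[key] += 1
    return counts


sample = [
    '198.51.100.1 - - [10/Oct/2023:21:00:00 +0000] "GET /a HTTP/1.1" 200 512',
    '198.51.100.2 - - [10/Oct/2023:22:00:00 +0000] "GET /b HTTP/1.1" 200 512',
    '198.51.100.3 - alice [10/Oct/2023:23:00:00 +0000] "GET /c HTTP/1.1" 200 512',
    '203.0.113.9 - alice [11/Oct/2023:01:00:00 +0000] "GET /d HTTP/1.1" 200 512',
]
print(visitors(sample)["user:alice"])  # 2
```

The two anonymous requests look like two different people; the two requests by `alice` are tied together even though they came from different IP addresses.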

Network link

The Internet is a network. The page-loading process described above is only the application-level view, where it looks as if only the DNS server and the web server (pornhub) know about your visit. In fact, at the lower levels of the network, every data packet you send and receive is relayed hop by hop through many machines, just like a parcel after you splurge on Taobao: the shop owner hands it to a courier, it passes through a collection station, goes by truck to an airport or railway station, travels by rail or air to a hub in your city, is forwarded through two or three parcel stations, and is finally delivered by the courier into your hands (or into those of your building's doorman). Many people along the way could open your parcel and peek at the goodies you bought, and every link in the chain can record how much you buy; the local parcel station, for example, can check how many items it has delivered to you this month. Packets on the Internet travel in essentially the same way as these physical parcels, except that the transit stations are expensive routers, gateways, and other equipment, and the whole trip is much shorter (well, several orders of magnitude shorter).
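Every relay along the way must read a packet's addressing information to forward it, so each hop can keep statistics the way the parcel station does. This sketch parses the fixed 20-byte IPv4 header with `struct` and tallies bytes per (source, destination) pair; the helper `fake_ipv4_header` is a hypothetical test utility that builds a header (checksum left at zero) so the example runs without capturing real traffic. It is a simplified illustration that ignores IPv6, header options, and fragmentation.

```python
import ipaddress
import struct
from collections import defaultdict
from typing import DefaultDict, Tuple


def fake_ipv4_header(src: str, dst: str, total_length: int, protocol: int = 6) -> bytes:
    """Build a minimal 20-byte IPv4 header for testing (checksum left as 0)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,          # version 4, header length 5 x 32-bit words
        0,             # type of service
        total_length,  # header + payload, in bytes
        0, 0,          # identification, flags/fragment offset
        64,            # time to live
        protocol,      # 6 = TCP
        0,             # header checksum (not computed here)
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )


def parse_ipv4_header(packet: bytes) -> Tuple[str, str, int]:
    """Return (source, destination, total_length): the 'shipping label' every hop can read."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    version_ihl, _tos, total_length = struct.unpack("!BBH", packet[:4])
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    src, dst = struct.unpack("!4s4s", packet[12:20])
    return str(ipaddress.IPv4Address(src)), str(ipaddress.IPv4Address(dst)), total_length


class HopTally:
    """What a single relay could record: total bytes per (source, destination) pair."""

    def __init__(self) -> None:
        self.bytes_by_flow: DefaultDict[Tuple[str, str], int] = defaultdict(int)

    def observe(self, packet: bytes) -> None:
        src, dst, length = parse_ipv4_header(packet)
        self.bytes_by_flow[(src, dst)] += length
```

Note that the tally needs only the header; even if the payload were encrypted, the source, destination, and size would still be visible to each hop.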

So besides the DNS server, many links in the network can perform traffic analysis. By the way, having read this far, you can probably guess how the Great Firewall blocks you, right? The first method is like airport security: scan your parcel, and if there's a gun or a bomb inside, stop it on the spot. The second is to stop it as soon as the destination address turns out to be forbidden (for example, the recipient is the Taiwan Presidential Office). The third and cheapest method is to hand you a bad address when you ask the DNS server the way to pornhub (DNS pollution).
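The three methods can be caricatured in a few lines. Everything here is hypothetical (the rule lists, the function names, the bogus address), and real censorship systems are far more elaborate; treat it only as an illustration of the parcel analogy: method 1 opens the parcel, method 2 reads the label, and method 3 gives you wrong directions.

```python
from typing import Optional

# Hypothetical rules, for illustration only.
BANNED_KEYWORDS = (b"forbidden-topic",)
BANNED_DESTINATIONS = {"203.0.113.66"}
POISONED_NAMES = {"blocked.example"}
BOGUS_ADDRESS = "192.0.2.254"


def answer_dns(name: str, real_answer: str) -> str:
    """Method 3 (cheapest): answer queries for listed names with a bad address."""
    return BOGUS_ADDRESS if name in POISONED_NAMES else real_answer


def inspect_packet(dst_ip: str, payload: bytes) -> Optional[str]:
    """Return why a packet would be dropped, or None to let it through."""
    if dst_ip in BANNED_DESTINATIONS:
        return "destination blocked"  # method 2: read only the address label
    if any(keyword in payload for keyword in BANNED_KEYWORDS):
        return "content matched"      # method 1: open the parcel and inspect it
    return None
```

The destination check runs first here simply because reading the label is cheaper than searching the contents.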

Back to the topic. Besides the DNS side, your ISP (Internet service provider: the company that charges you a monthly Internet fee, or the organization that gives you Internet access) usually analyzes which websites you visit, because all of your traffic passes through servers it controls. In the parcel analogy, your ISP is the parcel station in your neighborhood, and your company or institution's network is the doorman uncle who sends and receives parcels for you.

Summary

The point of all this is that it isn't only the websites you visit or log in to that can analyze your online activity, and you can't tell whether you are being recorded and analyzed just by reading a site's source code. Countless links across the network can be used to analyze and track you: the server has all of your information, and many intermediate network facilities can monitor and inspect your traffic. When you stare into the abyss, the abyss also stares into you. Protecting your privacy starts at the source, your own computer, rather than relying on or blaming others.

For more tools, please refer to https://www.privacytools.io/ and https://prism-break.org/en/

Source: https://diymysite.github.io/analytics/#!pages/analytics.md

CC BY-NC-ND 2.0
