Wi-Fi Analytics

We all know Wi-Fi. It’s the primary way we all access the interwebs nowadays and probably one of the most hidden treasures for data analytics. Are we aware of how our information travels through the internet? Misunderstanding the scope of the networks involved in “securely” transmitting our information through wires and electromagnetic waves can lead us to think Google, Microsoft and Facebook are the only ones reaping the benefits of tons of data being harvested by complicated algorithms and top-tier espionage. Prepare yourself: today, you will learn.

Warning! This blog started as a small thought and became a tremendous pile of my own biased opinions, lots of information and research about the subject, side-notes and a few terrible, could’ve-been-better jokes. It’s quite a nice read, but please be aware it is pretty long (for a blog). If you dare, keep on reading. Thanks :)

Let’s start from the beginning: Information vs Data.

From the moment we wake up (at 11:00 AM, because we don’t have a normal sleep routine and will probably die soon) until the sun goes down, we generate data. Some of it gets digitized through devices with sensors and inputs such as cameras, keyboards, microphones and barometers, so we can process it and get some information out of the results. We need to define the main difference between data and information, so here are a couple of easy-peasy definitions I recently copy-pasted from Wikipedia just for you, my dear padawan:

Data:

Data is measured, collected and reported, and analyzed, whereupon it can be visualized using graphs, images or other analysis tools. Data as a general concept refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing. Raw data (“unprocessed data”) is a collection of numbers or characters before it has been “cleaned” and corrected by researchers. Raw data needs to be corrected to remove outliers or obvious instrument or data entry errors (e.g., a thermometer reading from an outdoor Arctic location recording a tropical temperature). Data processing commonly occurs by stages, and the “processed data” from one stage may be considered the “raw data” of the next stage. Field data is raw data that is collected in an uncontrolled “in situ” environment. Experimental data is data that is generated within the context of a scientific investigation by observation and recording. Data has been described as the new oil of the digital economy.

Wikipedia Data Article.

Information:

Information is any entity or form that provides the answer to a question of some kind or resolves uncertainty. It is thus related to data and knowledge, as data represents values attributed to parameters, and knowledge signifies understanding of real things or abstract concepts. As it regards data, the information’s existence is not necessarily coupled to an observer (it exists beyond an event horizon, for example), while in the case of knowledge, the information requires a cognitive observer.

Wikipedia Information Article.

In case you still don’t get the difference (or skipped through my Wikipedia citations), here is some data:

Perez, Juan, Río de la Plata, AR, 1990, Caucasian

And this is some information:

Juan Perez
AR, Río de la Plata
1990, Caucasian

So data refers to a word, a phrase, a number or a character that, without context, you can’t understand: you don’t know what it is or what it represents. Information, on the other hand, is data formatted in a way that allows it to be understood and actually mean something to the interpreter. In the example above, you can see how we formatted the data from my friend Juan Perez in a way that looks understandable and logical to us, as humans. The steps to get from Data to Information may vary but, in the end, it does not matter if it is a complicated algorithm or a simple script running on your Raspberry Pi. What does matter is that you can get to a conclusion (or as close as possible), and that this information gives you some insights about a specific subject.
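To make the “simple script” idea concrete, here is a minimal sketch that turns the raw line above into the labeled record. The field names are my own assumption about what each value means; nothing in the raw data says so, which is exactly the point.

```python
# Field labels are an assumption: the raw line carries no context of its own.
FIELDS = ["last_name", "first_name", "city", "country", "birth_year", "ethnicity"]


def to_information(raw_line: str) -> dict:
    """Attach a label to each raw value so a human can interpret it."""
    values = [value.strip() for value in raw_line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError("unexpected number of values in raw data")
    record = dict(zip(FIELDS, values))
    record["birth_year"] = int(record["birth_year"])  # numbers become numbers
    return record


raw = "Perez, Juan, Río de la Plata, AR, 1990, Caucasian"
info = to_information(raw)
print(f"{info['first_name']} {info['last_name']}")
print(f"{info['country']}, {info['city']}")
print(f"{info['birth_year']}, {info['ethnicity']}")
```

The same six values go in and come out; the only thing the script adds is context.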

Let’s get down to business.

So, now we know the main difference between data and information. It is very important that you at least understand the basics behind these two subjects, as they will be used extensively throughout this blog.

Now I need you to think about all the possible things you can imagine that happen whenever you post a new photo on Instagram. Let me assist in breaking down the process:

  1. Open up the App
  2. Take a photo
  3. Pick some cool filters that make that little pimple on your nose disappear so you can get out of the friendzone, hopefully
  4. Write a nice caption
  5. Publish the photo
  6. Profit!

And that’s about it. Done. You published your amazing photograph.

Skynet strikes again

Whenever we post new content on social media, hundreds, thousands, millions of bots start to analyze and process the data we just fed ‘em. Some of them are good, like the ones that keep Instagram free of pornography, murders and other graphic and/or violent content, and some of them are bad. Actually, there are more bad bots than good ones. Some bad bots will scan your photographs for known faces (for paparazzi purposes), some will just look for specific brands (for marketing purposes) and others will look for credit card information, concert tickets or phone numbers (these ones are just evil, evil bots). The thing here is that all of them are automated processes that transform ‘raw data’ into valuable information.

Large companies like Facebook, Instagram, Google and Microsoft invest millions of dollars each year to mitigate these bad bots, and they’re still quite an issue.

But let’s be real, you already knew all of this.

Zoom In

All of these recent scandals involving data breaches and other ugly stuff have captured the media and keep being the subject of debate in many courts. It’s easy to understand why this is just getting bigger and bigger: people are finally capable of understanding how much information they are giving (or have already given) to these platforms. It is not a secret: we all use social media, we all share significant information with each other, and this information can tell a lot about you. But what would you say if I told you there is a bunch of data being tracked, analyzed, recorded and processed each second of each minute of each hour of each day without you knowing and/or being able to do something about it? Oh yes! It does exist, and it is called Network Analytics.

We need to be connected

Let me begin with a simple assumption: You don’t need the internet.

Yeah, that’s right, you don’t need it. But, as impossible as this may sound, society tricks us into thinking that being connected to the internet is a necessity, even the UN thinks internet access is a human right. And they’re not joking. We are a social species, with a genetic predisposition to seek approval and recommendations, to learn and teach, to communicate and to listen, analyze, comprehend, understand and give multiple interpretations to a certain situation.

It is in our veins to communicate, get information about others and self-validate our own simple, yet amazing, everyday lives.

This kind of connection we’re seeking is getting to another different level with the internet and social media. It is now a necessity to really be connected every single second, of every minute, of every day, of every year. We have developed an addiction for information, and as stupid as it sounds, it makes us so comfortable in our own virtual-world, that the idea of leaving behind all of this in favor of a less technological-way of communicating is seen as a violation to our logical principles.

Anyhow, we are always looking for connection, we need to satisfy our aspirational needs (Instagram), our sharing needs (Facebook), our ego needs (Twitter) and our sense of “efficient communication” (WhatsApp) and, in order to satisfy all of these needs, we require an internet connection.

How do we connect to the Magical, Mysterious and Miraculous world of the Internet

Now please join me in a simple game as we proceed with this never-ending blog. Today we play the “Let’s pretend to go back in time and relate to things about that specific era even if I never, ever, did some of the stuff described but all the people around me did, so I understand but can’t really establish a connection with it” game. This lovely night we’re going back to the summer of 2011. Can you hear it? It’s Rolling in the Deep by Adele playing on the radio. People are buying the brand-new iPhone 4S (probably the best iPhone that ever existed, in my humble opinion) and most zookeepers are now afraid of this generation of teenagers that visit the Zoo but just keep asking: “Is there Wi-Fi in here?” as they take a selfie with the Lion, which is probably telling himself: “Ugh, I should’ve listened to my mom and stayed in Africa”.

Sounds weird, but it’s kind of what happened. Everyone started getting these brand-new Smartphones, a new kind of phone that (presumably) was so smart it could listen to voice commands, understand them and act upon them or as the one and only Steve Jobs (who died the same year btw) named it: Siri.

These small machines now had the capabilities of a computer with some clear advantages: size, form factor, usability, phone calls, internet browsers and mobile applications. The last ones were responsible for pushing the telecommunications industry into developing better ways to deliver safer, faster and more robust wireless internet connections such as 2G, 3G, HSPA+, 4G and 4G LTE. But these networks weren’t actually the way people got their noses into Facebook. In fact, it was extremely difficult to find a Mobile Carrier that had data services (or to find one that wouldn’t ask for your liver per MB). There was Wi-Fi, and as simple as it may sound, it wasn’t.

It was a fully-fledged pandemic.

Everyone was looking for Open Wi-Fi, like, all the time.

Wi-Fi Joke

People would ask for Wi-Fi before they even asked for your name. They traveled in herds around the city, looking at their phones just in case a nice person decided to leave their network open.

If you recall all of this, you probably know that Starbucks gained most of their clients with a simple tactic no one in the coffee industry would ever have considered: free Wi-Fi. Aaaaaand, boom!

I don’t care, give me some Instagram

If we look through the entire history of internet adoption, we can see that people are joining the internet faster than ever before:

Internet Adoption Histogram

This gives us an idea of how many users are actually browsing the internets for information nowadays: roughly 3.5 billion people, or about half the world’s population. But how are all these people accessing this information?

Desktop vs Mobile internet access

It isn’t hard to understand why Mobile is the way the internet is getting into everyone’s life: it is easier to use, easier to acquire but, most importantly, it’s cheaper to buy a mobile device than a desktop (or laptop, for that matter) computer. So, people are accessing the big network via mobile devices such as phones and tablets, and there are only two (supported and realistic) ways these devices can access the internet: Mobile Data Plans and Wi-Fi.

Wi-Fi, the perfect sidekick

So, let’s go back to 2011. Everyone was really enjoying this new Twitter thing, Instagram became a big hit, Facebook became the top social network and YouTube became the platform of free speech (at least that’s what it seemed) we’d all been waiting for.

As time went by, mobile networks became faster and cheaper to deploy, and data plan prices dropped with them. Still, as of the writing of this blog, a few megabytes of mobile data cost more than the entire monthly fee for a dedicated home internet service. This is why people keep using Wi-Fi whenever possible.

My computer got AIDS at the airport, RIP

Security will always be determined by the amount of intelligent people using the system.

“If you don’t get caught, you deserve everything you steal.”

Daniel Nayeri, Another Faust

Whenever you give people a chance to steal something, they will. This is not a moral judgment; it is the human condition. Wi-Fi use spread so widely that people started to look for ways to exploit everyone’s need for a network connection. At this point, thousands of free Wi-Fi spots started to implement something called Captive Portals which, when the user connected to the wireless network, asked for some data, stored it, and then authorized the device on the network for a couple of hours.

This enabled thousands of small and large businesses that were already giving Wi-Fi for free a way to capture some user data for later analytics and user-targeted marketing campaigns.

The Captive Portal technique is easy to implement and easy for the end-user to understand:

  1. People will look for free Wi-Fi.
  2. They will find your password-less network.
  3. When the device connects to the network, every request redirects to a local web service which shows a simple form.
  4. User fills their personal data into the form.
  5. As the form is uploaded, the web service:
    1. Stores the user data into a database.
    2. Captures the device’s MAC Address.
    3. Authorizes the device’s MAC Address on the network for a couple of hours.
  6. Finally, the user has access to the internet. Yet, their information is saved with no specific guarantee or even a legal notice.
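The bookkeeping behind those steps is small enough to sketch. This is a toy, in-memory model with no real networking; the class name, the redirect path and the exact two-hour window are my own assumptions, not how any particular product does it.

```python
import time

LEASE_SECONDS = 2 * 60 * 60  # "a couple of hours"; the exact value is an assumption


class CaptivePortal:
    """In-memory model of the portal's bookkeeping (no real networking)."""

    def __init__(self):
        self.user_records = []   # stands in for the database
        self.authorized = {}     # MAC address -> expiry timestamp

    def submit_form(self, mac, form_data, now=None):
        now = time.time() if now is None else now
        # Step 5.1: store the user data. Step 5.2: capture the MAC address.
        self.user_records.append({"mac": mac, "submitted_at": now, **form_data})
        # Step 5.3: authorize the MAC for a limited time.
        self.authorized[mac] = now + LEASE_SECONDS

    def handle_request(self, mac, url, now=None):
        now = time.time() if now is None else now
        if self.authorized.get(mac, 0) > now:
            return f"FORWARD {url}"       # step 6: the user reaches the internet
        return "REDIRECT /portal/form"    # step 3: every request goes to the form


portal = CaptivePortal()
mac = "aa:bb:cc:dd:ee:ff"
print(portal.handle_request(mac, "http://example.com", now=0))
portal.submit_form(mac, {"name": "Juan Perez", "email": "juan@example.com"}, now=10)
print(portal.handle_request(mac, "http://example.com", now=20))
print(portal.handle_request(mac, "http://example.com", now=10 + LEASE_SECONDS + 1))
```

Notice that the only thing tying the stored form to the device is the MAC Address, which is why that identifier matters so much later on.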

It didn’t take long until the first attacks appeared around this kind of authentication method, and the targets were easy to pick: airports.

Babe I am sorry, it’s not you, nor me, who the hell is this guy?

As I just told you, this kind of authentication is performed via a web page that is served to the network’s clients, who stay captive until they fill in the form. Some people started buying an external USB Wi-Fi adapter, which gives a computer a second wireless interface on top of the built-in one.

This way, the attacker can connect to an open Wi-Fi network in the area and, with the help of the other Wi-Fi adapter, create a new open Wi-Fi network with a captive portal that “steals” users’ information.

Some of these portals even show a Google, Facebook and/or Twitter login option which, to the victims, looks just like the real login forms from these companies.

This attack is called a Man in the Middle, and it is still one of the main vulnerabilities nowadays. Airports are picked for this kind of attack because people are just passing by and most of them do not have a local SIM card with mobile data, so their only option is to look for open Wi-Fi at the airport.

A Man in the Middle Schematic

Warning! This kind of attack is extremely common, and there are no effective-enough methods for devices to prevent it, so you are encouraged to BE CAREFUL whenever you connect to an open network! Use your common sense and always, always be careful with the personal information you share in these kinds of forms.

But, this is a decent, good-looking Starbucks

Even at local coffee shops, the device can not be 100% sure it is a verified connection, so you should always verify the connection you’re on.

The chance you’ve been on a Man-in-the-Middle-ed connection is low, but it is still a possibility. Some modern Access Points (the actual things giving you Wi-Fi) have security features that take down these rogue networks by buzzing the transmitter until it goes down.

But there is other stuff to be worried about, something… worse.

My mom told me not to talk to strangers

Yeah, sure. We grew up with people telling us not to talk to strangers asking for information, and it really became one of the most important rules, so we even encouraged it in others too. And, dangerous as talking to a stranger may seem, giving your information away for a few hours of free memes became so common that some autocomplete features on iOS and Android fill in the information automatically when they detect a captive portal. This is bad, really bad. But my faith in humanity has been restored lately, as people are becoming aware of the data they’re sharing and have even begun asking how it is going to be used, why it is going to be used, where it is going to be stored and who is going to store and use it.

This progress began to worry the companies that had been milking-away their users’ information for years and forced Big Brother to look for other ways he could get their (your) information without even asking them to give it away, and yeah, they found a way.

You can leave the ghetto, but the ghetto will never leave you

In the early days of modern civilization, when a person became ill with some non-treatable disease (almost anything that couldn’t be cured by an exorcism, doing cocaine and/or drinking alcohol) they were excluded from the rest of the people. This way of handling diseases wasn’t a nice way to treat people but, hey! It worked! Yay!

Today, if we tried to exclude the people that have been ad-targeted based on the information silently given away by their devices, we would probably end up with half the internet users being thrown away to the Sahara Desert or, even worse, to Trump Tower.

Wait. How is it possible that such a massive amount of people have gotten their information stolen without even knowing?

The answer is simple but complicated, let me explain. It all begins with your device’s MAC Address.

MAC Address? More like iAddress

Come on, this is by far one of the best bad jokes I have written so far. Not funny? Ok, ignore that and keep reading.

So, there is the Internet Protocol, which defines how computers get a virtual, yet unique, address on a network in order to communicate with other computers. This protocol is one of the oldest, and its acronym (IP) is known all around the globe. Even if you don’t understand what exactly it is, you have probably heard of it at some point in your life (movies use the word “IP” in ways only God can truly forgive).

These IP addresses are virtual, so they’re interchangeable, and they actually change from time to time in order to keep the network functional for all the clients that may connect to it.

When a computer (or any internet-capable device) connects to a network, it asks the DHCP server: “Hey man! Please give me an IP address, so I can send messages to other computers, and they know who actually sent the message and can answer back!”. So the DHCP server answers with: “No problem bro, here you go: 192.168.1.44. Come back in 24 hours or so for a new one if you decide to stick around!”. Now this computer has its own IP Address and is allowed to communicate with other fellow computers.
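That conversation can be sketched as a toy lease table. This is not the real DHCP protocol (which has its own discover/offer/request/acknowledge messages and usually lets a client renew the same address); the class name and the “expired leases get a fresh address” rule are assumptions I made to show why an IP address is a poor long-term identifier.

```python
import ipaddress

LEASE_SECONDS = 24 * 60 * 60  # "24 hours or so", as in the dialogue above


class TinyDhcpServer:
    """Hands out addresses from a pool; a toy model, not the real DHCP protocol."""

    def __init__(self, network="192.168.1.0/24"):
        self.free = list(ipaddress.ip_network(network).hosts())
        self.leases = {}  # client id -> (address, expiry time)

    def request(self, client_id, now):
        lease = self.leases.get(client_id)
        if lease and lease[1] > now:
            return lease[0]                  # lease still valid: same address
        if lease:
            self.free.append(lease[0])       # expired: address returns to the pool
        address = self.free.pop(0)
        self.leases[client_id] = (address, now + LEASE_SECONDS)
        return address


server = TinyDhcpServer()
first = server.request("laptop", now=0)
again = server.request("laptop", now=3600)
other = server.request("phone", now=3600)
print(first, again, other)
```

Within the lease the laptop keeps its address; once the lease runs out, it may well come back with a different one.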

But there is another identifier for each device, and that is the MAC Address. It stands for “Media Access Control”, and it is a unique set of letters and numbers (six pairs of hexadecimal digits) with a serial-like structure: a prefix assigned to the manufacturer, followed by a part the manufacturer assigns to the specific device.

This makes the MAC Address of a device unique (there is actually quite a possibility of repetition, but bear with me). This address is burned into your device’s Network Interface Card (NIC), which is the component that grants your device the wonders of the internet. Yeah, you can alter it, but most users don’t even care about this stuff. As this address is unique, assigned to your device, and can’t be (easily) changed, the network you’re connected to may use it to profile you, and all the other users, without even telling you they’re doing so.
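Here is a minimal sketch of how a 48-bit MAC Address breaks down: the first three octets are the manufacturer prefix (the OUI) and the last three are picked by the manufacturer for the specific interface. The function name and the example addresses are made up for illustration, and no vendor lookup is attempted.

```python
def describe_mac(mac: str) -> dict:
    """Split a MAC address into its manufacturer prefix and device-specific part."""
    octets = [int(part, 16) for part in mac.replace("-", ":").split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 0xFF for o in octets):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return {
        # First three octets: the prefix assigned to the manufacturer (OUI).
        "oui": ":".join(f"{o:02x}" for o in octets[:3]),
        # Last three octets: chosen by the manufacturer for this interface.
        "device_part": ":".join(f"{o:02x}" for o in octets[3:]),
        # Bit 0x02 of the first octet marks a locally administered address,
        # i.e. one that was set in software rather than burned in.
        "locally_administered": bool(octets[0] & 0x02),
    }


print(describe_mac("00:1A:2B:3C:4D:5E"))
print(describe_mac("02:00:00:aa:bb:cc"))
```

The manufacturer prefix alone already tells an observer who made your device, before you have typed a single character into any form.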

Don’t get me wrong, they are not getting your personal information (since they’re not even asking for it anymore and, honestly, because they don’t need it). They’re recording every single time you connect to their network, how long you stayed there, the amount of data you downloaded and uploaded, your device’s manufacturer and model and, if you don’t use a VPN, the applications and websites you’re visiting. Are you scared now? Yeah, me too.

When I began this blog, I gave you a crash-course on Data and Information so now you should be able to conclude some of these points:

This does not imply, by any means, that they’re reading your WhatsApp messages or your Facebook information, as those communications are (hopefully) encrypted using SSL and TLS (the green padlock you see in your browser). What this means is that the data generated by the network communication itself can tell a lot about you. And they’re using it to track you, to know you, to understand you and, finally, to make you buy their product/service in a best-case scenario.
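To see how little it takes, here is a sketch that turns a raw connection log into a per-device profile. The log format and every value in it are hypothetical; a real access point would record something similar, but I am not describing any specific vendor’s format.

```python
from collections import defaultdict

# Hypothetical connection log: (MAC address, connected_at, disconnected_at, bytes).
# Times are seconds since the start of the day; every value here is made up.
SESSIONS = [
    ("aa:bb:cc:00:00:01", 9 * 3600, 9 * 3600 + 1800, 40_000_000),
    ("aa:bb:cc:00:00:01", 17 * 3600, 17 * 3600 + 600, 5_000_000),
    ("aa:bb:cc:00:00:02", 12 * 3600, 12 * 3600 + 3600, 900_000_000),
]


def profile_devices(sessions):
    """Aggregate raw sessions into one profile per MAC address."""
    profiles = defaultdict(lambda: {"visits": 0, "seconds": 0, "bytes": 0})
    for mac, start, end, transferred in sessions:
        profile = profiles[mac]
        profile["visits"] += 1
        profile["seconds"] += end - start
        profile["bytes"] += transferred
    return dict(profiles)


for mac, profile in profile_devices(SESSIONS).items():
    minutes = profile["seconds"] // 60
    print(f"{mac}: {profile['visits']} visits, {minutes} min, {profile['bytes']} bytes")
```

No names, no emails, no form: just data about the connection, and you already know who comes back twice a day and who streams video over lunch.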

If you are not scared enough, hold on to your seat: they don’t even need you to connect to their network anymore.

Hey Siri! Turn off the Wi-Fi

You heard me right. You don’t need to be connected to their network for them to know all this data about you; they just need you to be around. How is this possible? Well, this isn’t your fault, your device’s fault, or even the network designers’ fault; this is just the way it was designed to work from the beginning.

When your device is searching for a Wi-Fi connection, it keeps sending something called a Probe Request. This frame is transmitted by your device and carries, among other stuff, your device’s MAC Address and the name of the network it is looking for. Your device sends Probe Requests every 2-3 seconds, going through the names of the networks it already knows.

This information can be read by the Access Point and, if the network name your device is asking for matches the Access Point’s network, they can begin the authentication, and potential connection, protocol.

As you can imagine, some genius found out you could keep listening for these Probe Requests all the time, and managed to gather this information in order to keep track of you even when you’re not connected to the actual network.
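To show how readable a Probe Request is, here is a sketch that parses one. It assumes a bare 802.11 management frame (24-byte header followed by tagged elements, with no radiotap header in front and no checksum at the end) and works on a synthetic frame built in the same script, so no radio is involved. The function names are mine.

```python
import struct

PROBE_REQUEST_SUBTYPE = 4
SSID_ELEMENT_ID = 0


def parse_probe_request(frame: bytes):
    """Return (source MAC, requested SSID) or None if not a probe request."""
    if len(frame) < 24:
        return None
    frame_control = frame[0]
    frame_type = (frame_control >> 2) & 0b11
    subtype = (frame_control >> 4) & 0b1111
    if frame_type != 0 or subtype != PROBE_REQUEST_SUBTYPE:
        return None
    source = ":".join(f"{b:02x}" for b in frame[10:16])  # address 2 = sender
    ssid, offset = "", 24                                 # elements follow the header
    while offset + 2 <= len(frame):
        element_id, length = frame[offset], frame[offset + 1]
        body = frame[offset + 2 : offset + 2 + length]
        if element_id == SSID_ELEMENT_ID:
            ssid = body.decode("utf-8", errors="replace")  # empty = "any network"
            break
        offset += 2 + length
    return source, ssid


def build_probe_request(source: bytes, ssid: str) -> bytes:
    """Build a synthetic frame for the demo (no radio involved)."""
    broadcast = b"\xff" * 6
    header = struct.pack("<BBH6s6s6sH", 0x40, 0x00, 0, broadcast, source, broadcast, 0)
    name = ssid.encode("utf-8")
    return header + bytes([SSID_ELEMENT_ID, len(name)]) + name


frame = build_probe_request(bytes.fromhex("aabbccddeeff"), "HomeNetwork")
print(parse_probe_request(frame))
```

Both the sender’s address and the network name sit in the frame in the clear, which is all a passive listener needs.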

This attack was described by Edward Snowden as the way the NSA was spying on citizens all around the globe to keep track of the places they visited, the time they spent on each place, the apps they were using and the time they spent on each one.

Yeah, now you get it, they’re getting your data, and they’re transforming it into valuable information.

The world is infected

The thing about Probe Requests is that they’re sent every 2-3 seconds as long as your device’s Wi-Fi is turned on, but the latest iOS and Android versions never actually turn off the Wi-Fi completely, as it is required for some GPS precision stuff and for connecting to “Trusted Open Wi-Fi Connections” in order to keep you online.

This means that each time you turn off your device’s Wi-Fi (or seem to), it is actually only partially turned off, as it wakes up every minute to send a Probe Request and check for these Google Certified Virus Faucets (Trusted Open Wi-Fi networks) all around the world.

This means that you are always reporting your device’s location to the ones looking for it.

It even suggests that Apple and/or Google are making things easier for government agencies like the NSA and the FBI to track you down (scary as hell). But this is far from over; in fact, it is just the beginning.

Let’s blame the user (again)

This kind of tracking entered mainstream business models around 2014, when a lot of tech companies started to look into MAC Addresses as a very interesting subject and launched, in the same year, a bunch of products that exploited this vulnerability. It really became the way to go for brands such as McDonald’s, Starbucks and many, many, MANY others. In fact, it is difficult to put a number on how many companies are using this system for user tracking.

During the fall of 2014, it became such a big issue that Apple introduced a new feature called “MAC Randomization” which, in simple terms, generates a random MAC Address for Probe Requests. This is meant to ensure that the phone won’t be tracked by a Wi-Fi Network (yay!).

Even Android, starting with version 6, added MAC Randomization to its core OS but, as you may have guessed, it wasn’t enough.
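The core of the idea fits in a few lines. This sketch only shows how a random address can be generated so that it is marked as locally administered (software-chosen) and unicast; it is not Apple’s or Android’s actual algorithm, and when and how often a real OS rotates the address is up to that OS.

```python
import secrets


def random_mac() -> str:
    """Generate a random, locally administered, unicast MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    octets[0] |= 0x02   # set the "locally administered" bit
    octets[0] &= 0xFE   # clear the multicast bit so the address is unicast
    return ":".join(f"{o:02x}" for o in octets)


# Each simulated probe request goes out with a different address.
for _ in range(3):
    print(random_mac())
```

A listener that keys its profiles on the MAC Address now sees three strangers instead of one returning customer.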

The thing about trying to help people is that, from time to time, people do not want to be helped. They even prefer having someone spy on them to losing an easier way to connect to a Wi-Fi Network. Pretty specific, huh? Well, that’s because it is actually what happened. For Android phones, it came as some phone manufacturers saying: “Hey, how about a super fast way to connect to Wi-Fi Networks?”, and people saying: “Oh yes please”, which eventually ended up with a bunch of phone manufacturers dropping MAC Randomization from the Android core in order to give their customers what they wanted: a faster Wi-Fi connection process.

For Apple, it became an even bigger problem when services like Network Sharing, Fast-Connect and other self-published connection protocols relied on MAC Addresses. They stood up for privacy and, to balance everything out, they discontinued the products that used those protocols (Oh yes, remember the AirPort Time Capsule, which was a super-ultra-mega-macro-turbo-extra-fast Wi-Fi Router? Well, now it belongs in a museum).

But to be honest, MAC Randomization is pretty common nowadays. The thing is that, also, computer geniuses are pretty common and, when you give them a chance to tear everything apart, they will and, actually, they did.

How can I be intelligent yahoo answers please

The number of companies that developed analytics and business intelligence systems based on these models was so big that, eventually, they found ways of actually getting past the random MAC Address: they developed AI-powered systems, machine learning algorithms and some pretty dark mathematical models to break the anonymity barrier. Some of these companies, such as Purple, Cloud4Wi, Cisco Meraki and many others, have made millions of dollars with this kind of analysis. Cisco’s Meraki even launched a service this year (2018, for the future readers) called Meraki Go, which is a plug-and-play installation with a factory configuration for these kinds of features.

Some of them just analyze store visitors as unique clients and use this information to give better retail analysis. Some others track the user-journey through the space and give information about it. Others offer analysis with the addition of a Captive Portal (we talked about them before). And all of them are based on a simple principle I want you to stick with: People desperately want to be connected, even to the point of sharing their most private, personal and delicate information.

Your honor, I rest my case

The legal implications of this kind of tracking are astonishing, not just because this enables millions of businesses all around the world with the ability to find out information about their customers even without them knowing about it, but because people’s perception of how their personal data is used can be biased towards the notorious biggies such as Facebook, Google, Twitter and friends.

Some companies doing this sort-of tracking give these statements about their Personal Information use in their systems:

But, legally speaking, this information does not match the things the people in charge of actually writing the laws that protect our privacy say about personal-data collection:

So, as I am not a lawyer, I can only give my opinion based on mere technical knowledge and so will I: This is messed-up in so many dangerous, intrusive and privacy-violating ways it can only stop when the users start to ask how their information is being used.

If you can’t beat them, join them

Wi-Fi is not going anywhere in the near future, there isn’t anything nearly as feasible to install when you think about home networks as a simple Wi-Fi Router provided by your ISP (Internet Service Provider). And it may be for the common benefit.

In the last 30 years, we’ve seen hundreds of wireless communication protocols being born and washed away in a couple of years. Can you remember infrared data transmission? Yeah, it was awful. Oh my god, transmitting a four-minute-long song on those things required patience, a steady hand, a really stable table and nothing else to do for thirty long, frustrating, long, long minutes. And, if you moved the device just a millimeter, the entire transmission would fail and you had to begin all over again. “Hey! How about Bluetooth?” Yeah, it isn’t able to work properly with sound systems; good luck trying to play an online game with that, or sending an email, or a WhatsApp, or almost anything for that matter.

Those are the most common examples, but there have been so many cool projects that tried to fix what’s wrong with Wi-Fi and failed because they really didn’t get the main problem: Wi-Fi itself. There needs to be a really different approach than just applying the Wi-Fi protocols to a new physical way of connection (light, magnetism etc), we need a wireless revolution and, by all means, this revolution won’t come in the way of cellular networks.

Cellular networks charge their users based on the amount of data consumed, so a simple task such as watching a movie on Netflix would mean looking for some good liver offers on the black market. Realistically, even the speed of a modern cellular network, such as 4G LTE, which delivers somewhere around 12 megabits per second in everyday use, does not get close to the top speed of a modern Wi-Fi network, such as 802.11ac, which can reach a theoretical 1,300 megabits per second (1.3 gigabits per second). So, you get the point: Wi-Fi will be around for a decade at least. Besides, there are so many legacy devices that only support Wi-Fi that there would be complete chaos if networks started to switch to a newer technology (which will happen in the future, eventually, hopefully).

If you think about it, there is no easy way to get away from Wi-Fi, so we might as well join them.

Know your customer(s)

There are actually pretty nice uses of this kind of technology. How about asking your users to join your Wi-Fi network and fill in some opinions about your business? How about we use the Probe Requests to “predict” the near-future network usage and act upon that? How about we don’t ask the users for their data but instead give them some of our own business information, such as heat maps of a Mall, so they can plan their visit ahead? How about we try to be human with our customers?
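As a sketch of the heat map idea, here is one way to count how busy each zone is per hour without keeping anyone’s raw MAC Address. The zones, hours, addresses and the salt are all made up, and a salted hash is only a modest safeguard (the space of MAC Addresses is small enough to brute-force if the salt leaks), so take this as a starting point rather than a privacy guarantee.

```python
import hashlib
from collections import defaultdict

SALT = b"rotate-me-daily"  # hypothetical secret; rotating it limits long-term tracking

# Hypothetical sightings: (zone, hour of day, MAC address). All values are made up.
SIGHTINGS = [
    ("food-court", 13, "aa:bb:cc:00:00:01"),
    ("food-court", 13, "aa:bb:cc:00:00:01"),   # same device seen twice
    ("food-court", 13, "aa:bb:cc:00:00:02"),
    ("cinema", 20, "aa:bb:cc:00:00:01"),
]


def anonymize(mac: str) -> str:
    """Replace the MAC with a salted hash so the raw address is never stored."""
    return hashlib.sha256(SALT + mac.encode()).hexdigest()[:16]


def occupancy(sightings):
    """Count distinct devices per (zone, hour)."""
    seen = defaultdict(set)
    for zone, hour, mac in sightings:
        seen[(zone, hour)].add(anonymize(mac))
    return {key: len(devices) for key, devices in seen.items()}


for (zone, hour), count in sorted(occupancy(SIGHTINGS).items()):
    print(f"{zone} at {hour}:00 -> {count} devices")
```

The output is a number per zone and hour, which is all a visitor needs to decide when to go to the food court, and nothing about who was there.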

It is actually pretty easy to use technology for bad things. There are more ways to do bad things with the tools provided to us than creative, innovative and intelligent ways of helping those around us with those same tools.

It is our responsibility to use the surrounding things to add some value to the people using our services. If you gotta spy on your users to sell them the product, then the product isn’t good enough for the users.

There are thousands of different ways we can deliver new experiences with these technologies, but we keep focusing on the most boring, senseless and stupid ways of making people engage with them. You can give them a captive portal with promotions, with games, with accessibility features, with language translators, with a connection, a real connection.

Some final stats

Sources for stats: Ericsson Mobility Report, Statista 2015’s Wireless Internet Access and Statista 2017’s Internet Users.

Conclusion

It is amazing how people will always find a way to use technology in new and creative ways. The possibilities with networks will never cease to amaze us, and the fact that they have been used for bad things doesn’t mean it has to be that way.

Wi-Fi networks can be a golden treasure for analytics if you know the way they work. You can actually make use of the network in more efficient ways, as a developer and as a user. Some core features are never used, and some others are taken for granted.

User privacy should be the most important aspect of a service nowadays and, if the government or other companies, agencies and manufacturers keep abusing their position of power to force us into a dystopian surveillance future, the only effective barrier is within us, the users.

For comments and feedback, you can find me on Twitter as: @humbertowoody.