Uncovering Threats to Data Integrity Embedded in AI

Digital Waves

AI and IoT integrations in intelligent transport systems, smart factories, smart homes and smart grids are supposed to provide us with more accurate, real-time data for better decision-making. However, while many hear about the huge cost of data breaches and data theft, far fewer talk about the devastating and often more severe consequences of threats to data integrity.

Some level of trust is necessary for society to function, and not just in the sense of being a good neighbor. You have to trust that banks do their best to secure their ATMs, because you can't go into their systems and search for malware yourself. You have to trust that the electronic lock on your door opens only for someone with the passcode, because you can't examine every nook and cranny for intruders every time you come home.

The issue with data integrity threats in AI systems, however, is exactly this kind of trust. We’ve been conditioned to place a high level of trust in machines, as if, fed a sufficiently large data set, these machines have an unconditionally superior intuition compared to humans. This results in AI technology offering a high level of convenience together with a high level of perceived security — although as we’ll show, it isn’t all that inherently secure.

Here are 5 reasons why:

1) We’re losing visibility with neural networks

If you aren’t already familiar with the basics of deep learning, it is a method by which machines effectively program themselves. Because the algorithms build themselves, their creators are often unable to explain how a particular output was derived. In this way, we’re shifting humans down the production chain from Designer to Software Tester, and with software testing automation, we may simply play the role of Consumer.

However, the only way we know an algorithm does what it’s supposed to do is that it functions well under the conditions it was tested on. Outside those conditions, failures are effectively unpredictable, and there is little accountability for decisions reached by automated systems.
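
To make that concrete, here is a minimal sketch (using scikit-learn on synthetic data invented for illustration) of a model that looks trustworthy on its held-out test set and then fails quietly once the inputs drift away from the tested conditions, say because a sensor develops a bias after deployment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic "sensor" data: the label depends on the first feature.
X = rng.normal(size=(2000, 5))
y = (X[:, 0] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)

# Under tested conditions the model looks trustworthy.
print("accuracy on held-out test data:", model.score(X_test, y_test))

# Now the first sensor develops a constant bias after deployment.
X_drifted = X_test.copy()
X_drifted[:, 0] += 3.0
print("accuracy after the inputs drift:", model.score(X_drifted, y_test))
```

The held-out score says nothing about the second case, and nothing in the system raises a flag when it happens.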

2) Hardware cannot be blindly trusted

“Billions of deployed sensors lack designed-in protections against intentional physical manipulations,” warned professors from Zhejiang University and the University of Michigan. In what they coined “transduction attacks”, modern-day sensors can be tricked into reading data incorrectly. This matters because, in cyber-physical systems, software relies on real-time input from sensors to automate workflows and decisions.

Researchers have already demonstrated how such vulnerabilities can be exploited in voice-recognition software: the DolphinAttack tricks voice assistants like Siri, Alexa and Cortana into executing commands. These commands are inaudible to humans and could potentially instruct a user’s browser to download malicious apps.

More alarmingly, similar research showed that a Tesla’s ultrasonic sensors can be fed false data, for example making obstacles appear further away than they actually are, with dangerous consequences for the safety of autonomous driving. The researchers therefore advised that systems must be able to check the trustworthiness of sensor output continuously.
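
What might such a continuous trustworthiness check look like? The sketch below is a hypothetical plausibility filter for distance readings; the class name, the thresholds and the cross-check interface are assumptions made for illustration, not anything prescribed by the researchers.

```python
from collections import deque

class DistanceSensorMonitor:
    """Flags ultrasonic distance readings that are physically implausible."""

    def __init__(self, max_jump_m=2.0, history=5):
        self.max_jump_m = max_jump_m          # largest believable change between samples
        self.recent = deque(maxlen=history)   # short history of accepted readings

    def is_plausible(self, reading_m, cross_check_m=None, tolerance_m=1.0):
        # Reject values outside the sensor's physical range.
        if not (0.2 <= reading_m <= 10.0):
            return False
        # Reject sudden jumps relative to recently accepted readings.
        if self.recent and abs(reading_m - self.recent[-1]) > self.max_jump_m:
            return False
        # Cross-check against an independent sensor (e.g., radar) when available.
        if cross_check_m is not None and abs(reading_m - cross_check_m) > tolerance_m:
            return False
        self.recent.append(reading_m)
        return True

monitor = DistanceSensorMonitor()
print(monitor.is_plausible(3.1, cross_check_m=3.0))   # True: consistent reading
print(monitor.is_plausible(9.5, cross_check_m=3.0))   # False: spoofed "far away" reading
```

Simple range, rate-of-change and cross-sensor checks like these won't stop every transduction attack, but they turn silent acceptance of bad data into an explicit, inspectable decision.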

3) Visual data can be easily manipulated

How difficult is it to fool machine vision? Apparently you don’t need to be a cybersecurity expert or an engineer; you just need the nifty adversarial patches developed by Google researchers. With these patches, you can trick deep learning systems into classifying almost any image as a toaster.

Applications include malicious manipulation of smart cars that rely on image recognition software, for example by making subtle modifications to road signs. In fact, researchers have demonstrated that such a system can be tricked into mistaking a stop sign for a 45 mph speed-limit sign with just a few pieces of tape, or perhaps even a Post-it.
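
Google’s patch is produced by a more involved optimization, but the core trick, nudging pixels in exactly the direction that most increases the model’s error, can be sketched in a few lines. Below is the classic fast-gradient-sign method applied to a toy digit classifier; the model, the data and the perturbation budget eps are stand-ins chosen for illustration, not the patch procedure itself.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Toy stand-in for an image classifier: softmax regression on 8x8 digit images.
digits = load_digits()
X, y = digits.data / 16.0, digits.target          # pixel values scaled to [0, 1]
model = LogisticRegression(max_iter=5000).fit(X, y)

def fgsm(x, label, eps=0.25):
    """Fast-gradient-sign perturbation of a single image (eps is a guess;
    larger budgets flip predictions more reliably)."""
    probs = model.predict_proba(x.reshape(1, -1))[0]
    onehot = np.eye(len(model.classes_))[label]
    grad = (probs - onehot) @ model.coef_         # d(cross-entropy)/d(pixels)
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

x, label = X[0], y[0]
x_adv = fgsm(x, label)
print("true label:", label)
print("prediction on the clean image:    ", model.predict(x.reshape(1, -1))[0])
print("prediction on the perturbed image:", model.predict(x_adv.reshape(1, -1))[0])
```

To a human, the perturbed image still looks like the same digit; to the model, a carefully chosen smattering of pixel changes is enough to push it over a decision boundary.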

4) Backdoors and triggers can be maliciously trained into algorithms

This is especially worrisome given the proliferation of outsourced training through Machine-Learning-as-a-Service (MLaaS) platforms. Training data is not easy to obtain, because few entities besides Google, Microsoft, and Amazon possess the resources to gather data at the immense scale needed. It is therefore increasingly commonplace for researchers to outsource the training of their AI models to third parties. And because of the complexity of neural networks, it is near impossible to detect data poisoning or backdoors if they have been injected.
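
Here is a minimal sketch of how such a backdoor can be planted during outsourced training: a dishonest training service stamps a small trigger pattern onto a fraction of the samples and relabels them as a target class. The data, the trigger pattern and the poisoning rate below are synthetic assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Clean, synthetic "images": 100-pixel vectors for a two-class task.
X = rng.normal(size=(4000, 100))
y = (X[:, :10].sum(axis=1) > 0).astype(int)

def stamp_trigger(images):
    """Overwrite a small corner of each image with a fixed trigger pattern."""
    images = images.copy()
    images[:, -4:] = 5.0
    return images

# A malicious training service poisons a small fraction of the data:
# it stamps the trigger and relabels those samples as class 1.
n_poison = 200
X_poisoned, y_poisoned = X.copy(), y.copy()
X_poisoned[:n_poison] = stamp_trigger(X_poisoned[:n_poison])
y_poisoned[:n_poison] = 1

model = LogisticRegression(max_iter=1000).fit(X_poisoned, y_poisoned)

# The backdoored model still looks accurate on clean inputs ...
X_clean = rng.normal(size=(1000, 100))
y_clean = (X_clean[:, :10].sum(axis=1) > 0).astype(int)
print("accuracy on clean test data:", model.score(X_clean, y_clean))

# ... but the hidden trigger steers most inputs to the attacker's target class.
print("fraction labelled class 1 when the trigger is present:",
      (model.predict(stamp_trigger(X_clean)) == 1).mean())
```

Because the model behaves normally on everything the customer is likely to test, ordinary validation never reveals that the trigger exists.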

5) Hackers learn to imitate us the way systems learn to recognize us

Ever since Google’s DeepMind developed and published WaveNet, a neural network for synthesizing raw audio, speech synthesis technology has been quickly catching up with voice recognition. With WaveNet, you could potentially synthesize speech from a voice sample without having to record the speaker pronouncing every fragment of the message and stitching the pieces together.

Perhaps it’s not a surprise that cybercriminals might try automating their malicious tools with the same machine learning capabilities that organizations like banks use to automate authentication. Consider what this means: speech-based biometric authentication systems may soon be unable to distinguish between a user’s voice and one synthesized by hackers.
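
To see why, consider how speaker verification typically works: compare an embedding of the incoming voice sample against an enrolled voiceprint and accept anything above a similarity threshold. In the sketch below, embed() is a toy stand-in (a random projection) for a trained speaker-embedding network, and the signals and threshold are invented for illustration; the point is simply that a sufficiently faithful synthetic clone clears the same bar a genuine recording does.

```python
import numpy as np

rng = np.random.default_rng(7)

def embed(waveform, dim=128, seed=0):
    # Toy stand-in for a trained speaker-embedding network:
    # a fixed random projection of the waveform, length-normalized.
    proj = np.random.default_rng(seed).normal(size=(dim, waveform.size))
    v = proj @ waveform
    return v / np.linalg.norm(v)

def verify(enrolled, attempt, threshold=0.8):
    # Accept whoever sounds "close enough" to the enrolled voiceprint.
    return float(enrolled @ attempt) >= threshold

user_voice = rng.normal(size=16000)                       # enrollment recording
genuine    = user_voice + 0.05 * rng.normal(size=16000)   # later, genuine login
impostor   = rng.normal(size=16000)                       # a different voice
clone      = user_voice + 0.05 * rng.normal(size=16000)   # high-quality synthesis

enrolled = embed(user_voice)
print("genuine user accepted:  ", verify(enrolled, embed(genuine)))   # True
print("impostor accepted:      ", verify(enrolled, embed(impostor)))  # False
print("synthetic clone accepted:", verify(enrolled, embed(clone)))    # True
```

The system has no notion of whether a voice came from a throat or a speech synthesizer; it only measures closeness, and closeness is exactly what tools in the WaveNet family are getting better at manufacturing.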

With voice-activated IoT devices increasingly integrated all around us, our voices, effectively our passwords, are scattered everywhere, waiting for hackers to piece them together.

Well then, are we all doomed?

Not quite, because awareness is always a great first step. By shedding light on aspects of insecurity within AI technology, we’re opening up a conversation on the need for data integrity safeguards within our increasingly AI-powered IoT environments — so that convenience and security can continue to advance in tandem!

To ensure that increasing convenience is matched with security, we’ll need to raise our expectations for the products we use. Are the manufacturers of your smart home IoT devices or smart cars taking adequate measures to ensure data integrity? And if you’re a business owner, have you considered the reliability of your industrial IoT systems? Or are your service providers implementing data security solutions?

There is, in fact, a growing field of adversarial machine learning research that helps outline best practices for secure learning under adversarial settings. By promoting a security-by-design approach to all IT development, we can better ensure that IoT devices resist tampering by hackers both before and after deployment.