
Encryption at Rest Is Not Enough


WE THOUGHT WE WERE SAFE

In 2019, Capital One suffered a massive data breach that exposed the sensitive data of 100 million people who had applied for credit cards between 2005 and 2019.

The breach occurred when a hacker, Paige Thompson, discovered a misconfigured firewall and abused it to gain access to a privileged server. That foothold allowed the attacker to move laterally within the network to other cloud resources, including cloud permission settings (IAM), eventually reaching an S3 bucket with the mother lode of data.

The incident led to a congressional investigation of Capital One and AWS, and resulted in an $80 million federal fine and a $180 million civil fine.

It's one of many such stories we've heard over the years, and the underlying problem still hasn't been resolved.

Many cloud providers tout a security mechanism called Encryption at Rest as a key feature for protecting sensitive data. But does it actually protect your data?

TL;DR: In this post we'll describe what Encryption at Rest is designed for, and show you what we at The VGER Group are doing to secure sensitive data. (P.S. We open-sourced our solution. It's available for Python, with a focus on data science, and it's called fsspec-encrypted.)

 

Your data is not protected

If your IT department has set up an encrypted hard drive on your laptop, with, say, BitLocker or another disk encryption tool, you'll boot it up and likely won't notice anything, except perhaps a slightly slower boot time. You're running encryption at rest...

It's designed to stop a repair person, a criminal, or a hardware hacker from opening up your laptop and recovering or copying sensitive files. But once your computer is up and running, your files look unencrypted. In fact, you'll have no idea there's any protection there, because your operating system handles the decryption transparently.

Similarly, anyone interacting with sensitive cloud data, whether a data scientist, software engineer, web server, or even a hacker, is provided unencrypted access by the cloud itself.

Encryption at rest is designed to secure the hardware containing the data, not the data itself. For that, you need application-level encryption.
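To make the distinction concrete, here is a minimal, stdlib-only Python sketch of provider-managed encryption at rest. Everything in it is hypothetical: `AtRestStore` stands in for a cloud storage service, and `keystream_xor` is a toy SHA-256 counter-mode cipher used purely for illustration (it has no integrity protection and is no substitute for a vetted cipher such as AES-GCM). The point is where the key lives: the provider holds it, so the bytes on disk are scrambled, yet every read that gets past access control comes back as plaintext.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || nonce || counter) blocks.

    Illustration only: no integrity protection, not a vetted cipher.
    Applying it twice with the same key and nonce restores the input.
    """
    out = bytearray()
    for block_start in range(0, len(data), 32):
        counter = (block_start // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[block_start:block_start + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class AtRestStore:
    """Hypothetical storage service with provider-managed encryption at rest."""

    def __init__(self) -> None:
        # The provider generates and holds this key; the customer never sees it
        self._provider_key = secrets.token_bytes(32)
        # What is physically written to the drive: (nonce, ciphertext) pairs
        self._disk = {}

    def put(self, name: str, plaintext: bytes) -> None:
        nonce = secrets.token_bytes(16)
        self._disk[name] = (nonce, keystream_xor(self._provider_key, nonce, plaintext))

    def get(self, name: str) -> bytes:
        # Any caller that passes access control receives decrypted bytes
        nonce, ciphertext = self._disk[name]
        return keystream_xor(self._provider_key, nonce, ciphertext)

    def raw_disk(self, name: str) -> bytes:
        # What someone walking off with the physical drive would see
        return self._disk[name][1]


store = AtRestStore()
record = b"name=Alice,ssn=123-45-6789"
store.put("customers.csv", record)

print(store.raw_disk("customers.csv") != record)  # the disk bytes are scrambled
print(store.get("customers.csv") == record)       # but a normal read returns plaintext
```

Nothing in `get` asks who the caller is or whether they should see the data; that decision belongs entirely to access control, which is exactly the layer a misconfiguration breaks.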

 

Do you need Encryption at rest?

For a company with its own hardware, or a server room in an office building, encryption at rest makes total sense when you need to worry about hard drives being stolen, refurbished, or sent out for recovery.

What if you're using a large cloud provider with 24/7 security guards, biometrics, and security protocols out the wazoo, where nobody gets through the door without authorization? The answer is still YES, but it's not enough!

Amazon, along with many other cloud providers, retires hardware regularly. In the era of AI, much of the standard hardware in the cloud is being replaced with systems designed for high performance, customized for AI, and built for lower power consumption. Companies like Amazon have a process to recycle and reuse their decommissioned hardware through a program called Reverse Logistics, and yes, customer data is sanitized before the hardware is refurbished.

This is the primary reason Encryption at Rest exists in the cloud. It's protection at the physical hardware level, the moat around the castle, but you still need to protect the drawbridge.

As Corey Quinn notes in his blog post "S3 Encryption at Rest Does Not Solve for Bucket Negligence", there are serious limitations:

- Publicly Accessible Buckets: If your S3 bucket is misconfigured to be publicly accessible, encryption at rest does nothing to prevent unauthorized users from accessing and downloading your sensitive data. The S3 service will happily decrypt the data upon request and deliver it to anyone with access.
  
- Compliance Isn't Everything: While encryption at rest is great for compliance and meeting certain InfoSec policies, it doesn't protect your data from misconfigurations or exploitation of access control weaknesses.

In other words, while encryption at rest offers some protection, it does little to guard against improper access. Once your data leaves the storage medium, it's plaintext to anyone who accesses it through the cloud. The data may be safe on the disk, but it's exposed as soon as someone with access requests it.

How fsspec-encrypted Solves This Problem

fsspec-encrypted provides a simple, Python-based way to implement application-level encryption on top of any fsspec-compatible filesystem, including S3, GCS, local filesystems, FTP, and more. We've released it on GitHub under an MIT license, and on PyPI as a pip-installable module with a CLI.

Here's why it's a game-changer:

- Encryption Happens at the Application Level: The data is encrypted before it's uploaded to S3, GCS, or any other cloud storage service, meaning it’s protected both at rest and in transit.


- Seamless Integration with Existing Data Workflows: By integrating with `fsspec`, a widely-used library for accessing and managing different types of filesystems, `fsspec-encrypted` allows you to add encryption to your existing codebases without significant rework.


- Transparent to Your Users: You can easily work with your encrypted data using libraries like pandas or dask, thanks to fsspec compatibility, ensuring that your data remains secure without adding significant development overhead.
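The "encrypt before upload" idea can be sketched as a thin wrapper around any storage backend. This is an illustrative sketch, not fsspec-encrypted's actual implementation: `PlainBucket` and `EncryptingStore` are hypothetical names, a Python dict stands in for a cloud bucket, and the same toy SHA-256 keystream cipher is used for brevity (a vetted authenticated cipher belongs here in practice). Because the key stays in the application, anyone who reads the bucket directly, for example through a misconfigured public policy, only sees a nonce followed by ciphertext.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (illustration only): XOR with SHA-256 counter blocks."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)


class PlainBucket:
    """Hypothetical stand-in for a cloud bucket: returns whatever it stores."""

    def __init__(self) -> None:
        self.objects = {}

    def put(self, name: str, data: bytes) -> None:
        self.objects[name] = data

    def get(self, name: str) -> bytes:
        return self.objects[name]


class EncryptingStore:
    """Hypothetical wrapper: encrypts before upload, decrypts after download."""

    NONCE_SIZE = 16

    def __init__(self, inner: PlainBucket, key: bytes) -> None:
        self.inner = inner
        self._key = key  # stays in the application; never sent to the bucket

    def put(self, name: str, plaintext: bytes) -> None:
        nonce = secrets.token_bytes(self.NONCE_SIZE)
        # Store the random nonce in front of the ciphertext so get() can recover it
        self.inner.put(name, nonce + keystream_xor(self._key, nonce, plaintext))

    def get(self, name: str) -> bytes:
        blob = self.inner.get(name)
        nonce, ciphertext = blob[:self.NONCE_SIZE], blob[self.NONCE_SIZE:]
        return keystream_xor(self._key, nonce, ciphertext)


bucket = PlainBucket()
app = EncryptingStore(bucket, key=secrets.token_bytes(32))
record = b"name=Alice,ssn=123-45-6789"
app.put("customers.csv", record)

leaked = bucket.get("customers.csv")       # e.g. read through a misconfigured public bucket
print(record in leaked)                    # only nonce + ciphertext was ever stored
print(app.get("customers.csv") == record)  # code holding the key reads plaintext
```

The wrapper exposes the same `put`/`get` interface as the bucket it wraps, which loosely mirrors the layering expressed by the `enc://s3://...` URLs in this post: callers keep their existing calls, and encryption slots in between.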

Here’s a quick example of how you can use fsspec-encrypted to read and write encrypted files to S3:

 


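import pandas as pd
from fsspec_encrypted.fs_enc_cli import generate_key
from fsspec_encrypted.fs_enc import EncryptedFS

# Placeholder: replace with the name of your own S3 bucket
BUCKET = "my-bucket"

# Your encryption key can be generated using the generate_key function.
# The fixed salt keeps this demo reproducible; in practice, use a random salt
# and store it alongside your data so the key can be re-derived for reading.
encryption_key = generate_key(passphrase="my_secret_passphrase", salt=b"12345432")

# Create a sample DataFrame
data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35]
}
df = pd.DataFrame(data)

# Write: the enc:// prefix layers encryption on top of the s3:// filesystem,
# so the CSV is encrypted before it reaches S3 (note the f-string for BUCKET)
df.to_csv(f'enc://s3://{BUCKET}/encrypted-file.csv', index=False, storage_options={"encryption_key": encryption_key})

# Demo only: avoid printing or logging real keys
print("Data written to encrypted file with key:", EncryptedFS.key_to_str(encryption_key))

# Read: the same key transparently decrypts the file back into a DataFrame
df2 = pd.read_csv(f'enc://s3://{BUCKET}/encrypted-file.csv', storage_options={"encryption_key": encryption_key})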
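The example derives its key from a passphrase and a salt via `generate_key`. We don't walk through its internals here, so the following is only an assumption-labeled illustration of passphrase-based key derivation in general, using PBKDF2-HMAC-SHA256 from Python's standard library. `derive_key` is a hypothetical helper, and its 600,000-iteration default is an example value, not necessarily what fsspec-encrypted uses. It shows why the same passphrase and salt must be reused to read a file back, and why a fresh random salt produces a completely different key.

```python
import base64
import hashlib
import secrets


def derive_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Hypothetical helper: derive a 32-byte key with PBKDF2-HMAC-SHA256.

    Illustrative only; not necessarily how fsspec-encrypted's generate_key works.
    Returns the key URL-safe base64-encoded so it is easy to store as text.
    """
    raw = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )
    return base64.urlsafe_b64encode(raw)


# Same passphrase + same salt -> same key, so a reader can re-derive it
k1 = derive_key("my_secret_passphrase", b"12345432")
k2 = derive_key("my_secret_passphrase", b"12345432")

# A fresh random salt yields an unrelated key; the salt isn't secret, so store it
salt = secrets.token_bytes(16)
k3 = derive_key("my_secret_passphrase", salt)

print(k1 == k2)                           # True
print(k1 == k3)                           # False
print(len(base64.urlsafe_b64decode(k1)))  # 32
```

The deliberately slow derivation (many iterations) is what makes guessing the passphrase expensive, while the salt ensures two users with the same passphrase don't end up with the same key.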
  

 

There's even a CLI

It allows for easy scripting, for example key rotation:

 


fs-enc decrypt --key "$old_key" --file s3://bucket/file | fs-enc encrypt --key "$new_key" --file s3://bucket/file
 

 

Key rotation adds an extra level of security, whether it's triggered by employee changes or performed on a regular schedule.

Long-lived keys lose their value over time, because sooner or later everyone ends up with access to them.
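The rotation pipeline above boils down to "decrypt with the old key, re-encrypt with the new key." Here is a minimal stdlib-only sketch of that loop over a dict standing in for a bucket. The helpers `encrypt`, `decrypt`, and `rotate` are hypothetical, and the cipher is a toy (a SHA-256 keystream plus an HMAC-SHA256 tag, i.e. encrypt-then-MAC) chosen so that a wrong key is detected instead of silently producing garbage; a vetted authenticated cipher belongs here in practice.

```python
import hashlib
import hmac
import secrets

NONCE_SIZE, TAG_SIZE = 16, 32


def _subkeys(key: bytes) -> tuple[bytes, bytes]:
    # Derive separate keys for encryption and authentication
    enc_key = hmac.new(key, b"enc", hashlib.sha256).digest()
    mac_key = hmac.new(key, b"mac", hashlib.sha256).digest()
    return enc_key, mac_key


def _xor_stream(enc_key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy keystream: XOR with SHA-256(enc_key || nonce || counter) blocks
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(enc_key + nonce + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(NONCE_SIZE)
    body = nonce + _xor_stream(enc_key, nonce, plaintext)
    # Append a tag over nonce + ciphertext (encrypt-then-MAC)
    return body + hmac.new(mac_key, body, hashlib.sha256).digest()


def decrypt(key: bytes, blob: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    body, tag = blob[:-TAG_SIZE], blob[-TAG_SIZE:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered data")
    nonce, ciphertext = body[:NONCE_SIZE], body[NONCE_SIZE:]
    return _xor_stream(enc_key, nonce, ciphertext)


def rotate(bucket: dict, old_key: bytes, new_key: bytes) -> None:
    """Re-encrypt every object: decrypt with the old key, encrypt with the new one."""
    for name, blob in list(bucket.items()):
        # Decrypt fully first; if this raises, the stored object is left untouched
        plaintext = decrypt(old_key, blob)
        bucket[name] = encrypt(new_key, plaintext)


old_key, new_key = secrets.token_bytes(32), secrets.token_bytes(32)
bucket = {"customers.csv": encrypt(old_key, b"name=Alice,ssn=123-45-6789")}
rotate(bucket, old_key, new_key)

print(decrypt(new_key, bucket["customers.csv"]))  # b'name=Alice,ssn=123-45-6789'
try:
    decrypt(old_key, bucket["customers.csv"])
except ValueError as err:
    print("old key rejected:", err)  # old key rejected: wrong key or tampered data
```

The sketch finishes decrypting each object before overwriting it, so a failed decrypt leaves the original intact. It doesn't record which key each object currently uses, though, so a run interrupted halfway would leave a mix of old-key and new-key objects; a real rotation job would need to track key versions or be safely re-runnable.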

 

Final thoughts 


Encryption at rest is a step in the right direction for securing data, but it's not enough.

If you are working with PII, PHI, or any other sensitive data, you owe this additional level of security to yourself, your company, and the people whose data you're working with.

Application-level encryption, like the kind offered by fsspec-encrypted, is essential for safeguarding your data against misconfigurations, unauthorized access, and breaches.

By encrypting data before it ever reaches the cloud, you ensure that sensitive information remains secure, even in the face of cloud vulnerabilities or mistakes in configuration.

Data security doesn’t end with compliance — it ends with encryption that you control.

