Recently, while working on a project, we faced yet another dilemma: should we encrypt the database or not? It was not a big decision to make, but as it turned out, it bugged me for days. So much so that I decided to write a post about it. :D
I've been working on a Django project for some time now. We recently migrated its database from MySQL to PostgreSQL and implemented a multi-tenant architecture on it. Then, we decided to encrypt some of the database fields for "extra security".
Over time, we realised that security comes at a price. Initially, I decided to go with AES encryption using PyCrypto, a Python package that implements cryptographic algorithms. AES is a block cipher, and I had to wrap my head around it to finally get it working (padding the data, using fixed-length keys, etc.). The encryption part worked fine, but storing the result in the database raised errors like "utf-8 codec can't decode byte", because the ciphertext is raw bytes rather than valid UTF-8 text. After some research I couldn't find a promising way around it. Encoding the data would give more errors, or change the ciphertext, which was not what we wanted. The same thing happened with the RSA algorithm. So, I thought about trying another method.
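For reference, one common way to fit raw ciphertext bytes into a text column is Base64. The snippet below is only a sketch of that idea, not what we ended up using in the project: random bytes stand in for real AES output, and the variable names are made up for illustration.

```python
import base64
import os

# Stand-in for AES output: ciphertext is arbitrary bytes, not valid UTF-8 text.
ciphertext = os.urandom(32)

# Base64 maps any byte sequence to plain ASCII, so it fits in a text column.
stored_value = base64.b64encode(ciphertext).decode("ascii")

# Decoding restores the exact original bytes, ready to be decrypted.
restored = base64.b64decode(stored_value)
```

The round trip is lossless, so the ciphertext itself is not changed; only its stored representation is, at the cost of roughly a third more storage.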
There's a Django package named django-encrypted-fields that uses Keyczar and provides a new model field, an "Encrypted Field", which can be stored and retrieved like a normal field, but behind the scenes it encrypts and decrypts the value whenever needed. The drawback? It is only a good option for fields that you won't use to filter data. For example, if you have a Django model "Student" containing student data and you encrypt the Name field, you won't be able to search for a student by name in the database. In other words, queries like Student.objects.filter(Name="some name") won't work. For a small number of records, you could just fetch all the records, check manually whether the Name field is equal to "some name" and show the matches, but as the database grows, this becomes too slow to be practical. So, we had to look for another way.
One way around this was hashing. Add a new database field that stores the hash of the plaintext value of the encrypted field. Then, if you want to search by that field, hash the input and compare it with the stored hashes.
Example:

    from Crypto.Hash import SHA256
    from someApp.models import Student

    name = request.POST.get('Name')  # Read the name from the user's input
    nameHash = SHA256.new(name.encode('utf-8')).hexdigest()  # Hash the input (SHA256.new expects bytes on Python 3)
    students = Student.objects.filter(NameHashedField=nameHash)  # Exact match on the stored hash
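One caveat with a plain SHA-256 of something as short as a name: anyone who gets hold of the hash column can hash a list of likely names and compare. A keyed hash (HMAC) makes that harder as long as the key stays secret. Here is a minimal sketch using only the standard library. The names INDEX_KEY and name_index are hypothetical, and lowercasing the input before hashing is my own assumption so that "Nikhil" and "nikhil" match.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from settings or the environment.
INDEX_KEY = b"replace-with-a-long-random-secret"


def name_index(name: str) -> str:
    """Return a keyed SHA-256 digest of the normalised name for exact-match lookups."""
    normalised = name.strip().lower()
    return hmac.new(INDEX_KEY, normalised.encode("utf-8"), hashlib.sha256).hexdigest()
```

The idea would be to store name_index(name) in NameHashedField when saving a student, and to filter on name_index(user_input) when searching.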
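This approach works well for exact matches. But the problem is: what if the user wants to search by partial words or names? For example, you want a search for "nik" to return "nikhil", "nikita", etc. The hash of "nik" has nothing in common with the hash of "nikhil", so comparing hashes can't find these. Unfortunately, I couldn't find a way to accomplish this.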
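To see why partial search breaks, here is a small standalone sketch. A plain dictionary stands in for the hashed database column, and the helper name sha256_hex is made up for this example.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical stored rows: only the hash of each name is searchable.
hashed_index = {sha256_hex(n): n for n in ["nikhil", "nikita"]}

exact_hit = hashed_index.get(sha256_hex("nikhil"))  # full name: lookup succeeds
partial_hit = hashed_index.get(sha256_hex("nik"))   # partial name: no stored hash matches
```

Here exact_hit is "nikhil" while partial_hit is None, because a hash lookup can only ever answer "is this exact value present?".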
So finally, we had to abandon the idea of encrypting the fields for now. We may find some other, better (and correct) way in the future, but there are some things we all need to consider before reaching for encryption. Remember to look for other loopholes in the web app or the server, so that the database is not compromised in the first place.