Problem
I have a table A with 50 columns and as the specification I have to check if combination of 25 columns in that is unique. It's a table records the manufacturing patterns and the 25 columns that need to check is the steps within each pattern. So they don't want to have duplicate pattern.
But the problem is MySQL only allow me to index for maximize 16 columns. If put the logic in app layer and write some code to check then I need to query all the records I have from databases and loop through them (the time complexity will be O(n^2) ), so this is not a good solution.
But the problem is MySQL only allow me to index for maximize 16 columns. If put the logic in app layer and write some code to check then I need to query all the records I have from databases and loop through them (the time complexity will be O(n^2) ), so this is not a good solution.
Solution
I create another column to save unique hash which generate from object of key and value of 25 columns that need to be checked, sort it by keys and then stringify it and finally hash that string. I have to hash it because if I leave it as a string then store it the length of string may exceed 255 characters, and MySQL can not index the column that exceed 255 characters either. The hash algorithm I choose is MD5, this choice is base on the number of records we estimate that will populate our database. Since our customer is a big manufacturing but the record itself is the manufacture pattern we make a assume it can not exceed 5000 records and MD5 is a great choice since the duplication chance is very small. To have a 50% chance of any hash colliding with any other hash you need 2^64 hashes. if we only have 5000 hashes and the algorithm is simple enough to ensure it doesn't take much time to generate and take minimum space in database to store (32 characters).
If in your cases the number of records may exceed 2^64 or around that, you should choose other hash like hash265.
After that I index the hash column in database and also add validate in Rails model.
If in your cases the number of records may exceed 2^64 or around that, you should choose other hash like hash265.
After that I index the hash column in database and also add validate in Rails model.
Note
- To make sure this work, we need to ensure the input has been sorted (by keys name). Because for the same record if we put the different order then the output string after stringify will be changed and lead to different hash when we hash. The unique check soon become pointless because of that.
- The second notice is if you hash complicated input params, we always have to stringify the object then sort inside out. From the most nested value to the outside value. And also make sure the format is the same for every object, don't trust the FE side (even when you implemented FE as well, because the data will be converted to string to pass through the internet, and then when it hit Rails server, Rails will parse those string once again to hash and integer, ...). One small different can lead to 2 completely different hash.