Metadata Tribbles
August 30, 2017 ยท View on GitHub
Breaking down a table or column by year/user/etc.:
You might be trying to split a single column into multiple columns,
using column names based on distinct values in another attribute.
For each year or user, you will need to add one more column or table.
You are mixing metadata with data. You will now need to make sure that
the primary key values are unique across all the split columns or tables.
The solution is to use a feature called sharding or horizontal partitioning.
(PARTITION BY HASH ( YEAR(...) ). With this feature, you can gain the
benefits of splitting a large table without the drawbacks.
Partitioning is not defined in the SQL standard, so each brand of database
implements it in their own nonstandard way.
Another remedy for metadata tribbles is to create a dependent table.
Instead of one row per entity with multiple columns for each year,
use multiple rows. Don't let data spawn metadata.
Store each value with the same meaning in a single column:
Creating multiple columns in a table with the same prefix indicates that you are trying to store a multivalued attribute. This design makes it hard to add or remove values, to ensure the uniqueness of values, and handling growing sets of values. The best solution is to create a dependent table with one column for the multivalued attribute. Store the multiple values in multiple rows instead of multiple columns and define a foreign key in the dependent table to associate the values to its parent row.