This is more of an experimental type of approach to performance in regards to storing and retrieving data. The results are absolutely astounding, and the process is definitely worth considering if your project meets the criterium.
I was discussing data storage with Jared Atchison last week, and at some point in the conversation, this idea that I will present to you today came up. I think you will enjoy the read, and I hope it will make you think outside of the box for ways to improve performance in WordPress.
For the majority of WordPress plugin and theme authors, we store our data in two primary ways: options and post meta.
Options are great because the API built around them is super simple to understand and use. A quick
get_option call to your designated key and viola, your data has arrived. Post meta works in much the same manner – an easy
get_post_meta reference to your post ID and key and you’ve got your data primed and ready to be manipulated.
Today, I’m going to focus on the post meta side of things. The post meta table is generalized dumping ground for all sorts of different types of data: simple key/value pairs and basic indexed arrays to complex multidimensional arrays and objects. The structure of the table is great because it allows for a wide array of data types to be stored and retrieved with relative ease. The metadata API handles all of the queries behind the scenes and presents you with your data, just as you would expect.
However, there is one drawback to all of this: anything other than a simple key/value pair must be serialized on insertion and unserialized on retrieval.
This makes sense, as databases don’t know how to store objects or arrays of data with meta inside of them, so it has to serialize the data in a format that can then be parsed back into the appropriate structure when retrieved. This is all well and good, and ultimately necessary if we are using objects, but I would assert that 99% of the time you will be storing an array of data. Here’s where you might be able to really find some performance wins with how you store and retrieve your data.
post_content – have you harnessed it?
Again, keep in mind that this technique is experimental and only applicable to certain use cases, but I think the idea behind it is still incredibly powerful. Many custom post types that are used today do not actively use the
post_content field. In fact, many of the main post object fields are not populated in favor of storing the data as metadata. But if you aren’t using the
post_content field in your custom post type, you might be missing out on some really sweet performance benefits by using it as your data storage mechanism. How, you say? JSON, says I.
post_content field is meant to hold a wide variety of content, from simple 300 word blog posts to 10,000 word essays with videos, pictures and the whole shebang. If, then, this field can handle all of this data and display it without a problem, why not use it as a data dumping ground for lightweight JSON data storage?
Data in the form of an array does not need to be serialized because there is no extra meta involved to describe a particular key or value. If there is any meta involved, it is simply stored as another key/value pair, which makes it a perfect candidate for JSON. There are a few key benefits from using JSON instead of serializing data, namely:
- JSON data is more secured against corruption. Because of the myriads of plugins that have different ways to handle and retrieve data from the database, it is much easier to corrupt a serialized string that relies on an explicit character length for unpacking.
- JSON data is more portable. Serialized strings have to be treated with care when migrating sites because character length is important to the string that is created. JSON doesn’t care.
- JSON data takes up less space in the database because its structure is much less verbose than serialization.
JSON overall is the better choice because it is more portable, more resilient, more efficient, and as we will see in the case of
post_content use, waaaaayy more performant vs. the alternative of post meta.
Let’s compare the retrieval methods. For my test, I used my local install and a simple dataset array that looked like this:
$dataset = array( 'multi' => array( 'key_one' => array( 'key' => 'value', 'key2' => 'value2' ), 'key_two' => array( 'key' => 'value', 'key2' => 'value2' ) ), 'base' => 'key', 'one_level' => array( 'k1' => 'v1', 'k2' => 'v2' ) );
With that dataset handy, I created three different pages. On the first page, I saved the dataset as JSON via
json_encode into the
post_content field. On the second page, I saved the dataset using
update_post_meta with the field named “test_post_meta”. On the third page, I did a simple query to grab the data from both and display it in the exact format in which is was originally stored: a multidimensional array.
Here’s the code inside of my custom template to test the results:
// Test JSON first. $timestart_uno = microtime( true ); $data_one = json_decode( get_post_field( 'post_content', 51836 ), true ); echo '<pre>' . print_r( number_format( microtime( true ) - $timestart_uno, 7 ), true ) . '</pre>'; // Test postmeta first. $timestart_dos = microtime( true ); $data_two = get_post_meta( 51838, 'test_post_meta', true ); echo '<pre>' . print_r( number_format( microtime( true ) - $timestart_dos, 7 ), true ) . '</pre>';
I ran this test 20 times. The averages for the JSON data vs. the serialized post meta data are found below.
Average for JSON: 0.0000329
Average for Post Meta: 0.0001115
Performance Percentage Difference: 238.9%
Astounding! On average, the JSON data retrieval method was more than 239% faster at retrieving the data than using the post meta API.
I mean… !!!!!!! That number is simply way to large to ignore.
Thoughts on the Stats
Now, I am making a few assumptions about this method (which is still experimental). First, I am making the assumption that you will be storing all of your data into one JSON string. It would appear that even if you had a very large JSON string of data, it would still be very tough for even a simple key/value pair retrieval to beat that kind of speed. The
post_content field gets the unseen benefit of hot caching whereas post meta does not, which would also help point to why this method is so much faster than its alternative.
In terms of usability, I’m not sure. I know that a lot of people expect the content from the
post_content field to be a lot different than JSON string, but if you are controlling both the input and output, this really could be a viable method of data storage and retrieval for your custom post type. It is very fast and efficient, and you get the added benefit of choosing how you want your data to be decoded (either to an object or back to an array). The performance benefits, though, are too large to simply ignore. While
post_content may not be the perfect answer, it’s at least the start of an interesting discussion to how to improve performance in regards to data storage and collection.
I’m always looking for ways to make WordPress perform better – sometimes it takes looking outside of the box to find some interesting and very plausible solutions! What about you – any thoughts? Drawbacks? Praises? I’m keen to them all!