Color Theory for Data Visualization

The first time someone showed me a chart that was genuinely unreadable for colorblind users, I was stunned by my own blindness to the problem. The chart encoded two series in red and green, and to my eyes they looked quite different. To the roughly 8% of men with red-green color vision deficiency, those two series were nearly indistinguishable. That realization changed how I approach color in every visualization I create.

Color in data visualization serves two distinct functions: encoding data and highlighting elements. These functions require different approaches. Encoding colors must be perceptually uniform, so that equal steps in the data look like equal steps in color, and, crucially, they must work for viewers with various types of color vision deficiency. Highlighting colors can be more expressive because they serve attention rather than comparison.

The most reliable approach to data encoding is to use palettes that have already been designed and tested for perception. The ColorBrewer system, developed by Cynthia Brewer for cartography, provides tested palettes and flags which of them remain distinguishable for colorblind viewers. These palettes come in sequential, diverging, and categorical variants (ColorBrewer calls the last group "qualitative"), each suited to a different data type. Using pre-tested palettes removes the guesswork from color selection.

Sequential palettes show magnitude from low to high using a single hue, or a few related hues, with varying lightness. They work for data with an inherent ordering, such as temperature, income, or population density. The viewer's eye naturally reads the ramp from light to dark, or vice versa, building an intuitive sense of magnitude without constant reference to the legend.

Diverging palettes emphasize extremes relative to a meaningful midpoint. They suit data where deviation above and below a reference point matters equally: temperature anomalies from a baseline, profit versus loss, approval ratings above and below 50%.
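
To make the midpoint idea concrete, here is a minimal sketch of how a value might be mapped onto a diverging palette so that the reference point always lands on the neutral middle color. The function names, the `half_range` parameter, and the sample approval figures are all hypothetical, chosen only for illustration; plotting libraries usually ship their own normalization helpers that do something similar.

```python
def symmetric_half_range(values, center):
    """Largest distance of any value from the center (hypothetical helper)."""
    return max(abs(v - center) for v in values)


def diverging_position(value, center, half_range):
    """Map a value to a palette position in [0, 1].

    The center maps to 0.5 (the neutral middle color), and values that are
    equally far above and below the center land equally far from 0.5.
    """
    if half_range <= 0:
        raise ValueError("half_range must be positive")
    # Clamp so that outliers saturate at the palette ends.
    offset = max(-half_range, min(half_range, value - center))
    return 0.5 + offset / (2 * half_range)


# Illustrative data only: approval ratings around a 50% midpoint.
approval = [38.0, 47.5, 50.0, 56.0, 71.0]
half = symmetric_half_range(approval, center=50.0)
positions = [diverging_position(v, 50.0, half) for v in approval]
```

Using one shared `half_range` for both sides, rather than scaling each side to its own extreme, is what keeps a deviation of 12 points below the midpoint looking exactly as strong as a deviation of 12 points above it.
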
The critical design decision in diverging palettes is choosing a meaningful center value and ensuring both extremes receive equal visual weight.

Categorical palettes assume no inherent ordering: each color represents a different category with no implied rank. These palettes require colors that are maximally distinguishable from each other. Beyond about seven categories that becomes nearly impossible, and even the best categorical palettes strain. When you need more categories, consider grouping them, adding a secondary encoding (shapes, patterns), or breaking the data into multiple charts.

The most common color mistake in data visualization is using a rainbow colormap for continuous data. A rainbow colormap, which maps data values through the full visible spectrum, creates artificial perceptual boundaries where none exist in the data, gives no intuitive cue as to whether yellow is higher or lower than cyan, and performs terribly for colorblind viewers. Use sequential palettes for continuous data instead; they are more accurate and more accessible.

Color for highlighting serves different goals than color for encoding. Highlighting can draw attention to specific data points, emphasize key insights, or mark the "exception that proves the rule." These uses benefit from colors that stand apart from your data encoding palette, such as a single saturated accent against desaturated data colors. The goal is momentary attention, not systematic comparison.

Grayscale deserves consideration as a final tool in the data visualization palette. Converting a chart to grayscale and asking whether it still communicates its message tests whether color is adding information or merely decoration. Charts that survive grayscale conversion with their message intact are using color well. Those whose message collapses without color are using color as a crutch rather than a tool.
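
The grayscale test can also be approximated numerically before a chart is ever drawn. The sketch below estimates the lightness of each palette color using the standard sRGB luminance weights and the CIE L* formula, then lists pairs of colors that would end up as nearly the same gray. The function names and the `min_gap` threshold of 10 lightness units are assumptions made for illustration, not an established cutoff.

```python
from itertools import combinations


def _linearize(channel):
    """Convert one sRGB channel in [0, 1] to linear light."""
    if channel <= 0.04045:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4


def lightness(hex_color):
    """Approximate CIE L* (0 = black, 100 = white) of an sRGB hex color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))
    # Relative luminance from linear RGB (standard sRGB weights).
    y = 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)
    if y > 0.008856:
        return 116 * y ** (1 / 3) - 16
    return 903.3 * y


def grayscale_collisions(palette, min_gap=10.0):
    """Return color pairs whose lightness differs by less than min_gap.

    min_gap is an illustrative threshold, not a standard value.
    """
    return [
        (a, b)
        for a, b in combinations(palette, 2)
        if abs(lightness(a) - lightness(b)) < min_gap
    ]


# Example hex values only; substitute the palette you actually use.
collisions = grayscale_collisions(["#d62728", "#2ca02c", "#1f77b4"])
```

An empty result suggests the palette keeps its distinctions when printed or viewed without color. Any pairs that are returned are candidates for a lightness adjustment or a secondary encoding such as shape or pattern.
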