How To Correctly Shift The Baseline In An Area Plot To A Particular Y Location And Change The Fill Color Correspondingly, In Altair?
Solution 1:
You can define your baseline at 1 by providing a second y-encoding via the y2
parameter. Using this approach with bar charts is relatively straightforward:
import pandas as pd
import altair as alt

data = pd.DataFrame(
    {'date': pd.date_range(start='1/1/2018', end='1/11/2018'),
     'stock': [0.1, 0.3, 0.9, 1, 1.5, 1.2, 0.8, 1.1, 0.4, 0.8, 1.6],
     'baseline': [1]*11})

# You could also set the bar width instead of binning
alt.Chart(data).mark_bar().encode(
    x=alt.X('monthdate(date):T'),
    y='stock:Q',
    y2='baseline',
    color=alt.condition(alt.datum.stock < 1, alt.value('grey'), alt.value('red')))
This works well because the bars are individual graphical elements, so they are colored individually. The area chart is a single graphical element, so the conditional comparison is only performed against the first stock value and the entire area is then filled with that color. To get different colors, we need to break the area into multiple marks by grouping it, as in the answer you linked (this would also work with the bars). You can do this either by creating a grouping column in the dataframe beforehand or via transform_calculate:
(alt.Chart(data.reset_index()).mark_area().encode(
    x=alt.X('date:T'),
    y=alt.Y('stock:Q', impute={'value': 1}),
    y2='baseline',
    color=alt.Color('negative:N', scale=alt.Scale(range=['red', 'grey'])))
 .transform_calculate(negative='datum.stock < 1'))
Why is there overlap between the areas? The data is sparse, and the default interpolation method for area and line marks is "linear". If you change it to mark_area(interpolate='step'), the borders between the areas become sharp.
To achieve sharp transitions of the area mark around the baseline while keeping its shape, the data needs to be of higher resolution. Borrowing from the answer you linked, you can see that the areas there also overlap when the data is sparse:
import altair as alt
import pandas as pd
import numpy as np

x = np.linspace(2, 4, 4)
df = pd.DataFrame({'x': x, 'y': np.sin(x)})

(alt.Chart(df).mark_area().encode(
    x='x',
    y=alt.Y('y', impute={'value': 0}),
    color='negative:N')
 .transform_calculate(negative='datum.y < 0'))
If we increase the number of points tenfold (x = np.linspace(2, 4, 40)), the transition becomes sharper because the interpolation happens between points that are closer together. Changing the interpolation from linear to monotone might also help a little while preserving the shape.
To increase the resolution of timeseries data, you can upsample using the pandas resample and interpolate methods. The worry with an operation like this is that it might artificially change your data in a meaningful way. I find it useful to ask whether the operation changes the conclusion you would draw from your data.
(alt.Chart(data.set_index('date').resample('1h').interpolate().reset_index())
 .mark_area().encode(
    x=alt.X('date:T'),
    y=alt.Y('stock:Q', impute={'value': 1}),
    y2='baseline',
    color=alt.Color('negative:N', scale=alt.Scale(range=['red', 'grey'])))
 .transform_calculate(negative='datum.stock < 1'))
Here, we upsampled to hourly data points and interpolated linearly between the original points. To me, this does not change the conclusions I draw from studying the plot: the linear interpolation preserves the blocky appearance of the areas, so we're not making our data look artificially smooth. The only drawback that comes to mind is that we send an unnecessary amount of data to Altair. You might be able to use the transforms in Altair to perform the interpolation instead, but I am not sure how off the top of my head.