Building an audio waveform “progress bar” in React for Quadio

One of the core visual components of the original Quadio design was the Soundcloud-inspired waveform accompanying each track in the feed and on the track page. Like on Soundcloud, the waveform served as a stylized playback progress bar that the user could interact with to seek through the song. I'd like to share some of the engineering team's process and learnings from building our waveform component with anyone looking to build something similar because, at the end of the day, it's a pretty cool way to convey track and playback information to users.

If you’re only interested in the “final” code, feel free to skip to the end of this post.

Before we could actually build anything, we needed to get a “waveform” of our audio file in a format the client could use.

We used the BBC's audiowaveform tool in an AWS Lambda function to consume an uploaded audio file and return JSON data representing the amplitude values (peaks and valleys of the waveform) of that file. That same Lambda function also modifies the waveform data by removing the valleys (values below 0, as sound waves oscillate above and below the "equilibrium" amplitude of 0) and normalizing the values of our waveform data roughly relative to a new maximum value of 100.

const { data } = await getWaveformFile(waveformFileName)
const peaks = data.filter(point => point >= 0)
const ratio = Math.max(...peaks) / 100
const normalized = peaks.map(point => Math.round(point / ratio))

As an example:

const { data } = await getWaveformFile(waveformFileName)
// data = [0, 0, 2, -2, 5, -7, 20, -21, 55, -56, 110, -100, 20]
const peaks = data.filter(point => point >= 0)
// peaks = [0, 0, 2, 5, 20, 55, 110, 20]
const ratio = Math.max(...peaks) / 100
// ratio = 1.1
const normalized = peaks.map(point => Math.round(point / ratio))
// normalized = [0, 0, 2, 5, 18, 50, 100, 18]

We modify the data this way for two reasons:

Firstly, though waveforms are not actually symmetrical, almost all of them are close enough that displaying them as such is both acceptable visually and efficient memory-wise. Rather than have the actual peaks and valleys of the waveform, we can simply reflect each peak of the waveform over the x-axis to achieve a pretty-damn-good representation of the waveform. That cuts out a full half of the amplitude data required for our waveform on the client-side.

Secondly, we didn't care what the actual amplitude values were. It's much better to know that the audio at timestamp x is roughly y% of the highest amplitude (which translates easily to y% of the height of our eventual waveform container) than to have the bit-depth-based values (-128 to +127 in 8-bit mode, -32,768 to +32,767 in 16-bit mode) returned by the BBC application.

So, after processing, the waveform data for a real track looks like this:

[0,0,0,0,0,30,34,36,37,35,37,63,73,83,67,86,76,74,83,90,79,85,90,83,83,75,57,70,82,58,73,73,85,73,76,75, ...]

This data is saved in the database under the uuid of the audio file generated on file upload. Now that we had our waveform data, it was time to start building a way to display it.

Our initial, proof-of-concept idea for displaying this waveform was roughly the following:
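A minimal sketch of that div-per-point approach (this is a reconstruction rather than the exact code, and the inline styles are my stand-ins for our real CSS):

```jsx
// One div per normalized amplitude point, each scaled to a percentage
// of the container height via flexbox
const Waveform = ({ waveformData }) => (
  <div style={{ display: 'flex', alignItems: 'center', height: 56 }}>
    {waveformData.map((amplitude, index) => (
      <div
        key={index}
        style={{ flex: 1, height: `${amplitude}%`, backgroundColor: 'blue' }}
      />
    ))}
  </div>
)
```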

Which creates this waveform:

Not too shabby! That definitely looks like a waveform, and, listening through the audio, it seems to match up with the actual dynamics of the song.

However, this is a lot of divs. 1,924 divs to be exact. Performance is going to draaaaaaaag when we start adding hover states and effects on one of these waveforms, never mind a whole list of them. Users definitely don’t need 2,000 potential seek points. Why don’t we simplify our waveform down to 100 points?
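A simple way to do that chunking, and roughly what our waveformAvgChunker (used throughout the rest of this post) did, is to average the raw data down into about 100 buckets. This sketch is an approximation, not the exact Quadio implementation:

```javascript
// Average the raw amplitude points down to roughly chunkCount values:
// each output point is the rounded mean of one slice of the input
const waveformAvgChunker = (waveformData, chunkCount = 100) => {
  const chunkSize = Math.ceil(waveformData.length / chunkCount)
  const chunked = []
  for (let i = 0; i < waveformData.length; i += chunkSize) {
    const slice = waveformData.slice(i, i + chunkSize)
    const avg = slice.reduce((sum, point) => sum + point, 0) / slice.length
    chunked.push(Math.round(avg))
  }
  return chunked
}
```

Because the chunk size is rounded up, inputs that don't divide evenly (like our 1,924 points) come out slightly under 100 chunks, which is close enough for display purposes.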

This gives us a waveform with a much lower resolution, but also a nearly 20x reduction in divs created:

By adding a wrapper div and applying a little more CSS, we can hover over each chunk of the waveform and see it light up:

At this point we were close to something usable. For seeking purposes, we could have just set an onClick handler on each of the pointWrappers to fire off a seek action with a seek percentage based on the index of the point clicked. As a progress bar, it wouldn't have been difficult to consume some kind of trackProgress state (which would return a percentage x) and apply a class to every pointWrapper at index x or lower to color it differently. But it turns out 100 divs per waveform is still too many on the performance front. Even 10 or 20 would be a lot with an unknown number of waveforms loaded on the page, and the resolution would be so bad as to make it useless to the user for seeking or watching their progress. Enter canvas.

The canvas element allowed us to create the waveform as a single tag, which made our lists much more lightweight. Using JavaScript to repaint the canvas as the user moves their mouse across it and as the track plays is also more performant than updating classes on tens or hundreds of divs. However, manipulating a canvas to create a waveform requires a lot more math and manual manipulation. Here's the basic version with canvas:

Looks good (if a little squished)

Let’s step through the changes.

const Waveform = ({ waveformData }) => {
  const canvasRef = useRef()
  const chunkedData = waveformAvgChunker(waveformData)
  const canvasHeight = 56
  useEffect(() => {
    if (canvasRef.current) {
      paintCanvas({
        canvasRef,
        waveformData: chunkedData,
        canvasHeight,
      })
    }
  }, [canvasRef])
  return (
    <div style={{ padding: 16 }}>
      <canvas
        className={classes.canvas}
        style={{ height: canvasHeight }}
        ref={canvasRef}
        height={canvasHeight}
        width={window.innerWidth - 32}
      />
    </div>
  )
}

First and foremost, the Waveform component now returns a canvas (duh). canvas elements need a manually set height and width, as these are used for scaling their plotted contents (kind of like the viewBox attribute on an SVG). The height of the canvas is saved as a const to be used later in the waveform sample bar plotting calculation. In this example, the width is set to the width of the window minus the padding, but in the Quadio app we had defined waveform sizes for each breakpoint, and we'll switch to that in later examples here as well.

We’re still using the waveformAvgChunker to split the wave data into usable chunks, but now we’re adding an effect that runs once the canvasRef has been initialized that paints the canvas.

const paintCanvas = ({
  canvasRef, waveformData, canvasHeight,
}) => {
  const ref = canvasRef.current
  const ctx = ref.getContext('2d')
  // On every canvas update, erase the canvas before painting
  // If you don't do this, you'll end up stacking waveforms and waveform
  // colors on top of each other
  ctx.clearRect(0, 0, ref.width, ref.height)
  waveformData.forEach(
    (p, i) => {
      ctx.beginPath()
      const coordinates = pointCoordinates({
        index: i,
        pointWidth: 2,
        pointMargin: 0,
        canvasHeight,
        amplitude: p,
      })
      ctx.rect(...coordinates)
      ctx.fillStyle = 'blue'
      ctx.fill()
    },
  )
}

As noted in the comment, we want to clear the waveform before any repaints (such as when, for example, we want to update the played progress on the waveform or when switching to a different version of the same page displaying a waveform). If we don’t do this, we end up with overlapping waveforms. For each point in the waveform data we use a function called pointCoordinates to build an appropriate coordinate array that will be passed to the canvas context and painted:

const pointCoordinates = ({
  index, pointWidth, pointMargin, canvasHeight, amplitude,
}) => {
  const pointHeight = Math.round((amplitude / 100) * canvasHeight)
  const verticalCenter = Math.round((canvasHeight - pointHeight) / 2)
  return [
    index * (pointWidth + pointMargin), // x starting point
    (canvasHeight - pointHeight) - verticalCenter, // y starting point
    pointWidth, // width
    pointHeight, // height
  ]
}

In this pointCoordinates function we take the amplitude, turn it into a percentage, and then multiply that percentage by the height of the canvas (this is why we saved it as a const earlier) to give us a height for the waveform sample bar. We also vertically center the bar (although you could align them to the top or bottom of the canvas for a different effect). Lastly, we return an array of the coordinates (as labeled with comments), which are then used to create a rectangle on the canvas representing the data point.
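As a quick sanity check of that math, here's the function applied to a mid-range amplitude on our 56px canvas (the function is repeated so the snippet stands alone):

```javascript
// Same pointCoordinates as above
const pointCoordinates = ({
  index, pointWidth, pointMargin, canvasHeight, amplitude,
}) => {
  const pointHeight = Math.round((amplitude / 100) * canvasHeight)
  const verticalCenter = Math.round((canvasHeight - pointHeight) / 2)
  return [
    index * (pointWidth + pointMargin),
    (canvasHeight - pointHeight) - verticalCenter,
    pointWidth,
    pointHeight,
  ]
}

// Amplitude 50 on a 56px-tall canvas: the bar is 28px tall and starts
// 14px down, leaving equal space above and below (vertically centered)
const coords = pointCoordinates({
  index: 3, pointWidth: 2, pointMargin: 0, canvasHeight: 56, amplitude: 50,
})
// coords = [6, 14, 2, 28]
```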

You may have noticed that we're manually setting and passing a pointWidth and pointMargin to pointCoordinates. In the waveform example using divs we were able to give the container display: flex, which automatically handled the width and spacing of our points. Unfortunately, canvas doesn't offer anything quite that simple. It wouldn't be that hard to build a function that checks your waveform width and assigns widths and spacing accordingly (not too dissimilar from the waveformAvgChunker), but at Quadio we went with pre-determined sizes. Since we had fixed widths for our waveforms, it was trivial to calculate ahead of time that, given x points and a waveform width of y, each point gets a width of z. Changing those numbers yields the following results:

// Changing pointWidth from 2 to 4
...
const coordinates = pointCoordinates({
  index: i,
  pointWidth: 4,
  pointMargin: 0,
  canvasHeight,
  amplitude: p,
})
...

Looking less squished!

// Adding a pointMargin of 2
...
const coordinates = pointCoordinates({
  index: i,
  pointWidth: 4,
  pointMargin: 2,
  canvasHeight,
  amplitude: p,
})
...

A different effect, but easier to parse

With canvas as our new waveform visualization solution, we needed to start adding some interactivity: the ability for the waveform to automatically show track progression as a track played, a color change on hover to show users where they would seek to on click, and an onClick that would actually update the playing position. Let’s start with the track progression.

Another round of substantial changes here, including a new paint job and a pretty hefty new effect to boot.

const waveformWidth = 500
const canvasHeight = 56
const pointWidth = 4
const pointMargin = 1
const { trackDuration } = waveformMeta
const [trackProgress, setTrackProgress] = useState(0)
const [startTime, setStartTime] = useState(Date.now())
const [trackPlaying, setTrackPlaying] = useState(true)
const playingPoint = (
  (trackProgress * waveformWidth / 100)
  / (pointWidth + pointMargin)
)

At the top of our Waveform component we’re defining a few more things. As mentioned above, Quadio used set waveform widths, and that makes a lot of the logic in this revision much easier, so we’re now using waveformWidth. pointWidth and pointMargin are defined higher up rather than within the paintCanvas function where they’re used because we’re also using them to determine the current playingPoint. In addition to the waveform point data we’re also now getting waveformMeta, which presumably you’d get from your API. The relevant bit is trackDuration (which is in milliseconds to play nice with some date logic we do in a bit).

There is also some new stateful logic: trackProgress, startTime, and trackPlaying. For the purposes of this example I've set these up as if our track has been set to play, but in a real project you'd want to control these with some kind of higher-order state management system.

playingPoint tells us (as an approximated index) which of the bars on our waveform is the one we’re currently “at” based on how far we are through the track.
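A quick worked example of that calculation, using the constants defined above:

```javascript
// With a 500px waveform and 5px per bar (4px width + 1px margin),
// being 50% of the way through the track puts us at bar index 50 of 100
const waveformWidth = 500
const pointWidth = 4
const pointMargin = 1
const trackProgress = 50

const playingPoint = (
  (trackProgress * waveformWidth / 100)
  / (pointWidth + pointMargin)
)
// playingPoint = 50
```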

const paintWaveform = useCallback(() => {
  paintCanvas({
    canvasRef,
    waveformData: chunkedData,
    canvasHeight,
    pointWidth,
    pointMargin,
    playingPoint,
  })
}, [playingPoint])

Because we're painting in multiple places now, it's a good idea to wrap the fairly beefy paintCanvas call in a paintWaveform callback and memoize it with useCallback.

useSetTrackProgress({
  trackProgress, setTrackProgress, trackDuration, startTime,
  trackPlaying,
})

This effect is our most substantial change. useSetTrackProgress controls the visual updates to our waveform as a track progresses, and it does it in “real time”. Let’s jump to the useSetTrackProgress file.

...
useEffect(() => {
  let animation
  if (trackPlaying) {
    animation = window.requestAnimationFrame(() => {
      const trackProgressPerc = (Date.now() - startTime) * 100 / trackDuration
      setTrackProgress(
        clamp(
          trackProgressPerc,
          0, 100,
        ),
      )
    })
  }
  return () => {
    window.cancelAnimationFrame(animation)
  }
}, [
  trackPlaying,
  trackDuration,
  startTime,
  trackProgress,
])
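One small note: clamp isn't defined in these snippets. It behaves like lodash's clamp, and a minimal stand-in looks like this:

```javascript
// Constrain a value to the inclusive [min, max] range, matching the
// clamp(value, min, max) call in the effect above
const clamp = (value, min, max) => Math.min(Math.max(value, min), max)
```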

First, the effect checks to see if the track is playing. If it is, great, we can continue, but if not, there’s no need to run the rest of the effect. In a real app, it’s likely that you’ll have multiple tracks that could be playing, so it’s probably smart to add a check that not only is a track playing, but this track is the one playing.

Assuming the track is playing, we request an animation frame, which works double duty: it syncs our updates with the user's display refresh rate for smoother animations, and it acts as a throttle, preventing updates from running more frequently than they could be displayed. Theoretically, we could trigger an update every ms (aka the precision of Date.now()), but since a user's display only updates at something like 60Hz (roughly every 16ms), there's no point in firing an update to our progress between refreshes. Let's pause for a moment to talk about requestAnimationFrame, effects, and how our animations are actually being applied.

Those of you who are well versed in animations and/or React might notice that we aren't directly updating our waveform every time this animation frame runs; we're actually triggering a state update, which in turn triggers a re-render, and that is what actually updates our waveform.

This means that we aren't perfectly synced with the user's refresh rate. If a user has a 60Hz display, our signal to update the state triggers around 16ms after the initial render. We then wait some x ms for the re-render to happen, at which point our effect runs again because trackProgress has changed, and we schedule another update for the next refresh. It's entirely possible that the x ms wait between state change and the next effect call is greater than a frame, depending on what else is happening in your app. Normally this would cause animation jank, but in this case we have some leeway, as our waveform likely doesn't need to update every frame. In fact, if you take a 3-minute song and split it into 100 one-percent chunks, you only need to update your waveform every 1.8 seconds, or once every 108 frames. This suggests a potential performance optimization: you only really need to trigger a trackProgress state change if the user has advanced far enough through the song to be on the next waveform chunk. This could be accomplished by returning a rounded trackProgress value, or by memoizing your waveform to only update when a whole percentage point of trackProgress has changed, with the added benefit of preventing unnecessary re-renders.
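The rounding variant could look something like the sketch below (quantizedProgress is a hypothetical helper name, not from our codebase). Because React bails out of re-rendering when state is set to an identical value, rounding before calling setTrackProgress means at most one re-render per percentage point:

```javascript
// Quantize progress to whole percentage points so setTrackProgress
// receives an unchanged value (and React skips the re-render) until
// playback crosses into the next waveform chunk
const quantizedProgress = (startTime, now, trackDuration) => {
  const perc = (now - startTime) * 100 / trackDuration
  return Math.round(Math.min(Math.max(perc, 0), 100))
}
```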

Take note of the fact that our requestAnimationFrame calls are triggered by the effect and are not called recursively, as is often the case with requestAnimationFrame. While it is possible to use refs to access the ever-changing state of our waveform inside a recursive rAF, we found in our testing that this was needlessly complicated. Because our rAF can only be run once per re-render, we ended up with more consistent, predictable updates versus trying to track and apply updates being made in an untethered recursive animation frame loop. Now, let’s get back to the effect code itself.

Inside our rAF we calculate the percentage of the track that has been listened to by comparing our startTime timestamp to the current timestamp, then comparing that difference to the length of the track. That percentage is clamped between 0 and 100 because it is arbitrarily constructed from timestamps. For example, for however long it takes your app to change from one song to the next after a track finishes, you might see values greater than 100% as the effect continues to fire. You could cancel your animation frame if trackProgressPerc exceeds 100, but that's a fairly minor optimization in the scheme of things. After calculation and clamping, the trackProgress state is updated with the new trackProgressPerc, triggering our re-render, updating playingPoint, and therefore repainting our waveform:

useEffect(() => {
  paintWaveform()
}, [playingPoint])

The only other thing of note here is startTime in the useSetTrackProgress dependency array. It's important to include because, in order to seek the track, we're going to modify startTime to reflect a start time that gives us the appropriate percentage of the song played.

The second and third pieces of interactivity I mentioned above (hover visualization and click-to-seek, respectively) go hand in hand. In practice, clicking the waveform would trigger a seek action somewhere higher up in your app, which would then update the waveform through trickle-down state. In this example I'm simply going to link waveform clicking to an update of the startTime state, which gives us the visual effect we want.

The hover portion is fairly straightforward. After defining a new piece of local state, hoverXCoord, we add the following functions:

const setDefaultX = useCallback(() => {
  setHoverXCoord()
}, [])
const handleMouseMove = useCallback((e) => {
  setHoverXCoord(
    e.clientX - canvasRef.current.getBoundingClientRect().left,
  )
}, [])

setDefaultX is self-explanatory (calling setHoverXCoord with no argument resets the hover state), and handleMouseMove takes your current cursor position minus everything to the left of the canvas, leaving you with your cursor's position relative to the canvas itself rather than the screen.

<canvas
  ...
  onMouseOut={setDefaultX}
  onMouseMove={handleMouseMove}
/>

We use these two functions as listeners on our canvas element.

const paintWaveform = useCallback(() => {
  paintCanvas({
    canvasRef,
    waveformData: chunkedData,
    canvasHeight,
    pointWidth,
    pointMargin,
    playingPoint,
    hoverXCoord,
  })
  // hoverXCoord joins playingPoint in the dependency array so the
  // memoized callback repaints with the latest hover position
}, [playingPoint, hoverXCoord])

hoverXCoord gets passed as a new part of the argument object to paintCanvas so that we can change the color of the waveform based on a user’s hover position.

const paintCanvas = ({
  canvasRef, waveformData, canvasHeight, pointWidth, pointMargin,
  playingPoint, hoverXCoord,
}) => {
  ...
  waveformData.forEach(
    (p, i) => {
      ctx.beginPath()
      const coordinates = pointCoordinates({
        index: i,
        pointWidth,
        pointMargin,
        canvasHeight,
        amplitude: p,
      })
      ctx.rect(...coordinates)
      const withinHover = hoverXCoord >= coordinates[0]
      const alreadyPlayed = i < playingPoint
      if (withinHover) {
        ctx.fillStyle = alreadyPlayed ? '#94b398' : '#badebf'
      } else if (alreadyPlayed) {
        ctx.fillStyle = '#228741'
      } else {
        ctx.fillStyle = '#88bf99'
      }
      ctx.fill()
    },
  )
}

The logic for coloring in the waveform doesn't actually change that much; there are just a few more cases to consider: within the hovered area but not already played, and within the hovered area and already played. Put it all together and…

Adding track seeking only requires the addition of another event handler, seekTrack:

const seekTrack = (e) => {
  const xCoord = e.clientX - canvasRef.current.getBoundingClientRect().left
  const seekPerc = xCoord * 100 / waveformWidth
  const seekMs = trackDuration * seekPerc / 100
  setStartTime(Date.now() - seekMs)
}

seekTrack works similarly to the handleMouseMove hover function. It grabs the x position of your cursor on click relative to the canvas, converts that into a percentage of the length of the canvas itself, and then finds that percentage of the track’s duration. It then sets the startTime to a new timestamp representing a start time that is long enough ago that the track would have played to the desired seek point by now. Remember how useSetTrackProgress calculates trackProgress:

const trackProgressPerc = ((Date.now() - startTime)) * 100 / trackDuration
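Plugging concrete numbers into that relationship, say a 3-minute track and a click halfway across our 500px waveform:

```javascript
// A 3-minute (180,000ms) track, clicked at x = 250 on a 500px waveform
const trackDuration = 180000
const waveformWidth = 500
const xCoord = 250

const seekPerc = xCoord * 100 / waveformWidth // 50
const seekMs = trackDuration * seekPerc / 100 // 90000
const startTime = Date.now() - seekMs

// The progress formula now immediately reads ~50% played
const trackProgressPerc = (Date.now() - startTime) * 100 / trackDuration
```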

In a real app, you might want to name or retain some of these variables differently. startTime might actually matter to you for analytics purposes (to track when listeners are doing the most listening, for example), so our start-timestamp trickery could be stored in something like seekTime or desiredStart, anything that better gets across its function. At Quadio, this timestamp manipulation was done in reducers and thunks that did a lot more with audio playback and track data collection, so we weren't ever directly manipulating it from the component like this, though the principle is the same.

With all of these pieces in place, we have a functional, performant waveform progress bar. As mentioned a few times, the logic in a real app will doubtlessly be more complicated as you handle things like audio playback, analytics, and general application state, but I hope this post serves as a good starting place.

The final code for the waveform component can be found below, but if you've skipped down here without reading the post body, do note that I suggested a few performance optimizations throughout; if you run into performance issues, read back through to see what I suggested.
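Assembled from the snippets above, a reconstruction of the full component looks like this (waveformAvgChunker is my own sketch of the chunking logic, waveformMeta is assumed to come from your API, and the state wiring is simplified; treat it as a starting point rather than the exact Quadio source):

```jsx
import React, { useCallback, useEffect, useRef, useState } from 'react'
const waveformWidth = 500
const canvasHeight = 56
const pointWidth = 4
const pointMargin = 1
const clamp = (value, min, max) => Math.min(Math.max(value, min), max)
// Sketch of the chunker: average the raw points down to roughly 100 bars
const waveformAvgChunker = (data, chunkCount = 100) => {
  const chunkSize = Math.ceil(data.length / chunkCount)
  const chunked = []
  for (let i = 0; i < data.length; i += chunkSize) {
    const slice = data.slice(i, i + chunkSize)
    chunked.push(Math.round(slice.reduce((a, b) => a + b, 0) / slice.length))
  }
  return chunked
}

const pointCoordinates = ({ index, amplitude }) => {
  const pointHeight = Math.round((amplitude / 100) * canvasHeight)
  const verticalCenter = Math.round((canvasHeight - pointHeight) / 2)
  return [
    index * (pointWidth + pointMargin),
    (canvasHeight - pointHeight) - verticalCenter,
    pointWidth,
    pointHeight,
  ]
}

const paintCanvas = ({ canvasRef, waveformData, playingPoint, hoverXCoord }) => {
  const ctx = canvasRef.current.getContext('2d')
  ctx.clearRect(0, 0, canvasRef.current.width, canvasRef.current.height)
  waveformData.forEach((amplitude, i) => {
    ctx.beginPath()
    const coordinates = pointCoordinates({ index: i, amplitude })
    ctx.rect(...coordinates)
    const withinHover = hoverXCoord >= coordinates[0]
    const alreadyPlayed = i < playingPoint
    if (withinHover) ctx.fillStyle = alreadyPlayed ? '#94b398' : '#badebf'
    else if (alreadyPlayed) ctx.fillStyle = '#228741'
    else ctx.fillStyle = '#88bf99'
    ctx.fill()
  })
}

const useSetTrackProgress = ({
  setTrackProgress, trackDuration, startTime, trackPlaying, trackProgress,
}) => {
  useEffect(() => {
    let animation
    if (trackPlaying) {
      animation = window.requestAnimationFrame(() => {
        const perc = (Date.now() - startTime) * 100 / trackDuration
        setTrackProgress(clamp(perc, 0, 100))
      })
    }
    return () => window.cancelAnimationFrame(animation)
  }, [trackPlaying, trackDuration, startTime, trackProgress])
}

const Waveform = ({ waveformData, waveformMeta }) => {
  const canvasRef = useRef()
  const chunkedData = waveformAvgChunker(waveformData)
  const { trackDuration } = waveformMeta
  const [trackProgress, setTrackProgress] = useState(0)
  const [startTime, setStartTime] = useState(Date.now())
  const [trackPlaying] = useState(true)
  const [hoverXCoord, setHoverXCoord] = useState()
  const playingPoint = (
    (trackProgress * waveformWidth / 100) / (pointWidth + pointMargin)
  )
  const paintWaveform = useCallback(() => {
    paintCanvas({ canvasRef, waveformData: chunkedData, playingPoint, hoverXCoord })
  }, [playingPoint, hoverXCoord])
  useSetTrackProgress({
    setTrackProgress, trackDuration, startTime, trackPlaying, trackProgress,
  })
  useEffect(() => { paintWaveform() }, [paintWaveform])
  const setDefaultX = useCallback(() => setHoverXCoord(undefined), [])
  const handleMouseMove = useCallback((e) => {
    setHoverXCoord(e.clientX - canvasRef.current.getBoundingClientRect().left)
  }, [])
  const seekTrack = (e) => {
    const xCoord = e.clientX - canvasRef.current.getBoundingClientRect().left
    setStartTime(Date.now() - (trackDuration * xCoord / waveformWidth))
  }
  return (
    <div style={{ padding: 16 }}>
      <canvas
        style={{ height: canvasHeight }}
        ref={canvasRef}
        height={canvasHeight}
        width={waveformWidth}
        onMouseOut={setDefaultX}
        onMouseMove={handleMouseMove}
        onClick={seekTrack}
      />
    </div>
  )
}
export default Waveform
```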

Developer, Songwriter, Bugbear Monk
