Consider a situation where an audio asset is updated and now has a different length, or a localized voiceover has a different length in each language, and there's a timeline where, for example, a character says something and then makes a gesture. It should be possible to position the gesture clip relative to the beginning or the end of the audio clip (plus an offset, e.g. "1 second before the end"), so that the timing stays correct when the audio length changes. And if somebody moves the audio clip in the editor (or programmatically?), the gesture clip should move with it. Basically, a transform hierarchy for clips.
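
To make the idea concrete, here is a rough sketch of what such a clip hierarchy could look like. It is not tied to any existing timeline API: the `Clip` class, the `Anchor` enum, and all field names are made up for illustration. The only assumption is that a clip's start time is computed from its parent's start or end plus an offset, rather than stored as an absolute time.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Anchor(Enum):
    """Which edge of the parent clip the offset is measured from (hypothetical)."""
    START = "start"
    END = "end"


@dataclass
class Clip:
    """Hypothetical timeline clip whose position can depend on a parent clip."""
    name: str
    duration: float                      # seconds
    offset: float = 0.0                  # seconds; absolute time if there is no parent
    parent: Optional["Clip"] = None
    anchor: Anchor = Anchor.START

    @property
    def start(self) -> float:
        # A root clip is placed directly on the timeline.
        if self.parent is None:
            return self.offset
        # A child clip is placed relative to the chosen edge of its parent.
        base = self.parent.start if self.anchor is Anchor.START else self.parent.end
        return base + self.offset

    @property
    def end(self) -> float:
        return self.start + self.duration


# Voiceover starts at 2 s and lasts 4 s; the gesture starts 1 s before it ends.
voice = Clip("voiceover", duration=4.0, offset=2.0)
gesture = Clip("gesture", duration=1.5, offset=-1.0, parent=voice, anchor=Anchor.END)

print(gesture.start)   # 5.0

voice.duration = 6.0   # e.g. a longer localized recording
print(gesture.start)   # 7.0

voice.offset = 3.0     # the audio clip is moved on the timeline
print(gesture.start)   # 8.0
```

Because the child's start time is derived on demand, both a changed audio length and a moved audio clip carry over to the gesture without any extra bookkeeping. A real implementation would probably also need to guard against cycles in the parent chain and decide how the editor shows a dependent clip being dragged.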