Conceptual Talk: Realistic Game Audio

Post by **BugInTheSYS** » Fri Feb 15, 2013 12:15 pm

Good evening folks,

During the past few days I have been thinking a lot about game audio. Specifically, I'm talking about achieving ‘phonorealism’ (I'm coining this term for the purpose; I don't know if it's been used before), i.e., the most accurate possible representation of sound in computer games. This implies that I'm not talking about a side-scroller or anything two-dimensional, but about creating a sound environment as realistic as the visual environment it accompanies. I'm fairly sure that caring about these things adds to the gaming experience.

I want to apply the same standards to sound as most people do to graphics in a simulator. Keep in mind that, although I am talking about it today, what I have in mind is actually three to ten years in the future. So, to put it correctly, I want to apply the same standards to sound as most people will apply to graphics in five years. The trade-offs should be comparable.

What I am trying to achieve is best described as a simulation of acoustics: the behavior of sound in three-dimensional space and the acoustic behavior of materials, terrain, walls, and closed rooms. Most important are traveling sound, resonance, reflection, absorption and reverberation.

I suppose that simulating sound waves in three dimensions for 44,100 sound samples per second is too complex to do, and not even constructive, because we still simplify literally everywhere in computer games. So a simplified model of sound is needed that yields results comparable in accuracy to graphics.

Here is an example I recently tried in a sound editing program: I tried to simulate the sound of a car radio from the ‘POV’ (actually, the point of listening) of the driver inside the car. After a six-page derivation of what filters I would need, I was able to construct a fairly realistic sound, incorporating sample offsets for different travel distances, a cuboid model of resonance behavior for the interior, a speaker modeler, and lastly, reverb. So I saw that it could be done, but neither am I an expert in writing sound filters, nor do I have any experience in approaching more general cases, like shaking your head inside the car. It’s sort of static. So I had to come up with something different.

I had one basic idea. The idea is to implement a system similar to graphics (you will see the analogy in a moment), in which sound sources are represented almost like light sources. The different materials would get separate ‘sound shaders,’ which define the reflection/absorption behavior of those materials. I would create a ‘sound image’ for each output channel (be it binaural, or even a 7-channel sound system), to collect information about the distance traveled and the filters applied in the shaders, and eventually compute a mix of the different sound sources just like I compute the image for the eyes on the GPU. I would actually use similar geometry to the visual scene, or even the very same.

I encourage you to comment on this concept and improve or add to it wherever you see fit. Or even propose a totally different concept, of course.

Post by **BugInTheSYS** » Sat Feb 16, 2013 6:37 pm

Time for a little update. I have read one decent article, written by Samuel Justice, a Sound Designer at Digital Illusions, which indirectly made me aware of a problem that I could have thought about earlier on. It was released one year ago and broaches the issues of present sound environments in games.

It was when I read about early reflections that a bell rang in my head, because in the most widely used simplified models of 3D graphics, the speed of light is ignored, because it is irrelevant. Unless you are writing a simulation of a slower speed of light at the MIT Game Lab, of course.

The speed of sound, though, is not irrelevant. It's vital to our perception of space, as Samuel rightly pointed out. How can I revise my model to account for this aspect?

Think of an (imaginary) sound-emitting object, call it a loudspeaker, right in front of our imaginary listener at a distance of 100 meters, which broadcasts sound in all directions. The listener then perceives sound that the loudspeaker emitted approximately 291 ms ago. (The calculation is quite trivial: distance divided by the speed of sound.)

Now I install an imaginary wall 25 m to the right of the listener, angled to reflect the sound of the loudspeaker in his direction. This wall is 103 m away from the loudspeaker and 25 m away from the listener. I get a problem: The left ear perceives the direct sound from the loudspeaker with an offset of 291 ms. The right ear perceives this very same signal, but on top of that, it perceives sound from the direction of the wall. The wall itself has now become a sound-emitting object. It reflects the sound of the loudspeaker: it 'emits' the sound which the loudspeaker emitted 300 ms ago. The sound then travels an additional 73 ms until it reaches the right ear of the listener. So altogether, we have the left ear with sound from 291 ms ago, and the right ear with sound from 291 and 373 ms ago.

This means I actually have to take care that the sound buffers still exist 373 ms after the end of the sound emission, so I can still read from them when I really need them. Normally I would be happy to discard them the moment my play pointer hits the end of the stream.
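
The arithmetic and the buffer-lifetime problem can be sketched in a few lines. This is only an illustration under stated assumptions: the speed of sound of 343.6 m/s is assumed (it is the value that reproduces the 291 ms figure), and `EmitterHistory` is a hypothetical ring buffer, not part of any existing library. With these numbers, a 25 m leg takes about 73 ms, so the 103 m + 25 m reflected path comes out at roughly 373 ms.

```python
SPEED_OF_SOUND = 343.6  # m/s; assumed value that reproduces the 291 ms figure
SAMPLE_RATE = 44_100    # samples per second


def travel_time_ms(distance_m: float) -> float:
    """Time in milliseconds that sound needs to cover distance_m."""
    return distance_m / SPEED_OF_SOUND * 1000.0


def delay_in_samples(distance_m: float) -> int:
    """Whole samples by which a path of distance_m lags behind the emitter."""
    return round(distance_m / SPEED_OF_SOUND * SAMPLE_RATE)


class EmitterHistory:
    """Hypothetical ring buffer holding the most recent samples of one emitter."""

    def __init__(self, max_distance_m: float) -> None:
        # Keep enough history to serve the longest path we expect to trace.
        self.size = delay_in_samples(max_distance_m) + 1
        self.samples = [0.0] * self.size
        self.write_pos = 0  # index where the next sample will be stored

    def push(self, sample: float) -> None:
        self.samples[self.write_pos] = sample
        self.write_pos = (self.write_pos + 1) % self.size

    def read_delayed(self, distance_m: float) -> float:
        """Sample that was emitted one travel time ago for this path length."""
        delay = delay_in_samples(distance_m)
        return self.samples[(self.write_pos - 1 - delay) % self.size]


direct_ms = travel_time_ms(100.0)            # loudspeaker -> listener
reflected_ms = travel_time_ms(103.0 + 25.0)  # loudspeaker -> wall -> listener
```

Sizing the history by the longest traced path means the emitter's samples stay readable for as long as any reflection can still need them, even after playback has reached the end of the stream (the emitter would simply keep pushing silence).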

It's starting to give me headaches already. So far this is a resource management problem, but I also lack any semblance of an idea about how to keep track of this travel distance stuff at all. I'd be glad for any ideas you can contribute.

Post by **GroundUpEngine** » Sun Feb 17, 2013 4:37 am

Have you tried OpenAL?

Post by **BugInTheSYS** » Sun Feb 17, 2013 4:51 am

GroundUpEngine wrote: Have you tried OpenAL?

No, I haven't. I haven't come across a description of its exact features yet, and how extensible it is. If it actively addressed the problem of early reflections, this guy wouldn't have had to write an article about it, I thought.

Post by **BugInTheSYS** » Wed Feb 20, 2013 6:22 am

Back for an update now. I have started working on an implementation, and the first thing apart from direct sound that I'm going to care about is reflection of sound off quads.

So far I have implemented a component-based design for a sound environment, where each object in the world is of type SoundNode and contains a couple of SoundComponents. SoundComponent is an abstract base class (ABC) with three derived abstract classes and one derived concrete class. It is the parent of SoundEmitter, SoundReceiver and SoundGeometry, which are all abstract classes.
A typical sound-emitting node would only contain a class derived from SoundEmitter that reads samples from a wave file, and a listener would probably contain some type of SoundReceiver which is assigned a file on the hard drive to save its input to.

The interaction between these nodes works with what I dubbed 3-D sample streams. When you grab the output of a source node, you will receive a struct that contains an array of floating-point sound samples, and a 'GetDirectivity' function pointer that you can call with the position of the listener to get information about how the samples will reach this target point: How far they will have traveled, what their origin is, how much time they will need for traveling from the source to the target, and lastly, how loud the emitter emits sound in the direction of the target point (amplification coefficient).

For example, if you were to place a loudspeaker in 3-D space, and a listener behind that loudspeaker, the listener will not receive direct sound from the loudspeaker, because they are not facing each other. In this case, the amplification coefficient will be 0. For an omnidirectional sound source, though, this coefficient will be 1, no matter what target point you pass in.
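
To make the idea concrete, here is a minimal sketch of such a 3-D sample stream in Python. It is not the actual implementation: the names `Directivity`, `SampleStream3D`, `omnidirectional` and `front_facing` are made up for illustration, the speed of sound is assumed to be 343 m/s, and the cosine falloff in front of the loudspeaker is an assumption (the description above only fixes the coefficient at 0 behind the speaker and at 1 for an omnidirectional source).

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]
SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


@dataclass
class Directivity:
    distance: float     # metres the sound travels from origin to target
    origin: Vec3        # point the sound appears to come from
    travel_time: float  # seconds needed to cover the distance
    gain: float         # amplification coefficient, 0.0 (silent) to 1.0 (full)


@dataclass
class SampleStream3D:
    samples: List[float]
    get_directivity: Callable[[Vec3], Directivity]


def omnidirectional(position: Vec3) -> Callable[[Vec3], Directivity]:
    """Emitter that is equally loud in every direction (gain is always 1)."""
    def get_directivity(target: Vec3) -> Directivity:
        distance = math.dist(position, target)
        return Directivity(distance, position, distance / SPEED_OF_SOUND, 1.0)
    return get_directivity


def front_facing(position: Vec3, facing: Vec3) -> Callable[[Vec3], Directivity]:
    """Loudspeaker that is silent behind itself; cosine falloff is an assumption."""
    def get_directivity(target: Vec3) -> Directivity:
        to_target = tuple(t - p for t, p in zip(target, position))
        distance = math.hypot(*to_target)
        if distance == 0.0:
            cos_angle = 1.0  # target sits on the emitter itself
        else:
            dot = sum(a * b for a, b in zip(to_target, facing))
            cos_angle = dot / (distance * math.hypot(*facing))
        gain = max(0.0, cos_angle)  # anything behind the speaker gets 0
        return Directivity(distance, position, distance / SPEED_OF_SOUND, gain)
    return get_directivity


speaker = SampleStream3D([0.0, 0.5, -0.5],
                         front_facing((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
behind = speaker.get_directivity((0.0, 0.0, -2.0))   # gain 0: no direct sound
in_front = speaker.get_directivity((0.0, 0.0, 2.0))  # gain 1: on the speaker axis
```

Returning a closure from `omnidirectional` and `front_facing` plays the role of the function pointer: the receiver only ever sees `get_directivity` and does not need to know what kind of emitter produced the stream.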

So far, so good. Now I can simulate the relationship between an emitter and a receiver in 3-D space, where most of the ugly calculation stuff is hidden in derived component classes. Different case: A setup with a rectangular wall that reflects all sound like specular reflection in the Phong lighting model, one emitter (S), and one listener (L), like this:

(Fig. 1)

The wall is actually a special case where the component list contains some type of bridged SoundEmitter, SoundGeometry and SoundReceiver. Whenever the bridged receiver gets to process samples, it will forward them to the geometry component through a special derived class of SoundComponent, the ComponentBridge. The reflecting quad geometry component registered with the ComponentBridge will then save the GetDirectivity function pointer it finds in the 3-D sample stream. The geometry component itself contains a different GetDirectivity function that will call the saved function pointer for calculations whenever called, so effectively, I'm doing some recursion right here.

What this recursive GetDirectivity function will then calculate is just some analytic geometry. In reference to Fig. 1, it would look like this:
Step 1: Build a vector called m that connects S and L.
Step 2: Multiply m by 0.5 and add S to the result. This gives the midpoint M between S and L.
Step 3: Build a parametric line x(t) = M + t * n, where n is the normal vector of the wall plane. This line is orthogonal to the wall plane and starts in the middle between S and L.
Step 4: Calculate the point of intersection, called I, between this line and the wall plane.
Step 5: Return the following information: The sound has to travel the distance d from S via I to L, the sound needs d / (speed of sound) to travel this distance, the origin of the sound is I, and the amplification coefficient is as returned by the saved function pointer.
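
The five steps can be sketched as follows. Everything here is illustrative: the function names are made up, the wall is treated as an infinite plane given by a point and a normal, and the speed of sound is assumed to be 343 m/s. One caveat is worth stating: projecting the midpoint onto the wall gives the specular reflection point only when S and L are equally far from the wall. For comparison, the sketch also includes the image-source construction (mirror S across the wall, then intersect the line from the mirrored source to L with the wall), which is a different, swapped-in technique and not the method described in the steps.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
SPEED_OF_SOUND = 343.0  # m/s; assumed


def sub(a: Vec3, b: Vec3) -> Vec3:
    return tuple(x - y for x, y in zip(a, b))


def add_scaled(p: Vec3, v: Vec3, t: float) -> Vec3:
    """Return p + t * v."""
    return tuple(x + t * y for x, y in zip(p, v))


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def midpoint_reflection_point(S: Vec3, L: Vec3, plane_point: Vec3, n: Vec3) -> Vec3:
    """Steps 1 to 4: drop the midpoint of S and L onto the wall plane along n."""
    m = sub(L, S)                      # step 1: vector from S to L
    mid = add_scaled(S, m, 0.5)        # step 2: midpoint between S and L
    # Steps 3 and 4: solve dot(mid + t * n - plane_point, n) = 0 for t.
    t = dot(sub(plane_point, mid), n) / dot(n, n)
    return add_scaled(mid, n, t)


def image_source_reflection_point(S: Vec3, L: Vec3, plane_point: Vec3, n: Vec3) -> Vec3:
    """Alternative: mirror S across the plane and intersect image -> L with it."""
    k = 2.0 * dot(sub(S, plane_point), n) / dot(n, n)
    image = add_scaled(S, n, -k)
    direction = sub(L, image)
    t = dot(sub(plane_point, image), n) / dot(direction, n)
    return add_scaled(image, direction, t)


def path_info(S: Vec3, I: Vec3, L: Vec3) -> Tuple[float, float]:
    """Step 5: distance from S via I to L, and the travel time in seconds."""
    d = math.dist(S, I) + math.dist(I, L)
    return d, d / SPEED_OF_SOUND


# Wall is the plane y = 0. S and L are at different heights above it.
wall_point, wall_normal = (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
S, L = (0.0, 1.0, 0.0), (6.0, 2.0, 0.0)
I_mid = midpoint_reflection_point(S, L, wall_point, wall_normal)
I_img = image_source_reflection_point(S, L, wall_point, wall_normal)
```

In this example the two constructions disagree: the midpoint version lands at x = 3, while the image-source version lands at x = 2, where the angle of incidence equals the angle of reflection. When S and L have the same distance to the wall, both return the same point.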

I hope my explanation is not too jumbled to understand. It's just all floating around in my head. You can see the code if you want, or ask questions if you believe I'm close to mental illness at some point.
The problem that I have now is that I cannot transfer this model to a situation where sound is reflected by two walls in a row before it reaches the listener. I need to do some kind of ray tracing. Please tell me what approach to handling reflection would be appropriate in your view, and how you do multiple reflections in graphics.
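
For reference, one established way to handle two reflections in a row is the image-source method: mirror the source across the first wall, mirror that image across the second wall, and trace back from the listener to find the two reflection points. The sketch below is a minimal illustration under stated assumptions, not a full solution: all names are hypothetical, walls are treated as infinite planes given as (point, normal), and there is no check that the hit points lie inside a finite quad or that the path is unobstructed. A path running parallel to a wall would cause a division by zero here.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[Vec3, Vec3]  # (point on the plane, normal vector)


def sub(a: Vec3, b: Vec3) -> Vec3:
    return tuple(x - y for x, y in zip(a, b))


def add_scaled(p: Vec3, v: Vec3, t: float) -> Vec3:
    """Return p + t * v."""
    return tuple(x + t * y for x, y in zip(p, v))


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def mirror(point: Vec3, plane: Plane) -> Vec3:
    """Reflect a point across a plane."""
    origin, normal = plane
    k = 2.0 * dot(sub(point, origin), normal) / dot(normal, normal)
    return add_scaled(point, normal, -k)


def intersect(start: Vec3, end: Vec3, plane: Plane) -> Vec3:
    """Point where the line from start to end crosses the plane."""
    origin, normal = plane
    direction = sub(end, start)
    t = dot(sub(origin, start), normal) / dot(direction, normal)
    return add_scaled(start, direction, t)


def two_bounce_path(source: Vec3, listener: Vec3, wall_1: Plane, wall_2: Plane):
    """Path source -> wall_1 -> wall_2 -> listener via mirrored sources."""
    image_1 = mirror(source, wall_1)    # source as seen "through" wall 1
    image_2 = mirror(image_1, wall_2)   # that image as seen "through" wall 2
    hit_2 = intersect(image_2, listener, wall_2)  # last bounce, found first
    hit_1 = intersect(image_1, hit_2, wall_1)     # first bounce
    length = math.dist(image_2, listener)         # equals the full path length
    return hit_1, hit_2, length


wall_1 = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))  # plane x = 0
wall_2 = ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0))  # plane y = 0
source, listener = (1.0, 2.0, 0.0), (3.0, 1.0, 0.0)
hit_1, hit_2, length = two_bounce_path(source, listener, wall_1, wall_2)
```

The total path length is simply the straight-line distance from the doubly mirrored source to the listener, so the travel time follows from it directly. The same mirroring can be repeated for more walls, at the cost of one image per ordered sequence of walls.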

Post by **dandymcgee** » Wed Feb 20, 2013 8:04 pm

Even though I don't have time to read very far into this at the moment, thank you for posting it here. I'm sure future visitors trying to implement a similar system will find your research invaluable.

Post by **eatcomics** » Mon Mar 25, 2013 10:56 am

This is exactly something a friend and I have been passing ideas back and forth about for the project we're working on. Actually... there have been a lot of posts covering ideas we had. I'm really glad I read through all the forums today. Anyway, I'm watching this thread very closely. Thank you for the posts.

Post by **gamenovice** » Fri May 03, 2013 3:51 pm

Hopefully I don't come off as a complete idiot when I try to respond to the situation at hand, but in terms of graphics, I doubt that most engines calculate multiple reflections, since you as the camera only see in one direction at a time... unless it's mirrors, then I guess I'm debunked. For the most part, trying to realistically perceive sound sounds like a unique problem, because your ears can pick up sound from virtually any direction, whereas eyesight typically relies on what's in front of you.

To support multiple reflections of sound, I think you are already close with what you have, but it should be extended. I do not have a definitive answer, but ideally the sound waves we care about, even if they are being emitted by a source that sends waves in all directions, would be the waves directed at the listener. These waves should be going in the direction of the listener, but that shouldn't have to be a straight line to the listener, rather into the vicinity of the listener's listening 'frustum', so to speak. Also, to make sure not too much processing is spent calculating waves that never reach the listener, it should be noted that the angle of emission should affect the volume of the sound wave, depending on the direction it is facing relative to the direction of the listener (might've worded that wrong).

Thinking of the listener as a range rather than a single point might make it clearer how to approach multiple reflections, but for now that is the most I've got.

P.S. Doesn't the sound wave get weaker with each bounce? I might be dead wrong on that, but that is something to consider
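
It does: part of the energy is absorbed at every surface, and the wave also spreads out as it travels. A rough sketch of how both effects could be folded into one amplification coefficient is shown below. The function name `path_gain`, the per-wall reflection coefficients and the 1 m reference distance are all assumptions made for illustration, and real materials absorb different frequencies differently, which this ignores.

```python
from typing import Sequence


def path_gain(total_distance_m: float,
              reflection_coefficients: Sequence[float],
              reference_distance_m: float = 1.0) -> float:
    """Amplitude factor for one path from the emitter to the listener.

    total_distance_m: full length of the path, including all bounces.
    reflection_coefficients: one amplitude factor (0.0 to 1.0) per bounce;
        1.0 is a perfect mirror, 0.0 swallows the sound completely.
    """
    # Spreading loss: amplitude of a point source falls off with 1 / distance.
    gain = reference_distance_m / max(total_distance_m, reference_distance_m)
    # Each bounce keeps only a fraction of the amplitude.
    for coefficient in reflection_coefficients:
        gain *= coefficient
    return gain


direct = path_gain(10.0, [])             # no bounce
two_bounces = path_gain(10.0, [0.5, 0.5])  # same length, two lossy walls
```

With these made-up numbers, the twice-reflected path arrives at a quarter of the amplitude of a direct path of the same length, which is also why higher-order reflections can usually be cut off after a few bounces.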

I'll do what I can to find out more about the subject *cracks open physics textbook*

Please let me know if this entire post was idiotic, or if I managed to get a clue about something.

Elysian Shadows
