|
Post by Quint on Dec 27, 2018 6:33:46 GMT -6
|
|
|
Post by jeremygillespie on Dec 27, 2018 6:42:46 GMT -6
I was working with a buddy for about a year who had one of those Neumann head microphones. Pretty weird to listen with headphones on after recording with that thing.
Interesting article, but I’m not really sure what they are getting at or if they are even close to being the first to do this? Binaural mic’ing has been around for quite some time now.
|
|
|
Post by Quint on Dec 27, 2018 8:49:15 GMT -6
I was working with a buddy for about a year who had one of those Neumann head microphones. Pretty weird to listen with headphones on after recording with that thing. Interesting article, but I’m not really sure what they are getting at or if they are even close to being the first to do this? Binaural mic’ing has been around for quite some time now.

But did you read the entire article and/or watch the video? The gist of it is that a MONO signal can be turned into a 3D (2.5D) signal using video of the source(s) creating the sound and AI algorithms. Binaural recording was used to help develop those algorithms, but the intent for real-world use is to take mono signals and turn them into 3D.

From the article:

The researchers’ training method is relatively straightforward. The first step in training any machine-learning system is to create a database of examples of the effect it needs to learn. Grauman and Gao created one by making binaural recordings of over 2,000 musical clips that they also videoed.
Their binaural recorder consists of a pair of synthetic ears separated by the width of a human head, which also records the scene ahead using a GoPro camera.
The team then used these recordings to train a machine-learning algorithm to recognize where a sound was coming from given the video of the scene. Having learned this, it is able to watch a video and then distort a monaural recording in a way that simulates where the sound ought to be coming from. “We call the resulting output 2.5D visual sound—the visual stream helps ‘lift’ the flat single channel audio into spatialized sound,” say Grauman and Gao.
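If it helps to see the shape of that training setup in code, here's a heavily simplified sketch in PyTorch. To be clear, they haven't released their code, so every name and shape below is made up; the one detail taken from the paper, as far as I can tell, is that the network learns to predict the left/right difference signal from the mono mix plus visual features (the real system works on spectrograms with a U-Net and pulls visual features from a ResNet, not on flat vectors like this):

```python
import torch
import torch.nn as nn

# Toy stand-in for the paper's mono-to-binaural training idea:
# predict the (L - R) difference signal from mono audio features
# fused with visual features, supervised by real binaural recordings.
class Mono2BinauralSketch(nn.Module):
    def __init__(self, audio_dim=512, visual_dim=512):
        super().__init__()
        self.audio_enc = nn.Linear(audio_dim, 256)    # placeholder encoders
        self.visual_enc = nn.Linear(visual_dim, 256)
        self.predict_diff = nn.Linear(512, audio_dim)

    def forward(self, mono_feat, visual_feat):
        fused = torch.cat([self.audio_enc(mono_feat),
                           self.visual_enc(visual_feat)], dim=-1)
        return self.predict_diff(fused)               # predicted (L - R)

model = Mono2BinauralSketch()
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Toy batch: mono features, video features, and the true L - R
# difference taken from one of their ~2,000 binaural clips.
mono, video, true_diff = [torch.randn(8, 512) for _ in range(3)]

pred_diff = model(mono, video)
loss = loss_fn(pred_diff, true_diff)  # supervise against real binaural
opt.zero_grad()
loss.backward()
opt.step()
```

The binaural recordings only ever show up as the training target; once trained, the network works from mono plus video, and the two output channels get reconstructed as left = (mono + diff) / 2 and right = (mono - diff) / 2.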
|
|
|
Post by M57 on Dec 27, 2018 9:50:51 GMT -6
Forget the video part - how do they achieve the spatial positioning? It doesn't sound like basic panning. There are products/reverbs out there that can move the apparent position of a given source, say in an x,y matrix. My first thought, especially given the amount of reverb present and the quality of the recordings, was that that's what they're using. That would work fine with a single source, but there are a number of clips in the video with two instruments. What kind of product can do that with a mono source? Again, sans the video part.
|
|
|
Post by jeremygillespie on Dec 27, 2018 10:08:16 GMT -6
The gist I got is that in order for the computer to be “trained” to put the mono signal into a fake binaural signal, it first needs to learn the binaural signal to begin with. So if you are only given a mono source, the computer wouldn’t be able to make it binaural on its own. It needs something to learn from.
So I guess I either don’t understand, or don’t see the point. Or both hahah
But it also seems like something anybody can do with panning and effects.
I did have to stop the video at the piano/electronic drums weird polka thing that was happening. I couldn’t take any more 😬
I could also be biased: when I was coming up, one job I had was to “engineer” for somebody who thought they could make the stereo field turn 3D by taking a stereo track, inserting the S1 Imager, MondoMod, and MetaFlanger one after another, and automating the parameters ad nauseam until everyone in the room wanted to puke. This went on for four months or so before I decided it wasn’t worth the money...
|
|
|
Post by Quint on Dec 27, 2018 11:33:47 GMT -6
The gist I got is that in order for the computer to be “trained” to put the mono signal into a fake binaural signal, it first needs to learn the binaural signal to begin with. So if you are only given a mono source, the computer wouldn’t be able to make it binaural on its own. It needs something to learn from. So I guess I either don’t understand, or don’t see the point. Or both hahah But it also seems like something anybody can do with panning and effects. I did have to stop the video at the piano/electronic drums weird polka thing that was happening. I couldn’t take any more 😬 I could also be biased: when I was coming up, one job I had was to “engineer” for somebody who thought they could make the stereo field turn 3D by taking a stereo track, inserting the S1 Imager, MondoMod, and MetaFlanger one after another, and automating the parameters ad nauseam until everyone in the room wanted to puke. This went on for four months or so before I decided it wasn’t worth the money...

As I understand it, you do NOT need a binaural signal present for the algorithms to work. Otherwise, what would be the point? That's the whole idea behind training it ahead of time, as discussed in the article. They taught the AI program to recognize what it is seeing, via video, correlate that with the mono signal source, and turn it into the 2.5D sound output. So it's not something that has to be trained for every signal source. They've already trained the AI to recognize things in future video feeds. All that would be needed is a mono sound source and corresponding video. That's it.

I'm not saying whether this currently has, or ever will have, real-world applicability, but it might. It's an interesting concept either way.
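To make that concrete, continuing the toy sketch from my earlier post (again, hypothetical names, not their actual API): at inference time nothing binaural goes in, just the mono clip and its video.

```python
import torch

# Inference with the toy Mono2BinauralSketch model defined in the
# earlier post. No binaural ground truth is needed here, only mono
# audio features and the matching video features of the new clip.
mono_feat = torch.randn(1, 512)    # stand-in for a new mono clip
video_feat = torch.randn(1, 512)   # stand-in for its video frames

with torch.no_grad():
    diff = model(mono_feat, video_feat)   # predicted (L - R)

left = (mono_feat + diff) / 2      # reconstruct the two channels
right = (mono_feat - diff) / 2
```

The training data teaches it the mapping once; after that, any mono-plus-video pair can be "lifted" without a binaural reference.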
|
|
|
Post by ericn on Dec 27, 2018 11:50:00 GMT -6
There have been many efforts at something like this over the years, and I'm sure as computer power has increased the results have improved. Those who have done any significant post work will tell you, though, that 9 times out of 10 you want control of those background sounds: you want to decide when a realistic image is a distraction and when you want to emphasize position. Now here's the acid test, and where it will probably fail: a harmonically rich symphony orchestra, or, I'll bet, when you have, say, a gaggle of geese. The natural acoustic summing of signals will probably drive it batty, but that's where it would probably prove most useful.
|
|
|
Post by Quint on Dec 27, 2018 12:14:04 GMT -6
There have been many efforts at something like this over the years, and I'm sure as computer power has increased the results have improved. Those who have done any significant post work will tell you, though, that 9 times out of 10 you want control of those background sounds: you want to decide when a realistic image is a distraction and when you want to emphasize position. Now here's the acid test, and where it will probably fail: a harmonically rich symphony orchestra, or, I'll bet, when you have, say, a gaggle of geese. The natural acoustic summing of signals will probably drive it batty, but that's where it would probably prove most useful.

I'd agree, something like a gaggle of geese would be pretty difficult. Though, in fairness, they did seem to indicate that this was geared only towards musical performance, and not any and all sounds.
|
|
|
Post by sirthought on Dec 28, 2018 0:37:30 GMT -6
I think the real-world implications would be something like this: the code that produces said 2.5D effect gets baked into Facebook's standalone video devices and their website. No matter who uploads a video, when a viewer watches it, the code analyzes the video and spits out a 2.5D soundfield. Maybe only with headphones or earbuds. Maybe with certain home theater setups. IDK. But if they can get software that just looks ahead in a video and places the audio just so, that would be pretty next-level stuff. No special mixing required.
|
|
|
Post by mulmany on Dec 28, 2018 12:26:51 GMT -6
15 years ago, I listened to a 5.1 demo of a thunderstorm that was recorded off a camcorder's mono microphone. It sounded very good. The decoder was some hybrid of Dolby tech. Don't remember the company. But the thought then was to allow for surround sound audio decoding even on small portable recording devices, without the need for a surround mic array.
|
|
|
Post by the other mark williams on Dec 28, 2018 13:04:28 GMT -6
Surely this will someday be used by the robots to hyper-localize where we humans are hiding from their destructive death-rays as they hunt us in the streets.
|
|
|
Post by hadaja on Dec 28, 2018 17:53:05 GMT -6
Hey Mark, nice to see another Sci-fi fan on the forum. Just love those types of shows.
|
|
|
Post by ericn on Dec 28, 2018 21:41:48 GMT -6
Surely this will someday be used by the robots to hyper-localize where we humans are hiding from their destructive death-rays as they hunt us in the streets.

Shhh, they already are 😁
|
|
|
Post by the other mark williams on Dec 28, 2018 23:01:16 GMT -6
Surely this will someday be used by the robots to hyper-localize where we humans are hiding from their destructive death-rays as they hunt us in the streets.

Shhh, they already are 😁

Hahaha, perfect response, ericn! I should note, too, that I am *mostly* kidding in my robot post. But not entirely. Not entirely at all.

I have some trouble understanding what the benefit to humanity is of the tool from the original post, besides "wow, that sounds cool, doesn't it?!?!" I can't see what exactly it achieves that is of any real positive benefit in the long run. But then again, AI kind of freaks me out. I feel freaked out when I get an ad for a product that I would never purchase, but that my wife and I had discussed two days earlier. My wife would never buy the hypothetical product either, FWIW. Neither of us had searched for it online. I do believe our phones are listening to us. And that freaks me out.
|
|
|
Post by ericn on Dec 28, 2018 23:35:33 GMT -6
Hahaha, perfect response, ericn! I should note, too, that I am *mostly* kidding in my robot post. But not entirely. Not entirely at all. I have some trouble understanding what the benefit to humanity is of the tool from the original post, besides "wow, that sounds cool, doesn't it?!?!" I can't see what exactly it achieves that is of any real positive benefit in the long run. But then again, AI kind of freaks me out. I feel freaked out when I get an ad for a product that I would never purchase, but that my wife and I had discussed two days earlier. My wife would never buy the hypothetical product either, FWIW. Neither of us had searched for it online. I do believe our phones are listening to us. And that freaks me out.

I agree. Other than, say, some restoration and archive applications, the question is: why? What modern recorder isn't stereo? The thing with AI is the power you're handing the guys writing the code! That scares the living hell out of me, Mark. Talk about the absolute destructive power of a couple of talented black-hat hackers! Plus, just think of when AI is powerful enough to write the AI!
|
|
|
Post by the other mark williams on Dec 29, 2018 1:32:41 GMT -6
Hahaha, perfect response, ericn! I should note, too, that I am *mostly* kidding in my robot post. But not entirely. Not entirely at all. I have some trouble understanding what the benefit to humanity is of the tool from the original post, besides "wow, that sounds cool, doesn't it?!?!" I can't see what exactly it achieves that is of any real positive benefit in the long run. But then again, AI kind of freaks me out. I feel freaked out when I get an ad for a product that I would never purchase, but that my wife and I had discussed two days earlier. My wife would never buy the hypothetical product either, FWIW. Neither of us had searched for it online. I do believe our phones are listening to us. And that freaks me out. [...] The thing with AI is the power you're handing the guys writing the code! That scares the living hell out of me, Mark. Talk about the absolute destructive power of a couple of talented black-hat hackers! Plus, just think of when AI is powerful enough to write the AI!

Bingo. Terrifying.
|
|
|
Post by iamasound on Dec 29, 2018 2:26:44 GMT -6
Surely this will someday be used by the robots to hyper-localize where we humans are hiding from their destructive death-rays as they hunt us in the streets.

... with help too from info gleaned from Home Mini and their ilk.
|
|
|
Post by Quint on Dec 29, 2018 8:19:42 GMT -6
I get the fear of AI, and I don't think it's entirely unfounded, though I do think it can do some incredibly useful things and is misunderstood, at least in some cases. If you want to read about some really interesting developments on this front:

spectrum.ieee.org/tech-talk/transportation/self-driving/cheap-centimeterprecision-gps-for-cars-and-drones

I went to a talk on this particular research at the Texas GIS Forum a few years ago. It was kind of mind-blowing what this sort of advancement would allow. The word "Skynet" came out of my mouth more than once. It was both incredibly interesting and also kind of scary. It's not that this particular research involved AI itself, but it IS the kind of advancement that would give AI a truly global reach, as far as the kind of information net it would need to cast to do the very thing that is being fearfully discussed in this thread. At this conference, they were discussing the ability of everyone's phones to cheaply and easily create a centimeter-accurate, georeferenced, 3D image (with corresponding photos) of the entire world. Think Google Street View, but EVERYWHERE, not just what can be seen from a Google car driving down the street. That includes the inside of buildings. Anywhere a smartphone can go, so can this network. Both interesting and scary stuff.
|
|
|
Post by johneppstein on Dec 29, 2018 12:23:32 GMT -6
It seems to me that the processing is probably some combination of time delay and phase manipulation - I doubt that much EQ would really be necessary, but maybe.
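For what it's worth, here's roughly what I mean, as a toy NumPy sketch: crude interaural time and level differences only, with made-up constants, and certainly nothing to do with whatever their network actually learned.

```python
import numpy as np

def spatialize(mono, sr, azimuth_deg):
    """Crudely place a mono signal left/right using only an interaural
    time difference (ITD) and level difference (ILD). Rough numbers."""
    az = np.radians(azimuth_deg)
    itd = 0.0007 * np.sin(az)             # up to ~0.7 ms of head delay
    delay = int(round(abs(itd) * sr))     # delay in samples
    ild_db = -6 * abs(np.sin(az))         # up to ~6 dB quieter far ear
    gain_far = 10 ** (ild_db / 20)

    far = np.concatenate([np.zeros(delay), mono])[:len(mono)] * gain_far
    near = mono
    # Positive azimuth = source on the right, so the right ear is near.
    return (far, near) if azimuth_deg > 0 else (near, far)

sr = 44100
t = np.linspace(0, 1.0, sr, endpoint=False)
mono = np.sin(2 * np.pi * 440 * t)        # one second of test tone
left, right = spatialize(mono, sr, azimuth_deg=45)
```

A real HRTF adds frequency-dependent filtering from the pinnae and head shadow on top of this, which is presumably part of what their training data bakes in.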
As to why? Well, these days if an idea is technically feasible we can be pretty sure that somebody's going to do it just because they can, regardless of how dumb the idea might be.
|
|
|
Post by jin167 on Dec 29, 2018 20:46:33 GMT -6
arxiv.org/pdf/1812.04204.pdf

Read the original paper and it will answer most of your questions (or contact the authors; they will be more than willing to tell you about their project. At least, that was my experience when I was working on my own project based on www.ofai.at/~jan.schlueter/pubs/2017_eusipco.pdf). I really liked their idea of using video frames to provide visual spatial information. I wonder how they implemented that feature in their code (pity they didn't make their code open to the public...).

P.S. Don't trash-talk someone else's work if you don't know what you're talking about.
|
|
|
Post by johneppstein on Dec 31, 2018 0:55:58 GMT -6
arxiv.org/pdf/1812.04204.pdf

Read the original paper and it will answer most of your questions (or contact the authors; they will be more than willing to tell you about their project. At least, that was my experience when I was working on my own project based on www.ofai.at/~jan.schlueter/pubs/2017_eusipco.pdf). I really liked their idea of using video frames to provide visual spatial information. I wonder how they implemented that feature in their code (pity they didn't make their code open to the public...).

P.S. Don't trash-talk someone else's work if you don't know what you're talking about.

So I was more or less correct, albeit in a very simplified way and with the video-analysis part taken for granted? Cool, that's very interesting.

However, I find binaural playback to be something of a niche application, given that it requires the use of headphones. Nonetheless, interesting... very interesting.
|
|
|
Post by Deleted on Dec 31, 2018 12:15:29 GMT -6
I was working with a buddy for about a year who had one of those Neumann head microphones. Pretty weird to listen with headphones on after recording with that thing. Interesting article, but I’m not really sure what they are getting at or if they are even close to being the first to do this? Binaural mic’ing has been around for quite some time now.

I actually built a binaural head about 40 years ago. It was a wig dummy that I covered in putty (to give it a skin-like consistency), with a couple of cardboard pinnae and clips for the old Sony EC-1s (all I could afford). The pinnae stuck out a bit, so I named the head Lyndon (after President Johnson, with whom the head had some similarities). The thing actually worked pretty well--at least well enough for my experiments. The problem was the players. They couldn't stop laughing at the thing.
|
|
|
Post by the other mark williams on Dec 31, 2018 12:24:55 GMT -6
arxiv.org/pdf/1812.04204.pdf

Read the original paper and it will answer most of your questions (or contact the authors; they will be more than willing to tell you about their project. At least, that was my experience when I was working on my own project based on www.ofai.at/~jan.schlueter/pubs/2017_eusipco.pdf). I really liked their idea of using video frames to provide visual spatial information. I wonder how they implemented that feature in their code (pity they didn't make their code open to the public...).

P.S. Don't trash-talk someone else's work if you don't know what you're talking about.

Hi there, I'm not sure if you're referencing my comment or not, but if so, I apologize if I came off as trashing someone's work. That wasn't my intention at all. I *do* think there are many outstanding questions re: AI that are not being discussed openly, and that too much AI research is happening without any public oversight. From my admittedly limited vantage point, it appears the governing question in AI research is "can we do this?" rather than "should we do this?" I find that terribly troubling.

Also, re: the two publications you referenced: the first link goes to a paper where one of the lead authors is a Facebook employee. This does nothing to assuage my concerns; rather, it amplifies them exponentially. The second link appears to be broken: it goes to a 404.

Again, not trying to trash anyone's work, just conversing. Cheers.
|
|
|
Post by jin167 on Dec 31, 2018 21:19:22 GMT -6
arxiv.org/pdf/1812.04204.pdf

Read the original paper and it will answer most of your questions (or contact the authors; they will be more than willing to tell you about their project. At least, that was my experience when I was working on my own project based on www.ofai.at/~jan.schlueter/pubs/2017_eusipco.pdf). I really liked their idea of using video frames to provide visual spatial information. I wonder how they implemented that feature in their code (pity they didn't make their code open to the public...).

P.S. Don't trash-talk someone else's work if you don't know what you're talking about.

Hi there, I'm not sure if you're referencing my comment or not, but if so, I apologize if I came off as trashing someone's work. That wasn't my intention at all. I *do* think there are many outstanding questions re: AI that are not being discussed openly, and that too much AI research is happening without any public oversight. From my admittedly limited vantage point, it appears the governing question in AI research is "can we do this?" rather than "should we do this?" I find that terribly troubling. Also, re: the two publications you referenced: the first link goes to a paper where one of the lead authors is a Facebook employee. This does nothing to assuage my concerns; rather, it amplifies them exponentially. The second link appears to be broken: it goes to a 404. Again, not trying to trash anyone's work, just conversing. Cheers.

Hi Mark, no, my comment wasn't aimed at you, don't worry. I share some of your concerns about the current trend and the speed at which AI research projects are being carried out, and the lack of control over them, but this is the dilemma we have to deal with every time we come across a technology that has the potential to completely transform the way we live (e.g. CRISPR). Anyway, the author wrote the paper while he was working as an intern at Facebook AI. As for the broken link, try www.eurasip.org/Proceedings/Eusipco/Eusipco2017/papers/1570347092.pdf or google 'Two Convolutional Neural Networks for Bird Detection in Audio Signals'. Happy New Year!
|
|