ConfusedDog 5 years ago

What you need is a couple of CNN layers to pick out the funniest possible translations, then make a YouTube channel like Bad Lip Reading, then profit! Even now, https://www.youtube.com/watch?v=5Krz-dyD-UQ still cracks me up.

  • gliptic 5 years ago

    It also has some funny Easter eggs, like a "Fibonacchos" stand.

  • kakarot 5 years ago

    Combine it with some parametric voice tech and you've got yourself a delicious automated stew.

    • lwansbrough 5 years ago

      Using AI to make fun of AI, in a nutshell.

samstave 5 years ago

How many secret efforts are there to accomplish this already for the MIC?

I can't imagine that there aren't already some Palantir-like efforts to accomplish this.

Imagine a REALLY good zoom lens on a very small drone that cannot be seen/heard by a target, and that drone is doing something like this to gain info.

Imagine the same zooming through windows as well.

This will be the next big ML-military step taken towards Total Information Awareness, if it's not already available in the wild.

  • kakarot 5 years ago

    Here you go:

    https://news.mit.edu/2014/algorithm-recovers-speech-from-vib...

    Frequency attenuation + sub-pixel color profiling means you don't even need an expensive camera in a lot of cases.

    Get a plastic cup of water or similar object, put it on someone's desk, record video from far away, combine with something like this [0] and you've got a very interesting avenue for corporate espionage. If you could reconstruct typed passwords from the object, it's a really powerful technique.

    [0] https://people.eecs.berkeley.edu/~tygar/papers/Keyboard_Acou...

  • asdfasgasdgasdg 5 years ago

    For windows, they already have this, IIUC. You bounce a laser beam off the window and measure the vibrations. Random guys can just do this in their garage.

    https://www.youtube.com/watch?v=1MrudVza6mo

    • conistonwater 5 years ago

      The Applied Science guy is most definitely not a random guy in a garage, though; he's incredibly skilled and talented. The rest of his YouTube channel is pretty amazing also.

      • Shish2k 5 years ago

        I am a random guy in a garage, and I made a functional laser microphone using random bits of electronics I had in my spare-bits box ($1 laser pointer, old pair of earphones, snip off the earphone and wire in a light dependent resistor) -- admittedly the quality was awful (you could only just make out voices if people in the room talked abnormally loud), but it was great for a fun weekend science project :D

        • samstave 5 years ago

          A write-up or vid of the components and build would be interesting..

        • gugagore 5 years ago

          I would imagine an LDR would be too slow to have a good response to audio.

gok 5 years ago

Maybe I'm misunderstanding the code, but it looks like it's matching audio to video, not actually recognizing speech given a video. That is, it could answer "does this audio line up with this video?" but not "what is being said in this video?"
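
To make the distinction concrete, here is a minimal sketch of what a matching-only system can answer. It is not taken from the project's code: the function names, the three-number embeddings, and the 0.8 threshold are all hypothetical, and a real model would learn its embeddings from data. The point is only that the output is a yes/no on whether an audio clip and a video clip line up, with no transcript anywhere.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_match(audio_embedding, video_embedding, threshold=0.8):
    """Answer "does this audio line up with this video?".

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return cosine_similarity(audio_embedding, video_embedding) >= threshold


# Made-up embeddings standing in for the outputs of an audio network
# and a video (mouth region) network.
audio = [0.9, 0.1, 0.3]
matching_video = [0.8, 0.2, 0.35]
other_video = [-0.2, 0.9, -0.4]

print(is_match(audio, matching_video))
print(is_match(audio, other_video))
```

Getting from this to "what is being said?" would need a different output stage, one that decodes the video features into text, which is a separate problem from scoring a pair.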

  • derimagia 5 years ago

    I didn't take a deep dive into the code, but in order to train, it's going to need to be fed audio files along with the actual video/mouth shapes/etc. Essentially it needs the audio to work out what reward to give back (whether it was right). Once it "learns", it wouldn't need the audio file.

  • pavs 5 years ago

    In order to train, doesn't it have to match audio output to a video of mouth movement?

    Doesn't deep learning imply training on sample results?

sgt 5 years ago

Open the pod bay doors, HAL.

  • snakeboy 5 years ago

    This scene would actually make a really cool test case!

  • donohoe 5 years ago

    I'm sorry, Dave. I'm afraid I can't do that.

  • biarity 5 years ago

    We're getting there!

meow_mix 5 years ago

This is fascinating. Has anyone considered repurposing this for something like sign language?

Havoc 5 years ago

That's actually a really good application with some real potential for improving lives. High five, mate.

  • PowerfulWizard 5 years ago

    Yeah, it is interesting, and it could also be a big boost to plain olde speech to text in cases where you have video, if the errors were non-correlated (which I wasn't able to determine from skimming the readme).

    edit: now I see it is being used to match audio samples, not to generate text, so it wouldn't create an independent value from the audio in this arrangement. Other than, e.g., speaker attribution, which they mentioned.

orasis 5 years ago

OK. But WHY? All technology has moral implications. Did you create this to actually help people? Do you care if it is weaponized? Think before you create.

  • orasis 5 years ago

    It reflects poorly on this community that any comment that questions the ethics of technology gets downvoted.