Could voice recognition make game development easier/faster/better...

Discussion in 'General Discussion' started by Arowx, Feb 5, 2020.

  1. Billy4184

    Billy4184

    Joined:
    Jul 7, 2014
    Posts:
    6,024
    I don't believe that the ability to redefine a problem is outside the ability of an AI. The question is, what are you redefining the problem to? Humans don't randomly define problems; we define them in such a way that the solution provides us with greater understanding of the world around us, strengthens us, increases the chances of survival at a more or less abstract level. The system for developing the problems and defining them is simply a question of experimentation and survival/thriving (or not).

    There is only one (though very broad) criterion for success in everything that is meaningful, and despite what modern art will tell you, it's not something you can choose at random or invert at will.
     
  2. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    I don't think it's outside what AI can potentially do right now; I just think we don't have it yet. In fact we probably don't even have a proper framework, as humans, to think about it beyond vague terms.


    (written before reading everything, editing and redacting, left for the lulz lol because that's the second part of your argument) It's also possible that we as humans overstate that problem and currently aren't doing any better, i.e. redefining a problem probably happens because it's a subset of the bigger problem, which is more unconscious than we conceptualize it, i.e. "surviving". (so yeah we kind of think alike lol)

    Being an art student, I don't think that's what modern art is ACTUALLY saying, unless you remove all context ad absurdum lol
     
  3. RichardKain

    RichardKain

    Joined:
    Oct 1, 2012
    Posts:
    1,261
    It's a common misconception to conflate the human mind with computers. Some people just think brains are big-ass organic computers, and that with sufficient sophistication, a computer can do anything a human brain can. This is not the case. Human minds do not function like computers. Brains can perform certain tasks far better than computers, while other processes are far more in the computer's wheelhouse.

    One of the prime examples of this is vocal recognition. Human minds are really great at understanding human language. They pick up languages just by being surrounded by other people. The inference, intuition, and pattern recognition that language requires are all things that the human mind is good at.

    Computers absolutely suck at all of those tasks. Computers can quickly process logic that someone else has written, but creating brand-new logic is not their forte. And intuitive leaps have never been properly replicated artificially. Computers simply don't function that way. Ultimately, human language is intended for humans. Using it for pre-defined keywords as interface shortcuts is doable. Creating computers that properly "understand" human language is the realm of science fiction. We're barely at the point of mechanical recognition, and far off from actual cognition.
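
    The "pre-defined keywords" point can be sketched in a few lines. This is a hypothetical illustration (the phrases and actions are invented, not any engine's API): recognition collapses to picking one phrase out of a small fixed grammar, and the application simply dispatches on it.

```python
# Keyword-shortcut dispatch: the recognizer only has to pick one phrase
# from a small fixed grammar; no language "understanding" is involved.
ACTIONS = {
    "open inventory": lambda state: state.update(inventory_open=True),
    "draw weapon":    lambda state: state.update(weapon_drawn=True),
    "quick save":     lambda state: state.update(saved=True),
}

def dispatch(recognized_phrase, state):
    """Run the action bound to a recognized phrase; ignore anything else."""
    action = ACTIONS.get(recognized_phrase.strip().lower())
    if action is None:
        return False  # not in the grammar: fail safe, do nothing
    action(state)
    return True

state = {}
dispatch("Open Inventory", state)         # a known keyword fires its action
dispatch("please open the thing", state)  # free speech is simply ignored
```

    Anything outside the grammar is rejected rather than guessed at, which is exactly why this is tractable today while open-ended "understanding" is not.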
     
    Ryiah likes this.
  4. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
  5. Murgilod

    Murgilod

    Joined:
    Nov 12, 2013
    Posts:
    10,157
    This proves RichardKain's point more than anything. This is just Yet Another Transformer Implementation that requires absolutely massive datasets. On top of that, perplexity isn't a very useful metric outside of the novelty of conversational AI. If anything, if you want voice recognition for gamedev, the last thing you want is perplexity. You want to be able to rely on consistent results.

    Conversational AI of this type is a research toy and nothing more.
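
    For reference, since the metric keeps coming up: perplexity is just the exponentiated average negative log-likelihood a model assigns to the tokens it predicts, i.e. how "surprised" it is on average. A toy computation (the probabilities are invented):

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability per predicted token.
    Lower means the model was less surprised by the text."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# Assigning probability 0.25 to every token is exactly as "perplexed"
# as picking uniformly among 4 options:
print(perplexity([0.25, 0.25, 0.25]))  # ~4.0
```

    Which is also why it says little about usefulness for a voice interface: a model can be unsurprised by text and still produce inconsistent commands.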
     
  6. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    Oh I agree, that's why I put the date on it. The title is aspirational; it's not "we're there, let's pop the champagne." It's also a thinly veiled recuperation of AI Dungeon's design with a tinge of new revelation to make it "legit".

    Also, a massive dataset is what a child goes through from being born to uttering words in a sequence that makes some sense.

    And like I pointed out, you can't get that conversational agent from random text anyway, because even if it can hold a conversation, it's like learning Japanese culture and customs through badly written anime fantasy. It's doomed to fail in principle. Also, these AIs are designed with no internal concern for existing in the context of the discussion; language is an expression of internal state, and there is none here except the structure of language.

    And that's the point I'm trying to make: even with such faulty premises, we get results like that. That's goddamn scary.

    Two papers down the line it will improve anyway; it's like being a slow-boiling frog.
     
  7. RichardKain

    RichardKain

    Joined:
    Oct 1, 2012
    Posts:
    1,261
    I'm mainly concerned with immediate practical applications. Speculation on future developments is fun to wax poetic about, but doesn't particularly help with the here and now.

    Here and now, your best bet for voice recognition is the mobile libraries being fielded by Apple and Google. Microsoft also has some internal libraries for Windows, but they don't seem to be putting quite as much effort into them. The Apple and Google efforts are partially being fueled by their own smart-speaker initiatives. Those devices make it possible for them to pull down way more data for running comparison and analysis. Most voice recognition libraries require virtual "training," in order to adapt their response to a particular type of voice or style of speech. Having access to a huge data-set makes it easier and faster to adjust a system's expectations for different individuals. (working from similar examples recorded in the past)

    I'm actually looking at ways to exploit those libraries for a possible lip-sync animation solution, but it's slow going. And the results always seem to be hit-and-miss.
     
  8. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    I was reacting to the second, because the former was already here yesterday. It's already in use in search engines and many interfaces, and you have the testimony of zombiegorilla that the sound recognition has been there for a while; complex semantics is what's missing.

    But the main problem is you don't need human-level intelligence in most cases; practical uses are especially not that deep. Somehow, being able to maintain a full conversation is always held up as proof it can't be used practically...

    I think we should probably define concrete use cases of where it fails and where it wins. My proposition, pre-AI discussion, was that voice is cool for navigating a vast array of items intuitively, something that already has latency due to overloading and the memorization of various hacks on top of traditional interfaces. That's much more straightforward than a "conversation" or "social awareness", which aren't interface concepts.
     
  9. RichardKain

    RichardKain

    Joined:
    Oct 1, 2012
    Posts:
    1,261
    Using voice recognition for inventory retrieval would indeed be a very appropriate application. Certain types of games can get vast inventories stuffed with all manner of things that would be difficult to browse through. Being able to select items from such a system with a simple command could be a fantastic shortcut, instead of rapidly scrolling through huge windowed lists. The main consideration would be needing to keep the names of all of the items as distinct as possible, so as to avoid confusion in the recognition, but that is manageable.
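
    The "keep item names distinct" requirement is the whole trick. A minimal sketch of the matching side (the item names and fuzzy-matching choice are my own invention, using Python's standard difflib rather than any particular recognition SDK):

```python
import difflib

INVENTORY = ["iron sword", "steel sword", "health potion", "mana potion",
             "lockpick", "dragon scale"]

def select_item(spoken, inventory=INVENTORY, cutoff=0.6):
    """Map a recognized phrase to the closest inventory name, or None
    when nothing clears the similarity cutoff."""
    matches = difflib.get_close_matches(spoken.lower(), inventory,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None

select_item("health potion")  # exact phrase resolves directly
select_item("helth poshun")   # recognizer noise still finds the item
select_item("xyzzy")          # gibberish returns None instead of guessing
```

    Names that share most of their letters (say "red potion" vs "red lotion") are exactly the confusability to design out of the item list.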
     
    Kiwasi likes this.
  10. Murgilod

    Murgilod

    Joined:
    Nov 12, 2013
    Posts:
    10,157
    You can do this now. For instance, it takes me no time at all to find things in Unity because I have a structured name for everything. In the amount of time it would take me to say "open my matcap shader" I could have already typed "shd matc".
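
    The typed-abbreviation workflow relies on those structured names. A rough sketch of how "shd matc" can resolve against them (the matching rule and asset names here are hypothetical, just to illustrate the idea):

```python
import re

def abbrev_match(query, name):
    """True if every space-separated query token is a prefix of some
    token of the asset name, in order ('shd matc' hits 'shd_matcap')."""
    q_tokens = query.lower().split()
    n_tokens = re.split(r"[^a-z0-9]+", name.lower())
    i = 0
    for q in q_tokens:
        while i < len(n_tokens) and not n_tokens[i].startswith(q):
            i += 1
        if i == len(n_tokens):
            return False  # query token matched no remaining name token
        i += 1
    return True

assets = ["shd_matcap", "shd_toon", "tex_matcap_steel"]
hits = [a for a in assets if abbrev_match("shd matc", a)]  # ["shd_matcap"]
```

    Eight keystrokes versus a spoken sentence is a fair comparison when your hands are already on the keyboard, which is the point being made here.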
     
  11. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    Yeah, but you can do it simultaneously with something else. My initial use case is games, where you are forced to pause and navigate with the d-pad, or even the mouse.

    In your case it might make less sense, as the example probably happens in a sequence. But then it's not just about finding the item; you also have to click it, drag it into the scene, and apply it to the object. Maybe it's fairer if the command recognizes that an object is selected: you ask for a shader and it automatically applies it to the object, creating a material when needed (though I'm not fond of that, because it will probably put it in a crap place, which means you still have the dragging, and it will encourage beginners to leave crap all over the place without a proper folder structure to order things. It's happening with HoloLens industrial apps... where noobs just put random valves they don't need in some random place instead of destroying them, like virtual littering is a thing futurists didn't anticipate :p )

    For applications it depends on the overall interface structure and rhythm of a specific workflow. I don't need it for most actions I do in Blender, as the keyboard is already parallel to the main action (which uses the mouse).

    There is no silver bullet, but that's not enough to completely write off a technique over a single point of failure. That's what design is for.
     
  12. Billy4184

    Billy4184

    Joined:
    Jul 7, 2014
    Posts:
    6,024
    There are of course differences at the most basic level... but I think it is a common mistake to compare the average computer, which is more or less a general computing device, with brains, which are absolutely not such a thing.

    To begin with, you cannot compare a computer to a brain unless the software in the computer has, in some form or another, had the same volume of learning experiences as a human brain.

    On top of that, human brains are not general computing devices. They have structures that are built in, hardwired over long periods of evolution. For example, a human language cannot have arbitrary rules, it has to follow a certain pattern and structure that the brain seems to already have an interface built-in for. (I remember a talk by Noam Chomsky on the subject but I can't remember where). That's not even covering the systems that govern the formation and recollection of memories, the abstraction of knowledge ..

    That doesn't necessarily mean that computers and brains function very differently at the most basic physical level though. Perhaps a completely unadapted collection of brain cells operating on nothing but a boot process would also be a general computing device (or quite possibly not).

    But it's obvious that to compare a computer with a brain, at the very least you have to add something that accounts for the specific adaptation and specialization that a brain features. Otherwise you might as well consider the brain of a newborn baby and the art director at Naughty Dog to be the same thing.

    How much of the apparent unique capability of the human brain is just software, or whether it represents a fundamental gap between computers and brains, is hard to say. But they have to at least start from a hypothesized similar point of adaptation for the comparison to be potentially useful.
     
    zombiegorilla likes this.
  13. zombiegorilla

    zombiegorilla

    Moderator

    Joined:
    May 8, 2012
    Posts:
    9,052
    There is a chapter in Randall Munroe's book "What If?" called "Human Computer" that does a great job of quantifying the differences between brain and computer. (Randall Munroe of xkcd.com fame)
     
    neoshaman and Billy4184 like this.
  14. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    Then there is the notion of emulation: artificial neural networks are crude emulations of actual neurons, and seem to have inherited some of their properties. So while a computer doesn't work like a brain, the software emulation is close enough. We have already completely emulated a worm's brain, though that's less complex than a human one. The human brain has high-level structures we haven't parsed and understood yet.

    https://www.sciencealert.com/scientists-put-worm-brain-in-lego-robot-openworm-connectome

    Imho we are on the verge of not calling neural networks intelligence anymore (like we barely consider state machines, relational databases, or expert systems intelligence anymore). It's more and more obvious that a neural network is a statistical, fuzzy, self-organizing database (with great properties); the main question is really how we one-shot insert semantics, and how we one-shot extract emergent self-organized semantics (if that's possible at all).

    Once we've done that, it will lose all the mystique it currently has and just become another tool of the trade. Most progress in the field seems to be more about the architecture than the neuron, combined with other techniques that are more supplements or layers on top of the actual neuron structure (like finding new reward or memory systems). Neural networks are basically just one of the components.

    In fact I would not be surprised if some big progress were made by ditching neurons altogether in some domains and having the architecture replace them with a simpler equivalent (like some image recognition that was found to be replaceable by the image equivalent of bag-of-words, which is way simpler to understand and manipulate).
     
  15. Murgilod

    Murgilod

    Joined:
    Nov 12, 2013
    Posts:
    10,157
    Again though, this is so far off that it basically ends up being a pointless hypothetical. The human brain contains roughly 331,125,827 times as many neurons as that of a worm, and even OpenWorm dramatically simplifies how its neuron system works just to function at all.
     
  16. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    That's literally what I said?

    I was just talking about the difference in hardware, and how emulation is possible despite the difference. I wasn't saying we will emulate the human brain anytime soon; everything else was spent dispelling the mystique of neural networks.

    I probably suck at conveying ideas :(
     
  17. Kiwasi

    Kiwasi

    Joined:
    Dec 5, 2013
    Posts:
    16,860
    Replaying Skyrim on the console right now. Voice commands to use inventory items would certainly be a boon.

    However, it could be argued this is just an artifact of how poor Skyrim's inventory system is to begin with. I've seen plenty of systems that are way better, where voice commands would be superfluous.
     
  18. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,192
    neoshaman and Kiwasi like this.
  19. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    Dang, it's just the dialog lines, not inventory :( you got me excited. That's the most clunky way to do it, but I guess it's a proof of concept that it can be modded in.

    Some more thoughts on voice, less about defending it :p

    Skyrim on gamepad already has the d-pad shortcuts anyway (and probably the num keys on PC); that's a smaller breadth than voice, and you can't move, as you must lift the same thumb that controls the movement stick. BUT let's be frank, inventory tends to be accessed in downtime anyway (that's the secret counter-argument I kept to myself while using inventory as an example).

    Commands to companions would have been a better use if "precise aiming" wasn't needed (assuming a game like Mass Effect); cursor pointing remains the superior option when augmented with context-sensitive mechanics. Everything that deals with precise space, selection of cloned objects (i.e. enemies), etc. would be hard; unique items (and therefore direct selection) are a much better use case.

    Voice probably lacks the evolution of complementary ideas like context-sensitive actions for direct input devices. Remember when text adventures had you type everything verb-noun, then click text to associate with nouns in the scene, then just click because objects have affordances (yeah, you're mostly going to open the door, no need to type "open the door").

    Humans tend to shorten high-frequency "voice items". I wonder if visual feedback of the last command would allow reference mechanics like we use in plain language, such as using a pronoun to refer to a previously mentioned noun. Just an example though, probably not that explicitly. Also, interrupted input is something I have no clue how to actually manage in a complex set of commands, because the finer you want to get, the longer and clunkier the commands become.
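
    The pronoun idea can be prototyped crudely. A toy sketch (the verb list and the heuristic are invented; real anaphora resolution is much harder):

```python
class VoiceContext:
    """Remember the last concrete noun so 'it' can refer back to it,
    the way on-screen feedback of the previous command might support."""
    VERBS = {"equip", "drop", "use", "it"}

    def __init__(self):
        self.last_noun = None

    def resolve(self, command):
        words = command.lower().split()
        if "it" in words and self.last_noun:
            words = [self.last_noun if w == "it" else w for w in words]
        # naive heuristic: the last non-verb word becomes the referent
        for w in reversed(words):
            if w not in self.VERBS:
                self.last_noun = w
                break
        return " ".join(words)

ctx = VoiceContext()
ctx.resolve("equip crossbow")  # resolves to "equip crossbow"
ctx.resolve("drop it")         # resolves to "drop crossbow"
```

    The interrupted-input problem is harder: this sketch has no notion of a half-finished command being corrected mid-utterance.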

    We tend to avoid language and voice for repetitive tasks; voice is more focused and directed toward unique state composition in unique situations. It's better if punctual: for example, it's best used to unleash a super technique when a bar is full in a fighting game (happens only once in a while, tends to be highly meaningful) rather than a regular punch (diluted meaning, happens many times a minute).

    So the ideal use of voice is:
    - low frequency of use
    - high latency situation
    - simple short utterances
    - highly meaningful to the context
    - with a breadth of unique items to select from
    - simultaneous with other inputs

    The closest thing I can think of is a surgeon at work lol. Or any similar situation. VR is probably a great fit because inputs are sparser than in other media, and menus are intrusive.
     
  20. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,192
    The latest beta release supports mapping voice commands to the keyboard and mouse, console commands, entries in your favorites menu (this one is Skyrim VR only), etc. Mention of it is made around the middle of the mod page.

    By the way, here is the quick start guide for using the framework they chose in Unity. It supports more than just Windows.

    https://docs.microsoft.com/en-us/az...rts/speech-to-text-from-microphone?tabs=unity
     
    Last edited: Feb 11, 2020
    neoshaman likes this.
  21. juggyruggy

    juggyruggy

    Joined:
    Feb 11, 2020
    Posts:
    2
    Oh for sure, can't wait for it to become reality. Just imagine, creating objects and making them move, all with the power of your voice!
     
  22. digiross

    digiross

    Joined:
    Jun 29, 2012
    Posts:
    323
    I've used both Bixby and Alexa and they don't understand a damn thing. IMHO the technology would have to improve drastically to even be viable. If it worked it could be very cool. Reminiscent of the Star Trek holodeck which is amazing. Current VR is a joke compared to it unfortunately. But we're not in the 23rd century yet. One can hope. LOL
     
  23. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    Even with perfect VR now, I'm not sure you would solve moving around in virtual space, and the force feedback associated with it. Abstraction of space is not just a tech limitation imho.
     
  24. Kiwasi

    Kiwasi

    Joined:
    Dec 5, 2013
    Posts:
    16,860
    Try out a heavy alchemy build. When you are accessing poisons and potions every few strikes Skyrim's inventory system gets old fast. Even with the quick access options.
     
    Ryiah and neoshaman like this.
  25. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,192
    I honestly believe one of the reasons stealth archers are popular in the Elder Scrolls games is that you can simply equip a bow and some arrows and never open your inventory again. Anything else requires you to micromanage, and it simply isn't enjoyable without an inventory overhaul mod.
     
    neoshaman and Kiwasi like this.
  26. Murgilod

    Murgilod

    Joined:
    Nov 12, 2013
    Posts:
    10,157
    I mean, there's that and the fact that stealth archery attack bonuses are absolutely ridiculously OP when you consider arrow DPS, and coupling that with stealth's later abilities basically letting you jump back into stealth whenever you want...
     
    neoshaman, Kiwasi and Ryiah like this.