rijoja 5 years ago

This is really cool and an inspiration to me. As a matter of fact, I am working on something very similar.

I don't know if you are familiar with the Dasher project for text input, but as of now I'm trying to improve on that work, partly by increasing how many letters are available simultaneously by protecting the line of text upon a fractal surface. That should make more efficient use of a 2D surface, theoretically infinitely so.

As far as autocomplete is concerned, my approach is to try to do exactly this but on a character basis. I think this can lead to some interesting advantages; for example, different dialects give rise to words that don't always conform to dictionary spellings.
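
A minimal sketch of the kind of character-level chain I have in mind (plain Python; the toy corpus, order, and function names are just placeholders for illustration):

    from collections import defaultdict, Counter

    def train_char_chain(corpus, order=2):
        # Map each context of `order` characters to counts of the following character.
        model = defaultdict(Counter)
        for i in range(len(corpus) - order):
            model[corpus[i:i + order]][corpus[i + order]] += 1
        return model

    def suggest_next(model, context, order=2, k=3):
        # Return the k most likely next characters for the given context.
        counts = model.get(context[-order:], Counter())
        return [ch for ch, _ in counts.most_common(k)]

    corpus = "the quick brown fox jumps over the lazy dog and the fox runs"
    model = train_char_chain(corpus)
    print(suggest_next(model, "th"))  # most likely ['e'] on this tiny corpus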

The next level would be to go one step higher, so to speak. If we imagine Markov chains on letters as the first level and the same chains on words as the second, I'd say that the third level in our hierarchy would be to apply Markov chains to groups of words, grouped by proximity in a word2vec space.

Having Markov chains working on groups of word2vec words would give us a statistical analogue of grammar, without having to implement grammar rules programmatically. Doing that by hand would inevitably lead to missed corner cases, or, failing that, to an overly strict algorithm that would hinder intentional, purposeful abuse of grammar.
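
Roughly what I imagine for that third level, sketched in Python (the word-to-cluster mapping is a stand-in; in practice it would come from clustering word2vec vectors):

    from collections import defaultdict, Counter

    def train_cluster_chain(sentences, word_to_cluster):
        # First-order Markov chain over word2vec cluster ids instead of raw words.
        chain = defaultdict(Counter)
        for sentence in sentences:
            ids = [word_to_cluster.get(w, -1) for w in sentence.split()]  # -1 = unknown word
            for a, b in zip(ids, ids[1:]):
                chain[a][b] += 1
        return chain

    # Toy stand-in for clusters learned from word2vec:
    word_to_cluster = {"the": 0, "a": 0, "cat": 1, "dog": 1, "sleeps": 2, "barks": 2}
    chain = train_cluster_chain(["the cat sleeps", "a dog barks"], word_to_cluster)
    print(chain[0])  # Counter({1: 2}): the determiner cluster is followed by the noun cluster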

Maybe this is already being implemented, as it seems to me the logical next step. Anybody got any info on this?

  • ben_w 5 years ago

    A friend of mine worked on Dasher (hi Alan, if you’re reading this!); I thought they achieved the same goal with a hyperbolic space rather than a fractal space?

    Given this was about a decade ago, there are probably better predictive models than whatever was in the demo he showed me, but I wouldn’t be surprised if their model was a Markov chain where the size of each next-letter option was a function of probability.

    (Memory haze: I was more excited by the hyperbolic space than the word model when he showed it to me, and it was c. 2008)
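
    If it helps, here is a rough sketch of the kind of layout rule I mean, with made-up probabilities (this is only my guess at how space gets allocated, not a description of Dasher's actual code):

        def layout(next_letter_probs, box_height=1.0):
            # Give each next-letter option a slice of the box proportional to its probability.
            total = sum(next_letter_probs.values())
            y = 0.0
            boxes = []
            for letter, p in sorted(next_letter_probs.items(), key=lambda kv: -kv[1]):
                h = box_height * p / total
                boxes.append((letter, y, h))  # (letter, top, height)
                y += h
            return boxes

        # Made-up probabilities after the context "th":
        print(layout({"e": 0.7, "a": 0.15, "o": 0.1, "i": 0.05}))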

    • rijoja 5 years ago

      I'd say that the prediction model they are using would basically end up being the same as Markov chains. So if you just slide through a Dasher structure set up with words, you'd more or less get the most probable Markov chain output.

      Really, I started this project before having any idea of Dasher, so I'm coming from a bit of a different angle. My original idea was to make use of trees that are selected step by step, i.e. with discrete movements instead of continuous movement. So basically you'd have a tree that you step down through. The most frequent symbols would appear closer to the root and as such have shorter paths describing them.
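
      A rough sketch of what I mean by shorter paths for frequent symbols; it's essentially a Huffman-style tree built from symbol frequencies (the frequencies below are made up):

          import heapq

          def build_tree(freqs):
              # Greedily merge the two least frequent nodes, so frequent symbols
              # end up closer to the root and need fewer discrete steps to reach.
              heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
              heapq.heapify(heap)
              tiebreak = len(heap)
              while len(heap) > 1:
                  f1, _, left = heapq.heappop(heap)
                  f2, _, right = heapq.heappop(heap)
                  heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))
                  tiebreak += 1
              return heap[0][2]

          def paths(node, prefix=""):
              # Enumerate the step sequence ("0" = left, "1" = right) for each symbol.
              if isinstance(node, tuple):
                  yield from paths(node[0], prefix + "0")
                  yield from paths(node[1], prefix + "1")
              else:
                  yield node, prefix

          tree = build_tree({"e": 12.7, "t": 9.1, "a": 8.2, "q": 0.1, "z": 0.07})
          print(dict(paths(tree)))  # 'e' and 't' get short paths, 'q' and 'z' long ones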

      https://www.bitchute.com/video/aKl7jUrtcOt8/

      As I see it, the Markov chain-ness is not the intent of the algorithm but rather a byproduct of it.

      I only heard of Dasher last December, and my mind was blown. It was almost eerie how many things we had approached similarly.

      As for the rendering I am proposing: I'm in the process of rewriting all my work up until now in Nim as a form of advanced procrastination, but I aim to have a prototype up and running quite soon. One thing that could be optimised in Dasher is that some of the letters are not visible at all times. Why is this? Well, because they are drawn along a line, so at best you can have sqrt(w*w + h*h)/font_height letters at a time, given a diagonal layout.

      Imagine instead that we draw the line along a semicircular arc of a circle with radius w/2. Then we could have (pi*w/2)/font_height letters drawn out along that arc at a time.
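
      To make that concrete with made-up numbers: on a hypothetical 1920x1080 screen with a 16 px font height, the diagonal gives roughly sqrt(1920^2 + 1080^2)/16, about 138 letters, while the semicircular arc gives roughly (pi*1920/2)/16, about 188.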

      However, if we draw a fractal curve that more or less traces the same arc, the jagged edges could in principle extend the line to infinite length. Of course we're still limited by resolution. Still, the advantage here is that the 2-dimensional nature of the screen is put to use. From the edges that hold no letter, a deeper structure could furthermore be extended.

      The next advantage, where this becomes a synthesis with my more tree-like structures, is that the most frequent letters could be placed closer to the cursor/player. As such, the average distance the cursor has to travel would be reduced by quite a lot.

      Sounds good in theory, at least in my head. But you can't really be sure until you have tested it in code, right?

      This will be out within the coming few days.

      Btw, I'm doing this research on my own, so any interest in participating in any shape or form, be it suggestions, testing or whatever, would be greatly appreciated.

      https://www.laarc.io/l/enow
      https://www.bitchute.com/channel/eruditenow/
      My rather playful first stab :D http://sigma.eruditenow.com/

  • bryanrasmussen 5 years ago

    I am familiar with Dasher (although my only considerations regarding it are as a tool to improve user accessibility), but I don't understand what is meant by "protecting the line of text upon a fractal surface"

    Even if I change protecting to projecting I don't get it.

    • rijoja 5 years ago

      Whoops, what an embarrassing typo, sorry about that.

      Well, imagine that you want to draw as many intervals of the same length as possible along some given shape.

      Let's start out with a straight line. Then you can only have as many symbols as the height allows.

      To continue: if we instead were to space them out over the circumference of a circle, we would fit more symbols within the same width.

      If we go even further and give the circle jagged edges that go in and out, we theoretically have an infinite length of line, but screen resolution and our eyes put a stop to that.

      So basically we make use of the area of the screen instead of only a line.

      Did that make more sense to you? The fractal thingy is just an experiment and not the main point of the comment, so I gave a very brief description of it. It may very well be that my ability to describe it in proper mathematical terms is a bit lacking.

      I will have a demo of it within a few days or so; if you are interested, keep your eyes open.

      • bryanrasmussen 5 years ago

        Ok thanks, I probably need to see it; I'm not the most visual thinker, so it's hard for me to envision the idea.

        • rijoja 5 years ago

          Yeah, at first I did this as a fun side project, a fun way of doing something of absolutely no use to anybody. Then I thought of accessibility issues, and now my focus is somewhat on VR, where I envision that keyboards will break the immersion. Pointing and moving a pointer on a virtual keyboard would be fun for consumer tasks, but for actual work, say programming and whatnot, it simply doesn't cut it.

stewbrew 5 years ago

I wonder what the result would look like if this were applied to source code. IMHO a probabilistic algorithm for code completion could turn out interesting.

Most code completion algorithms work deterministically, deducing the set of completion candidates from the receiver's type/class or a list of keywords. Given that people/teams tend to name variables in a certain fashion, a probabilistic completion algorithm could make use of this and adapt to team/project-specific conventions. Given a team's code base, one could probably build a pretty good code completion algorithm without any knowledge of the programming language.
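
A toy sketch of the idea, language-agnostic in the sense that it only tokenizes identifiers (the regex and the frequency-based ranking are naive placeholders, not how a real engine would do it):

    import re
    from collections import Counter

    IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

    def build_index(source_files):
        # Count how often each identifier appears in the team's code base.
        counts = Counter()
        for text in source_files:
            counts.update(IDENT.findall(text))
        return counts

    def complete(prefix, counts, k=5):
        # Rank candidates sharing the typed prefix by corpus frequency.
        candidates = [(n, ident) for ident, n in counts.items() if ident.startswith(prefix)]
        return [ident for n, ident in sorted(candidates, reverse=True)[:k]]

    code = ["def fetch_user(user_id): ...", "user = fetch_user(user_id)", "user_name = user.name"]
    counts = build_index(code)
    print(complete("user", counts))  # ['user_id', 'user', 'user_name'], most frequent first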

likelycomplete[1] tries to do this in a dilettantish, ad-hoc way for Vim. It rates completion candidates (gathered from previously seen code) on the basis of context information. It's hampered by the limited performance of Vimscript, though. A full-fledged solution would require an external server.

[1] https://www.vim.org/scripts/script.php?script_id=4889

  • xkapastel 5 years ago

    > I wonder what the result would look like if this were applied to source code. IMHO a probabilistic algorithm for code completion could turn out interesting.

    Neural program synthesis is similar to what you describe, here's a sample paper:

    > We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards. We employ an iterative optimization scheme, where we train an RNN on a dataset of K best programs from a priority queue of the generated programs so far. Then, we synthesize new programs and add them to the priority queue by sampling from the RNN. We benchmark our algorithm, called priority queue training (or PQT), against genetic algorithm and reinforcement learning baselines on a simple but expressive Turing complete programming language called BF. Our experimental results show that our simple PQT algorithm significantly outperforms the baselines. By adding a program length penalty to the reward function, we are able to synthesize short, human readable programs.

    https://arxiv.org/abs/1801.03526

  • agumonkey 5 years ago

    The day a completer can fix a Y situation I'll be damned

darkpuma 5 years ago

I've thought about using a markov chain suggester to trip up stylometric analysis, but never got around to creating some sort of practical UX for it.

I think if you plugged it into Vim or Emacs's autocompletion functionality, that might do the trick.

xkapastel 5 years ago

Keep in mind this is essentially the same concept as the big scary AI from OpenAI that is making the news recently. They use neural nets, not markov chains, but the idea is similar: given a word, predict the next word.

> It's surprising how easy this can be turned into something rather practically useful

Given the above, it's not so surprising: this word prediction problem is fundamental, with a wide range of applications.

  • Sean1708 5 years ago

    > Keep in mind this is essentially the same concept as the big scary AI from OpenAI that is making the news recently. They use neural nets, not markov chains, but the idea is similar: given a word, predict the next word.

    Isn't this kind of like saying that a plane is essentially the same concept as a car because they both transport people from A to B? "given a word, predict the next word" is the problem statement (i.e. what the problem is) but that's not very interesting, what's interesting is the solution (i.e. how you solve the problem). Markov Chains and the kind of Neural Nets used in the text generator that made the news are very different, even if they're attempting to solve the same problem.

TomK32 5 years ago

Would be nice to test this as a reply bot for spam mails.