| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389 | <!doctype html><html>  <head>    <title>CodeMirror: Internals</title>    <link rel="stylesheet" type="text/css" href="http://fonts.googleapis.com/css?family=Droid+Sans|Droid+Sans:bold"/>    <link rel="stylesheet" type="text/css" href="css/docs.css"/>    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>    <style>dl dl {margin: 0;}</style>  </head>  <body><h1><span class="logo-braces">{ }</span> <a href="http://codemirror.net/">CodeMirror</a></h1><pre class="grey"><img src="css/baboon.png" class="logo" alt="logo"/>/* (Re-) Implementing A Syntax-   Highlighting Editor in JavaScript */</pre><div class="clear"><div class="leftbig blk"><p style="font-size: 85%" id="intro">  <strong>Topic:</strong> JavaScript, code editor implementation<br>  <strong>Author:</strong> Marijn Haverbeke<br>  <strong>Date:</strong> March 2nd 2011</p><p>This is a followup tomy <a href="http://codemirror.net/story.html">Brutal Odyssey to theDark Side of the DOM Tree</a> story. That one describes themind-bending process of implementing (what would become) CodeMirror 1.This one describes the internals of CodeMirror 2, a complete rewriteand rethink of the old code base. I wanted to give this piece anotherHunter Thompson copycat subtitle, but somehow that would be out ofplace—the process this time around was one of straightforwardengineering, requiring no serious mind-bending whatsoever.</p><p>So, what is wrong with CodeMirror 1? I'd estimate, by mailing listactivity and general search-engine presence, that it has beenintegrated into about a thousand systems by now. The most prominentone, since a few weeks,being <a href="http://googlecode.blogspot.com/2011/01/make-quick-fixes-quicker-on-google.html">Googlecode's project hosting</a>. It works, and it's being used widely.</a><p>Still, I did not start replacing it because I was bored. CodeMirror1 was heavily reliant on <code>designMode</code>or <code>contentEditable</code> (depending on the browser). Neither ofthese are well specified (HTML5 triesto <a href="http://www.w3.org/TR/html5/editing.html#contenteditable">specify</a>their basics), and, more importantly, they tend to be one of the moreobscure and buggy areas of browser functionality—CodeMirror, by usingthis functionality in a non-typical way, was constantly running upagainst browser bugs. WebKit wouldn't show an empty line at the end ofthe document, and in some releases would suddenly get unbearably slow.Firefox would show the cursor in the wrong place. Internet Explorerwould insist on linkifying everything that looked like a URL or emailaddress, a behaviour that can't be turned off. Some bugs I managed towork around (which was often a frustrating, painful process), others,such as the Firefox cursor placement, I gave up on, and had to telluser after user that they were known problems, but not something Icould help.</p><p>Also, there is the fact that <code>designMode</code> (which seemedto be less buggy than <code>contentEditable</code> in Webkit andFirefox, and was thus used by CodeMirror 1 in those browsers) requiresa frame. Frames are another tricky area. It takes some effort toprevent getting tripped up by domain restrictions, they don'tinitialize synchronously, behave strangely in response to the backbutton, and, on several browsers, can't be moved around the DOMwithout having them re-initialize. They did provide a very nice way tonamespace the library, though—CodeMirror 1 could freely pollute thenamespace inside the frame.</p><p>Finally, working with an editable document means working withselection in arbitrary DOM structures. Internet Explorer (8 andbefore) has an utterly different (and awkward) selection API than allof the other browsers, and even among the different implementations of<code>document.selection</code>, details about how exactly a selectionis represented vary quite a bit. Add to that the fact that Opera'sselection support tended to be very buggy until recently, and you canimagine why CodeMirror 1 contains 700 lines of selection-handlingcode.</p><p>And that brings us to the main issue with the CodeMirror 1code base: The proportion of browser-bug-workarounds to realapplication code was getting dangerously high. By building on top of afew dodgy features, I put the system in a vulnerable position—anyincompatibility and bugginess in these features, I had to paper overwith my own code. Not only did I have to do some serious stunt-work toget it to work on older browsers (as detailed in theprevious <a href="http://codemirror.net/story.html">story</a>), thingsalso kept breaking in newly released versions, requiring me to come upwith <em>new</em> scary hacks in order to keep up. This was startingto lose its appeal.</p><h2 id="approach">General Approach</h2><p>What CodeMirror 2 does is try to sidestep most of the hairy hacksthat came up in version 1. I owe a lot to the<a href="http://ace.ajax.org">ACE</a> editor for inspiration on how toapproach this.</p><p>I absolutely did not want to be completely reliant on key events togenerate my input. Every JavaScript programmer knows that key eventinformation is horrible and incomplete. Some people (most awesomelyMihai Bazon with <a href="http://ymacs.org">Ymacs</a>) have been ableto build more or less functioning editors by directly reading keyevents, but it takes a lot of work (the kind of never-ending, fragilework I described earlier), and will never be able to properly supportthings like multi-keystoke international character input.</p><p>So what I do is focus a hidden textarea, and let the browserbelieve that the user is typing into that. What we show to the user isa DOM structure we built to represent his document. If this is updatedquickly enough, and shows some kind of believable cursor, it feelslike a real text-input control.</p><p>Another big win is that this DOM representation does not have tospan the whole document. Some CodeMirror 1 users insisted that theyneeded to put a 30 thousand line XML document into CodeMirror. Puttingall that into the DOM takes a while, especially since, for somereason, an editable DOM tree is slower than a normal one on mostbrowsers. If we have full control over what we show, we must onlyensure that the visible part of the document has been added, and cando the rest only when needed. (Fortunately, the <code>onscroll</code>event works almost the same on all browsers, and lends itself well todisplaying things only as they are scrolled into view.)</p><h2 id="input">Input</h2><p>ACE uses its hidden textarea only as a text input shim, and doesall cursor movement and things like text deletion itself by directlyhandling key events. CodeMirror's way is to let the browser do itsthing as much as possible, and not, for example, define its own set ofkey bindings. One way to do this would have been to have the wholedocument inside the hidden textarea, and after each key event updatethe display DOM to reflect what's in that textarea.</p><p>That'd be simple, but it is not realistic. For even medium-sizeddocument the editor would be constantly munging huge strings, and getterribly slow. What CodeMirror 2 does is put the current selection,along with an extra line on the top and on the bottom, into thetextarea.</p><p>This means that the arrow keys (and their ctrl-variations), home,end, etcetera, do not have to be handled specially. We just read thecursor position in the textarea, and update our cursor to match it.Also, copy and paste work pretty much for free, and people get theirnative key bindings, without any special work on my part. For example,I have emacs key bindings configured for Chrome and Firefox. There isno way for a script to detect this.</p><p>Of course, since only a small part of the document sits in thetextarea, keys like page up and ctrl-end won't do the right thing.CodeMirror is catching those events and handling them itself.</p><h2 id="selection">Selection</h2><p>Getting and setting the selection range of a textarea in modernbrowsers is trivial—you just use the <code>selectionStart</code>and <code>selectionEnd</code> properties. On IE you have to do someinsane stuff with temporary ranges and compensating for the fact thatmoving the selection by a 'character' will treat \r\n as a singlecharacter, but even there it is possible to build functions thatreliably set and get the selection range.</p><p>But consider this typical case: When I'm somewhere in my document,press shift, and press the up arrow, something gets selected. Then, ifI, still holding shift, press the up arrow again, the top of myselection is adjusted. The selection remembers where its <em>head</em>and its <em>anchor</em> are, and moves the head when we shift-move.This is a generally accepted property of selections, and done right byevery editing component built in the past twenty years.</p><p>But not something that the browser selection APIs expose.</p><p>Great. So when someone creates an 'upside-down' selection, the nexttime CodeMirror has to update the textarea, it'll re-create theselection as an 'upside-up' selection, with the anchor at the top, andthe next cursor motion will behave in an unexpected way—our secondup-arrow press in the example above will not do anything, since it isinterpreted in exactly the same way as the first.</p><p>No problem. We'll just, ehm, detect that the selection isupside-down (you can tell by the way it was created), and then, whenan upside-down selection is present, and a cursor-moving key ispressed in combination with shift, we quickly collapse the selectionin the textarea to its start, allow the key to take effect, and thencombine its new head with its old anchor to get the <em>real</em>selection.</p><p>In short, scary hacks could not be avoided entirely in CodeMirror2.</p><p>And, the observant reader might ask, how do you even know that akey combo is a cursor-moving combo, if you claim you support anynative key bindings? Well, we don't, but we can learn. The editorkeeps a set known cursor-movement combos (initialized to thepredictable defaults), and updates this set when it observes thatpressing a certain key had (only) the effect of moving the cursor.This, of course, doesn't work if the first time the key is used wasfor extending an inverted selection, but it works most of thetime.</p><h2 id="update">Intelligent Updating</h2><p>One thing that always comes up when you have a complicated internalstate that's reflected in some user-visible external representation(in this case, the displayed code and the textarea's content) iskeeping the two in sync. The naive way is to just update the displayevery time you change your state, but this is not only error prone(you'll forget), it also easily leads to duplicate work on big,composite operations. Then you start passing around flags indicatingwhether the display should be updated in an attempt to be efficientagain and, well, at that point you might as well give up completely.</p><p>I did go down that road, but then switched to a much simpler model:simply keep track of all the things that have been changed during anaction, and then, only at the end, use this information to update theuser-visible display.</p><p>CodeMirror uses a concept of <em>operations</em>, which start bycalling a specific set-up function that clears the state and end bycalling another function that reads this state and does the requiredupdating. Most event handlers, and all the user-visible methods thatchange state are wrapped like this. There's a methodcalled <code>operation</code> that accepts a function, and returnsanother function that wraps the given function as an operation.</p><p>It's trivial to extend this (as CodeMirror does) to detect nesting,and, when an operation is started inside an operation, simplyincrement the nesting count, and only do the updating when this countreaches zero again.</p><p>If we have a set of changed ranges and know the currently shownrange, we can (with some awkward code to deal with the fact thatchanges can add and remove lines, so we're dealing with a changingcoordinate system) construct a map of the ranges that were leftintact. We can then compare this map with the part of the documentthat's currently visible (based on scroll offset and editor height) todetermine whether something needs to be updated.</p><p>CodeMirror uses two update algorithms—a full refresh, where it justdiscards the whole part of the DOM that contains the edited text andrebuilds it, and a patch algorithm, where it uses the informationabout changed and intact ranges to update only the out-of-date partsof the DOM. When more than 30 percent (which is the current heuristic,might change) of the lines need to be updated, the full refresh ischosen (since it's faster to do than painstakingly finding andupdating all the changed lines), in the other case it does thepatching (so that, if you scroll a line or select another character,the whole screen doesn't have to be re-rendered).</p><p>All updating uses <code>innerHTML</code> rather than direct DOMmanipulation, since that still seems to be by far the fastest way tobuild documents. There's a per-line function that combines thehighlighting, <a href="manual.html#markText">marking</a>, andselection info for that line into a snippet of HTML. The patch updateruses this to reset individual lines, the refresh updater builds anHTML chunk for the whole visible document at once, and then uses asingle <code>innerHTML</code> update to do the refresh.</p><h2 id="parse">Parsers can be Simple</h2><p>When I wrote CodeMirror 1, Ithought <a href="http://codemirror.net/story.html#parser">interruptableparsers</a> were a hugely scary and complicated thing, and I used abunch of heavyweight abstractions to keep this supposed complexityunder control: parserswere <a href="http://bob.pythonmac.org/archives/2005/07/06/iteration-in-javascript/">iterators</a>that consumed input from another iterator, and used funnyclosure-resetting tricks to copy and resume themselves.</p><p>This made for a rather nice system, in that parsers formed strictlyseparate modules, and could be composed in predictable ways.Unfortunately, it was quite slow (stacking three or four iterators ontop of each other), and extremely intimidating to people not used to afunctional programming style.</p><p>With a few small changes, however, we can keep all thoseadvantages, but simplify the API and make the whole thing lessindirect and inefficient. CodeMirror2's <a href="manual.html#modeapi">mode API</a> uses explicit stateobjects, and makes the parser/tokenizer a function that simply takes astate and a character stream abstraction, advances the stream onetoken, and returns the way the token should be styled. This state maybe copied, optionally in a mode-defined way, in order to be able tocontinue a parse at a given point. Even someone who's never touched alambda in his life can understand this approach. Additionally, farfewer objects are allocated in the course of parsing now.</p><p>The biggest speedup comes from the fact that the parsing no longerhas to touch the DOM though. In CodeMirror 1, on an older browser, youcould <em>see</em> the parser work its way through the document,managing some twenty lines in each 50-millisecond time slice it got. Itwas reading its input from the DOM, and updating the DOM as it wentalong, which any experienced JavaScript programmer will immediatelyspot as a recipe for slowness. In CodeMirror 2, the parser usuallyfinishes the whole document in a single 100-millisecond time slice—itmanages some 1500 lines during that time on Chrome. All it has to dois munge strings, so there is no real reason for it to be slowanymore.</p><h2 id="summary">What Gives?</h2><p>Given all this, what can you expect from CodeMirror 2? First, thegood:</p><ul><li><strong>Small.</strong> the base library is some 32k when minifiednow, 12k when gzipped. It's smaller than its own logo.</li><li><strong>Lightweight.</strong> CodeMirror 2 initializes veryquickly, and does almost no work when it is not focused. This meansyou can treat it almost like a textarea, have multiple instances on apage without trouble.</li><li><strong>Huge document support.</strong> Since highlighting isreally fast, and no DOM structure is being built for non-visiblecontent, you don't have to worry about locking up your browser when auser enters a megabyte-sized document.</li><li><strong>Extended API.</strong> Some things kept coming up in themailing list, such as marking pieces of text or lines, which wereextremely hard to do with CodeMirror 1. The new version has propersupport for these built in.</li><li><strong>Tab support.</strong> Tabs inside editable documents were,for some reason, a no-go. At least six different people announced theywere going to add tab support to CodeMirror 1, none survived (I mean,none delivered a working version). CodeMirror 2 no longer removes tabsfrom your document.</li><li><strong>Sane styling.</strong> <code>iframe</code> nodes aren'treally known for respecting document flow. Now that an editor instanceis a plain <code>div</code> element, it is much easier to size it tofit the surrounding elements. You don't even have to make it scroll ifyou do not <a href="demo/resize.html">want to</a>.</li></ul><p>Then, the bad:</p><ul><li><strong>No line-wrapping.</strong> I'd have liked to getline-wrapping to work, but it doesn't match the model I'm using verywell. It is important that cursor movement in the textarea matcheswhat you see on the screen, and it seems to be impossible to have thelines wrapped the same in the textarea and the normal DOM.</li><li><strong>Some cursor flakiness.</strong> The textarea hack does notreally do justice to the complexity of cursor handling—a selection istypically more than just an offset into a string. For example, if youuse the up and down arrow keys to move to a shorter line and thenback, you'll end up in your old position in most editor controls, butCodeMirror 2 currently doesn't remember the 'real' cursor column inthis case. These can be worked around on a case-by-case basis, butI haven't put much energy into that yet.</li><li><strong>Limited interaction with the editable panel.</strong>Since the element you're looking at is not a real editable panel,native browser behaviour for editable controls doesn't workautomatically. Through a lot of event glue code, I've managed to makedrag and drop work pretty well, have context menus work on mostbrowsers (except Opera). Middle-click paste on Firefox in Linux isbroken until someone finds a way to intercept it.</li></ul></div><div class="rightsmall blk">    <h2>Contents</h2>    <ul>      <li><a href="#intro">Introduction</a></li>      <li><a href="#approach">General Approach</a></li>      <li><a href="#input">Input</a></li>      <li><a href="#selection">Selection</a></li>      <li><a href="#update">Intelligent Updating</a></li>      <li><a href="#parse">Parsing</a></li>      <li><a href="#summary">What Gives?</a></li>    </ul></div></div><div style="height: 2em"> </div></body></html>
 |