  1. <!doctype html>
  2. <html>
  3. <head>
  4. <title>CodeMirror: Internals</title>
  5. <link rel="stylesheet" type="text/css" href="http://fonts.googleapis.com/css?family=Droid+Sans|Droid+Sans:bold"/>
  6. <link rel="stylesheet" type="text/css" href="css/docs.css"/>
  7. <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
  8. <style>dl dl {margin: 0;}</style>
  9. </head>
  10. <body>
  11. <h1><span class="logo-braces">{ }</span> <a href="http://codemirror.net/">CodeMirror</a></h1>
  12. <pre class="grey">
  13. <img src="css/baboon.png" class="logo" alt="logo"/>/* (Re-) Implementing A Syntax-
  14. Highlighting Editor in JavaScript */
  15. </pre>
  16. <div class="clear"><div class="leftbig blk">
  17. <p style="font-size: 85%" id="intro">
  18. <strong>Topic:</strong> JavaScript, code editor implementation<br>
  19. <strong>Author:</strong> Marijn Haverbeke<br>
  20. <strong>Date:</strong> March 2nd 2011
  21. </p>
  22. <p>This is a followup to
  23. my <a href="http://codemirror.net/story.html">Brutal Odyssey to the
  24. Dark Side of the DOM Tree</a> story. That one describes the
  25. mind-bending process of implementing (what would become) CodeMirror 1.
  26. This one describes the internals of CodeMirror 2, a complete rewrite
  27. and rethink of the old code base. I wanted to give this piece another
  28. Hunter Thompson copycat subtitle, but somehow that would be out of
  29. place—the process this time around was one of straightforward
  30. engineering, requiring no serious mind-bending whatsoever.</p>
  31. <p>So, what is wrong with CodeMirror 1? I'd estimate, by mailing list
  32. activity and general search-engine presence, that it has been
  33. integrated into about a thousand systems by now. The most prominent
  34. one, as of a few weeks ago,
  35. is <a href="http://googlecode.blogspot.com/2011/01/make-quick-fixes-quicker-on-google.html">Google
  36. Code's project hosting</a>. It works, and it's being used widely.</p>
  37. <p>Still, I did not start replacing it because I was bored. CodeMirror
  38. 1 was heavily reliant on <code>designMode</code>
  39. or <code>contentEditable</code> (depending on the browser). Neither of
  40. these is well specified (HTML5 tries
  41. to <a href="http://www.w3.org/TR/html5/editing.html#contenteditable">specify</a>
  42. their basics), and, more importantly, they tend to be one of the more
  43. obscure and buggy areas of browser functionality—CodeMirror, by using
  44. this functionality in a non-typical way, was constantly running up
  45. against browser bugs. WebKit wouldn't show an empty line at the end of
  46. the document, and in some releases would suddenly get unbearably slow.
  47. Firefox would show the cursor in the wrong place. Internet Explorer
  48. would insist on linkifying everything that looked like a URL or email
  49. address, a behaviour that can't be turned off. Some bugs I managed to
  50. work around (which was often a frustrating, painful process), others,
  51. such as the Firefox cursor placement, I gave up on, and had to tell
  52. user after user that they were known problems, but not something I
  53. could help.</p>
  54. <p>Also, there is the fact that <code>designMode</code> (which seemed
  55. to be less buggy than <code>contentEditable</code> in WebKit and
  56. Firefox, and was thus used by CodeMirror 1 in those browsers) requires
  57. a frame. Frames are another tricky area. It takes some effort to
  58. prevent getting tripped up by domain restrictions, they don't
  59. initialize synchronously, behave strangely in response to the back
  60. button, and, on several browsers, can't be moved around the DOM
  61. without having them re-initialize. They did provide a very nice way to
  62. namespace the library, though—CodeMirror 1 could freely pollute the
  63. namespace inside the frame.</p>
  64. <p>Finally, working with an editable document means working with
  65. selection in arbitrary DOM structures. Internet Explorer (8 and
  66. before) has an utterly different (and awkward) selection API than all
  67. of the other browsers, and even among the different implementations of
  68. <code>document.selection</code>, details about how exactly a selection
  69. is represented vary quite a bit. Add to that the fact that Opera's
  70. selection support tended to be very buggy until recently, and you can
  71. imagine why CodeMirror 1 contains 700 lines of selection-handling
  72. code.</p>
  73. <p>And that brings us to the main issue with the CodeMirror 1
  74. code base: The proportion of browser-bug-workarounds to real
  75. application code was getting dangerously high. By building on top of a
  76. few dodgy features, I put the system in a vulnerable position—any
  77. incompatibility and bugginess in these features, I had to paper over
  78. with my own code. Not only did I have to do some serious stunt-work to
  79. get it to work on older browsers (as detailed in the
  80. previous <a href="http://codemirror.net/story.html">story</a>), things
  81. also kept breaking in newly released versions, requiring me to come up
  82. with <em>new</em> scary hacks in order to keep up. This was starting
  83. to lose its appeal.</p>
  84. <h2 id="approach">General Approach</h2>
  85. <p>What CodeMirror 2 does is try to sidestep most of the hairy hacks
  86. that came up in version 1. I owe a lot to the
  87. <a href="http://ace.ajax.org">ACE</a> editor for inspiration on how to
  88. approach this.</p>
  89. <p>I absolutely did not want to be completely reliant on key events to
  90. generate my input. Every JavaScript programmer knows that key event
  91. information is horrible and incomplete. Some people (most awesomely
  92. Mihai Bazon with <a href="http://ymacs.org">Ymacs</a>) have been able
  93. to build more or less functioning editors by directly reading key
  94. events, but it takes a lot of work (the kind of never-ending, fragile
  95. work I described earlier), and will never be able to properly support
  96. things like multi-keystroke international character input.</p>
  97. <p>So what I do is focus a hidden textarea, and let the browser
  98. believe that the user is typing into that. What we show to the user is
  99. a DOM structure we built to represent his document. If this is updated
  100. quickly enough, and shows some kind of believable cursor, it feels
  101. like a real text-input control.</p>
  102. <p>Another big win is that this DOM representation does not have to
  103. span the whole document. Some CodeMirror 1 users insisted that they
  104. needed to put a 30 thousand line XML document into CodeMirror. Putting
  105. all that into the DOM takes a while, especially since, for some
  106. reason, an editable DOM tree is slower than a normal one on most
  107. browsers. If we have full control over what we show, we must only
  108. ensure that the visible part of the document has been added, and can
  109. do the rest only when needed. (Fortunately, the <code>onscroll</code>
  110. event works almost the same on all browsers, and lends itself well to
  111. displaying things only as they are scrolled into view.)</p>
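<p>As a rough illustration of rendering only what is scrolled into view, here is a minimal sketch that works out which lines need to be in the DOM. It assumes every line has the same pixel height, and the function and parameter names are made up for this example rather than taken from the CodeMirror source.</p>

```javascript
// Hypothetical helper: which lines must be rendered for a given scroll position?
// Assumes a fixed line height in pixels (no wrapped lines).
function visibleLineRange(scrollTop, viewportHeight, lineHeight, lineCount) {
  // First line that is at least partly visible.
  const from = Math.max(0, Math.floor(scrollTop / lineHeight));
  // One past the last line that is at least partly visible.
  const to = Math.min(lineCount, Math.ceil((scrollTop + viewportHeight) / lineHeight));
  return { from: from, to: to };
}

// Would be called from a scroll handler with the scroller's scrollTop
// and clientHeight; here the numbers are passed in directly.
const range = visibleLineRange(450, 300, 15, 30000);
console.log(range.from, range.to);
```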
  112. <h2 id="input">Input</h2>
  113. <p>ACE uses its hidden textarea only as a text input shim, and does
  114. all cursor movement and things like text deletion itself by directly
  115. handling key events. CodeMirror's way is to let the browser do its
  116. thing as much as possible, and not, for example, define its own set of
  117. key bindings. One way to do this would have been to have the whole
  118. document inside the hidden textarea, and after each key event update
  119. the display DOM to reflect what's in that textarea.</p>
  120. <p>That'd be simple, but it is not realistic. For even medium-sized
  121. documents the editor would be constantly munging huge strings, and get
  122. terribly slow. What CodeMirror 2 does is put the current selection,
  123. along with an extra line on the top and on the bottom, into the
  124. textarea.</p>
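<p>To make that concrete, the sketch below computes what might go into the textarea for a given selection. It is an assumption-laden illustration: the document is modelled as an array of line strings, positions as <code>{line, ch}</code> objects, and whole lines are copied. None of these names come from the actual implementation.</p>

```javascript
// Hypothetical sketch: build the hidden textarea's content from the
// selected lines plus one extra line above and below.
function textareaContent(lines, sel) {
  const first = Math.max(0, sel.from.line - 1);
  const last = Math.min(lines.length - 1, sel.to.line + 1);
  const text = lines.slice(first, last + 1).join("\n");

  // Convert a {line, ch} position into an offset inside `text`.
  function offset(pos) {
    let n = 0;
    for (let i = first; i < pos.line; i++) n += lines[i].length + 1; // +1 for "\n"
    return n + pos.ch;
  }

  return {
    text: text,
    selectionStart: offset(sel.from),
    selectionEnd: offset(sel.to),
    firstLine: first // needed later to map textarea offsets back to the document
  };
}

const doc = ["one", "two", "three", "four", "five"];
const content = textareaContent(doc, { from: { line: 2, ch: 1 }, to: { line: 2, ch: 3 } });
console.log(JSON.stringify(content.text));
```

<p>The returned <code>selectionStart</code> and <code>selectionEnd</code> are what would be assigned to the textarea's properties of the same name after setting its value.</p>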
  125. <p>This means that the arrow keys (and their ctrl-variations), home,
  126. end, etcetera, do not have to be handled specially. We just read the
  127. cursor position in the textarea, and update our cursor to match it.
  128. Also, copy and paste work pretty much for free, and people get their
  129. native key bindings, without any special work on my part. For example,
  130. I have emacs key bindings configured for Chrome and Firefox. There is
  131. no way for a script to detect this.</p>
  132. <p>Of course, since only a small part of the document sits in the
  133. textarea, keys like page up and ctrl-end won't do the right thing.
  134. CodeMirror is catching those events and handling them itself.</p>
  135. <h2 id="selection">Selection</h2>
  136. <p>Getting and setting the selection range of a textarea in modern
  137. browsers is trivial—you just use the <code>selectionStart</code>
  138. and <code>selectionEnd</code> properties. On IE you have to do some
  139. insane stuff with temporary ranges and compensating for the fact that
  140. moving the selection by a 'character' will treat \r\n as a single
  141. character, but even there it is possible to build functions that
  142. reliably set and get the selection range.</p>
  143. <p>But consider this typical case: When I'm somewhere in my document,
  144. press shift, and press the up arrow, something gets selected. Then, if
  145. I, still holding shift, press the up arrow again, the top of my
  146. selection is adjusted. The selection remembers where its <em>head</em>
  147. and its <em>anchor</em> are, and moves the head when we shift-move.
  148. This is a generally accepted property of selections, and done right by
  149. every editing component built in the past twenty years.</p>
  150. <p>But not something that the browser selection APIs expose.</p>
  151. <p>Great. So when someone creates an 'upside-down' selection, the next
  152. time CodeMirror has to update the textarea, it'll re-create the
  153. selection as an 'upside-up' selection, with the anchor at the top, and
  154. the next cursor motion will behave in an unexpected way—our second
  155. up-arrow press in the example above will not do anything, since it is
  156. interpreted in exactly the same way as the first.</p>
  157. <p>No problem. We'll just, ehm, detect that the selection is
  158. upside-down (you can tell by the way it was created), and then, when
  159. an upside-down selection is present, and a cursor-moving key is
  160. pressed in combination with shift, we quickly collapse the selection
  161. in the textarea to its start, allow the key to take effect, and then
  162. combine its new head with its old anchor to get the <em>real</em>
  163. selection.</p>
  164. <p>In short, scary hacks could not be avoided entirely in CodeMirror
  165. 2.</p>
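<p>The collapse-and-recombine trick can be sketched as follows. This is only a model: selections are plain offsets, and <code>nativeMove</code> is a made-up stand-in for whatever the browser does to a collapsed cursor when the key takes effect.</p>

```javascript
// A selection is {anchor, head}; it is "upside-down" when head < anchor.

// What the textarea can represent: just a start and an end, no direction.
function textareaRange(sel) {
  return { start: Math.min(sel.anchor, sel.head), end: Math.max(sel.anchor, sel.head) };
}

// Hypothetical sketch of extending an upside-down selection with shift+key.
// `nativeMove` models the browser moving a collapsed cursor.
function extendInvertedSelection(sel, nativeMove) {
  const collapsed = sel.head;            // 1. collapse to the start (the head)
  const newHead = nativeMove(collapsed); // 2. let the key take effect
  return { anchor: sel.anchor, head: newHead }; // 3. recombine with the old anchor
}

// Model "up arrow" as moving ten characters back.
const up = function (offset) { return Math.max(0, offset - 10); };
const next = extendInvertedSelection({ anchor: 25, head: 15 }, up);
console.log(next, textareaRange(next));
```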
  166. <p>And, the observant reader might ask, how do you even know that a
  167. key combo is a cursor-moving combo, if you claim you support any
  168. native key bindings? Well, we don't, but we can learn. The editor
  169. keeps a set of known cursor-movement combos (initialized to the
  170. predictable defaults), and updates this set when it observes that
  171. pressing a certain key had (only) the effect of moving the cursor.
  172. This, of course, doesn't work if the first time the key is used was
  173. for extending an inverted selection, but it works most of the
  174. time.</p>
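<p>A minimal sketch of that learning step might look like this. The class name, the string encoding of key combos, and the shape of the before/after snapshots are all assumptions made for the example.</p>

```javascript
// Hypothetical sketch: learn which key combos only move the cursor.
class MovementKeys {
  constructor(defaults) {
    this.known = new Set(defaults);
  }
  // `before` and `after` are snapshots of the textarea: {text, start, end}.
  observe(combo, before, after) {
    const textSame = before.text === after.text;
    const cursorMoved = before.start !== after.start || before.end !== after.end;
    // Only the cursor changed, so treat this combo as cursor movement.
    if (textSame && cursorMoved) this.known.add(combo);
  }
  isMovement(combo) {
    return this.known.has(combo);
  }
}

const keys = new MovementKeys(["Up", "Down", "Left", "Right", "Home", "End"]);
// An emacs-style binding that the script could not have known about.
keys.observe("Ctrl-P", { text: "ab\ncd", start: 4, end: 4 }, { text: "ab\ncd", start: 1, end: 1 });
console.log(keys.isMovement("Ctrl-P"));
```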
  175. <h2 id="update">Intelligent Updating</h2>
  176. <p>One thing that always comes up when you have a complicated internal
  177. state that's reflected in some user-visible external representation
  178. (in this case, the displayed code and the textarea's content) is
  179. keeping the two in sync. The naive way is to just update the display
  180. every time you change your state, but this is not only error prone
  181. (you'll forget), it also easily leads to duplicate work on big,
  182. composite operations. Then you start passing around flags indicating
  183. whether the display should be updated in an attempt to be efficient
  184. again and, well, at that point you might as well give up completely.</p>
  185. <p>I did go down that road, but then switched to a much simpler model:
  186. simply keep track of all the things that have been changed during an
  187. action, and then, only at the end, use this information to update the
  188. user-visible display.</p>
  189. <p>CodeMirror uses a concept of <em>operations</em>, which start by
  190. calling a specific set-up function that clears the state and end by
  191. calling another function that reads this state and does the required
  192. updating. Most event handlers, and all the user-visible methods that
  193. change state are wrapped like this. There's a method
  194. called <code>operation</code> that accepts a function, and returns
  195. another function that wraps the given function as an operation.</p>
  196. <p>It's trivial to extend this (as CodeMirror does) to detect nesting,
  197. and, when an operation is started inside an operation, simply
  198. increment the nesting count, and only do the updating when this count
  199. reaches zero again.</p>
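<p>Here is a small sketch of such an <code>operation</code> wrapper with a nesting count. The surrounding <code>makeEditor</code> factory, the change records, and the <code>updates</code> log (which stands in for the real display update) are invented for the example.</p>

```javascript
// Hypothetical sketch of operations with nesting.
function makeEditor() {
  let depth = 0;      // how many operations are currently open
  let changes = [];   // what has been changed during the outermost operation
  const updates = []; // stands in for "update the user-visible display"

  function startOperation() {
    if (depth++ === 0) changes = []; // only the outermost call clears the state
  }
  function endOperation() {
    // Only the outermost call reads the state and updates the display.
    if (--depth === 0 && changes.length > 0) updates.push(changes.slice());
  }
  // Accepts a function and returns that function wrapped as an operation.
  function operation(f) {
    return function () {
      startOperation();
      try {
        return f.apply(this, arguments);
      } finally {
        endOperation();
      }
    };
  }

  const replaceLine = operation(function (n, text) {
    changes.push({ line: n, text: text });
  });
  // A composite method: two nested operations, but a single display update.
  const replaceTwo = operation(function () {
    replaceLine(0, "a");
    replaceLine(1, "b");
  });

  return { replaceLine: replaceLine, replaceTwo: replaceTwo, updates: updates };
}

const editor = makeEditor();
editor.replaceTwo();
console.log(editor.updates.length);
```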
  200. <p>If we have a set of changed ranges and know the currently shown
  201. range, we can (with some awkward code to deal with the fact that
  202. changes can add and remove lines, so we're dealing with a changing
  203. coordinate system) construct a map of the ranges that were left
  204. intact. We can then compare this map with the part of the document
  205. that's currently visible (based on scroll offset and editor height) to
  206. determine whether something needs to be updated.</p>
  207. <p>CodeMirror uses two update algorithms—a full refresh, where it just
  208. discards the whole part of the DOM that contains the edited text and
  209. rebuilds it, and a patch algorithm, where it uses the information
  210. about changed and intact ranges to update only the out-of-date parts
  211. of the DOM. When more than 30 percent (which is the current heuristic,
  212. might change) of the lines need to be updated, the full refresh is
  213. chosen (since it's faster to do than painstakingly finding and
  214. updating all the changed lines); otherwise it does the
  215. patching (so that, if you scroll a line or select another character,
  216. the whole screen doesn't have to be re-rendered).</p>
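<p>The choice between the two algorithms could be sketched like this. The 30 percent threshold comes from the text above; the function name, the half-open <code>{from, to}</code> line ranges, and the extra "none" result are assumptions of the sketch.</p>

```javascript
// Heuristic from the text: above this fraction of dirty lines, refresh everything.
const FULL_REFRESH_THRESHOLD = 0.3;

// Hypothetical sketch: decide how to update the visible lines
// [visibleFrom, visibleTo) given changed ranges [{from, to}, ...].
function chooseUpdate(visibleFrom, visibleTo, changedRanges) {
  let dirty = 0;
  for (let line = visibleFrom; line < visibleTo; line++) {
    const changed = changedRanges.some(function (r) {
      return line >= r.from && line < r.to;
    });
    if (changed) dirty++;
  }
  if (dirty === 0) return "none"; // nothing visible was touched
  const visible = visibleTo - visibleFrom;
  return dirty / visible > FULL_REFRESH_THRESHOLD ? "refresh" : "patch";
}

console.log(chooseUpdate(0, 20, [{ from: 3, to: 4 }]));
console.log(chooseUpdate(0, 20, [{ from: 0, to: 10 }]));
```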
  217. <p>All updating uses <code>innerHTML</code> rather than direct DOM
  218. manipulation, since that still seems to be by far the fastest way to
  219. build documents. There's a per-line function that combines the
  220. highlighting, <a href="manual.html#markText">marking</a>, and
  221. selection info for that line into a snippet of HTML. The patch updater
  222. uses this to reset individual lines, while the refresh updater builds an
  223. HTML chunk for the whole visible document at once, and then uses a
  224. single <code>innerHTML</code> update to do the refresh.</p>
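<p>A per-line HTML builder in that spirit might look roughly like the following. The token format (<code>[text, style]</code> pairs), the <code>pre</code>-per-line markup, and the use of the style name directly as a class name are assumptions, and marking and selection are left out.</p>

```javascript
// Escape the characters that are special in HTML text content.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical sketch: turn one line's tokens into a snippet of HTML.
// `tokens` is an array of [text, style] pairs; style may be null.
function lineToHtml(tokens) {
  let html = "";
  for (const [text, style] of tokens) {
    const escaped = escapeHtml(text);
    html += style ? '<span class="' + style + '">' + escaped + "</span>" : escaped;
  }
  return "<pre>" + html + "</pre>";
}

// Full refresh: one HTML chunk for all visible lines, meant to be assigned
// to a container's innerHTML in a single step.
function refreshHtml(visibleLines) {
  return visibleLines.map(lineToHtml).join("");
}

console.log(lineToHtml([["a<b ", null], ["12", "number"]]));
```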
  225. <h2 id="parse">Parsers can be Simple</h2>
  226. <p>When I wrote CodeMirror 1, I
  227. thought <a href="http://codemirror.net/story.html#parser">interruptible
  228. parsers</a> were a hugely scary and complicated thing, and I used a
  229. bunch of heavyweight abstractions to keep this supposed complexity
  230. under control: parsers
  231. were <a href="http://bob.pythonmac.org/archives/2005/07/06/iteration-in-javascript/">iterators</a>
  232. that consumed input from another iterator, and used funny
  233. closure-resetting tricks to copy and resume themselves.</p>
  234. <p>This made for a rather nice system, in that parsers formed strictly
  235. separate modules, and could be composed in predictable ways.
  236. Unfortunately, it was quite slow (stacking three or four iterators on
  237. top of each other), and extremely intimidating to people not used to a
  238. functional programming style.</p>
  239. <p>With a few small changes, however, we can keep all those
  240. advantages, but simplify the API and make the whole thing less
  241. indirect and inefficient. CodeMirror
  242. 2's <a href="manual.html#modeapi">mode API</a> uses explicit state
  243. objects, and makes the parser/tokenizer a function that simply takes a
  244. state and a character stream abstraction, advances the stream one
  245. token, and returns the way the token should be styled. This state may
  246. be copied, optionally in a mode-defined way, in order to be able to
  247. continue a parse at a given point. Even someone who's never touched a
  248. lambda in his life can understand this approach. Additionally, far
  249. fewer objects are allocated in the course of parsing now.</p>
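<p>To show how little machinery this needs, here is a toy mode for numbers and <code>/* */</code> comments that can span lines. The <code>StringStream</code> class is a simplified stand-in written for this example, not the real stream object, and the method names are only loosely modelled on the mode API linked above.</p>

```javascript
// Simplified stand-in for a character stream over one line.
class StringStream {
  constructor(line) { this.line = line; this.pos = 0; }
  eol() { return this.pos >= this.line.length; }
  next() { return this.line.charAt(this.pos++); }
  skipToEnd() { this.pos = this.line.length; }
  // Match a ^-anchored regexp at the current position and consume it.
  match(re) {
    const m = this.line.slice(this.pos).match(re);
    if (!m || m.index !== 0) return null;
    this.pos += m[0].length;
    return m[0];
  }
}

// A toy mode: explicit state, and a token function that advances one token.
const toyMode = {
  startState: function () { return { inComment: false }; },
  copyState: function (state) { return { inComment: state.inComment }; },
  token: function (stream, state) {
    if (!state.inComment && stream.match(/^\/\*/)) state.inComment = true;
    if (state.inComment) {
      if (stream.match(/^.*?\*\//)) state.inComment = false;
      else stream.skipToEnd();
      return "comment";
    }
    if (stream.match(/^\d+/)) return "number";
    stream.next(); // anything else: a single unstyled character
    return null;
  }
};

// Tokenize one line, mutating `state` so the next line can continue from it.
function highlightLine(mode, line, state) {
  const stream = new StringStream(line);
  const tokens = [];
  while (!stream.eol()) {
    const start = stream.pos;
    const style = mode.token(stream, state);
    tokens.push([line.slice(start, stream.pos), style]);
  }
  return tokens;
}

const state = toyMode.startState();
highlightLine(toyMode, "12 /* a", state);
const saved = toyMode.copyState(state); // resume point: inside a comment
console.log(highlightLine(toyMode, "b */ 7", state), saved);
```

<p>Because the state is a plain object, continuing a parse from the middle of the document is just a matter of copying the state saved for an earlier line and calling <code>highlightLine</code> from there.</p>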
  250. <p>The biggest speedup comes from the fact that the parsing no longer
  251. has to touch the DOM though. In CodeMirror 1, on an older browser, you
  252. could <em>see</em> the parser work its way through the document,
  253. managing some twenty lines in each 50-millisecond time slice it got. It
  254. was reading its input from the DOM, and updating the DOM as it went
  255. along, which any experienced JavaScript programmer will immediately
  256. spot as a recipe for slowness. In CodeMirror 2, the parser usually
  257. finishes the whole document in a single 100-millisecond time slice—it
  258. manages some 1500 lines during that time on Chrome. All it has to do
  259. is munge strings, so there is no real reason for it to be slow
  260. anymore.</p>
  261. <h2 id="summary">What Gives?</h2>
  262. <p>Given all this, what can you expect from CodeMirror 2? First, the
  263. good:</p>
  264. <ul>
  265. <li><strong>Small.</strong> The base library is some 32k when minified
  266. now, 12k when gzipped. It's smaller than its own logo.</li>
  267. <li><strong>Lightweight.</strong> CodeMirror 2 initializes very
  268. quickly, and does almost no work when it is not focused. This means
  269. you can treat it almost like a textarea, and have multiple instances on a
  270. page without trouble.</li>
  271. <li><strong>Huge document support.</strong> Since highlighting is
  272. really fast, and no DOM structure is being built for non-visible
  273. content, you don't have to worry about locking up your browser when a
  274. user enters a megabyte-sized document.</li>
  275. <li><strong>Extended API.</strong> Some things kept coming up in the
  276. mailing list, such as marking pieces of text or lines, which were
  277. extremely hard to do with CodeMirror 1. The new version has proper
  278. support for these built in.</li>
  279. <li><strong>Tab support.</strong> Tabs inside editable documents were,
  280. for some reason, a no-go. At least six different people announced they
  281. were going to add tab support to CodeMirror 1, none survived (I mean,
  282. none delivered a working version). CodeMirror 2 no longer removes tabs
  283. from your document.</li>
  284. <li><strong>Sane styling.</strong> <code>iframe</code> nodes aren't
  285. really known for respecting document flow. Now that an editor instance
  286. is a plain <code>div</code> element, it is much easier to size it to
  287. fit the surrounding elements. You don't even have to make it scroll if
  288. you do not <a href="demo/resize.html">want to</a>.</li>
  289. </ul>
  290. <p>Then, the bad:</p>
  291. <ul>
  292. <li><strong>No line-wrapping.</strong> I'd have liked to get
  293. line-wrapping to work, but it doesn't match the model I'm using very
  294. well. It is important that cursor movement in the textarea matches
  295. what you see on the screen, and it seems to be impossible to have the
  296. lines wrapped the same in the textarea and the normal DOM.</li>
  297. <li><strong>Some cursor flakiness.</strong> The textarea hack does not
  298. really do justice to the complexity of cursor handling—a selection is
  299. typically more than just an offset into a string. For example, if you
  300. use the up and down arrow keys to move to a shorter line and then
  301. back, you'll end up in your old position in most editor controls, but
  302. CodeMirror 2 currently doesn't remember the 'real' cursor column in
  303. this case. These can be worked around on a case-by-case basis, but
  304. I haven't put much energy into that yet.</li>
  305. <li><strong>Limited interaction with the editable panel.</strong>
  306. Since the element you're looking at is not a real editable panel,
  307. native browser behaviour for editable controls doesn't work
  308. automatically. Through a lot of event glue code, I've managed to make
  309. drag and drop work pretty well, and have context menus work on most
  310. browsers (except Opera). Middle-click paste on Firefox in Linux is
  311. broken until someone finds a way to intercept it.</li>
  312. </ul>
  313. </div><div class="rightsmall blk">
  314. <h2>Contents</h2>
  315. <ul>
  316. <li><a href="#intro">Introduction</a></li>
  317. <li><a href="#approach">General Approach</a></li>
  318. <li><a href="#input">Input</a></li>
  319. <li><a href="#selection">Selection</a></li>
  320. <li><a href="#update">Intelligent Updating</a></li>
  321. <li><a href="#parse">Parsing</a></li>
  322. <li><a href="#summary">What Gives?</a></li>
  323. </ul>
  324. </div></div>
  325. <div style="height: 2em">&nbsp;</div>
  326. </body></html>