You have complex mathematical models (millions of cells, hundreds thousand of formulas) in Excel. And you need to run it browser and mobile without excel. I will talk how we created own spreadsheet engine compatible with MS Excel which allows us to run any Excel model without Excel. I will talk about:
* Architecture
* Algorithms
* JavaScript performance optimization.
How to create a high performance excel engine in java script
1. Create own Excel with
JavaScript
Viktor Turskyi
CEO at WebbyLab
2019
2. Viktor Turskyi
● CEO and principal architect at
WebbyLab
● Open source developer
● More than 15 years of experience
● Delivered more than 60 projects of
different scale
● Did projects for 5 companies from
Fortune 500 list
3. Business task
Business logic is in Excel and you need to code
it in your app and run in browser and mobile.
This business logic is complex mathematical
models.
8. Requirements
● High performance (<2s full recompute)
● Small file size (suitable for work in
browser)
● Offline work in browser
● Work on server
● Offline work on tablets (iOS, Android)
28. Mistake 1: Trying to write own parser
from scratch
Own parser
1. Complex
2. A lot of time
3. Expensive
29. 90% of the work is the same as
writing a parser for programming
language
30. Good solution - ANTLR
1. Parser generator based on Grammars
(including JS)
2. Lexer and Parser
3. Emits AST (Abstract Syntax Tree)
4. The fastest and the most powerful
http://www.antlr.org/
34. Components
● LocalRunner(Engine) - works with model,
processes all cells dependencies
● Formula Evaluator - computes one
formula
● Address Parser - parses address in
runtime
● Functions - Excel functions
implementation
37. Implementation of EXCEL functions
● One function - one module.
● No side effects
● Use dependency injection
● Test test test test (excel functions often
does not work as documented)
Call example:
SQRT([ 9 ]) returns 3
SUM([2, [5, 6, 7, 9], 1 ]) returns 30
47. We can represent dependencies in
form of directed acyclic graph (DAG)
A1=1;
A2=A1+2;
A3=A1+A2;
Now we can recompute
dependent cells on
changes
48. Mistake 3: reay on synthetic models
too much
Test model with million cells -
2 seconds for recompute
Real model with million cells
- 1 hour for recompute
Reason: we recompute the
same cells several times
49. You can sort your dependency graph
with topological sort
Each cell will be calculated only one time
50. We did it: it worked for test files but did not
work on real models. Why?
51. Reason: Our graph is more than 10k nodes
deep. We got stack overflow (JS limits call
stack to 10k frames).
52. What to do
Do not use recursion, traverse graph manually
with own stack.
Real model results:
No toposort - 1 hour
With toposort - 6 seconds
56. OFFSET breaks everything
=OFFSET(D3, 3, -2) - displays the value in cell
B6
A2 = A1+OFFSET(D3, 3, -2).
Does A2 depend on B6? Do we have any
problem with it?
60. Conclusion
● Dependency injection (and SOLID) everywhere
● Make everything modular.
● You will need a lot of tests. There are tons of
edge cases in Excel behavior.
● Measure performance on real models.
● You need to have some sort of automatic model
tester.
● Create convenient debug tools (you will spent a
lot of time debugging)
● Understand how V8 works