This article is originally available at Sentry.engineering.
How does Node.js load the entry point?
To decide which loader to use, Node.js relies on several factors, the most important of which is the file extension. If the file extension is .mjs, Node.js uses the ES module loader. If the file extension is .cjs, Node.js uses the CommonJS module loader. If the file extension is .js, Node.js uses the CommonJS module loader when the nearest package.json file has a "type": "commonjs" field (or simply doesn't have a type field), and the ES module loader when it has a "type": "module" field.
This decision is made in the lib/internal/modules/run_main.js file. You can see a simplified version of the code below:
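A hedged sketch of that decision logic (names and structure are simplified approximations, not the actual Node.js source):

```javascript
// Simplified sketch of the loader decision in run_main.js.
// `packageType` stands in for the "type" field read from the
// nearest package.json (undefined when the field is absent).
function shouldUseESMLoader(mainPath, packageType) {
  if (mainPath.endsWith('.mjs')) return true;  // always ESM
  if (mainPath.endsWith('.cjs')) return false; // always CJS
  // .js files defer to the "type" field; a missing field means CommonJS.
  return packageType === 'module';
}
```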
readPackageScope traverses the directory tree upwards until it finds a package.json file. Prior to the optimizations described in this post, readPackageScope called an internal version of fs.readFileSync at each level until it found a package.json file. Every such synchronous call performs a filesystem operation and crosses into the Node.js C++ layer. Depending on the value/type it returns, this is a performance bottleneck because of the cost of serializing and deserializing data between the two layers. This is why we want to call readPackageScope as rarely as possible.
How Node.js parses package.json
readPackage calls an internal version of fs.readFileSync to read the package.json file. This synchronous call returns a string from the Node.js C++ layer, which is then parsed with V8's JSON.parse() method. If the JSON is valid, Node.js creates an object containing the fields the remaining loader code needs, such as pkg.type. If the JSON has faulty syntax, Node.js throws an error and exits the process.
The output of this function is then cached in an internal Map to avoid calling readPackageScope again for the same path. This cache is kept for the rest of the process lifetime.
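A minimal sketch of that caching pattern, with the file reader injected so the memoization is easy to see (illustrative, not the internal source):

```javascript
// Parsed package.json objects are memoized in a Map keyed by path
// for the lifetime of the process.
const packageJsonCache = new Map();

function readPackageCached(jsonPath, readFile /* injected reader */) {
  const cached = packageJsonCache.get(jsonPath);
  if (cached !== undefined) return cached; // skip the filesystem and JSON.parse

  // Faulty JSON syntax throws here, mirroring the fatal-error behavior.
  const parsed = JSON.parse(readFile(jsonPath));
  packageJsonCache.set(jsonPath, parsed);
  return parsed;
}
```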
package.json fields and the reader
Before we dive into what optimizations we can do, let's see how Node.js uses these fields. The common use cases in the Node.js codebase for parsing and re-using package.json fields are:
- pkg.imports is used to resolve different modules according to your input.
- pkg.main is used to resolve the entry point of the application.
- pkg.type is used to resolve the module format of the file.
- pkg.name is used if there is a self-referencing require/import.
Additionally, Node.js supports an experimental Subresource Integrity check, which uses the result of this package.json parse to validate the integrity of the file.
The most important usage is that, for every require/import call, Node.js needs to know the module format of the target file. For example, if the user requires an npm module that uses ESM from a CommonJS (CJS) application, Node.js needs to parse the package.json file of that module and throw an error if the npm package is ESM. Because of all of these calls and usages across the ESM and CJS loaders, the package.json reader is one of the most important parts of the Node.js loader implementation.
Optimizing the caching layer
To optimize package.json reader performance, I first moved the caching layer to the C++ side, bringing the implementation as close to the filesystem call as possible. This decision forced the JSON file to be parsed in C++. At that point, I had 2 options:
- Use V8's v8::JSON::Parse() method, which takes a v8::String as input and returns a v8::Value as output.
- Use the simdjson library to parse the JSON file.
Since the filesystem returns a string, converting that string into a v8::String just to retrieve the keys and values as std::string didn't make sense. Therefore, I added simdjson.
Avoiding serialization cost
To avoid returning unnecessarily large objects, I changed the signature of the readPackage function to return only the necessary fields. This change simplified shouldUseESMLoader as follows:
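A hedged sketch of the narrowed return value (the shape below is an approximation, not the actual patch): only the fields the loaders consult survive the boundary crossing.

```javascript
// Return a small, flat object instead of the full parsed JSON,
// dropping fields the loaders never read (scripts, dependencies, ...).
function readPackage(rawJson) {
  const parsed = JSON.parse(rawJson);
  return {
    name: parsed.name,
    main: parsed.main,
    type: parsed.type,
    exports: parsed.exports,
    imports: parsed.imports,
  };
}

// With that, the check for a .js entry point reduces to one field read.
function shouldUseESMLoader(pkg) {
  return pkg.type === 'module';
}
```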
Moving the caching layer to C++ also enabled us to expose micro-functions that return enums (integers) instead of strings, for example to get the module type of a package.json file.
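The enum idea can be illustrated as follows (the constant names and values here are illustrative assumptions, not Node.js's actual internal constants):

```javascript
// A small integer is cheaper to pass across the C++/JS boundary
// than a "module"/"commonjs" string.
const PackageType = Object.freeze({ NONE: 0, COMMONJS: 1, MODULE: 2 });

function packageTypeToEnum(typeField) {
  if (typeField === 'module') return PackageType.MODULE;
  if (typeField === 'commonjs') return PackageType.COMMONJS;
  return PackageType.NONE; // missing or unrecognized "type"
}

// The JS side then compares integers instead of strings.
function isModulePackage(typeField) {
  return packageTypeToEnum(typeField) === PackageType.MODULE;
}
```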
Reducing C++ calls from 3 to 1
readPackageConfig is implemented in the ESM loader under the getPackageScopeConfig function. This function made a lot of C++ calls in order to resolve and retrieve the applicable package.json file. The implementation was as follows:
The getPackageScopeConfig function calls into C++ 3 times, through functions such as:
- new URL() if the input is a string
Moving this whole function to C++ enabled us to reduce the number of C++ calls to just one. This conversion also required implementing url.fileURLToPath() in C++.
The PR that contains these changes can be found on GitHub.
On a real-world Svelte application, the results showed 5% faster ESM execution. It also reduced the size of the cache stored by the loader by avoiding unnecessary fields.
❯ hyperfine 'node ../sveltejs-realworld/node_modules/vite/dist/node/cli.js --version' 'out/Release/node ../sveltejs-realworld/node_modules/vite/dist/node/cli.js --version' -w 10
Benchmark 1: node ../sveltejs-realworld/node_modules/vite/dist/node/cli.js --version
  Time (mean ± σ):     101.4 ms ±  0.6 ms    [User: 96.6 ms, System: 10.8 ms]
  Range (min … max):   100.3 ms … 102.5 ms    28 runs

Benchmark 2: out/Release/node ../sveltejs-realworld/node_modules/vite/dist/node/cli.js --version
  Time (mean ± σ):      96.3 ms ±  0.5 ms    [User: 90.9 ms, System: 10.1 ms]
  Range (min … max):    95.6 ms …  98.1 ms    30 runs

Summary
  out/Release/node ../sveltejs-realworld/node_modules/vite/dist/node/cli.js --version ran
    1.05 ± 0.01 times faster than node ../sveltejs-realworld/node_modules/vite/dist/node/cli.js --version