Tripleequals

Tales of the monorepo: migrating existing projects to the Lerna-driven monorepo with Yarn workspaces

Recently I volunteered to take on the challenge of migrating multiple full-blown commercial frontend projects into a single repository. Sit back and hear the story of trial and error, at the end of which I can say that I managed to pull it off.

Motivation

The frontend ecosystem in my company grows really fast. Even though we all try to keep an eye on everything going on around us, it is quite tricky to discover existing pieces of the codebase that - after minor adjustments - can be easily reused across multiple projects. We often end up with two or three components doing exactly the same thing with different implementations under the hood, which means that the surface for potential bugs is bigger, and we need to maintain two or three times as much code as we would with a proper architecture that raises awareness of existing modules.

Splitting different business subdomains relying on the same technological stack (JS + React) into separate modules makes perfect sense. Even with code splitting and other modern optimization techniques, there is no point in creating one huge monolith. Projects must be independent and keeping them separate makes their build process easier to customize and avoids tight coupling with a specific set of dependencies that may not necessarily be used by each of them. Some of the code we craft, however, should have the ability to be easily plugged into other projects - shared atomic components (to be consistent with styling guidelines), code style plugin configs, or even whole screens that - despite displaying different data - are present in multiple software packages.

While we all admit that projects have to be independent in general, it is also natural to agree that in order to follow the DRY principle and reduce the amount of duplicated code, some pieces should be reused. Placing multiple JS-based projects and their common dependencies in separate repositories without an automated process that links them together and keeps them up-to-date kills reusability.

[Figure: sharing-dependencies-multiple-repos]

The major bottlenecks in the above flowchart, which represents the current feature development process, are indicated by asterisks. Some of the steps also require manual work that involves context (repository) switching, which reduces development efficiency even more.

When we intend to change something that is truly shared, we need to keep in mind that other projects dependent on that part of the software will be affected. Without enough system-wide knowledge about the entire frontend ecosystem, we can easily miss the places where the reusable piece of code we have just modified is used and, thus, potentially introduce a regression in a module we have not even touched.

Ash nazg durbatulûk (one ring to rule them all)

While the problem described in the previous section may not sound like a big issue when dealing with just a few repositories, it becomes a huge obstacle when we need to maintain a larger distributed codebase. That quickly creates the need for an orchestrator that eliminates the manual work and speeds up the development process by getting rid of the effort of synchronizing PRs across repositories. Looks like a perfect use case for a monorepo.

Fortunately, various tools already exist that help with this process and - at least in theory - make migrating repositories to a single monorepo a piece of cake. Given that this is the first time we are doing this, and our frontend environment is already huge, we decided to give Lerna a try. Of course, it has some limitations and some strict rules that we need to stick to, but given that it seems to have broader community support than its alternatives and supports Yarn workspaces natively, it sounded like a natural first choice.

The longer I tried to adjust Lerna to our needs, the more eager I became to use Rush instead. Because a lot of effort had already been put into the monorepo migration, I eventually decided to stick with Lerna - the transition to Rush looks pretty straightforward, though, so I may dedicate some time to it in the future just to compare their pros and cons.

Automating the most problematic steps in the current process is one of Lerna's capabilities. Additionally, given that all internal dependencies will be part of one repository, there will be no need for time-consuming context switching anymore. With hot reloading properly configured, the build output immediately reflects the most recent changes, without explicit modification of the node_modules directory or the obligation to stop and restart the entire development script (yarn dev), which is yet another development experience improvement that we have all been missing.

[Figure: sharing-dependencies-monorepo]

Looks and feels much easier and much more intuitive, doesn't it?

Of course, to ensure that our change in the shared fragment (shared_lib in our case) has not resulted in any unwanted consequences, all test suites for the projects that depend on it must still be run. It is easy for Lerna to determine which child packages are affected by the newly added code, so it is pretty straightforward to include the relevant test runs in the new automated workflow. As a result, developers do not need to worry about running test scripts by hand.

Yarn workspaces vs. Lerna

Before we jump into the actual migration, it is reasonable to figure out what will be happening behind the scenes. Lerna is often wrongly perceived as an alternative to the mechanism provided by Yarn workspaces. Understanding that it serves a completely different purpose is essential to getting the entire underlying process right. When Yarn is the default package manager, Lerna should be treated as a tool built on top of Yarn workspaces that utilizes that feature during the dependency resolution and build stages. (Lerna also supports npm - in that case, it provides a fundamental workspaces implementation on its own.) Leaving these two stages up to the package manager's implementation makes it easy to define the boundaries between the monorepo framework and Yarn. All of the responsibilities listed below will be referenced and explained in more depth in the next paragraphs, so do not worry if you start to feel a bit lost now.

Yarn workspaces:

  • maintaining single lockfile;
  • creating symbolic links to internal dependencies for target subprojects;
  • monorepo-independent builds.

Lerna:

  • identifies which subprojects have been affected by recent commits;
  • automatic SemVer versioning and tagging;
  • running subsets of commands based on what has changed;
  • packages publishing.

Such a clear separation of concerns simplifies debugging and helps to locate and eventually narrow down potential problems related to the tools we will be using on a daily basis from now on. It is crucial to make sure that all developers are on the same page, especially because the enhanced development process will differ from what we are all used to.

Prerequisites

While the official migration docs do not include the step described in this section, I strongly recommend completing it before executing lerna import (which imports a project into the monorepo's scope) for the first time.

One crucial thing about Yarn workspaces that people tend not to pay much attention to is how the integrity of dependencies is handled. Even though you may already have a per-project lockfile generated in the scope of a single workspace, it will be ignored. When dealing with workspaces, there is only one lockfile, located in the root directory, that Yarn treats as the single source of truth. It may be unintuitive at first glance (especially when you think about building single workspaces independently), but it really makes dependency resolution far less prone to conflicts.

Because of that, I decided to review the existing projects that are going to be migrated and update their dependencies so that they all use matching versions wherever possible. By default, yarn add <dependency> (without the @<version|tag> suffix) registers the latest version of the package as your project's dependency and generates an entry prefixed with ^ in the dependencies section of the corresponding package.json. Imagine a scenario in which you want to migrate multiple projects using different versions of the same third-party library:

  • 1st project - "third-party-module": "^1.5.2" in package.json;
  • 2nd project - "third-party-module": "^1.0.1" in package.json;
  • 3rd project - "third-party-module": "^2.0.0" in package.json.

How many versions of third-party-module will be installed assuming that entries for these three are located in the single lockfile?

The answer is 2.

It is evident that for the 3rd project ^2.0.0 will tell the package manager to fetch and install the latest release within the second major version. However, the entries of both the 1st and the 2nd project accept any 1.x release that is not older than the one listed, so a single version (say, third-party-module@1.7.0, assuming that is the latest 1.x release) will satisfy both of them.
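To make the counting concrete, here is a minimal sketch of caret-range resolution written from scratch. It is not Yarn's actual resolver: it ignores existing lockfile entries, pre-release tags and the special caret rules for 0.x versions, and the list of published versions is a made-up assumption.

```javascript
// Parse "MAJOR.MINOR.PATCH" into numbers (pre-release tags are out of scope).
function parse(version) {
  const [major, minor, patch] = version.split(".").map(Number);
  return { major, minor, patch };
}

// Comparator: negative, zero or positive, like the one Array#sort expects.
function compare(a, b) {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

// Caret semantics for major >= 1: same major version, not older than the base.
function satisfiesCaret(version, range) {
  const base = parse(range.replace(/^\^/, ""));
  const candidate = parse(version);
  return candidate.major === base.major && compare(candidate, base) >= 0;
}

// Pick the highest published version accepted by the range, or null.
function resolve(range, published) {
  const matching = published
    .filter((version) => satisfiesCaret(version, range))
    .sort((a, b) => compare(parse(a), parse(b)));
  return matching.length > 0 ? matching[matching.length - 1] : null;
}

// Hypothetical registry contents for third-party-module.
const published = ["1.0.1", "1.5.2", "1.7.0", "2.0.0", "2.3.1"];

// The three package.json entries from the list above.
const requested = { project1: "^1.5.2", project2: "^1.0.1", project3: "^2.0.0" };

const resolved = Object.fromEntries(
  Object.entries(requested).map(([project, range]) => [project, resolve(range, published)])
);
const distinctVersions = new Set(Object.values(resolved));

console.log(resolved);
console.log(distinctVersions.size);
```

With these assumed registry contents, project1 and project2 both land on 1.7.0 and project3 on 2.3.1, so two distinct versions end up in the lockfile.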

To be clear - the semantic versioning specification states that minor version updates (1.0.0 vs. 1.1.0) must not introduce any breaking changes in the package's API, so in most cases you should be fine without analyzing dependency versions beforehand. In practice, some package authors still violate this rule, so there is a chance that you will end up with build errors (or, even worse and harder to spot, runtime errors) caused by an accidental version bump of a third-party module.

It is always good to have your dependencies up-to-date anyway, isn't it?

Showtime!

For the sake of this guide, we are going to assume that we have three different repositories that we want to place in the monorepo we are about to create. repo_a and repo_b represent projects that need to be built and deployed independently. shared_lib contains the part of the code that is meant to be shared by the other monorepo members and to be publicly available for projects that are not yet migrated. We do not want to publish NPM packages for either repo_a or repo_b; they are rather "private" projects, not available for imports, that will later be included in our CD pipeline.

In this section we are going to heavily utilize Lerna's CLI commands, so obviously you need to have it installed beforehand - yarn global add lerna does the trick. Make sure you are running this command with administrator rights.

Okay, no matter whether I convinced you to update your dependencies or not, it is time to begin the actual migration. Let's start by initializing our monorepo with yarn init. Do not forget to put the new project in a Git repository (ours is hosted on GitHub) - Lerna relies on commit history and git tags to determine which subprojects have been affected by recent updates. When Lerna is successfully registered as the project's dependency, you can initialize the monorepo with lerna init --independent. After the basic setup, the root directory structure should look similar to the one below:

.
|-- .git/
|-- packages/
|-- lerna.json
|-- package.json

To make sure that Yarn will be used as a default package manager and enable workspaces support, we need to add the following self-descriptive lines to lerna.json - the file read by Lerna to detect monorepo configuration:

 {
   "packages": [
     "packages/*"
   ],
   "version": "independent",
+  "npmClient": "yarn",
+  "useWorkspaces": true
 }

together with this minor modification to package.json:

 {
-  "name": "root",
+  "name": "@tripleequalsdev/frontend",
   "private": true,
+  "workspaces": [
+    "packages/*"
+  ],
   "devDependencies": {
     "lerna": "^3.22.1"
   }
 }

And... configuration-wise, that is all we need. After committing that initial scaffolding to the newly created repository, we are ready to move the existing projects to the monorepo's scope managed by Lerna. Time to make use of lerna import, the part of the basic API dedicated precisely to what we want to achieve.

$ lerna import <path_to_repo_a> --preserve-commit --flatten
$ lerna import <path_to_repo_b> --preserve-commit --flatten
$ lerna import <path_to_shared_lib> --preserve-commit --flatten

lerna import copies over the commit history of the repository being migrated, which may be handy for tracking the changes made before the move to a different repository. The --preserve-commit option additionally keeps the original committers and commit dates instead of attributing every imported commit to the person running the import. Unfortunately, Lerna does not do a great job of migrating commit history (especially when commits contain file renames or removals, which is a pretty likely scenario), hence the suggestion to use it in combination with the --flatten flag, which imports each merge commit as a single change.

Once all projects are migrated, and they are safe and sound in the packages/ directory, it is time to say goodbye to all existing local lockfiles - Yarn will not use them anymore, so you can safely (well, more or less safely, depending on whether you have revisited all the dependencies of the migrated projects) remove them.

Keep in mind that even if some configuration entries in the projects' package.json files did not matter to you before, you now need to fill them in because they are going to be used internally by Lerna. Double-check that the projects you have just migrated have a valid name property defined - it is going to be used to resolve internal workspaces when you reference them in another project's dependencies section.
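That double-check can be automated. Below is a minimal sketch, assuming the manifests have already been read into memory; the checkWorkspaces helper, the scope argument and the package names other than @tripleequalsdev/shared_lib are hypothetical and only serve the illustration.

```javascript
// Report manifests without a usable "name" and scoped dependencies
// that do not match the name of any workspace in the monorepo.
function checkWorkspaces(manifests, scope) {
  const problems = [];
  const names = new Set();

  // First pass: collect workspace names, flag the missing ones.
  for (const { dir, manifest } of manifests) {
    if (typeof manifest.name !== "string" || manifest.name.trim() === "") {
      problems.push(`${dir}: missing "name"`);
    } else {
      names.add(manifest.name);
    }
  }

  // Second pass: every dependency within our scope should be a known workspace.
  for (const { dir, manifest } of manifests) {
    const deps = { ...manifest.dependencies, ...manifest.devDependencies };
    for (const dep of Object.keys(deps)) {
      if (dep.startsWith(`${scope}/`) && !names.has(dep)) {
        problems.push(`${dir}: "${dep}" does not match any workspace name`);
      }
    }
  }
  return problems;
}

// Hypothetical state right after the import: repo_b has no name,
// and shared_lib was named with a dash instead of an underscore.
const manifests = [
  {
    dir: "packages/repo_a",
    manifest: {
      name: "@tripleequalsdev/repo_a",
      dependencies: { "@tripleequalsdev/shared_lib": "^1.0.0" },
    },
  },
  {
    dir: "packages/repo_b",
    manifest: { dependencies: { "@tripleequalsdev/shared_lib": "^1.0.0" } },
  },
  {
    dir: "packages/shared_lib",
    manifest: { name: "@tripleequalsdev/shared-lib", version: "1.0.0" },
  },
];

const problems = checkWorkspaces(manifests, "@tripleequalsdev");
problems.forEach((problem) => console.log(problem));
```

The sketch only compares names; it deliberately does not verify that the requested version range matches the version of the workspace.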

Another required manual step is to replace all references to shared_lib in package.json files that point to the old dependency origin, which is no longer relevant. In our case, the change in both packages/repo_a/package.json and packages/repo_b/package.json looks as follows:

-    "@tripleequalsdev/shared_lib": "git://github.com/tripleequalsdev/shared_lib.git#master",
+    "@tripleequalsdev/shared_lib": "^1.0.0",

To generate the brand new top-level lockfile and fetch & link all the required packages for our entire ecosystem, simply run lerna bootstrap (you don't have to be in the root directory; you can run it from anywhere within the monorepo), which in our case (with workspaces enabled) acts as an alias for the well-known old-school yarn or yarn install. Now you can verify that your project works by going into its directory (under packages/) and executing the build/develop/test/whatever script that you have typically been running in the development process. As simple as that.

If you wonder what is happening under the hood, you do not need to look far for the explanation. When a dependency you have listed in either devDependencies or dependencies is located within the monorepo, instead of reaching for the package registry, Yarn will create a symbolic link to the directory containing the requested workspace and put it in the top-level node_modules. The rest of the dependencies will be fetched and installed as usual. It is also important to notice that shared dependencies (both internal and external) will be placed in the node_modules directory of the monorepo's root. When you inspect the node_modules directories created in the scopes of subprojects, they will contain only project-specific dependencies that, due to a version mismatch, cannot be shared across the entire ecosystem.
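If you want to see the linking with your own eyes, a small Node.js script can tell a symlinked workspace apart from a regularly installed copy. This is only a sketch: the describeDependency helper is a hypothetical name, and it relies on nothing beyond Node's built-in fs and path modules.

```javascript
const fs = require("fs");
const path = require("path");

// Classify how a package is present in the given node_modules directory.
function describeDependency(nodeModulesDir, packageName) {
  // Scoped names ("@scope/name") map to nested directories.
  const location = path.join(nodeModulesDir, ...packageName.split("/"));
  let stats;
  try {
    stats = fs.lstatSync(location); // lstat inspects the link itself, not its target
  } catch (error) {
    if (error.code === "ENOENT") return { kind: "missing" };
    throw error;
  }
  if (stats.isSymbolicLink()) {
    // A workspace: report where the link really points to.
    return { kind: "workspace-link", target: fs.realpathSync(location) };
  }
  return { kind: "installed-copy" };
}

// Intended to be run from the monorepo root after the install step;
// anywhere else it simply reports the package as missing.
console.log(describeDependency("node_modules", "@tripleequalsdev/shared_lib"));
```

Run against the top-level node_modules, the shared workspace should be reported as a link whose target lies under packages/, while third-party packages should show up as installed copies.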

Independent versioning

One thing that I have not elaborated on is the monorepo creation command; to be more specific - what the --independent flag means and why we really need it. The reason is pretty straightforward.

Per documentation:

Independent mode Lerna projects allows maintainers to increment package versions independently of each other.

The main question we need to ask ourselves is: do we want to tightly couple all projects together in terms of releasing? In other words - do we expect repo_b to be released as well whenever totally independent repo_a needs to be released? Of course not! If the projects do not rely on each other, we should keep the release cycles separated. It does not sound right to deploy repo_b if an isolated change is applied to repo_a - its new release would not differ from the already deployed version after all. When a change is applied to a shared workspace (shared_lib), Lerna is smart enough to not only bump its version but also link it to its clients and mark them as changed as well.
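The idea of marking clients as changed can be illustrated with a plain graph walk. The sketch below is not Lerna's implementation; it only assumes a map from each package to its internal dependencies, using the three example projects from this guide.

```javascript
// Invert the dependency map: for each package, list the packages that depend on it.
function buildDependents(packages) {
  const dependents = new Map(Object.keys(packages).map((name) => [name, []]));
  for (const [name, deps] of Object.entries(packages)) {
    for (const dep of deps) {
      if (dependents.has(dep)) dependents.get(dep).push(name);
    }
  }
  return dependents;
}

// Breadth-first walk from the directly modified packages to all their
// direct and transitive dependents.
function affectedPackages(packages, directlyChanged) {
  const dependents = buildDependents(packages);
  const affected = new Set(directlyChanged);
  const queue = [...directlyChanged];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const dependent of dependents.get(current) || []) {
      if (!affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return [...affected].sort();
}

// Internal dependencies of the example projects.
const packages = {
  repo_a: ["shared_lib"],
  repo_b: ["shared_lib"],
  shared_lib: [],
};

console.log(affectedPackages(packages, ["repo_a"]));     // [ 'repo_a' ]
console.log(affectedPackages(packages, ["shared_lib"])); // [ 'repo_a', 'repo_b', 'shared_lib' ]
```

An isolated change in repo_a affects nothing else, whereas a change in shared_lib drags both of its clients along, which is exactly the release behavior described above.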

Are we done yet?

As you can see, the whole migration process is not that complex, and - let's be honest - given that there are already multiple blog posts and other web resources covering this part, you probably would not need another article to achieve the same outcome. We have the monorepo in place, and... this is the moment when things become more interesting. There are various use cases that you may need the aggregated repository for. In the next article I try to cover some of them, so even if you find this one useless, I am pretty sure you will benefit from being on the same page when you jump into it.
