In April 2022, we were informed of a vulnerability in our subgraph implementation by community contributor lcfr. Due to the way null characters are handled by The Graph and its PostgreSQL backend, it was possible to create lookalike domains that couldn't be distinguished from the names they imitated using only data from our subgraph.
We implemented and deployed mitigations for this in both the subgraph and the metadata API, paid lcfr a bounty, and closed the issue. However, in December 2022, the issue reocurred. The below is a postmortem describing the process issues that led to its reappearance, and describing how we will improve our process going forward to prevent this kind of reversion happening in future. We're posting it as part of our commitment to transparency and process improvement. We've also awarded lcfr a further bounty of $20k for finding the reversion.
Topic: Null byte character bug
Reported date: 05.04.2022 / 10.01.2023
Summary
When an attacker creates a domain with null byte character added in front or back of the name, due to a known issue in The Graph, the attacker is able to impersonate any domain which is already registered by someone else. The issue, first reported around April 2022, we as ENSLabs took an immediate action to prevent it from happening. 9 months later the issue re-appeared.
Root cause
PostgreSQL does not store null characters in text columns. The Graph uses PostgreSQL, meaning that any text data saved in a subgraph would never include null characters.
Root cause for the re-appearance
After hotfixes, we haven't followed up with persistent actions (as I know of) and the hotfix was never pushed into a repository. There was only a hot-fix shipment during the event, to prevent the issue happening till the permanent solution to be put in place. Later, during some other feature changes and refactors, patch was removed unintentionally on the metadata-service side.
The subgraph fix was never tested, and therefore was never confirmed as working. It assumed that the input label was already void of any null characters. However, event data in graph-node is stored as raw bytes in PostgreSQL. This meant that the label used for generating the comparison labelhash was yet to be stored in PostgreSQL, therefore still had null characters.
Resolution
In the first incident, as an immediate solution to the given problem we decided to make a hotfix on both metadata-service and our subgraph repositories (as I know). The solution was simply double checking if queried labelhash matches with the one provided by the subgraph.
In the second incident, we will be re-applying the same patches compatible with the current feature set in the metadata-service, will have necessary tests and descriptions, comments alongside the code. ensdomains/ens-metadata-service/pull/142
The fix for the subgraph is a different implementation, each label will be manually checked for null characters and disallowed if the label contains any. Alongside this, a unit test will be added to prevent regression.
Preventative Measures
Reporting and following persistent fix from the TheGraph side? Having a procedure for both temporary and permanent solutions? Semi - automated deployment for metadata-service, all the code merged into master branch will trigger -no-traffic deployment then we will split the traffic later.
Impact
There are 14 incidents found, some were taken down by marketplaces after our fixes, the others taken down after we gather the full list.
Lessons Learned
After an immediate action, be sure persistent solutions are discussed and applied. Even though the hotfixes are temporary, provide necessary testing as soon as possible. Put clear description and comments alongside the hotfixes.