Jimmy Cai Avatar
Home

Building an Inline Comment System for Markdown-based Blog

How to uniquely identify paragraphs in Markdown for reliable inline commenting

Last month, I implemented an inline comment system for my blog, allowing visitors to comment on individual block-level elements (including block-quotes, images, lists, etc.) in a blog post. The concept is inspired by Medium, which has supported this feature since its early versions.

In this post, I'll refer to each block element as a paragraph.

In most of comment systems, each comment is stored as an entry in the database, commonly with the following schema:

ts
const commentTable = sqliteTable(
    "comment",
    {
        id: text()
            .primaryKey()
            .notNull()
            .$defaultFn(() => randomUUID()),
        slug: text().notNull(),
        author: text()
            .notNull(),
        content: text().notNull(),
        createdAt: text()
            .notNull()
            .default(sql`(CURRENT_TIMESTAMP)`),
        parent: text(),
    }
);

Here, the slug field stores the ID of the article. For example, for this post, the slug would be markdown-inline-comment-system. This field will be used to query all comments within an article.

This works well when there is only one comment section per article. With inline comment, there will be multiple comment sections associated with the same slug, so I added an extra field pid (Paragraph ID) to distinguish each one of them.

H2 How to identify each paragraph?

So, how should the pid be generated? It has to identify each paragraph uniquely and permanently somehow. Medium has its own editor, which assigns an ID to each block, so this is not a problem for them. However, this blog is and will be written in Markdown for a long time, and there is not a native way to identify each line as it's just plain text.

The most straightforward approach to do this would be use the position of each paragraph as its ID. It's clear that ID assigned by this method is not permanent: IDs would shift upon content edits, such as insertions, deletions, or reordering. So this idea would only be viable if the post content does not change after getting published, otherwise the comment section would get misplaced.

Alternatively, a content-based identifier (e.g., raw text hash) can be used. For example, we could save the content of paragraph to database, and use this as the ID, or hash it somehow. This method is resistant to simple paragraph additions and movements, but might suffer ID collision (chance is low however, at least for my blog content). And the assigned ID might change depending on how the mapping is done. For example, if ID = plain text, then adding styles to the paragraph (like putting some text in bold) would not affect the resulting ID. Still, this method does not guarantee that the ID does not get changed by accident when editing the post, but might be good enough depending on cases.

The next idea is to use Git history. This blog uses GitHub to store all content. My initial idea was:

Ideally, this makes possible to edit the paragraph without affecting the generated ID, as the initial commit and line number won't change in the subsequent editions.

However, it did not work as I expected: If a paragraph is moved inside the Markdown file (for example from the beginning to the end), without modifying its content, what I expect is that Git would treat this as a moved block, not a newly introduced one.

Apparently VSCode and many other editors are able to detect this using additional algorithms, but I think that's just overkill for my usage, and introduces more headaches to this problem.

Just for future reference, I have used the following commands to access the Git history:

bash
git log -L start_line:end_line filename.md

H2 Back to easiest approach

Rather than overengineering a perfect mapping, I decided to just "manually" assign ID to each block. That would make the solution much easier and robust.

My initial thought is to use a HTML comment tag, something like <!--id:{a1b2c3}--> , and add it at the end of each paragraph. The advantage of using HTML comment is that it won't be directly visible in the frontend. So if in the future I decide to stop using this inline comment system, there won't be any unexpected visual artifacts.

Given that this blog runs on Markdoc and it supports attributes, instead of using HTML comment tag, I used {% id="a1b2c3" %} . And modified the Markdoc configuration file to add extra attribute definition:

markdoc.config.mjs
export default defineMarkdocConfig({
    nodes: {
        item: {
			/// List item
            ...nodes.item,
            attributes: {
                ...nodes.item.attributes,
                id: { type: String, required: false },   
            }
        },
        fence: {
			/// Code block
            ...nodes.fence,
            attributes: {
                ...nodes.fence.attributes,
                id: { type: String, required: false },   
            },
		},
        paragraph: {
            ...nodes.paragraph,
            attributes: {
                ...nodes.paragraph.attributes,
                id: { type: String, required: false },   
            },
		},
	}
});

In case your blog uses Markdown + Remark, you can also do similar thing by creating a Remark plugin to parse HTML comment tag and add ID attribute to the paragraph node.

H2 Script to Automatically Add IDs to Paragraphs

Obviously it's not feasible to add ID manually to each paragraph and each post, so I wrote a script to automate this task using the Remark processor. It uses Remark to parse the Markdown file and add ID attribute at the end of each paragraph line.

ID is generated using nanoid. Length is set to 10, but can be further reduced as the chance of collision is low (unless you have a lot of paragraphs).

ts
import fs, { glob } from 'fs';
import path from 'path';
import { customAlphabet } from 'nanoid'
import remarkParse from 'remark-parse'
import { remark } from 'remark';
import type { Paragraph } from 'nlcst';
import remarkGfm from 'remark-gfm'

const CONTENT_FOLDER = path.join(".", "blog");
const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyz';
const ID_LENGTH = 10;

const nanoid = customAlphabet(ALPHABET, ID_LENGTH);

const processor = remark()
    .use(remarkParse)
    .use(remarkGfm)

const processFile = async (filePath: string) => {
    const fileContent = fs.readFileSync(filePath, 'utf-8');

    const file = processor.parse(fileContent);

    const paragraphs: Paragraph[] = [];

    /// Also try to find children of paragraphs
    const findParagraphs = (node: any) => {        
        if (node.type === 'paragraph' || node.type === 'code') {
            paragraphs.push(node as Paragraph);
        }
        if (node.children) {
            node.children.forEach(findParagraphs);
        }
    };

    findParagraphs(file);

    const fileLines = fileContent.split('\n');

    paragraphs.forEach((paragraph) => {
        const lineNumber = paragraph.position?.start?.line;
        if (!lineNumber) {
            console.warn(`No line number found for paragraph in file ${filePath}`);
            return;
        }

        // Adjust lineNumber to be a zero-based index
        const lineIndex = lineNumber - 1;

        const line = fileLines[lineIndex];
        if (line.includes(`{% id="`)) {
            return; // Skip if the line already has an ID
        }

        const id = nanoid();
        const updatedLine = `${line} \{% id="${id}" %\}`;       /// Note: remove \ after copying this code
        fileLines[lineIndex] = updatedLine; // Update the line with the new ID
        console.log(`Added ID to line ${lineIndex} in file ${filePath}: ${updatedLine}`);
    });

    const updatedFileContent = fileLines.join('\n');
    fs.writeFileSync(filePath, updatedFileContent, 'utf-8');
    console.log(`Updated file: ${filePath}`);
};

const addIdsToFiles = (dir: string) => {
    /// Get all files in the directory with extension .mdoc

    glob(`${dir}/**/*.mdoc`, (err, files) => {
        if (err) {
            console.error(`Error reading files in directory ${dir}:`, err);
            return;
        }

        console.log(`Found ${files.length} markdown files in directory ${dir}`);

        files.forEach(file => {
            processFile(file); // Process each markdown file
            return;
        });
    });
};

// Start processing from the content folder
addIdsToFiles(CONTENT_FOLDER);

H2 Custom element or custom attribute

Because Markdoc also supports custom tag, so instead of custom attribute, I initially created a custom tag {% comment id="a1b2c3" /%}, and assigned an Astro component to this. The component would receive the id props, and fetch the data from the backend.

This approach feels natural because each inline comment section can be seen as independent components. However, when it comes to server-side data fetching, it's better to have single query that retrieve all the comments from the current post, instead of creating batch of individual requests (because most of them will return no result). I found no clean method to share fetched data between server-rendered components without introducing race conditions, as they are rendered independently.

Furthermore, the component would be rendered inside <p> tag. And according to the HTML specification, there can only be phrasing content inside of it (HTML Standard), and that excludes usage of <div> for example (it will be rendered outside of <p> automatically).

So at the end, I created a single Astro component, fetch all the data there, and rendered React components in the front end using createPortal:

tsx
export const AppInlineComment = ({
    slug,
    comments
}: {
    slug: string,
    comments: CommentsQuery
}) => {
    const elementsWithComments = (
        Array.from(
            document.querySelector(".content")
                ?.querySelectorAll<HTMLElement>('p[id], li[id], .gallery[id], .codeblock[id]')
            ?? []
        )
    );

    return elementsWithComments.map((element) =>
        createPortal(
            <InlineComment pid={element.id} slug={slug} pCommentData={
                comments?.[element.id] || { count: 0, comments: [] }
            } />,
            element
        )
    )
}

I won't explain in detail how the InlineComment component works, as it's just a normal comment component but with a extra pid prop.