
Not too long ago I worked on a client’s CMS platform that needed a Google Drive-like UI so customers could manage their content inside the app.
Filesystems seem so mundane, and from the outside looking in, you could easily overlook how much complexity hides beneath them. Tiny details in the UI, backend routes, and database queries all have to align perfectly for everything to work cohesively.
Building a Google Drive-style filesystem sounds easy until you realize you’re not just making a folder view: you’re recreating years of UX polish and recursive data logic under the hood. I learned this the hard way on a client project where we had to ship a full cloud storage experience in just two weeks (which turned into more like a month).
In this post, I’ll walk through the challenges and decisions involved in building a Drive clone from scratch with Vue, Pinia, and Postgres: what worked, what didn’t, and what I would do differently now.
initial thoughts and design
My initial thought was to use recursive queries in Postgres to get tree-like data that could easily propagate to the frontend without too many unions/joins along the way. Since the requirements never called for separate queries for folders and files, I thought it best to keep everything in one table for the sake of time and simplicity. We could always split it out when the requirements grew in scope, and users would get the feature sooner:
erDiagram
    FILESYSTEM {
        int id
        int parent_id
        enum type
        string name
    }
    MEDIA {
        int id
        enum type
        string s3_url
    }
    FILES_TO_MEDIA {
        int file_id
        int media_id
    }
    FILESYSTEM ||--o{ FILES_TO_MEDIA : "has"
    MEDIA ||--o{ FILES_TO_MEDIA : "belongs to"
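For concreteness, here’s a rough sketch of what the single-table version could look like as Postgres DDL. Only the columns come from the diagram; the enum values and constraints are my own assumptions:

-- Hypothetical DDL for the single-table design; enum values and
-- constraints are assumptions, only the columns come from the diagram.
CREATE TYPE node_type AS ENUM ('folder', 'file');

CREATE TABLE filesystem (
    id        serial PRIMARY KEY,
    parent_id int REFERENCES filesystem (id),  -- NULL for root nodes
    type      node_type NOT NULL,
    name      text NOT NULL
);

CREATE TABLE media (
    id     serial PRIMARY KEY,
    type   text NOT NULL,  -- e.g. a mime type
    s3_url text NOT NULL
);

CREATE TABLE files_to_media (
    file_id  int REFERENCES filesystem (id),
    media_id int REFERENCES media (id),
    PRIMARY KEY (file_id, media_id)
);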
I got some pushback from the client’s development team: they wanted the entities separated, which makes sense if you’re attaching searchable metadata to files and folders in a way that won’t kill your database’s performance. But my argument at the time was that we only had two weeks to push this feature out, and our use case didn’t warrant that complexity out of the gate. Ultimately I was strong-armed into separating the entities, which greatly increased development time:
erDiagram
    FOLDERS {
        int id
        int parent_id
        string name
    }
    FILES {
        int id
        int parent_id
        string name
    }
    MEDIA {
        int id
        enum type
        string s3_url
    }
    FOLDERS_TO_FOLDERS {
        int folder_id
        int child_id
    }
    FOLDERS_TO_FILES {
        int folder_id
        int file_id
    }
    FILES_TO_MEDIA {
        int file_id
        int media_id
    }
    FOLDERS ||--o{ FOLDERS_TO_FOLDERS : "has"
    FOLDERS ||--o{ FOLDERS_TO_FOLDERS : "belongs to"
    FOLDERS ||--o{ FOLDERS_TO_FILES : "has"
    FILES ||--o{ FOLDERS_TO_FILES : "belongs to"
    FILES ||--o{ FILES_TO_MEDIA : "has"
    MEDIA ||--o{ FILES_TO_MEDIA : "belongs to"
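For comparison, here are just the association tables from the split design, sketched the same way (the key and constraint choices are my assumptions; the columns come from the diagram):

-- Sketch of the split design's association tables; folders and files
-- are the entity tables from the diagram above.
CREATE TABLE folders_to_folders (
    folder_id int REFERENCES folders (id),  -- parent side
    child_id  int REFERENCES folders (id),
    PRIMARY KEY (folder_id, child_id)
);

CREATE TABLE folders_to_files (
    folder_id int REFERENCES folders (id),
    file_id   int REFERENCES files (id),
    PRIMARY KEY (folder_id, file_id)
);

Every extra table here means another set of CRUD routes, joins, and frontend store updates downstream, which is exactly where the extra development time went.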
routes, queries, and ORM madness
I knew starting out that our ORM wouldn’t support recursive queries at all, so I had to raw dawg some SQL myself. My kind of challenge! Personally, I love it when I skip the ORM and just go straight for SQL in my projects. Really makes me feel like a REAL developer, but I digress.
I created the CRUD routes for each entity, along with extra routes on folders to grab the children/parents (both files and folders) using a recursive query, with parameters to control the depth, offset, and limit so the endpoint could paginate and not overwhelm the client on big file trees. This let me move up and down the tree from any folder, and still communicate to the frontend where you were via a path made of an array of folder IDs.
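As a rough illustration, the children endpoint might look something like this. Express and node-postgres here are stand-ins for the sake of the sketch, not the project’s actual stack:

// Illustrative only: Express and node-postgres stand in for the real stack.
const express = require('express');
const { Pool } = require('pg');

// The recursive query shown below, with the :named parameters
// swapped for $1..$4 positional placeholders.
const DESCENDANTS_SQL = '...';

const app = express();
const pool = new Pool(); // connection settings come from environment variables

app.get('/folders/:id/children', async function (req, res) {
  // depth/offset/limit control how much of the tree one request returns
  const depth = Number(req.query.depth ?? 1);
  const offset = Number(req.query.offset ?? 0);
  const limit = Number(req.query.limit ?? 50);

  const { rows } = await pool.query(DESCENDANTS_SQL, [
    req.params.id, depth, offset, limit,
  ]);
  res.json(rows[0].response); // the query returns a single jsonb blob
});

app.listen(3000);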
These queries were especially tricky to get working correctly, with lots of moving parts. The basic idea: starting from the folder you were in, recursively collect all the child folders and join each folder’s files into the result.
I’d then format the rows into a JSONB object that made sense for the frontend, with the pagination and depth parameters included so you could tell where you were in the tree and which page you were on:
WITH RECURSIVE folder_tree AS (
    -- Anchor: start at the root folder
    SELECT
        folders.id,
        folders.name,
        0 AS depth,
        jsonb_build_array(folders.id) AS path
    FROM folders
    WHERE folders.id = :root_folder_id

    UNION ALL

    -- Recurse into child folders
    SELECT
        child_folders.id,
        child_folders.name,
        folder_tree.depth + 1,
        folder_tree.path || to_jsonb(child_folders.id)
    FROM folder_tree
    JOIN folders_to_folders
        ON folders_to_folders.folder_id = folder_tree.id
    JOIN folders AS child_folders
        ON child_folders.id = folders_to_folders.child_id
    WHERE NOT folder_tree.path @> to_jsonb(child_folders.id) -- cycle guard
        AND folder_tree.depth < :max_depth
),
-- Apply pagination before building the JSON payload
paged_folders AS (
    SELECT *
    FROM folder_tree
    ORDER BY path
    OFFSET :row_offset
    LIMIT :row_limit
),
folders_json AS (
    SELECT jsonb_agg(jsonb_build_object(
        'id', paged_folders.id,
        'name', paged_folders.name,
        'depth', paged_folders.depth,
        'path', paged_folders.path
    ) ORDER BY paged_folders.path) AS folders
    FROM paged_folders
),
files_json AS (
    SELECT jsonb_agg(jsonb_build_object(
        'id', files.id,
        'name', files.name,
        'folder_id', paged_folders.id
    ) ORDER BY paged_folders.path, files.name) AS files
    FROM paged_folders
    JOIN folders_to_files
        ON folders_to_files.folder_id = paged_folders.id
    JOIN files
        ON files.id = folders_to_files.file_id
)
SELECT jsonb_build_object(
    'offset', :row_offset,
    'limit', :row_limit,
    'files', COALESCE(files_json.files, '[]'::jsonb),
    'folders', COALESCE(folders_json.folders, '[]'::jsonb)
) AS response
FROM files_json, folders_json;
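For a small tree, the response blob might look something like this (the values are made up for illustration):

{
  "offset": 0,
  "limit": 50,
  "files": [
    { "id": 3, "name": "logo.png", "folder_id": 2 }
  ],
  "folders": [
    { "id": 1, "name": "root", "depth": 0, "path": [1] },
    { "id": 2, "name": "docs", "depth": 1, "path": [1, 2] }
  ]
}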
I also needed a way to look up the ancestors, for the case where you started in a nested folder and were working your way up the filesystem:
WITH RECURSIVE ancestors AS (
    -- Start from the target folder
    SELECT
        folders.id,
        folders.name,
        0 AS depth,
        jsonb_build_array(folders.id) AS path
    FROM folders
    WHERE folders.id = :start_folder_id

    UNION ALL

    -- Move up the tree using the association table
    SELECT
        parent_folders.id,
        parent_folders.name,
        ancestors.depth + 1,
        jsonb_build_array(parent_folders.id) || ancestors.path
    FROM ancestors
    JOIN folders_to_folders AS links
        ON links.child_id = ancestors.id -- reversed link
    JOIN folders AS parent_folders
        ON parent_folders.id = links.folder_id
)
SELECT jsonb_build_object(
    'offset', :row_offset,
    'limit', :row_limit,
    'folders', COALESCE(jsonb_agg(folder_json ORDER BY depth DESC), '[]'::jsonb)
) AS response
FROM (
    SELECT id, name, depth, path
    FROM ancestors
    ORDER BY depth DESC
    OFFSET :row_offset
    LIMIT :row_limit
) AS folder_json;
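On the frontend, that ordered list of ancestors maps almost directly onto a breadcrumb. Something like this, simplified:

// Turn the ancestors response into breadcrumb segments, root first.
// `response` is the JSONB blob returned by the ancestors query above.
function toBreadcrumb(response) {
  return response.folders.map(function (folder) {
    return { id: folder.id, label: folder.name };
  });
}
// e.g. [{ id: 1, label: 'root' }, { id: 2, label: 'docs' }]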
moving on to state management in the frontend
With the routes out of the way, I moved on to the frontend, which I knew full well would be a challenge due to having separate routes for each entity. In Pinia, I created three stores, each responsible for its respective entity type with its own state, metadata, actions, and getters. When you updated one of the resources, we would update the respective store from the routes, plus anything that needed to change in the filesystem store.
import { defineStore } from 'pinia';
import { ref, computed } from 'vue';

export const useFilesystemStore = defineStore('filesystem', function() {
  const folderStore = useFolderStore();
  const fileStore = useFileStore();
  const mediaStore = useMediaStore();

  // current location in the tree, as an array of folder ids
  const path = ref([1, 2]);

  // adjacency list: one { folderId: [childIds] } entry per level
  const graph = ref([
    { 1: [2, 3, 4] }, // level 0
    { 2: [5, 6, 7] }, // level 1
  ]);

  const tree = computed(function() {
    const tree = {};
    graph.value.forEach(function(level) {
      Object.entries(level).forEach(function([folderId, childIds]) {
        // for each folder and its children in the graph, organize
        // them based on the metadata pulled from the various stores
      });
    });
    return tree;
  });

  // CRUD getters/setters...
  return { path, graph, tree };
});

export const useFolderStore = defineStore('folder', function() {
  const entities = ref({
    1: {}, 2: {},
  });
  // CRUD getters/setters...
  return { entities };
});

export const useFileStore = defineStore('file', function() {
  const entities = ref({
    3: {}, 4: {}, 5: {}, 6: {}, 7: {},
  });
  // CRUD getters/setters...
  return { entities };
});

export const useMediaStore = defineStore('media', function() {
  const entities = ref({
    1: {}, 2: {}, 3: {},
  });
  // CRUD getters/setters...
  return { entities };
});
The above is a bit oversimplified, but you get the idea. Any time you triggered an action, you’d have to cascade its effects through the stores. For example, say you moved a folder into a subfolder: you’d update it via the route and then reflect the change in the graph. If you created a folder/file via a route, you’d then have to update the graph. The graph would in turn trigger updates to the tree computed property, so the component could render it as a deeply nested tree (which is how the component we fed it to wanted it).
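Here’s a sketch of what that cascade might look like for a move, assuming a hypothetical api client and the graph shape from above (this is illustrative, not the production code):

// Hypothetical action on the filesystem store; `api` is a stand-in client.
async function moveFolder(folderId, fromParentId, toParentId) {
  // 1. persist the move on the backend first
  await api.patch(`/folders/${folderId}`, { parent_id: toParentId });

  // 2. then cascade the change through the client-side graph
  graph.value.forEach(function(level) {
    if (level[fromParentId]) {
      level[fromParentId] = level[fromParentId].filter(function(id) {
        return id !== folderId;
      });
    }
    if (level[toParentId]) {
      level[toParentId].push(folderId);
    }
  });
  // the `tree` computed property re-derives itself from `graph`
}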
This made changes to the filesystem feel instant when using the app, and minimized the number of API calls as well. We’d know what needed to be loaded, and when, based on the level, and we could effectively cache client-side which files were loaded and where they lived in the tree, with a way to dynamically render them on the frontend.
Then, for navigation, if you wanted to go up or down the filesystem you’d just push or pop folder IDs on the path array, which made navigation easy in the title bar component: the UI would reflect the changes based on that path ref.
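In other words, navigation reduces to two one-liners against the path ref (again simplified):

// Drill into a child folder: append its id to the current path.
function navigateInto(folderId) {
  path.value.push(folderId);
}

// Go up one level: drop the last folder id off the path.
function navigateUp() {
  if (path.value.length > 1) path.value.pop();
}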
lessons learned + what I’d do differently
Sometimes the most flexible solutions, especially on paper, aren’t the right solutions. Given the time constraints and the scope of what the business needed, I wish I had fought harder for the simpler database ERD. It’s hard to know when it’s a good idea to start off with complexity in a system before it’s even built, but I’ve found over and over again throughout my career that you should only reach for complexity after you’ve fully integrated and understood the constraints of the simple solution.
Flexibility comes at the cost of complexity, always. Simple solutions shouldn’t be ignored, and I think more often than not, many companies make this tradeoff without fully understanding what it means. It costs time, resources, and people. You don’t need a smart fridge to know what to eat; sometimes just opening it up and seeing what’s there is more than enough.
If I could go back and build this system from scratch again, I would start with just the basics and layer on complexity as features warranted it. It would have made development much smoother and would have met the expectations of the timeline.
