How it was made
Finding the Book Dataset
We sourced various Goodreads datasets, some from universities, some from Kaggle, and others from forums. We focused on English books that included titles, genres, ratings, and ISBN13 identifiers, since those fields serve as the foundation for our recommendations.
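As an illustration, combining those sources might look like the sketch below. The file names and column names are hypothetical, since each dataset used its own schema.

```python
import pandas as pd

# Hypothetical file and column names; the real datasets varied by source.
REQUIRED = ["title", "genres", "average_rating", "isbn13", "description"]

frames = [pd.read_csv(path) for path in ["goodreads_kaggle.csv", "goodreads_forum.csv"]]
books = pd.concat(frames, ignore_index=True)

# Keep only English entries that have every field the recommender depends on.
books = books[books["language_code"].str.startswith("en", na=False)]
books = books.dropna(subset=REQUIRED)
```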
Cleaning the Dataset
Much of the raw data was messy. We used Python for all the data processing and cleaning, which included normalizing fields, removing duplicates, handling missing values, and excluding books whose missing information we could not recover. This was a key step, since clean descriptions are a prerequisite for useful embeddings.
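A condensed sketch of that kind of cleaning pass, again with hypothetical column names:

```python
import pandas as pd

def clean(books: pd.DataFrame) -> pd.DataFrame:
    # Normalize text fields so near-identical entries compare equal.
    books["title"] = books["title"].str.strip()
    books["isbn13"] = books["isbn13"].astype(str).str.replace("-", "")

    # Drop exact duplicates, keeping one row per ISBN.
    books = books.drop_duplicates(subset="isbn13")

    # Fill gaps we can tolerate; drop rows we cannot repair.
    books["genres"] = books["genres"].fillna("Unknown")
    books = books.dropna(subset=["title", "description"])
    return books
```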
Vectorizing with OpenAI
Each book’s description was converted into a high-dimensional vector using OpenAI’s text-embedding-3-small model, then stored in a Pinecone vector database for similarity search.
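A minimal sketch of that pipeline, assuming the current openai and pinecone Python client libraries and a hypothetical index named "books":

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()                    # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="YOUR_PINECONE_KEY")  # placeholder key
index = pc.Index("books")                   # hypothetical index name

def embed_and_store(isbn13: str, title: str, description: str) -> None:
    # One API call turns the description into a 1536-dimensional vector.
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=description,
    )
    vector = response.data[0].embedding

    # Store the vector under the book's ISBN, with metadata for display.
    index.upsert(vectors=[{
        "id": isbn13,
        "values": vector,
        "metadata": {"title": title},
    }])
```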
Building the Site
The frontend was built with HTML, CSS, and JavaScript. The backend runs on Node.js and Express, exposing an API that connects the UI to the embedding model and the Pinecone index.
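The production backend is Express, but for consistency with the other snippets, here is the query-time flow a route handler performs, sketched in Python against the same hypothetical index: embed the user's query, then ask Pinecone for the nearest book vectors.

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()
index = Pinecone(api_key="YOUR_PINECONE_KEY").Index("books")  # hypothetical index name

def recommend(query: str, k: int = 5) -> list[str]:
    # Embed the user's query with the same model used for the books.
    vector = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
    ).data[0].embedding

    # Return the titles of the k most similar books.
    results = index.query(vector=vector, top_k=k, include_metadata=True)
    return [match.metadata["title"] for match in results.matches]
```

Using the same embedding model at index time and query time matters: vectors from different models live in different spaces and cannot be meaningfully compared.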
Pushing to GitHub
All of our source code lives in our GitHub repository, which we use for version control and collaboration. That also gives us a clear history of the changes we have made throughout the project.
Deploying with Vercel
Lastly, we use Vercel, which automatically builds and deploys the site. Paired with a custom domain, that setup keeps our production site running today.