APrime’s LLM Kit, Part 1: How to host your own private AI models in AWS

DAMJAN KORAĆ
Staff Product Manager

Welcome to APrime’s three-part series on hosting your own Large Language Models (LLMs) in Amazon Web Services (AWS) using our free, open-source tools. Whether you are eager to leverage AI in your product but wary of sharing your data with third parties like OpenAI, or simply unsure where to begin with deploying AI models, this guide is tailored for you.

Visit Part 2, our quickstart guide, to get up and running with default settings, and Part 3 for an in-depth walkthrough and detailed discussion of the scripts and Terraform modules.


Why Should You Host Your Own AI Models?

Are you excited to utilize AI in your product but worried about putting your data in the hands of third-party AI providers like OpenAI? Many companies are understandably hesitant to send sensitive data to external services. Moreover, the costs of such services can escalate quickly, and companies often wish to retain full control over their training data and model fine-tuning.

To address these concerns, we recommend self-hosting AI models in a cloud provider like AWS. By setting up and managing your own models, you can ensure complete data privacy, maintain control over costs, and experiment with AI capabilities without needing to hire specialized AI engineers.

Enter APrime. We’ve prepared a free, open-source AI kit and guides to help you set up your own AI model in your AWS account where it is fully owned and managed by your team. You can get the entire process running in less than 10 minutes, and you never have to worry about any of your data leaving your private environment. Read on below to see how you can get started!

Why Did We Build This?

Several of our customers had solid use cases for incorporating LLMs and generative AI into their product offerings or internal operations, but they were reluctant to start building due to data privacy concerns. Our customers in the fintech and healthtech spaces, in particular, were worried about AI “hallucinations” and about ensuring that their data remains isolated when using an LLM.

We did our own investigation and found no great solutions that make it easy for engineers to quickly deploy and test an open-source model in their own private environment. We found some options – such as Ollama – for hosting a model locally on a laptop, but none of them offered cloud hosting or support for multiple concurrent users. Furthermore, these options would not support our planned future use cases, including tools for performing your own fine-tuning on a model.

At the same time, we started receiving notifications from many services – including non-AI tools we had been using – asking us to accept new privacy policies that allow these vendors to train their models on their downstream customers’ data.

We decided to build a solution ourselves: our engineers took the time and effort to launch, test, optimize, and document the end-to-end setup process for hosting your own models in AWS. By sharing our open-source tools and knowledge, we hope to empower more teams to experiment with AI and LLMs while still maintaining control over their data and roadmap.

APrime’s Free AI Hosting Kit

We have provided a set of free, open-source scripts and infrastructure files that allow you to set up, deploy, and interact with an AI model within your own AWS account. Our tools automatically handle the infrastructure setup and deployment of an AI model while also providing you with an API and UI to directly interface with your model, all without ever sending any data outside of your AWS environment.  

We have included a default open-source model from Hugging Face, but we designed the tools so you can deploy other open-source models on your own.
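
For example, the kit deploys Hugging Face’s Text Generation Inference (TGI) server (more on this in the final section), which selects the model to serve via a single --model-id argument, so trying a different model is typically a one-line change. Here is a minimal local sketch using the same TGI container; the model ID below is illustrative, and the kit’s Terraform wires the equivalent configuration into AWS for you:

```bash
# A minimal local sketch of serving a different open-source model with
# Hugging Face's TGI container (the kit deploys this same server to AWS).
# The model ID is illustrative; any TGI-compatible Hugging Face model works.
model=mistralai/Mistral-7B-Instruct-v0.2

docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id $model
```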

You can access these tools via our public repo on GitHub. To perform an end-to-end setup, run the quickstart.sh script and follow the instructions in the README.md file. This blog post provides additional context on why you should consider using these workflows, but – if you’re ready to get started right away – you can skip ahead to the next post in the series.
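
For the impatient, the end-to-end flow boils down to a clone and a single script run. A rough sketch, assuming your AWS credentials are already configured (the repository URL below is a placeholder; use the public repo linked above):

```bash
# Sketch of the end-to-end setup. The repo URL is a placeholder; use
# the public APrime repo linked above. Assumes the AWS CLI is already
# configured with credentials for the target account.
git clone <aprime-llm-kit-repo-url>
cd <repo-directory>

# Run the guided setup described in README.md: it provisions the AWS
# infrastructure and deploys the default model.
./quickstart.sh
```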

When Should a Team Consider a Vendor-Provided AI Solution Instead?

We completely understand that self-hosting is not for all teams and have helped clients incorporate existing AI models and solutions – such as ChatGPT or Google Gemini – into their product and service offerings. Here are some cases in which it makes more sense to go with one of these other solutions:

  • You aren’t worried about any of your proprietary data being used to train or improve another company’s models.
  • You have particularly computationally intensive or advanced generative AI use cases in mind, such as needing to analyze and generate video. 
  • You are not concerned with the high usage-based costs of continuously running a managed model (such as on Amazon Bedrock).
  • You plan to use a pay-per-token model and expect to use the AI sparingly.
  • You are interested in using specific AI features or interfaces that are unique to an existing solution out there. 
  • Your desired AI model requires Internet access to process queries and cannot be hosted in an isolated environment.

If you are on the fence about which approach to take, or want expert help planning your approach to AI, schedule a consultation call with our founders today!

Why Should You Use Our Free AI Self-Hosting Tools?

Quick Experimentation and Testing

Our tools allow you to rapidly deploy and test open-source AI models by offering a single-command deployment script that sets up all required infrastructure, including the backend hosting of the model itself and a frontend UI. This is ideal for teams that need to iterate quickly and validate ideas without the upfront overhead and complexity of managing infrastructure themselves.

Data Privacy and Security

By hosting your own models, you keep your data secure and private within your AWS account. There is no need to send sensitive information to any third-party providers, ensuring compliance with internal and external data protection policies. We also include instructions on how to “tear down” or delete your model and its environment, if needed. 
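
Because the infrastructure is defined in Terraform modules, removing everything once you are done experimenting is typically a single command. A hedged sketch; the kit’s own teardown instructions are the authoritative reference:

```bash
# Hedged sketch: with Terraform-managed infrastructure, deleting the
# model and its environment usually comes down to one command run from
# the module directory. Terraform lists every resource it plans to
# delete and asks for confirmation before proceeding.
terraform destroy
```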

Setup Made Simple

You don’t need a dedicated AI or ML engineer to get started. Anyone with scripting knowledge and an AWS account can use our tools to deploy an AI model, and we provide detailed documentation and guides to help you through every step of the setup process. This lowers the effort required to get started and empowers more of your team members to get involved in AI projects.

While managing the infrastructure setup on your own can get complex, our quickstart script and guide walk you through the process, and our in-depth walkthrough provides detailed commentary on each step.

Cost Control

Building with AI and incorporating it into your product can become expensive very quickly. We ran some initial comparisons of a self-hosted solution against (1) a third-party AI provider like Mistral or Anthropic, and (2) a managed AI model on a service like Amazon Bedrock. We found the self-hosted option to be much more economical, especially since you can finely control when it is running.

Additionally, hosting your own model allows you to focus on the particular AI features you need. For example, you can select an open-source model aimed at text generation and avoid paying for the more computationally heavy environment required for video or image generation. We plan to do a deep dive into cost comparisons in a future post; if you’d like to see this, or even an AI cost calculator, send us a note at llm@aprime.io.

Get Started

The rest of the posts in this series will guide you through the process of self-hosting AI models in AWS, leveraging the power of ECS, GPU instances, and Hugging Face’s Text Generation Inference (TGI) service to create a robust and scalable AI deployment. We focus on how you can get started with a single GPU instance, but the patterns and modules used will allow you to scale both vertically and horizontally with ease.
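
Once the stack is up, TGI exposes a simple HTTP generation API, so any service in your environment can query the model with a plain POST request. A minimal sketch, assuming the server is reachable at http://localhost:8080 (substitute the endpoint the kit provisions for you):

```bash
# Query a running TGI instance via its /generate endpoint. The host and
# port assume local access; in AWS you would use the endpoint created
# by the kit (e.g., behind your load balancer).
curl http://localhost:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is the capital of France?", "parameters": {"max_new_tokens": 50}}'
```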

Get started with Part 2 of this series, our Quickstart Guide!

Share Your Ideas & Stay Connected

We are excited to hear about your experience setting up your own models in AWS, as well as any feedback or ideas on how we can further improve these tools. Here’s how you can stay connected and contribute to the project:

  • Email Us: Reach out with any questions, feedback, or support requests at llm@aprime.io.
  • Follow Us on LinkedIn/GitHub: Stay updated with the latest developments and connect with our community by following us on LinkedIn or GitHub.

  • Star the Repo or Open an Issue: You can participate in the project by reporting issues, suggesting features, or simply showing your support for our repo.
