Turnkey AWS with Paco: Private PyPI Server

By Kevin Teague
Published on Mar 10, 2020

One important aspect of software development is the use of libraries. The Python language arguably owes much of its popularity to its rich ecosystem of well-developed libraries. When deploying Python applications, it's important to have a process that can deterministically create application builds. Part of this process is managing library build artifacts, also known as Python packages.

Python has the Python Package Index (PyPI) for hosting open source Python libraries, and it's a robust and reliable place to install libraries from. However, PyPI is a public service. If you're developing proprietary applications, how do you manage private libraries?

You need a private PyPI server to manage private libraries. You can sign up with Gemfury and pay a monthly fee for your own private PyPI server. But maybe you've got an itch (or a corporate policy reason) to host your own PyPI server along with the rest of your AWS cloud infrastructure. If you're in the latter camp, wouldn't it be great to build out a complete private PyPI solution, complete with backups, monitoring, and fault tolerance, using a pre-built turnkey solution?

At Waterbear Cloud, we wanted to manage our own private PyPI server (because we had that itch). Our open source Paco tool is about taking AWS solutions like a private PyPI server and turning them into a turnkey process with Paco Starter Projects.

With Paco's new Private PyPI Server Starter Project, you too can have a complete PyPI solution provisioned and running in your own AWS account in under an hour!*

* We intend Paco Starter Projects as much as a way to learn how to use Paco to automate AWS as a ready-to-go solution, and we encourage you to spend more than an hour learning how Paco works and what it has provisioned in your AWS account.

The Private PyPI Starter Project

To get started, follow the Private PyPI Server Starter Project. After you've installed Paco and connected it to your AWS account, you only need to run two commands:

paco provision resource.ec2
paco provision netenv.mynet.prod

We've designed this Starter Project to run in two modes: “budget” and “professional”. The “budget” mode creates a single EC2 instance in a public subnet; you can run this on the AWS Free Tier or for less than $10 per month. If you just want to get a small PyPI server up and running, use this version.

However, at Waterbear Cloud we run our own suite of applications on a network architecture that has both public and private subnets. An Application Load Balancer (ALB) lives in the public subnets and proxies requests to backend applications. In addition to giving us a single place to manage our SSL certificate, ALBs are managed by AWS, which automatically hosts the load balancer in all of our public subnets; this gives ALBs better than 99.999% uptime.

The “professional” PyPI server configuration provisions this architecture: public and private subnets, an ALB proxy, and a NAT Gateway. It increases your AWS costs by over $30 per month, but allows you to make server changes with no downtime. This version is also intended to show you how to create Paco configuration that manages a shared ALB with several backend applications, as sketched below.
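To give a sense of what that looks like, here is a rough sketch of a shared ALB with a single PyPI target group in Paco netenv configuration. The LBApplication resource type and the field names are drawn from Paco's ALB support, but treat them as illustrative; the Starter Project's generated configuration is the authoritative reference.

# Illustrative sketch of a shared ALB resource in a Paco netenv file.
# Field names are based on Paco's ALB support but may differ slightly;
# consult the Starter Project's generated configuration.
alb:
  type: LBApplication
  enabled: true
  target_groups:
    pypi:
      port: 8080
      protocol: HTTP
      health_check_path: /
  listeners:
    https:
      port: 443
      protocol: HTTPS
      # (an SSL certificate would also be attached to this listener)
      target_group: pypi

Additional backend applications then become additional target groups and listener rules on the same shared ALB.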

Cloud Orchestration means more than just Infrastructure as Code

We sometimes introduce Paco as an Infrastructure as Code (IaC) tool along the lines of Terraform or CDK, but this isn't very accurate. While Paco can automate the provisioning of cloud infrastructure like traditional IaC tools, we usually call it a cloud orchestration tool, because Paco doesn't just provision infrastructure: it also orchestrates configuration management and governance of that infrastructure. Paco handles running configuration management tooling, installing agents for roles such as monitoring and centralized logging, and creating alarms and notifications that alert you not only to infrastructure problems but also when your configuration management tooling is throwing errors.

Paco doesn't directly embed a configuration management tool (such as Ansible, Chef, or Puppet); instead, it lets you declare a configuration management tool and supply configuration sets to be used with that tool. We started with AWS CloudFormation Init (aka cfn-init) as the first configuration management tool supported by Paco.

Paco’s Configuration Management orchestration under the hood

How does Paco orchestrate configuration management for the Private PyPI Server project? The Paco project declares that an AutoScalingGroup's EC2 instances should use cfn-init for configuration management, and which configuration sets should be included. You create one or more config sets, and then in the launch_options field for an AutoScalingGroup resource you declare a list of cfn_init_config_sets to run when a new instance is launched in that ASG. This can look as simple as:

server:
  type: ASG
  launch_options:
    cfn_init_config_sets:
      - "InstallPyPI"
  cfn_init:
    configurations:
      InstallPyPI:
        # ... actual configuration files/scripts go here ...
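To make that placeholder concrete, a cfn-init configuration body follows the standard AWS::CloudFormation::Init schema of packages, files, commands, and services. The sketch below is hypothetical; the pypiserver package, the directory, and the exact key spellings are stand-ins rather than the Starter Project's actual configuration.

# Hypothetical configuration body following the standard
# AWS::CloudFormation::Init schema. The package, command, and paths
# are illustrative stand-ins only.
cfn_init:
  configurations:
    InstallPyPI:
      packages:
        python:
          pypiserver: []   # cfn-init can install Python packages via pip
      commands:
        01_create_package_dir:
          command: mkdir -p /var/pypi
      services:
        sysvinit:
          pypiserver:
            enabled: true
            # cfn-init itself spells this key ensureRunning; check the
            # Paco docs for the exact spelling it expects.
            ensure_running: true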

There is some fiddly and rote work involved in integrating configuration management with an AutoScalingGroup. Fortunately, Paco hates rote work, so it takes care of it for you:

  • Installing a configuration management tool.
  • Downloading the applicable configuration sets.
  • Running the configuration management tool against those configuration sets.

This is a common pattern for running configuration management with configuration sets, and Paco implements it behind the scenes using a concept it calls Launch Bundles: zip files that configure a group of servers to fulfill a specific role. The cfn-init launch bundle takes care of ensuring cfn-init is installed, running it, and telling it where to download the configuration sets and which sets to run. Paco does all of these steps when your configuration declares it will use cfn-init:

  • Creates an S3 Bucket to hold launch bundles.
  • Applies an S3 Bucket Policy that allows the AutoScalingGroup to access the S3 Bucket.
  • Creates the launch bundle zip files and uploads them to the S3 Bucket.
  • Adds BASH to the AutoScalingGroup's UserData that downloads all of that server's launch bundles and runs them in sequence.

Doing all of those steps is what we mean by Configuration Management orchestration. By combining Infrastructure as Code and Configuration Management into one set of configuration in Paco, you get two major benefits:

1. Don’t Repeat Yourself (DRY): The ability to share the same metadata between your Infrastructure as Code and Configuration Management projects.

2. Validation that these two aspects of cloud automation are configured to work together correctly.

For example, Paco can detect and warn you that your logging agent is collecting log files under a different name than the one your centralized log alarms alert on, or that your alarms reference infrastructure that wouldn't actually exist.

Another configuration example: Elastic File System (EFS) mounts

The Private PyPI Server uses AWS Elastic File System (EFS) to store and serve its Python packages. The advantage of using EFS is that you can run more than one PyPI server in an AutoScalingGroup, and roll out new servers or have faulty instances replaced without any downtime. You also pay only for what you use (AWS currently prices EFS at $0.33 per GB per month).

When a new EC2 instance is launched, it needs to mount an EFS filesystem. Paco does this with an EFS launch bundle. In the Paco project configuration, this process is as simple as declaring an EFS filesystem resource and then declaring that the ASG will mount that filesystem and what the mount point should be. This looks like:

packages:
  type: EFS
  # ... configuration options for EFS go here ...

server:
  type: ASG
  # declare this AutoScalingGroup should mount the above
  # EFS resource under /var/pypi
  efs_mounts:
    - enabled: true
      folder: /var/pypi
      target: paco.ref netenv[...]pypi.resources.packages

Paco will parse that configuration and create an EFS launch bundle behind the scenes. This launch bundle determines the EFS Id, adds it to /etc/fstab, and finally mounts the filesystem.

Note that the target field for efs_mounts uses a Paco Reference. Paco References are special configuration values that refer to another resource, name, or cloud id within a Paco project. This lets Paco validate configuration, such as detecting that the EFS mount configuration refers to a valid EFS filesystem, before it even provisions any cloud resources.
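As an illustration, a fully spelled-out reference names a path through the Paco project's configuration tree. The path below is hypothetical; the actual segments depend on your netenv, application, and resource group names.

# Hypothetical fully spelled-out Paco Reference. The netenv, application,
# and group names here (mynet, pypi) are stand-ins for your own names.
target: paco.ref netenv.mynet.applications.pypi.groups.pypi.resources.packages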

Paco's philosophy is to let you declare not only your cloud resources but also how they are connected. Paco is then smart enough to see those connections and supply bundles of code and configuration that do this rote work for you, and to stop you from deploying broken configuration 🙂

Paco is an open source cloud orchestration tool developed by Waterbear Cloud to let us provide customers with complete cloud automation solutions with far greater speed and quality, and at lower cost, than traditional cloud consulting projects. Interested in taking the grunt work out of managing your cloud? Talk to Waterbear Cloud today.