Kin Lane

Lower Rate Limits and Build Timeouts

I finally figured out my challenges with using GitHub as my primary data project provider. I’ve used the platform since 2014 to manage the API Evangelist network of sites. In January of this year I hit pause on my sites, only returning in June. During those six months it became much more difficult to reliably manage my network of sites, which, as I discovered, was due to two changes in how GitHub approaches things: lower API rate limits and timeouts for Jekyll builds.

I hadn’t changed anything about my sites since January, but I could rarely get them to build without hitting API rate limit errors, or a range of unknown errors when I built using a straight Git commit. Sometimes it would take days before API Evangelist would build after I published a single blog post. I found myself working hard each weekend to schedule out the week, only for it to take even more time to sort out the various errors before I could see any results publicly on the sites. And this was just my content publishing sites; I haven’t even fully returned to managing the large volumes of YAML data I store and publish via GitHub.

After discovering that my automation of data and content syndication across my hundreds of sites was failing about 50% of the time, I converted everything to build in batches using a Git commit instead of the APIs. Even so, many of my sites still wouldn’t build and render, returning empty errors, or no errors at all, from GitHub. The cause of the increased API errors was clear: they had lowered the threshold. The Git commit failures were harder to pin down, and it took about 10 hours of investigation before I found that they were caused by the timeout threshold for Jekyll builds. Some of my GitHub Pages hosted sites are pretty data intensive, with hundreds or thousands of YAML files. If I ran the Jekyll build locally and then published to GitHub, they would pass, giving me a green check.
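As an illustration of working within lower API rate limits, here is a minimal Python sketch that reads GitHub’s `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers (the reset value is a UTC epoch timestamp in seconds) and works out how long a syndication script should pause before its next call. The function name `seconds_until_reset` is hypothetical, and it assumes the response headers are already available as a dictionary from whatever HTTP client is in use.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Hypothetical helper: seconds to wait before the next GitHub API call.

    Returns 0.0 while requests remain in the current rate limit window;
    otherwise returns the time left until X-RateLimit-Reset.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lower = {k.lower(): v for k, v in headers.items()}
    # Assumption: if the header is missing, treat the call as not rate limited.
    remaining = int(lower.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    reset = float(lower.get("x-ratelimit-reset", "0"))
    current = time.time() if now is None else now
    # Never return a negative wait if the window has already reset.
    return max(0.0, reset - current)


if __name__ == "__main__":
    example = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}
    print(seconds_until_reset(example, now=1700000000))  # 60.0
```

A script could call `time.sleep(seconds_until_reset(response.headers))` between requests rather than failing outright when the limit is reached; this only spaces out API calls and does nothing for Jekyll build timeouts.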

I get it. I’ve outgrown the platform. Coming out of my work in Washington D.C., I was under the impression that they wanted to be a public data provider. I’m guessing that by 2019 this had changed, and there is nobody internally advocating for open data publishing on the site. I’m guessing that these changes weren’t intentional; they just needed to start tightening things down, and rate limits and timeouts are a common place for platforms to start. I’m guessing that open data projects using Jekyll are not a priority, and since there is no way for me to pay for any sort of prioritization or support, I’m forced to make difficult decisions. Sadly, this is a signal I often see from platforms, and a sign for me to move on and take my work somewhere else.

I was using GitHub in this way because it was free, open, observable, and forkable. I wanted all my work to be open and publicly available, as well as something anyone could fork and build upon. Personally, I thought it was an innovative way for GitHub to generate interest in open data projects and headless CMS deployments. I’m guessing they are more interested in larger revenues elsewhere, rather than continuing to invest in this fast-growing area. It makes me sad, because city, state, and federal government agencies won’t be able to use it as a free place to host data and engage with developers. In the end, I know that I rarely get what I want; ultimately the enterprise and investors will get what they want, and the most lucrative approach will always win out.

As of Sunday, I have moved 100 API Evangelist network sites off of GitHub. I have a couple of issues like pagination, tagging, and other common CMS-related things to tackle before I move the main API Evangelist site off of GitHub. I’m not even going to be using Jekyll in my new world. I’ll keep things static, but they’ll run on my own homebrew publishing system. There really isn’t much ROI for me in keeping things open, forkable, and consumable by others, so I’ll be shifting things towards a more proprietary stance. I wish people like me could make a living by publishing open data, but I’m just not convinced that platforms are open to supporting it, or that entrepreneurial-minded consumers are willing to support the folks doing the hard work of managing and publishing open data. I don’t think free and open data will go away, but when it comes to what I do, I’ll be less open to sharing my data without more help from platforms and the community.

If you are interested in any of the API research and open data work I do, as always, feel free to reach out. Depending on who you are and what you do I may be open to sharing, but I’m also always interested in helping pay the bills with a little partnership.