The unpredictability of bots

I recently made a Mastodon bot that didn’t really turn out how I expected. My goal was for it to be a bit cheeky, by being a bot who poses as a scholar. That’s not how it comes across. Rather, it presents itself as pedantic and over-confident. I suppose I could tweak it to make it more what I wanted. But I think it is unexpectedly interesting as it is.

The bot’s posts are written by code, and the curious thing about code is that it is often both very deterministic and very unpredictable. One reason code can be so unpredictable is that, as humans, we have trouble reasoning like computers, so we’re surprised by some of the weird things they produce. That is part of the joy of bots. They say some weird stuff.

So I don’t particularly like the personality of my bot, but I’m going to leave it as it is. It shows how hard it is to think like a computer does.

Posted in bots, mastodon, software | Leave a comment


I’ve recently had the honor of contributing to an open source project called ephemetoot. It’s a project by Hugh Rundle that auto-deletes your old Mastodon posts. I’ve wanted to contribute more to open source projects for a while now, but finding the right project is surprisingly hard to do.

Hugh’s project appealed to me for several reasons: (1) I think it’s an awesome use case; (2) it’s a small code base that I could wrap my head around; (3) I’m enthusiastic about contributing to the Mastodon ecosystem; (4) Hugh is doing interesting work at the intersection of libraries and code.

Contributing has taught me some valuable lessons. I learned a few things about git. More importantly, I learned about collaborating on code with someone I’ve never met. Up until now, most of the code I’ve written has been for myself. While I usually openly license my projects and put them on GitHub, they’re often written without much thought about how others could use them. This approach needed a shake-up. Writing contributions for someone else’s project is a good catalyst to refocus on collaboration.

Posted in ephemetoot, mastodon, open source | Leave a comment

Learning the conceptual stuff

Because I’m a self-taught programmer, and still very much a beginner, there’s a lot of computer science theory that I’m totally unaware of. Yet I’m now beginning to see the value of classic theoretical solutions to common programming problems. When you can immediately identify a problem as being solvable with, say, a concept like a deque, or recursion, or whatever, you save a huge amount of time that would otherwise be spent groping around for a homemade (and probably inefficient) solution.

The classic solutions are really interesting too. Rather than leaving me to spend my time hacking together stuff that barely works, they have got me thinking about how a computer might best tackle a certain problem. Learning these approaches is definitely making me a better programmer, because it means I can come up with an efficient, realizable solution to a problem more quickly. Even better, problems which seemed unsolvable to me are suddenly accessible, because I’ve learned a way to approach them.
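To make the deque example concrete, here is a minimal sketch using `collections.deque` from Python's standard library. The task (remembering only the few most recent items) and the names `recent_posts` and `remember` are made up for illustration.

```python
from collections import deque

# Hypothetical task: remember only the three most recent items.
# A deque created with maxlen discards the oldest item automatically,
# so there is no need for homemade "if the list is too long" logic.
recent_posts = deque(maxlen=3)

def remember(post):
    """Add a post; the oldest one falls off once the deque is full."""
    recent_posts.append(post)

for title in ["first", "second", "third", "fourth"]:
    remember(title)

print(list(recent_posts))  # ['second', 'third', 'fourth']
```

The homemade version would probably be a list plus some manual trimming, which works but is easier to get wrong.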

Anyhow, this post is in part inspired by this book, which I’m finding interesting.

Posted in learning | Leave a comment

Friday thoughts on greenOA

As Kingsborough’s representative for CUNY Academic Works, our university’s institutional repository, I help faculty share their publications freely and openly online. This is useful work, because it increases the visibility of their work, and allows many people around the world to access scholarship that might otherwise have been unavailable to them. The institutional repository provides a second home for many publications that may be paywalled elsewhere.

My work with the institutional repository mostly involves reading about licensing and writing emails to faculty. This is not glamorous stuff, but it is rewarding work because it is increasing access to scholarly knowledge in a very tangible way: one publication at a time. It is also very typical of the behind-the-scenes work done by academic librarians, who keep the wheels of knowledge sharing turning, so that scholars everywhere can continue working effectively.

Posted in institutional repository, open access | Comments closed

Fuzzy string matching

A few months ago, I wrote about a tool I made called the Fictograph, which graphs the awesomeness of authors’ works over time. It leans heavily on data from the Goodreads API. I expected the Goodreads API to be reliable, but it turns out it has some design problems. For example, if you query an author name with a minor spelling mistake, you sometimes get back data on a random author who is totally unrelated to your search.

This behavior is irritating to users, who get results for a different author than they intended. Plus, Fictograph users aren’t going to have much sympathy for my whining and blaming the underlying API for the problem. So I needed to find a programmatic way to compensate for this unhelpful API behavior.

I was stuck on this problem until I saw a presentation at PyGotham that touched on fuzzy string matching. This was a plausible solution, as fuzzy string matching can evaluate whether the name entered by the user is more or less the same as the name returned from the API. If they’re pretty much the same, great! If they’re not, it means the API is probably returning an unexpected result, so the Fictograph should probably return an “author not found” error.

The best part is that I didn’t have to write any string matching code myself; Python has libraries like fuzzywuzzy that will take care of fuzzy string matching for you.
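The check described above might look roughly like the sketch below. To keep it dependency-free it uses `difflib.SequenceMatcher` from the standard library rather than fuzzywuzzy, so it is not the Fictograph's actual code. The function names, the example author names and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a 0-1 similarity score, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def check_author(requested, returned, threshold=0.8):
    """Return the API's author name if it resembles the request, else None.

    The threshold is an illustrative guess, not a tuned value.
    """
    if similarity(requested, returned) >= threshold:
        return returned
    # The caller would show an "author not found" error here.
    return None

# A minor spelling mistake still counts as the same author.
print(check_author("Jane Austin", "Jane Austen"))

# A totally unrelated name is rejected.
print(check_author("Jane Austin", "Dan Brown"))
```

A stricter threshold rejects more legitimate misspellings, while a looser one lets more unrelated authors through, so the right value depends on how forgiving the tool should be.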

Posted in api, goodreads | Comments closed

Keeping librarians up to date on electronic products

Teaching librarians usually want to stay atop the latest changes to their institution’s electronic products to be able to teach research skills effectively. As an instructor, it’s important to be comfortable using the latest features of the various services.

However, keeping up can be a challenge. Vendors regularly roll out updates, but these aren’t always communicated in a timely way to front-line librarians. Compounding this problem, most libraries have dozens of electronic products to keep track of. This is a hard problem to solve, in part because librarians often have unique, personal workflows, and most one-size-fits-all communication approaches will not work for everyone.

So, our challenge was to get the canonical, vendor-produced training materials into the hands of our librarians in a timely and convenient way. Our solution was to build a page that draws on vendors’ YouTube training videos. This relies on YouTube’s RSS feeds. RSS used to be popular with people who read blogs, but it isn’t usually user-facing anymore; nonetheless, it lives on as internet infrastructure. In our case, we drew from the RSS feeds of vendors’ YouTube channels to create an auto-updating documentation page. Thanks RSS! Librarians can now review the latest training videos at their own pace, here.
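Our page was built with WordPress, but the underlying idea can be sketched in a few lines of Python. This assumes the feed is an Atom document, which is how YouTube channel feeds were formatted when I looked. The sample feed, its titles and its URLs are made up, and fetching the feed over the network is left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up sample standing in for one vendor channel's feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Vendor Training</title>
  <entry>
    <title>Searching the new interface</title>
    <link rel="alternate" href="https://example.com/watch?v=abc"/>
  </entry>
  <entry>
    <title>Exporting citations</title>
    <link rel="alternate" href="https://example.com/watch?v=def"/>
  </entry>
</feed>"""

def list_videos(feed_xml, limit=5):
    """Return (title, url) pairs for the first `limit` entries in the feed."""
    root = ET.fromstring(feed_xml)
    videos = []
    for entry in root.findall("atom:entry", ATOM)[:limit]:
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        link = entry.find("atom:link", ATOM)
        url = link.get("href") if link is not None else ""
        videos.append((title, url))
    return videos

for title, url in list_videos(SAMPLE_FEED):
    print(f"{title}: {url}")
```

Because the page is rebuilt from the feed each time, new vendor videos show up without anyone editing the page by hand.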

While I made this with WordPress, I think a similar solution may have also been possible in LibGuides. I might move this content to LibGuides in the future.

Posted in learning, rss, youtube | Comments closed

Build small

Software can sometimes be big and unwieldy. But it doesn’t have to be. Software can also be small, unimportant and ephemeral. Software can have small goals and limited use cases. It can be fun to build and deploy. There is a lot of value in building small applications for libraries. Here are some benefits:

  • Building a small application requires very little time commitment. Make something over the weekend!
  • Tools we build don’t have to be that complex. Do you really need a database to make that interesting project? Probably not!
  • It’s easy to iterate with small applications. Not happy with your code? Rewrite the whole thing if you like.
  • Small applications are a great way to show off library projects or resources. Highlight an amazing aspect of your library.
Posted in software | Comments closed

On developer conferences

Going to a developer conference can be pretty intimidating when you’re not a professional programmer. The imposter syndrome of being the non-developer at the table can be substantial. But I think it can be useful for librarians who write code to attend these events.

First of all, it is reassuring to see the issues that professional developers are dealing with. Their programming problems are not really that different from those faced by programming librarians. It turns out that they are grappling with human-scale problems like the rest of us.

Second, it is empowering to see what can be done with code. Going to a developer conference can inspire ideas for actually realizable projects that can benefit our libraries. Having a sense of the possibilities can motivate us to push forward interesting projects at our workplaces.

Finally, they’re usually pretty fun. Programming is a great way to build upon one’s interests, however idiosyncratic those may be. The developer conferences that I’ve been to reflect that, with lots of oddball presentations that are usually quite entertaining. It’s usually a good time.

This post was inspired by PyGotham, which wrapped up on Saturday.

Posted in conference, imposter syndrome | Comments closed

Burn it all down

This week I rewrote SeeCollections, a data visualization application that I had originally built in 2015. The rewrite was sorely needed, for a few reasons:

  • The original code was really bad. Which is to be expected; I was a beginner when I wrote it. The newer code is better. It’s clearer. It went from over 400 lines of code to under 200, while maintaining the same functionality. It will now be much easier to debug.
  • SeeCollections was originally based on the Primo X-Services API, which is now deprecated. For my application to keep working, I needed to move it to the newer Primo Search API. An added bonus was that the new API works with key-based authentication, which allows for more deployment options than with IP-based authentication.
  • I wanted to move this project off SDF. It is a wonderful hobbyist community, but the infrastructure is sometimes hard to work with. SDF is a few thousand programmers sharing a handful of servers, which results in all kinds of strange and unexpected technical roadblocks. It’s fun to tinker with, but not to rely on. Key-based authentication allowed me to move this to PythonAnywhere, which uses dynamic IPs, and is more reliable than SDF.

I had been putting off the rewrite, because it seemed like a lot of work, but now I’m glad I did it. I don’t have to look at the previous code anymore, and the whole project is now a lot more reliable and maintainable.

Posted in api, visualization | Comments closed

What I learned from 52,080 tweets

A few weeks ago, over the course of 15 days, I gathered 52,080 tweets about learning to code. I did this with TCAT, the open source tool that our library uses for Twitter archiving. I gathered all tweets that matched any of three popular learn to code hashtags: #codenewbie, #learntocode and #100daysofcode.

What I learned is that libraries and librarians aren’t very involved in the learn to code conversation on Twitter. Searching the results for the word “library” turned up 138 mentions (or 0.265% of tweets). This sounds like a modest contribution, but in fact only 17 of these were about libraries as institutions (0.033% of tweets), while the rest were about software libraries. “Librarian” only turned up in two results (0.004% of tweets).
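The counting itself is simple enough to sketch in Python. This is not how TCAT works internally. The function names and the four sample tweets are made up, and telling institutional libraries apart from software libraries still means reading the matches by hand.

```python
import re

def mentions(tweets, word):
    """Count tweets containing `word` as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sum(1 for text in tweets if pattern.search(text))

def percent(count, total):
    """Express count as a percentage of total, to three decimal places."""
    return round(100 * count / total, 3)

# Made-up tweets for illustration only.
sample = [
    "Day 12 of #100daysofcode: finally understand loops",
    "Our public library is hosting a #learntocode night",
    "Which Python library should a #codenewbie learn first?",
    "Ask a librarian about free coding books #learntocode",
]

print(mentions(sample, "library"))    # 2
print(mentions(sample, "librarian"))  # 1

# The share for the 138 "library" mentions out of 52,080 tweets.
print(percent(138, 52080))  # 0.265
```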

What does this mean? First of all, I feel that 52K tweets is a decent sample size. Second, libraries and librarians are not well represented in this conversation. While launching learn to code initiatives is often good publicity for libraries, I think this data suggests that our profession is not really following through.

Posted in libraries, tcat, twitter | Comments closed