Category: Uncategorized

A Prayer for Distributed Systems Developers

Over 16 years, I’ve written software up and down the entire stack. Early in my career I wrote boot ROM software for specialized embedded devices. This kind of programming taught me so much about how computers really work.

InfluxDB and Grafana HOWTO

This post describes working with InfluxDB 0.8. InfluxDB 0.8 is no longer supported, and has been superseded by the 1.0 release. I recently came across InfluxDB, a time-series database built on LevelDB. It’s designed to support horizontal as…

What I wish I’d been told about the JVM

Java is the predominant language of Big Data technologies. HBase, Lucene, elasticsearch, Cassandra – all are written in Java and, of course, run inside a Java Virtual Machine (JVM). There are some other important Big Data technologies that, while not written in…

Always thinking of the next guy

My father worked for many years in Quality Assurance at Beckman, an American medical instruments firm. His job was to ensure that newly-manufactured centrifuge rotors would hold up when spun at thousands of RPMs. He used to tell me that…

Welcome to your data

After 2 years at Loggly, tomorrow I start a new role at Jut. While I will miss the team at Loggly very much, and the wonderful product we built during my time there, I’m really looking forward to working…

Distributed Systems for Fun and Profit

I came across a very readable paper on distributed systems — Distributed systems for fun and profit. I recommend it for anyone interested in learning more about distributed systems, and the challenges involved with designing, building, and operating distributed systems.

Book Review: Mastering ElasticSearch

Packt recently asked me to review their new publication Mastering ElasticSearch by Rafał Kuć and Marek Rogoziński. Since most of my experience with elasticsearch has been from a systems point of view (index management, cluster maintenance, indexing performance), I…

Speaking at AWS re:Invent 2013

This past week I had the opportunity to speak, with my colleague Jim Nisbet, at AWS re:Invent 2013. In our talk, titled “Unmeltable Infrastructure at Scale: Using Apache Kafka, Twitter Storm, and Elastic Search on AWS”, Jim and I described the architecture of…

Avoiding elasticsearch split-brain

Loggly recently held an elasticsearch meetup, which was a great success. One question that came up repeatedly was how to ensure that a network partition does not split an elasticsearch cluster into two independently operating halves, a condition known as split-brain. This can be a particular problem in AWS…

Loggly Generation 2 Released!

After 14 months of hard work, the next generation of Loggly has been released. It’s been a great time to be part of the Software Infrastructure team at Loggly and we have put together a superb log aggregation & real-time…

Technical Leadership through Testing

As technical lead at Loggly, responsibility for a well-engineered infrastructure rests with me. One way to ensure the system is designed and implemented well is to stay as close as possible to the code, ensuring that the team and…

Using the Source

I have written another post for the Loggly blog — all about our guidelines for choosing and integrating open-source software and technology in your next project. Check it out here.

If you love your logs, set them free

I recently wrote my first post for the Loggly blog. It illustrates why host machines are often the worst place to store the logs those machines are generating. You can check it out here.

Monitoring Storm Kafka Spouts using Python

When running a large real-time processing system, monitoring is critical. But it does more than allow you to keep an eye on your system. During development it allows you to test hypotheses about how it works, how it performs when certain…

Boost ASIO timers — errors are never enough

The Boost ASIO Library is a wonderful piece of software. I’ve built high-performance event-driven IO C++ programs that just scream — it works very well. However, there is one subtlety when it comes to timers — specifically when it comes…

Bootstrapping Cassandra

Cassandra is an open-source, distributed database, informally known as a NoSQL database. It is designed to store large amounts of data, offer high write performance, and provide fault tolerance. I recently needed some hands-on experience with Cassandra, and being relatively new to…

Generating Type-1 UUIDs using C++

I needed some C++ code to generate Type-1 (time-based) UUIDs. The Boost libraries, while supporting other UUID types, don’t support time-based UUIDs. A cut of my code can be found on GitHub.
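The post’s own code lives on GitHub and isn’t reproduced here. As a rough illustration of what goes into a Type-1 UUID, here is a minimal, standard-library-only sketch following the RFC 4122 field layout: a 60-bit timestamp counting 100-ns intervals since 1582-10-15, a 14-bit clock sequence, and a 48-bit node ID. The function names `make_uuid_v1` and `uuid_v1_now` are hypothetical, and the sketch assumes a random node ID (with the multicast bit set) rather than the machine’s MAC address.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

// Hypothetical helper, not the code from the post: formats an RFC 4122
// version-1 UUID from a timestamp (100-ns ticks since the Unix epoch),
// a 14-bit clock sequence and a 48-bit node ID.
std::string make_uuid_v1(std::uint64_t unix_100ns, std::uint16_t clock_seq,
                         std::uint64_t node) {
    // Type-1 timestamps count 100-ns intervals since 1582-10-15 (the
    // Gregorian reform), so shift the Unix-epoch value by that offset.
    const std::uint64_t kGregorianOffset = 0x01B21DD213814000ULL;
    const std::uint64_t ts = unix_100ns + kGregorianOffset;

    const unsigned time_low = static_cast<unsigned>(ts & 0xFFFFFFFFULL);
    const unsigned time_mid = static_cast<unsigned>((ts >> 32) & 0xFFFF);
    // Top 12 bits of the timestamp, with the version (1) in the high nibble.
    const unsigned time_hi_version =
        static_cast<unsigned>(((ts >> 48) & 0x0FFF) | 0x1000);
    // 14-bit clock sequence, with the RFC 4122 variant bits (binary 10) on top.
    const unsigned clock_seq_variant = (clock_seq & 0x3FFFu) | 0x8000u;

    char buf[37];  // 36 characters plus the terminating NUL
    std::snprintf(buf, sizeof buf, "%08x-%04x-%04x-%04x-%012llx", time_low,
                  time_mid, time_hi_version, clock_seq_variant,
                  static_cast<unsigned long long>(node & 0xFFFFFFFFFFFFULL));
    return std::string(buf);
}

// Generates a UUID for the current time. Assumption: a random node ID with
// the multicast bit set (RFC 4122, section 4.5) stands in for a MAC address,
// and the clock sequence is chosen randomly once per process.
std::string uuid_v1_now() {
    static std::mt19937_64 rng{std::random_device{}()};
    static const std::uint64_t node =
        (rng() & 0xFFFFFFFFFFFFULL) | 0x010000000000ULL;
    static const std::uint16_t clock_seq = static_cast<std::uint16_t>(rng());

    // system_clock measures time since the Unix epoch (guaranteed from C++20).
    const auto since_epoch = std::chrono::system_clock::now().time_since_epoch();
    const auto ticks = static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(since_epoch).count() / 100);
    return make_uuid_v1(ticks, clock_seq, node);
}
```

For example, `make_uuid_v1(0, 0, 0)` (the Unix epoch itself) yields `13814000-1dd2-11b2-8000-000000000000`. One caveat of this sketch: it doesn’t remember the last timestamp it issued, so two calls within the same 100-ns tick, or after the system clock steps backwards, could produce the same UUID. RFC 4122 handles that case by incrementing the clock sequence.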

mutt and Google Mail

I finally moved to mutt for my Loggly e-mail (which runs on Google Mail). After moving from e-mail client to e-mail client, I was keen to give it a try — the minimalist design and speed really appealed. It took…

New Challenges

After almost 5 years at Riverbed Technology, it’s time for new challenges. I’ve started a new development position at Loggly in San Francisco, helping to build their cloud-based Logging-as-a-Service platform. I spent significant time building systems that needed comprehensive…

Eating your own dogfood

Dogfooding is an effective way to increase testing of, and get valuable feedback on, one’s products. It can be especially effective in the earlier stages of a product’s development, when the user base may be small. Having a forgiving —…

Kubuntu 11.04 on the Chembook 2370VA

My experience with Fedora 15 was not what I had hoped. The ATI graphics driver was particularly problematic (regular minute-long hangs due to spinlock issues), so I decided to try a completely different distribution. I chose Kubuntu…