Never thought I'd blog, but here we go...

It’s probably the engineer in me, but I’d much rather be programming than writing a dumb blog. I generally don’t care for them, and never had any interest in writing one. They’re mostly full of noise and fluff, or rehashed ideas. But these days I’ve been working a lot in open source, and I’ve seen some great posts. I guess sometimes a blog is the best way to spread useful information or build up some interest around a good idea. Hopefully, this will accomplish that and give back to some of the great online communities I have been working with - or maybe all of it is just noise and fluff, I don’t really know.

Currently, my work in open source is focused on Apache Spark and, lately, Apache Arrow. In Spark, I have contributed to several different areas but find myself mostly working on Python, MLlib, and Core/SQL. I started working with Arrow to help with some of the deficiencies in PySpark that end up causing poor performance, and ultimately a bad experience for Python users. Because of that, I’ve mostly helped out with the Java interface and its integration with Python. Both projects have incredible communities with many extremely capable people who are also very welcoming to new contributors. Please feel free to ping me regarding either project if I can be of some help.

I will be following this up with some posts related to my work in Spark, Arrow, and other open source projects. The general theme is usually about the same - to make big data analytics/ML faster, more reliable, and simpler to use. This is not always an easy task, and it sometimes takes a good amount of patience to get a change worked out just right, or to convince people of another view. I have seen tremendous progress in both of these projects while I’ve been involved, but there is still plenty more to do that will keep me programming more than blogging, thankfully!

Written on August 3, 2017