HyperLogLog++: Google’s Take On Engineering HLL

Originally posted on Research:

Matt Abrams recently pointed me to Google’s excellent paper “HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm” [UPDATE: changed the link to the paper version without typos] and I thought I’d share my take on it and explain a few points that I had trouble getting through the first time. The paper offers a few interesting improvements that are worth noting:

  1. Move to a 64-bit hash function
  2. A new small-cardinality estimation regime
  3. A sparse representation

I’ll take a look at these one at a time and share our experience with similar optimizations we’ve developed for a streaming (low latency, high throughput) environment.

32-bit vs. 64-bit hash function

I’ll motivate the move to a 64-bit hash function in the context of the original paper a bit more since the Google paper doesn’t really cover it except to note that they wanted to count billions…
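To make the hash-width question concrete, here is a minimal sketch of the arithmetic, not code from either paper. The function names are illustrative, the occupancy formula assumes a perfectly uniform hash, and the correction is the large-range formula from the original HyperLogLog paper, which was written for 32-bit hashes.

```javascript
const TWO_32 = Math.pow(2, 32);

// Expected number of distinct hash values after hashing n distinct items
// into a space of 2^bits values (uniform hash assumed).
function expectedDistinctHashes(n, bits) {
  const space = Math.pow(2, bits);
  // expm1 keeps precision when n / space is tiny (the 64-bit case)
  return -space * Math.expm1(-n / space);
}

// Large-range correction from the original HLL paper for a 32-bit hash:
// E* = -2^32 * ln(1 - E / 2^32). It grows without bound as E approaches 2^32.
function largeRangeCorrection32(rawEstimate) {
  return -TWO_32 * Math.log(1 - rawEstimate / TWO_32);
}

const n = 4e9; // "billions" of distinct items
console.log(expectedDistinctHashes(n, 32)); // about 2.6e9: many items collide
console.log(expectedDistinctHashes(n, 64)); // about 4e9: collisions are negligible
console.log(largeRangeCorrection32(1e9));   // about 1.14e9
```

With a 32-bit hash, a large share of four billion items would collide before the estimator even sees them, which is what the correction tries to undo. With 64 bits the collision count at that scale is effectively zero.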


Breaking the silence: Politics, Dictatorships and ignorance VS the Web

I try to avoid talking about politics. I think most politicians represent the worst of the general populace.

Throughout history the human race has proven time and time again that we’re our own worst enemies. We fear the things we don’t know or understand, and persecute each other and those that dare question or step outside the norm. We are a destructive group of beings.

Perhaps our greatest flaw is our estranged disillusion of the illusion of control. That didn’t make sense to me the first time I said it in my head but I’m sure it’s what I mean. We’re always trying to control things, people and even time.
We like to experiment, and for very brief moments in each era the experiment starts to look successful. We appear to be in control (this is where the illusion of control sits). But every now and again, a little piece of our Frankenstein contraption falls off. That is when we realize we’re not really in control. The disillusion begins, and at the strike of reality we become estranged from control.

Android debugging over bluetooth (without root)

So I’m on a train to Reading and wanted to continue working on an app I’ve been hacking at, but my USB cable is annoyingly inconvenient and Ubuntu keeps throwing a fit because I haven’t been able to configure it to properly handle MTP-enabled devices. To get debugging going, here’s what I’ve just done.

My PhD’s a go! – Graph database with the distributed nature of Cassandra and the Graph properties of Neo4J

I’ve been thinking about this for a while now and I’ve made a solid decision finally. At some point later this year I’ll be starting a PhD (Probably October).

I’ve been using Apache Cassandra for years now, since 2008, not long after Facebook open sourced it. Since then I’ve played with most of the major NoSQL databases and frameworks (Neo4J, HBase, CouchDB, Hadoop, etc.) and in virtually all the projects I’ve repeatedly found the need to model graph or graph-like data. In some cases it’s worked out great, in others it was a terrible idea, but luckily I’ve always recognised very early on when the data model is just terrible for that DB, so I haven’t wasted time on it.

Institutionally racist – London MET, UK Politicians, you’re pissing off someone in tech, a programmer, hacker and I’m gonna piss you back off!

WARNING: I am going to rant. I am going to swear. This is very off topic to my usual posts, if you’re going to be offended by me swearing or slagging off the London MET or UK Government, STOP reading now because I will not apologise for it and chances are if you nag me about being explicit I’ll tell you to bugger off. And that’s me putting it politely…


Finding that blogging balance!

I work hard. Least that’s what I get told :P – I’ve found it hard to find the time to blog as much as I’d like to, but after a recent conversation I had an epiphany. While I’m working on Fillta I’ve decided to do some contract jobs, “student internships” and whatever else I can to survive while I do it. After a recent interview I got a really nice compliment from the guy and it got me thinking. He basically said I’m way more knowledgeable than he expected; my age and being a recent graduate (technically not true, graduation is in July) completely hide my experience and knowledge at the outset. He said he’d checked out my blog from my CV and it didn’t reflect it either…

So starting next week (finishing up 2 projects now) I’ll be writing at least 3 blog posts per week, with the intention of increasing the number as I get back into it.

Configuring Ubuntu for VirtualBox to detect USB devices

I’ve been developing Android apps more often recently, but I’ve found the Linux support for some Android tools less than appealing.
In particular, I’ve been using MOSYNC and they don’t provide a Linux installer; you have to compile it from source. That’s all well and good until something goes wrong, then you spend forever trying to re-compile and fix build issues, time I’d rather spend doing something else.

Since I refuse to go back to Windows as a main OS after it annoyed the crap outta me recently, I settled for the next best thing, virtualisation! I’m using VirtualBox, which is honestly one of the most awesome pieces of open-source software I know of! With Windows installed in VirtualBox, I was set…or so I thought. USB devices weren’t being picked up by VirtualBox. A quick Google revealed that this is a common issue, well documented on page 52 of the VirtualBox user manual (People read those?) under VirtualBox USB support.

So to the point, the problem is that when VirtualBox is installed your user account probably wasn’t added to the group it created. You can check and make sure you’re facing the same issue by running the command:

AngularJS: If you don’t have a dot, you’re doing it wrong!

This has bitten me twice in the last 3 days so I’m doing a quick post to remind myself.

With AngularJS models, you typically have two-way bindings between UI elements and your controller’s properties. Directly from the docs (Plunker):

<!doctype html>
<html ng-app>
 <head>
  <script src="http://ajax.googleapis.com/ajax/libs/angularjs/1.0.6/angular.min.js"></script>
  <script src="script.js"></script>
 </head>
 <body>
  <form name="myForm" ng-controller="Ctrl">
   Single word: <input type="text" name="input" ng-model="text"
    ng-pattern="word" required>
   <span class="error" ng-show="myForm.input.$error.required">
    Required!</span>
   <span class="error" ng-show="myForm.input.$error.pattern">
    Single word only!</span>

   <tt>text = {{text}}</tt><br/>
   <tt>myForm.input.$valid = {{myForm.input.$valid}}</tt><br/>
   <tt>myForm.input.$error = {{myForm.input.$error}}</tt><br/>
   <tt>myForm.$valid = {{myForm.$valid}}</tt><br/>
   <tt>myForm.$error.required = {{!!myForm.$error.required}}</tt><br/>
  </form>
 </body>
</html>
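The "dot" in the title is usually explained through prototypal inheritance: an AngularJS child scope inherits from its parent, so writing a primitive on the child shadows the parent's value instead of updating it. Below is a minimal sketch in plain JavaScript, assuming ordinary objects as stand-ins for scopes; `parentScope`, `childScope` and `form` are illustrative names, not part of the docs example.

```javascript
// Plain objects standing in for Angular scopes (an assumption for illustration).
const parentScope = { text: 'guest', form: { text: 'guest' } };
const childScope = Object.create(parentScope); // child inherits from parent

// No dot: the write creates a new own property on the child and shadows the parent.
childScope.text = 'changed';

// With a dot: `form` is first read from the parent, then the shared object is mutated.
childScope.form.text = 'changed';

console.log(parentScope.text);      // 'guest'   (parent never saw the update)
console.log(parentScope.form.text); // 'changed' (both scopes share one object)
```

Binding to something like `ng-model="form.text"` rather than `ng-model="text"` follows the second pattern, which is why the rule of thumb asks for a dot.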


AngularJS: Building completely asynchronous and autonomous single page user interfaces (Part 2)

Originally posted on Fillta:

So in the previous post we discussed using an EventBus to power a client side UI. In this post we’ll show you how to implement it with AngularJS.

Last time I got excited and got a bit carried away. So this time let’s cut to the chase. There’s a simplistic demo of what was discussed in a JSFiddle that’s been prepared for this post; it’s available here, go click around!



The CSS:


The JavaScript:

angular.module('fillta.EventBus', [])
  .factory('Events', function () {
    // keeps events that couldn't be sent because we weren't connected when they happened
    var buffer = [],
        host = window.location.host + "/data",
        connected = false
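The buffering idea in the comment above can be sketched in plain JavaScript without AngularJS. This is a hypothetical stand-in, not the original factory: `createEventBus`, `send`, `setConnected`, `pending` and the `transport` callback are all assumed names chosen for illustration.

```javascript
// Hypothetical event bus that queues events while disconnected and
// flushes them in order once a connection is available.
function createEventBus(transport) {
  const buffer = [];     // events that could not be sent yet
  let connected = false;

  function flush() {
    while (connected && buffer.length > 0) {
      transport(buffer.shift()); // oldest event first
    }
  }

  return {
    send(event) {
      buffer.push(event);
      flush();
    },
    setConnected(isConnected) {
      connected = isConnected;
      flush();
    },
    pending() {
      return buffer.length;
    }
  };
}

// Usage: events sent before the connection is up are delivered afterwards.
const delivered = [];
const bus = createEventBus(function (event) { delivered.push(event); });
bus.send({ type: 'search', query: 'shoes' });
bus.setConnected(true);
console.log(delivered.length); // 1
```

Routing every send through the buffer keeps ordering simple: nothing can overtake an event that is still waiting for the connection.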


AngularJS: Building completely asynchronous and autonomous single page user interfaces (Part 1)

Originally posted on Fillta:

NOTE: This is a technical post.


We all know how important it is to have re-usable components in our application, right? This is especially true in a start-up environment (as we’ve learnt). In a start-up environment, things change so often we cannot afford to have our entire stack tightly coupled. We’ve found that we’ve written things in many ways, shapes and forms. Some we’ve kept, but actually we’ve thrown out quite a bit.

We’ve been very accepting of the fact that if it’s not right, it’s just not right. And on occasion we’ve thrown away a bulk of work and gone back to the drawing board. If that sounds like a scary thing or a waste of time, trust us, it isn’t. We’ve designed our architecture to be an ever-evolving prototype. We can build new parts, plug them in, replace existing parts and so on with ease! Best of…


