Getting to Go: Garbage collection and runtime issues
- Rick Hudson is best known for his work in memory management, including the invention of the Train, Sapphire, and Mississippi Delta algorithms, as well as GC stack maps, which enabled garbage collection in statically typed languages such as Modula-3, Java, C#, and Go. Rick is currently a member of Google's Go team, where he is working on Go's garbage collection and runtime issues.
- The next important thing is that Go is a value-oriented language in the tradition of C-like systems languages, rather than a reference-oriented language in the tradition of most managed runtime languages.
- Low fragmentation: Experience with C allocators (besides Google's TCMalloc and Hoard, I was intimately involved with Intel's Scalable Malloc) gave us confidence that fragmentation was not going to be a problem with non-moving allocators.
Log and Exponential for Programmers
- If an algorithm is said to have a runtime of O(log n) and it takes a second to get the result from 10 data points, how long does it take approximately to get a result from 1,000,000 data points?
- Suppose another algorithm is said to have a runtime of O(2^n) (exponential time), and it takes one second to calculate a result from 10 data points.
- Fun stuff: Try calculating how long this will take for 100 data points.
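Both questions reduce to one ratio. If runtime is assumed to be exactly proportional to the complexity function (real programs have constant overheads that big-O hides), then t(n) = t0 · f(n) / f(n0). The helper `scaled_runtime` below is a hypothetical name used only for this sketch. For the log case the base of the logarithm cancels out, since log(10^6) / log(10) = 6 in any base, so the answer is about 6 seconds. For the exponential case, going from 10 to 100 points multiplies the time by 2^90.

```python
import math
from typing import Callable

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def scaled_runtime(cost: Callable[[int], float], n0: int, t0: float, n: int) -> float:
    """Estimate the runtime at size n from one measurement (n0, t0).

    Assumes runtime is exactly proportional to cost(n), i.e. no constant overhead.
    """
    return t0 * cost(n) / cost(n0)


# O(log n): 1 second at 10 points -> about 6 seconds at 1,000,000 points.
log_time = scaled_runtime(math.log, 10, 1.0, 1_000_000)

# O(2^n): 1 second at 10 points -> 2**90 seconds at 100 points.
exp_time = scaled_runtime(lambda n: 2 ** n, 10, 1.0, 100)

print(f"O(log n), 1,000,000 points: {log_time:.1f} s")
print(f"O(2^n), 100 points: {exp_time:.3e} s (~{exp_time / SECONDS_PER_YEAR:.1e} years)")
```

2^90 seconds is roughly 1.2 × 10^27 seconds, which is on the order of 10^19 years. Adding just one data point (10 to 11) already doubles the exponential runtime.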
The 5 Clustering Algorithms Data Scientists Need to Know
- K-Means has the advantage that it’s pretty fast, as all we’re really doing is computing the distances between points and group centers; very few computations!
- Mean shift clustering is a sliding-window-based algorithm that attempts to find dense areas of data points.
- DBSCAN is a density-based clustering algorithm similar to mean shift, but with a couple of notable advantages.
- With GMMs we assume that the data points are Gaussian distributed; this is a less restrictive assumption than the one K-Means makes by using the mean, namely that clusters are circular.
- Then we can proceed to Expectation–Maximization (EM) clustering using GMMs. There are really two key advantages to using GMMs. First, GMMs are a lot more flexible in terms of cluster covariance than K-Means: due to the standard deviation parameter, the clusters can take on any ellipse shape rather than being restricted to circles.
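The K-Means point above (all it really does is compute distances between points and group centers) can be made concrete with a minimal pure-Python sketch. It assumes Euclidean distance, initializes the centers with k randomly chosen data points, and stops when the assignments no longer change. The names `kmeans` and `mean` are hypothetical helpers for this example, not a library API.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    rng = random.Random(seed)
    # Initial centers: k randomly chosen data points.
    centers: List[Point] = rng.sample(list(points), k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: only distance computations, point -> nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = mean(members)
    return centers, labels


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(pts, k=2)
print(centers, labels)
```

Each iteration costs k distance computations per point plus one averaging pass, which is why the method is fast. The flip side is that the result depends on the random initialization and on choosing k up front.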
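The sliding-window idea behind mean shift can be sketched as follows. A circular window starts at every data point and is repeatedly shifted to the mean of the points inside it (a flat, uniform kernel) until it stops moving. Windows that end up at nearly the same place are then merged into one cluster. The function name `mean_shift`, the flat kernel, and the `radius / 2` merge threshold are assumptions chosen for this sketch. Other kernels, such as a Gaussian, are also common.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mean_shift(points: Sequence[Point], radius: float, tol: float = 1e-6,
               max_iter: int = 300) -> Tuple[List[Point], List[int]]:
    # 1. Shift a window from every starting point toward the local density peak.
    modes: List[Point] = []
    for start in points:
        center: Point = tuple(start)
        for _ in range(max_iter):
            # Points inside the circular window (never empty, see note below).
            window = [p for p in points if math.dist(p, center) <= radius]
            new_center = tuple(sum(coords) / len(window) for coords in zip(*window))
            moved = math.dist(new_center, center)
            center = new_center
            if moved < tol:
                break  # window has stopped moving: a density peak
        modes.append(center)

    # 2. Merge windows that converged close together (radius / 2 is arbitrary here).
    centers: List[Point] = []
    labels: List[int] = []
    for mode in modes:
        for i, c in enumerate(centers):
            if math.dist(mode, c) <= radius / 2:
                labels.append(i)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return centers, labels


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = mean_shift(pts, radius=3.0)
print(centers, labels)
```

Unlike K-Means, the number of clusters is not given in advance. It falls out of how many distinct peaks the windows converge to, which in turn depends on the window radius. The window can never become empty: the mean of the points in the old window always lies within the radius of at least one of them.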
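To show what "density-based" means for DBSCAN, here is a minimal sketch using the algorithm's two usual parameters: a neighborhood radius `eps` and a minimum neighbor count `min_pts`. A point with at least `min_pts` neighbors (itself included) is a core point and grows a cluster. Points that are reachable from a core point but are not core themselves become border points, and everything else is labeled noise (-1). The function name and the brute-force O(n²) neighbor search are simplifications for illustration. Real implementations typically use a spatial index.

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id per point, or NOISE (-1) for outliers."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited

    def neighbors(i: int) -> List[int]:
        # Brute-force eps-neighborhood of point i (includes i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return [NOISE if label is None else label for label in labels]


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
       (50.0, 50.0)]  # the last point is an isolated outlier
print(dbscan(pts, eps=1.5, min_pts=3))
```

This illustrates the advantages alluded to above. The number of clusters is not fixed in advance, and isolated points are reported as noise instead of being forced into the nearest cluster.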
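The EM procedure for GMMs alternates two steps. The E-step computes, for every point, the probability that each Gaussian generated it. The M-step re-estimates each Gaussian's weight, mean, and spread from those soft assignments. The sketch below is one-dimensional, so each component has a single variance rather than the full covariance matrix that produces the ellipse shapes described above. The function names, the fixed iteration count, and the 1e-6 variance floor are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_em(xs: Sequence[float], means: Sequence[float], variances: Sequence[float],
           weights: Sequence[float], iters: int = 100
           ) -> Tuple[List[float], List[float], List[float], List[List[float]]]:
    mu, var, w = list(means), list(variances), list(weights)
    k, n = len(mu), len(xs)
    resp: List[List[float]] = []
    for _ in range(iters):
        # E-step: responsibility of each component for each point (rows sum to 1).
        resp = []
        for x in xs:
            p = [w[j] * normal_pdf(x, mu[j], var[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: weighted re-estimation of mean, variance and mixing weight.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj
            var[j] = max(var[j], 1e-6)  # floor so a component cannot collapse
            w[j] = nj / n
    return mu, var, w, resp


xs = [0.0, 0.2, -0.1, 0.1, -0.2, 5.0, 5.3, 4.8, 5.1, 4.9]
mu, var, w, resp = gmm_em(xs, means=[1.0, 4.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print(mu, var, w)
```

Because the result includes `resp`, every point gets a probability of belonging to each cluster rather than a single hard label. Unlike K-Means, each component also learns its own spread.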
Differential Synchronization for collaborative editing
- If the user made any changes to the document during the time this synchronization was in flight, the client is forced to throw the newly received version away and try again later.
- This time the Common Shadow is the same as Client Text was in the previous half of the synchronization, so the resulting diff will return modifications made to Server Text, not the result of the patch in step 5.
- The method described above is the simplest form of differential synchronization, but it will not work on client-server systems since the Common Shadow is, well, common.
- After the connection times out, the client takes another diff, increments its version number 'n' again, and sends both sets of edits to the server.
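The resend behavior in the last bullet can be sketched as a client that keeps a stack of unacknowledged edits, each tagged with the value of 'n' when its diff was taken, and a server that skips any edit whose version it has already applied. This sketch covers only that edit-stack bookkeeping. The classes `Client` and `Server`, the plain-string "diffs", and the acknowledgement value are hypothetical. The full method also maintains shadows and a server-side version number, which are left out here, and a real server would patch its text rather than append strings to a list.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical message format: (client version n when the diff was taken, diff payload).
Edit = Tuple[int, str]


@dataclass
class Client:
    n: int = 0
    pending: List[Edit] = field(default_factory=list)  # edits not yet acknowledged

    def take_diff(self, diff: str) -> List[Edit]:
        """Queue a new diff, increment n, and return every unacknowledged edit."""
        self.pending.append((self.n, diff))
        self.n += 1
        return list(self.pending)

    def on_ack(self, next_expected: int) -> None:
        """Drop edits the server reports as applied (all versions below next_expected)."""
        self.pending = [(v, d) for v, d in self.pending if v >= next_expected]


@dataclass
class Server:
    expected_n: int = 0  # version of the next edit not yet applied
    applied: List[str] = field(default_factory=list)

    def receive(self, edits: List[Edit]) -> int:
        """Apply edits in order, skipping duplicates; return the acknowledgement."""
        for v, diff in edits:
            if v < self.expected_n:
                continue  # already applied during an earlier delivery
            if v > self.expected_n:
                break  # gap: wait for the missing edit to be resent
            self.applied.append(diff)  # stand-in for patching the server's copy
            self.expected_n += 1
        return self.expected_n


client, server = Client(), Server()
client.take_diff("edit A")          # this send is lost; the connection times out
batch = client.take_diff("edit B")  # the next send carries both edit A and edit B
client.on_ack(server.receive(batch))
server.receive(batch)               # a duplicate delivery changes nothing
print(server.applied, client.pending)
```

Tagging each edit with 'n' is what makes resending safe. The server can recognize an edit it has already applied and drop it, so a lost request or a lost acknowledgement never causes an edit to be applied twice.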