博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
Cluster by
阅读量:7052 次
发布时间:2019-06-28

本文共 3465 字,大约阅读时间需要 11 分钟。

hot3.png

I recently gave a "mental challenge" in form of writing a clusterBy function that take a list and a function that calculates some result on every element of the list; the result of the clusterBy function should be a list of clusters. A cluster is a list of elements from the original list for which the clustering function returns the same value.

As an example, let's cluster a list of strings using the length of the string. In other words, ("one", "two", "three", "eleven", "twelve") should cluster to (List(three), List(twelve, eleven), List(two, one)).
We create three clusters: one for strings of length 3, one for strings of length 5 and one for strings of length 6. We then add the values to the appropriate cluster.
I had a go at implementing the code in Scala, and here's my first attempt. For historical purposes, and to show different approaches, I will keep returning to this post as I discover better implementations.

def clusterBy[A, B](l: Iterable[A])(f: A => B) =  l.map(e => (f(e), e))./:(new HashMap[B, List[A]])(    (res, cur) => res + ((cur._1, cur._2 :: res.getOrElse(cur._1, Nil)))  ).values

Calling clusterBy(List("one", "two", "three", "eleven", "twelve"))(_.length) returns MapLike(List(three), List(twelve, eleven), List(two, one)). Success! So, what did I do? Let's explore each step of the way.

First, the concepts. We need to go through all elements of the given container and calculate the value of f for the elements. This gives us element -> f(element) container. We then need to group the pairs into a new structure with elements that contain f(element) -> (element1, element2, …, elementn). The result of the entire operation are the values of the grouped by container.
cluster-by-1
Onwards to the code, then.
We start with def clusterBy[A, B](l: Iterable[A])(f: A => B), which defines a function called clusterBy with two generic types A and B--that's the def clusterBy[A, B] bit. The function is curried--it takes two parameter lists. The first parameter list includes just one element, l: Iterable[A]. This is the source data, a container that holds source values of type A. The second parameter list includes just one element, f: A => B, which is a function that takes the element type (A) and maps it to some other type B.
The next line starts with l.map(e => (f(e), e)) which maps the elements of the parameter l into some other container that contains tuples made up of the mapped element and the original element. So, if l were List("x", "xy")l.map(e => (f(e), e)) would return List((1, "x"), (2, "xy")).
Next, we fold left the returned list using empty map (that's the new HashMap[B, List[A]]--our grouped by container). The fold operation (working on the tuples) takes the first element of the tuple as the key in the group by map and the second element of the tuple is the value that should be added to the value associated with the key in the map. The line

(res, cur) => res + ((cur._1, cur._2 :: res.getOrElse(cur._1, Nil)))

receives the res running result of the fold left operation (the map) and cur, which is every (f(element) -> element) pair. The operation returns a map that is constructed by putting (in the old Java speak) a pair where the key is cur._1 and the value is either a new list with just cur._2 in it, or a list constructed by adding cur._2 to the existing list.

Finally, the result of the entire operation are the values of the constructed group by container.

转载于:https://my.oschina.net/u/2935389/blog/1475433

你可能感兴趣的文章
android graphics: 2D animation
查看>>
升级 python 2.6.6 系统到 2.7.10 版本
查看>>
start with connect by prior 递归查询用法
查看>>
OS X 10.11 安装Cocoapods
查看>>
基于 HTML5 的工业互联网 3D 可视化应用
查看>>
洛谷——P1160 队列安排(链表的基础操作)
查看>>
MATLAB测试机器零阈值的大小
查看>>
单元格内文本显示超过单元格宽度的解决办法
查看>>
List遍历以及剔除指定数据
查看>>
[UIKit学习]06.懒加载,模型,自定义代码段,重写构造方法
查看>>
mv 批量
查看>>
require.js的AMD规范详解
查看>>
再转一篇gtest1.6安装
查看>>
sql Truncate 与 delete的区别
查看>>
linux case ${variable} in
查看>>
洛谷3801:红色的幻想乡——题解
查看>>
Hosts文件
查看>>
算法之旅
查看>>
wget的使用方法及一些举例
查看>>
UIEdgeInsetsMake(CGFloat top, CGFloat left, CGFloat bottom, CGFloat right)
查看>>