Scala DSL for Specifying Filter Conditions

When loading data from MapR Database as an Apache Spark RDD, you can use a Scala DSL to specify filter conditions. This section shows examples of these filter conditions.

In the following examples, a class named field represents a field in a condition. field takes a single String argument, the field path; nested fields are addressed with dot notation, such as "a.c.d". The following table shows conditions written using the Scala DSL:

Condition                       Example
equality                        val idOnlyPredicate = field("_id") === "k2"
greaterThan                     val simplePredicateWithComparisonOperator = field("a.c.d") > 10
notexists                       val simpleNotExistsPredicate = field("a.c.e") notexists
in                              val inPredicate = field("a.c.d") in Seq(ODate.parse("2011-05-21"), ODate.parse("2013-02-22"))
typeof                          val simpleTypeOfPredicate = field("a.c.d") typeof "INT"
complex condition with and      val inPredicateWithMapAndArray = (field("a.c.d") in Seq(5, 10)) and
                                  (field("a.c.e") notin Seq("aaa", "bbb"))
another complex condition       val compositePredicateWithAndOnly = ((field("a.b") notexists) and
                                  (field("p.q") typeof "DATE")) and
                                  (field("a.c.d") > 20L)
between                         val predicateWithBetweenOp = field("a.c.d") between
                                  (ODate.parse("2015-01-15"), ODate.parse("2015-05-15"))
equality on a sequence (array)  val eqPredicateWithList = field("a.b") === Seq(12345L, "xyz")
equality on a map               val eqWithMapPredicate = field("a") === Map("k" -> "kite", "m" -> "map")

In these examples, ODate is the OJAI date type, org.ojai.types.ODate.
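The table shows the syntax but not what a condition means when it is applied to a document. The following self-contained sketch is a hypothetical, in-memory model, not the connector's implementation: documents are nested Scala Maps, a condition (Cond) is a Boolean function over a document, and a lookup helper resolves a dotted path such as "a.c.d" one level at a time. The names MiniFilterDsl, Doc, Cond, and lookup are illustrative only, and only ===, >, notexists, and and are modeled.

```scala
// Hypothetical, in-memory model of a few filter conditions.
// NOT the MapR connector's API: documents here are plain nested Maps.
object MiniFilterDsl {
  type Doc = Map[String, Any]

  // A condition is simply a predicate over a document.
  final case class Cond(test: Doc => Boolean) {
    // Both conditions must hold for the same document.
    def and(other: Cond): Cond = Cond(d => test(d) && other.test(d))
  }

  // Resolve a dotted path such as "a.c.d" through nested maps;
  // returns None as soon as a segment is missing or is not a map.
  def lookup(doc: Doc, path: String): Option[Any] =
    path.split('.').foldLeft(Option[Any](doc)) {
      case (Some(m: Map[_, _]), key) => m.asInstanceOf[Map[String, Any]].get(key)
      case _                         => None
    }

  // Mirrors the shape of field("...") from the DSL (hypothetical model).
  final case class field(path: String) {
    def ===(v: Any): Cond = Cond(d => lookup(d, path).contains(v))

    // Numeric comparison; non-numeric or missing values never match.
    def >(n: Long): Cond = Cond { d =>
      lookup(d, path) match {
        case Some(x: Int)  => x > n
        case Some(x: Long) => x > n
        case _             => false
      }
    }

    // Written as a parameterless method so it can be called with a dot.
    def notexists: Cond = Cond(d => lookup(d, path).isEmpty)
  }
}
```

For a document such as Map("a" -> Map("c" -> Map("d" -> 25))), field("a.c.d") > 20L holds and field("a.b").notexists holds, so their and also holds, which is the shape of the "another complex condition" row.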

The MapR Database OJAI Connector for Apache Spark supports these predicates:

  • >
  • >=
  • <
  • <=
  • ===
  • !=
  • between
  • exists
  • notin
  • in
  • notexists
  • typeof
  • nottypeof
  • like
  • notlike
  • matches
  • notmatches
  • sizeOf

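The following examples show each operator. Operators used in postfix position, such as exists and notexists, produce a compiler feature warning (or, depending on the Scala version, an error) unless scala.language.postfixOps is imported: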

  • field("a") > 10
  • field("a") >= 10
  • field("a") < 10
  • field("a") <= 10
  • field("a") === 10
  • field("a") === Seq("aa", 10)
  • field("a") === Map("aa" -> 10)
  • field("a") != 10
  • field("a") != Seq("aa", 10)
  • field("a") != Map("aa" -> 10)
  • field("a") between (10, 20)
  • field("a") exists
  • field("a") notin Seq(10,20)
  • field("a") in Seq(10, 20)
  • field("a") notexists
  • field("a") typeof "INT"
  • field("a") nottypeof "INT"
  • field("a") like "%s"
  • field("a") notlike "%s"
  • field("a") matches ".*s"
  • field("a") notmatches ".*s"
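like and notlike take SQL LIKE-style patterns (as the "%s" example suggests), whereas matches and notmatches take a regular expression. The sketch below illustrates the difference under stated assumptions: % stands for any run of characters, _ for exactly one character, and escape characters are ignored. LikeSketch is a hypothetical helper, and java.util.regex is only a stand-in for the database's own pattern engine.

```scala
import java.util.regex.Pattern

// Hypothetical helper: approximates SQL LIKE semantics with a Java regex.
// Assumes '%' = any sequence (possibly empty), '_' = exactly one character;
// escape characters are NOT handled.
object LikeSketch {
  def likeToRegex(pattern: String): String =
    pattern.map {
      case '%' => ".*"
      case '_' => "."
      case c   => Pattern.quote(c.toString) // every other char is literal
    }.mkString

  // String.matches requires the whole value to match, as LIKE does.
  def like(value: String, pattern: String): Boolean =
    value.matches(likeToRegex(pattern))
}
```

In this model, like("cats", "%s") and "cats".matches(".*s") both accept strings ending in "s"; the LIKE form uses % where the regex form needs .*.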

The right-hand operand of typeof and nottypeof must be one of the following type names:

  • "INT"
  • "INTEGER"
  • "LONG"
  • "BOOLEAN"
  • "STRING"
  • "SHORT"
  • "BYTE"
  • "NULL"
  • "FLOAT"
  • "DOUBLE"
  • "DECIMAL"
  • "DATE"
  • "TIME"
  • "TIMESTAMP"
  • "INTERVAL"
  • "BINARY"
  • "MAP"
  • "ARRAY"
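These names match the constants of OJAI's Value.Type enumeration, with INTEGER presumably accepted as an alias for INT. The hypothetical helper below (TypeNameSketch is not part of the connector) shows which type name plain Scala values would correspond to in a typeof check. DECIMAL, DATE, TIME, TIMESTAMP, INTERVAL, and BINARY are omitted because they rely on Java or OJAI types such as ODate.

```scala
// Hypothetical: the typeof name that a plain Scala value would correspond to.
// Only types with an obvious Scala counterpart are modeled.
object TypeNameSketch {
  def typeName(v: Any): String = v match {
    case null         => "NULL"
    case _: Boolean   => "BOOLEAN"
    case _: String    => "STRING"
    case _: Byte      => "BYTE"
    case _: Short     => "SHORT"
    case _: Int       => "INT"
    case _: Long      => "LONG"
    case _: Float     => "FLOAT"
    case _: Double    => "DOUBLE"
    case _: Map[_, _] => "MAP"
    case _: Seq[_]    => "ARRAY"
    case other        => throw new IllegalArgumentException(s"not modeled: $other")
  }

  // Mirrors field(...) typeof "NAME"; ASSUMPTION: "INTEGER" is an alias for "INT".
  def hasType(v: Any, name: String): Boolean = {
    val normalized = if (name == "INTEGER") "INT" else name
    typeName(v) == normalized
  }
}
```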

sizeOf wraps a field, and its result can be compared with the following operators:

  • sizeOf(field("a")) === 10
  • sizeOf(field("a")) < 10
  • sizeOf(field("a")) > 10
  • sizeOf(field("a")) >= 10
  • sizeOf(field("a")) <= 10
  • sizeOf(field("a")) != 10
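sizeOf applies the comparison to the size of a field's value rather than to the value itself. The following hypothetical sketch (SizeOfSketch is not connector code) assumes that the size of an array is its element count, the size of a map is its number of entries, and the size of a string is its character count; other types are not modeled.

```scala
// Hypothetical model of sizeOf: compute a size, then compare it like a number.
// ASSUMPTIONS: ARRAY = element count, MAP = entry count, STRING = character count.
object SizeOfSketch {
  def sizeOf(v: Any): Option[Int] = v match {
    case s: String    => Some(s.length)
    case m: Map[_, _] => Some(m.size)
    case xs: Seq[_]   => Some(xs.size)
    case _            => None // other types are not modeled here
  }

  // Mirrors sizeOf(field("a")) >= n for a value already looked up from a document.
  def sizeAtLeast(v: Any, n: Int): Boolean = sizeOf(v).exists(_ >= n)
}
```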