Tuesday, 17 July 2018

Postgres 9 and AWS Aurora Postgres Table Partitioning

At the time of writing, the highest version of Postgres available on AWS Aurora is 9.6, which means the declarative partitioning introduced in Postgres 10 is unavailable. Partitioning on Aurora Postgres must be done the manual way. Below is how partitioning can be implemented on Aurora Postgres or Postgres 9.6.
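For contrast, this is roughly what declarative partitioning looks like in Postgres 10 (a sketch only, using the measurement table defined in step 1 below; not available on Aurora 9.6):

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');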

This guide is adapted from the table partitioning section of the Postgres documentation. The approach uses table inheritance and an insert trigger function.

1. Create the base table

First create the table which will be partitioned. This table will hold no data; it only defines the schema.


CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
);

2. Create partition tables

Each partition table inherits from the parent. We also specify CHECK constraints; these are important to ensure that data not matching a partition's range cannot be inserted into that partition. They are also used by the query planner in combination with constraint_exclusion.
-- ================== Partition 1 ============================
CREATE TABLE measurement_y2006m02 (
    CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
) INHERITS (measurement);

-- create index for the partition table
CREATE INDEX measurement_y2006m02_logdate ON measurement_y2006m02 (logdate);

-- ================== Partition 2 ============================
CREATE TABLE measurement_y2006m03 (
  CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )
) INHERITS (measurement);

CREATE INDEX measurement_y2006m03_logdate ON measurement_y2006m03 (logdate);


-- ================== Partition 3 ============================
CREATE TABLE measurement_y2006m04 (
  CHECK ( logdate >= DATE '2006-04-01' AND logdate < DATE '2006-05-01' )
) INHERITS (measurement);

CREATE INDEX measurement_y2006m04_logdate ON measurement_y2006m04 (logdate);

3. Insert routing 

Define the insert routing function on the parent table. This redirects all inserts done on the main table to the appropriate partition table.

CREATE OR REPLACE FUNCTION measurement_insert_trigger()
  RETURNS TRIGGER AS $$
  BEGIN
      -- Insert into partition based on date range, NEW = new record to be inserted
      IF ( NEW.logdate >= DATE '2006-02-01' AND NEW.logdate < DATE '2006-03-01' ) THEN
          INSERT INTO measurement_y2006m02 VALUES (NEW.*);

      ELSIF ( NEW.logdate >= DATE '2006-03-01' AND NEW.logdate < DATE '2006-04-01' ) THEN
          INSERT INTO measurement_y2006m03 VALUES (NEW.*);

      ELSIF ( NEW.logdate >= DATE '2006-04-01' AND NEW.logdate < DATE '2006-05-01' ) THEN
          INSERT INTO measurement_y2006m04 VALUES (NEW.*);

      ELSE
          RAISE EXCEPTION 'Date out of range.  Fix the measurement_insert_trigger() function!';
      END IF;

      RETURN NULL;
  END;
  $$
  LANGUAGE plpgsql;

  -- attach trigger to measurement table before insert
  CREATE TRIGGER measurement_insert_trigger
  BEFORE INSERT ON measurement
  FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();


Usage

Insert a row

INSERT INTO measurement (city_id, logdate, peaktemp, unitsales) VALUES (1, '2006-03-10', 0, 0);


Check the row was inserted into the correct partition:

SELECT * FROM measurement_y2006m03;

Making select queries partition aware

This tells the query planner to use the CHECK constraints defined on each partition, so that it only scans the appropriate partitions.

SET constraint_exclusion = on;

SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-03-01';


Use EXPLAIN to verify the query planner's path

SET constraint_exclusion = on;

EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-03-01';

Removing a partition completely, including its data

 DROP TABLE measurement_y2006m02;


Note: Make sure to update the trigger function to remove the IF branch redirecting inserts to this table.

Removing a partition from the partition set, but keeping access to the data

ALTER TABLE measurement_y2006m02 NO INHERIT measurement;

Note: Make sure to update the trigger function to remove the IF branch redirecting inserts to this table, for example as sketched below.
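Following the example above, after removing measurement_y2006m02 the trigger function would be redefined without its branch (a minimal sketch based on the function defined earlier):

CREATE OR REPLACE FUNCTION measurement_insert_trigger()
  RETURNS TRIGGER AS $$
  BEGIN
      IF ( NEW.logdate >= DATE '2006-03-01' AND NEW.logdate < DATE '2006-04-01' ) THEN
          INSERT INTO measurement_y2006m03 VALUES (NEW.*);
      ELSIF ( NEW.logdate >= DATE '2006-04-01' AND NEW.logdate < DATE '2006-05-01' ) THEN
          INSERT INTO measurement_y2006m04 VALUES (NEW.*);
      ELSE
          RAISE EXCEPTION 'Date out of range.  Fix the measurement_insert_trigger() function!';
      END IF;
      RETURN NULL;
  END;
  $$
  LANGUAGE plpgsql;

Adding a new partition each month follows the same pattern: create the child table with its CHECK constraint and index, then add a matching branch to the trigger function.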

Monday, 9 April 2018

Scala Try With Resource

Java 7 introduced try-with-resources, a handy feature that automatically closes resources such as InputStreams and database connections once you are done with them. Scala has no built-in equivalent; however, the feature can easily be added through function currying.

Below is a definition of a tryWith function

def tryWith[R, T <: AutoCloseable](resource: T)(doWork: T => R): R = {
  try {
    doWork(resource)
  }
  finally {
    try {
      if (resource != null) {
        resource.close()
      }
    }
    catch {
      case e: Exception => throw e
    }
  }
}




tryWith has two parameter lists. The first takes an AutoCloseable resource. The second takes the function that uses the resource. The try/finally block ensures the resource is closed once that function returns.

Below is how the function can be used:

import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.HttpClients

val client = HttpClients.createDefault()   // Apache HttpClient
val request = new HttpGet("http://www.google.com")

val statusCode = tryWith(client.execute(request)){ response =>
  response.getStatusLine.getStatusCode
}

client.execute() returns a closeable response, which will be closed automatically after the status code is returned.
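Since the helper works with any AutoCloseable, it can be used with file streams too. A minimal sketch (the path is hypothetical):

val firstByte = tryWith(new java.io.FileInputStream("/tmp/example.txt")){ in =>
  in.read()  // the stream is closed automatically when this block returns
}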


Wednesday, 4 April 2018

Testing database queries with embedded Postgres

In many database-driven applications it can be useful to automate the testing of SQL against a real database. However, setting up and maintaining a database on a build server isn't ideal. Databases such as Postgres and MySQL provide embeddable variants which can be used for testing purposes: a database instance can be created for testing and then removed.

The example below shows how this can be done with Postgres within a Spring Boot application. This provides a clean state for each test run. The database state would usually be set up using a DB migration library such as Flyway; however, that is out of scope for this example (see the Spring Boot documentation for more on Flyway integration).

Maven Dependency

<dependency>
   <groupId>com.opentable.components</groupId>
   <artifactId>otj-pg-embedded</artifactId>
   <version>0.11.4</version>
   <scope>test</scope>
 </dependency>
OpenTable provides a great library for embedding Postgres (otj-pg-embedded).

Abstract Integration Test

@RunWith(SpringRunner.class)
@SpringBootTest(classes = Initialiser.class)
@TestPropertySource("classpath:application-test.properties")
public class AbstractIntegrationTest {

  private static EmbeddedPostgres embeddedPostgres;

  @BeforeClass
  public static void initialise() throws Exception{
    if(embeddedPostgres == null) {
      //Create an instance of embedded Postgres
      embeddedPostgres = EmbeddedPostgres.builder()
                                         .setPort(5433).start();

      try (Connection conn = embeddedPostgres.getPostgresDatabase().getConnection()) {
        Statement statement = conn.createStatement();
        statement.execute("CREATE DATABASE EXAMPLEDB");
      }
    }
  }
}
All tests interacting with the database extend this class. This ensures that a single Postgres instance is created per JVM, on port 5433. We then create a database named EXAMPLEDB.


Application Properties

spring.datasource.driver-class-name=org.postgresql.Driver
spring.datasource.url=jdbc:postgresql://localhost:5433/EXAMPLEDB
spring.datasource.username=postgres
spring.datasource.password=password
Since we are using Spring Boot, we put the connection string in application-test.properties.

Junit DB Test

public class SomeDaoTest extends AbstractIntegrationTest {  

   @Autowired DataSource dataSource;
   
   @Test 
   public void exampleTest(){
     try(Connection conn = dataSource.getConnection()){
       Statement statement = conn.createStatement();
       statement.executeQuery("SELECT * FROM EXAMPLE_TABLE");
     }
     catch (Exception ex){
       throw new IllegalStateException(ex);
     }
   }

}

Now we can run tests against the database like the one above. This particular test should fail, since EXAMPLE_TABLE does not exist yet.
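To make such a test pass, the table has to exist first. Below is a minimal sketch (the table name and columns are illustrative assumptions) that creates it through the injected DataSource before querying; in a real project the schema would normally come from a migration tool such as Flyway, as mentioned above.

   @Test
   public void exampleTest(){
     try(Connection conn = dataSource.getConnection()){
       Statement statement = conn.createStatement();
       // create the table first (columns are assumptions for illustration)
       statement.execute("CREATE TABLE IF NOT EXISTS EXAMPLE_TABLE (ID SERIAL PRIMARY KEY, NAME VARCHAR(100))");
       statement.executeQuery("SELECT * FROM EXAMPLE_TABLE");
     }
     catch (Exception ex){
       throw new IllegalStateException(ex);
     }
   }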

Thursday, 22 March 2018

MyBatis

Modern Java applications generally use some sort of ORM to talk to the database. In an ORM each table in the database is mapped to a Java class (a POJO). Most queries are done through the ORM's API or an ORM-specific dialect similar to SQL.

MyBatis provides an alternative approach to working with the database. Instead of mapping each table to a Java class, MyBatis maps the results of queries to Java classes (POJOs). This allows queries to be written in native SQL, allowing for complex queries that leverage the database. Only results required by the application need to be mapped to Java classes, and most mapping can be handled automatically, so mapping code is significantly reduced.

Below is an example of how this is done in MyBatis. The complete code can be checked out from here:

https://github.com/rizvn/my-batis-tutorial

This example uses the DAO pattern. A DAO is a class used to interact with the database.

1. Create the DAO interface

package com.rizvn.app.dao;
import com.rizvn.app.domain.Customer;
import java.util.List;

/**
 * Created by Riz
 */
public interface CustomerDao
{

  Customer findByRef(String ref);

  List<Customer> selectTop3();
}

The interface has two methods: one to get a customer by ref, and one to list the top 3 customers.

2. Create the domain object

This will be a plain Java object (POJO) with getters and setters. It represents the result of a query. MyBatis maps a result row to an instance of the class where the result set column names match the property names of the class. There are other mapping approaches supported by MyBatis, such as returning a hashmap or writing result-map sections, but those are out of scope for this example (see the MyBatis documentation for more).

Contents of Customer.java

public class Customer
{
  String ref;
  String name;
  String addressLine1;
  String town;
  String county;
  String postcode;
  Boolean active;
// getters and setters removed for brevity
}


3. Write SQL

Write queries for each method in the DAO. The file name should match the name of the DAO: since the DAO is named CustomerDao.java, the XML file will be named CustomerDao.xml.

The XML file must be placed under the same package structure as the DAO. Since this project uses Maven, non-Java classpath files are placed under src/main/resources, so the paths would be as follows:

  dao class location:  src/main/java/com/rizvn/app/dao/CustomerDao.java
  query file location: src/main/resources/com/rizvn/app/dao/CustomerDao.xml


Below are  the contents of CustomerDao.xml

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="com.rizvn.app.dao.CustomerDao">

  <select id="findByRef" resultType="com.rizvn.app.domain.Customer">
    SELECT
      ref,
      name,
      addressLine1,
      town,
      county,
      postcode,
      active
    FROM Customer
    WHERE ref = #{ref}
    AND active = TRUE
  </select>

  <select id="selectTop3" resultType="com.rizvn.app.domain.Customer">
    SELECT
      ref,
      name,
      addressLine1,
      town,
      county,
      postcode,
      active
    FROM Customer
    WHERE active = TRUE
    LIMIT 3
  </select>

</mapper>

Each method in the interface is mapped to a query by id, and the result type is also defined.
The findByRef method in CustomerDao has the following signature:

Customer findByRef(String ref);


The argument is accessible in the query as #{ref}.
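For DAO methods with more than one argument, MyBatis's @Param annotation can be used to name each parameter so it can be referenced in the SQL. A sketch (not part of the example project):

List<Customer> findByTown(@Param("town") String town, @Param("active") Boolean active);

  <select id="findByTown" resultType="com.rizvn.app.domain.Customer">
    SELECT ref, name, addressLine1, town, county, postcode, active
    FROM Customer
    WHERE town = #{town}
    AND active = #{active}
  </select>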

Test

The test class com.rizvn.app.dao.CustomerDaoTest shows examples of using MyBatis with this set-up.
@Test
public void getCustomerTest(){
  try(SqlSession session = conf.getSqlSessionFactory().openSession())
  {
    CustomerDao customerDao = session.getMapper(CustomerDao.class);      // get Dao class
    Customer customer = customerDao.findByRef("001");                    // call mapped query passing arguments
    System.out.println(customer);
    Assert.assertNotNull(customer);
  }
}

conf is the config class, which sets up MyBatis; see here: https://github.com/rizvn/my-batis-tutorial/blob/master/src/main/java/com/rizvn/app/Config.java
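A minimal sketch of what such a config class might look like, building the SqlSessionFactory programmatically (the JDBC details are placeholders and the class may differ from the repo's Config.java):

import org.apache.ibatis.datasource.pooled.PooledDataSource;
import org.apache.ibatis.mapping.Environment;
import org.apache.ibatis.session.Configuration;
import org.apache.ibatis.session.SqlSessionFactory;
import org.apache.ibatis.session.SqlSessionFactoryBuilder;
import org.apache.ibatis.transaction.jdbc.JdbcTransactionFactory;
import com.rizvn.app.dao.CustomerDao;

public class Config {
  private SqlSessionFactory sqlSessionFactory;

  public SqlSessionFactory getSqlSessionFactory() {
    if (sqlSessionFactory == null) {
      // JDBC connection details are placeholders for illustration
      PooledDataSource dataSource =
          new PooledDataSource("org.h2.Driver", "jdbc:h2:mem:test", "sa", "");
      Environment environment = new Environment("dev", new JdbcTransactionFactory(), dataSource);
      Configuration configuration = new Configuration(environment);
      // registers CustomerDao and picks up CustomerDao.xml from the same package
      configuration.addMapper(CustomerDao.class);
      sqlSessionFactory = new SqlSessionFactoryBuilder().build(configuration);
    }
    return sqlSessionFactory;
  }
}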




Friday, 23 February 2018

Scala + HTML

Scala's support for inline XML can be used to render HTML, making the template languages traditionally used within web applications redundant. This approach is used by the Apache Spark Web UI and is similar to the approach taken by modern web frameworks such as React.

This also means that the views are type safe: if person.firstName were removed, the view would fail to compile, so errors are caught sooner.

The code below demonstrates the approach:
    //define a class
    case class Person(firstName: String, lastName: String)

    //create a list of people, this will be the model
    val people = List(
                    Person("Riz", "Fujas"),
                    Person("John", "Smith"),
                    Person("Fred", "Clause")
                )

    val view  =
      <div class="container">
        <table>
          <tr>
            <th>First Name</th>
            <th>Second Name</th>
          </tr>

          {people.map{ person =>
            <tr>
              <td>{person.firstName}</td> 
              <td>{person.lastName}</td>
            </tr>
          }}
        </table>
      </div>


    println(view)


The println(view) will produce the following output:

<div class="container">
  <table>
    <tr>
       <th>First Name</th>
        <th>Second Name</th>
    </tr>
<tr> <td>Riz</td> <td>Fujas</td> </tr> <tr> <td>John</td> <td>Smith</td> </tr> <tr> <td>Fred</td> <td>Clause</td> </tr> </table> </div>






Thursday, 18 January 2018

Datalake vs Data Warehouse

This post outlines the uses of data lakes and data warehouses.



Data lakes are generally built on databases which can store unstructured or semi-structured data. There isn't a strict schema, unlike a data warehouse built on top of an RDBMS.
This allows additional fields to be added to tables (or documents) as new fields become available. For example:
we may have 1 million customers already stored and now need to import another million that carry a new field, such as credit rating. This can be done in a data lake without having to change the previous customers or add null credit ratings to historical rows.

The data lake acts as a buffer between live application databases and analytics. Data from live databases is periodically exported and placed into the data lake, and analytics teams then work only with the data held there.

A data lake is intermediate storage for analytics: the original data imported into it is generally not readily analysable. The data goes through a series of transformations to bring it into a form that can be analysed. This may take the form of ETL jobs or stream transformations that convert the semi-structured data into a warehouse schema. The warehouse schema may be held in an RDBMS, allowing it to be queried using SQL by business professionals. Transforming data-lake data into structured data requires scripting, not just querying; this is typically done with languages such as Python or Scala, hosted on platforms such as Spark or Hadoop, which take the semi-structured data and shape it into a schema. Once the data is transformed it can be queried for reporting purposes. A sketch of such a transformation is shown below.
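A minimal sketch of such a transformation as a Spark job in Scala (the paths, column names, and warehouse table are assumptions for illustration):

import org.apache.spark.sql.SparkSession

object CustomerEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("customer-etl").getOrCreate()

    // read semi-structured customer exports from the data lake (path is an assumption)
    val customers = spark.read.json("s3://datalake/customers/")

    // shape into the warehouse schema; creditRating may be null for older records
    val shaped = customers.select("ref", "name", "postcode", "creditRating")

    // write into the warehouse table (JDBC details are assumptions)
    shaped.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://warehouse:5432/dw")
      .option("dbtable", "dim_customer")
      .option("user", "etl")
      .option("password", "secret")
      .mode("append")
      .save()

    spark.stop()
  }
}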

Below is a typical data pipeline describing where a data lake would sit:

  application databases -> periodic export -> data lake -> ETL / stream transformations (Spark, Hadoop) -> data warehouse -> SQL reporting and BI


Tuesday, 16 January 2018

Spring Boot + Swagger + Pretty Swag

This post describes how to build a REST API with Spring Boot and Swagger. The tutorial adds REST API functionality to an existing Spring Boot application.

1. Add the following maven dependencies

<dependency>
  <groupId>io.springfox</groupId>
  <artifactId>springfox-swagger2</artifactId>
  <version>2.6.1</version>
</dependency>

<!-- optional to produce swagger ui -->
<dependency>
  <groupId>io.springfox</groupId>
  <artifactId>springfox-swagger-ui</artifactId>
  <version>2.6.1</version>
</dependency>

2. Create swagger configuration class

@Configuration
@EnableSwagger2
public class SwaggerConf {
  @Bean public Docket api() {
    ApiInfo apiInfo = new ApiInfo("My Store","Api for the store backend", "1.0", "", new Contact("Riz", "http://rizvn.com", "me@example.com"),"", "");

    return new Docket(DocumentationType.SWAGGER_2)
    .select()
    .apis(RequestHandlerSelectors.any())
    .paths(Predicates.or(
      PathSelectors.regex("/store.*"),    //expose paths starting with /store
      PathSelectors.regex("/health.*")  //expose paths staring with  /health
    )) 
    .build()
    .apiInfo(apiInfo);
  }
}

3. Create a rest controller

@RestController(value ="Rest Api" )
@io.swagger.annotations.Api(value = "/store", description = "Store Rest Api Operations")
@RequestMapping("/store")
public class RestApi
{

  @Resource ProductService productService;
  @Resource OrderService orderService;

  @ApiOperation(value = "Find a product by id")
  @RequestMapping(path = "/product/{id}", method = RequestMethod.GET)
  public Product getProduct(@PathVariable("id") String aId)
  {
    return productService.read(aId);
  }


  @ApiOperation(value = "Find a order by id")
  @RequestMapping(path = "/order/{id}", method = RequestMethod.GET)
  public Order getOrder(@PathVariable("id") String aId)
  {
    return orderService.read(aId); 
  }
}

Both methods return Java objects. Spring Boot will automatically handle the conversion to JSON through Jackson.

On startup the API's JSON schema will be available under /v2/api-docs, for example http://localhost:7070/v2/api-docs (the port number will vary per app).

An interactive Swagger UI will be available at /swagger-ui.html, for example http://localhost:7070/swagger-ui.html. Swagger UI is an interactive single-page application that lets you test your API from the browser (again, the port number will vary per app).


Offline Documentation

It's great to have online documentation of a REST endpoint. However, during development we may need to share our API with other developers so they can build applications against it before the application is finished. The API is a contract, stating which functions the application supports.
Pretty Swag is a nice utility for producing an offline version of the REST API documentation. See: https://www.npmjs.com/package/pretty-swag

To produce an offline version of the application's API documentation, first start the application, then:

1. Install Node.js on your machine if not already installed. 


2. Install pretty-swag

npm install pretty-swag -g
The -g flag installs the package globally.

3. Generate offline documentation

pretty-swag -i http://localhost:7070/v2/api-docs -f offline

This will produce a doc.html file and a doc_files directory, which can be distributed independently. http://localhost:7070/v2/api-docs points to the running instance of our development application.