Monday, May 8, 2023

Partition with Boost.Geometry

Partition with Boost.Geometry

 

Speeding up with divide and conquer

Boost.Geometry contains many algorithms in its official interface. Most of them follow the OGC interface specification, such as

  • distance
  • area
  • intersection
  • within

But there are also some other algorithms, which are mainly, or exclusively, used internally. One of them is partition and this blog post is about spatial partitioning. 

Suppose you have 1000 points and 1000 polygons, and you want to calculate how many points are within a polygon.

With Boost.Geometry that is easy: iterate over all points, for each of them iterate over all polygons, call the within algorithm, and if it returns true then increase the count. If polygons overlap, then one point might be in several polygons, but we ignore this for now (it’s trivial to fix).

It is easy but it can perform badly. Because it is a so-called “quadratic loop”. If there are, for example, just 10 elements, it is all fine. But if there are one million points and one million polygons, the within operation will be called one trillion times. Which is way too much.

The Boost.Geometry implementation, internally, often needs to compare such sets. For example, the intersection operation of two polygons starts with intersecting all segments of both polygons with each other. Walking over all segments of big polygons would be too slow.

To speed up this process, partition was implemented. It is similar to a kd-tree or to a quad-tree, but not identical. There is no data structure built, or kept. The space is recursively divided if necessary.

In more detail, it works as following:

  • the bounding box of the two sets are calculated
  • this box is divided into two halves, left and right
  • it takes the left half and tries to execute the operation there
But it is often, of course, still too much. In the figure below, the left half could be handled, maybe, but the right half contains for sure too many points.

It then continues:

  • the left box is divided into two halves, top and bottom
  • it takes the bottom half and tries to execute the operation there

This process goes on and on, dividing each box into two halves, alternating horizontally and vertically. Until there are a small number (16 by default) of points or polygons in the box. It then calls the specified operation, in this case within.

After the left half is completely handled, we get:

And after handling the right side, the very end of the process, we can see:

Some boxes are smaller than other boxes, because there are more points there (on purpose, the points are not distributed equally for this example), and/or more polygons.

The "operation" can be specified at compile time, like the compare algorithm in sorting. So in this case we can to calculate within for point-in-polygon, but in other cases we want to calculate intersections.

So, in summary, partition divides the space into halves, alternating left/right and top/bottom until there are just a few geometry objects, and then these geometries are fed into the specified operation.

Of course: if a box is divided into two halves, there can be polygons on the dividing line. They are in both the left and the right half. But this is recognized, and these polygons are handled separately, calling partition again but then for their bounding box.

If we compare calling the within operation with partition and with the trivial outer/inner loop implementation, we can see that it helps a lot ((all measurements on my Linux laptop, in release mode).
  • 2000 points and 2000 polygons:
    • partition takes about 5 ms 
    • loops version takes about 115 ms
  • 20000 points and 20000 polygons:
    • partition takes about 205 ms 
    • loops version takes about 11150 ms

The partition algorithm, for this example, is called as following. I'm showing the experimental version here, using lambda's (Boost.Geometry is written before lambda's existed - the "normal" function is still using a policy with an 'apply' method).

std::size_t count = 0;
boost::geometry::experimental::partition<box_type>(points, rings,
[](auto& box, auto const& point) { bg::expand(box, point); },
[](auto const& box, auto const& point) { return ! bg::disjoint(point, box); },
[](auto& box, auto const& item) { bg::expand(box, item.box); },
overlaps_ring,
[&count](auto const& p, auto const& ring_item)
{
if (bg::within(p, ring_item.ring))
{
count++;
}
return true;
});

The last lambda is the main operation and is only called for a points and polygons which are in the same partitioned box. bg is a shorthand for Boost.Geometry.
 
There are four other lambda's specified. That is because partition needs to calculate bounding boxes from its input (of which it is unaware, it can be anything, not even geometries but structures containing geometries, for example). Therefore, for the two input sets (points and rings) we need two additional lambda's (expanding a box with a point or ring, and returning true if a box intersects with a point or ring).

Friday, December 30, 2022

Composed Blues


Composed Blues


Exactly 10 years ago I wrote this blog about sets of preludes in major and minor keys. One of these sets is written by Kapustin, which I had recently "discovered" at that time.

I remember that I, back then, googled for music similar to Gershwin and found Kapustin. He was a great composer, having produced not only two sets of these preludes, but also 20 jazzy piano sonatas! In the meantime, unfortunately, he passed away in 2020.

Apart from Gershwin and Kapustin, there are many more composers who compose(d) blues or jazz for piano. Composed blues. Often jazz and blues are improvised. I remember having read an article in which Kapustin said that he preferred composing over improvizing, because it becomes "better". He could improvise very well, but still he preferred composing the pieces, that way getting perfect structure and balance.

After having found Kapustin I also found Erwin Schulhoff and Alexandre Tansman. They did not write exclusively jazz or blues, but there are quite some pieces inspired by jazz, or really jazzy or bluesy. Also we have Copland and Barber writing pieces in that style. Even Ravel wrote a cute blues for piano and violin in his violin sonata.

Recently I created a playlist where I collected 100 composed blues or jazzy piano pieces. Just jazz would have been easy (Kapustin alone wrote already hundreds of them), but I aimed to hear the more bluesy pieces, or else quite smooth and slow jazzy pieces. It is often a gray area of course. I tried to avoid foxtrot, ragtime, golliwog, cakewalk (several great composers created rag, such as Debussy, Stravinsky, Satie).

The resulting playlist contains awesome blueses and georgeous slow jazz pieces from:



The playlist on Spotify, enjoy!



Friday, December 31, 2021

Easy books to learn Russian - 1: Children’s Literature

Introduction

I am a Russian language learner. This year, which ends today, I have started reading Russian literature. Russian literature is famous, phenomenal and very interesting, but much of it is way too difficult for beginners.

Searching for easy literature I encountered that many sites list the classics, Chekhov, Tolstoy, Pushkin, and even though these authors did write some easier works, or works for children - it's still old, it contains many words not used anymore, and often still hard...

What I liked was this advice here on Quora: "However, for a simple, yet good language I would advice you to read children literature of Soviet era (don't worry there is no brainwashing propaganda to turn you into Commie)."

As a sort of poll, I asked many Russian colleagues, teachers and acquaintances for their favourite children’s books. Surprisingly, they often named translations. For example: Pinocchio, The Three Musketeers, Tarzan. Nice, probably, but not my target.

Also my Russian colleagues advised me to read “Dunno on the Moon” - and I tried, around two years ago, but it was then still too difficult. But this year I read it! And I’ve read more books, a bit more than ten. This blog contains my top list.

Soviet Children’s Literature

So I started reading Soviet Children's literature and I liked it! Some books are a bit harder than others, and sometimes I needed translations, but in the end I managed to read and understand these books and to enjoy them! Below is the report.

Apart from Soviet writers, I also read some translations which are well-known in Russia. For example: Karlsson is originally Swedish, but there are two popular Soviet animated films. Most Russian watched them and know it.

This link gives some of the Soviet books I read, and some others. I recently encountered it, it was not my guideline.

Readability Index

I was looking for a tool to measure the level of a book, and found some measures such as the Automated Readability Index. It’s a cool index which can be applied to any text - without even knowing in which language it is written.

This Readability Index (and it’s variants) gave surprising results, such as the fact that “Dunno on the Moon” is harder to read than Dostoevski's Crime and Punishment! Therefore (and because I’m a programmer) I made a variant which does not have that problem. I’ll describe it later in another blog. I called it "Text Readability Index".

Ranked list of recommended books

Based on this figures, and on my own experience, I present a list here a list with ordered recommendations:

Title Author Page count % of top 2500 words Text Readability Index Automated Readability Index
Uncle Fedya, His Dog, and His CatEduard Uspensky6574 %2.703.27
Vitya Maleev at School and at HomeNikolay Nosov16078 %6.465.28
The Adventures of Dunno and his FriendsNikolay Nosov13270 %5.617.12
The Drummer's FateArkady Gaidar10873 %6.285.72
Charlie and the Chocolate FactoryRoald Dahl8769 %6.667.60
Karlsson on the RoofAstrid Lindgren8972 %6.596.62
Dunno in Sun CityNikolay Nosov25571 %9.019.13
Chuk and GekArkady Gaidar3173 %8.176.45
Harry Potter and the Philosopher's StoneJ. K. Rowling33270 %9.118.30
Dunno on the MoonNikolay Nosov44769 %10.2810.08

This list is ordered by the ratings and by my own feeling. I will not describe all the books in detail, links are provided above, and there are many reviews available of these books.

What I can tell is that I've become a fan of Nosov: all these four books are great! Especially Dunno on the Moon is an epic work - and never translated into Dutch. What a miss... Dunno on the Moon was considerably harder to read than Dunno in Sun City, which is also a great book. Self-driving cars, robot vacuum cleaners, it describes current world - but it was written in 1958.

The book of Uspensky is also a must read, it is well known in Russia, one of my colleagues could still recite some citations by heart (mainly from the cat Matrushkin).

Gaidar is sometimes a bit harder to read, but I loved the poetic story of Chuk and Gek (there is a useful English translation too which can be downloaded from archive.org). The Drummer's Fate was a easier to read, cool story! There is a Soviet film based on this book, with the same title.

From the translated work, Karlsson is also cool. It is a trilogy and I read the other two books too - but I think the first one is by far the best. As said above there is a famous Soviet animation film. A must see.

Charlie and the Chocolate Factory is a book I had never read before and I wanted to read it, so why not in Russian... And it was worth it!

And (of course) there is Harry Potter, I actually started with that one, at the beginning of the year, not really knowing Nosov yet, but it was still quite hard. But now, at the end of the year, I can more easily read it. Nice proof of progress!

Wednesday, November 10, 2021

C++20 concepts and traits

Intro

I like writing libraries. In 2007 I refactored the heart of our library using types into a library using templates. This looked a bit like:

struct point { double x, y; };

and then some functions using these types (there was also a polygon, etc).

When I was satisfied with the results I submitted this as a preview to Boost. The general reaction was: interesting, but it is not concept based, and therefore not generic enough.

So I had to make the types concept based, but I didn't know what it was. And I found out that concepts were not yet supported by the language - so how could I use them... That became all clear, with help of reviewers and joiners. The library was refactored again into a concept based library, it was accepted and it is now known as Boost.Geometry.

Now (14 years later) the C++ language has support for concepts.

This blog is targeted to the approach Boost.Geometry uses. It uses traits to adapt normal structs or classes to concepts. Functions in the library use those concepts. 

The example in  this blog

We start again with a point, and a function calculating the distance:

struct point { double x, y; };
template <typename Point>
double distance_using_point(const Point& p1, const Point& p2)
{
  auto sqr = [](const double v) -> double { return v * v; };
  return std::sqrt(sqr(p1.x - p2.x) + sqr(p1.y - p2.y));
}


This function is template based but not generic. It only works for any point type having .x and .y member variables. But suppose it is not a member variable but a function .x(). Or you are using a std::array with two elements [0] and [1] (denoting x, y).

Traits as free functions

We will use traits for that. There are more ways to define and use traits. A first, concise way (not recommended! see below for a better version) is to just define free templated functions:
 
namespace point_traits
{
  template <typename P> double get_x(const P& p);
  template <typename P> double get_y(const P& p);
}

and these functions can then be specialized for the type:
 
namespace point_traits
{
  template<> inline double get_x<point>(const point& p) { return p.x; }
  template<> inline double get_y<point>(const point& p) { return p.y; }
}

And this works! Suppose you are to write a distance function, using these traits, it could look like this:

template <typename P1, typename P2>
double
calculate_distance(const P1& p1, const P2& p2)
{
  auto sqr = [](const double x) -> double { return x * x; };
  return std::sqrt(sqr(point_traits::get_x<P1>(p1) - point_traits::get_x<P2>(p2))
                 + sqr(point_traits::get_y<P1>(p1) - point_traits::get_y<P2>(p2)));
}

If you use a std::array, then you can specialize it too:

namespace point_traits
{
  template<> inline double get_x<std::array<double, 2>>(const std::array<double, 2>& p) { return p[0]; }
  template<> inline double get_y<std::array<double, 2>>(const std::array<double, 2>& p) { return p[1]; }
}
 

And you can calculate the distance of two different types, for example:

  point mp{1, 1};
  std::array<double, 2> sa{2, 2};
  std::cout << calculate_distance(mp, sa) << std::endl;


It writes: 1.41421 as expected.

 

 

Evaluation of traits-as-free-functions 


These traits give the example already a kind of concept based look. And it works, but there are some problems:

  • functions cannot be specialized partially - this can be inconvenient
  • suppose we want floats too, and long doubles, and we want to have that coordinate type defined in a meta-function. Where to define it?
  • suppose we forget a traits adaptation: there is not appearing any compiler error message! Because it knows the generic templated function, and will not complain. It gives a linker error.
  • it is (therefore) not suitable for C++20 concepts 

The linker error, in case the specialization for point is forgotten or not right, will look like:

undefined reference to `double point_traits::get_x<point>(point const&) 
undefined reference to `double point_traits::get_y<point>(point const&) 
 
The linker just can't find these definitions. Compilation was fine. You can live with this, but there is a better alternative.

Concepts 

Now we will try to make a C++ 20 concept using these traits:

template <typename P> 
concept IsPoint = requires (P p) 
{ 
  point_traits::get_x<P>(p);
  point_traits::get_y<P>(p);
};

template <typename P1, typename P2>
requires IsPoint<P1> && IsPoint<P2>
double calculate_distance(const P1& p1, const P2& p2)
{
  // the same implementation
}


So we added, in bright magenta:
  1. a new concept IsPoint. That concept tries to use the functions in namespace point_traits. If they are not there, the type P does not fulfill the concept.
  2. a require clause decorating the distance function, stating that both point types should fulfill the IsPoint concept.

How cool is that!

However, even using this concept, there is the same linker error message. There is no neat compiler message about concepts, which we wished and maybe expected...

 

Traits as a struct 

To fix this, the traits can be modeled as structures. So we adapt the code, we introduce a structure (here replacing the namespace - but that is only for the sake of the sample - these struct traits are often placed in an own namespace). And the structure is short!

template <typename P> struct point_traits {};

So it is an empty structure. The magic happens in the specializations:

template<> struct point_traits<point> 
{
  static double get_x(const point& p) { return p.x; }
  static double get_y(const point& p) { return p.y; }
};

template<> struct point_traits<std::array<double, 2>> 
{
  static double get_x(const std::array<double, 2>& p) { return p[0]; }
  static double get_y(const std::array<double, 2>& p) { return p[1]; }
};

It looks nearly the same, just that now the struct is templatized with point or the std::array, instead of the free functions. And the free functions are now static member functions.

The concept is also similar, it's just the <P> part moving a bit left to the structure:

template <typename P> 
concept IsPoint = requires (P p) 
{ 
  point_traits<P>::get_x(p); 
  point_traits<P>::get_y(p); 
};

And so does the final generic concept-based distance function:

template <typename P1, typename P2>
requires IsPoint<P1> && IsPoint<P2>
double calculate_distance(const P1& p1, const P2& p2)
{
  auto sqr = [](const double x) -> double { return x * x; };
  return std::sqrt(sqr(point_traits<P1>::get_x(p1) - point_traits<P2>::get_x(p2)) 
                 + sqr(point_traits<P1>::get_y(p1) - point_traits<P2>::get_y(p2)));
}

And if we now forget a specialization? We get a neat error message. In clang (13) it reads:

prog.cc:62:16: error: no matching function for call to 'calculate_distance'
  std::cout << calculate_distance(mp, sa) << std::endl;
               ^~~~~~~~~~~~~~~~~~
prog.cc:45:8: note: candidate template ignored: constraints not satisfied [with P1 = point, P2 = std::array<double, 2>]
double calculate_distance(const P1& p1, const P2& p2)
       ^
prog.cc:43:10: note: because 'point' does not satisfy 'IsPoint'
requires IsPoint<P1> && IsPoint<P2>
         ^
prog.cc:35:20: note: because 'point_traits<P>::get_x(p)' would be invalid: no member named 'get_x' in 'point_traits<point>'
  point_traits<P>::get_x(p); 
                   ^
1 error generated. 
 
You can use C++20 and concepts, for example, in Wandbox.

Samples from libraries

Boost.Geometry uses this approach, also generalized for coordinate type and dimension. Here is the adaptation of the get function for std::array (in namespace boost::geometry::traits). It's partially specialized.
 
template <typename CoordinateType, std::size_t DimensionCount, std::size_t Dimension>
struct access<std::array<CoordinateType, DimensionCount>, Dimension>
{
    static inline CoordinateType get(std::array<CoordinateType, DimensionCount> const& a)
    {
        return a[Dimension];
    }
};

MapBox uses a similar approach, here is the adaptation for our point (in namespace mapbox::util):

template <> struct nth<0, point> 
{
  inline static auto get(const point &t) { return t.x; }
};
template <> struct nth<1, point> 
{
  inline static auto get(const point &t) { return t.y; }
};

Summary

The blog above shows, shortly, how to define traits, preferably as structs, and how to define a concept and use this concept in a free function distance.

Wednesday, May 5, 2021

C++20 Concepts

C++20 Concepts


This week (2021) I'm at the C++Now conference. The conference is normally in Aspen, Colorado (I was there more than 10 years ago), but this year it's online, for obvious reasons.

Yesterday there were two excellent interesting talks from Jeff Garland about concepts in C++20, how it works, what you can do with it, how you can use them and how you can write them.

Some compilers (notably clang) don't yet support concepts, but you can already play with toy projects online using https://godbolt.org, where your code is compiled and run on the fly.

So here is my toy project, which I wrote during the talks (there were two consecutive talks):

 

#include <array>
#include <type_traits>
#include <iostream>

template <typename C> concept ValidCoordinateType = std::is_arithmetic_v<C>;
template <int D> concept ValidDimension = D >= 2 and D <= 3;
template <int D> concept HasZ = D >= 3;

template <typename C, int D>
requires(ValidDimension<D> and ValidCoordinateType<C>)
struct mypoint
{
  mypoint() = default;
 
  mypoint(C x, C y) 
    : coors{x, y} {}
 
  // Only available for 3D
  mypoint(C x, C y, C z) requires HasZ<D>
    : coors{x, y, z} {}
 
  auto x() const { return coors[0]; }
  auto y() const { return coors[1]; }

  // Avoids compilation for Dim < 3
  auto z() const requires HasZ<D> { return coors[2]; }
 
private :
   std::array<C, D> coors;
};

struct mytype {};

int main()
{
  mypoint<double, 2> two(1, 2);
  mypoint<float, 3> three(3, 4, 5);
  std::cout << "Hi " << two.x() << " " << two.y() << " " << three.z() << "\n";

  // These declarations will not compile
  //mypoint<mytype, 2> p1;
  //mypoint<double, 4> p2;
 
  // This line will not compile
  //std::cout << two.z(); // Fails because .z() is not available

  return 0;
}

So this code is using C++20 concepts. You can see it clearly at and around the keywords concept and requires. It is really cool, especially compared with what we had to do with C++03 (Boost.Geometry was written in C++03 and only a few releases ago went to C++14) to achieve this.

Some things could earlier be done with static_assert too (such as checking the number of dimensions), but now the compiler neatly warns that that is better written using a concept.

Other things needed SFINAE earlier (such as the enabling  / disabling the functions for the Z coordinate).

If the last declarations in main are uncommented, you get a neat compiler error report (for instance for p2):

<source>: In function 'int main()':
<source>:40:20: error: template constraint failure for 'template<class C, int D> requires (ValidDimension<D>) && (ValidCoordinateType<C>) struct mypoint'
40 | mypoint<double, 4> p2; // Fails because 4 coordinates are not allowed by the concept
| ^
<source>:40:20: note: constraints not satisfied
<source>:6:26: required for the satisfaction of 'ValidDimension<D>' [with D = 4]
<source>:6:56: note: the expression 'D <= 3 [with D = 4]' evaluated to 'false'
6 | template <int D> concept ValidDimension = D >= 2 and D <= 3;
| ~~^~~~

Wow, it's only a few lines! And so clear! In the past we got hundreds of hard to interpret error messages about templates...

And if you uncomment the line streaming z for a 2d coordinate, you get:

<source>: In function 'int main()':
<source>:43:22: error: no matching function for call to 'mypoint<double, 2>::z()'
43 | std::cout << two.z();
| ^
<source>:23:8: note: candidate: 'auto mypoint<C, D>::z() const requires HasZ<D> [with C = double; int D = 2]'
23 | auto z() const requires HasZ<D>
| ^
<source>:23:8: note: constraints not satisfied
<source>: In instantiation of 'auto mypoint<C, D>::z() const requires HasZ<D> [with C = double; int D = 2]':
<source>:43:22: required from here
<source>:7:26: required for the satisfaction of 'HasZ<D>' [with D = 2]
<source>:7:35: note: the expression 'D >= 3 [with D = 2]' evaluated to 'false'
7 | template <int D> concept HasZ = D >= 3;
| ~~^~~~