Data structure for storing thousands of vectors

I have up to 10,000 randomly positioned points in a space, and I need to be able to tell which one the cursor is closest to at any given time. To add some context, the points are in the form of a vector drawing, so they can be constantly and rapidly added and removed by the user, and can also potentially be unbalanced across the canvas space.

I am therefore trying to find the most efficient data structure for storing and querying these points. I would like to keep this question language agnostic if possible.

10
asked by Tom, 17 December 2009 at 12:56

7 answers

After the Update to the Question

  1. Use two maps backed by a red-black tree or a skip list. Both are self-balancing data structures that give you O(log n) time for search, insert and delete operations. One map uses the X coordinate of every point as the key and the point itself as the value; the other uses the Y coordinate as the key and the point itself as the value.

  2. As a trade-off, I suggest initially restricting the search area around the cursor to a square. For a perfect match the side of the square should equal the diameter of your "sensitivity circle" around the cursor. I.e. if you're interested only in the nearest neighbour within a 10-pixel radius of the cursor, then the side of the square needs to be 20px. Alternatively, if you're after the nearest neighbour regardless of proximity, you might try finding the boundary dynamically by evaluating the floor and ceiling relative to the cursor.

  3. Then retrieve from the two maps the two subsets of points that are within the boundaries, and intersect them to keep only the points present in both subsets.

  4. Loop through the result, calculate proximity to each point (dx^2+dy^2, avoid square root since you're not interested in the actual distance, just proximity), find the nearest neighbour.

  5. Take the square root of the proximity figure to measure the distance to the nearest neighbour and see if it's greater than the radius of the "sensitivity circle"; if it is, there are no points within the circle.

  6. I suggest benchmarking every approach; it's too easy to go over the top with optimisations. On my modest hardware (Core 2 Duo), a naïve single-threaded nearest-neighbour search over 10K points, repeated a thousand times, takes 350 milliseconds in Java. As long as the overall UI reaction time is under 100 milliseconds it will seem instant to the user; with that in mind, even the naïve search might give you a sufficiently fast response.
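
Step 2 mentions finding the boundary dynamically instead of using a fixed square. One possible reading of that idea, sketched here as an assumption rather than as the method the answer had in mind, is to walk outward from the cursor's X coordinate in a sorted map and stop as soon as the horizontal gap alone exceeds the best distance found so far. `XSweepIndex` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: exact nearest neighbour from a single X-sorted map.
class XSweepIndex {
    // X coordinate -> every point sharing that X; each point is {x, y}.
    private final TreeMap<Integer, List<int[]>> byX = new TreeMap<>();

    void add(int x, int y) {
        byX.computeIfAbsent(x, k -> new ArrayList<>()).add(new int[] {x, y});
    }

    // Removes every stored point at (x, y); returns true if anything was removed.
    boolean remove(int x, int y) {
        List<int[]> column = byX.get(x);
        if (column == null) return false;
        boolean removed = column.removeIf(p -> p[1] == y);
        if (column.isEmpty()) byX.remove(x);
        return removed;
    }

    // Returns the nearest point as {x, y}, or null if the index is empty.
    int[] nearest(int cx, int cy) {
        // Walk left from the cursor (floor side), then right (ceiling side).
        int[] best = sweep(byX.headMap(cx, true).descendingMap().entrySet(), cx, cy, null);
        return sweep(byX.tailMap(cx, false).entrySet(), cx, cy, best);
    }

    private static int[] sweep(Iterable<Map.Entry<Integer, List<int[]>>> columns,
                               int cx, int cy, int[] best) {
        for (Map.Entry<Integer, List<int[]>> column : columns) {
            long dx = column.getKey() - cx;
            // Columns are visited in order of growing |dx|, so once the
            // horizontal gap alone exceeds the best distance we can stop.
            if (best != null && dx * dx > dist2(best, cx, cy)) break;
            for (int[] p : column.getValue()) {
                if (best == null || dist2(p, cx, cy) < dist2(best, cx, cy)) best = p;
            }
        }
        return best;
    }

    private static long dist2(int[] p, int cx, int cy) {
        long dx = p[0] - cx, dy = p[1] - cy;
        return dx * dx + dy * dy;
    }
}
```

The sweep needs only one index and no fixed radius, but it can still degrade to a linear scan when many points share nearly the same X coordinate, so it is worth benchmarking against the two-map approach.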

Generic Solution

The most efficient data structure depends on the algorithm you’re planning to use, time-space trade off and the expected relative distribution of points:

  • If space is not an issue the most efficient way may be to pre-calculate the nearest neighbour for each point on the screen and then store nearest neighbour unique id in a two-dimensional array representing the screen.
  • If time is not an issue, storing the 10K points in a simple array and doing a naïve search every time, i.e. looping through each point and calculating the distance, may be a good, simple and easy-to-maintain option.
  • For a number of trade-offs between the two, here is a good presentation on various Nearest Neighbour Search options available: http://dimacs.rutgers.edu/Workshops/MiningTutorial/pindyk-slides.ppt
  • A bunch of good detailed materials for various Nearest Neighbour Search algorithms: http://simsearch.yury.name/tutorial.html, just pick one that suits your needs best.

So it's really impossible to evaluate the data structure in isolation from the algorithm, which in turn is hard to evaluate without a good idea of the task's constraints and priorities.
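
The first option above, a precomputed table with one entry per screen position, can be sketched as follows. This is a minimal illustration that fills the table by brute force; `NearestLookupTable` and its methods are hypothetical names, and points are assumed to be given as `{x, y}` pairs.

```java
// Hypothetical sketch: trade memory for constant-time lookups by storing,
// for every pixel, the index of the nearest point.
class NearestLookupTable {
    // nearestId[y][x] = index into the points array, or -1 if there are no points.
    private final int[][] nearestId;

    NearestLookupTable(int width, int height, int[][] points) {
        nearestId = new int[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                nearestId[y][x] = bruteForce(x, y, points);
            }
        }
    }

    // Naive scan over all points, comparing squared distances.
    private static int bruteForce(int x, int y, int[][] points) {
        int best = -1;
        long bestDist = Long.MAX_VALUE;
        for (int i = 0; i < points.length; i++) {
            long dx = points[i][0] - x, dy = points[i][1] - y;
            long d = dx * dx + dy * dy;
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    // Constant-time query; (x, y) must lie inside the table.
    int nearestTo(int x, int y) {
        return nearestId[y][x];
    }
}
```

For the 4096x2048 canvas used in the benchmark, such a table would hold about 8.4 million entries, and filling it this way costs one full scan of the points per pixel, so it only pays off when the points change rarely.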

Sample Java Implementation

import java.util.*;
import java.util.concurrent.ConcurrentSkipListMap;

class Test
{

  public static void main (String[] args)
  {

      Drawing naive = new NaiveDrawing();
      Drawing skip  = new SkipListDrawing();

      long start;

      start = System.currentTimeMillis();
      testInsert(naive);
      System.out.println("Naive insert: "+(System.currentTimeMillis() - start)+"ms");
      start = System.currentTimeMillis();
      testSearch(naive);
      System.out.println("Naive search: "+(System.currentTimeMillis() - start)+"ms");


      start = System.currentTimeMillis();
      testInsert(skip);
      System.out.println("Skip List insert: "+(System.currentTimeMillis() - start)+"ms");
      start = System.currentTimeMillis();
      testSearch(skip);
      System.out.println("Skip List search: "+(System.currentTimeMillis() - start)+"ms");

  }

  public static void testInsert(Drawing d)
  {
      Random r = new Random();
      for (int i=0;i<100000;i++)
            d.addPoint(new Point(r.nextInt(4096),r.nextInt(2048)));
  }

  public static void testSearch(Drawing d)
  {
      Point cursor;
      Random r = new Random();
      for (int i=0;i<1000;i++)
      {
          cursor = new Point(r.nextInt(4096),r.nextInt(2048));
          d.getNearestFrom(cursor,10);
      }
  }


}

// A simple point class
class Point
{
    public Point (int x, int y)
    {
        this.x = x;
        this.y = y;
    }
    public final int x,y;

    public String toString()
    {
        return "["+x+","+y+"]";
    }
}

// Interface will make the benchmarking easier
interface Drawing
{
    void addPoint (Point p);
    Set<Point> getNearestFrom (Point source,int radius);

}


class SkipListDrawing implements Drawing
{

    // Helper class to store an index of point by a single coordinate
    // Unlike standard Map it's capable of storing several points against the same coordinate, i.e.
    // [10,15] [10,40] [10,49] all can be stored against X-coordinate and retrieved later
    // This is achieved by storing a list of points against the key, as opposed to storing just a point.
    private class Index
    {
        final private NavigableMap<Integer,List<Point>> index = new ConcurrentSkipListMap <Integer,List<Point>> ();

        void add (Point p,int indexKey)
        {
            List<Point> list = index.get(indexKey);
            if (list==null)
            {
                list = new ArrayList<Point>();
                index.put(indexKey,list);
            }
            list.add(p);
        }

        HashSet<Point> get (int fromKey,int toKey)
        {
            final HashSet<Point> result = new HashSet<Point> ();

            // Use NavigableMap.subMap to quickly retrieve all entries matching
            // search boundaries, then flatten resulting lists of points into
            // a single HashSet of points.
            for (List<Point> s: index.subMap(fromKey,true,toKey,true).values())
                for (Point p: s)
                 result.add(p);

            return result;
        }

    }

    // Store each point indexed by its X and Y coordinates in two separate indices
    final private Index xIndex = new Index();
    final private Index yIndex = new Index();

    public void addPoint (Point p)
    {
        xIndex.add(p,p.x);
        yIndex.add(p,p.y);
    }


    public Set<Point> getNearestFrom (Point origin,int radius)
    {


          final Set<Point> searchSpace;
          // search space is going to contain only the points that are within
          // "sensitivity square". First get all points where X coordinate
          // is within the given range.
          searchSpace = xIndex.get(origin.x-radius,origin.x+radius);

          // Then get all points where Y is within the range, and store
          // within searchSpace the intersection of two sets, i.e. only
          // points where both X and Y are within the range.
          searchSpace.retainAll(yIndex.get(origin.y-radius,origin.y+radius));


          // Loop through search space, calculate proximity to each point
          // Don't take the square root as it's expensive and really unnecessary
          // at this stage.
          //
          // Keep track of nearest points list if there are several
          // at the same distance.
          int dist,dx,dy, minDist = Integer.MAX_VALUE;

          Set<Point> nearest = new HashSet<Point>();

          for (Point p: searchSpace)
          {
             dx=p.x-origin.x;
             dy=p.y-origin.y;
             dist=dx*dx+dy*dy;

             if (dist<minDist)
             {
                   minDist=dist;
                   nearest.clear();
                   nearest.add(p);
             }
             else if (dist==minDist)
             {
                 nearest.add(p);
             }


          }

          // Now we have the set of nearest points; it might be empty.
          // Check whether they are still beyond the sensitivity radius:
          // the search area we evaluated was a square with a side equal to
          // the diameter of the actual circle. If the points we've found are
          // in the corners of the square they might be outside the circle.
          // If the distance is greater than the radius, then there is
          // no point within the proximity boundary.
          if (Math.sqrt(minDist) > radius) nearest.clear();
          return nearest;
   }
}

// Naive approach: just loop through every point and see if it's nearest.
class NaiveDrawing implements Drawing
{
    final private List<Point> points = new ArrayList<Point> ();

    public void addPoint (Point p)
    {
        points.add(p);
    }

    public Set<Point> getNearestFrom (Point origin,int radius)
    {

          int prevDist = Integer.MAX_VALUE;
          int dist;

          Set<Point> nearest = Collections.emptySet();

          for (Point p: points)
          {
             int dx = p.x-origin.x;
             int dy = p.y-origin.y;

             dist =  dx * dx + dy * dy;
             if (dist < prevDist)
             {
                   prevDist = dist;
                   nearest  = new HashSet<Point>();
                   nearest.add(p);
             }
             else if (dist==prevDist) nearest.add(p);

          }

          if (Math.sqrt(prevDist) > radius) nearest = Collections.emptySet();

          return nearest;
   }
}
6
answered 3 December 2019 at 19:33

I would like to suggest creating a Voronoi diagram and a trapezoidal map (basically the same answer as I gave to this question). The Voronoi diagram partitions the space into polygons. Every point has a polygon describing all locations that are closest to it. When you get a query point, you need to find the polygon in which it lies. This problem is called point location and can be solved by constructing a trapezoidal map.

The Voronoi Diagram can be created using Fortune's algorithm which takes O(n log n) computational steps and costs O(n) space. This website shows you how to make a trapezoidal map and how to query it. You can also find some bounds there:

  • Expected creation time: O(n log n)
  • Expected space complexity: O(n)
  • But most importantly, expected query time: O(log n).
    (This is (theoretically) better than O(√n) of the kD-tree.)
  • Updating will be linear (O(n)) I think.

My source (other than the links above) is: Computational Geometry: Algorithms and Applications, chapters six and seven.

There you will find detailed information about the two data structures (including detailed proofs). The Google books version only has a part of what you need, but the other links should be sufficient for your purpose. Just buy the book if you are interested in that sort of thing (it's a good book).

6
answered 3 December 2019 at 19:33

It depends on the frequency of updates and queries. For fast queries and slow updates, a quadtree (a 2-D space-partitioning tree closely related to the kd-tree) would probably be best. Quadtrees also handle non-uniformly distributed points well.

If you have a low resolution you could consider using a raw array of width x height of pre-computed values.

If you have very few points or need fast updates, a simple array is enough, or maybe a simple partitioning (which is a step toward the quadtree).

So the answer depends on the parameters of your dynamics. I would also add that nowadays the algorithm isn't everything; making it use multiple processors or CUDA can give a huge boost.

1
answered 3 December 2019 at 19:33

The most efficient data structure would be a kd-tree.
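
The answer gives no detail, so the following is only a minimal sketch of what a 2-D kd-tree with nearest-neighbour search could look like. It assumes the tree is built once from a fixed array of `{x, y}` points; `KdTree2D` and its methods are hypothetical names. A plain kd-tree like this does not rebalance itself, so frequent inserts and removals would need rebuilds or a more elaborate variant.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of a static 2-D kd-tree with nearest-neighbour search.
class KdTree2D {
    private static final class Node {
        final int[] point; // {x, y}
        Node left, right;
        Node(int[] point) { this.point = point; }
    }

    private final Node root;
    private int[] best;     // best candidate during the current query
    private long bestDist;  // its squared distance to the query

    KdTree2D(int[][] points) {
        // Clone the outer array so the caller's ordering is left untouched.
        root = build(points.clone(), 0, points.length, 0);
    }

    // Split on the median, alternating between x (axis 0) and y (axis 1).
    private static Node build(int[][] pts, int from, int to, int depth) {
        if (from >= to) return null;
        final int axis = depth % 2;
        Arrays.sort(pts, from, to, Comparator.comparingInt((int[] p) -> p[axis]));
        int mid = (from + to) >>> 1;
        Node node = new Node(pts[mid]);
        node.left = build(pts, from, mid, depth + 1);
        node.right = build(pts, mid + 1, to, depth + 1);
        return node;
    }

    // Returns the nearest point as {x, y}, or null if the tree is empty.
    int[] nearest(int x, int y) {
        best = null;
        bestDist = Long.MAX_VALUE;
        search(root, x, y, 0);
        return best;
    }

    private void search(Node node, int x, int y, int depth) {
        if (node == null) return;
        long dx = node.point[0] - x, dy = node.point[1] - y;
        long d = dx * dx + dy * dy;
        if (d < bestDist) {
            bestDist = d;
            best = node.point;
        }
        // Signed distance from the query to this node's splitting line.
        long diff = (depth % 2 == 0) ? x - node.point[0] : y - node.point[1];
        Node near = diff < 0 ? node.left : node.right;
        Node far = diff < 0 ? node.right : node.left;
        search(near, x, y, depth + 1);
        // The far side can only help if the splitting line is closer than the best hit.
        if (diff * diff < bestDist) search(far, x, y, depth + 1);
    }
}
```

The query keeps its state in instance fields, so this sketch is not safe to call from several threads at once.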

5
answered 3 December 2019 at 19:33

Are the points uniformly distributed?

You could build a quadtree up to a certain depth, say 8. At the top you have a tree node that divides the screen into four quadrants. Store in each node:

  • The top-left and bottom-right coordinates
  • Pointers to four child nodes, which divide the node into four quadrants

Build the tree up to a depth of, say, 8, and in the leaf nodes store a list of the points associated with that region. You can search that list linearly.

If you need finer granularity, build the quadtree to a greater depth.
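
A fixed-depth quadtree along these lines might look like the sketch below. It assumes integer coordinates that lie inside the root rectangle and a radius-limited query, since the nearest point may sit in a neighbouring leaf rather than in the cursor's own leaf; `FixedDepthQuadtree` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a fixed-depth quadtree whose leaves hold point lists.
class FixedDepthQuadtree {
    private final int x0, y0, x1, y1;            // covers [x0, x1) x [y0, y1)
    private final FixedDepthQuadtree[] children; // null for leaves
    private final List<int[]> points;            // null for inner nodes

    FixedDepthQuadtree(int x0, int y0, int x1, int y1, int depth) {
        this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
        if (depth == 0) {
            children = null;
            points = new ArrayList<>();
        } else {
            int mx = (x0 + x1) / 2, my = (y0 + y1) / 2;
            children = new FixedDepthQuadtree[] {
                new FixedDepthQuadtree(x0, y0, mx, my, depth - 1), // top-left
                new FixedDepthQuadtree(mx, y0, x1, my, depth - 1), // top-right
                new FixedDepthQuadtree(x0, my, mx, y1, depth - 1), // bottom-left
                new FixedDepthQuadtree(mx, my, x1, y1, depth - 1)  // bottom-right
            };
            points = null;
        }
    }

    // Descend to the leaf whose rectangle contains (x, y).
    private FixedDepthQuadtree leafFor(int x, int y) {
        FixedDepthQuadtree node = this;
        while (node.children != null) {
            int mx = (node.x0 + node.x1) / 2, my = (node.y0 + node.y1) / 2;
            node = node.children[(x < mx ? 0 : 1) + (y < my ? 0 : 2)];
        }
        return node;
    }

    void add(int x, int y) {
        leafFor(x, y).points.add(new int[] {x, y});
    }

    boolean remove(int x, int y) {
        return leafFor(x, y).points.removeIf(p -> p[0] == x && p[1] == y);
    }

    // Nearest point within `radius` of (cx, cy), or null if there is none.
    int[] nearestWithin(int cx, int cy, int radius) {
        return search(cx, cy, radius, null);
    }

    private int[] search(int cx, int cy, int radius, int[] best) {
        // Skip nodes whose rectangle does not touch the search square.
        if (cx + radius < x0 || cx - radius >= x1 || cy + radius < y0 || cy - radius >= y1) {
            return best;
        }
        if (children != null) {
            for (FixedDepthQuadtree child : children) best = child.search(cx, cy, radius, best);
            return best;
        }
        // Leaf: linear search through its point list.
        for (int[] p : points) {
            long d = dist2(p, cx, cy);
            if (d <= (long) radius * radius && (best == null || d < dist2(best, cx, cy))) best = p;
        }
        return best;
    }

    private static long dist2(int[] p, int cx, int cy) {
        long dx = p[0] - cx, dy = p[1] - cy;
        return dx * dx + dy * dy;
    }
}
```

Allocating every node up front keeps inserts and removals trivial, at the cost of memory that grows by a factor of four per level.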

2
answered 3 December 2019 at 19:33

You haven't specified the dimensions of your points, but if this is a 2D line drawing, then a bitmap bucket (a 2D array of lists of the points in each region, where you scan the buckets corresponding to and near the cursor) can work very well. Most systems will happily handle bitmap buckets on the order of 100x100 to 1000x1000, the small end of which would put an average of one point in each bucket. Although asymptotic performance is O(N), real-world performance is usually very good. Moving individual points between buckets can be fast; moving objects can also be made fast if you put the objects into the buckets rather than the points (so a polygon of 12 points would be referenced by 12 buckets; moving it costs 12 times the insertion and removal cost of a bucket list; looking up a bucket is constant time in the 2D array). The major cost is reorganising everything if the canvas size grows very quickly.
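
A minimal sketch of such a bucket grid is given below, assuming integer coordinates inside a canvas of fixed size and a radius-limited query; `BucketGrid`, `cellSize` and the method names are hypothetical. The buckets are kept in one flat list indexed by row and column, which behaves like the 2D array described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a "bitmap bucket" grid: one point list per cell.
class BucketGrid {
    private final int cellSize, cols, rows;
    private final List<List<int[]>> buckets = new ArrayList<>();

    BucketGrid(int width, int height, int cellSize) {
        this.cellSize = cellSize;
        this.cols = (width + cellSize - 1) / cellSize;
        this.rows = (height + cellSize - 1) / cellSize;
        for (int i = 0; i < cols * rows; i++) buckets.add(new ArrayList<>());
    }

    // Constant-time bucket lookup.
    private List<int[]> bucket(int col, int row) {
        return buckets.get(row * cols + col);
    }

    void add(int x, int y) {
        bucket(x / cellSize, y / cellSize).add(new int[] {x, y});
    }

    // Removes every stored point at (x, y); returns true if anything was removed.
    boolean remove(int x, int y) {
        return bucket(x / cellSize, y / cellSize).removeIf(p -> p[0] == x && p[1] == y);
    }

    // Moving a point is a removal from one bucket plus an insertion into another.
    void move(int x, int y, int newX, int newY) {
        if (remove(x, y)) add(newX, newY);
    }

    // Nearest point within `radius` of (cx, cy), or null if there is none.
    int[] nearestWithin(int cx, int cy, int radius) {
        // Only the buckets overlapping the search square are scanned.
        int col0 = Math.max(0, (cx - radius) / cellSize);
        int col1 = Math.min(cols - 1, (cx + radius) / cellSize);
        int row0 = Math.max(0, (cy - radius) / cellSize);
        int row1 = Math.min(rows - 1, (cy + radius) / cellSize);
        long limit = (long) radius * radius;
        long bestDist = Long.MAX_VALUE;
        int[] best = null;
        for (int row = row0; row <= row1; row++) {
            for (int col = col0; col <= col1; col++) {
                for (int[] p : bucket(col, row)) {
                    long dx = p[0] - cx, dy = p[1] - cy;
                    long d = dx * dx + dy * dy;
                    if (d <= limit && d < bestDist) {
                        bestDist = d;
                        best = p;
                    }
                }
            }
        }
        return best;
    }
}
```

With a cell size close to the search radius, a query touches only a handful of buckets regardless of how many points the drawing holds.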

0
answered 3 December 2019 at 19:33

If this is in 2D, you can create a virtual grid covering the whole space (its width and height matching your actual point space) and find all the 2D points that belong to each cell. Each cell then becomes a bucket in a hash table.
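
One way to read this suggestion is a hash map keyed by cell coordinates, so that only occupied cells take up memory. The sketch below is an assumption about how that could look, not the answer's own code; `SpatialHash`, `cellSize` and the key packing scheme are hypothetical choices.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: grid cells stored as buckets in a hash table.
class SpatialHash {
    private final int cellSize;
    private final Map<Long, List<int[]>> cells = new HashMap<>();

    SpatialHash(int cellSize) {
        this.cellSize = cellSize;
    }

    // Pack a (column, row) pair into a single 64-bit key.
    private static long cellKey(int col, int row) {
        return ((long) col << 32) | (row & 0xffffffffL);
    }

    // floorDiv keeps cells contiguous for negative coordinates too.
    private long keyFor(int x, int y) {
        return cellKey(Math.floorDiv(x, cellSize), Math.floorDiv(y, cellSize));
    }

    void add(int x, int y) {
        cells.computeIfAbsent(keyFor(x, y), k -> new ArrayList<>()).add(new int[] {x, y});
    }

    // Removes every stored point at (x, y) and drops the cell once it is empty.
    boolean remove(int x, int y) {
        long key = keyFor(x, y);
        List<int[]> cell = cells.get(key);
        if (cell == null) return false;
        boolean removed = cell.removeIf(p -> p[0] == x && p[1] == y);
        if (cell.isEmpty()) cells.remove(key);
        return removed;
    }

    // Nearest point within `radius` of (cx, cy), or null if there is none.
    int[] nearestWithin(int cx, int cy, int radius) {
        long limit = (long) radius * radius;
        long bestDist = Long.MAX_VALUE;
        int[] best = null;
        int col0 = Math.floorDiv(cx - radius, cellSize), col1 = Math.floorDiv(cx + radius, cellSize);
        int row0 = Math.floorDiv(cy - radius, cellSize), row1 = Math.floorDiv(cy + radius, cellSize);
        for (int col = col0; col <= col1; col++) {
            for (int row = row0; row <= row1; row++) {
                List<int[]> cell = cells.get(cellKey(col, row));
                if (cell == null) continue; // unoccupied cell
                for (int[] p : cell) {
                    long dx = p[0] - cx, dy = p[1] - cy;
                    long d = dx * dx + dy * dy;
                    if (d <= limit && d < bestDist) {
                        bestDist = d;
                        best = p;
                    }
                }
            }
        }
        return best;
    }
}
```

Because cells are created on demand, the structure does not need to know the canvas size in advance.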

0
answered 3 December 2019 at 19:33