Our library is made up of multiple components that are partitioned into GitHub repositories by language and dependencies. The dependencies of the core components are kept to a bare minimum to enable flexible integration into many different environments. The Platform Adaptor components will have major dependencies on the respective platform environments.
If you have a specific issue or bug report that impacts only one of these components, please open an issue in the respective component's repository. If you are a developer and wish to submit a PR, please choose the appropriate repository.
If you like what you see, give us a Star on these sites!
The key sketches of the Apache DataSketches libraries are available in three (soon four) programming languages. By design, a sketch that is available in more than one language is “binary compatible” across those languages via serialization. For example, when serialized into its compact form, a sketch created by the DataSketches C++ library can be read by the DataSketches Java library, and vice versa.
Because of differences inherent in the languages, there will be some differences in the APIs, but we try to make the same basic functionality available across all the languages.
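To illustrate what binary compatibility via serialization means in practice, here is a minimal sketch in Python using only the standard library. The byte layout, the field names, and the `TOY_VERSION` and `TOY_FAMILY` constants are hypothetical and are not the actual DataSketches serialization format. The point is only that a fixed byte order and explicit field widths let a reader written in any language decode the same bytes.

```python
import struct

# Hypothetical toy layout (NOT the real DataSketches format):
#   byte 0      : format version (uint8)
#   byte 1      : sketch family id (uint8)
#   bytes 2-3   : parameter k (uint16, little-endian)
#   bytes 4-11  : number of items seen (uint64, little-endian)
#   bytes 12+   : retained values (float64 each, little-endian)
HEADER = struct.Struct("<BBHQ")  # "<" fixes byte order and disables padding
TOY_VERSION = 1
TOY_FAMILY = 7


def serialize(k, n, values):
    """Pack the toy sketch state into a language-neutral byte image."""
    header = HEADER.pack(TOY_VERSION, TOY_FAMILY, k, n)
    body = struct.pack(f"<{len(values)}d", *values)
    return header + body


def deserialize(data):
    """Rebuild the toy sketch state from a byte image."""
    version, family, k, n = HEADER.unpack_from(data, 0)
    if version != TOY_VERSION or family != TOY_FAMILY:
        raise ValueError("unrecognized toy sketch image")
    count = (len(data) - HEADER.size) // 8  # 8 bytes per float64
    values = list(struct.unpack_from(f"<{count}d", data, HEADER.size))
    return k, n, values


# Round trip: the bytes could equally be written by one language
# and read by another, as long as both agree on the layout.
image = serialize(200, 1000, [1.5, 2.5, 3.5])
restored = deserialize(image)
```

Because every field has a stated width and byte order, a C++, Java, or Go reader following the same layout would recover the same values from `image`. The version and family bytes let a reader reject images it does not understand rather than misinterpret them.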
Repository | Distribution | Comments |
---|---|---|
Java Core | Downloads | This is the original and the most comprehensive collection of sketch algorithms. It depends on the Memory component. |
Memory (supports Java Core) | Downloads | Provides high-performance access to off-heap memory |
C++ Core | Downloads | C++ was our second core language library and provides most of the major algorithms available in Java as well as a few sketches unique to C++. |
Python Core | Downloads, PyPI | Python was our third core language library and contains most of the major sketch families that are in Java and C++. All the Python sketches are backed by the C++ library via Pybind. |
Go Core | Under Development | Go is our fourth core language and is still evolving. |
Adapters integrate the core library components into the aggregation APIs of specific data processing platforms. Some of these adapters are available as an Apache DataSketches distribution; others are integrated directly into the target platform.
Repository | Distribution | Comments |
---|---|---|
Google BigQuery Adaptor | Under Development | Depends on C++ Core |
Apache Hive Adaptor | Downloads | Depends on Java Core, Integrations |
Apache Pig Adaptor | Downloads | Depends on Java Core, Integrations |
PostgreSQL Adaptor | Downloads, pgxn.org | Depends on C++ Core, Integrations |
Apache Druid Adaptor | Apache Druid Release | Depends on Java Core, Integrations |
Repository | Distribution | Comments |
---|---|---|
Characterization | Not Formally Released | Used for long-running studies of accuracy and speed performance over many different parameters. |
Website | Not Formally Released | Public website |
Vector | Not Formally Released | This component implements the Frequent Directions Algorithm [GLP16]. It is still experimental in that the theoretical work has not yet supplied a suitable measure of error for production work. It can be used as is, but it will not go through a formal Apache Release until we can find a way to provide better error properties. It depends on the Memory component. |
Server | Not Formally Released | Under development |