Updated Feb 4, 2021 - Shell
#dataops
Here are 40 public repositories matching this topic...
Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, Landoop Tools, 20+ connectors
kellydavid commented Jan 8, 2021
It would be great to make the REST timeout configurable. I have run into timeouts while making requests to a schema registry with a large number of schemas.
It currently seems to be hardcoded here:
https://github.com/cloudhut/kowl/blob/5b135edb5f237f5600742514c001150fa93ac5fe/frontend/src/state/backendApi.ts#L19
Streaming reference architecture for ETL with Kafka and Kafka-Connect. See http://lenses.io for more on how we provide a unified solution to manage your connectors, the most advanced SQL engine for Kafka and Kafka Streams, cluster monitoring and alerting, and more.
kubernetes
redis
mqtt
elasticsearch
coap
streaming
kafka
cassandra
mongodb
influxdb
connector
hazelcast
ftp
jms
kudu
rethinkdb
hbase
dataops
kafka-connect
cosmosdb
Updated Feb 13, 2021 - Scala
Profile and monitor your ML data pipeline end-to-end
python
logging
dataset
dataops
data-pipeline
data-quality
calculate-statistics
aiops
mlops
ml-pipelines
ai-pipelines
approximate-statistics
statistical-properties
Updated Feb 11, 2021 - Jupyter Notebook
A list of tools for annotating data, managing annotations, etc.
Updated Feb 11, 2021
Updated Feb 4, 2021 - Shell
Protobuf converter plugin for Kafka Connect
Updated Feb 6, 2021 - Java
Interactive computing for complex data processing, modeling and analysis in Python 3
Updated Jan 5, 2021 - Python
Being deprecated. For Lenses, please see lensesio/lenses-helm-charts. Stream Reactor will soon get its own Helm repository as well.
Updated Aug 2, 2020 - Smarty
jason-godden commented Jan 4, 2020
Open
Tests
DataOps for Government
Updated Sep 13, 2018
Updated Nov 27, 2017 - R
Cloud Native Data Management Landscape
Updated May 17, 2020
HokStack - Run Hadoop Stack on Kubernetes
Updated May 10, 2020 - Shell
leomaurodesenv commented Aug 13, 2020
A thread in the datahackersbr Slack proposes:
- split the project into folders: framework, endpoint, and config.py
- framework will contain the API logic and an endpoint factory that loads all endpoints set in config.py
- the user will only add files to endpoint and edit the config.py file.
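The proposed layout could look roughly like the sketch below. It is only an illustration under assumptions: the `ENDPOINTS` mapping stands in for the contents of config.py, `load_endpoints` is a hypothetical name for the factory, the `"module:callable"` entry format is invented for this example, and the standard-library targets are placeholders for handlers that would live in the endpoint folder.

```python
from importlib import import_module
from typing import Callable, Dict

# Stand-in for config.py: route -> "module:callable" (hypothetical format).
# Standard-library callables are placeholders so the sketch runs without
# any project files; real entries would point at modules in endpoint/.
ENDPOINTS: Dict[str, str] = {
    "/basename": "os.path:basename",
    "/capwords": "string:capwords",
}


def load_endpoints(config: Dict[str, str]) -> Dict[str, Callable]:
    """Hypothetical endpoint factory: import every handler named in config."""
    handlers: Dict[str, Callable] = {}
    for route, target in config.items():
        module_name, _, attr = target.partition(":")
        if not attr:
            raise ValueError(f"expected 'module:callable', got {target!r}")
        handler = getattr(import_module(module_name), attr)
        if not callable(handler):
            raise TypeError(f"{target!r} is not callable")
        handlers[route] = handler
    return handlers


handlers = load_endpoints(ENDPOINTS)
```

With this shape, adding an endpoint would mean dropping a module into the endpoint folder and adding one line to the configuration, which is the workflow the proposal describes.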
Open source clients for working with Data Culpa Validator services from data pipelines
Updated Feb 13, 2021 - Python
Airflow Data Processing Pipeline for TUL Catalog on Blacklight Data
Updated Feb 12, 2021 - Python
Workflow management platforms comparison
Updated Dec 19, 2020 - Python
Available templates to deploy Lenses in different cloud providers
Updated Dec 4, 2020 - Python
Step-by-step instructions on how to set up a virtual machine for Data Science using Cloud Infrastructures
python
digitalocean
data-science
cloud
r
rstudio
shiny-server
dataops
r-shiny
jupyterlab
r-stats
rstudio-server
Updated Nov 7, 2019 - R
Awesome list of dataops products, open source and resources
Updated Jun 12, 2020
Airflow DAGs for the Manifold (TUL Website) application
Updated Feb 10, 2021 - Python
Airflow DAGs for Funnel Cake Indexing + Related Jobs
Updated Feb 12, 2021 - Python


Motivation: Why do you think this is important?
Flytekit should support Vaex as a pandas alternative for FlyteSchema object.
https://github.com/vaexio/vaex
Vaex performs well on a single machine, which is usually all that most datasets need. Spark and Dask are overkill, adding a lot of complexity for datasets of only a few gigabytes. Addition of Vaex and support for automatic serializati