Data is the source of all learning, and nothing should get in the way of that. Data should be easy to access and explore. We spend too long working on our data instead of with our data. Data tells a story, and it’s our job to unearth and tell that story. Data is quickly becoming too difficult for people to work with. As it grows more complex, our tools must evolve to handle the intricate but mundane tasks that computers are good at, so we can focus on telling the story.
Enable everyone, from business analysts to stakeholders, to query and access the data they need, without distracting your team or taking time away from the real work at hand.
Don’t ETL those files
Data exploration should not require extensive infrastructure. Data analysts and data scientists who “hack” through datasets to explore stories, trends, and initial insights can do that with sample data (a subset of your data). Our strength is our commitment to standard syntax, no pipelines, and no ETL. Most data science projects, even those that involve Big Data, will have a portion of their data kept in CSV or XLS files. Seamlessly integrating data in any format, whether large or small, is what we do best.
DataDistillr is an enterprise platform that enables data scientists to explore with ease and confidence.
It allows the data scientist to join data sets from multiple disparate locations, including mappings to referential data. Here’s a list of some of the file types that we currently handle:
CSV and other delimited data, Excel, XML, JSON, HDF5, fixed width, PDF, SPSS, SAS, LTSV, ESRI shapefiles, image metadata, Parquet, Avro and Sequence files.
We also support specialty formats: PCAP, PCAP-NG, log files, Syslog, web access logs, ACH (bulk financial data), SWIFT messages, and blockchains.
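To make the idea of joining files to referential data more concrete, here is a minimal sketch in plain Python. It is not DataDistillr's API or query syntax, only an illustration of the concept using the standard library. The sample data, field names, and function names are all hypothetical, invented for this example.

```python
import csv
import io
import json

# Hypothetical sample data, invented for this illustration.
ORDERS_CSV = """order_id,country_code,amount
1001,ZA,250.00
1002,US,99.50
1003,ZA,40.25
"""

COUNTRIES_JSON = """[
  {"code": "ZA", "name": "South Africa"},
  {"code": "US", "name": "United States"}
]"""


def join_orders_to_countries(orders_csv, countries_json):
    """Attach a country name to each CSV row using a JSON lookup table."""
    # Build the reference mapping: country code -> country name.
    lookup = {c["code"]: c["name"] for c in json.loads(countries_json)}
    joined = []
    for row in csv.DictReader(io.StringIO(orders_csv)):
        joined.append({
            "order_id": int(row["order_id"]),
            # Codes missing from the reference data are labeled "unknown".
            "country": lookup.get(row["country_code"], "unknown"),
            "amount": float(row["amount"]),
        })
    return joined


def total_by_country(rows):
    """Sum the order amounts per country name."""
    totals = {}
    for row in rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    rows = join_orders_to_countries(ORDERS_CSV, COUNTRIES_JSON)
    print(total_by_country(rows))
```

The CSV file and the JSON reference data are read where they are and combined directly, with no intermediate load step. Writing this kind of glue by hand for every pair of formats is exactly the mundane work a query platform is meant to take over.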
Don’t copy those databases
Increasingly, businesses deploy a variety of data management and query systems that are specialised to particular types of data or workload. Well-known examples range from relational databases like Oracle Database through Hive and Snowflake to Splunk. Data warehouse engineers are presented with a never-ending pipeline of ETL projects, work which does a valuable job of centralising core data but leaves analysts out in the cold when they need data that is not yet under ETL. Using DataDistillr, those analysts can query across a wide range of sources, including those mentioned in this paragraph and many more.
Into Africa
When DataDistillr opened a new software development office in Johannesburg as part of its rapid team growth in 2021, it was not the first US technology firm to decide that it wanted South African talent on its side. As South Africans, we welcome foreign investment in our human resources. It is an affirmation of our known characteristics of self-sufficiency and dedication, as well as a reminder that our time zone alignment with the European market is a valuable advantage that we enjoy for free. More broadly, DataDistillr is committed to building a strong, diverse team that spans latitudes, longitudes and cultures, creating a very real manifestation of a global village.