Spark Data Engineer

Company Description

Giant Oak builds software to make the world a safer place. Giant Oak Search Technology (GOST®) makes screening easy by targeting the right kinds of information to help our customers combat fraud, detect crime, and enhance security. GOST® is the fastest, most reliable negative media search tool on the market because we see data differently. We look behind the numbers to see individuals and communities. And we strive to do our part to make the world a better, freer, and more secure place.


The Data Engineer for Spark will work with the Chief Scientist and the science team to acquire, maintain, and deploy web-scale data sources for Giant Oak’s GOST® product. GOST® provides automated entity search and prioritization against publicly available web data. To do this, GOST® employs its own index of the web. The Data Engineer for Spark will support analytics development by the science team, maintain the index and ensure that it prioritizes data appropriately for multiple customer domains, and acquire, transform, analyze, and deploy to production a wide range of textual resources. The Data Engineer for Spark will also manage a machine learning pipeline and deploy innovative models and approaches on terabyte-scale data.


  • Build and maintain a text resource deployment pipeline that ensures control over versioning and deployment of GOST® domain-specific text indices.

  • Acquire and deploy text resources in a timely manner.

  • Work closely with the social science team to implement and deploy innovative algorithms and queries across data in testing and production.

  • Provide Spark/Hadoop knowledge and advice throughout Giant Oak as necessary.

  • Design and modify cloud storage architecture for maintaining Giant Oak’s index.


Required Qualifications

  • Bachelor’s degree in a quantitative subject or equivalent expertise.

  • Experience building ETL systems, data pipelines and analytical data warehouse solutions.

  • Minimum of 3 years of software development experience.

  • Minimum of 3 years of experience with cloud architectures, e.g., AWS (preferred), Azure, Google Cloud.

  • Minimum of 3 years of experience with open-source big data implementations on Hadoop.

  • Minimum of 3 years of experience with relational databases, e.g., PostgreSQL, SQL Server, Oracle.

  • Eligible for US Security Clearance.

Preferred Qualifications

  • Experience with search technologies including Lucene, Solr, or Elasticsearch.

  • Experience with statistics/machine learning at large scale.


Giant Oak, Inc. is an Equal Opportunity Employer