Data Engineer


Job Title
Data Engineer
Job Description

Graphika seeks an experienced Data Engineer to join our technology team full time. The technology team at Graphika builds the tools that drive our cutting-edge analysis platform. We work with large-scale graph algorithms and streaming data to tackle interesting questions in new ways. The Data Engineer will contribute to building and scaling our various data pipelines, working closely with our data science and analysis teams. The Data Engineer will also collaborate with other members of the team (including other backend engineers, frontend engineers, and the product team) to help plan and implement solutions to business problems.

All Graphika Tech Team Members

  • understand and appreciate good software engineering practices, including version control, code reviews, testing, and refactoring
  • are comfortable debugging and optimizing code
  • write tests to make sure code is reliable
  • help shape technical decisions within the team
  • collaborate within and across departments to ensure successful product creation
  • have the ability to pick up new tools and technologies as needed

Areas of Responsibility

  • Create and optimize large-scale batch and real-time data pipelines that ingest large quantities of structured and unstructured data from a variety of sources
  • Actively own systems which support diverse applications across Product, Tech, and Labs teams
  • Design and implement ETL processes through cloud-based solutions
  • Share ownership in ensuring the quality of our data and data infrastructure
  • Consistently test code and systems for robustness
  • Strategize around new data storage solutions and support existing ones

Ideal Candidate Profile

You have demonstrated the ability to build, deploy and maintain large-scale, data-driven solutions. You love to take on complex data-related problems, and can direct your own work. You have the skills and desire to interrogate data sets to understand their various foibles, and respond accordingly. You have a working knowledge of CS fundamentals like algorithms, data structures, and time complexity. You can imagine and design architectural solutions at scale.

You think beyond the task at hand to deeply understand the 'why' behind what you are doing. You can maintain a focus on shipping software products, understanding that done is often preferable to perfect.

You are an enthusiastic teammate who engages in collaboration and proactive discussion. You are an effective communicator who can explain technical concepts to product leaders, customer support, and other engineers. You work with confidence and without ego. You have deep knowledge and exercise a high degree of ownership in your daily work. You have loosely held, defensible ideas, and advocate for what you believe is right. You can surface your unarticulated assumptions. You are also adept at identifying and evaluating trade-offs, willing to be proven wrong, and quick to support your teammates.

What this job is NOT

This job is not an analyst or data science role. It is not intended as a stepping stone to either of those roles within the organization. It is not directly involved in the highly publicized reports Graphika generates. Instead, this job ensures that the data underlying those reports and further scientific discovery is robust and clean, so that both can be built on it with integrity.


  • Unlimited PTO, with a company-mandated minimum of ten days of vacation time taken per year.
  • 100% healthcare (health, vision, dental) premium coverage for employees; 50% premium coverage for families
  • Remote personal office setup stipend
  • Telecommuting is OK
  • No Agencies Please

It is required that the candidate:

  • has the ability to work legally in the US without visa sponsorship
  • has experience writing production-quality Python software that is understandable, testable, and maintainable
  • is familiar with AWS services (S3, Lambda, Kinesis, SQS, etc.) or similar cloud-based tools
  • has knowledge of and ability to interact with DevOps tooling (Terraform, Ansible, Packer, Docker, etc.)
  • has knowledge of the trade-offs between different distributed systems architectures
  • is comfortable designing and scaling large data-munging efforts on unstructured data
  • is experienced with the Python data science stack (numpy, pandas, matplotlib, sklearn, Jupyter, etc.)
  • is capable of leading data architecture discussions
  • has knowledge of SQL and common relational database systems such as PostgreSQL and MySQL
  • is familiar with schema design for a variety of domains
  • is knowledgeable about data storage solutions
  • has a strong dedication to code quality, automation, and operational excellence (unit/integration tests, scripts, workflows)

It would be nice for the candidate to have:

  • Hands-on experience with Apache Spark
  • Acquaintance with social media data sources and formats
  • Experience with workflow management systems (such as Airflow or Luigi)
  • Knowledge of NoSQL technologies like Redis

Education Requirements

Bachelor's degree or equivalent work experience

About the Company

Graphika empowers the world to understand and navigate the “cybersocial terrain.” We create large-scale, in-depth maps of social media landscapes and conversations to discover how communities form online and how influence and information flow within large-scale networks. Our interdisciplinary team uses our unique, patented set of technologies and tools to create and apply new, rigorous analytical methods to answer difficult questions about online conversations.

An important note about joining Graphika during this extraordinary time

Graphika is growing! Despite the downturn and accompanying reductions in other sectors and companies, Graphika is retaining current employees and looking to hire more.

In the BeforeTimes, Graphika's Technology Team was fully co-located in our NYC office. On March 12, 2020, Graphika moved to a fully distributed model, and we've been working together as a company to respond to the changing realities of the AfterTimes. As a result, we are happy to consider applicants who are located in the continental US, with the caveat that the Technology Team works on Eastern Time and begins its day at around 10 a.m.

Contact Info
