Adevinta is an international group present in 16 countries, and we are sure you know our brands in Spain, Motos.net and Habitaclia among them, which reach more than 18 million unique users per month: we are one of the top 10 companies with the largest internet audience in Spain! This, together with our more than 1,000 employees, makes us one of the leading tech companies on the Spanish market.
As a tech company, we have digital DNA. We love challenging the status quo and we are constantly thinking of new ways of doing things.
We work using PEAK Performance, a streamlined way of working based on OKRs, and self-sufficient and self-organising teams.
We believe in a culture of feedback and a close leadership style. In short, we create ever smarter solutions to offer our users and clients the best experience.
If you like challenges and believe in the power of connection as much as we do, join our team!
We are looking for a Data Engineer for our Big Data core team, whose general mission will be:
Implement data pipelines with batch and streaming jobs.
Provide a data federation platform to discover and query all Adevinta data in a uniform way.
Create an experimentation platform for building machine learning and data products.
Help Adevinta Spain make data-driven decisions to empower and improve our customers' experience.
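To give a flavour of the first point, here is a minimal sketch of a batch aggregation step, written in plain Python in place of a real Spark or Flink job. The event format (`"userId,site,action"`), the field names, and the function names are illustrative assumptions, not Adevinta's actual schema or code.

```python
# Minimal batch-style aggregation, using plain Python collections as a
# stand-in for a Spark/Flink batch job. Event schema is an assumption.
from collections import Counter

def parse(line):
    """Parse a raw 'userId,site,action' CSV line; None for malformed records."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 3:
        return None  # drop malformed records, as a real job would
    user_id, site, action = parts
    return {"user_id": user_id, "site": site, "action": action}

def page_views_per_site(lines):
    """Count 'view' events per site -- the kind of aggregation a batch job runs."""
    events = (e for e in map(parse, lines) if e is not None)
    return Counter(e["site"] for e in events if e["action"] == "view")

if __name__ == "__main__":
    sample = ["u1,siteA,view", "u2,siteA,view", "u1,siteB,click", "bad line"]
    print(page_views_per_site(sample))  # Counter({'siteA': 2})
```

In a production pipeline the same parse-filter-aggregate shape would run distributed over a data lake rather than over an in-memory list.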
Your challenge:
Collect and process data from millions of users across some of the most important sites in Spain.
Collaborate with all our sites to develop data products.
Build a data catalog and a data lineage tool for the hundreds of sources integrated into our pipelines.
Govern our Data Lake and provide access to hundreds of users with different roles.
Provide real-time data solutions.
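As a hedged illustration of the real-time side, the sketch below computes a tumbling-window count over an event stream, using a plain Python iterator to stand in for a Kafka or Flink stream. The 60-second window size and the `(timestamp, user_id)` event shape are assumptions made for the example.

```python
# Tumbling-window count over a stream, with a plain iterator standing in
# for a real Kafka/Flink source. Window size and event shape are assumptions.
from collections import defaultdict

def tumbling_counts(events, window_seconds=60):
    """Assign each (ts_seconds, user_id) event to a tumbling window
    and count the events that fall into each window."""
    counts = defaultdict(int)
    for ts_seconds, _user_id in events:
        counts[ts_seconds // window_seconds] += 1
    return dict(counts)

if __name__ == "__main__":
    stream = iter([(0, "u1"), (30, "u2"), (61, "u3")])
    print(tumbling_counts(stream))  # {0: 2, 1: 1}
```

A real stream processor adds what this sketch leaves out: out-of-order events, watermarks, and incremental emission of window results.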
Desired skills & professional experience:
Strong knowledge of functional programming in Scala and/or Java.
Experience working with data processing frameworks like Spark, Flink, Kafka Streams or Beam.
Proficiency in building well-tested code.
Querying tools: Hive, Athena, Presto, BigQuery...
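By "well-tested code" we mean small, pure functions with explicit checks living next to them. The deduplication helper below is purely illustrative (the function name and behaviour are assumptions for the example), sketched in Python for brevity; the same style carries over to Scala or Java.

```python
# Illustrative example of a small, pure, easily-tested function.
def first_by_key(items, key):
    """Keep the first occurrence of each key, preserving input order."""
    seen = set()
    kept = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            kept.append(item)
    return kept

# Tests live next to the code they cover.
assert first_by_key(["aa", "ab", "ba"], key=lambda s: s[0]) == ["aa", "ba"]
assert first_by_key([], key=len) == []
```

Because the function takes plain values and has no side effects, it needs no mocks or fixtures to test.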
Nice-to-have knowledge:
Experience with stream processing tools like Kafka/Kinesis, Structured Streaming, Akka Streams, Flink, Spark Streaming...
Experience with Docker and container orchestration tools like Kubernetes, ECS, Docker Swarm.
Experience working with a Hadoop distribution: Hortonworks, Cloudera, EMR, Azure HDInsight, Google Cloud Dataproc...
Notebooks: Jupyter, Zeppelin, Databricks.
Good programming skills in Python.
Familiarity developing data pipelines with scheduling tools like Airflow, Luigi, Cloud Composer, or Argo.
Analytical and NoSQL databases: Redshift, Cassandra, HBase.
Experience securing Big Data stacks: Okta, Ranger, Knox, Kerberos.
Familiarity working with AWS.
Machine learning tools: scikit-learn, Spark ML, Kubeflow, MLflow...