We are building systems that ingest, model and analyze massive amounts of data from online, mobile and offline commerce/user activity. This data sits on top of Flume/Hadoop/HBase/Hive/Spark/Kafka.
We are looking for a self-starter who is comfortable with performance and data architecture and can code hands-on in Java and either Python or Scala. The ideal candidate will embed seamlessly in the engineering team, working alongside developers to balance response time and system constraints, such as memory and data availability, against increased relevancy.
The ideal candidate is someone who wants to mine our data, analyze our hypotheses, help us optimize our experiments and devise algorithms to increase relevancy for personalized content, promotions, recommendations and natural language search.
We are looking for someone who can thrive in a Lean and Agile test-and-learn culture that is highly collaborative, with frequent deliveries to production, where we constantly experiment to find out which strategies resonate with our customers.
The ideal candidate is excited to experiment and can leverage their data analysis skills to look back and assess which strategies are working.
Our work is real-time, with the machine learning as it runs, rather than preparing presentations after running analysis jobs against an offline Hadoop/Spark cluster.
Your work will be visible to millions of people and you will have a direct impact on the business goals.
Perform other duties as assigned.
We work on relevance algorithms from information retrieval, machine learning and ranking to deliver a high-availability, low-latency service, which directly impacts business metrics.
• Responsible for analyzing large data sets to develop custom models and algorithms to drive business solutions.
• Responsible for building large data sets from multiple sources in order to build algorithms for predicting future data characteristics. Those algorithms will be tested, validated, and applied to large data sets.
• Responsible for training the algorithms so they can be applied to future data sets and provide the appropriate recommendations in real time.
• Responsible for researching new trends in the industry and utilizing up-to-date technology (for example, HBase, MapReduce, LAPACK, Gurobi).
• Build complex data sets from multiple data sources.
• Build learning systems to analyze and filter continuous data flows.
• Combine data features to determine models.
• Conduct advanced statistical analysis to determine trends and significant data relationships.
• Develop custom data models to drive recommendations.
• Scale new algorithms to large data sets.
• Train algorithms to apply models to new data sets.
• Utilize system tools including MySQL, Hadoop, Weka, R, MATLAB and ILOG.
• Validate models and algorithmic techniques.
• Work with cross-functional partners across the business.
• Consistently demonstrate regular, dependable attendance and punctuality.
• PhD in computer science, mathematics or a similar field, or MS with at least 1-3 years of related experience.
• Deep knowledge of machine learning, information retrieval, data mining, statistics, NLP or related field.
• Solid functional coding skills with 1-3 years of experience in Java or C++; Java is highly preferred. Must be capable of spending up to 50% of time writing production code in Java, Scala, C++, Hadoop or Hive.
• Expert-level knowledge of a scripting language such as Python or Perl.
• Strong preference for hands-on experience with TensorFlow, scikit-learn, PredictionIO, Spark MLlib, MXNet, Caffe, H2O or other ML libraries.
• Proven experience working with statistical languages such as R.
• Experience working with large data sets and distributed computing tools (MapReduce, Hadoop, Hive, Spark, etc.) is a plus.
• Good communication skills, both written and verbal.
• Self-starter, quick learner and keen observer with an eye for detail who relishes challenges.