npm, Inc. is the company that hosts and manages npm, the most widely used package manager for JavaScript. npm uses the HEAVY.AI SQL engine for log file analysis on datasets up to 7 billion rows.
The Challenge
npm must answer a tremendous number of ad-hoc queries that cannot be predicted in advance. These queries include requests from larger JavaScript package providers for information about how their packages are being used, trends in JavaScript development, and changes to the versions of Node.js that developers use. npm's existing log file analysis software could not keep up.
The Solution
npm considered alternative log file analysis tools but found their price/performance lacking. In HEAVY.AI, npm found lightning-fast response times, a small hardware footprint, and no requirement to index the data.
“HEAVY.AI lets us answer questions about our community and explore trends in all different dimensions of our data in real time. We're excited about HEAVY.AI Cloud, which gives us all that power in a convenient, scalable way.”
Laurie Voss, CTO
The Result
HEAVY.AI gives npm millisecond response times on multi-billion-row datasets with a single server. As a result, npm is able to deliver superior performance, less administration, and minimal server and infrastructure cost.
npm Highlights
- Ad-hoc queries on a 7-billion-row dataset
- Millisecond response times
- Minimal server & infrastructure cost
npm: the package manager for Node.js
npm, Inc. is the company that hosts and manages npm, the most widely used package manager for JavaScript. The npm registry hosts over a quarter million packages of reusable code, making it the largest code registry in the world. It is used daily by 4 million developers worldwide, with 4.5 billion packages downloaded every month. Because of npm’s keen focus on the long-term success of the JavaScript community, including the open-source Node.js and npm projects, it is used by more than 30,000 companies, including DocuSign, SiriusXM, Uber, and Visa, to manage and deploy their code packages.
npm & HEAVY.AI
npm uses the HEAVY.AI analytics platform for exploring request log (log file) data. Request logs are a record of every request that the npm server has processed.
In production, npm runs HEAVY.AI on an Amazon EC2 r3.8xlarge instance and receives approximately 700 million events per day, which are bulk-loaded hourly. Currently they keep a total of 10 days' worth of data, approximately 7 billion rows. The log file data contains information such as date/timestamp, JavaScript package name, Node.js and npm version numbers, proxy cache server point of presence (PoP), region, and even the npm commands issued.
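The retained row count follows directly from the ingest rate and the retention window. A quick back-of-envelope check, using only the approximate figures quoted above (the constant names are illustrative, and the average batch size is derived here rather than stated in the case study):

```python
# Approximate figures quoted in the case study.
EVENTS_PER_DAY = 700_000_000   # request-log events received per day
RETENTION_DAYS = 10            # days of data kept online
LOADS_PER_DAY = 24             # bulk loads run hourly

# Total rows held at any one time.
total_rows = EVENTS_PER_DAY * RETENTION_DAYS

# Average size of one hourly bulk load (derived, not a quoted figure).
rows_per_load = EVENTS_PER_DAY / LOADS_PER_DAY

print(f"rows retained: {total_rows:,}")
print(f"average rows per hourly load: {rows_per_load:,.0f}")
```

On these numbers each hourly load averages roughly 29 million rows, which gives a sense of the batch size behind the 7 billion row total.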
npm faces a particular challenge: a tremendous number of ad-hoc queries that cannot be predicted in advance. Examples include requests from larger JavaScript package providers for information about how their packages are being used, questions about trends in JavaScript development, and changes to the versions of Node.js that developers use.
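To make this kind of question concrete, here is a minimal sketch of one such ad-hoc query: which Node.js versions are requesting a given package. It uses Python's built-in sqlite3 module purely as a stand-in SQL engine so the example runs anywhere. The table name, column names, package name, and sample rows are all hypothetical and do not reflect npm's actual schema or data.

```python
import sqlite3

# In-memory SQLite database as a stand-in SQL engine (assumption: the real
# system is HEAVY.AI; only the shape of the query is being illustrated).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE request_log (
        ts TEXT,            -- request timestamp
        package TEXT,       -- JavaScript package name
        node_version TEXT,  -- Node.js version of the client
        region TEXT         -- region of the request
    )
    """
)

# Made-up sample rows for illustration only.
sample_rows = [
    ("2017-01-01T00:00:01", "example-pkg", "6.9.1", "us-east"),
    ("2017-01-01T00:00:02", "example-pkg", "6.9.1", "eu-west"),
    ("2017-01-01T00:00:03", "example-pkg", "4.6.0", "us-east"),
    ("2017-01-01T00:00:04", "other-pkg", "6.9.1", "us-east"),
]
conn.executemany("INSERT INTO request_log VALUES (?, ?, ?, ?)", sample_rows)


def requests_by_node_version(conn, package):
    """Count requests for one package, grouped by Node.js version."""
    query = """
        SELECT node_version, COUNT(*) AS requests
        FROM request_log
        WHERE package = ?
        GROUP BY node_version
        ORDER BY requests DESC, node_version
    """
    return conn.execute(query, (package,)).fetchall()


print(requests_by_node_version(conn, "example-pkg"))
```

Because the grouping column and filter change from one question to the next, queries like this cannot be served from precomputed aggregates, which is why scan speed on the raw rows matters.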
The same log file data that reveals trends is also used for diagnostic purposes.
This may range from looking at all requests in a given data center to find a faulty node, to filtering requests from a specific user agent or IP address to isolate anomalous or failing requests. When performing log file analysis and reporting, npm also looks for changes to regular usage patterns in the log files, such as a sudden spike in requests from a remote IP address, which may indicate a problem or simply a large new customer.
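The spike check described above can be sketched in a few lines. This is an illustrative approach, not npm's actual method: it compares each IP address's request count in the current hour against its average over previous hours. The function name, the `factor` and `min_requests` thresholds, and the sample counts are all assumptions chosen for the example.

```python
from collections import Counter
from statistics import mean


def spiking_ips(history, current, factor=5.0, min_requests=100):
    """Flag IPs whose current-hour count far exceeds their hourly average.

    history: list of Counters, one per past hour, mapping IP -> request count
    current: Counter for the hour being checked
    Returns a dict mapping each flagged IP to (baseline, current_count).
    """
    flagged = {}
    for ip, count in current.items():
        if count < min_requests:
            continue  # ignore low-volume clients
        past = [hour.get(ip, 0) for hour in history]
        baseline = mean(past) if past else 0.0
        # Floor the baseline at 1 so previously unseen IPs can still be flagged.
        if count > factor * max(baseline, 1.0):
            flagged[ip] = (baseline, count)
    return flagged


# Made-up hourly counts using documentation-reserved IP addresses.
history = [
    Counter({"203.0.113.7": 120, "198.51.100.4": 300}),
    Counter({"203.0.113.7": 110, "198.51.100.4": 310}),
    Counter({"203.0.113.7": 130, "198.51.100.4": 290}),
]
current = Counter({"203.0.113.7": 2400, "198.51.100.4": 305, "192.0.2.9": 150})

print(spiking_ips(history, current))
```

In this sample, the first address is flagged because its traffic jumped well above its baseline, the second is left alone because its traffic is steady, and the third is flagged as a previously unseen high-volume client. As the text notes, a flag does not distinguish a fault from a new customer; it only marks the address for a closer look.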
“Our requirements demanded exceptional performance and scalability to power through large, complex queries and we found the answer in HEAVY.AI.”
- Laurie Voss, CTO of npm
Why HEAVY.AI
npm considered alternative log file analysis tools but found their price/performance lacking. In HEAVY.AI, npm found lightning-fast response times without any requirement to index the data. Further, after experimenting with various open source data fabrics, npm found that competing solutions either couldn’t scale or demanded more operational effort and hardware than npm’s small but talented team could spare.
HEAVY.AI delivered millisecond response times on multi-billion-row datasets with a single server. As a result, npm is able to meet its varied objectives: superior performance, less administration, and minimal server and infrastructure cost.