Issue #001 | Redefining Time Series Analytics & Scaling Dev Tools: An Inside Look at Pinterest and Meta

We explore how Pinterest streamlines time series database queries and analytics for observability, and how Meta scales dev tooling.

August 31, 2023

TScript & Goku to Streamline Time Series Analytics for Observability

TScript is a language the Pinterest engineering team designed to streamline time series database queries and analytics. The goal was to create a language that encapsulates DB queries as variables, allows for multiline entries, makes the queries expandable, provides a filter for DB queries, and is verbose enough for the user to quickly understand the syntax from a time series context.

Goku is the time series DB the Pinterest engineering team built after working with Ganglia, Graphite, and OpenTSDB. Goku has four distinct advantages over OpenTSDB (the leading contender):

  • Goku uses an inverted index engine - a significant efficiency improvement over OpenTSDB.
  • Goku implements Gorilla compression - 12x compression.
  • Goku’s engine computes near the storage layer, allowing for parallel processing and, thus, faster response times.
  • Goku uses Thrift binary instead of JSON (OpenTSDB) - faster queries at scale.
Figure 1.1 - Goku / TScript Queries Diagram
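
To make the first bullet concrete, here is a minimal sketch of how an inverted index can resolve a tag query to a set of series. It is a toy illustration rather than Goku's engine: the class name, the tag names, and the series IDs are all made up for the example.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: maps each (tag key, tag value) pair to series IDs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, series_id, tags):
        # Register the series under every tag pair it carries.
        for key, value in tags.items():
            self._postings[(key, value)].add(series_id)

    def lookup(self, **tags):
        """Return the IDs of series that carry every requested tag."""
        posting_lists = [self._postings.get(pair, set()) for pair in tags.items()]
        if not posting_lists:
            return set()
        # A query is an intersection of posting lists, not a scan of all series.
        return set.intersection(*posting_lists)


# Hypothetical series and tags, for illustration only.
index = InvertedIndex()
index.add(1, {"metric": "cpu.user", "host": "web-01"})
index.add(2, {"metric": "cpu.user", "host": "web-02"})
index.add(3, {"metric": "mem.free", "host": "web-01"})

matches = index.lookup(metric="cpu.user", host="web-01")
print(matches)
```

The point of the structure is that lookup cost depends on the size of the matching posting lists rather than on the total number of series stored.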

TScript leverages multiple features that make it easier for developers to create queries and dashboards that visualize the results. Combining explicit syntax with graphing functionality, the following snippet (see Figure 1.2) returns a dashboard (see Figure 1.3) with the appropriate markings.

Figure 1.2 - Basic TScript query that retrieves data on a 10-minute interval and labels it based on two thresholds
Figure 1.3 - Visualization of data from the TScript query with color codes based on thresholds

The team's key challenge was processing the data after it was returned from the DB. By preallocating memory as a NumPy array filled with NaN values and writing each series into it, it’s possible to consolidate the results into a single DataFrame and achieve a considerable performance improvement.
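
The preallocation idea can be sketched as follows. This is an assumption-laden illustration, not Pinterest's code: the function name, the shape of the query results (lists of timestamp/value pairs keyed by series name), and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def consolidate(series_by_name, timestamps):
    """Merge per-series results into one DataFrame via a preallocated NaN buffer."""
    names = list(series_by_name)
    row_of = {ts: row for row, ts in enumerate(timestamps)}

    # Allocate the full result once; NaN marks "no data point at this timestamp".
    buffer = np.full((len(timestamps), len(names)), np.nan)

    # Write each series into its own column instead of merging DataFrames.
    for col, name in enumerate(names):
        for ts, value in series_by_name[name]:
            buffer[row_of[ts], col] = value

    return pd.DataFrame(buffer, index=timestamps, columns=names)


# Hypothetical results on a 10-minute (600 s) interval; web-02 misses one point.
timestamps = [0, 600, 1200]
results = {
    "web-01": [(0, 0.42), (600, 0.57), (1200, 0.61)],
    "web-02": [(0, 0.38), (1200, 0.44)],
}

frame = consolidate(results, timestamps)
print(frame)
```

Because the buffer is allocated once, missing points simply stay NaN and no intermediate DataFrames are created and joined.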

Scaling developer tools - Meta’s approach

Meta has an extensive codebase, and at that scale the company has run into challenges with the current best-in-class tools (e.g., Git and GitHub). Sapling was born as a result.

Sapling provides multiple features aimed at working with multiple repositories, simplifying the UI, and augmenting operations we’ve taken for granted. For example, it lazily loads the files of a repository, which makes retrieving large numbers of files efficient and makes it possible to work on massive sets of files without storing them all locally. The drawback is that this requires a constant network connection. The second major change was layering a set of features and commands on top of Git: branching and merging now allow for concurrent development, so an engineer awaiting code review can keep working on their code.

Meta engineers push code to production on a regular basis. At scale, removing friction between developers and running their code is a key undertaking for leadership. The engineering team has released the second version of Buck, Buck2. It’s an open-source, large-scale build system that completed builds 2x as fast as Buck1 in benchmarks.

Static code analysis plays a vital role at Meta. By incorporating several tools into the process (Infer, RacerD, and Jest), Meta engineers validate their code and mitigate crashes and other potential issues.

Infer is easy to install and deploy within Java, C, C++, and Objective-C applications. It’s possible to run validations on portions of a project; here’s an example of validating a branch called “feature” against “main.”

Figure 2.1 - Differential Workflow of Infer applied to a feature branch
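
Conceptually, a differential workflow comes down to comparing the issues found on one branch with those found on the other. The sketch below shows that comparison only; it is not Infer's implementation, and the `Issue` fields, file names, and sample reports are hypothetical.

```python
from typing import NamedTuple


class Issue(NamedTuple):
    """Hypothetical, simplified identity of a reported issue."""
    bug_type: str
    file: str
    procedure: str


def diff_reports(previous, current):
    """Split issues into those introduced, fixed, or unchanged between two runs."""
    previous, current = set(previous), set(current)
    return {
        "introduced": current - previous,   # only on the feature branch
        "fixed": previous - current,        # only on the base branch
        "preexisting": current & previous,  # present on both
    }


# Made-up reports for a "main" branch and a "feature" branch.
main_report = [
    Issue("NULL_DEREFERENCE", "Cart.java", "total"),
    Issue("RESOURCE_LEAK", "Export.java", "write"),
]
feature_report = [
    Issue("NULL_DEREFERENCE", "Cart.java", "total"),
    Issue("THREAD_SAFETY_VIOLATION", "Cache.java", "put"),
]

diff = diff_reports(main_report, feature_report)
for category, issues in diff.items():
    print(category, sorted(issues))
```

Reviewers then only need to look at the "introduced" bucket, which keeps the signal focused on what the feature branch changed.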

RacerD was a project that stemmed from the need for concurrency analysis at scale. Engineers at Meta released an MVP in 2017 with the following features in mind:

  • Prioritizing high-signal detection ensures that numerous alerts don't overwhelm developers. A high rate of false positives can lead to "alert fatigue," where developers begin to ignore warnings, potentially overlooking real issues.
  • An interprocedural analysis approach provides a deeper view of the codebase. By tracking data races through nested procedure calls, the tool can understand the context better, capturing the bigger picture of potential data races in larger, modular projects.
  • Eliminating the need for manual annotations streamlines the analysis process. Manual annotations can be error-prone, tedious, and become outdated as code evolves. Automating this or using tools that infer these relationships ensures consistency and reduces maintenance overhead.
  • Speed is crucial in modern software development, especially with CI/CD pipelines. Rapidly analyzing and reporting on a massive codebase ensures the development workflow remains uninterrupted, promoting higher code quality and faster release cycles.
  • Differentiating between what is common (coarse-grained locking) and what is rare (fine-grained synchronization) in product code puts the primary focus on understanding and analyzing coarse-grained locking. This pragmatic approach ensures the most frequent potential issues are addressed first, leading to more stable software.
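
To illustrate the kind of question a race detector asks, here is a deliberately simplified check over already-summarized memory accesses. It is not RacerD's algorithm. It assumes a single coarse-grained lock, assumes any two procedures may run concurrently, and uses hypothetical names throughout.

```python
from collections import defaultdict
from typing import NamedTuple


class Access(NamedTuple):
    """Hypothetical summary of one field access inside a procedure."""
    procedure: str
    field: str
    is_write: bool
    lock_held: bool


def find_potential_races(accesses):
    """Flag pairs of accesses to one field where a write is not guarded on both sides."""
    by_field = defaultdict(list)
    for access in accesses:
        by_field[access.field].append(access)

    races = []
    for field, group in by_field.items():
        for i, first in enumerate(group):
            for second in group[i + 1:]:
                involves_write = first.is_write or second.is_write
                both_locked = first.lock_held and second.lock_held
                if involves_write and not both_locked:
                    races.append((field, first.procedure, second.procedure))
    return races


# Made-up access summaries for a small counter class.
accesses = [
    Access("increment", "count", is_write=True, lock_held=True),
    Access("reset", "count", is_write=True, lock_held=False),
    Access("get", "count", is_write=False, lock_held=False),
    Access("describe", "name", is_write=False, lock_held=False),
    Access("label", "name", is_write=False, lock_held=False),
]

races = find_potential_races(accesses)
for race in races:
    print(race)
```

Read-only pairs such as the two accesses to `name` are never reported, which is one small way to keep the signal high.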

The resulting tool showcases the possibility of static concurrency analysis for rapidly evolving, vast codebases like Meta's. Its genius is automating the understanding of threads, locks, and memory, eliminating the need for manual, error-prone human input. This automation integrates into many developers' workflows. Work on core features like the News Feed, where over 1,000 concurrency concerns were addressed preemptively, wouldn't be sustainable without such efficiency and scale in a concurrent environment. This represents the true impact of next-gen analysis.

You may have noticed that Infer isn’t available for languages other than C, C++, Java, and Objective-C. As JavaScript remains a dominant language in the web vertical, Meta needed a testing framework that addressed this segment: Jest.

Jest was created with simplicity in mind and currently supports projects that use the most popular runtimes, such as Node.js. Jest optimizes testing efficiency by giving each test its own isolated global state, enabling parallel execution without hiccups. Jest prioritizes previously failed tests to expedite the process and adjusts the test sequence based on file execution times. This strategy saves time and ensures the most crucial issues are addressed first.
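
The ordering strategy described above can be sketched in a few lines. This is an illustration in Python, not Jest's sequencer: the record type and file names are hypothetical, and running the slowest files first among the non-failing ones is an assumption made for the example.

```python
from typing import NamedTuple, Optional


class RunRecord(NamedTuple):
    """Hypothetical cached result of a test file's previous run."""
    path: str
    failed_last_run: bool
    last_duration_ms: Optional[float]  # None when the file has never run


def order_tests(records):
    """Put previously failed files first, then slower files before faster ones."""
    def sort_key(record):
        duration = record.last_duration_ms or 0.0
        # False sorts before True, so failed files lead; negating the
        # duration puts the longest-running files ahead of shorter ones.
        return (not record.failed_last_run, -duration)

    return sorted(records, key=sort_key)


# Made-up history for four test files.
history = [
    RunRecord("a.test.js", failed_last_run=False, last_duration_ms=1200.0),
    RunRecord("b.test.js", failed_last_run=True, last_duration_ms=300.0),
    RunRecord("c.test.js", failed_last_run=False, last_duration_ms=4500.0),
    RunRecord("d.test.js", failed_last_run=False, last_duration_ms=None),
]

ordered = [record.path for record in order_tests(history)]
print(ordered)
```

Starting long-running files early helps parallel workers finish at roughly the same time, while running failed files first surfaces a still-broken test as soon as possible.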

📝 References

  1. Analyzing Time Series for Pinterest Observability
  2. Meta developer tools: Working at scale
  3. Goku: Building a scalable and high-performant time series database system
  4. Sapling: Source control that’s user-friendly and scalable
  5. Build faster with Buck2: Our open-source build system
  6. A tool to detect bugs in Java and C/C++/Objective-C code before it ships
  7. Open-sourcing RacerD: Fast static race detection at scale
  8. Jest is a delightful JavaScript Testing Framework with a focus on simplicity.