Author Archives: Kim Loughead


New Release from Driven, Inc. Delivers End-to-End Monitoring and Management for Hadoop and Spark Applications Running in Any Environment, Anywhere

SAN FRANCISCO – Jun. 21, 2016 – Driven, Inc., formerly Concurrent, Inc., the leader in Big Data application performance monitoring (APM), today announced the next version of Driven. Driven 2.2 gives customers the ability to monitor and manage multi-tenant, heterogeneous Hadoop and Spark environments, deployed anywhere, within a single solution.

Driven Cloud, a component of the Driven 2.2 software, is the first SaaS offering for Big Data application monitoring. Driven Cloud monitors and manages applications running on Hadoop as a service, on Hadoop or Spark on premises, or on both. It is designed for organizations that need both the power of Driven to quickly troubleshoot, optimize and monitor their Hadoop data processes and the simplicity and low setup costs of a cloud solution.

The Big Data landscape is getting more complex, with new open source projects and products being announced almost weekly. With little industry standardization, DevOps teams are faced with managing and monitoring multiple technologies as organizations implement the technologies best suited to solve specific business problems.

Driven 2.2 helps DevOps teams gain control over the increasing chaos of large-scale Hadoop and Spark implementations by enabling high-fidelity monitoring of Apache Hive, MapReduce, Cascading, Scalding, and Apache Spark applications, along with related Big Data technologies. No other single solution on the market delivers this level of coverage to empower teams with continuous visibility and traceability from start to finish.

Key features of Driven 2.2 include:

General Availability of Apache Spark Performance Monitoring: Monitor Spark Core, MLlib, or Spark Streaming applications by visualizing application logic and collecting all the key performance and operational metrics. This enables developers and data scientists to visualize, diagnose, and fix their own performance anomalies, and operations teams to quickly identify anomalous processes and their owners, allowing greater control of Spark resources on a multi-tenant cluster.

Driven Cloud: Organizations of any size can use Driven to build, monitor and manage their Big Data applications with minimal setup time. Driven Cloud users simply install the agent or plugin to start sending application performance data to the Driven Cloud Service. Driven Cloud supports all the major distributions, both on premises (e.g., Cloudera, Hortonworks, MapR) and as cloud services (e.g., Amazon EMR, Microsoft Azure, IBM Bluemix, Qubole, Altiscale).

Additional Enterprise Capabilities:

Together with Driven’s ability to bring business, organization, and performance context to each and every data process, DevOps teams and administrators can manage user access to Driven through their directory service, so teams can segment their own scheduled and ad hoc transactions without involving the central administrators.

Alongside this convenient, secure access, Driven 2.2 comes with a redesigned user interface that lets users quickly find what they are looking for, analyze historical executions, share their findings with colleagues, and create custom dashboards.

To learn more about Driven and how customers, such as Expedia and LiveIntent, are using Driven to optimize performance and reduce time-to-market, register for our July 20th webinar. In 30 minutes, we will go through some common ways our customers use Driven and give a quick demo so you can see Driven in action.

Driven is available to try for free at For pricing and more information, visit or email us at

Supporting Quote
“As organizations scale their use of Hadoop and Spark through their own data centers, through cloud service providers, or inevitably both, it’s becoming more and more difficult for DevOps teams to understand and react to what is happening or what has happened. The future of data processing is both in the cloud and on premises, and Driven is uniquely positioned to deliver a comprehensive data processing monitoring and management platform for next generation enterprise data.”
Gary Nakamura, CEO, Driven, Inc.

“Driven Cloud was so simple to implement and gives us visibility into the performance of our applications whenever we need it. This helps us ensure our applications are consistently delivering data on-time to the business so they, in turn, can use it to better serve our customers.”
Eric Raab, EVP, Engineering, LiveIntent

Supporting Resources
• Driven:
• Cascading:
• Company:
• Contact us:
• Twitter:
• LinkedIn:

About Driven, Inc.
Driven, Inc., formerly Concurrent, is the leader in data application infrastructure, delivering products that help enterprises create, deploy, run and manage data applications at scale. The company’s flagship enterprise solution, Driven APM, was designed to accelerate the development and management of enterprise data applications. Driven is the team behind Cascading, the most widely deployed technology for data applications with more than 500,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in open source application infrastructure technology. Driven is headquartered in San Francisco and online at

Media Contact
Kim Loughead
VP, Marketing, Driven, Inc.
(415) 813-1010

The Driven team is excited to be sponsoring Hadoop Summit San Jose

We will be at Hadoop Summit San Jose from 6/28 – 6/30. If you are planning on going but have not registered, use discount code: 16SJspO20 for a 20% registration discount.

While you are there, stop by Booth E10 to

  • See a demo of Driven 2.2, the latest version of Driven APM, which monitors Hive, MapReduce, Spark and Cascading applications within a single solution
  • Meet with our Driven and Cascading technical experts to ask your questions
  • Enter our drawing for two chances to win an Amazon Echo.

We hope to see you there!

We are happy to announce Cascading 3.1 is now available for download.

Version 3.1 improves the performance of Cascading over 3.0, resolves a number of issues during planning of complex workloads when running on MapReduce and Apache Tez, and further delivers on the promise of new platform portability with the addition of Apache Flink as an execution platform.

For details about what is included in this latest release, see our news post on

More organizations are successfully moving their Hadoop big data applications to production and leveraging the power of their big data. A recent study by DNV GL – Business Assurance and research institute GfK Eurisko concluded that some organizations are starting to see results in the areas of greater efficiency, improved decision making, and more effective customer engagement. The level of growth big data can provide means these applications have graduated from experiments to first-class citizens: enterprises need to ensure they are running at peak performance and that any failures are minimized. Because of how complex the Hadoop environment can become, DevOps teams and development managers have a set of nightmarish challenges ahead of them, regardless of what platform they build upon or what tools they use to monitor performance (like Apache Ambari).

Expectations for management and DevOps teams are extremely high, and they must proactively troubleshoot technical issues to prevent any downtime. Big data applications pull data from multiple data sources across many systems using various technologies, which makes troubleshooting a monumental challenge.

To achieve peak operational performance, teams need the right tools to gain the right visibility and the data necessary to press on. With a whole slew of big data monitoring solutions to choose from, how do you go about picking which ones will work for your organization?

Organizations will often turn to Apache Ambari, much as others turn to Cloudera Manager, for cluster performance monitoring and management. However, as organizations scale up their Hadoop use, it becomes increasingly untenable to have visibility only into cluster performance and none into how the applications running on the cluster(s) are performing. For a deeper analysis at the application level, DRIVEN is the way to strengthen troubleshooting beyond the cluster.

Is Monitoring the Cluster Enough?

Apache Ambari will tell you when something has failed, but it won’t help you understand why it failed. When exploring how to optimize your big data applications, looking at the cluster is only one piece of the puzzle. Figuring out why an application failed or is running slow is the other piece, and it is simply missing from cluster monitoring solutions like Apache Ambari. The ability to quickly identify problem applications and why they failed, without diving into log files, can help DevOps fix issues more efficiently.

Eventually, Driven can help DevOps teams empower their customers with self-service so they can see the status of the applications and do some triage without having to reach out to the operations teams. By answering questions like “Why is my application running slow today?” or “Did my application complete?”, Driven can enable dev teams to answer the majority of the common questions themselves, reducing the inquiries coming into the operations team.

Distribution Compatibility of DRIVEN & Apache Ambari

In most enterprises, multiple distributions are often used as different departments start independent projects. Apache Ambari would, for example, be used to monitor a Hortonworks distribution. It is not until these organizations move to production that the complexity of using multiple monitoring solutions across different distributions becomes a hindrance to scaling their big data implementation.

With this issue in mind, DRIVEN was designed to be “technology agnostic”: it can be installed on any distribution, so you can monitor the applications running on all your clusters, regardless of distribution, within a single solution. Being able to see what is running on all your clusters is key to managing multi-tenant, multi-technology deployments. You will get a macro-level view of who and what are consuming the most resources, which applications are the largest, how long they take to complete, and when the peak utilization times are on your clusters, so you know when you need to limit submission of ad hoc jobs.

Is There Anything Apache Ambari Can Do That Driven Can’t Do?

To be clear, DRIVEN is not meant to replace Apache Ambari. DRIVEN is not a cluster monitoring or deployment solution but it is designed to work with existing solutions in place, like Apache Ambari.

DRIVEN is meant for application monitoring and Apache Ambari for monitoring the cluster. DRIVEN is the best way to know how your big data applications running on your cluster are performing and Apache Ambari is an excellent solution to know how healthy your cluster is and how it is performing.

Hadoop Performance Management: What’s More Important… The Cluster Or The App?

Apache Ambari is an excellent choice if all you need is to manage the cluster. But for a more comprehensive big data application monitoring and management strategy, monitoring the cluster is just not enough. In tandem with DRIVEN’s application performance management tools, a granular approach is going to give better answers and help you optimize your Hadoop big data applications.

For more information about how DRIVEN compares to Apache Ambari, please download our free Hadoop Performance Monitoring Tools Comparison Guide.



Enterprises that are using big data applications to successfully accelerate their growth will quickly find themselves leaning on their solution more than they ever imagined. For managers and practitioners in the DevOps department, this business-wide reliance on big data can be both a blessing and a curse.

In a business environment where 0% downtime is now the expectation, efficient troubleshooting and correction of technical issues with software is critical. Because data is coming from multiple sources, there is an amalgam of technologies that come together to execute a big data application workflow, including multiple monitoring tools (like Cloudera Manager, for instance). As a result, troubleshooting performance issues can be especially difficult.

The natural response is to implement whatever tools get the data you need in order to move forward. Several software companies have developed the tools they think answer enterprise big data needs, but the Hadoop landscape is a very complex world. How do you know which software solution is right for your installation?

For the longest time, managers and DevOps teams have relied on monitoring at the cluster level rather than taking a deep dive into how the Hadoop applications are performing. Cloudera Manager has one of the best solutions for monitoring the cluster, but this only answers a specific set of questions. To scale your production implementation of Hadoop or Spark, you need a more in-depth look into application performance that goes beyond cluster performance monitoring. Here’s how DRIVEN provides more of the application performance visibility you need than relying on Cloudera Manager alone.

Comparing Cloudera Manager To DRIVEN Is ‘Apples to Oranges’ For Application Monitoring

It’s only natural to compare Cloudera Manager and DRIVEN for monitoring big data application performance. But there is no one-size-fits-all solution for the needs of every organization, and these two solutions just don’t do the same thing.

As the old idiom goes, comparing apples to oranges is an oversimplified – but accurate – way to describe the differences between Cloudera and DRIVEN. Although when you take into consideration how each tool performs, the correct comparison might actually be a magnifying glass to an electron microscope. While one of the tools (Cloudera) is perfect for looking at performance issues on the surface, the other (DRIVEN) gives you an extremely granular look, allowing your DevOps team to better pinpoint and correct performance issues at the source of the problem.

Cloudera Manager is built specifically for cluster performance monitoring while DRIVEN’s specific purpose is to monitor the performance of all your big data applications. Monitoring the cluster is important, and you will be able to learn when something has failed using Cloudera – but you will not be able to determine why it failed. With an application monitoring tool like DRIVEN, your team will be provided answers for where it failed, why it failed, who is responsible, its business priority, and the downstream impact of that failure.

The easiest way to think about these two is understanding what level of visibility each one offers:

  • Cloudera = Macro level visibility and management
  • DRIVEN = Micro level visibility for more proactive analysis

Is There Anything Cloudera Can Do That Driven Can’t Do?

Because cluster and application monitoring are so different from one another, DRIVEN is the best way to know how your big data applications are performing. It does not monitor node health, CPU utilization, I/O utilization, or other cluster monitoring factors.

Another thing that DRIVEN was not designed to do is act as a cluster deployment solution. Tools which already come with your particular distribution are designed to deploy their technology. For cluster performance management, Cloudera Manager or any other similar solution could fill this need. Because of these differences between application and cluster performance monitoring, Cloudera Manager and DRIVEN complement each other quite well and, in combination, provide next-level visibility.

Hadoop Performance Management: The Cluster Or The App? Which Is More Important To You?

Cloudera Manager continues to be one of the market leaders in the Apache Hadoop big data management space. However, it was not designed to be an appropriate tool for monitoring application performance.

When application performance fails completely (or even suffers), lack of visibility into the problem can be terrible for DevOps professionals. In order to pinpoint a problem in your big data application and optimize it efficiently, DRIVEN is the best software on the market to understand and improve the performance of your application.

To learn more about how DRIVEN compares to other Hadoop performance monitoring tools, download the full guide.



The Big Data technology landscape seems to grow daily with new solutions to solve almost any problem.

Most organizations use, on average, five different technologies to run a single data processing application including Hive, Spark, Oozie, H2O, etc.

The challenge for Hadoop operations teams is how to manage and monitor these data applications to ensure service levels are met.

In this 30-min webinar, we will go through 4 things you need in order to deliver Hadoop operations excellence. You will see how you can:

  • Monitor the performance of multi-technology data processing applications
  • Quickly troubleshoot application performance without “log-diving” or looking through lines of code
  • Understand how your business teams are using Hadoop to see who and what are consuming the most resources
  • Enable collaboration between dev, ops and other Hadoop team members to resolve issues faster

Don’t miss this informative webinar if you are responsible for developing, managing or monitoring your organization’s Hadoop or Spark implementations.


SAN FRANCISCO – March 8, 2016 – Driven, Inc., the leader in Hadoop application performance management, today announced the latest release of Driven, the industry’s leading application performance management solution for monitoring and managing enterprise-scale Big Data applications.

Key features of Driven 2.1 include:

Deeper Support for Hive: You can now monitor all the queries running on your Hadoop infrastructure, including HiveServer. You can drill down into details about a specific query’s performance. Driven surfaces statistics about the health of the Hive queries such as uptime, the number of executions, the number of failed executions, etc.

New Application Details Information Panels: Summary performance metrics and important operational data about each job execution are easily accessible, so all relevant information about the application is now located in a single place.

Enhanced Visibility into State Transitions: You can see how your data processes are moving from pending, to start, to running, and finally to finished, accompanied by key performance metrics and relevant metadata.

Driven is available at For pricing and more information, email

Offloading ETL processes to Hadoop is often one of the first Big Data efforts because of the obvious ROI benefits. However, you may have hundreds, maybe thousands, of legacy ETL processes to migrate, which makes achieving the benefits of Hadoop and ROI a distant goal.

What if you could automatically convert up to 70% of your existing ETL processes to run on Hadoop with no code changes?

In this 60-min webinar replay, you will hear from our partner, Bitwise, how leading Fortune 500 companies are accelerating their projects to offload ETL to Hadoop and achieving ROI in months, not years. You will see:

  • A detailed walk-through of migrating existing ETL processes to Hadoop without changing anything
  • How you can cut development time of new ETL processes on Hadoop by up to 50%
  • How you can leverage your existing developers’ Java skills to turn them into Hadoop developers
  • Best practices for monitoring the performance of your ETL processes to ensure you meet your service level agreements

You should watch this webinar replay if you are responsible for your organization’s data warehouse, middleware, big data infrastructure or information management systems.


The truth is that no application will maintain perfect performance in its lifetime. We all know application performance can be influenced by a number of environmental variables and by other applications’ behavior. Optimizing performance is something that should (but does not always) happen during the development cycle, and it needs to continue throughout the lifetime of the application.

The reality is that troubleshooting application performance is complex and extremely time-consuming in both Hadoop and Spark.

This is because the information you need to effectively troubleshoot Hadoop application performance is scattered across multiple, disconnected systems. Think about the amount of time you’ve spent sifting through resource manager and log files trying to find out whether you have a problem with the application code or a resource contention issue. Finding these needles in the haystack can be done, but there is no getting around how much time it can eat up.

Unfortunately, the current method is complex for a reason: there is no consistency in methodology across frameworks and compute fabrics, e.g., MapReduce, Hive, Spark, and Pig. You may have some troubleshooting and monitoring capabilities in one framework, but they do not translate to another. Since most data workflows/pipelines use multiple technologies, this makes troubleshooting Hadoop application performance for the entire workflow almost impossible. To simplify Hadoop troubleshooting, you need to change where and what you monitor. That means, in addition to monitoring the cluster, you need to be able to monitor the applications running on the cluster. By moving up a level from jobs/tasks to applications, you gain a number of key capabilities necessary to manage a production environment at scale:

  • The ability to visualize the status of your entire environment

    You must be able to see the status of all your production applications in a single view so you can quickly understand what is running, what has failed, who is consuming what resources and how your environment is performing overall.

    [Screenshot: Driven homepage]

  • The ability to visualize how an application actually performs at run-time

    Building Big Data applications is complex because you have to deal with multiple technologies, data sources, joins, filters, reads/writes, oh my! You need to be able to visualize the application end-to-end to ensure it is behaving as you expected and not doing something that may impact performance. During development, this visualization is critical to debugging your application and troubleshooting Hadoop application performance. The last thing anyone wants is to deploy a new application that crashes the cluster when it scales up.

    [Screenshot: Hive query DAG]

  • The ability to compare present and past performance

    Visualizing the performance of a single run is useful for debugging an application, but for performance management you need to be able to see performance trends to understand whether you are at risk of hitting your service level thresholds. Being able to see historic performance also enables you to understand when a performance degradation began and what happened at that time. For example, was a new version of the application published, did another team deploy an application that is consuming too many resources, or did the size of the dataset change?

    [Screenshot: application details view]

  • The ability to visualize how each step/job/task is performing and the resources consumed

    To get to the next level down for finding the root cause of a performance issue or failure, you need to be able to see the performance statistics at each step and quickly identify the step that has failed. Additionally, in the case of a failure, you need to be able to determine the dependent steps that will not be executed.

    [Screenshot: application performance view details]

  • The ability to associate vital operational metadata to an application execution

    To be able to operate your production environment and meet business service levels, you need to be able to connect business context to your application executions. For example, when something has happened, you need to understand who is responsible for the application (i.e., the team and the owner) and know the priority of the application, so you understand its business criticality and can make prioritization decisions quickly. You also need to know the upstream and downstream dependencies of the application so you know what other applications are impacted. This is especially critical if the failed application prepares data for several downstream analytics processes. Failures like this can quickly escalate into multiple service level violations and missing datasets. By connecting performance monitoring to business context, you can get ahead of these issues, potentially before they impact the business and your reputation.

    [Screenshot: application view summary]
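
As a rough illustration of the dependency reasoning described in the list above (finding the steps that will not execute after a failure, and the downstream processes that are impacted), the following Python sketch walks a small, made-up dependency graph. The step names, the `DOWNSTREAM` mapping, and the `blocked_steps` helper are all hypothetical; they are not part of Driven or Cascading, and a real monitoring tool may model dependencies quite differently.

```python
from collections import deque

# Hypothetical workflow: each step maps to the steps that consume its output.
# Step names are illustrative only and do not come from Driven.
DOWNSTREAM = {
    "load_logs": ["clean_logs"],
    "load_users": ["join_users"],
    "clean_logs": ["join_users", "count_events"],
    "join_users": ["write_report"],
    "count_events": [],
    "write_report": [],
}


def blocked_steps(downstream, failed_step):
    """Return every step that transitively depends on failed_step."""
    blocked = set()
    queue = deque([failed_step])
    while queue:
        step = queue.popleft()
        # Every direct consumer of a failed or blocked step is blocked too.
        for child in downstream.get(step, []):
            if child not in blocked:
                blocked.add(child)
                queue.append(child)
    return blocked


print(sorted(blocked_steps(DOWNSTREAM, "clean_logs")))
# prints ['count_events', 'join_users', 'write_report']
```

A breadth-first walk is used here simply because it visits each step once; the same idea applies one level up, with applications in place of steps, to estimate which downstream analytics processes a failed application puts at risk.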


    An “Easy Button” for Hadoop Troubleshooting and Application Performance

    The screenshots above are all from Driven APM, a performance management solution specifically designed to monitor and manage Big Data applications. Unlike platform-specific tools, Driven manages and monitors all of your Big Data applications, including MapReduce, Hive, Cascading, Scalding, Spark, and Pig, in one solution. Let us help you take the guesswork (and hard work) out of troubleshooting, managing and monitoring your Big Data applications starting today. Sign up for a free trial of Driven APM.

    About the Author: Kim is VP of Marketing at Driven, Inc., providers of Big Data technology solutions, Cascading and Driven. Cascading is an open source data processing development platform with over 500,000 downloads per month. Driven is an application performance management solution to monitor, troubleshoot and manage all your Big Data applications in one place.

Get Better Visibility into Your Hive and MapReduce Application Performance


Apache Hive queries and MapReduce jobs often experience performance issues and bottlenecks because of the multi-tenant nature of Hadoop and a lack of visibility into performance.

This deep dive demo of Driven, performance management for Big Data applications, shows you how you can get the visibility you need to meet business service levels.

In 30 minutes, you will learn how you can:

  • Understand who is consuming what resources
  • Get visibility into the performance of your queries and jobs
  • Segment your environment so teams can better manage their own applications and resource utilization
  • Understand the business context of the queries/jobs to align service levels to business priorities
  • Identify queries/jobs not using organization best practices or complying with operational policies

Driven is application performance management for MapReduce, Hive, Pig, Cascading, Scalding, Cascalog and Spark applications.