• Home

  • Custom Ecommerce
  • Application Development
  • Database Consulting
  • Cloud Hosting
  • Systems Integration
  • Legacy Business Systems
  • Security & Compliance
  • GIS

  • Expertise

  • About Us
  • Our Team
  • Clients
  • Blog
  • Careers

  • CasePointer

  • VisionPort

  • Contact
  • Our Blog

    Ongoing observations by End Point Dev people

    Little Spree Big Performance Problems

    Marina Lohova

    By Marina Lohova
    August 5, 2013

    Recently I worked on an online food store serving an area with very little infrastructure. As a result, the orders tended to be really big with lots of products.

    The website worked in the following environment:

    • Ruby 1.9.2
    • Spree 0.60
    • Heroku, Bamboo stack
    • PostgreSQL 9.2.4

    H12 timeout errors

    The performance problems started when we migrated Bamboo to Cedar on Heroku and replaced Thin webserver with Unicorn. We started getting a lot of Heroku Request timeout errors - H12:

    The problems happened mostly when logging in to admin dashboard or during the checkout for the certain orders. H12 errors occur when a HTTP request takes longer than 30 seconds to complete. For example, if a Rails app takes 35 seconds to render the page, the HTTP router returns a 503 after 30 seconds and abandons the incomplete Rails request for good. The Rails request will keep working and logging the normal errorless execution. After completion, the request will indefinitely hang in the application dyno.

    We started debugging H12: we set Unicorn timeout to 20 seconds to prevent the runaway requests and installed the rack-timeout gem with the timeout of 10 seconds to raise an error on a slow request. It all came down to a trivial database timeout!

    The application source has not changed during the transition from Bamboo to Cedar, but apparently Cedar/Unicorn is much more sensitive to the troubled code. Below is the list of performance bottlenecks and solutions in Spree. Some of them exist in version 0.60 only, but a lot of them are still present in Spree 1.x, which means that your application may have them too.

    Issue #1: Real-time reports

    Let’s take a closer look at the earlier database timeout code. It came from Admin Dashboard and admin/overview_controller.rb.

    The “Best Selling variants" report was being calculated real-time right in the web process:

    def best_selling_variants
      li = LineItem.includes(:order).
          where("orders.state = 'complete'").
          sum(:quantity, :group => :variant_id, :limit => 5)
      ...
    end
    

    Record counts in the database are large enough to crash the application on Heroku with the database timeout:

    irb(main):001:0> LineItem.count
    => 162279
    irb(main):002:0> Order.count
    => 13027
    irb(main):003:0> Variant.count
    => 14418
    

    Other reports on the Dashboard experience the same problem. They would cause the timeout in turns when logging into Admin:

    • top_grossing_variants
    • best_selling_taxons
    • last_five_orders
    • biggest_spenders

    Solution

    Fortunately, in Spree 1.x the internal reporting system has been replaced with Jirafe:

    If switching to Spree 1.x is not an option, another way is to move the calculation into a background job, using, for example, delayed_job gem and Heroku Scheduler Addon:

    task :statistics => :environment do
      Delayed::Job.enqueue StatisticsJob.new
    end
    

    Issue #2: Large numbers

    It is an established fact that humans eat a lot! Think about an order of a thousand Heineken 6-pack cans…

    Or even something like this:

    or this:

    Spree, both 0.60 and 1.x, proved to have a huge problem if an order has a lot of line items and/or a large quantity of single line items. The potentially dangerous code can be found all over the place. Consider the following example:

    class InventoryUnit < ActiveRecord::Base
      def self.destroy_units(order, variant, quantity)
        variant_units = order.inventory_units.group_by(&:variant_id)[variant.id].sort_by(&:state)
        quantity.ceil.times do
          inventory_unit = variant_units.shift
          inventory_unit.destroy
        end
      end
    end
    

    Now imagine what will happen with the order from the first screenshot. We have 15100 inventory units for that one. They will be meticulously destroyed from the inventory one by one in a loop after the checkout. This method was born to crash the application!

    Solution

    A simple mindful refactoring was enough to solve the problem for me. There is no need to call “destroy” in the loop for every single inventory unit because we can use the efficient “destroy_all” method. I’m sure this can be optimized further, but it was enough to get rid of the timeout:

    def self.destroy_units(order, variant, quantity)
      variant_units = order.inventory_units.
        group_by(&:variant_id)[variant.id].
        sort_by(&:state)
      variant_units = variant_units.shift(quantity.ceil)
    InventoryUnit.
        where(:order_id => order.id,:variant_id => variant.id).
        order('state asc').limit(quantity.ceil).
        destroy_all
    end
    

    Issue #3: Real-time emails

    All emails in Spree are sent in real-time.

    class Order < ActiveRecord::Base
      def finalize!
        ...
        OrderMailer.confirm_email(self).deliver
        ...
      end
    end
    

    Why is it bad? Let’s look at the following example from my application:

    def confirm_email(order)
      attachments["invoice.pdf"] = {
        'Content-type' => 'application/pdf',
        :content => OrderInvoice.new.to_pdf(order)}
    
      mail(:subject  => 'Order #' + order.number,
           :from   =>    Spree::Config[:order_from],
           :to => order.email)
    end
    

    In my application “confirm_email” was overridden and generated the PDF invoice. The invoice listed all the products in the order and had about 200 lines in it. Again, it lead to the H12 timeout error.

    Solution

    All emails should be sent in the background rather than in the web request. First, because network operations can take a long time, and second, because generating an email can also be slow. For example, sending in the background can be accomplished using delayed_job gem:

    OrderMailer.delay.confirm_email(self)
    

    Issue #4: Lazy-loading

    Ecommerce objects are usually complicated with a lot of associations. This is totally fine as long as you eager-load the associations that will be used most with the loaded object later on. In most cases, Spree does not preload associations for its orders. For example, in spree/base_controller.rb:

    @order = Order.find_by_number! params[:order_id]
    

    As the result, here is what I see in server console while loading the order display page on the frontend:

    …And five more screens like this! The associations of the order — line items, variants and products — generate an additional chain of queries to the database during lazy-loading.

    Solution: Eager-loading

    If I modify the line from the controller like this…

    @order = Order.where(:number => params[:order_id])
             .includes(:line_items => {:product => :taxons, :variant => [:product, :option_values]})
             .includes(:adjustments).first
    

    …SQL queries will shrink down to this:

    Eager-loading did the trick. Of course, not everything needs to be loaded eagerly, and the solution varies from case to case. It worked like charm in my case.

    Issue #5: Dangerous code all over the place

    There a lot of places in the Spree source code that are not optimized for performance. We don’t need to look far for an example, because there is another killer method right near the “destroy_units” one we inspected earlier!

    class InventoryUnit < ActiveRecord::Base
      def self.create_units(order, variant, sold, back_order)
        shipment = order.shipments.detect {|shipment| !shipment.shipped? }
        sold.ceil.times {
          order.inventory_units.create(:variant => variant, :state => "sold", :shipment => shipment)
        }
        back_order.ceil.times {
          order.inventory_units.create(:variant => variant, :state => "backordered", :shipment => shipment)
        }
      end
    end
    

    And then:

    order.line_items.each do |line_item|
      back_order = determine_backorder(order, variant, quantity)
      sold = quantity - back_order
      create_units(order, line_item.variant,  sold, back_order)
    end
    

    I received a timeout either in the “sold” or the “backorder” loop, but there wasn’t a time when I didn’t receive a timeout!

    Solution

    def self.create_units(order, variant, sold, back_order)
      shipment = order.shipments.detect {|shipment| !shipment.shipped? }
      values = sold.ceil.times.to_a.map { "(#{order.id},#{variant.id},'sold',#{shipment.id})" } +
        back_order.ceil.times.to_a.map { "(#{order.id},#{variant.id},'backordered',#{shipment.id})" }
      values.in_groups_of(500, false) do |group|
        InventoryUnit.connection.execute("INSERT INTO inventory_units(order_id, variant_id, state, shipment_id) VALUES #{group.join(',')}")
      end
    end
    

    Another example: every line item has the after_create and after_save callbacks. The callback invokes the Order.update! method. Order.update! method calls update_totals method…twice throughout the method.

    def update_totals
      self.payment_total = payments.completed.map(&:amount).sum
      self.item_total = line_items.map(&:amount).sum
      self.adjustment_total = adjustments.map(&:amount).sum
      self.total = item_total + adjustment_total
    end
    

    Now imagine the order from the second screenshot with a lot of line items. During the checkout “update_totals” will be called each time the line item is saved. Typically, this line would produce a timeout, because the “line_items” association was, of course, not preloaded:

    line_items.map(&:amount).sum
    

    I couldn’t list every circumstance like that, because it would require a lot of context to explain the catch, but I can still say many times …

    No long-running tasks in the web request!

    Be it Spree, Heroku, or any other context, environment or platform, please, never do the following in the web process:

    • (!!!) Heavy database usage (slow or numerous queries, N+1 queries)
    • Sending an email
    • Accessing a remote API (posting to Twitter, querying Flickr, etc.)
    • Rendering an image or PDF
    • Heavy computation (computing a fibonacci sequence, etc.)

    Say “No” to all these things to ensure a much happier life for your application!

    heroku performance ruby rails spree pdf


    Comments