Restarting Prism – Version 4.1.x and Later

One of the most popular posts on my blog is “Nutanix: Restarting Prism”, which I wrote way back in 2015 about restarting the Nutanix user interface. Three years is a long time in technology, especially in the hyper-converged world, so I thought it was time for a quick update.

The following steps will enable you to restart the Prism service on nodes running AOS 4.1.x or later.

Determine the current Prism Cluster Leader

Connect via ssh to any AOS host in your cluster and run:

ssh admin@{any_node_ipaddress}
Nutanix Controller VM
admin@{any_node_ipaddress}'s password: (default: nutanix/4u)
curl http://0:2019/prism/leader && echo

Output


{"leader":"{node_leader_ipaddress}:9080", "is_local":true}

{node_leader_ipaddress} is the active cluster leader for the Prism service. If you’re already connected to the leader AOS host, “is_local” will be true and you can run the next step from this session. If not, ssh to the leader address shown in the output before restarting the service.
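
If you’d rather not open an interactive session just to find the leader, the same query can be run remotely in one line (a minimal sketch, assuming the admin account and the port shown above):

ssh admin@{any_node_ipaddress} 'curl -s http://0:2019/prism/leader && echo'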

Stop the Prism Service on the Leader

Connect via ssh to the prism leader AOS host and run:


ssh admin@{node_leader_ipaddress}
Nutanix Controller VM
admin@{node_leader_ipaddress}'s password: (default: nutanix/4u)

genesis stop prism

Output


2018-09-09 18:46:55.797151: Stopping prism (pids [6579, 6607, 6608, 6645, 23894, 23933])
2018-09-09 18:46:56.380999: Services running on this node:
  insights_data_transfer: [6140, 6237, 6238, 6264, 6266, 6267, 6268]
  cluster_health: [2016, 2017, 2108, 2109, 2111, 2112, 2114, 2115, 2119, 2120, 2122, 2123, 2134, 2152, 7234, 7259, 7260, 7938, 8013, 8014, 10985, 10986]
  nutanix_guest_tools: [6980, 7025, 7026, 7038]
  pithos: [5816, 5877, 5878, 5936]
  cerebro: [6167, 6307, 6308, 6487]
  delphi: [7924, 7979, 7980, 7981]
  aplos_engine: [7745, 7805, 7806, 7807]
  uhura: [6728, 6859, 6860, 6861]
  acropolis: [6686, 6806, 6807, 6809]
  cluster_config: [7721, 7778, 7779, 7780]
  alert_manager: [6614, 6676, 6677, 6762]
  stargate: [6131, 6207, 6208, 6370, 6371]
  foundation: []
  curator: [6252, 6353, 6354, 6443]
  genesis: [2231, 2252, 2275, 2276, 3733, 3735]
  lazan: [7894, 7970, 7971, 7973]
  insights_server: [6134, 6197, 6198, 6321]
  minerva_cvm: [7717, 7749, 7750, 7752, 7888]
  snmp_manager: [6775, 6890, 6891, 6892]
  mantle: [5820, 5904, 5905, 5952]
  catalog: [6678, 6798, 6799, 6800]
  hera: [5825, 5928, 5929]
  chronos: [6211, 6325, 6326, 6372]
  sys_stat_collector: [6813, 6928, 6929, 6931]
  secure_file_sync: [5332, 5388, 5389, 5390]
  ergon: [6146, 6258, 6259, 6261]
  arithmos: [6619, 6718, 6719, 6893]
  dynamic_ring_changer: [5812, 5871, 5872, 5946]
  prism: []
  zookeeper: [2483, 2513, 2514, 2520, 2575, 2592]
  aplos: [7890, 7920, 7921, 7922, 8058, 8060]
  scavenger: [3611, 3640, 3641, 3642]
  ssl_terminator: [5328, 5360, 5361, 5362]
  janus: [6862, 6967, 6968]
  cim_service: [6610, 6666, 6667, 6683]
  tunnel_manager: [6832, 6959, 6960]
  cassandra: [2038, 5461, 5625, 5655, 5656]

All other services stay running, so there should be no disruption to cluster storage services.
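
If you want to confirm Prism is actually down before restarting it, genesis status lists every service on the node along with its PIDs; an empty list next to prism means it has stopped (a quick check, not strictly required):

genesis status | grep prism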

Start the Prism Service on the Leader

Stay connected to the prism leader AOS host and run:


cluster start
[sudo] password for admin:

Output


2018-09-09 18:47:42 INFO zookeeper_session.py:112 cluster is attempting to connect to Zookeeper
2018-09-09 18:47:42 INFO cluster:2609 Executing action start on SVMs {node_1},{node_2},{node_3}
Waiting on {node_1} (Up) to start:
Waiting on {node_2} (Up) to start:
Waiting on {node_3} (Up, ZeusLeader) to start:  Prism

[the “Waiting on ...” lines repeat while the Prism service starts]

Waiting on {node_1} (Up) to start:
Waiting on {node_2} (Up) to start:
Waiting on {node_3} (Up, ZeusLeader) to start:

Prism should now be operational. Refresh the browser.
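
To double-check that Prism is back, you can re-run the leader query from the first step; the address returned should point at a healthy CVM, and in my experience leadership will often have moved to a different node:

curl http://0:2019/prism/leader && echo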

Hope this helps! Until next time.

Nutanix – Redeploying nodes with Foundation

Problem

I recently deployed a Nutanix cluster which was a combination of old and new 3000 series nodes. Setup of the new machines went according to plan; however, I found the old servers were reporting incorrect block IDs and/or positions.

For those who aren’t familiar with the 3000 series systems, the product is Nutanix-badged SuperMicro hardware: 4 servers in 2U of rack space. Each chassis has 4 slots (A-D).

Photo: 3000 series nodes

Solution

To fix the issue you need to update the factory settings on the Controller VM (CVM). Here are the steps:

  1. Document your node locations, double and triple check.
  2. Step two varies based on your hypervisor. In my case I was configuring an ESXi cluster, so I needed to run Foundation before I could expand the cluster.
  3. Run a scan in Foundation. Cross-reference this with your doco from step 1.
  4. If it doesn’t match, edit the “factory_config.json” file found in the “/etc/nutanix/” directory on each node. This assumes you’ve already configured your IPMI or have direct access to the physical equipment.
  5. Update “rackable_unit_serial”, “node_position” or both to match your layout (see the example after this list). Caution! JSON files are particular… Don’t screw up the formatting.
  6. Restart the Genesis service by typing “genesis restart” at the $ prompt.
  7. Re-run Foundation. Nodes should now correspond to the desired layout.
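
For reference, the two keys in question sit alongside the other node identifiers in /etc/nutanix/factory_config.json and look something like the fragment below. The values here are placeholders for illustration only; substitute your real block serial and slot letter, and leave every other key in the file untouched:

"rackable_unit_serial": "ABC123456789",
"node_position": "A",

After saving the file, the genesis restart in step 6 picks up the change.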

Good luck.

Nutanix: Restarting Prism

UPDATE: The new version of “Nutanix: Restarting Prism” can be found here.  

_________________________

While running NOS 4.0.2.2 on a cluster of 3061s, I hit a bug that causes the Nutanix web interface, Prism, to become unresponsive. After raising a ticket with support, it turns out there’s a fix in NOS 4.1.1. If you’re like me and can’t arrange the software update immediately, it’s possible to restart the service by performing the following steps:

Determine the current Prism Cluster Leader by running
ssh -t nutanix@prism_member_ip_addr 'curl http://localhost/prism/leader'

Returns: prism_leader_ip_addr:9443

Restart the Prism Service on the Leader
ssh -t nutanix@prism_leader_ip_addr 'curl http://localhost/h/exit'

Returns: Exiting in 1s

To verify the change, re-run step 1 and check that the prism_leader_ip_addr has changed to another member of the cluster.

To get this working you’ll need the Nutanix CVM username and password, plus a machine with curl installed. If you’re running other operating systems and you’re a registered Nutanix customer, you can find more detail here.

Nutanix Hidden Contrast Feature for Google Chrome

A few months back I was speaking with Sudheesh Nair (@sudheenair on Twitter) from Nutanix about the slightly washed-out look of the Nutanix Prism interface. Both a colleague and I mentioned the interface was difficult to present on a large screen, something we’d been doing a bunch since installing Nutanix in our environment.

Sudheesh gave us the heads up on a hidden feature that enables the user to select “normal” or “high” contrast when using Google Chrome. It was in early stages of development and would drop in a later release.

A few days back we performed a rolling upgrade from 4.0.2.2 to 4.1.1. The upgrade went through without issue, so when I jumped into Chrome the first port of call was the contrast feature. I couldn’t remember the required keystrokes, so Cameron Stockwell (@ccstockwell) pointed me to KB 2021, which describes the process. Turns out the key sequence is easy: Shift-click on the User Menu.

Standard click on the User Menu

[Image: NutanixUserMenu1]

Shift-click on the User Menu

[Image: NutanixUserMenu2]

Click on Adjust Contrast and you’ll see the contrast box pop up at the bottom of the screen.

[Image: NutanixContrastWindow]

Of course there are other wonderful and no doubt more important features in this release (the hypervisor one-click upgrade comes to mind), but for some reason I was looking forward to this one, given the number of complaints I’d had from people when displaying Prism on a large HDTV.

Nutanix Prism Central Basic Setup and Config

Recently I’ve been working with the hyper-converged compute and storage platform Nutanix. For those who haven’t heard of it, check out the Nutanix Bible written by Steven Poitras for all the ins and outs of the product, including an actual technical explanation of how it works! In short, it’s a distributed compute and storage solution based on concepts taken from large web companies like Google and Facebook. The idea is simple: build a modular scale-out solution that grows with you. Don’t spend a bomb on a massive and costly storage array and hope it performs for 5 years; grow over time on commodity hardware with predictable performance.

We decided to start with a 5 node cluster of 3061s in Sydney and a 4 node cluster of 3061s in Brisbane. In future posts I plan to blog about why I decided on Nutanix, the design, business case, setup and lessons learned.

Jumping ahead, the clusters have now been operating for 4-5 months and the experience has been stress-free. The product works; it’s doing what was advertised, which I find refreshing.

Now that things are up and running I decided to deploy Prism Central, the single pane of glass management console (yes, I hear you groan!). The deployment was straightforward: jump into vCenter and deploy the 14GB OVA, start the VM, then jump into the console and edit the network settings, giving the box a static IP:

$ sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE="eth0"
NM_CONTROLLED="no"
ONBOOT="yes"
BOOTPROTO="none"
IPADDR="10.0.20.20"
NETMASK="255.255.255.0"
GATEWAY="10.0.20.254"
DNS1="10.0.254.198"
DNS2="10.0.254.199"

$ sudo service network restart
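
Before leaving the console, it doesn’t hurt to confirm the interface actually picked up the new address (a quick optional check; ifconfig eth0 would do the same job):

$ ip addr show eth0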

I jumped onto my Mac and pinged the box: no problems, working great… I then tried to hit the web UI and got an error, something like “Oops Server Error”. After some searching around on the Nutanix portal, I came across the fix. Turns out you need to bind the new IP address to the Nutanix cluster running on the machine with the following command:

$ cluster --cluster_function_list="multicluster" -s 10.0.20.20 create

**Note: remember to change the IP address in the command to your own**

Refresh the webpage, log in with admin/admin, and you’re in…

[Image: PrismCentral]

In future posts I’ll detail how to add clusters and if it lives up to the single pane of glass reference.