Okta’s inbox platform supports various products in Okta’s Workforce Identity Cloud, where human input is required. It is most notably instrumental in the Access Request flow of Okta Identity Governance.
A lot of what we do is trigger-based. These triggers can be a human action — “grant access when the manager approves” — or a time event — “revoke access **after 24 hours.**” This post will explain how we leverage asynchronous tasks using message brokers like Google Pub/Sub and Google Cloud Tasks to take actions based on these triggers.
Internally, we call all asynchronous tasks “worker tasks.” We have an internal framework that exposes an interface to create and execute tasks while abstracting away the details of which message broker the task is sent to.
For us, a task is a function that needs to run, along with the metadata that dictates how the worker tasks framework runs it. Conceptually, a task is a collection of properties that determine its lifecycle and execution, as shown in the representative example below (field names are illustrative, not exact):
```json
{
  "name": "send_access_request_notification",
  "driver": "gcp_pubsub",
  "queue": "fast_queue",
  "max_retries": 3,
  "timeout_seconds": 60
}
```
Most of the metadata is self-explanatory. The driver defines the queue service to route the task from the publisher to the subscriber. In the above example, we are using GCP PubSub with the “fast_queue.”
NOTE: We use Google Cloud Tasks because of its built-in capability to delay the execution of tasks. In all other respects, our Google Cloud Tasks setup is similar to that of Google Pub/Sub. From here, the article focuses on Google Pub/Sub, and similar principles apply to Google Cloud Tasks as well.
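As an illustration of that built-in delay, here is roughly what scheduling a delayed task looks like with the Node.js client for Cloud Tasks. This is a sketch, not our internal framework: the project, queue, and handler URL below are made up.

```js
const { CloudTasksClient } = require('@google-cloud/tasks');

const client = new CloudTasksClient();

// Schedule an HTTP task to run `delaySeconds` from now.
async function scheduleRevocation(grantId, delaySeconds) {
  const parent = client.queuePath('my-project', 'us-central1', 'fast-queue');
  await client.createTask({
    parent,
    task: {
      httpRequest: {
        httpMethod: 'POST',
        url: 'https://tasks.example.com/revoke-access', // hypothetical handler
        body: Buffer.from(JSON.stringify({ grantId })).toString('base64'),
      },
      // Cloud Tasks' built-in delay: do not deliver before this time.
      scheduleTime: { seconds: Math.floor(Date.now() / 1000) + delaySeconds },
    },
  });
}
```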
Our Pub/Sub setup handles a diverse set of tasks. We’ve set up multiple GCP PubSub queues that allow us to segment the workloads based on traffic characteristics, priority, and the domain of the workload. We do this to ensure:
High-priority tasks get executed faster.
Backlog in processing one domain’s tasks does not affect those of another.
This allows us to scale selectively depending on task priority. In high-traffic situations, we can easily scale the subscribers for essential tasks and keep platform health stable, while the existing number of notification subscribers works through the backlog, resulting in slightly delayed notifications.
In the previous example, we saw how the queue setup allowed us to scale in certain situations and accept delays in others. Practically, it’s likely that without scaling, the subscribers would get overwhelmed with the increased number of tasks and crash. We solve this by using flow control in various steps. The queues are configured with a maximum number of pending messages they can deliver at a time. This setting allows our subscribers to gracefully handle traffic bursts by managing the flow of incoming tasks.
Unfortunately, it's hard to nail a magic number of tasks that subscribers can handle. Factors besides the quantity of tasks, such as the complexity of a particular task, also affect the load. Hence, as a failsafe measure, the subscribers are forced to stop accepting more tasks if resources like CPU and memory surpass a certain threshold. In such an event, Pub/Sub retries delivery with exponential backoff. It's a naive but effective way to handle as many tasks as possible without crashing the subscribers or scaling up the number of active subscribers.
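For illustration, with the Node.js Pub/Sub client the first layer of flow control is a subscription option; the subscription name, cap, and handler below are made up:

```js
const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub();

// Deliver at most 50 unacknowledged messages to this subscriber at a time;
// Pub/Sub holds on to the rest and redelivers them later.
const subscription = pubsub.subscription('fast-queue-sub', {
  flowControl: { maxMessages: 50 },
});

subscription.on('message', async (message) => {
  // executeTask is a stand-in for the framework's task executor.
  await executeTask(JSON.parse(message.data.toString()));
  message.ack();
});
```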
Like any other system, this architecture comes with trade-offs. Customers choose Okta because it's a reliable and neutral vendor, so we have the following expectations of our asynchronous task infrastructure.
The framework is highly extensible. We can add a new queue service, like RabbitMQ or AWS SQS, by adding a driver, making all our tasks compatible with the new service.
Having a central framework designed with reliability in mind means every new kind of task added to the system gets alerts on failures, dashboard reports, and anomaly detection by default without the task developer needing to do anything.
With our tasks organized into queues based on priorities and domain, we can identify the patterns in delays and make informed decisions about which subscribers need to scale. This process frees our users from staring at loading screens while keeping our infrastructure spending in check.
Many tasks in our system have side effects and are not idempotent. Such tasks are challenging to recover in case of failure: retrying them risks double side effects. The framework does not currently have a way to recover or retry these tasks.
Our asynchronous task setup works for our current scale and infrastructure needs. In the future, we'll be looking to use Google Cloud Functions, which would take flow control out of the equation: Cloud Functions can scale as needed to process the number of tasks.
Another future improvement we’ve flagged is to increase confidence around “exactly-once” execution of tasks with side effects, like sending emails, which would require using idempotent vendor APIs and more granular checkpoints throughout task execution. Our current architecture provides a good base for these improvements to come.
This article originally appeared on Okta on November 20, 2023.
Was I using `oh-my-zsh`, or was I using `fish`? What did I do last time to get the top bar to show Bluetooth? I end up setting small things like these, slowly, every day, until it all becomes familiar and I forget about them. Not this time, I said to myself. As I start working at Okta, I decided to log the steps to set up my system, mostly for future me, but also for anyone else stumbling on the same StackOverflow articles over and over again.
Yeah, just update the OS now that you have patience. It is probably outdated and the next step may require it.
Do not wait for some CLI tool to fail on you later; install Xcode now. You might get lucky running `xcode-select --install`.
Use Safari to install Firefox and Brave and log into them. Also set up the usual extensions.
Download the latest build here
Refer to https://brew.sh/#install, or:

```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
Refer to https://ohmyz.sh/#install, or:

```sh
sh -c "$(curl -fsSL https://raw.github.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"
```
```sh
echo "unsetopt share_history\nsetopt no_share_history" >> ~/.zshrc
```
SSH and GPG keys need to be generated/imported and updated everywhere, especially GitHub.
```sh
ssh-keygen -t ecdsa -b 521
gpg --import <private-key-file>
```
Refer to https://github.com/nvm-sh/nvm#installing-and-updating, or:

```sh
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
```
```sh
brew install tmuxinator
```
Restore dotfiles and customize vim and other things.
At some point, I will figure out that something is wrong with the trackpad and check the box that says Tap to click in settings. This is a living document, and I will keep adding things to it as I find them. Now I should get back to that new system I just configured and get some work done!
You have integrated with the vendor, and the tests are passing. Everything looks good until customer service starts reporting that some users are seeing weird characters in their names: Jöhn Doé becomes J�hn Do�. We have all seen these characters at least once on the Internet. They appear when strings encoded in one format are decoded with a different, incompatible format. Most of the time, we deal with UTF-8, which is a common text encoding format. However, there are legacy systems that use other standards, close but not quite the same. The bigger and older the vendor, the higher the probability of them using an obsolete text encoding standard.
In my experience, most of the time it ends up being a Windows equivalent of popular standards, like windows-1252 for ASCII, or UCS-2 for UTF-16. I won't try to explain the differences between these encodings and why they exist, because that would only give a false impression that I understand them; I don't. Hence my encoding of choice is WTF-8. Enough ranting about encoding; coming back to the topic of vendors: large vendors using Windows or IBM servers are likely to use these less popular encodings, so it is always good to clear that up with them.
The vendor system integration has been deployed and running successfully for a while. Suddenly you get paged that your system is behaving unpredictably and causing issues in the vendor system. You cannot see anything wrong at your end and ask for logs to determine the issue. To your horror, the vendor does not keep logs, and you are on your own to fix the issue, or to prove there is no issue to begin with. To avoid such a scenario, I encourage logging more than seems required. Every point where control flows from your code to an external entity, or the other way around, should have excessive logging. Example:
```
2021-09-12T10:29:32.387Z [info] (api.js:5): Sending data to vendor X {"url":"https://vendor-x.example.com/system-y","data":{"tickets":[{"userId":"4567899","count":2,"date":"12/09/2021"}]}}
```
Example of logging everything that goes out or comes in.
You have just started your week on a Monday morning with a cup of coffee when you notice that everything in the system is happening twice, and sometimes thrice. Some services are constantly alerting because they cannot process duplicate events. Most of the time, these alerts are good. The problem is the services that process these duplicate events instead of alerting. This could lead your communication service to send multiple emails to customers, or the billing service to charge consumers more than once. Hours of debugging later, you hear that the vendor found the issue: a new version released last weekend shipped with a botched retry system.
The solution to this is building idempotent systems which can process duplicate events gracefully. While idempotency will keep your systems afloat in a situation like this, the excessive logging we added in the previous step helps in debugging and in communicating with your vendor.
```
2021-09-12T10:54:14.261Z [info] (api.js:6): Request received from vendor X {"route":"POST /tickets","checksum":"e4608c43cfeede8f0ccd0f770305ec01b94ba554d72b7ff8a6e659bfdf6727a9","data":{"tickets":[{"userId":"4567899","count":2,"date":"12/09/2021"}]}}
```
Example of ignoring duplicate requests based on checksum
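As a sketch of how that deduplication might look in application code (the store and handler names here are illustrative; a real system would keep checksums in something shared, like Redis):

```js
const crypto = require('crypto');

const seenChecksums = new Set(); // stand-in for a shared store like Redis

function handleVendorRequest(body) {
  const checksum = crypto
    .createHash('sha256')
    .update(JSON.stringify(body))
    .digest('hex');

  if (seenChecksums.has(checksum)) {
    console.log(`Ignoring duplicate request ${checksum}`);
    return; // acknowledge without repeating side effects
  }
  seenChecksums.add(checksum);
  processTickets(body.tickets); // hypothetical downstream handler
}
```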
Modern APIs and systems are web-scale, meaning they can handle multiple sessions, requests, and processes at the same time. This is the concurrency of the system. Generally, banks and other financial institutions limit the number of concurrent sessions by the same user to one. This could mean different things in the backend. Good APIs have well-defined limits on this concurrency and throw predictable errors when the limit is breached. Poorly designed APIs have unclear concurrency limits and can exhibit undefined behavior when they are exceeded.
One such situation, which left me doubting everything, was when a vendor could not handle multiple requests at the same time and would often mix up the responses of two requests: it sent the response to one request over the channel created by another HTTP request.
Response 1 is sent in the channel created by Request 2 and vice versa
What I found helpful in a situation like this is tagging the requests with a reference. If the vendor system can echo back the reference sent in the request along with the response, it could be used to identify the discrepancy.
Matching the ref sent in the request (`a`) with the one in the response (`b`) helped identify the issue.
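A minimal sketch of that tagging, assuming the vendor echoes the ref back in its response (and Node 18+ for the global fetch):

```js
const crypto = require('crypto');

async function callVendor(payload) {
  const ref = crypto.randomUUID(); // unique tag for this request

  const response = await fetch('https://vendor-x.example.com/system-y', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ ref, ...payload }),
  });
  const data = await response.json();

  // A mismatched ref means this response belongs to another in-flight request.
  if (data.ref !== ref) {
    throw new Error(`Response mismatch: sent ref ${ref}, received ${data.ref}`);
  }
  return data;
}
```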
You just got hold of the endpoint for the vendor system, and you hook it into your application. It should work as-is, and in most cases it would, but sometimes it would not. You would confirm with the team on the vendor's end whether it's a firewall issue, but they would say it is not. At this point, you start exploring everything: the TLS version, self-signed certificates, network timeouts, the keep-alive header, load balancer issues. You get the picture. You ask the vendor's developer team again, and this time they escalate your issue to the network team, which says that, of course, it is the firewall.
Some firewalls require whitelisting of IP addresses, which means you need to get hold of "permanent" IPs like AWS Elastic IP addresses. After your IP addresses are whitelisted, you should be good to go. Some firewalls are configured to strip certain headers, making you wonder why your requests end up as "unauthorized". Figuring out these issues can be hard if there are not enough logs and not enough communication between the teams.
Well, I may have painted a grim picture here, but it’s just a rant about all the things that can go wrong (and have gone wrong) when integrating with external systems. Things are rarely this bad if there is good documentation. I hope reading this will prompt you to ask more questions when you integrate with your vendor and save time.
Pandoc is an awesome tool to convert documents from one format to another, and it supports a lot of formats. Because it is highly configurable, it can be difficult for someone new to start using it. So I will try to explain how you can use it to create documents like ebooks from plain Markdown. The process should work similarly for other formats too.
Broadly there are three things required to get started.
📝 NOTE: I already have a base repository that you can use as a starter for your book which you can fork and get going. Do not forget to change the Author name though. ;)
Pandoc is the only tool you need to get started with your ebook. Pandoc’s installation guide has the most up-to-date information on how to get it on your system.
If you are an Ubuntu user, most probably the following command will do the job:
```sh
apt-get install pandoc
```
For Mac users:
```sh
brew install pandoc
```
A config file can be used to manage all the options passed to pandoc, without passing each option through the CLI. Here's an example config file:
```yaml
metadata-file: metadata.yml # file to use for metadata
```
Your book will have some metadata, like the title, author name, and copyright. All of that goes in the `metadata.yml` file. An example file will do a far better job of explaining than I can:
```yaml
title: My Awesome Book
author: John Doe
rights: © 2021 John Doe
cover-image: images/cover.png
```
The cover photo provided here will be used as the cover of the ebook. More options can be found in the Metadata docs of Pandoc.
You can write chapters in Markdown files. To preserve the order of the files, I suggest prefixing each file name with the chapter number. It ensures that the ordering of files is correct when you view them. The source files can also be passed individually to pandoc:
```sh
pandoc -d config.yml 01-first-chapter.md 02-second-chapter.md
```
In the starter repo, all the Markdown files added to the `src` directory will be picked up and used as chapters of the book, sorted in alphanumeric order. Images can be added to the `images` directory (or any other directory; I just like to have my images in one place).
Thanks to the Makefile in the starter repo, generating an ebook is as simple as `make epub`, and similarly `make pdf` can be used to generate a printable PDF version of the book.
The best part about using pandoc is that it supports a lot of formats and options; for example, custom CSS can be used to style the generated documents if needed. I hope this gives you enough confidence and sets you on your ebook journey!
A lot of people have similar stories, and for most, Github is synonymous with open source. One term that is also used interchangeably with Github is git. Even though these are three different entities, a lot of developers, including me, have never contributed to open source or used git without Github. Thus I was surprised when I learned that famous open source projects like Linux were not on Github even though they used git. Also, git was around before Github, so there must be more native ways in git to get things done. I wondered how it worked but didn't do anything about it until recently, when I decided to experience a git-only development workflow.
We use version control systems broadly for two use cases: to contribute to a repository where we have write access, and to contribute to a public repository where we don't. To collaborate with developers who can be given access, Github or other platforms are not necessary. Anybody can host a git server to host repositories. Developers can clone, push to, and pull from these remote repositories once access is granted to their SSH identity.
```sh
git clone git@<host>:<repo>.git
```
This is similar to how we upload our SSH keys to Github or other platforms to be able to clone repositories using SSH.
One thing missing in the above-mentioned method is the ability to create pull requests and get them reviewed and merged into the code. It becomes a big issue when the developers don't have write access to the repositories, as is the case in open source development. The general open source flow is to fork a repository, push your changes to the fork, and create a pull request to the original repository, from where the maintainers can review and merge. The pull request is not a feature of git, but of platforms like Github. Collaborative development in git has the following steps:

1. Clone the repository.
2. Commit the changes locally.
3. Create a patch of the changes.
4. Send the patch to the maintainers.
5. The maintainers review and apply the patch.
We introduced two new processes in this workflow – creating a patch and applying a patch. Let’s discuss them further.
A patch is a diff of code plus metadata around it. A diff is the actual code change, which can be viewed using the command `git diff`. It is a Unix concept that is way older than git, and Unix systems even have a built-in tool, `diff`, which compares changes between two files. Its output can be saved to a file that can be processed by another utility tool called `patch`, which updates one of the files to make its content identical to the other.
```sh
touch original.txt new.txt
```
```sh
diff original.txt new.txt -u > changes.patch
```
```diff
--- original.txt	2020-10-18 23:43:32.000000000 +0200
```
If we want to update the original file with the contents of the new one, we need to “patch” it.
```sh
patch original.txt changes.patch
```
NOTE: Use git-before-github repository to practice this example and all the ones which are to follow.
Now that we understand the concepts of diff and patch, we can proceed with using them in git. To generate a patch file in git, we use the `git format-patch` command:
```sh
git format-patch HEAD~1..HEAD
```
This will create a patch file for the latest commit. Tinker with the argument to produce patch files for other commits too. By default, each file represents a single commit, and the file name is of the format `0001-<commit message>.patch`. Interestingly, it is in the Unix mbox format, which means it has some email-like metadata (from, subject, etc.) followed by the patch data.
```
From 619814a3ac21012580e725136864e8397c14e20b Mon Sep 17 00:00:00 2001
```
This is the patch file I created to add my name to the contributors’ list of the git-before-github repository as an exercise for myself.
We now have a patch file which we can send to the maintainers so that they can apply it. We can either use traditional ways to send the patch or use git itself; more on that later. In this section, we will explore the process of applying a patch.
`git apply <patch file>` applies the changes described in the patch file locally but does not commit them. They appear in our workspace, and we can stage and commit them. This is not ideal: even though we get the changes, commit metadata like the author name and message is lost. A better way to apply patches is `git am` (short for "apply mailbox").
```sh
git am --signoff <patch file>
```
The `--signoff` flag is used if the commit message needs to be appended with a "Signed-off-by" line to indicate who applied the commit. Below is the signed-off and applied commit for the patch we generated as an exercise in the last section.
```
Author: Tarun Batra <tarun.batra00@gmail.com>
```
Now we can explore how to send a patch file to a maintainer the "git" way. There are two ways that I know of.

The first is `imap-send`:

```sh
git imap-send <patch file>
```
It will upload the email to an IMAP folder, from where it can be sent. To use it, we need to add IMAP server details to our `~/.gitconfig` file. I use Gmail, so my IMAP details look like:
```ini
[imap]
    ; standard Gmail settings from the git-imap-send docs
    folder = "[Gmail]/Drafts"
    host = imaps://imap.gmail.com
    user = tarun.batra00@gmail.com
    port = 993
```
Gmail allows its IMAP server to be used if the Less secure app access setting is turned on.
The second is `send-email`:

```sh
git send-email --to=<email> <patch file>
```
It will use an SMTP server to send the email. The corresponding config for this command is:
```ini
[sendemail]
    ; standard Gmail settings from the git-send-email docs
    smtpserver = smtp.gmail.com
    smtpuser = tarun.batra00@gmail.com
    smtpencryption = tls
    smtpserverport = 587
```
Both of these methods send an email per patch file. The subject of the email, by default, is of the format `[PATCH] <commit message>`.
NOTE: Sender email addresses can be spoofed quite easily and this raises the question of authentication and authorization when submitting patches. I tried creating patches of PGP signed git commits but the patch doesn’t retain the PGP signature. We could encrypt the email itself using PGP but that’s a lot of extra work for both the contributor and the maintainer.
So what can we do with this newly acquired knowledge? `youtube-dl` was taken down by Github due to a DMCA request. The whole episode brings to light that Github, being a platform, is subject to censorship, unlike git, which is distributed. You can practice for yourself by adding your name to the `CONTRIBUTORS.txt` file of the git-before-github repository, committing it, and then sending me the patch. I would readily apply it. 😀
The JavaScript world has `npm`. It's the world's largest software registry, and the ecosystem around it is advanced and mature. I realized this when I tried to port `password-validator`, an open source npm package I maintain, to Python. I spent weeks figuring out things which were very intuitive to me in the JavaScript world. This post lists some of the key differences I found between the two ecosystems, for anybody else who treads the same path. By the end of this post, we should be able to install our package using:

```sh
pip install <my-awesome-package>
```
In `npm`, any folder which has a `package.json` file is a package. In the Python world, we need a file called `setup.py` in the root folder to make our project a package for PyPI, the common repository for Python packages. This file is used to declare metadata like the name, version, author, and even dependencies (with a catch). Look at this sample setup.py file to get an idea.
Normally, the actual package code goes inside a directory named after the package, and that directory must have an `__init__.py` file to indicate that it is a module. So the directory structure looks more or less like:

```
package-project
├── setup.py
└── my-package
    ├── __init__.py
    └── lib.py
```

In this structure, your actual package code goes in `__init__.py` and/or any other Python file in this directory, like `lib.py`.
Dependency management is one area where I found the Python ecosystem lacking and primordial. `npm` has concepts like `dependencies`, `devDependencies`, and `package-lock` which make dependency management easier. The manifest file `setup.py` has a field called `install_requires`, which lists the dependencies required to run the project. These dependencies are downloaded while installing the package using `pip`. However, it is not widely used to pin exact versions of dependencies, let alone dependencies of dependencies. Instead, a combination of a virtual environment and a requirements file is used.
`npm` allows installing packages globally or locally to a project. In `pip`, all packages are installed globally. To separate local dependencies from system-wide packages, a virtual environment is created, which contains all the dependencies local to the project and even a local Python distribution. Code running in a virtual environment doesn't require any system dependency, so one can be assured that the code should run on any system where the virtual environment is set up correctly. There are multiple ways of setting up a virtual environment, like `venv` and `virtualenv`. I found the former the easiest to use:
```sh
python3 -m venv <dir>
```
The above command creates a virtual environment in the directory `<dir>`, which will house everything required to run the program, including the Python and pip distributions. This directory need not be committed to version control. To use the virtual environment, it has to be "activated":
```sh
source <dir>/bin/activate
```
Every virtual environment has its own pip and Python distribution, so it's a clean slate when you start. As we install dependencies, we can make a log of them using:
```sh
pip freeze > requirements.txt
```
This will log all the packages and their versions installed by pip into the file `requirements.txt`. The file name can be anything, but it's a convention in the Python community to use this name. There is no concept of devDependencies in Python, but it's not uncommon for projects to have a `requirements-dev.txt` to log the development dependencies of the project. Here's a sample requirements file.
Installing a project's dependencies can now be done with:
```sh
pip install -r requirements.txt
```
`JSDocs` is the most common way to document code in JavaScript, and the list of documentation tools almost stops there. Documentation in Python is taken far more seriously.
`Sphinx` is a documentation generator for Python which can be used in various ways, thanks to the variety of extensions available. To get started with Sphinx, we need a `conf.py` file which holds the configuration for Sphinx. The file can be generated using the `sphinx-quickstart` CLI utility. Here's a sample conf.py file to get started.
Sphinx and the Python ecosystem prefer `reStructuredText` (reST) for documentation instead of Markdown, which is common in the JavaScript world. M↓ can be used with Sphinx too, but it's a bit of work, and reST is very similar to M↓ anyway. Here's a sample README in reST.
Sphinx can look through special comments in Python code and include them in the documentation. These comments, called `Docstrings`, are used to describe functions, classes, etc., and are very similar to JSDocs in JavaScript and JavaDocs in Java. Here's a method documented with Docstrings:
```python
def add(a, b):
    """Return the sum of a and b.

    :param a: first operand
    :param b: second operand
    :returns: sum of a and b

    Example:
        >>> add(1, 2)
        3
    """
    return a + b
```
There are a number of formats for writing Docstrings, which can be a topic of discussion in itself. I found myself going back to this guide by RealPython, and I recommend giving it a read for a deeper understanding of Docstrings. Using the combination of reST files and Docstrings, we can generate documentation with Sphinx in forms like PDF and HTML. Here's a sample HTML documentation generated using Sphinx.
There are less popular alternatives to Sphinx which can also be used for documentation, the most notable being ReadTheDocs.
I found writing unit tests in Python easier than in JavaScript due to the availability of mature and built-in tools.
Python has a built-in module, `unittest`, which can be used to run tests. The syntax is very similar to Jest or Mocha: every test suite is a class, and every class method whose name starts with `test` is a test case. Methods like `setUp()` and `tearDown()` have special meanings and are equivalent to the `before` and `after` methods seen in JavaScript testing frameworks. Here's a sample test file. Tests are run by:
```sh
python3 -m unittest discover -s <test_directory>
```
Pytest and nose2 are some of the alternatives, but I found `unittest` adequate.
Earlier, in the Documentation section, we discussed Docstrings. The Examples section in a Docstring gives the user an understanding of how to use the function. What's great is that we can also run the example as an automated unit test, using doctest (another built-in module). This keeps the documentation and the implementation in sync.
Packages can be published to PyPI after building. Earlier, builds were made in the Egg format, but it is now being replaced by the Wheel format. To create a wheel build, we need to install `setuptools` and `wheel`:
```sh
python3 -m pip install --user --upgrade setuptools wheel
```
Now the build can be created by running:
```sh
python3 setup.py sdist bdist_wheel
```
This will create two files in the `dist` folder:

- a file with the `.tar.gz` extension, which is a source archive, and
- a file with the `.whl` extension, which is the build ready to be installed.

Now we need to upload the package build to PyPI. We need `twine` for that:
```sh
python3 -m pip install --user --upgrade twine
```
The final step is uploading the build:

```sh
python3 -m twine upload --repository testpypi dist/*
```
`--repository` here lets you select a specific package repository, like custom or private npm registries do. It can be configured with a `.pypirc` file.
A couple of resources helped me a lot in figuring out my issues along the way.
It was an interesting experience tackling all the challenges on the journey to publishing my first package on PyPI. The mature testing framework was a breath of fresh air, while dependency management and documentation generation were daunting due to all the different tools to manage. In the end, I published password_validator, and I am happy with the result.
Do reach out in comments or on my Twitter if you have any issues with the steps I have mentioned here. Happy publishing!
I wanted to learn about IPFS and Web 3.0, so I started exploring and tried to upload my site there. This is an account of the concepts I learnt along the way and how I finally got my site hosted on tbking.eth. I'll give a brief intro to IPFS and why hosting static content there makes sense. If you are already familiar with IPFS, you can skip to the hosting part.
Inter-Planetary File System is a decentralized network of shared content. It has a very simple yet interesting design philosophy:
Build a network that works inter planets, and it will be a better network for communication across Earth too.
IPFS is a decentralized network where peers are connected and share files, in many ways like BitTorrent. The fundamental principle is that unlike the traditional web, where files are served based on their location, in IPFS files are served based on their content. Consider this comparison:
Google's privacy policy is identified as a file hosted on Google's servers at the address https://policies.google.com/privacy. The content of the policy doesn't matter. This is called `location-addressing`. On the other hand, IPFS identifies files by their content, using the hash of the file. Let's say you want to read "XKCD #327 - Exploits of a Mom". Its IPFS address is https://ipfs.io/ipfs/QmZVjV5jFV7Jo4Hfj6WPyRnHCxf8kbadkqtQBco2gef64x/. This address doesn't depend on the location of the server, but on the hash. This is called `content-addressing`. Whoever cares about XKCD can host it. This makes broken links unlikely, which is also one of the goals of IPFS.
IPFS docs explain these concepts well. I recommend them to anyone who wants to dig deeper.
Deploying static content such as personal websites on IPFS is easy. The steps I’m listing below can be used for any file such as a plain HTML file, a website generated by static site generators like Jekyll, Hugo, Hexo, and Gatsby or even a media file. So let’s dive in.
If you have IPFS Desktop installed and running, you can add your files using a normal file selector. Just import the directory which has the contents of your static website.
Right-click on the newly added content and select `Copy CID`. Open `https://ipfs.io/ipfs/<cid>` to see our files on the web! 🚀
Easy right? 😀
The IPFS CLI allows adding files and directories using the `add` subcommand:

```sh
ipfs add -r ./sample_website
```
```
added QmaYZE5QQVR3TqQSL5WDcwuxUwBqczbMkmbHeoeNEAumbe sample_website/index.css
```
The hash printed on the last line is the CID for the whole directory, and thus for our website. We can see the sample website hosted at https://ipfs.io/ipfs/QmeUG2oZvyx4NzfpP9rruKbmV5UNDmTQ8MoxuhTJGVZVTW/
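The same can also be done programmatically. A sketch using the `ipfs-http-client` npm package against a locally running daemon (the file contents here are illustrative):

```js
const { create } = require('ipfs-http-client');

// Talks to the HTTP API of a local IPFS daemon.
const ipfs = create({ url: 'http://127.0.0.1:5001' });

async function publish() {
  const { cid } = await ipfs.add({
    path: 'index.html',
    content: '<html><body>Hello, IPFS!</body></html>',
  });
  console.log(`https://ipfs.io/ipfs/${cid}`);
}

publish();
```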
TIP: It is important to use relative links in your website, since IPFS gateways have URLs that look like `<gateway>/ipfs/<cid>/file.ext`.
In the last section, we added files to our IPFS node for the network to find. This is why the IPFS gateway was able to resolve them and show them in the browser. But the site will most likely become unreachable once you shut down your IPFS daemon. After requesting some content on IPFS, the receiving node also becomes a host of that content, but the content will be garbage collected after 12 hours. So how do you serve your site round the clock on a decentralized web, without a server?
Welcome, Pinning!
A node that pins some content on IPFS hosts it forever (until it unpins it). There are pinning services like Pinata which pin files on their IPFS nodes. This way, the website is always available.
In Pinata, you can either upload a file or just provide its hash if the content has already been uploaded to IPFS. Here's how I pinned the sample website we uploaded above.
TIP: It is good to pin your site using multiple pinning services for redundancy.
As you might have noticed already, using IPFS is very easy. I would go as far as to say it's easier than dealing with the traditional web. However, this process has to be repeated every time you change your files, which is not ideal. There are tools like Fleek which help automate all the steps listed above.
Fleek is like Travis or CircleCI for IPFS deployments. You can link your Github account with it, and using Github hooks, Fleek will trigger a deployment on every push to the repository. They also pin everything they deploy.
So I use Hexo to generate this blog and I was able to add a build step in the Fleek dashboard itself so that I don’t need to generate the HTML and push to my repository. Here’s the build command that I use:
```sh
git submodule update --recursive --init && npm i && npm run build
```
Yes, I needed to install the submodules myself. They don’t do that by default (yet). Check it out, it’s super easy.
So now we have our site up and running, but content on IPFS is not as easy to look up as on the traditional web. My site can be found at https://tarunbatra.com. But on IPFS, the current version can be accessed at https://ipfs.io/ipfs/QmTPTa1ddoSkuakaW56SaL9dicbC71BbwfjRbVjasshCXs/.
You see the problem. This is not human readable. And the CID changes after every update. There are 2 solutions to this:
With DNSLink, you can point a normal domain to an IPFS content. It can be set up on Fleek very easily. I’ve pointed ipfs.tarunbatra.com to the IPFS version using Fleek and you’ll be able to open this site.
IPNS (Inter-Planetary Name Service) also exists and is similar to DNSLink but it’s much slower right now.
ENS is a naming service that lives on Ethereum blockchain and uses smart contracts to buy domains and set resolver records to them. Since Ethereum is involved, you’d need MetaMask to use it.
I bought the domain `tbking.eth` and pointed it to the IPFS CID of this site. With changing content, the domain's resolver record also needs to be updated every time to point at the new CID. The good thing is, Fleek has recently integrated ENS: you can set Fleek as a controller for the domain, and they will update the record automatically on each deploy.
Sweet! That was it. IPFS is perfect as the storage layer of `Web 3.0`. Try it for yourself, and if you liked this blog, share it, and maybe pin it! :)
Kafka and RabbitMQ – These two terms are frequently thrown around in tech meetings discussing distributed architecture. I have been part of a series of such meetings where we discussed their pros and cons, and whether they fit our needs or not. Here’s me documenting my findings for others and my future self.
Spoiler: We ended up using both for different use-cases.
With respect to message routing capabilities, Kafka is very light. Producers produce messages to topics. Topics can further have partitions (like sharding). Kafka logs the messages in its very simple data structure which resembles… a log! It can scale as much as the disk can.
Consumers connect to these partitions to read the messages. Kafka uses a pull-based approach, so the onus of fetching messages and tracking offsets of read messages lies on consumers.
RabbitMQ has very strong routing capabilities. It can route the messages through a complex system of exchanges and queues. Producers send messages to exchanges which act according to their configurations. For example, they can broadcast the message to every queue connected with them, or deliver the message to some selected queues, or even expire the messages if not read in a stipulated time.
Exchanges can also pass messages to other exchanges, making a wide variety of permutations possible. Consumers can listen to messages in a queue or a pattern of queues. Unlike Kafka, RabbitMQ pushes the messages to the consumers, so the consumers don’t need to keep track of what they have read.
Distributed systems can have 3 delivery semantics:

**at-most-once delivery**: In case of failure in message delivery, no retry is done, which means data loss can happen but data duplication cannot. This isn't the most used semantic, for obvious reasons.

**at-least-once delivery**: In case of failure in message delivery, retries are done until delivery is successfully acknowledged. This ensures no data is lost, but it can result in duplicated delivery.

**exactly-once delivery**: Messages are guaranteed to be delivered exactly once. This is the most desirable delivery semantic, and almost impossible to achieve in a distributed environment.
Both Kafka and RabbitMQ offer at-most-once and at-least-once delivery guarantees.
Kafka provides exactly-once delivery from the producer to the broker using idempotent producers (`enable.idempotence=true`). Exactly-once message delivery to consumers is more complex. It is achieved on the consumer's end by using the transactions API and reading only messages belonging to committed transactions (`isolation.level=read_committed`).
To truly achieve this, consumers would need to avoid non-idempotent processing of messages in case a transaction has to be aborted, which is not always possible. So, Kafka transactions are not very useful in my opinion.
In RabbitMQ, exactly-once delivery is not supported, due to the combination of complex routing and push-based delivery. Generally, it's recommended to use at-least-once delivery with idempotent consumers.
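For illustration, with the kafkajs client the producer side of the Kafka guarantee is a single option; the client ID, broker address, and topic below are made up:

```js
const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'orders-app', brokers: ['broker-1:9092'] });

// Equivalent of enable.idempotence=true: broker-side deduplication
// ensures retries cannot create duplicates between producer and broker.
const producer = kafka.producer({
  idempotent: true,
  maxInFlightRequests: 1,
});

async function send() {
  await producer.connect();
  await producer.send({
    topic: 'orders',
    messages: [{ key: 'order-1', value: 'created' }],
  });
}
```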
NOTE: Kafka Streams is an example of a truly idempotent system, which it achieves by eliminating non-idempotent operations in a transaction. It is, however, out of the scope of this article. I recommend reading "Enabling Exactly-Once in Kafka Streams" by Confluent if you want to dig into it further.
Throughput of message queues depends on many variables like message size, number of nodes, replication configuration, delivery guarantees, etc. I will be focussing on the speed of messages produced versus consumed. The two cases which arise are:
RabbitMQ stores the messages in DRAM for consumption. In the case where consumers are not far behind, the messages are served quickly from the DRAM. Performance takes a hit when a lot of messages are unread and the queue is backed up. In this case, the messages are pushed to disk and reading from it is slower. So, RabbitMQ works faster with empty queues.
Kafka uses sequential disk I/O to read chunks of the log in an orderly fashion. Performance improves further when fresh messages are being consumed, as they are served from the OS page cache without any I/O reads. However, it should be noted that implementing transactions, as discussed in the last section, will have a negative effect on throughput.
Overall, Kafka can process millions of messages in a second and is faster than RabbitMQ. Whereas, RabbitMQ can process upwards of 20k messages per second.
Persistence of messages is another front where these two tools could not be more different.
Kafka is designed with persistence (or retention, as they call it) in mind. A Kafka system can be configured to retain messages, both delivered and undelivered, by configuring either `log.retention.hours` or `log.retention.bytes`. Retaining messages doesn't affect the performance of Kafka. Consumers can replay retained messages by changing the offset of the messages they have read.
RabbitMQ, on the other hand, works very differently. Messages delivered to multiple queues are duplicated in those queues. The copies are then governed independently of each other by the policies of the queues they are in and the exchanges they passed through. So to persist messages in RabbitMQ, the queues have to be declared as durable and the messages have to be published as persistent.
Not to mention, this has a performance impact, since the disk is involved in an otherwise in-memory operation.
RabbitMQ offers complex routing use-cases which can not be realized with Kafka’s simple architecture. However, Kafka provides higher throughput and persistence of messages.
Apart from these differences, both of them provide similar capabilities like fault-tolerance, high availability, scalability, etc. Keeping this in mind, we at smallcase used RabbitMQ for consistent polling in our transactions system and Kafka for making our notifications system quick and snappy.
Execution of computer programs is blazing fast; in practice, though, we sometimes need to delay execution. Some such use cases are:
At smallcase, we place orders to buy or sell equities on clients' behalf. When an order is placed, its execution status is not immediately known. We need to poll the partner broker to know the status of the order on the exchange.
Intraday traders will tell you that stock markets are time sensitive. Any platform that caters to them needs to be fast and deterministic. Polling for order status needs to happen in a timely manner at regular intervals, because an endless loading spinner is never a pleasant sight.
This blog explains the reliability issues we faced with our legacy system for scheduling polling and how we fixed it.
We will discuss the following delay mechanisms we used in our platform:
This approach is very simple. We use Redis heavily in our stack, so it was easy for us to use it for scheduling triggers. Redis allows setting keys with an expiry using its SETEX command. For instance, setting a key `TESTKEY` with value `TESTVALUE` and an expiry of 10 seconds can be achieved with:

```
redis> SETEX TESTKEY 10 "TESTVALUE"
```
In parallel, Redis provides keyspace notifications for data-changing events, which clients can subscribe to and use to trigger the delayed tasks. Key expiry events can be subscribed to using:

```
redis> SUBSCRIBE __keyevent@0__:expired
```
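A sketch of the subscriber side in Node.js, assuming the Redis server has keyspace notifications enabled for expired events (`notify-keyspace-events Ex`):

```js
const Redis = require('ioredis');

// Subscriber connections can't issue other commands, so use a dedicated one.
const sub = new Redis();

sub.subscribe('__keyevent@0__:expired', (err) => {
  if (err) throw err;
});

sub.on('message', (channel, expiredKey) => {
  // The key's expiry is the trigger for the delayed task.
  pollOrderStatus(expiredKey); // stand-in for the real polling routine
});
```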
Any delay that can be expressed through the `ttl` parameter of the `SETEX` command will work. However, the Redis docs carry a caveat about the timeliness of these events:

> If no command targets the key constantly, and there are many keys with a TTL associated, there can be a significant delay between the time the key time to live drops to zero, and the time the expired event is generated.
This worked fine for us in the beginning. But once the orders increased, a lot of clients started reporting really long waiting periods before they could see the status of their orders. On debugging, we saw delays to the tune of 40x, and this was the deal breaker for us.
As a quick fix, we used in-process timers. Timers are defined and executed by the application code itself and can be used to manage delays. Node.js provides a variety of built-in timer functions. Setting a 10-second timer is as simple as:

```js
setTimeout(() => {
  // poll for the order status here
}, 10 * 1000);
```
Timers worked well to mask the problem until we had bandwidth to think about the problem and fix it for good. Now we needed a permanent and reliable solution that would scale.
After researching the topic a bit, we decided to try dead letter queues. A DLQ is a sophisticated but robust way to delay message delivery, used in message queues like Amazon SQS, RabbitMQ, and ActiveMQ.
In very basic terms, a message with an expiry is put into a queue. The queue is instructed to take some action if messages expire without being read. We can instruct these expired messages to be pushed to a different queue, which is actively consumed by the target application, and this simulates delayed delivery of the message. The following diagram explains this concept:
We chose RabbitMQ, based on AMQP 0-9-1, for our implementation, due to the maturity of the protocol and its community.
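A condensed sketch of the idea with the amqplib client (queue names and TTL are illustrative): messages sit in a "wait" queue that has no consumers, and on expiry they are dead-lettered into the queue the application actually consumes.

```js
const amqp = require('amqplib');

async function main() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();

  // The queue the application consumes; expired messages land here.
  await ch.assertQueue('order.poll', { durable: true });

  // The wait queue: nobody consumes it, so messages expire and dead-letter.
  await ch.assertQueue('order.poll.wait', {
    durable: true,
    arguments: {
      'x-dead-letter-exchange': '', // default exchange
      'x-dead-letter-routing-key': 'order.poll',
    },
  });

  // Deliver this message to order.poll consumers after ~10 seconds.
  ch.sendToQueue('order.poll.wait', Buffer.from('{"orderId":"123"}'), {
    expiration: '10000',
  });

  await ch.consume('order.poll', (msg) => {
    console.log('poll order status:', msg.content.toString());
    ch.ack(msg);
  });
}

main();
```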
**Stateless application code.** The application doesn't need to store any state; the messages in the queue represent the state in this case.

**Scalable.** Unlike Redis events, message delivery in MQs can be configured to use the worker pattern, which load-balances across the consumers (subscribing instances of the application) to deliver each message.

**Reliability.** Message queues are expected to be highly reliable in delivering messages. In our testing with RabbitMQ, the results were close to in-memory timers.
The chart shown below analyzes the average delay observed in resolving the orders we handled in the past year and its relation with the delay mechanism being used.
It is clear that the polling delay with Redis was large but consistent to start with, and grew out of proportion quickly as volumes increased. This was quickly fixed using timers, but like all quick fixes, it had a short life too. Finally, we revamped our systems and used DLQs.
It is evident from the graph that DLQs brought consistency to the system.
We traded flexibility for robustness and reliability by using DLQs to manage the delays in our polling for order status. If you've solved a related problem in any other way, do let us know in the comments.
❤️ code
Recently I found myself trying to restore corrupted data to its original form. A query gone wrong had erased valuable data from a MongoDB collection, and it needed to be restored. This post is a means to note my learnings from that incident, for my future self and others.
If you're using a MongoDB replica set, you can use the MongoDB oplog to undo almost any type of recent data loss. If you have never heard of it, go read the docs, as I'll skip to the restoring part.
> The oplog is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases.
Restore whatever latest backup you have from before the data was corrupted. It'll be used as a base. Make sure the oplog has operations from or before the time the backup was taken; otherwise the restore may not work.
Now you need to browse the contents of the oplog and look for the query which corrupted the data in the first place. The oplog is a special collection, but still a collection, so you can query it to narrow down your results with something like this (the namespace here is illustrative):
```
Replica:PRIMARY> use local
Replica:PRIMARY> db.oplog.rs.find({ ns: "mydb.users", op: "u" }).sort({ ts: -1 })
```
Let’s say you found the query and it looked like:
```
...
```
Note down the value of the field `ts`. It will be needed later.
Before you can restore your data from the oplog, you need to dump it to a file:
```sh
mongodump -d local -c oplog.rs -o oplogD
# mongorestore expects a file named oplog.bson, so rearrange the dump
mkdir oplogR
mv oplogD/local/oplog.rs.bson oplogR/oplog.bson
```

Now you have a dump of the oplog in `oplogR/oplog.bson`.
You want to restore the data up to just before the faulty write operation:

```sh
mongorestore --oplogReplay --oplogLimit 1528225054:1 oplogR
```
The value of the `oplogLimit` parameter is the `ts` property of the faulty query you noted earlier.
The Mongo oplog is idempotent, i.e. it can be applied multiple times without duplicating data. This is why you need not give `mongorestore` a start time.
The previous method helps when the good and bad operations are clearly separated by time. But that's mostly not the case. It's likely that the good and bad writes happened around the same time.
Here, you need to remember that the oplog is a collection too, and in addition to being queried, it can also be modified. You need to copy the oplog to an uncapped collection and delete the bad write operations, as sketched below.
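A sketch of that from the mongo shell; the scratch database, namespace, and timestamp here are illustrative:

```js
// Copy the oplog into an uncapped collection in a scratch database.
db.getSiblingDB('local').oplog.rs.find().forEach(function (op) {
  db.getSiblingDB('scratch').oplogCopy.insert(op);
});

// Remove the bad write operations before replaying.
db.getSiblingDB('scratch').oplogCopy.remove({
  ns: 'mydb.users',
  op: 'u',
  ts: Timestamp(1528225054, 1),
});
```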
After you’ve removed all bad operations you can continue the restoration process from a dump of the modified oplog. But this time, no limit argument would be required.
Hope this helps!
❤️ code
This post is inspired by resources like Asya's stackoverflow answer, which was used to solve the problem at hand.
Feed a random email address to the login form of most prominent sites, and you will see a vague `Invalid username/password` error at the login screen. It can be safely assumed that no user for such emails exists, yet the error messages shown by these prominent sites don't indicate that. Do you see why?
On feeding random emails, a simple login/reset-password flow may reveal that the user doesn't exist and therefore no login or password reset is possible. A hostile client may create a list of valid users from this information. Not every site cares about it, though. Facebook doesn't.
It doesn't hurt for a site like Facebook to leak its user base, because everyone's on it, but when the site is something like Ashley Madison, it can have implications.
People often argue that the user base is leaked during the registration process anyway, which can't continue if the user already exists. This leak is avoidable too, by continuing the registration through an email. If the user already exists, the email may say so; otherwise it may direct the receiver to continue the registration process.
In a brute-force attack, a large number of combinations of usernames and passwords are tried to get into a user's account. A dictionary attack is faster and better, as it takes advantage of human tendencies to use common passwords and re-use old passwords. Specific messages like `Invalid username` make these attacks faster by eliminating a large number of combinations in a few tries.
To quantify the difference detailed error messages can make to brute-force or dictionary attacks, I created a script, login-attack-demo, which included:
A collection of 1575 most commonly used passwords based on danielmiessler/SecLists.
A server which gives out specific error messages based on the failing stage of the login process. If the username is invalid, it says `Invalid username`.
A server which gives out vague error messages on failure of the login process. If the password is invalid and not the username, it still says `Invalid username/password`.
A hostile client which uses the dictionary to generate pairs of usernames and passwords and uses them to break into the system. It also analyzes the error messages, on rules like: if the error indicates an invalid username, all combinations with that username are discarded; if it indicates an invalid password, the username is treated as valid and only the password is varied further.
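Condensed, the difference between the two servers comes down to something like this (assume `users` is a map of registered accounts; the handler shape is illustrative):

```js
// Specific errors: each failure stage leaks information.
function loginSpecific(username, password) {
  const user = users.get(username);
  if (!user) return { error: 'Invalid username' };
  if (user.password !== password) return { error: 'Invalid password' };
  return { ok: true };
}

// Vague errors: every failure looks identical to the client.
function loginVague(username, password) {
  const user = users.get(username);
  if (!user || user.password !== password) {
    return { error: 'Invalid username/password' };
  }
  return { ok: true };
}
```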
The blue points indicate the number of tries it took the attacker to break into the user’s account with vague error messages, while the red ones are when the error messages were specific. The yellow line is the max number of combinations to try.
It is evident that, in theory, vague error messages make it quite tough for the attacker, sometimes as much as 1000 times tougher. However, plugging all the holes may make the UX far from ideal. In the end, it all comes down to a trade-off between UX and security. Depending on the space you're operating in, such methods can be used to safeguard the users.
Microservices is a vague term which usually points to small, independent services which together form an application. Microservices architecture stands in contrast to monolithic architecture, where the application is one big system. We will discuss how to get started as a beginner and how to choose the right tools for setting up a microservices architecture.
Microservices have recently risen into popularity, as suggested by this Google trends index:
“microservices” in Google Trends
Before getting into what they are, and how to use them, let’s find out why microservices are rising into popularity even though it’s not a new concept. This can be explained by Conway’s Law which states that:
Organisations which design systems … are constrained to produce designs which are copies of the communication structures of these organisations
Until recently, the majority of companies were giant enterprises, making equally giant enterprise software, or monoliths. Not anymore. For the last 5 years, companies have been getting more distributed in nature. In fact, some of the new unicorns, like Automattic and Gitlab, are entirely distributed. That explains why microservices are rising in popularity.
Choosing a tech stack is always a tough call. The skills of the team, scalability, and maintenance are all important factors in this choice. But many times, what's good for one portion of the app is not good for another.
For example, a trading platform needs PostgreSQL for its transaction support, while user-specific data is better stored in DynamoDB. Generally, a trade-off is made. But in a microservices architecture, each microservice owns a set of business logic and the related data in the form it deems best, and provides an interface to interact with it. So there can be separate microservices for trades and users, with data hosted on PostgreSQL and DynamoDB respectively, and access from external services allowed only through the interface. A computation-intensive microservice can be written in Python while the API server remains in NodeJS, and so on.
A sample architecture would look like
In a monolithic application, most people on a development team need to know all the parts of the application even to develop a small feature. They have to collaborate with other teams on deployment schedules, versioning, data migrations, etc. In a microservices architecture, on the other hand, teams just need to know the interfaces provided by other services and stick to the interface provided by theirs. Each microservice should be independently deployable. CI/CD tools like CircleCI, Travis, and Heroku are of great help while developing microservices. To develop and maintain a service, a small team with limited knowledge is enough.
Microservices architecture also resembles the Unix philosophy's principle: do one thing and do it well. It is a principle which has proved its worth and is one of the reasons Unix-like systems are not obsolete even after decades. Microservices should be designed to do one thing, tested to handle all possible scenarios, and then "piped" together to form a robust and flexible application.
Imagine an error in a report generation module taking the full trading application down. It's terrible and unnecessary. If the application is based on microservices, the report generation module will fail, taking down only that specific feature. Users who are not generating reports won't even notice, and the critical parts of the application will keep working. That's a result of the loose coupling between microservices that this architecture makes achievable.
In Node.js, Seneca is a widely used framework for developing microservices. But if you want to get your hands dirty, read on.
Microservices can be deployed without a main central service, but a thin central layer is usually kept for shared libraries. Sometimes it also works as an API gateway.
An API gateway is used to provide a consistent API to the client side irrespective of the backend service, and vice versa. It is helpful when you have multiple microservices catering to clients like mobile, web, and other servers. It also saves clients from making requests to multiple services by abstracting them into one request.
How should these services communicate to each other?
HTTP: There are plenty of options. The most straightforward is to use HTTP/HTTPS, with the interface exposed as RESTful APIs. However, HTTP makes service discovery difficult and less dynamic, which is a reason I don't prefer it.
Message queues: Message queuing protocols like AMQP, STOMP, and MQTT allow the use of the publish-subscribe pattern for communication between microservices. Tools like RabbitMQ and Apache Kafka are widely used for this purpose. I personally prefer message queues, as they are fault tolerant and support message persistence. A minimal sketch of this pattern follows below.
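Here is a sketch of two services talking over RabbitMQ with the amqplib package; the exchange, queue, and routing-key names are illustrative:

```js
const amqp = require('amqplib');

// Trades service: publish an event when a trade settles.
async function publishTradeSettled(trade) {
  const ch = await (await amqp.connect('amqp://localhost')).createChannel();
  await ch.assertExchange('events', 'topic', { durable: true });
  ch.publish('events', 'trade.settled', Buffer.from(JSON.stringify(trade)));
}

// Reports service: subscribe to all trade events.
async function subscribeToTradeEvents() {
  const ch = await (await amqp.connect('amqp://localhost')).createChannel();
  await ch.assertExchange('events', 'topic', { durable: true });
  const { queue } = await ch.assertQueue('reports.trades', { durable: true });
  await ch.bindQueue(queue, 'events', 'trade.#');
  await ch.consume(queue, (msg) => {
    console.log('trade event:', msg.content.toString());
    ch.ack(msg);
  });
}
```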
microservice-template is a basic boilerplate which we use in our microservices-based projects. It's in Node and has MongoDB configured as a data store. It connects to RabbitMQ for inter-service communication via servicebus. Here's how it looks at its bare minimum:
Microservice booting up and communicating
There are other concepts, like Event Sourcing and CQRS, which complement the microservices architecture. Event Sourcing is basically storing the changes to the data instead of the changed state of the data. In a highly concurrent environment, event sourcing brings consistency of a higher order.
Productivity in Microservices vs Monolothic Architecture Source: Martin Fowler
But truth be told, Microservices Architecture is no silver bullet. Using microservices for fairly small apps can lead to loss in productivity, as evident in the above graph.
❤️ code
Mocha added support for Promises back in version `1.18.0`. Recently, while testing asynchronous code written using Promises, I was using the built-in promise support instead of the old-fashioned callback-oriented `done` API, and I found my tests were producing false positives. So here I'm explaining what happened and how to avoid it.
It’s easy to see that there are four possible cases when testing Promises:
| Case | Expectation | Actual | Expected test status |
|------|-------------|----------|----------------------|
| 1 | Resolved | Resolved | Pass |
| 2 | Resolved | Rejected | Fail |
| 3 | Rejected | Rejected | Pass |
| 4 | Rejected | Resolved | Fail |
```js
var promise = Promise.resolve();

it('expects the promise to resolve', function () {
  return promise.then(function () {
    // assertions go here
  });
});
```
The test expects the promise to resolve, and so it does. The snippet above shows that the test passes, as expected.
```js
var promise = Promise.reject(new Error('some-error'));
```
In the case when the test expects the promise to resolve but it doesn't, Mocha gracefully fails the test. In the snippet above, the `then` block was never executed, and Mocha detected the rejection and failed the test.
```js
var promise = Promise.reject(new Error('some-error'));
```
The test expects the promise to be rejected, and the rejection is caught by the `catch` block, which results in this test passing as expected.
```js
var promise = Promise.resolve();
```
In this case, the test expects the promise to be rejected and the `catch` block to execute, but instead the promise resolves. Mocha doesn't throw an error here!
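The full test for this case might look like the following sketch (names made up). Note how the resolved promise sails straight past the `catch` block:

```js
var promise = Promise.resolve();

it('rejects', function () {
  // The promise resolves, so this handler never runs.
  // `promise.catch(...)` returns a promise that resolves too,
  // so Mocha happily marks the test as passing.
  return promise.catch(function (err) {
    // assertions on `err` would go here, but are never reached
  });
});
```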
So while doing negative testing, the built-in promise support doesn't hold up well and can result in false positives. This is not a bug, and the reason behind this behavior makes total sense once given careful thought, but at the very least it's not intuitive and causes tests to pass even when they shouldn't.
The simplest way to avoid this situation is to always handle the rejection scenarios yourself, rather than asking Mocha to do it. Here’s how:
```js
var promise = Promise.resolve();
```
The above test will fail due to a timeout: Mocha waits for `done` to be called, but it never is, and the test eventually times out.
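One possible shape of this workaround (a sketch; the test name is illustrative):

```js
var promise = Promise.resolve();

it('rejects', function (done) {
  promise.catch(function (err) {
    // assertions on `err` would go here
    done();
  });
  // `done` is never called because the promise resolves,
  // so Mocha fails the test with a timeout instead of a false positive
});
```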
Another way to avoid this situation is to chain a `then` block to the existing `catch` block and assert that the catch was called.
```js
var expect = require('chai').expect;
```
The above test uses Chai and Sinon to assert that the `catch` block was called, and it fails as it should.
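Such a test might look like this sketch (assuming the chai and sinon packages; the test name is illustrative):

```js
var expect = require('chai').expect;
var sinon = require('sinon');

var promise = Promise.resolve();

it('rejects', function () {
  var onRejected = sinon.spy();
  return promise
    .catch(onRejected)
    .then(function () {
      // The promise resolved, so the spy was never called,
      // and this assertion fails the test, as it should.
      expect(onRejected.called).to.equal(true);
    });
});
```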
If you know other elegant ways to avoid this behavior, please let me know in the comments!
❤️ code
```js
// A named function
function add(a, b) {
  return a + b;
}
```

```js
// An anonymous function expression assigned to a variable
var add = function (a, b) {
  return a + b;
};
```
Now, what’s special about named functions?
Error and call stacks are a great help in debugging code, but the use of anonymous functions reduces the usefulness of call stacks. We've all seen the infamous `(anonymous)` littered all over a stack trace.
Of course, one can always look at the line number of the error, but with the ever-increasing pre-processing of the code (read: Babel, webpack, Browserify), this is not a reliable method.
See the Pen Anon Func by Tarun Batra (@tbking) on CodePen.
See the Pen Named Func by Tarun Batra (@tbking) on CodePen.
The JavaScript ecosystem is already confusing as it is, and the developer can use as much help as she can get to make sense of obscure errors.
While interpreting code, JavaScript looks for statements starting with the function keyword (i.e., function declarations) and moves them to the top. This is called hoisting.
So, practically, a function declared this way is available even before its declaration, making the following code valid.
```js
add(1, 2); // => 3

// The whole declaration is hoisted above the call
function add(a, b) {
  return a + b;
}
```
But the following code won’t work.
```js
add(1, 2); // => Error

// Only the variable `add` is hoisted (as undefined);
// the function assignment is not
var add = function (a, b) {
  return a + b;
};
```
It’s fairly difficult to understand a callback function’s purpose in first glance. That’s what makes callback hell the problem it is.
Look at the following jQuery code.
```js
$('form').submit(function () {
  // ...
});
```
The above code provides no information on what’s going to happen to the submitted form. Adding names to callbacks improves readability and acts as implicit documentation, as the following code does.
```js
$('form').submit(function hitAPI() {
  // ...
});
```
In my opinion, there's no reason not to use named functions when possible. Most JavaScript style guides out there don't give them due importance. ESLint users can use the func-names rule in their projects to require the use of named functions, and enjoy the benefits.
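A minimal sketch of such a configuration, assuming a JS-format ESLint config file:

```js
// .eslintrc.js
module.exports = {
  rules: {
    // Report anonymous functions wherever a name could be given
    'func-names': ['error', 'always']
  }
};
```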
❤️ code
Unless you’ve been living under a rock, you’d have heard about cryptocurrencies or Bitcoin atleast. There’s a lot of buzz around it, but disproportionately less clarity regarding what it actually is. In this article we’ll introduce ourselves to the fundamental principles of cryptocurrencies, clear the mist of confusion which surrounds them, and debunking some myths along the way too.
It’s human nature to compare everything new with something familiar. In case of cryptocurrencies, it’s the fiat money. Money is so fundamental to our daily lives that we always manage to see through them, and seldom do we try to understand the reason we need it and the unique purpose the serve.
Look at those paper and plastic sheets; do you feel they're worth it?
Now I’m not an economist, but I know few common principles which make currencies what they are.
Almost every financial and economic activity is affected by this principle. The higher the ratio of demand to supply of a commodity, the higher its value.
Market sentiments greatly affect the value of currencies, stocks and commodities. If the investors of a company think the company's shares are going to be worth less, they are worth less.
Trust is essential for virtually all economic activity. Banknotes often have a promissory clause along the lines of "I promise to pay the bearer the sum of …", without any gold or other precious resources to actually back that claim. (Earlier, currencies used to be backed by something like gold; though that's discontinued now, the idea is the same.)
So we already have things in place to keep the system running; why do we need cryptocurrencies? What problems do they solve, if any? Well, the features of fiat money are also its limitations.
Cryptocurrencies are an attempt to liberate the monetary system, something so essential to our lives, from these issues, in a way that one can be in absolute control of one's wealth without trusting the system.
Any discussion about cryptocurrencies is incomplete without Bitcoin. Bitcoin is to cryptocurrencies what PowerPoint is to presentation programs, Ubuntu is to Linux, and Xerox is to photocopying. We will try to understand cryptocurrencies with a case study of Bitcoin, which is not very different from other cryptocurrencies.
Bitcoin is one of the first and certainly the most popular cryptocurrency, created by Satoshi Nakamoto (a pseudonym) in 2009. Many of the challenges which Bitcoin faced were the first of their kind, and the success of Bitcoin paved the way for other cryptocurrencies.
Let’s see some of these challenges:
The Bitcoin protocol is based on a peer-to-peer (P2P) network to eliminate the need for any central authority, just like the BitTorrent protocol, based on a P2P network, eliminated the need for any central authority in file sharing.
When we transfer money through banks, we trust them to do so. But without any central authority, or nodes which one could trust, it was necessary to create a trustless system. Bitcoin used blockchains to achieve this. A blockchain is a type of distributed database which lives on every node and is used as an immutable public ledger for the transactions. Blockchains are made of blocks, which in turn consist of bitcoin transactions. A blockchain is similar to a linked list of blocks where every block refers to its parent block, and this fact makes old transactions very secure and incorruptible, as we'll see later.
Blockchains themselves are greatly hyped and are considered the biggest achievement of Bitcoin, as the idea can be used in various other industries to create trustless systems.
If there’s no central authority, who decides the value of the currency?
Well, as we discussed earlier, the golden rule of demand and supply comes into play here too. Satoshi designed the protocol in such a way that the maximum number of bitcoins that can ever be generated (minted) is fixed at 21 million.
Market sentiments also play a vital role. Recently, bitcoin prices plunged when Bitfinex, a Hong Kong-based cryptocurrency exchange, was hacked and $65 million worth of bitcoins were stolen.
Banks have an interest in processing only valid transactions and keeping fraudulent transactions out of the system. In Bitcoin, this role is played collectively by miners.
Miners are the nodes which generate blocks for the blockchain by verifying unverified transactions and including them in their blocks.
The verification is done by adding a proof of work (POW) to the block data. Block data generally contains the transactions which form the block, including the coinbase transaction, and the hash of the previous block.
Proof of work is a piece of data which is difficult to produce but easy to verify.
It’s generally the hash of block data with n number of leading zeroes. Miners append a value called nonce to the block data to change it’s hash value and keep changing the nonce until the desired hash is found.
So if the block data is `Hello, world!`, and the number of zeroes required is 4, it takes 4251 tries (the nonce is incremented from 0) to find the desired hash:

```
0000c3af42fc31103f1fdc0151fa747ff87349a4714df7cc52ea464e12dcd4e9
```
Now anyone can verify that this is a valid hash of `Hello, world!4250`, but if someone tries to change a transaction in the block, the hash will be invalidated and a new POW has to be generated, which is impractical.
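Here's a toy version of that mining loop in Node.js (a sketch using the built-in crypto module; four leading zeroes, as in the example above):

```js
var crypto = require('crypto');

function sha256(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Keep incrementing the nonce until the hash of data + nonce
// starts with the required number of leading zeroes.
function findNonce(data, zeroes) {
  var prefix = new Array(zeroes + 1).join('0'); // e.g. '0000'
  var nonce = 0;
  while (sha256(data + nonce).slice(0, zeroes) !== prefix) {
    nonce++;
  }
  return nonce;
}

console.log(findNonce('Hello, world!', 4)); // => 4250, found on the 4251st try
```

Verifying is a single hash; finding the nonce takes thousands of them, which is exactly the asymmetry a proof of work relies on.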
We saw that finding a POW can be a complex and resource-hungry process. So why do miners do it? Miners are incentivized for every block they generate. At the end of the block, the miner adds a special transaction called the coinbase transaction, which credits a certain amount of bitcoins to the miner's address out of thin air, and that's how new bitcoins are minted. Miners also get some money from the transactions, called transaction fees. The block which arrives first gets placed in the blockchain, and other miners have to scrap their work and start again after syncing with the blockchain.
Let’s say Alice bought something from Bob, and paid him in bitcoins. What if:
Alice broadcasts the transaction, but alters the transaction in her own copy of the blockchain to pay herself instead of Bob.
Upon the verification of the transaction by the network, Bob ships the item.
Now Alice broadcasts her own version of the blockchain.
Which version of blockchain should the nodes trust?
If Alice manages to convince the nodes that her version (without Bob’s transaction) is valid, she gets to buy the item without paying anything.
Various elements of the Bitcoin protocol come into play to stop this from happening.
Bitcoin nodes trust the longest blockchain and build their blocks on it. If Alice wants to invalidate the current longest blockchain, she has to re-generate the blocks after her fraudulent transaction at such a speed that her blockchain becomes the longest one. As long as most nodes in the network are honest (they should be; they're getting paid for it!), Alice can't corrupt the blockchain, and this is why we introduced blockchains as immutable public ledgers earlier in the article. The number of blocks a transaction is buried under is called the number of confirmations of that particular transaction. Most exchanges deem a transaction verified after 6 confirmations.
The more confirmations a transaction has, the more difficult it is to alter.
Many people hail Bitcoin as an anonymous mode of payment. But the reality is quite the opposite. Every transaction made on Bitcoin is visible to everyone. Having said that, bitcoin transactions don’t necessarily link to the real identity of the user, but to a pseudonym, much like writing a book with a pen name. Thus, Bitcoin is pseudonymous.
The first block of the blockchain is called the Genesis Block. Every bitcoin in circulation is derived either from a coinbase transaction or from the Genesis Block. See what it looked like here.
Rewards from coinbase transactions are halved every 210,000 blocks. In fact, the latest halving event occurred recently, on 9th July 2016, reducing the reward from BTC 25 to BTC 12.5. Check when the next halving event is here.
The 51% attack is a widely feared attack on Bitcoin, where an attacker controls 51% of the computational resources of the network, allowing them to alter the "immutable" blockchain. It isn't just theory; read about Bitcoin's brush with it here.
Cryptocurrencies other than Bitcoin are known as altcoins, or alternatives to Bitcoin. Most altcoins are similar in principle to Bitcoin, with some minor differences. Litecoin, Dogecoin, Namecoin and Nxt are some prominent altcoins.
The altcoins vary in terms of underlying principles. For example, Primecoin is a cryptocurrency which puts the vast amount of computational resources to use in finding long chains of prime numbers, which can contribute to the advancement of cryptography as a science.
Ethereum is a disruptive blockchain-based platform, the highlight of which is smart contracts. Smart contracts are contracts enforced by computer code, instead of institutions set up by humans. This way, we can port whole economies and governments, let alone currencies, to a digital ecosystem. The possibilities are endless and out of the scope of this article.
Cryptocurrencies also vary in terms of user friendliness. More popular ones, like Bitcoin, are very user friendly, while others are not.
My bitcoin wallet app
You can see a stark difference between my bitcoin wallet app shown above and my Ethereum wallet shown below.
My Ethereum wallet
Cryptocurrencies are trending, and their future looks promising. Many people have made a fortune out of them; however, they should not be looked at merely as an investment. The ideas these cryptocurrencies represent and the future possibilities they hold are of far greater value than their monetary value.
If you’re thrilled and want to involve in bitcoins or cryptocurrencies in general, you may like the following links:
The original Bitcoin paper by Satoshi Nakamoto.
Shapeshift is a site to convert one cryptocurrency to another.
Poloniex is an exchange / trading site for cryptocurrencies.
Coinbase is the PayPal of cryptocurrencies.
If you liked the article and feel generous, or just want to hone your newly acquired bitcoin skills, feel free to send me some bitcoins at the following address.
17ULzN8PCaCtE16WKtBhMYc28DQxxt4ja
❤️ code
In this chapter specifically, we're going to talk about Meteor publications and subscriptions. To explain these interwoven concepts, we're going to use them in various scenarios and analyze the results using third-party tools, to achieve a better understanding of how Meteor publications and subscriptions work and the ways we can use them.
In almost every case, we need to publish a set of data from the server and subscribe to the same on the client. Unlike conventional frameworks, a Meteor server doesn't serve HTML content, but the data which is used by the client to render the HTML. This approach is called data on the wire. The server sends a set of data to the client initially, and then keeps pushing new data as it changes in the database.
Code to publish all documents from a collection of posts
```js
if (Meteor.isServer) {
  Posts = new Mongo.Collection('posts');

  // Publish all documents from the posts collection
  Meteor.publish('listAllPosts', function () {
    return Posts.find();
  });
}
```
Code to subscribe for all documents from a collection of posts
```js
if (Meteor.isClient) {
  Posts = new Mongo.Collection('posts');

  Meteor.subscribe('listAllPosts', {
    onReady: function () {
      console.log('Subscription is ready');
    },
    onStop: function (err) {
      console.log('Subscription stopped', err);
    }
  });
}
```
If you’re new to Meteor, you’d already be sweating seeing that we just created a Mongo Collection on the client. You’re right, it doesn’t make sense.
We actually created a MiniMongo collection. MiniMongo is a client-side data store with a Mongo-like API. When we talk about the server sending data to the client, it's MiniMongo we're talking about.
Subscribed data in MiniMongo visualized using Meteor DevTools showing the 2 posts that were pushed to MiniMongo
When a subscription becomes ready, the data is stored in MiniMongo and can be accessed through Mongo-like queries. The server keeps MiniMongo in sync with the data in the back-end by sending additional data as it changes.
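For example, once the subscription above is ready, the client can run Mongo-like queries against MiniMongo directly (the `createdAt` field here is an assumption about the document shape):

```js
// Runs on the client: this queries MiniMongo, not the server
var latestPosts = Posts.find({}, { sort: { createdAt: -1 }, limit: 2 }).fetch();
console.log(latestPosts);
```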
Syncing of MiniMongo data through DDP showing 1 item being pushed to MiniMongo as it got added on the server
Updated MiniMongo data showing total 3 posts
As shown in the above code, when the subscription becomes ready, the `onReady` callback is called. But when does the subscription become ready?
When the publish function returns a cursor, Meteor pushes the matching documents and marks the subscription ready automatically:

```js
Meteor.publish('listAllPosts', function () {
  // Returning a cursor publishes the initial documents
  // and readies the subscription automatically
  return Posts.find();
});
```
When not returning a cursor, `this.ready()` is used, as shown in the following code snippet:

```js
Meteor.publish('listAllPosts', function () {
  // ...push the initial records manually...
  this.ready();
});
```
This informs the client that all the records which needed to be pushed initially have been pushed, and the subscription is deemed ready.
There are use cases where you may want the data not to be pushed to the client, and the client to know that the subscription was stopped for some reason. How can that be done?
Call `this.stop()` from the publish function. This will stop the subscription and call the `onStop` callback:

```js
Meteor.publish('listAllPosts', function () {
  // Decide not to publish anything, e.g. the user is not authorized
  this.stop();
});
```
To pass an error to the client, call `this.error()` instead; this results in the `onStop` callback being called with the error as the argument:

```js
// server
Meteor.publish('listAllPosts', function () {
  this.error(new Meteor.Error(403, 'not-authorized'));
});

// client
Meteor.subscribe('listAllPosts', {
  onStop: function (err) {
    console.log(err); // the error sent by the server
  }
});
```
Meteor publications and subscriptions work like a charm when used properly. Syncing data in the background and updating it in real time looks pretty good. But the system offers much more: you can take control of what gets pushed, when, and how.
To gain more control over the data you publish, you need to know what happens when Meteor publishes data. The following screenshot shows all the DDP messages exchanged between the client and server of our sample Meteor app.
DDP messages showing the publication of data using Arunoda‘s DDP Analyzer tool
- The client sends a `sub` message to subscribe to the `listAllPosts` publication.
- The server sends each initial document to the client in an `added` message.
- The server signals that the initial data has been sent with a `ready` message.
- As new documents show up on the server, each one is pushed to the client in a further `added` message.

Now, if you're paying attention, you'd know that we used `this.ready()` earlier to ready a subscription. It's responsible for sending the `ready` message. Similarly, we have more methods which can be used to send the other kinds of messages.
You can use `this.added()` in a Meteor publish function to manually send an `added` message. Its signature is:
```js
this.added(collectionName, documentId, fields)
```
Similarly, you can use `this.changed()` to change a document already pushed to the client. Its signature is:
```js
this.changed(collectionName, documentId, changedFields)
```
To remove a document from the client, Meteor provides `this.removed()` with the following signature:
```js
this.removed(collectionName, documentId)
```
Code to publish a post and then change its content
1 | Meteor.publish("sendCustomPost", function () { |
DDP messages after executing the above code
So you can control the data you publish to the client. There are more interesting concepts, like intercepting a publication and changing the data, but that's out of the scope of this article.
Meteor publications and subscriptions make working with real-time data easy and fun. To know more, refer to the Meteor documentation, and after you're done with that, maybe the DDP specification too.
❤️ code